Named-entity recognitionNamed-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. Most research on NER/NEE systems has been structured as taking an unannotated block of text, such as this one: Jim bought 300 shares of Acme Corp.
Entity linkingIn natural language processing, entity linking, also referred to as named-entity linking (NEL), named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD) or named-entity normalization (NEN) is the task of assigning a unique identity to entities (such as famous individuals, locations, or companies) mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is to determine that "Paris" refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as "Paris".
Text miningText mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al.
Cultural heritageCultural heritage is the heritage of tangible and intangible heritage assets of a group or society that is inherited from past generations. Not all heritages of past generations are "heritage"; rather, heritage is a product of selection by society. Cultural heritage includes tangible culture (such as buildings, monuments, landscapes, archive materials, books, works of art, and artifacts), intangible culture (such as folklore, traditions, language, and knowledge), and natural heritage (including culturally significant landscapes, and biodiversity).
Deep learningDeep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
Historical documentHistorical documents are original documents that contain important historical information about a person, place, or event and can thus serve as primary sources as important ingredients of the historical methodology. Significant historical documents can be deeds, laws, accounts of battles (often given by the victors or persons sharing their viewpoint), or the exploits of the powerful. Though these documents are of historical interest, they do not detail the daily lives of ordinary people, or the way society functioned.
Historical fictionHistorical fiction is a literary genre in which the plot takes place in a setting related to the past events, but is fictional. Although the term is commonly used as a synonym for historical fiction literature, it can also be applied to other types of narrative, including theatre, opera, cinema, and television, as well as video games and graphic novels. An essential element of historical fiction is that it is set in the past and pays attention to the manners, social conditions and other details of the depicted period.
Cultural heritage managementCultural heritage management (CHM) is the vocation and practice of managing cultural heritage. It is a branch of cultural resources management (CRM), although it also draws on the practices of cultural conservation, restoration, museology, archaeology, history and architecture. While the term cultural heritage is generally used in Europe, in the US the term cultural resources is in more general use specifically referring to cultural heritage resources.
Historical criticismHistorical criticism, also known as the historical-critical method or higher criticism, is a branch of criticism that investigates the origins of ancient texts in order to understand "the world behind the text". While often discussed in terms of Jewish and Christian writings from ancient times, historical criticism has also been applied to other religious and secular writings from various parts of the world and periods of history.
Transformer (machine learning model)A transformer is a deep learning architecture that relies on the parallel multi-head attention mechanism. The modern transformer was proposed in the 2017 paper titled 'Attention Is All You Need' by Ashish Vaswani et al., Google Brain team. It is notable for requiring less training time than previous recurrent neural architectures, such as long short-term memory (LSTM), and its later variation has been prevalently adopted for training large language models on large (language) datasets, such as the Wikipedia corpus and Common Crawl, by virtue of the parallelized processing of input sequence.
Feature learningIn machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task. Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process.
Deep reinforcement learningDeep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs (e.g.
Heritage tourismCultural heritage tourism (or just heritage tourism) is a branch of tourism oriented towards the cultural heritage of the location where tourism is occurring. The National Trust for Historic Preservation in the United States defines heritage tourism as "traveling to experience the places, artifacts and activities that authentically represent the stories and people of the past", and "heritage tourism can include cultural, historic and natural resources".
Conservation and restoration of cultural propertyThe conservation and restoration of cultural property focuses on protection and care of cultural property (tangible cultural heritage), including artworks, architecture, archaeology, and museum collections. Conservation activities include preventive conservation, examination, documentation, research, treatment, and education. This field is closely allied with conservation science, curators and registrars.
Natural language processingNatural language processing (NLP) is an interdisciplinary subfield of linguistics and computer science. It is primarily concerned with processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them.
HistoryHistory (derived ) is the systematic study and documentation of the human past. The period of events before the invention of writing systems is considered prehistory. "History" is an umbrella term comprising past events as well as the memory, discovery, collection, organization, presentation, and interpretation of these events. Historians seek knowledge of the past using historical sources such as written documents, oral accounts, art and material artifacts, and ecological markers.
Data miningData mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal of extracting information (with intelligent methods) from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.
Concept miningConcept mining is an activity that results in the extraction of concepts from artifacts. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining and text mining. Because artifacts are typically a loosely structured sequence of words and other symbols (rather than concepts), the problem is nontrivial, but it can provide powerful insights into the meaning, provenance and similarity of documents.
Q-learningQ-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state.
Textual criticismTextual criticism is a branch of textual scholarship, philology, and literary criticism that is concerned with the identification of textual variants, or different versions, of either manuscripts (mss) or of printed books. Such texts may range in dates from the earliest writing in cuneiform, impressed on clay, for example, to multiple unpublished versions of a 21st-century author's work. Historically, scribes who were paid to copy documents may have been literate, but many were simply copyists, mimicking the shapes of letters without necessarily understanding what they meant.