A convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns feature engineering by itself via filter (or kernel) optimization. Vanishing and exploding gradients, seen during backpropagation in earlier neural networks, are prevented by using regularized weights over fewer connections. For example, in a fully connected layer, each neuron would require 10,000 weights to process an image sized 100 × 100 pixels.
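The kernel optimization mentioned above operates on a basic building block: sliding a small filter over an image to produce a feature map. The following is a minimal pure-Python sketch of a "valid" 2D convolution (strictly, cross-correlation, as most deep-learning libraries implement it); the image and kernel values are illustrative.

```python
def convolve2d(image, kernel):
    """Valid 2D cross-correlation of a grayscale image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Each output cell is the weighted sum of the kernel-sized patch.
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
    return out

# A 3x3 vertical-edge kernel applied to a 4x4 image yields a 2x2 feature map.
image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 0, 0]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]
feature_map = convolve2d(image, kernel)
```

Because the same 9 kernel weights are reused at every position, the layer needs far fewer parameters than the 10,000-weight fully connected neuron in the example above.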
There are many types of artificial neural networks (ANN). Artificial neural networks are computational models inspired by biological neural networks, and are used to approximate functions that are generally unknown. Particularly, they are inspired by the behaviour of neurons and the electrical signals they convey between input (such as from the eyes or nerve endings in the hand), processing, and output from the brain (such as reacting to light, touch, or heat). The way neurons semantically communicate is an area of ongoing research.
In linguistics, word order (also known as linear order) is the order of the syntactic constituents of a language. Word order typology studies it from a cross-linguistic perspective, and examines how different languages employ different orders. Correlations between orders found in different syntactic sub-domains are also of interest. The primary word orders that are of interest are the constituent order of a clause, namely the relative order of subject, object, and verb; the order of modifiers (adjectives, numerals, demonstratives, possessives, and adjuncts) in a noun phrase; and the order of adverbials.
A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by the direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.
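The internal state described above can be sketched in a few lines: each step feeds the previous hidden state back in alongside the new input, so the same input produces different outputs depending on what came before. The scalar weights below are arbitrary illustrative values, not trained parameters.

```python
import math

def rnn_step(x, h, w_x, w_h, b):
    """One recurrent update: new hidden state from input x and previous state h."""
    return math.tanh(w_x * x + w_h * h + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence; the hidden state h carries memory across steps."""
    h = 0.0
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)  # output feeds back into the next step
        states.append(h)
    return states

# The same input value 1.0 yields a different state at each position,
# because the hidden state accumulates context from earlier inputs.
states = run_rnn([1.0, 1.0])
```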
In syntax, verb-second (V2) word order is a sentence structure in which the finite verb of a sentence or a clause is placed in the clause's second position, so that the verb is preceded by a single word or group of words (a single constituent). Examples of V2 in English include (brackets indicating a single constituent): "Neither do I" and "[Never in my life] have I seen such things". If English used V2 in all situations, it would feature sentences like "[In school] learned I about animals" and "[When she comes home from work] takes she a nap". V2 word order is common in the Germanic languages and is also found in Northeast Caucasian Ingush, Uto-Aztecan O'odham, and fragmentarily in Romance Sursilvan (a Rhaeto-Romansh variety) and Finno-Ugric Estonian.
In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.
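The "closer in vector space means similar in meaning" property can be demonstrated with a toy lookup table. The 3-dimensional vectors below are invented for illustration; real embeddings (e.g. from word2vec or GloVe) typically have hundreds of dimensions learned from large corpora.

```python
import math

# Hypothetical embeddings for a three-word vocabulary (illustrative values).
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: the angle-based closeness of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def most_similar(word):
    """Nearest neighbour of `word` in the embedding space."""
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))
```

With these vectors, `most_similar("king")` returns `"queen"`, since their vectors point in nearly the same direction while "apple" points elsewhere.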
In linguistic typology, object–subject–verb (OSV) or object–agent–verb (OAV) is a classification of languages, based on whether the structure predominates in pragmatically neutral expressions. An example of this would be "Oranges Sam ate." OSV is rarely used in unmarked sentences, which use a normal word order without emphasis. Most languages that use OSV as their default word order come from the Amazon basin, such as Xavante, Jamamadi, Apurinã, Warao, Kayabí and Nadëb.
In linguistic typology, a verb–subject–object (VSO) language has its most typical sentences arrange their elements in that order, as in Ate Sam oranges (Sam ate oranges). VSO is the third-most common word order among the world's languages, after SOV (as in Hindi and Japanese) and SVO (as in English and Mandarin Chinese).
A feedforward neural network (FNN) is one of the two broad types of artificial neural network, characterized by the direction of the flow of information between its layers. Its flow is uni-directional, meaning that the information in the model flows in only one direction—forward—from the input nodes, through the hidden nodes (if any) and to the output nodes, without any cycles or loops, in contrast to recurrent neural networks, which have a bi-directional flow.
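The uni-directional flow can be shown as a single forward pass: activations move layer by layer from input to output, and nothing feeds back. The network shape (2 inputs, 2 hidden units, 1 output) and the weights below are arbitrary illustrative values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights, biases):
    """One forward pass: input -> hidden layer(s) -> output, no cycles.
    `weights` is a list of per-layer weight matrices (rows = neurons)."""
    a = x
    for W, b in zip(weights, biases):
        a = [sigmoid(sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i)
             for row, b_i in zip(W, b)]
    return a

# A tiny 2-2-1 network with made-up weights.
y = forward([1.0, 0.0],
            weights=[[[0.5, -0.5], [0.3, 0.8]], [[1.0, -1.0]]],
            biases=[[0.0, 0.0], [0.0]])
```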
In linguistic typology, object–verb–subject (OVS) or object–verb–agent (OVA) is a rare permutation of word order. OVS denotes the sequence object–verb–subject in unmarked expressions: Oranges ate Sam, Thorns have roses. The passive voice in English may appear to be in the OVS order, but that is not an accurate description. In an active voice sentence like Sam ate the oranges, the grammatical subject, Sam, is the agent and is acting on the patient, the oranges, which are the object of the verb, ate.
Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets) are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons.
In linguistic typology, subject–verb–object (SVO) is a sentence structure where the subject comes first, the verb second, and the object third. Languages may be classified according to the dominant sequence of these elements in unmarked sentences (i.e., sentences in which an unusual word order is not used for emphasis). English is included in this group. An example is "Sam ate yogurt." SVO is the second-most common order by number of known languages, after SOV. Together, SVO and SOV account for more than 87% of the world's languages.
In linguistic typology, a verb–object–subject or verb–object–agent language, which is commonly abbreviated VOS or VOA, is one in which most sentences arrange their elements in that order. That would be the equivalent in English to "Drank cocktail Sam." This relatively rare default word order accounts for only 3% of the world's languages, making it the fourth-most common of the six possible default word orders.
A language model is a probabilistic model of a natural language that can generate probabilities of a series of words, based on text corpora in one or multiple languages it was trained on. Large language models, as their most advanced form, are a combination of feedforward neural networks and transformers. They have superseded recurrent neural network-based models, which had previously superseded the pure statistical models, such as the word n-gram language model.
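The pure statistical approach mentioned above is easy to sketch. A bigram model (an n-gram model with n = 2) estimates the probability of each word given only the previous word, by counting adjacent pairs in a corpus; the three-sentence corpus below is invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Estimate P(word | previous word) from whitespace-tokenised sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]  # sentence boundaries
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    # Normalise counts into conditional probabilities.
    return {
        prev: {w: c / sum(ctr.values()) for w, c in ctr.items()}
        for prev, ctr in counts.items()
    }

model = train_bigram(["the cat sat", "the cat ran", "the dog sat"])
```

Here `model["the"]["cat"]` is 2/3, since "cat" follows "the" in two of the three sentences; such maximum-likelihood estimates are what neural language models later superseded.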
SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive. This series of evaluations is providing a mechanism to characterize in more precise terms exactly what is necessary to compute in meaning.
A Residual Neural Network (a.k.a. Residual Network, ResNet) is a deep learning model in which the weight layers learn residual functions with reference to the layer inputs. A Residual Network is a network with skip connections that perform identity mappings, merged with the layer outputs by addition. It behaves like a Highway Network whose gates are opened through strongly positive bias weights. This enables deep learning models with tens or hundreds of layers to train easily and achieve better accuracy as depth increases.
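The skip connection described above computes y = x + F(x): the block's weight layers learn only the residual F(x), and the input is added back unchanged. A minimal sketch, with made-up layer shapes and weights:

```python
def relu(z):
    return [max(0.0, v) for v in z]

def dense(x, W, b):
    """A plain weight layer: matrix-vector product plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def residual_block(x, W1, b1, W2, b2):
    """y = x + F(x): the identity skip connection is merged with the
    output of the weight layers by addition."""
    out = relu(dense(x, W1, b1))   # first weight layer + nonlinearity
    out = dense(out, W2, b2)       # second weight layer (residual F(x))
    return [xi + oi for xi, oi in zip(x, out)]
```

One consequence worth noting: if the weight layers output zero, the block reduces exactly to the identity mapping, which is part of why very deep ResNets remain easy to optimize.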
Catalan (/ˈkætələn/; autonym: català [kətəˈla]), known in the Valencian Community and Carche as Valencian (autonym: valencià), is a Western Romance language. It is the official language of Andorra, and an official language of two autonomous communities in eastern Spain: Catalonia and the Balearic Islands. It is also an official language in Valencia, where it is called Valencian.
Spanish (español or idioma español) or Castilian (castellano) is a Romance language of the Indo-European language family that evolved from colloquial Latin spoken on the Iberian Peninsula of Europe. Today, it is a global language with about 474.7 million native speakers, mainly in the Americas and Spain. Spanish is the official language of 20 countries. It is the world's second-most spoken native language after Mandarin Chinese; the world's fourth most spoken language overall after English, Mandarin Chinese, and Hindustani (Hindi-Urdu); and the world's most widely spoken Romance language.
Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained according to the comparison of information supporting their meaning or describing their nature.
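One classic family of such measures compares positions in a taxonomy rather than the words' spellings, which is exactly the semantic-versus-lexicographical distinction above. The sketch below uses a tiny hand-made is-a hierarchy and a simple path-length measure; real systems typically draw on a resource like WordNet.

```python
# A toy is-a taxonomy (illustrative); each term maps to its parent concept.
PARENT = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": None,
}

def ancestors(term):
    """The chain from a term up to the taxonomy root, term included."""
    chain = [term]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain

def path_similarity(a, b):
    """1 / (1 + shortest path length between a and b through the taxonomy):
    a simple path-based semantic similarity measure."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    # Distance = steps up from a to the lowest common ancestor, plus steps
    # down to b.
    dist = next(i + chain_b.index(t)
                for i, t in enumerate(chain_a) if t in chain_b)
    return 1.0 / (1.0 + dist)
```

"dog" and "cat" share the close ancestor "mammal", so they score higher than "dog" and "sparrow", which meet only at "animal" — even though none of the strings resemble each other.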
Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.
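A simple baseline for this problem is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the surrounding context. The sense inventory and glosses below are toy examples, not entries from a real dictionary.

```python
# Toy sense inventory: each sense of "bank" has a short gloss (illustrative).
SENSES = {
    "bank": {
        "finance": "an institution that accepts deposits and lends money",
        "river":   "the sloping land beside a body of water",
    }
}

def lesk(word, context):
    """Simplified Lesk: return the sense of `word` whose gloss has the
    largest word overlap with the context."""
    context_words = set(context.lower().split())

    def overlap(sense):
        return len(set(SENSES[word][sense].split()) & context_words)

    return max(SENSES[word], key=overlap)
```

For the context "she sat on the land beside the water", the river gloss shares four words with the context and the finance gloss none, so the river sense is chosen.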