Publication

Collaborative Filtering Under a Sybil Attack: Similarity Metrics do Matter!

Related concepts (23)

In statistics and related fields, a similarity measure or similarity function or similarity metric is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity exists, usually such measures are in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects. Though, in more broad terms, a similarity function may also satisfy metric axioms.

String metric

In mathematics and computer science, a string metric (also known as a string similarity metric or string distance function) is a metric that measures distance ("inverse similarity") between two text strings for approximate string matching or comparison and in fuzzy string searching. A requirement for a string metric (e.g. in contrast to string matching) is fulfillment of the triangle inequality. For example, the strings "Sam" and "Samuel" can be considered to be close.

Recommender system

A recommender system, or a recommendation system (sometimes replacing 'system' with a synonym such as platform or engine), is a subclass of information filtering system that provide suggestions for items that are most pertinent to a particular user. Typically, the suggestions refer to various decision-making processes, such as what product to purchase, what music to listen to, or what online news to read. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer.

Video quality

Video quality is a characteristic of a video passed through a video transmission or processing system that describes perceived video degradation (typically, compared to the original video). Video processing systems may introduce some amount of distortion or artifacts in the video signal that negatively impacts the user's perception of a system. For many stakeholders in video production and distribution, assurance of video quality is an important task. Video quality evaluation is performed to describe the quality of a set of video sequences under study.

Similarity learning

Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification. There are four common setups for similarity and metric distance learning. Regression similarity learning In this setup, pairs of objects are given together with a measure of their similarity .

Collaborative filtering

Collaborative filtering (CF) is a technique used by recommender systems. Collaborative filtering has two senses, a narrow one and a more general one. In the newer, narrower sense, collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person.

Semantic similarity

Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained according to the comparison of information supporting their meaning or describing their nature.

Jaccard index

The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. It was developed by Grove Karl Gilbert in 1884 as his ratio of verification (v) and now is frequently referred to as the Critical Success Index in meteorology. It was later developed independently by Paul Jaccard, originally giving the French name coefficient de communauté, and independently formulated again by T. Tanimoto. Thus, the Tanimoto index or Tanimoto coefficient are also used in some fields.

Quality of service

Quality of service (QoS) is the description or measurement of the overall performance of a service, such as a telephony or computer network, or a cloud computing service, particularly the performance seen by the users of the network. To quantitatively measure quality of service, several related aspects of the network service are often considered, such as packet loss, bit rate, throughput, transmission delay, availability, jitter, etc.

Quality of experience

Quality of experience (QoE) is a measure of the delight or annoyance of a customer's experiences with a service (e.g., web browsing, phone call, TV broadcast). QoE focuses on the entire service experience; it is a holistic concept, similar to the field of user experience, but with its roots in telecommunication. QoE is an emerging multidisciplinary field based on social psychology, cognitive science, economics, and engineering science, focused on understanding overall human quality requirements.

Identity management

Identity management (IdM), also known as identity and access management (IAM or IdAM), is a framework of policies and technologies to ensure that the right users (that are part of the ecosystem connected to or within an enterprise) have the appropriate access to technology resources. IdM systems fall under the overarching umbrellas of IT security and data management. Identity and access management systems not only identify, authenticate, and control access for individuals who will be utilizing IT resources but also the hardware and applications employees need to access.

Subjective video quality

Subjective video quality is video quality as experienced by humans. It is concerned with how video is perceived by a viewer (also called "observer" or "subject") and designates their opinion on a particular video sequence. It is related to the field of Quality of Experience. Measuring subjective video quality is necessary because objective quality assessment algorithms such as PSNR have been shown to correlate poorly with subjective ratings. Subjective ratings may also be used as ground truth to develop new algorithms.

Digital identity

Digital identity refers to the information utilized by computer systems to represent external entities, including a person, organization, application, or device. When used to describe an individual, it encompasses a person's compiled information and plays a crucial role in automating access to computer-based services, verifying identity online, and enabling computers to mediate relationships between entities. Digital identity for individuals is an aspect of a person's social identity and can also be referred to as online identity.

Information privacy

Information privacy is the relationship between the collection and dissemination of data, technology, the public expectation of privacy, contextual information norms, and the legal and political issues surrounding them. It is also known as data privacy or data protection. Data privacy is challenging since attempts to use data while protecting an individual's privacy preferences and personally identifiable information. The fields of computer security, data security, and information security all design and use software, hardware, and human resources to address this issue.

Deep packet inspection

Deep packet inspection (DPI) is a type of data processing that inspects in detail the data being sent over a computer network, and may take actions such as alerting, blocking, re-routing, or logging it accordingly. Deep packet inspection is often used for baselining application behavior, analyzing network usage, troubleshooting network performance, ensuring that data is in the correct format, checking for malicious code, eavesdropping, and internet censorship, among other purposes.

K-nearest neighbors algorithm

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in a data set. The output depends on whether k-NN is used for classification or regression: In k-NN classification, the output is a class membership.

High dynamic range

High dynamic range (HDR) is a dynamic range higher than usual, synonyms are wide dynamic range, extended dynamic range, expanded dynamic range. The term is often used in discussing the dynamic range of various signals such as s, videos, audio or radio. It may apply to the means of recording, processing, and reproducing such signals including analog and digitized signals. The term is also the name of some of the technologies or techniques allowing to achieve high dynamic range images, videos, or audio.

Information privacy law

Information privacy, data privacy or data protection laws provide a legal framework on how to obtain, use and store data of natural persons. The various laws around the world describe the rights of natural persons to control who is using its data. This includes usually the right to get details on which data is stored, for what purpose and to request the deletion in case the purpose is not given anymore. Over 80 countries and independent territories, including nearly every country in Europe and many in Latin America and the Caribbean, Asia, and Africa, have now adopted comprehensive data protection laws.

Cluster analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.

Identity theft

Identity theft or identity infringement occurs when someone uses another person's personal identifying information, like their name, identifying number, or credit card number, without their permission, to commit fraud or other crimes. The term identity theft was coined in 1964. Since that time, the definition of identity theft has been statutorily defined throughout both the U.K. and the U.S. as the theft of personally identifiable information.