Entropy (information theory)In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable , which takes values in the alphabet and is distributed according to : where denotes the sum over the variable's possible values. The choice of base for , the logarithm, varies for different applications. Base 2 gives the unit of bits (or "shannons"), while base e gives "natural units" nat, and base 10 gives units of "dits", "bans", or "hartleys".
Rényi entropyIn information theory, the Rényi entropy is a quantity that generalizes various notions of entropy, including Hartley entropy, Shannon entropy, collision entropy, and min-entropy. The Rényi entropy is named after Alfréd Rényi, who looked for the most general way to quantify information while preserving additivity for independent events. In the context of fractal dimension estimation, the Rényi entropy forms the basis of the concept of generalized dimensions. The Rényi entropy is important in ecology and statistics as index of diversity.
File sharingFile sharing is the practice of distributing or providing access to digital media, such as computer programs, multimedia (audio, images and video), documents or electronic books. Common methods of storage, transmission and dispersion include removable media, centralized servers on computer networks, Internet-based hyperlinked documents, and the use of distributed peer-to-peer networking. File sharing technologies, such as BitTorrent, are integral to modern media piracy, as well as the sharing of scientific data and other free content.
Mutual informationIn probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" (in units such as shannons (bits), nats or hartleys) obtained about one random variable by observing the other random variable. The concept of mutual information is intimately linked to that of entropy of a random variable, a fundamental notion in information theory that quantifies the expected "amount of information" held in a random variable.
Conditional entropyIn information theory, the conditional entropy quantifies the amount of information needed to describe the outcome of a random variable given that the value of another random variable is known. Here, information is measured in shannons, nats, or hartleys. The entropy of conditioned on is written as . The conditional entropy of given is defined as where and denote the support sets of and . Note: Here, the convention is that the expression should be treated as being equal to zero. This is because .
Cross-entropyIn information theory, the cross-entropy between two probability distributions and over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution , rather than the true distribution . The cross-entropy of the distribution relative to a distribution over a given set is defined as follows: where is the expected value operator with respect to the distribution .
Distributed computingA distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. Distributed computing is a field of computer science that studies distributed systems. The components of a distributed system interact with one another in order to achieve a common goal. Three significant challenges of distributed systems are: maintaining concurrency of components, overcoming the lack of a global clock, and managing the independent failure of components.
Peer-to-peer file sharingPeer-to-peer file sharing is the distribution and sharing of digital media using peer-to-peer (P2P) networking technology. P2P file sharing allows users to access media files such as books, music, movies, and games using a P2P software program that searches for other connected computers on a P2P network to locate the desired content. The nodes (peers) of such networks are end-user computers and distribution servers (not required).
Schema matchingThe terms schema matching and mapping are often used interchangeably for a database process. For this article, we differentiate the two as follows: schema matching is the process of identifying that two objects are semantically related (scope of this article) while mapping refers to the transformations between the objects. For example, in the two schemas DB1.Student (Name, SSN, Level, Major, Marks) and DB2.Grad-Student (Name, ID, Major, Grades); possible matches would be: DB1.Student ≈ DB2.Grad-Student; DB1.
Peer-to-peerPeer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network. This forms a peer-to-peer network of nodes. Peers make a portion of their resources, such as processing power, disk storage or network bandwidth, directly available to other network participants, without the need for central coordination by servers or stable hosts.
Commons-based peer productionCommons-based peer production (CBPP) is a term coined by Harvard Law School professor Yochai Benkler. It describes a model of socio-economic production in which large numbers of people work cooperatively; usually over the Internet. Commons-based projects generally have less rigid hierarchical structures than those under more traditional business models. One of the major characteristics of the commons-based peer production is its non-profit scope. Often—but not always—commons-based projects are designed without a need for financial compensation for contributors.
Binary entropy functionIn information theory, the binary entropy function, denoted or , is defined as the entropy of a Bernoulli process with probability of one of two values. It is a special case of , the entropy function. Mathematically, the Bernoulli trial is modelled as a random variable that can take on only two values: 0 and 1, which are mutually exclusive and exhaustive. If , then and the entropy of (in shannons) is given by where is taken to be 0. The logarithms in this formula are usually taken (as shown in the graph) to the base 2.
Principle of maximum entropyThe principle of maximum entropy states that the probability distribution which best represents the current state of knowledge about a system is the one with largest entropy, in the context of precisely stated prior data (such as a proposition that expresses testable information). Another way of stating this: Take precisely stated prior data or testable information about a probability distribution function. Consider the set of all trial probability distributions that would encode the prior data.
Database schemaThe database schema is the structure of a database described in a formal language supported typically by a relational database management system (RDBMS). The term "schema" refers to the organization of data as a blueprint of how the database is constructed (divided into database tables in the case of relational databases). The formal definition of a database schema is a set of formulas (sentences) called integrity constraints imposed on a database. These integrity constraints ensure compatibility between parts of the schema.
Image sharingImage sharing, or photo sharing, is the publishing or transfer of digital photos online. Image sharing websites offer services such as uploading, hosting, managing and sharing of photos (publicly or privately). This function is provided through both websites and applications that facilitate the upload and display of images. The term can also be loosely applied to the use of online photo galleries that are set up and managed by individual users, including photoblogs.
Comparison of file-sharing applicationsis a method of distributing electronically stored information such as computer programs and digital media. Below is a list of file sharing applications, most of them make use of technologies. This comparison contains also download managers that can be used as file sharing applications. For pure download managers see the comparison of download managers, and for BitTorrent-only clients the comparison of BitTorrent clients. Note that several applications had adware or spyware tied in during the past and may have it again in the future.
Virtual communityA virtual community is a social network of individuals who connect through specific social media, potentially crossing geographical and political boundaries in order to pursue mutual interests or goals. Some of the most pervasive virtual communities are online communities operating under social networking services. Howard Rheingold discussed virtual communities in his book, The Virtual Community, published in 1993. The book's discussion ranges from Rheingold's adventures on The WELL, computer-mediated communication, social groups and information science.
ScalabilityScalability is the property of a system to handle a growing amount of work. One definition for software systems specifies that this may be done by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a package delivery system is scalable because more packages can be delivered by adding more delivery vehicles.
Differential entropyDifferential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Claude Shannon to extend the idea of (Shannon) entropy, a measure of average (surprisal) of a random variable, to continuous probability distributions. Unfortunately, Shannon did not derive this formula, and rather just assumed it was the correct continuous analogue of discrete entropy, but it is not. The actual continuous version of discrete entropy is the limiting density of discrete points (LDDP).
Social peer-to-peer processesSocial peer-to-peer processes are interactions with a peer-to-peer dynamic. These peers can be humans or computers. Peer-to-peer (P2P) is a term that originated from the popular concept of the P2P distributed computer application architecture which partitions tasks or workloads between peers. This application structure was popularized by systems like Napster, the first of its kind in the late 1990s. The concept has inspired new structures and philosophies in many areas of human interaction.