Digital audioDigital audio is a representation of sound recorded in, or converted into, digital form. In digital audio, the sound wave of the audio signal is typically encoded as numerical samples in a continuous sequence. For example, in CD audio, samples are taken 44,100 times per second, each with 16-bit sample depth. Digital audio is also the name for the entire technology of sound recording and reproduction using audio signals that have been encoded in digital form.
Advanced Audio CodingAdvanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. Designed to be the successor of the MP3 format, AAC generally achieves higher sound quality than MP3 encoders at the same bit rate. AAC has been standardized by ISO and IEC as part of the MPEG-2 and MPEG-4 specifications. Part of AAC, HE-AAC ("AAC+"), is part of MPEG-4 Audio and is adopted into digital radio standards DAB+ and Digital Radio Mondiale, and mobile television standards DVB-H and ATSC-M/H.
Sound recording and reproductionSound recording and reproduction is the electrical, mechanical, electronic, or digital inscription and re-creation of sound waves, such as spoken voice, singing, instrumental music, or sound effects. The two main classes of sound recording technology are analog recording and digital recording. Sound recording is the transcription of invisible vibrations in air onto a storage medium such as a phonograph disc. The process is reversed in sound reproduction, and the variations stored on the medium are transformed back into sound waves.
PsychoacousticsPsychoacoustics is the branch of psychophysics involving the scientific study of sound perception and audiology—how human auditory system perceives various sounds. More specifically, it is the branch of science studying the psychological responses associated with sound (including noise, speech, and music). Psychoacoustics is an interdisciplinary field of many areas, including psychology, acoustics, electronic engineering, physics, biology, physiology, and computer science.
Compact Disc Digital AudioCompact Disc Digital Audio (CDDA or CD-DA), also known as Digital Audio Compact Disc or simply as Audio CD, is the standard format for audio compact discs. The standard is defined in the Red Book, one of a series of Rainbow Books (named for their binding colors) that contain the technical specifications for all CD formats. The first commercially available audio CD player, the Sony CDP-101, was released in October 1982 in Japan. The format gained worldwide acceptance in 1983–84, selling more than a million CD players in those two years, to play 22.
FrequencyFrequency (symbol f) is the number of occurrences of a repeating event per unit of time. It is also occasionally referred to as temporal frequency for clarity and to distinguish it from spatial frequency. Frequency is measured in hertz (symbol Hz) which is equal to one event per second. Ordinary frequency is related to angular frequency (symbol ω, in radians per second) by a scaling factor of 2π. The period (symbol T) is the interval of time between events, so the period is the reciprocal of the frequency, f=1/T.
Least-squares spectral analysisLeast-squares spectral analysis (LSSA) is a method of estimating a frequency spectrum based on a least-squares fit of sinusoids to data samples, similar to Fourier analysis. Fourier analysis, the most used spectral method in science, generally boosts long-periodic noise in the long and gapped records; LSSA mitigates such problems. Unlike in Fourier analysis, data need not be equally spaced to use LSSA.
WAVWaveform Audio File Format (WAVE, or WAV due to its ; pronounced "wave" or "wæv" ) is an standard, developed by IBM and Microsoft, for storing an audio bitstream on personal computers. It is the main format used on Microsoft Windows systems for uncompressed audio. The usual bitstream encoding is the linear pulse-code modulation (LPCM) format. WAV is an application of the (RIFF) bitstream format method for storing data in chunks, and thus is similar to the 8SVX and the (AIFF) format used on Amiga and Macintosh computers, respectively.
Spectral efficiencySpectral efficiency, spectrum efficiency or bandwidth efficiency refers to the information rate that can be transmitted over a given bandwidth in a specific communication system. It is a measure of how efficiently a limited frequency spectrum is utilized by the physical layer protocol, and sometimes by the medium access control (the channel access protocol). The link spectral efficiency of a digital communication system is measured in bit/s/Hz, or, less frequently but unambiguously, in (bit/s)/Hz.
Spatial frequencyIn mathematics, physics, and engineering, spatial frequency is a characteristic of any structure that is periodic across position in space. The spatial frequency is a measure of how often sinusoidal components (as determined by the Fourier transform) of the structure repeat per unit of distance. The SI unit of spatial frequency is cycles per meter (m). In applications, spatial frequency is often expressed in units of cycles per millimeter (mm) or equivalently per mm. In wave propagation, the spatial frequency is also known as wavenumber.
CodecA codec is a device or computer program that encodes or decodes a data stream or signal. Codec is a portmanteau of coder/decoder. In electronic communications, an endec is a device that acts as both an encoder and a decoder on a signal or data stream, and hence is a type of codec. Endec is a portmanteau of encoder/decoder. A coder or encoder encodes a data stream or a signal for transmission or storage, possibly in encrypted form, and the decoder function reverses the encoding for playback or editing.
Auditory maskingIn audio signal processing, auditory masking occurs when the perception of one sound is affected by the presence of another sound. Auditory masking in the frequency domain is known as simultaneous masking, frequency masking or spectral masking. Auditory masking in the time domain is known as temporal masking or non-simultaneous masking. The unmasked threshold is the quietest level of the signal which can be perceived without a masking signal present. The masked threshold is the quietest level of the signal perceived when combined with a specific masking noise.
Communication channelA communication channel refers either to a physical transmission medium such as a wire, or to a logical connection over a multiplexed medium such as a radio channel in telecommunications and computer networking. A channel is used for information transfer of, for example, a digital bit stream, from one or several senders to one or several receivers. A channel has a certain capacity for transmitting information, often measured by its bandwidth in Hz or its data rate in bits per second.
Audio editing softwareAudio editing software is any software or computer program which allows editing and generating of audio data. Audio editing software can be implemented completely or partly as a library, as a computer application, as a web application, or as a loadable kernel module. Wave editors are digital audio editors. There are many sources of software available to perform this function. Most can edit music, apply effects and filters, and adjust stereo channels.
Speech codingSpeech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. Common applications of speech coding are mobile telephony and voice over IP (VoIP).
Radio waveRadio waves are a type of electromagnetic radiation with the longest wavelengths in the electromagnetic spectrum, typically with frequencies of 300 gigahertz (GHz) and below. At 300 GHz, the corresponding wavelength is 1mm, which is shorter than the diameter of a grain of rice. At 30 Hz the corresponding wavelength is ~, which is longer than the radius of the Earth. Wavelength of a radio wave is inversely proportional to its frequency, because its velocity is constant.
Data compressionIn information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information.
Discrete wavelet transformIn numerical analysis and functional analysis, a discrete wavelet transform (DWT) is any wavelet transform for which the wavelets are discretely sampled. As with other wavelet transforms, a key advantage it has over Fourier transforms is temporal resolution: it captures both frequency and location information (location in time). Haar wavelet The first DWT was invented by Hungarian mathematician Alfréd Haar. For an input represented by a list of numbers, the Haar wavelet transform may be considered to pair up input values, storing the difference and passing the sum.
Audio signalAn audio signal is a representation of sound, typically using either a changing level of electrical voltage for analog signals, or a series of binary numbers for digital signals. Audio signals have frequencies in the audio frequency range of roughly 20 to 20,000 Hz, which corresponds to the lower and upper limits of human hearing. Audio signals may be synthesized directly, or may originate at a transducer such as a microphone, musical instrument pickup, phonograph cartridge, or tape head.
Frequency-shift keyingFrequency-shift keying (FSK) is a frequency modulation scheme in which digital information is encoded on a carrier signal by periodically shifting the frequency of the carrier between several discrete frequencies. The technology is used for communication systems such as telemetry, weather balloon radiosondes, caller ID, garage door openers, and low frequency radio transmission in the VLF and ELF bands. The simplest FSK is binary FSK (BFSK), in which the carrier is shifted between two discrete frequencies to transmit binary (0s and 1s) information.