This Is out of Date
The most recent revisions of the Musicians' Guide are now available from the git repository.

Everybody has a vague idea of what sound cards are, how they work, and what they do. Sometimes, especially when doing professional-quality audio work, a vague understanding is no longer sufficient. This chapter introduces the technical vocabulary used when discussing computer audio hardware.

What Sound Cards Are

Broadly defined, a sound card is any computer-connected device which allows the computer to process audio in some way. There are two general categories into which most sound cards fit, described below.

Audio Interface

This is a hardware device that allows audio equipment, including microphones and speakers, to be connected to your computer. Audio entering or leaving an audio interface typically requires conversion between digital and analogue formats. However, with the rise of external digital audio equipment, an increasing number of devices connect digitally to an audio interface.

The conversion between analogue and digital signals is a prerequisite for computers to be able to process audio signals, so it is the primary function of audio interfaces. The real world creates sound with a limitless range of possibilities for pitch, volume, and duration. The digital nature of computers requires these limitless possibilities to be reduced to finite limits. The best digital/analogue converters use these limits in such a way that humans don't notice anything missing - much like the best computer monitors and graphics adapters disguise the fact that only about half of the colours our eyes can see are displayable on computers. This problem is discussed further in the "Bit Rates and Sample Rates" section.

Audio interfaces also amplify signals for directly-connected analogue devices (like headphones). Some offer power for microphones, too (pre-amplification and/or phantom power).

MIDI Interface

MIDI stands for "Musical Instrument Digital Interface," and is commonly associated with low-quality imitations of acoustic instruments. This association is unfortunate, since high-quality audio is indeed possible with MIDI, and MIDI-driven devices have played a part in many mainstream and non-mainstream audio environments. Whereas audio signals specify the sounds themselves, MIDI signals contain instructions on how to make the sounds. It is a synthesizer's responsibility to follow these instructions, turning them into sounds. Going even further, the MIDI specification allows for the control of many audio-related devices, like mixers, sequencers, and Digital Audio Workstations. Although the signals used to control these devices (or software applications) do not directly cause the generation of music, they still follow the definition of "MIDI signals": instructions on how to make sounds.
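To make the idea of "instructions on how to make the sounds" concrete, here is a minimal sketch of the two most common MIDI channel messages, built by hand as raw bytes. The function names are invented for this illustration and are not part of any MIDI library; the status values 0x90 (note on) and 0x80 (note off) come from the MIDI 1.0 specification.

```python
# Build raw MIDI channel messages by hand (no MIDI library needed).
NOTE_ON = 0x90   # status value for "note on", before the channel is added
NOTE_OFF = 0x80  # status value for "note off", before the channel is added


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Return a 3-byte note-on message (hypothetical helper)."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")
    return bytes([NOTE_ON | channel, note, velocity])


def note_off(channel: int, note: int) -> bytes:
    """Return a 3-byte note-off message with a release velocity of zero."""
    if not (0 <= channel <= 15 and 0 <= note <= 127):
        raise ValueError("channel must be 0-15; note must be 0-127")
    return bytes([NOTE_OFF | channel, note, 0])


middle_c = 60  # MIDI note number for middle C
message = note_on(0, middle_c, 100)
print(message.hex(" "))  # 90 3c 64
```

Note that the message contains no audio at all: three bytes say "start playing middle C, fairly loudly, on the first channel," and the synthesizer decides what that sounds like.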

Whereas audio interfaces allow the input and output of audio signals ("normal sound") from a computer, MIDI interfaces allow the input and output of MIDI signals. Some audio interfaces have MIDI capabilities built-in, and some MIDI interfaces also transform MIDI signals into audio signals. The latter kind of device is performing "MIDI synthesis," a task for which there exist many software-only solutions. "FluidSynth," covered in this section of the Musicians' Guide, is one such software solution.

Having a hardware-based MIDI interface is not a requirement for working with MIDI signals and applications. The costly nature of most MIDI hardware makes it impractical for occasional or beginning MIDI users and computer music enthusiasts. Much of the software in this Guide is capable of working with MIDI signals, and supports but does not require MIDI-capable hardware.

Methods of Connection

The following connection methods can be used by either audio or MIDI interfaces, so they are collectively referred to as "sound cards" in this section.

Integrated on Motherboard

These sound cards are built into the computer's motherboard. In recent years, the quality of audio produced by these sound cards has greatly increased, but the best integrated solutions are still not as good as the best non-integrated solutions. Good integrated sound cards should be good enough for most audio work; if you want a professional-sounding sound card, or especially if you want to connect high-quality input devices, then an additional sound card is recommended.

Hardware MIDI interfaces are rarely, if ever, integrated into the motherboard.

PCI (Internal)

Sound cards connected to the motherboard by PCI (or PCI-Express, etc.) will probably offer higher performance, and lower latencies, than USB- or FireWire-connected devices. Professional-quality sound cards often have insufficient space for connectors on the card itself, so they often include a proprietary, external component specifically for adding connectors. The biggest disadvantages of PCI-connected sound cards are that they cannot be used with notebooks or netbooks, and that they are only as portable as the computer in which they're installed.

USB

USB-connected sound cards are becoming more popular, especially with the increasing bandwidth possibilities of USB connections. The quality can be as good as internally-connected sound cards, although the USB connection may add latency, which may or may not be a concern. The biggest advantages of USB-connected sound cards are that they can be used with notebooks and netbooks, and that they are usually easier to transport than an entire desktop computer.

FireWire

FireWire-connected sound cards are not as popular as USB sound cards, but they tend to be of higher quality. In addition, unlike USB-connected sound cards, FireWire-connected sound cards are able to take advantage of FireWire's "guaranteed bandwidth" and "bus-mastering" capabilities. Having guaranteed bandwidth ensures that the sound card will be able to send data when it chooses; the sound card will not have to compete with other devices connected with the same connection type. Using bus-mastering enables the FireWire-connected device to read and write directly to and from the computer's main memory, without first going through the CPU. High-speed FireWire connections are also available on older computers where a USB 2.0 connection is not available.

How to Choose

The method of connection should not by itself determine which sound card is appropriate for you. Which connection type is right for you will depend on a wide range of factors, but the actual sound quality is significantly more important than the theoretical advantages or disadvantages of the connection type. If possible, you should try out potential devices with your computer before you buy one.

Bit Rates and Sample Rates

As mentioned in the "Audio Interface" section, the primary job of audio interfaces is to carry out the transformation of audio signals between digital and analogue forms. A diagram from Wikipedia illustrates the "digital problem" when it comes to audio. The wave-shape of the analogue signal, which is what is produced by most acoustic instruments and by the human voice, is shown in red, but computers cannot store that information. Instead, they usually store some approximation, which is represented in that diagram by the gray, shaded area. Note that the diagram is simply an example, and not meant to depict a particular real-world recording.

It is the conversion between digital and analogue signals that distinguishes low- and high-quality audio interfaces. High-quality converters can record and reproduce a signal that is nearly identical to the original. Bit and sample rates, explained below, determine how close an approximation an audio interface can make. Other factors are also involved in overall sound quality.
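The approximation described above can be sketched in a few lines. This is only an illustration, assuming a pure sine wave as the "analogue" signal and simple rounding to signed integers; real converters are considerably more sophisticated, and the function name is invented for this example.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, bits, n_samples):
    """Sample a unit-amplitude sine wave and round each sample to an integer.

    Hypothetical helper: 'bits' sets how many integer steps are available.
    """
    max_code = 2 ** (bits - 1) - 1  # largest positive integer code
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # time of this sample, in seconds
        analogue = math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(analogue * max_code))
    return samples


# One cycle of a 1 kHz tone sampled at 8 kHz, at two different resolutions.
coarse = sample_and_quantize(1000, 8000, bits=4, n_samples=8)
fine = sample_and_quantize(1000, 8000, bits=16, n_samples=8)
print(coarse)
print(fine)
```

With only 4 bits, every sample must land on one of a handful of integer steps, so the stored wave is a rough staircase. With 16 bits the steps are so small that the stored values follow the original curve far more closely.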

Bit Rate

This is the number of bits used to describe the audio in a length of time. The higher the number of bits, the greater the detail that will be stored. For most uses, the bit-rate is measured in "bits per second," as in the often-used 128 kb/s bit-rate for MP3 audio. In professional audio, the figure more often quoted is "bits per sample," which is usually simply called "bits" (and is elsewhere often called "bit depth" or "sample format"). CDs have a 16 bit/sample bit-rate, professional audio is usually recorded at a 24 bit/sample bit-rate, and a 32 bit/sample bit-rate is supported by some hardware and software, but not widely used. Due to technical limitations, 20-bit audio is also widely used. See Wikipedia for more information.
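To get a feel for what extra bits per sample buy you, the sketch below counts the distinct values each sample can take and the corresponding theoretical dynamic range. The dynamic-range formula is a standard textbook approximation that is not part of this Guide, and real hardware rarely reaches it; the function names are invented for the example.

```python
import math


def quantization_levels(bits: int) -> int:
    """Number of distinct values one sample can take."""
    return 2 ** bits


def theoretical_dynamic_range_db(bits: int) -> float:
    """Ratio between the largest and smallest step, in decibels (idealized)."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 20, 24, 32):
    levels = quantization_levels(bits)
    dynamic_range = theoretical_dynamic_range_db(bits)
    print(f"{bits}-bit: {levels} levels, about {dynamic_range:.1f} dB")
```

Each additional bit doubles the number of available values, which works out to roughly 6 dB more theoretical dynamic range per bit.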

Sample Rate

A sample is a collection of a number of bits, representing a sound at an instantaneous point in time. The number of bits contained in a sample is determined by the bit-rate (usually 16 or 24 bits per sample). The sample rate is a measure of how many samples occupy one second - that is, how many "instants" of sound are catalogued for each second. Theoretically, a higher sample rate results in a higher-quality audio signal. The sample rate is measured in hertz (Hz), which in this context means "samples per second." CDs have a 44,100 Hz sample rate, but audio is often recorded at 48,000 Hz, 96,000 Hz, or even 192,000 Hz. These are often indicated as 44.1 kHz, 48 kHz, 96 kHz, and 192 kHz, respectively.

Conclusions

Both of these factors have an impact on potential sound quality. Depending on the limitations and capabilities of your equipment, you may be more inclined to use particular settings than others. Here are some comparisons; the data rate given for each is for a single channel of audio:

  • 16-bit bit rate, and 44.1 kHz sample rate (CD audio; good for wide distribution and maximum compatibility; 705.6 kb/s)
  • 24-bit bit rate, and 96 kHz sample rate (CDs are usually recorded at these rates, then converted down to CD rates later; 2304 kb/s)
  • 24-bit bit rate, and 192 kHz sample rate (DVD Audio; not widely compatible; 4608 kb/s)
  • 1-bit bit rate, and 2822.4 kHz sample rate (Super Audio CD; not widely compatible; 2822.4 kb/s)
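The kb/s figures in the list above are simply bits per sample multiplied by samples per second, for one channel. The sketch below reproduces them; the function name and the `channels` parameter are additions for illustration, and the calculation assumes uncompressed audio.

```python
def data_rate_kbps(bits_per_sample, sample_rate_hz, channels=1):
    """Uncompressed data rate in kilobits per second (1 kb = 1000 bits)."""
    return bits_per_sample * sample_rate_hz * channels / 1000


formats = [
    ("CD audio", 16, 44_100),
    ("Studio recording", 24, 96_000),
    ("DVD Audio", 24, 192_000),
    ("Super Audio CD", 1, 2_822_400),
]

for name, bits, rate in formats:
    print(f"{name}: {data_rate_kbps(bits, rate):.1f} kb/s per channel")

# A stereo CD carries two channels, so its total rate is doubled.
print(f"Stereo CD: {data_rate_kbps(16, 44_100, channels=2):.1f} kb/s")
```

Multiplying by the number of channels gives the rate for a complete stream, which is why stereo CD audio is often quoted at about 1411 kb/s.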

In the end, bit rate and sample rate are only part of what determines overall sound quality. Moreover, sound quality is subjective, and you will need to experiment to find the equipment and rates that work best for what you do.

Audio Vocabulary

These terms are used in many different audio contexts. Understanding them is important to knowing how to operate audio equipment in general, whether computer-based or not.

MIDI Sequencer

A sequencer is a device or software program that produces signals that a synthesizer turns into sound. You can also use a sequencer to arrange MIDI signals into music. The Musicians' Guide covers two digital audio workstations (DAWs) that are primarily MIDI sequencers, Qtractor and Rosegarden. All three DAWs in this guide use MIDI signals to control other devices or effects.

Busses, Master Bus, and Sub-master Bus

How audio busses work. The relationship between the master bus and sub-master busses.

An audio bus sends audio signals from one place to another. Many different signals can be sent to a bus simultaneously, and many different devices or applications can read from a bus simultaneously. Signals sent to a bus are mixed together, and cannot be separated after entering the bus. All devices or applications reading from a bus receive the same signal.

All audio routed out of a program passes through the master bus. The master bus combines all audio tracks, allowing for final level adjustments and simpler mastering. The primary purpose of the master bus is to mix all of the tracks into two channels.

A sub-master bus combines audio signals before they reach the master bus. Using sub-master busses is optional. They allow you to adjust more than one track in the same way, without affecting all the tracks.

Audio busses are also used to send audio into effects processors.
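The behaviour of busses can be sketched with plain lists of sample values. This is a simplified, single-channel illustration (a real master bus normally produces two channels), and the track names and helper functions are invented for the example.

```python
def mix(*signals):
    """Sum equal-length signals sample by sample, as a bus does."""
    return [sum(frame) for frame in zip(*signals)]


def gain(signal, factor):
    """Scale every sample in a signal by the same factor."""
    return [sample * factor for sample in signal]


# Three hypothetical tracks, three samples each.
drums = [0.5, -0.5, 0.25]
bass = [0.25, 0.25, -0.25]
vocals = [0.0, 0.5, 0.5]

# Sub-master bus: combine drums and bass, then turn both down together.
rhythm_bus = gain(mix(drums, bass), 0.5)

# Master bus: everything routed out of the program ends up here.
master = mix(rhythm_bus, vocals)
print(master)  # [0.375, 0.375, 0.5]
```

Once the drums and bass have been summed into `rhythm_bus`, there is no way to recover the individual tracks from it, which is why the one gain adjustment affects both of them and leaves the vocals untouched.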

Level (Volume/Loudness)

The perceived volume or loudness of sound is a complex phenomenon, not entirely understood by experts. One widely-agreed method of assessing loudness is by measuring the sound pressure level (SPL), which is measured in decibels (dB) or bels (B, equal to ten decibels). In audio production communities, this is called "level." The level of an audio signal is one way of measuring the signal's perceived loudness. The level is part of the information stored in an audio file.
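Decibels are a logarithmic way of comparing a value with a reference. The sketch below shows the usual conversions; it assumes the standard 20 micropascal reference for sound pressure level in air, and the function names are invented for the example. Neither detail comes from this Guide.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # standard SPL reference in air: 20 micropascals


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB for a pressure given in pascals."""
    return 20 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Express a ratio between two signal amplitudes in decibels."""
    return 20 * math.log10(ratio)


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a decibel change back into an amplitude ratio."""
    return 10 ** (db / 20)


print(f"1 Pa is about {spl_db(1.0):.1f} dB SPL")
print(f"Halving the amplitude changes the level by {amplitude_ratio_to_db(0.5):.2f} dB")
print(f"A change of -20 dB scales the amplitude by {db_to_amplitude_ratio(-20):.2f}")
```

Because the scale is logarithmic, equal steps in decibels correspond to equal ratios rather than equal differences: each halving of amplitude lowers the level by about 6 dB.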

There are many different ways to monitor and adjust the level of an audio signal, and there is no widely-agreed practice. One reason for this situation is the technical limitations of recorded audio. Most level meters are designed so that the average level is -6 dB on the meter, and the maximum level is 0 dB. This practice was developed for analogue audio. We recommend using an external meter and the "K-system," described in a link below. The K-system for level metering was developed for digital audio.

In the Musicians' Guide, this term is called "volume level," to avoid confusion with other levels, or with perceived volume or loudness.

For more information, refer to these web pages:

Panning and Balance

The difference between adjusting panning and adjusting balance.

Panning adjusts the portion of a channel's signal that is sent to each output channel. In a stereophonic (two-channel) setup, the two channels represent the "left" and the "right" speakers. Two channels of recorded audio are available in the DAW, and the default setup sends all of the "left" recorded channel to the "left" output channel, and all of the "right" recorded channel to the "right" output channel. Panning sends some of the left recorded channel's level to the right output channel, or some of the right recorded channel's level to the left output channel. Each recorded channel has a constant total output level, which is divided between the two output channels.

The default setup for a left recorded channel is for "full left" panning, meaning that 100% of the output level is output to the left output channel. An audio engineer might adjust this so that 80% of the recorded channel's level is output to the left output channel, and 20% of the level is output to the right output channel. An audio engineer might make the left recorded channel sound like it is in front of the listener by setting the panner to "center," meaning that 50% of the output level is output to both the left and right output channels.
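The percentages in the panning example above can be written as a small function. This sketch follows the simple linear split described in the text, where the two shares always add up to 100%; many real panners use other curves, so treat it as an illustration only. The function name is invented for the example.

```python
def pan(signal, left_share):
    """Divide one recorded channel between the left and right outputs.

    left_share is the fraction (0.0 to 1.0) sent to the left output;
    the remainder goes to the right output.
    """
    if not 0.0 <= left_share <= 1.0:
        raise ValueError("left_share must be between 0.0 and 1.0")
    left = [sample * left_share for sample in signal]
    right = [sample * (1.0 - left_share) for sample in signal]
    return left, right


recorded_left = [1.0, -0.5, 0.25]

full_left = pan(recorded_left, 1.0)   # default: 100% to the left output
mostly_left = pan(recorded_left, 0.8)  # 80% left, 20% right
centre = pan(recorded_left, 0.5)      # 50% to each output
print(centre)
```

With "center" panning, both output channels receive identical copies of the signal at half level, which is what makes the sound appear to come from directly in front of the listener.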

Balance is sometimes confused with panning, even on commercially-available audio equipment. Adjusting the balance changes the volume level of the output channels, without redirecting the recorded signal. The default setting for balance is "center," meaning 0% change to the volume level. As you adjust the dial from "center" toward the "full left" setting, the volume level of the right output channel is decreased, and the volume level of the left output channel remains constant. As you adjust the dial from "center" toward the "full right" setting, the volume level of the left output channel is decreased, and the volume level of the right output channel remains constant. If you set the dial to "20% left," the audio equipment would reduce the volume level of the right output channel by 20%, so that the left output channel sounds louder by comparison.
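For comparison with panning, here is a minimal sketch of a balance control as described above: it only turns one output channel down, and never moves signal from one channel to the other. The function name and the -1.0 to 1.0 dial range are assumptions made for this example.

```python
def balance(left, right, position):
    """Apply a balance setting to a pair of output channels.

    position runs from -1.0 ("full left") through 0.0 ("center")
    to 1.0 ("full right"). Only the opposite channel is attenuated.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    left_gain = 1.0 - max(position, 0.0)   # reduced when the dial moves right
    right_gain = 1.0 + min(position, 0.0)  # reduced when the dial moves left
    return ([s * left_gain for s in left],
            [s * right_gain for s in right])


left_out = [1.0, 0.5]
right_out = [1.0, 0.5]

# "20% left": the right channel drops by 20%, the left channel is unchanged.
new_left, new_right = balance(left_out, right_out, -0.2)
print(new_left, new_right)
```

Unlike the panning sketch, the left channel's samples never appear in the right output here: balance changes how loud each speaker is, not what it plays.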

You should adjust the balance so that you perceive both speakers as equally loud. Balance compensates for poorly set up listening environments, where the speakers are not equal distances from the listener. If the left speaker is closer to you than the right speaker, you can adjust the balance to the right, which decreases the volume level of the left speaker. This is not an ideal solution, but sometimes it is impossible or impractical to set up your speakers correctly. You should adjust the balance only at final playback.

Time, Timeline and Time-Shifting

There are many ways to measure musical time. The four most popular time scales for digital audio are:

  • Bars and Beats: Usually used for MIDI work, and called "BBT," meaning "Bars, Beats, and Ticks." A tick is a partial beat.
  • Minutes and Seconds: Usually used for audio work.
  • SMPTE Timecode: Invented for high-precision coordination of audio and video, but can be used with audio alone.
  • Samples: Relating directly to the format of the underlying audio file, a sample is the shortest possible length of time in an audio file. See the "Sample Rate" section for more information on samples.
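The time scales listed above describe the same instant in different units, so software can convert between them. The sketch below converts a sample position to minutes and seconds, and to bars, beats, and ticks. It assumes a constant tempo, four beats per bar, and 960 ticks per beat; these values and the function names are illustrative assumptions, since tick resolution varies between applications. SMPTE timecode is left out because it also depends on the video frame rate.

```python
def samples_to_seconds(sample_count, sample_rate_hz):
    """Convert a position in samples to seconds."""
    return sample_count / sample_rate_hz


def format_minutes_seconds(seconds):
    """Format seconds as M:SS.mmm."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)}:{secs:06.3f}"


def seconds_to_bbt(seconds, tempo_bpm, beats_per_bar=4, ticks_per_beat=960):
    """Convert seconds to (bar, beat, tick), counting bars and beats from 1."""
    total_ticks = round(seconds * tempo_bpm / 60 * ticks_per_beat)
    beats, ticks = divmod(total_ticks, ticks_per_beat)
    bars, beat = divmod(beats, beats_per_bar)
    return bars + 1, beat + 1, ticks


position = samples_to_seconds(3_329_550, 44_100)  # 75.5 seconds
print(format_minutes_seconds(position))          # 1:15.500
print(seconds_to_bbt(position, tempo_bpm=120))   # (38, 4, 0)
```

The same position is sample 3,329,550 in the file, 1 minute 15.5 seconds on the clock, and the fourth beat of bar 38 at 120 beats per minute.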

Most audio software, particularly digital audio workstations (DAWs), allows you to choose which scale you prefer. DAWs use a timeline to display the progression of time in a session, allowing you to do time-shifting; that is, to adjust the point in the timeline at which a region starts to be played.

Time is represented horizontally, where the leftmost point is the beginning of the session (zero, regardless of the unit of measurement), and the rightmost point is some distance after the end of the session.

Synchronization

Synchronization coordinates the operation of multiple tools, most frequently the movement of the transport. Synchronization also controls automation across applications and devices. MIDI signals are usually used for synchronization.

Routing and Multiplexing

Illustration of routing and multiplexing in the "Connections" window of the QjackCtl interface.

Routing audio transmits a signal from one place to another - between applications, between parts of applications, or between devices. On Linux systems, the JACK Audio Connection Kit is used for audio routing. JACK-aware applications (and PulseAudio ones, if so configured) provide inputs and outputs to the JACK server, depending on their configuration. The QjackCtl application can adjust the default connections. You can easily reroute the output of a program like FluidSynth so that it can be recorded by Ardour, for example, by using QjackCtl.

Multiplexing allows you to connect multiple devices and applications to a single input or output. QjackCtl allows you to easily perform multiplexing. This may not seem important, but remember that only one connection is possible with a physical device like an audio interface. Before computers were used for music production, multiplexing required physical devices to split or combine the signals.
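One way to picture routing and multiplexing is as a table of connections between named ports. The sketch below is a toy model only: it is not the JACK API, and the class and port names are invented for illustration.

```python
from collections import defaultdict


class PatchBay:
    """Toy model of a connections window: source ports mapped to destinations."""

    def __init__(self):
        self._connections = defaultdict(set)

    def connect(self, source, destination):
        """Route one output port to one input port."""
        self._connections[source].add(destination)

    def destinations(self, source):
        """All inputs fed by this output (one output, many readers)."""
        return sorted(self._connections.get(source, ()))

    def sources(self, destination):
        """All outputs feeding this input (many writers, one input)."""
        return sorted(src for src, dests in self._connections.items()
                      if destination in dests)


patchbay = PatchBay()
# Hypothetical port names: send a synthesizer to the speakers and a recorder.
patchbay.connect("synth:left", "speakers:left")
patchbay.connect("synth:left", "recorder:input_1")
# A second program shares the same speaker input.
patchbay.connect("player:left", "speakers:left")

print(patchbay.destinations("synth:left"))
print(patchbay.sources("speakers:left"))
```

The first query shows one output multiplexed to two destinations; the second shows two programs sharing one input, something a single physical cable cannot do.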

Multichannel Audio

An audio channel is a single path of audio data. Multichannel audio is any audio which uses more than one channel simultaneously, allowing the transmission of more audio data than single-channel audio.

Audio was originally recorded with only one channel, producing "monophonic," or "mono" recordings. In the 1950s, stereophonic recordings, with two independent channels, began replacing monophonic recordings. Since humans have two independent ears, it makes sense to record and reproduce audio with two independent channels, involving two speakers. Most sound recordings available today are stereophonic, and people have found this mostly satisfying.

There is a growing trend toward five- and seven-channel audio, driven primarily by "surround-sound" movies, and not widely available for music. Two "surround-sound" formats exist for music: DVD Audio (DVD-A) and Super Audio CD (SACD). The development of these formats, and the devices to use them, is held back by the proliferation of headphones with personal MP3 players, a general lack of desire for improvement in audio quality amongst consumers, and the copy-protection measures put in place by record labels. The result is that, while some consumers are willing to pay higher prices for DVD-A or SACD recordings, only a small number of recordings are available. Even if you buy a DVD-A or SACD-capable player, you would need to replace all of your audio equipment with models that support proprietary copy-protection software. Without this equipment, the player is often forbidden from outputting audio with a higher sample rate or sample format than a conventional audio CD. None of these factors, unfortunately, seem like they will change in the near future.