
6. Overview of file formats.

Historically, almost every type of machine used its own file format for audio data. Some file formats are more generally applicable, however, and in general it is possible to define conversions between almost any pair of file formats, although such conversions sometimes lose information.

File formats are a separate issue from device characteristics. There are two types of file formats: self-describing formats, where the device parameters and encoding are made explicit in some form of header, and headerless formats (sometimes called "raw"), where the device parameters and encoding are fixed.

6.1 Self-describing file formats.

Self-describing file formats generally define a family of data encodings, where a header field indicates the particular encoding variant used.

The header of self-describing formats contains the parameters of the sampling device and sometimes other information (e.g. a human-readable description of the sound, or a copyright notice). Most headers begin with a simple "magic word". (Some formats do not simply define a header format, but may contain chunks of data intermingled with chunks of encoding info.) The data encoding defines how the actual samples are stored in the file, e.g. signed or unsigned, as bytes or short integers, in little-endian or big-endian byte order, etc. Strictly speaking, channel interleaving is also part of the encoding, although so far I have seen little variation in this area.
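Because most headers start with a magic word, a first guess at the format can be made from the first few bytes of a file. The Python sketch below checks a handful of well-known magic words; the function name `sniff_format` and the particular selection of formats are illustrative assumptions, and a real conversion tool checks far more cases. A result of `None` only means "no recognized header", which may indicate a headerless file.

```python
# Hypothetical helper: guess a self-describing format from its leading magic word.
# Only the magic words listed here are checked; anything else returns None.

def sniff_format(head: bytes):
    """Return a short format name for the first bytes of a file, or None."""
    if head[:4] == b".snd":
        return "au"                        # NeXT/Sun
    if head[:4] == b"FORM" and len(head) >= 12:
        form_type = head[8:12]             # IFF layout: 'FORM' <size> <type>
        return {b"AIFF": "aiff", b"AIFC": "aifc", b"8SVX": "8svx"}.get(form_type)
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"                       # Microsoft RIFF WAVE
    if head.startswith(b"Creative Voice File"):
        return "voc"                       # Sound Blaster
    if head.startswith(b"NIST_1A"):
        return "sphere"                    # NIST SPHERE text header
    return None                            # unknown, possibly headerless
```

In practice one would read the first dozen or so bytes of the file and pass them to this function.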

Here's an overview of popular file formats.

 extension, name   origin          variable parameters (fixed; comments)

 .au or .snd       NeXT, Sun       rate, #channels, encoding, info string
 .aif(f), AIFF     Apple, SGI      rate, #channels, sample width, lots of info
 .aif(f), AIFC     Apple, SGI      same (extension of AIFF with compression)
 .iff, IFF/8SVX    Amiga           rate, #channels, instrument info (8 bits)
 .mp2, .mp3        MPEG standard   rate, #channels, sample quality
 .ra               Real Networks   rate, #channels, sample quality
 .sf               IRCAM           rate, #channels, encoding, info
 .smp              Turtle Beach    loops, cues, (16 bits/1 ch)
 .voc              Soundblaster    rate (8 bits/1 ch; can use silence deletion)
 .wav, WAVE        Microsoft       rate, #channels, sample width, lots of info
 .wve              Psion           (8 bits, 1 ch, A-law, 8 kHz)
 none, HCOM        Mac             rate (8 bits/1 ch; uses Huffman compression)
 none, MIME        Internet        (see below)
 none, NIST SPHERE DARPA speech community (see below)
 .mod or .nst      Amiga           (see below)

Note that the filename extension ".snd" is ambiguous: it can denote the self-describing NeXT format, the headerless Mac/PC format, or even a headerless Amiga format.
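In practice the ambiguity can usually be resolved by looking for the NeXT/Sun magic word ".snd" at the start of the file. That header consists of six big-endian 32-bit fields (magic, data offset, data size, encoding, sample rate, channel count), with the info string filling the space up to the data offset. The sketch below parses it; the function name is hypothetical and the encoding table lists only a few common codes, not the full set.

```python
import struct

AU_MAGIC = 0x2E736E64  # ".snd" in ASCII

# A few common encoding codes from the NeXT/Sun header; not exhaustive.
AU_ENCODINGS = {1: "8-bit u-law", 2: "8-bit linear", 3: "16-bit linear", 27: "8-bit A-law"}

def parse_au_header(data: bytes):
    """Return the header fields as a dict, or None if there is no NeXT/Sun header."""
    if len(data) < 24:
        return None
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        return None  # probably a headerless Mac/PC or Amiga .snd file
    info = data[24:offset].rstrip(b"\x00").decode("latin-1")
    return {
        "offset": offset,        # where the sample data starts
        "size": size,            # data size in bytes (may be 0xFFFFFFFF if unknown)
        "encoding": AU_ENCODINGS.get(encoding, encoding),
        "rate": rate,
        "channels": channels,
        "info": info,            # free-form info string
    }
```

Unknown encoding codes are returned as plain integers rather than rejected, so the caller can decide what to do with them.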

I know nothing for sure about the origin of HCOM files. The filenames usually don't have a ".hcom" extension, but this is what SOX (see the section File conversion) uses. The file format recognized by SOX includes a MacBinary header, where the file type field is "FSSD". The data fork begins with the magic word "HCOM" and contains Huffman-compressed data; after decompression it is 8-bit unsigned data.
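The recognition step described above can be sketched as follows. It assumes the standard MacBinary layout (a 128-byte header with the file type at offsets 65 to 68, and the data fork starting at offset 128); it does not attempt the Huffman decompression, and the function name is hypothetical.

```python
def looks_like_hcom(data: bytes) -> bool:
    """Check for a MacBinary-wrapped HCOM file (recognition only, no decompression)."""
    if len(data) < 132:                 # 128-byte MacBinary header + 4-byte magic
        return False
    file_type = data[65:69]             # MacBinary file type field
    data_fork_magic = data[128:132]     # data fork follows the 128-byte header
    return file_type == b"FSSD" and data_fork_magic == b"HCOM"
```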

IFF/8SVX supports amplitude contours for sounds (attack, decay, etc.). Compression is optional (and extensible), volume is variable, and there are properties for author, notes and copyright, among others.

AIFF, AIFC and WAVE are similar in spirit, but allow more freedom in encoding style (for example, sample widths other than 8 bits), among other differences.
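These formats share a chunked layout: a 12-byte outer header ("FORM" or "RIFF", a 32-bit size, and a 4-byte form type such as "AIFF", "8SVX" or "WAVE"), followed by chunks that each carry a 4-byte ID and a 32-bit length. The IFF family (IFF/8SVX, AIFF, AIFC) stores lengths big-endian, while RIFF/WAVE stores them little-endian, and chunks are padded to an even length. The sketch below walks the chunks; it is a minimal illustration with a hypothetical name, and it neither validates the outer size nor interprets chunk contents.

```python
import struct

def iter_chunks(data: bytes, little_endian: bool = False):
    """Yield (chunk_id, payload) pairs from an IFF FORM or RIFF file.

    Use little_endian=True for RIFF/WAVE, False for IFF/AIFF/AIFC/8SVX.
    """
    fmt = "<I" if little_endian else ">I"
    pos = 12  # skip 'FORM'/'RIFF', the overall size, and the form type
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack(fmt, data[pos + 4:pos + 8])
        yield chunk_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # skip the pad byte after odd-sized chunks
```

A WAVE file, for instance, would typically yield a "fmt " chunk (the encoding parameters) and a "data" chunk (the samples).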

There are other sound formats in use on Amiga by digitizers and music programs, such as IFF/SMUS.

An interesting "interchange format" for audio data is described in the proposed Internet Standard "MIME", which describes a family of transport encodings and structuring devices for electronic mail. This is an extensible format, and initially standardizes a type of audio data dubbed "audio/basic", which is 8-bit u-law data sampled at 8000 samples/sec.
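Decoding "audio/basic" amounts to expanding each u-law byte to a linear sample. The sketch below uses the common bias-based formulation of the G.711 u-law expansion (bias 0x84), producing values in the range -32124 to 32124; the function names are illustrative. The same expansion applies to the headerless ".ul" files listed below.

```python
def ulaw_to_linear(byte: int) -> int:
    """Expand one 8-bit u-law byte (G.711) to a signed 16-bit-range sample."""
    u = ~byte & 0xFF              # u-law bytes are stored with all bits inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude

def decode_audio_basic(data: bytes) -> list:
    """Decode a buffer of 8000 samples/sec mono u-law bytes to linear samples."""
    return [ulaw_to_linear(b) for b in data]
```

A production decoder would normally precompute all 256 values into a lookup table instead of recomputing them per sample.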

The "IRCAM" sound file system has now been superseded by the so-called "BICSF" (for Berkeley/IRCAM/CARL Sound File system) software release.

More recently, there has been an effort at Princeton (Prof. Paul Lansky) and Stanford (Stephen Travis Pope) to standardize several extensions to BICSF. A description of BICSF and the Princeton/Stanford extensions is available by anonymous ftp at ftp://ftp.cwi.nl/pub/audio/BICSF-info. This file contains further ftp pointers to software.

6.2 Headerless file formats.

Headerless formats define a single encoding and usually allow no variation in device parameters (except sometimes the sampling rate, which can be a pain to figure out other than by listening to the sample).

 extension       origin          parameters
 or name

 .snd, .fssd     Mac, PC         variable rate, 1 channel, 8 bits unsigned
 .ul             US telephony    8 kHz, 1 channel, 8-bit "u-law" encoding
 .snd?           Amiga           variable rate, 1 channel, 8 bits signed

It is usually easy to distinguish 8-bit signed formats from unsigned by looking at the beginning of the data with 'od -b <file | head'; since most sounds start with a little bit of silence containing small amounts of background noise, the signed formats will have an abundance of bytes with the (octal) values 0376, 0377, 0, 1, 2, while the unsigned formats will have 0176, 0177, 0200, 0201, 0202 instead. (Using "od -c" will also show any headers that are tacked in front of the file.)
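The same eyeball test can be automated by counting bytes near each candidate zero level. This is only a heuristic sketch: the function name and the default window of 1024 bytes are assumptions, and it gives misleading answers for files with a header in front or without near-silence at the start.

```python
def guess_8bit_signedness(data: bytes, window: int = 1024) -> str:
    """Return 'signed', 'unsigned' or 'unknown' based on the first `window` bytes.

    Near-silence sits around 0 in signed data (octal 0376, 0377, 0, 1, 2)
    and around 0200 in unsigned data (octal 0176 through 0202).
    """
    head = data[:window]
    near_signed_zero = sum(1 for b in head if b >= 0o376 or b <= 0o2)
    near_unsigned_zero = sum(1 for b in head if 0o176 <= b <= 0o202)
    if near_signed_zero == near_unsigned_zero:
        return "unknown"
    return "signed" if near_signed_zero > near_unsigned_zero else "unsigned"
```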

The Apple IIgs records raw data in the same format as the Mac, but uses a 0 byte as a terminator; samples with value 0 are replaced by 1.
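Converting between the two is then a matter of handling the terminator. The sketch below, with hypothetical function names, truncates IIgs data at the first 0 byte and, in the other direction, maps 0 samples to 1 before appending the terminator.

```python
def iigs_to_mac(data: bytes) -> bytes:
    """Strip Apple IIgs data down to plain Mac-style 8-bit unsigned samples."""
    end = data.find(b"\x00")          # the first 0 byte terminates the sound
    return data if end < 0 else data[:end]

def mac_to_iigs(data: bytes) -> bytes:
    """Make Mac-style data IIgs-safe: replace 0 samples by 1, then append the terminator."""
    return data.replace(b"\x00", b"\x01") + b"\x00"
```

The Mac-to-IIgs direction is slightly lossy, since samples of value 0 cannot be restored after being replaced by 1.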

