Audio File Formats FAQ
Chris Bagwell, chris@cnpbagwell.com
v4.0, 14 Nov 1998

This document provides a description of how audio files are used on various computer platforms and provides a detailed description of their internal formats.
______________________________________________________________________

Table of Contents

1. Introduction.

2. Device characteristics.

3. Popular sampling rates.

4. Compression schemes.
   4.1 ITU-T G.711, u-law and A-law.
   4.2 CCITT G.721, G.723, and ITU-T G.726.
   4.3 IMA/DVI ADPCM
   4.4 Microsoft ADPCM
   4.5 LPC-10E
   4.6 CELP
   4.7 GSM 06.10.
   4.8 shorten.
   4.9 Real Audio
   4.10 MPEG
   4.11 Misc.

5. Current hardware.

6. Overview of file formats.
   6.1 Self-describing file formats.
   6.2 Headerless file formats.

7. File conversions.
   7.1 SOX (UNIX, PC, Amiga)
   7.2 Sun Sparc.
   7.3 NeXT.
   7.4 SGI Indigo, Indigo2, Indy and Personal IRIS.
   7.5 Amiga.
   7.6 Apple Macintosh

8. Playing audio files on UNIX.
   8.1 Sun Sparcstation running SunOS 4.x.
   8.2 Solaris.
   8.3 NeXT
   8.4 SGI Indigo, Indigo2, Indy and Personal IRIS.
   8.5 Linux
   8.6 Others.

9. Playing audio files on the Vaxstation 4000 (VMS).
   9.1 Without DECsound.
   9.2 With DECsound (bundled with motif).
   9.3 Audio port.

10. Playing audio files on a PC.
   10.1 PC or compatible.
   10.2 IBM PC and compatibles.
   10.3 Atari.
   10.4 Tandy.
   10.5 Amiga.
   10.6 Apple Macintosh.

11. File Formats.
   11.1 AIFF Format (Audio IFF) and AIFC.
   11.2 The NeXT/Sun audio file format.
   11.3 IFF/8SVX Format.
   11.4 US Federal Standard 1016 availability.
   11.5 Creative Voice (VOC) file format.
   11.6 RIFF WAVE (.WAV) file format.
   11.7 u-law and A-law definitions.
   11.8 AVR File Format.
   11.9 The Amiga MOD Format.
   11.10 The Sample Vision Format.
   11.11 Tandy Deskmate .snd Format Notes.
      11.11.1 DeskMate .snd File Format - Fixed Header.
      11.11.2 DeskMate .snd File Format - Note Record.
      11.11.3 New Tandy .Snd File Format.
   11.12 Miscellaneous Formats.
______________________________________________________________________

1. Introduction.

This is version 4 of this FAQ.
This FAQ was started in November 1991 under the name "The audio formats guide" by Guido van Rossum. I have taken over from his fine work in order to keep the information readily available to those interested in it. Many of the links to software in this document were last updated in 1995 and so may be outdated. Any help with correcting mistakes in this document, whether grammatical or technical, is appreciated.

This FAQ is occasionally posted either unchanged (just to inform new readers), or updated (if I learn more or when new hardware or software becomes popular). I post to alt.binaries.sounds.{misc,d} and to comp.dsp, for maximal coverage of people interested in audio, and to {news,comp}.answers, for easy reference.

The most recent version of this FAQ can be found on the web at http://www.cnpbagwell.com/audio.html along with various other audio information.

Send updates, comments and questions to the address at the top of this document. I'd like to thank everyone who sent updates in the past and most of all thank Guido van Rossum for starting this FAQ in the first place.

2. Device characteristics.

In this text, I will only use the term "sample" to refer to a single output value from an A/D converter, i.e., a small integer number (usually 8 or 16 bits).

Audio data is characterized by the following parameters, which correspond to settings of the A/D converter when the data was recorded. Naturally, the same settings must be used to play the data.

o  sampling rate (in samples per second), e.g. 8000 or 44100

o  number of bits per sample, e.g. 8 or 16

o  number of channels (1 for mono, 2 for stereo, etc.)

Approximate sampling rates are often quoted in Hz or kHz ([kilo-] Hertz); however, the politically correct term is samples per second (samples/sec). Sampling rates are always measured per channel, so for stereo data recorded at 8000 samples/sec, there are actually 16000 samples in a second. I will sometimes write 8 k as a shorthand for 8000 samples/sec.
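
The three parameters above determine how much uncompressed data one second of sound occupies. The following Python sketch shows the arithmetic; the function names are made up for this illustration, and it assumes linear samples stored in whole bytes.

```python
def samples_per_second(sample_rate, channels):
    """Total samples per second across all channels."""
    return sample_rate * channels


def bytes_per_second(sample_rate, bits_per_sample, channels):
    """Uncompressed data rate in bytes/sec for linear samples."""
    bytes_per_sample = (bits_per_sample + 7) // 8  # round up to whole bytes
    return sample_rate * bytes_per_sample * channels


print(samples_per_second(8000, 2))     # stereo at 8000 samples/sec -> 16000
print(bytes_per_second(44100, 16, 2))  # 16-bit stereo at 44100 -> 176400
```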
Multi-channel samples are generally interleaved on a frame-by-frame basis: if there are N channels, the data is a sequence of frames, where each frame contains N samples, one from each channel. (Thus, the sampling rate is really the number of *frames* per second.) For stereo, the left channel usually comes first.

The specification of the number of bits for audio in a compressed format, such as u-law samples, is somewhat problematic. u-law samples are logarithmically encoded in 8 bits, like a tiny floating point number; however, their dynamic range is that of 14 bit linear data. There are various other techniques for encoding linear data into fewer bits. See the section ``Compression Schemes'' for further information.

3. Popular sampling rates.

Some sampling rates are more popular than others, for various reasons. Some recording hardware is restricted to (approximations of) some of these rates; some playback hardware has direct support for some. The popularity of divisors of common rates can be explained by the simplicity of clock frequency dividing circuits :-).

Samples/sec  Description

5500         One fourth of the Mac sampling rate (rarely seen).

7333         One third of the Mac sampling rate (rarely seen).

8000         Exactly 8000 samples/sec is a telephony standard that
             goes together with u-law (and also A-law) encoding.
             Some systems use a slightly different rate; in
             particular, the NeXT workstation uses 8012.8210513,
             apparently the rate used by Telco CODECs.

11 k         Either 11025, a quarter of the CD sampling rate, or
             half the Mac sampling rate (perhaps the most popular
             rate on the Mac).

16000        Used by the G.722 compression standard.

18.9 k       CD-ROM/XA standard.

22 k         Either 22050, half the CD sampling rate, or the Mac
             rate; the latter is precisely 22254.545454545454 but
             usually misquoted as 22000. (Historical note:
             22254.5454... was the horizontal scan rate of the
             original 128k Mac.)

32000        Used in digital radio, NICAM (Nearly Instantaneous
             Compandable Audio Matrix [IBA/BREMA/BBC]) and other TV
             work, at least in the UK; also long play DAT and
             Japanese HDTV.

37.8 k       CD-ROM/XA standard for higher quality.

44056        This weird rate is used by professional audio
             equipment to fit an integral number of samples in a
             video frame.

44100        The CD sampling rate. (DAT players recording digitally
             from CD also use this rate.)

48000        The DAT (Digital Audio Tape) sampling rate for
             domestic use.

While professional musicians disagree, most people don't have a problem if recorded sound is played at a slightly different rate, say, 1-2%. On the other hand, if recorded data is being fed into a playback device in real time (say, over a network), even the smallest difference in sampling rate can frustrate the buffering scheme used.

There may be an emerging tendency to standardize on only a few sampling rates and encoding styles, even if the file formats may differ. The suggested rates and styles are:

rate (samp/sec)  style                  mono/stereo

8000             8-bit u-law            mono
22050            8-bit linear unsigned  mono and stereo
44100            16-bit linear signed   mono and stereo

4. Compression schemes.

Strange though it seems, audio data is remarkably hard to compress effectively. For 8-bit data, a Huffman encoding of the deltas between successive samples is relatively successful. For 16-bit data, companies like Sony, Philips and tons of others have spent millions to develop proprietary schemes. (Note that silence detection can also be considered a compression scheme.)

4.1. ITU-T G.711, u-law and A-law.

u-law (pronounced mu-law -- the u really stands for the Greek letter mu) is an encoding commonly used in North America and Japan for digital telephony. u-law samples are logarithmically encoded in 8 bits, like a tiny floating point number; however, their dynamic range is that of 14 bit linear data. When you convert u-law back into 16-bit data you will lose some quality because of the reduced dynamic range.
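
To make the "tiny floating point number" idea concrete, here is a minimal Python sketch of u-law conversion. It follows the segment/mantissa layout used by commonly circulated C conversion routines; the constants (bias 0x84, clip 32635) and the choice of 16-bit signed input are assumptions of this sketch, not something this FAQ specifies, so check it against G.711 before relying on it.

```python
BIAS = 0x84    # added before taking the logarithm-like segment (assumed)
CLIP = 32635   # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample):
    """Encode one 16-bit signed linear sample as an 8-bit u-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The exponent (segment) is the position of the highest set bit.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # u-law bytes are stored with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte):
    """Decode one 8-bit u-law byte to a 16-bit signed linear sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


# 1000 comes back as a nearby value; the step size grows with amplitude.
print(ulaw_to_linear(linear_to_ulaw(1000)))
```

Small values are reproduced almost exactly while large values are quantized coarsely, which is how 8 bits can cover roughly the range of 14-bit linear data.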
There exists another encoding similar to u-law, called A-law, which is used as a European telephony standard. See the section ``File Formats'' for formulas describing u-law and A-law. This encoding method comes out to 64 kbits/sec at 8 kHz (8 bits per sample times 8000 samples/sec).

Source for converting to/from u-law/A-law (written by Jef Poskanzer) is distributed as part of the SOX package mentioned later; it can easily be ripped apart to serve in other applications. The official definition is the ITU-T standard G.711 (formerly CCITT G.711).

4.2. CCITT G.721, G.723, and ITU-T G.726.

CCITT defined public standards for compressing voice data in CCITT G.721 (ADPCM at 32 kbits/sec) and G.723 (ADPCM at 24 and 40 kbits/sec).

ADPCM stands for Adaptive Differential Pulse Code Modulation and is a common method for compressing audio data. It takes advantage of the fact that you can generally predict the value of the next sound sample based on the previous sound sample. Most ADPCM implementations are a good compromise between fast processing, good compression rates, and good quality decoding.

Sun Microsystems has placed the source code of a portable implementation of the CCITT ADPCM algorithms (as well as G.711, which defines A-law and u-law) in the public domain (needless to say, their proprietary implementation distributed in binary form with Solaris is better :-). One place to ftp this source code from is ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z.

ITU (which is now the name for CCITT) put out a replacement standard G.726 for both G.721 and G.723 to define standards for digitalization of audio signals at 16, 24, 32 and 40 kbits/second using ADPCM. These rates are often referred to by the number of bits per sample, which is 2, 3, 4, and 5 bits respectively.

4.3. IMA/DVI ADPCM

IMA/DVI ADPCM is a standard that compresses 16-bit sound data into only 4 bits. It is thought to be faster than Microsoft's ADPCM implementation.
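
The prediction idea behind ADPCM can be illustrated with a much simpler coder. The Python sketch below is plain, non-adaptive differential coding with a fixed step size; it is not G.721, G.726, IMA/DVI or Microsoft ADPCM, all of which adapt the step size as they go. The function names and the step of 256 are arbitrary choices for this illustration.

```python
def dpcm_encode(samples, step=256):
    """Encode 16-bit samples as 4-bit codes (-8..7) of quantized differences."""
    codes = []
    predicted = 0  # the predictor is simply the previously decoded sample
    for sample in samples:
        diff = sample - predicted
        code = max(-8, min(7, round(diff / step)))
        codes.append(code)
        # Track what the decoder will reconstruct, not the true input,
        # so that quantization errors do not accumulate.
        predicted += code * step
    return codes


def dpcm_decode(codes, step=256):
    """Rebuild samples by summing the scaled differences."""
    samples = []
    predicted = 0
    for code in codes:
        predicted += code * step
        samples.append(predicted)
    return samples


original = [0, 100, 300, 700]
decoded = dpcm_decode(dpcm_encode(original))
print(decoded)  # close to the input as long as the signal changes slowly
```

With a fixed step, fast-changing signals overload the 4-bit range and quiet ones are quantized too coarsely; adapting the step to the signal is what the "A" in ADPCM adds.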
Source for a 32 kbits/sec ADPCM implementation, assumed to be compatible with Intel's DVI audio format, can be ftp'ed from ftp://ftp.cwi.nl/pub/audio/adpcm.shar. Source to handle IMA/DVI ADPCM formats in .WAV files is included in SOX, mentioned later. 4.4. Microsoft ADPCM Microsoft, as usual, thought it was important to create their own variant of ADPCM for use in their .WAV file format. It also compresses 16-bit sound data into 4-bit data. It should be very similar in quality to IMA's ADPCM. Source for MS ADPCM used in Microsoft WAVE files can be found in SOX, mentioned later. 4.5. LPC-10E LPC-10E is defined by US DOD Federal Standard 1015 and stands for Linear Prediction Coder (Enhanced) and has a 2400 bits/s rate. Here's a note about LPC and CELP audio codings by Van Jacobson : Several people used the words "LPC" and "CELP" interchangeably. They are very different. An LPC (Linear Predictive Coding) coder fits speech to a simple, analytic model of the vocal tract, then throws away the speech & ships the parameters of the best- fit model. An LPC decoder uses those parameters to generate synthetic speech that is usually more-or-less similar to the original. The result is intelligible but sounds like a machine is talking. 4.6. CELP CELP is defined by US DOD Federal Standard 1016 and stands for Code Excited Linear Prediction and has a 4800 bits/s rate. It is important to understand LPC-10E to understand CELP. Van Jacobson also provided the following information about CELP: A CELP (Code Excited Linear Predictor) coder does the same LPC modeling but then computes the errors between the original speech & the synthetic model and transmits both model parameters and a very compressed representation of the errors (the compressed representation is an index into a 'code book' shared between coders & decoders -- this is why it's called "Code Excited"). 
A CELP coder does much more work than an LPC coder (usually about an order of magnitude more) but the result is much higher quality speech: The FIPS-1016 CELP we're working on is essentially the same quality as the 32Kb/s ADPCM coder but uses only 4.8Kb/s (the same as the LPC coder).

The Real Audio streaming audio players use CELP for the original Version 2 28.8k audio codec (audio coder/decoder routines) but they have since concentrated on codecs that use patented and proprietary methods.

4.7. GSM 06.10.

GSM stands for Global System for Mobile Communications. GSM 06.10 is a variant of LPC called RPE-LPC (Regular Pulse Excited - Linear Predictive Coder) and is a European standard originally for use in encoding speech for satellite distribution to mobile phones. It can be found in use in various telephony products such as voice mail applications.

It compresses 160 13-bit samples into 260 bits (or 33 bytes), i.e. 1650 bytes/sec (at 8000 samples/sec). It results in very good compression with good quality output but is very costly in terms of performance.

You may read more information about it and a free implementation of it at http://kbs.cs.tu-berlin.de/~jutta/toast.html and grab its source from ftp://tub.cs.tu-berlin.de/pub/tubmik/gsm-1.0.10.tar.gz.

4.8. shorten.

Tony Robinson has written a good, FAST loss-less compression program for lots of different audio formats (particularly good for WAV and MOD files). You can obtain the latest version of shorten from http://www.softsound.com/. It has a free license for non-commercial use. Because of its license, though, you don't see support for it in many programs.

4.9. Real Audio

Enough people ask what compression schemes Real Audio uses that I've created a section for it. The latest software supports a multitude of different compression schemes by using plug-in codecs (audio coder/decoder routines). In version 2.0 of the Real Audio player there were two codecs.
The first was a 14.4k codec that used a modified version of GSM to compress the data. The second was a 28.8k codec that used CELP, which was described above.

4.10. MPEG

MPEG is an audio/video compression standard that has gained wide acceptance across industries. It has become popular to use the audio portion of the standard to store audio files since it provides near CD quality output at relatively low bit rates. It is very computationally intensive, especially during the encoding phase.

Three layers are supported, of which the third is the most popular:

o  Layer-1: From 32 kbps to 448 kbps - target bit rate of 192 kbps

o  Layer-2: From 32 kbps to 384 kbps - target bit rate of 128 kbps

o  Layer-3: From 32 kbps to 320 kbps - target bit rate of 64 kbps

4.11. Misc.

Apple has an Audio Compression/Expansion scheme called ACE (on the GS) / MACE (on the Macintosh). It's a lossy scheme that attempts to predict where the wave will go on the next sample. There's very little quality change on 8:4 compression, somewhat more for 8:3. It does guarantee exactly 50% or 62.5% compression, though. I believe MACE uses larger ratios/more loss, but I'm unsure of the specific numbers. (Marc Sira)

5. Current hardware.

I am aware of the following computer systems that can play back and (sometimes) record audio data, with their characteristics. Note that for most systems you can also buy "professional" sampling hardware, which supports much better quality, e.g. >= 44.1 k 16 bits stereo. The characteristics listed here are a rough estimate of the capabilities of the basic hardware only (and even here I am on thin ice, with systems becoming ever more powerful).

machine               bits         max sampling rate  #output channels

Mac (all types)       8            22k                1
Mac (newer ones)      16           64k                4(128)
Apple IIgs            8            32k / >70k         16(st)
PC/soundblaster pro   8            22k st, 44.1k mo   1(st)
 & compatibles
PC/soundblaster 16    16           44.1k              1(st)
 & later & compatibles
Atari ST              8            22k                1
Atari STE,TT          8            50k                2
Atari Falcon 030      16           50k                8(st)
Amiga                 8            varies above 29k   4(st)
Sun Sparc             u-law        8k                 1
Sun Sparcst. 10       u-law,8,16   48k                1(st)
NeXT                  u-law,8,16   44.1k              1(st)
SGI Indigo            8,16         48k                4(st)
SGI Indigo2,Indy      8,16         48k                16(st,4-channel)
Acorn Archimedes      ~u-law       ~180k              8(st)
Sony NWS-3xxx         u,A,8,16     8-37.8k            1(st)
Sony NWS-5xxx         u,A,8,16     8-48k              1(st)
VAXstation 4000       u-law        8k                 1
DEC 3000              u-law        8k                 1
DEC 5000/20-25        u-law        8k                 1
Tandy 1000/*L*        8            >=44k              1
Tandy 2500            8            >=44k              1
HP9000/705,710,425e   u,A-law,16   8k                 1
HP9000/715,725,735    u,A-law,16   48k                1(st)
HP9000/755 option:    u,A-law,16   48k                1(st)
NCD MCX terminal      u,A,8,16     52k                1(st)

4(st) means "four voices, stereo"; sampling rates xx/yy are different recording/playback rates; *L* is any type with 'L' in it.

All these machines can play back sound without additional hardware, although the needed software is not always standard; also, some machines need external hardware to record sound (or to record at higher quality, like the NeXT, whose built-in sampling hardware only does 8000 samples/sec in u-law). Please don't send me details on optional or 3rd party hardware; there is too much and it is really beyond the scope of this FAQ. In particular, there is a separate newsgroup devoted to PC sound cards: comp.sys.ibm.pc.soundcard, which has a FAQ of its own (also posted to comp.answers and news.answers).

The new VAXstation 4000 (VLC and model 60) series lets you PLAY audio (.au) files, and the package DECsound will let you do the recording. In fact, DECsound is given away free with Motif 1.1 and supports the VAXstation, Sun SPARCstation, DECvoice, and DECaudio devices. Sun sound files work without change. The Alpha systems also have DECsound bundled with Motif.
Also, the DEC2000/300 (aka DECpc AXP 150) can use a Microsoft Sound Card, with AudioFile (see below) for sound.

The SGI Personal IRIS 4D/30 and 4D/35 have the same capabilities as the Indigo. The audio board was optional on the 4D/30. The Indigo2 and Indy features are a superset of the Indigo features.

The new Apple Macs have more powerful audio hardware; the latest models have built-in microphones.

Software exists for the PC that can play sound on its 1-bit speaker using pulse width modulation. Older Soundblaster boards record at rates up to 13 k and play back at up to 22 k (weird combination, but that's the way it is). Newer ones can record at 22k and play at 44k.

Here's some info about the newest Atari machine, the Falcon030. This machine has stereo 16 bit CODECs and a 32 MHz Motorola 56001 that can handle 8 channels of 16 bit audio, up to 50 kHz/channel with simultaneous playback and record. The Falcon DMA sound engine is also compatible with the 8 bit stereo DMA used on the STe and TT. All of these systems use signed data.

On the NeXT, the Motorola 56001 DSP chip is programmable and you can (in principle) do what you want. The SGI Indigo uses the same DSP chip but it can't be programmed by users -- SGI prefers to offer it as a shared system resource to multiple applications, thus enabling developers to program audio with their Audio Library and avoid code modifications for execution on future machines with different audio hardware, i.e. a different DSP. For example, the Indigo2 and Indy do not have a DSP chip.

The Amiga also has a 6-bit volume, which can be used to produce something like a 14-bit output for each voice. The hardware can also use one of each voice-pair to modulate the other in FM (period) or AM (volume, 6-bits).

The Acorn Archimedes uses a variation on u-law with the bit order reversed and the sign bit in bit 0.
Being a 'minority' architecture, Arc owners are quite adept at converting sound/image formats from other machines, and it is unlikely that you'll ever encounter sound in one of the Arc's own formats (there are several).

Tandy notes (Jeffrey L. Hayes): The maximum sampling rate for output is at least 44k. (I don't know the maximum rates; I have recorded at 22k and played at 44k. Higher rates are probably possible.) There is one output channel, not three. The belief that there are 3 channels probably stems from the fact that Music.pdm, bundled with these machines, can create 3-channel music modules (analogous to Amiga .mod's). Music.pdm probably does that because it is designed to work with the Tandy's 3-voice tone generator circuitry (compatible with the Texas Instruments SN76496 in the IBM PC-Jr) if there is insufficient RAM to load sound samples. The Tandy chip is able to record at lower rates than it is able to play back, as is the Soundblaster (i.e., the divider used to program the chip to record is lower than that used to program the chip to play back). The Tandy DAC can go faster than the original Soundblaster, however.

The NCD MCX terminal has audio integrated with its X server. The NCDAudio server is an extension of the X server, working together with it, with stress on the networking capability of sound transmission. The NCDAudio API provides format handling (ULAW8, Linear Unsig 8, Linear Sig 8, Linear Sig 16 MSB, Linear Unsig 16 MSB), flowing (to the server, from the server, to the i/o, from the i/o), wave form generators (Square, Sine, Saw, Constant) and the capability of area broadcast using UDP. Provision for manipulating data files (SND, WAV, VOC & AU) is also provided.

CD-I machines form a special category. The following formats are used:

o  PCM 44.1 kHz standard CD format

o  ADPCM - Adaptive Delta PCM

   1. Level A 37.8 kHz 8-bit

   2. Level B 37.8 kHz 4-bit

   3. Level C 18.9 kHz 4-bit

6. Overview of file formats.
Historically, almost every type of machine used its own file format for audio data, but some file formats are more generally applicable, and in general it is possible to define conversions between almost any pair of file formats -- sometimes losing information, however. File formats are a separate issue from device characteristics. There are two types of file formats: self-describing formats, where the device parameters and encoding are made explicit in some form of header, and headerless formats (sometimes called "raw"), where the device parameters and encoding are fixed. 6.1. Self-describing file formats. Self-describing file formats generally define a family of data encodings, where a header field indicates the particular encoding variant used. The header of self-describing formats contains the parameters of the sampling device and sometimes other information (e.g. a human-readable description of the sound, or a copyright notice). Most headers begin with a simple "magic word". (Some formats do not simply define a header format, but may contain chunks of data intermingled with chunks of encoding info.) The data encoding defines how the actual samples are stored in the file, e.g. signed or unsigned, as bytes or short integers, in little-endian or big-endian byte order, etc. Strictly speaking, channel interleaving is also part of the encoding, although so far I have seen little variation in this area. Here's an overview of popular file formats. 

extension, name    origin           variable parameters (fixed; comments)

.au or .snd        NeXT, Sun        rate, #channels, encoding, info string
.aif(f), AIFF      Apple, SGI       rate, #channels, sample width, lots of info
.aif(f), AIFC      Apple, SGI       same (extension of AIFF with compression)
.iff, IFF/8SVX     Amiga            rate, #channels, instrument info (8 bits)
.mp2, .mp3         MPEG standard    rate, #channels, sample quality
.ra                Real Networks    rate, #channels, sample quality
.sf                IRCAM            rate, #channels, encoding, info
.smp               Turtle Beach     loops, cues, (16 bits/1 ch)
.voc               Soundblaster     rate (8 bits/1 ch; can use silence deletion)
.wav, WAVE         Microsoft        rate, #channels, sample width, lots of info
.wve               Psion            (8 bits, 1 ch, a-law, 8khz)
none, HCOM         Mac              rate (8 bits/1 ch; uses Huffman compression)
none, MIME         Internet         (see below)
none, NIST SPHERE  DARPA speech community (see below)
.mod or .nst       Amiga            (see below)

Note that the filename extension ".snd" is ambiguous: it can be either the self-describing NeXT format or the headerless Mac/PC format, or even a headerless Amiga format.

I know nothing for sure about the origin of HCOM files. The filenames usually don't have a ".hcom" extension, but this is what SOX (see the section ``File conversions'') uses. The file format recognized by SOX includes a MacBinary header, where the file type field is "FSSD". The data fork begins with the magic word "HCOM" and contains Huffman compressed data; after decompression it is 8 bits unsigned data.

IFF/8SVX allows for amplitude contours for sounds (attack/decay/etc). Compression is optional (and extensible); volume is variable; author, notes and copyright properties; etc.

AIFF, AIFC and WAVE are similar in spirit but allow more freedom in encoding style (other than 8 bit/sample), amongst others.

There are other sound formats in use on Amiga by digitizers and music programs, such as IFF/SMUS.
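
Since most self-describing formats begin with a "magic word", a program can often identify a file without trusting its extension. The Python sketch below checks a few well-known magic words; the function name and the returned labels are made up for this illustration, and it covers only a handful of the formats in the table above.

```python
def sniff_audio_format(header):
    """Guess a self-describing audio format from the first bytes of a file."""
    if header.startswith(b".snd"):
        return "NeXT/Sun (.au or .snd)"
    if header.startswith(b"Creative Voice File\x1a"):
        return "Creative Voice (.voc)"
    if len(header) >= 12:
        # RIFF and IFF files carry a 4-byte container tag, a 4-byte
        # length, and then a 4-byte type tag.
        container, form_type = header[0:4], header[8:12]
        if container == b"RIFF" and form_type == b"WAVE":
            return "RIFF WAVE (.wav)"
        if container == b"FORM":
            iff_types = {b"AIFF": "AIFF", b"AIFC": "AIFC", b"8SVX": "IFF/8SVX"}
            if form_type in iff_types:
                return iff_types[form_type]
    return "unknown (possibly headerless)"


print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))
```

In practice one would read the first few dozen bytes of the file in binary mode and pass them in. A result of "unknown" does not prove the file is headerless; it only means none of these particular magic words matched.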
An interesting "interchange format" for audio data is described in the proposed Internet Standard "MIME", which describes a family of transport encodings and structuring devices for electronic mail. This is an extensible format, and initially standardizes a type of audio data dubbed "audio/basic", which is 8-bit u-law data sampled at 8000 samples/sec. The "IRCAM" sound file system has now been superseded by the so-called "BICSF" (for Berkeley/IRCAM/CARL Sound File system) software release. More recently, there has been an effort at Princeton (Prof. Paul Lansky) and Stanford (Stephen Travis Pope) to standardize several extensions to BICSF. A description of BICSF and the Princeton/Stanford extensions is available by anonymous ftp at ftp://ftp.cwi.nl/pub/audio/BICSF-info. This file contains further ftp pointers to software. 6.2. Headerless file formats. Headerless formats define a single encoding and usually allows no variation in device parameters (except sometimes sampling rate, which can be a pain to figure out other than by listening to the sample). extension origin parameters or name .snd, .fssd Mac, PC variable rate, 1 channel, 8 bits unsigned .ul US telephony 8 k, 1 channel, 8 bit "u-law" encoding .snd? Amiga variable rate, 1 channel, 8 bits signed It is usually easy to distinguish 8-bit signed formats from unsigned by looking at the beginning of the data with 'od -b Mac [Victor J. Heinz, vic:wbst128@xerox.com] Converts Sun uLaw to Mac 'snd'. ULAW [Rod Kennedy, rod@faceng.anu.edu.au] Converts 'snd' to Sun uLaw. UUTool [Bernie Wieser, wieser@acs.ucalgary.ca] Primarily a uuencode/decode program, but in true Swiss Army Knife fashion can also read/write Sun uLaw, AIFF, and 'snd' files. ModVoicer [Kip Walker, Kip_Walker@mcimail.com] Converts Amiga MOD voices into SoundEdit files or 'snd' resources. Music 5 Mac [Simone Bettini, space@maya.dei.unipd.it] Primarily a Music Synthesis system, but can also convert between 'snd', AIFF, and IBM .DAT(?). 
See also the section on players -- some players also do conversions.

8. Playing audio files on UNIX.

The commands needed to play an audio file depend on the file format and the available hardware and software. Most systems can only directly play sound in their native format; use a conversion program to play other formats.

In general, UNIX systems that support audio will have a device that you can send RAW audio data to and it will be played. For those systems it is possible to use SOX to convert any sound file into the default format that the system expects and send that data to the device. For example:

     sox soundfile.wav -t raw -r 8012 -u -b -c 1 - | cat > /dev/audio

Later versions of SOX will know how to open several UNIX audio devices and play any audio format at the highest possible sound quality.

8.1. Sun Sparcstation running SunOS 4.x.

Raw u-law files can be played using "cat file >/dev/audio".

A whole package for dealing with ".au" files is provided by Sun on an experimental basis, in /usr/demo/SOUND. You may have to compile the programs first. (If you can't find this directory, either you are not running SunOS 4.1 yet, or your system administrator hasn't installed it -- go ask him for it, not me!) The program "play" in this directory recognizes all files in Sun/NeXT format.

A strange thing Sun did is that most older Sparc hardware can only play audio files using u-law encoding at 8 k -- newer hardware can play linear PCM data up to 44k but can't play u-law files.

If you can't find "play", you can also cat a ".au" file to /dev/audio, if it uses u-law; the header will sound like a short burst of noise but the rest of the data will sound OK (really, the only difference in this case between raw u-law and ".au" files is the header; the u-law data is exactly the same).

OpenWindows 3.0 has a full-fledged audio tool (called "audiotool"). You can drop audio file icons into it, edit them, etc.
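
The burst of noise mentioned above can be avoided by skipping the ".au" header before sending the data to the device. The following Python sketch splits a Sun/NeXT file into its header fields and raw sample data. It assumes the usual layout of six big-endian 32-bit words (magic ".snd", data offset, data size, encoding, sampling rate, channels), where encoding 1 means 8-bit u-law; the function and key names are made up for this illustration.

```python
import struct

AU_MAGIC = b".snd"
AU_ENCODING_ULAW_8 = 1  # encoding code for 8-bit u-law


def split_au(data):
    """Return (header fields, raw sample bytes) for a Sun/NeXT .au file."""
    # The fixed part of the header is six big-endian 32-bit words.
    magic, data_offset, data_size, encoding, sample_rate, channels = \
        struct.unpack(">4s5I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("not a NeXT/Sun .au file")
    info = {
        "data_size": data_size,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "channels": channels,
    }
    # data_offset also skips the variable-length info string.
    return info, data[data_offset:]
```

If the encoding field is 1 and the rate is 8000, the returned sample bytes are exactly what "cat file >/dev/audio" expects on the older Sparc hardware described above; for any other encoding a conversion step would still be needed.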
Finally, current versions of SoX include the ability to convert and play sounds directly to the Sun /dev/audio device. It should include a script called "play" to do this.

8.2. Solaris.

Apparently, under older versions of Solaris, writing to /dev/audio from the shell is a bad idea, because the device driver will flush its queue as soon as the file is closed. Use "audioplay" instead or a newer version of Solaris. The supported formats and sampling rates are the same as above for SunOS.

Since Solaris uses the same hardware as SunOS, almost all that applies to SunOS applies to Solaris as well. For example, Solaris also includes an AudioTool program. Also, as with SunOS you can use the Sox package to play files to /dev/audio.

8.3. NeXT

On NeXT machines, the standard "sndplay" program can play all NeXT format files (this includes Sun ".au" files). It supports at least u-law at 8 k and 16 bits samples at 22 or 44.1 k. It attempts on-the-fly conversions for other formats.

Sound files are also played if you double-click on them in the file browser.

8.4. SGI Indigo, Indigo2, Indy and Personal IRIS.

On SGI Indigo, Indigo2, Indy and the 4D/30 and /35 Personal IRIS workstations, "WorkSpace" plays audio files in .aiff, .aifc, .au, and .wav formats if you double click them and the sampling rate is one of 8000, 11025, 16000, 22050, 32000, 44100, or 48000. On the Personal IRIS, you need to have the audio board installed (check the output from hinv) and you must run IRIX 3.3.2 or 4.0 or higher. These files can also be played with "soundfiler" and "sfplay". ".aiff" and ".aifc" files at the above sampling rates can also be played with playaifc. (All in /usr/sbin)

There is no simple /dev/audio interface on these SGI machines. (There was one on 4D/25 machines, reading and writing signed linear 8-bit samples at rates of 8, 16 and 32 k.)

A program "playulaw" was posted as part of the "radio 2.0" release.
It plays raw u-law files on the Indigo, Indigo2, Indy or Personal IRIS audio hardware. 8.5. Linux Linux has an entire document devoted to playing sound files that give much more detail than this document can provide. It can be obtained from Sunsite's Linux HOWTO directory. The quickest way to play all sound files under Linux is to install Sox as it includes support for playing directly to Linux's /dev/dsp device. 8.6. Others. Most other UNIX boxes don't have audio hardware and thus can't play audio data. This is actually rapidly changing and most new hardware that hits the market has some form of audio support. Unfortunately there is no single portable interface for audio that comes near the acceptance and functionality (let alone code size :-) of X11 for graphics. There are at least two network-transparent packages, both in some way based on the X11 architecture, that attempt to fill the gap: DEC CRL's AudioFile supports Digital RISC systems running Ultrix, Digital Alpha AXP systems running OSF/1, Sun Sparcs, and SGI AL- capable systems (e.g., Indigo, Indy). The source kit is located at ftp site crl.dec.com in /pub/DEC/AF. NCD's NetAudio supports NCD's MCX line of X terminals as well as Sparcs running either SunOS 4.1.3 or Solaris 2.2, using the /dev/audio interface (they claim it should be easy to port). The source it located at ftp.x.org in contrib/netaudio. It is also ported to SGI (tested on IRIX 5.x), and there are unconfirmed rumors that it is being ported to SCI and Linux. 9. Playing audio files on the Vaxstation 4000 (VMS). The section describes various ways of playing sounds under VMS. 9.1. Without DECsound. ".au" files can be played by COPYING them to device "SOA0:". This device is set up by enabling the driver SODRIVER. You can use the following command file: $!---------------- cut here ------------------------------- $! 
$! sound_setup.com    enable SOUND driver
$ run sys$system:sysgen
connect soa0 /adapter=0 /csr=%x0e00 /vector=%o304 /driver=sodriver
exit
$ exit
$!----------------- cut here ------------------------------------

9.2. With DECsound (bundled with motif).

Just start DECsound by selecting it from the session manager in the applications menu. (If it is not there, use "@vue$library:sound$vue_startup".) Make sure the settings for device type (vaxstation 4000) and play settings (headphone jack) are selected.

To play files from the DCL prompt (handy if you want to play sounds on a remote workstation), set a symbol up as follows:

PLAY == "$DECSOUND -VOLUME 50 -PLAY"

Usage:

DCL> play sound.au

9.3. Audio port.

The external audio port comes with a telephone-jack-like port. For starters, you can plug a telephone RECEIVER right into this port to hear your first sound files. After that, you can use the adapter (that came with the VaxStation), and plug in a small set of stereo speakers or headphones (the kind you'd plug into a WALKMAN, for example), for more volume.

The adapter also has a microphone plug so that you can record sounds if DECsound is installed.

10. Playing audio files on a PC.

This section gives a quick overview of playing audio files on PC type computers (not just those based on Intel chips).

10.1. PC or compatible.

10.2. IBM PC and compatibles.

Most PCs have at least a speaker built in, so theoretically all you need is the right software. Unfortunately, when the built-in speaker is used the sound quality suffers greatly. There are several DOS programs available that allow you to play certain sound files through it. There is also an unsupported Microsoft Windows driver that allows all multimedia Windows programs to play sound out of the speaker. The best way to get this driver is to go to Microsoft's web site (http://www.microsoft.com) and search for speaker.exe.

Most modern PCs come with a sound card and Microsoft Windows installed.
There should be a program called "mediaplayer" that will allow you to play .wav files.

10.3. Atari.

I currently do not have any information on programs to play audio on Atari platforms.

10.4. Tandy.

On a Tandy 1000 or 2500, sounds can be played and recorded with DeskMate Sound (SOUND.PDM), or if they are not stored in compressed format, they can also be played by a program called PLAYSND. Playsnd also plays ".voc", ".wav", ".iff", ".mod" samples, and headerless 8-bit PCM (signed or unsigned). The author, John Ball (john.ball@two-t.com), has decided to place the program and source code in the public domain. Playsnd will also play on the PC speaker.

Also, Tspak (see above) contains programs to record and play ".wav" files.

10.5. Amiga.

Players for Amigas can be found at ftp://wuarchive.wustl.edu/systems/amiga/aminet/mus/play.

10.6. Apple Macintosh.

Malcolm Slaney from Apple writes:

"[...] there are MANY tools for working with sound on the Macintosh. Three applications that come to mind immediately are SoundEdit (formerly by Farallon and now by MacroMind/Paracomp), Alchemy and Eric Keller's Signalyze. There are lots of other tools available for sound editing (including some of the QuickTime Movie tools.)"

Bill Houle sent the following lists:

Popular commercial apps are indicated with a [*]. All other programs mentioned are shareware/freeware available from SUMEX and the various mirror sites, or check archie for the nearest FTP location.

MAC SOUND EDITORS

Sample Editor [Garrick McFarlane, McFarlaneGA@Kirk.Vax.Aston.Ac.UK]
Plays AIFF and 'snd' sounds. Can convert between AIFF and 'snd'. Can record from built-in mic. Can add effects such as fade, normalize, delay, etc.

Wavicle [Lee Fyock]
Plays SoundEdit files. Can convert to 'snd'. Can record from built-in mic. Can add effects such as fade, filter, reverb, etc.

[*]SoundEdit/SoundEdit Pro [Farallon/MacroMind*Paracomp]
Plays SoundEdit and 'snd' sounds. Can read/write SoundEdit files and 'snd' sounds.
Can record from built-in mic. Can add effects such as echo, filter, reverb, etc.

MAC SOUND PLAYERS

Sound-Tracker [Frank Seide]
Plays Amiga SoundTracker files in foreground or background.

Macintosh Tracker [Thomas R. Lawrance, tomlaw@world.std.com]
Plays Amiga SoundTracker files in foreground or background. A port of Marc Espie's Unix Tracker version with Frank Seide's core player thrown in for good measure.

The Player [Antoine Rosset & Mike Venturi]
Plays AIFF, SoundEdit, MOD, and 'snd' files.

SoundMaster (aka [*]Kaboom!) [Bruce Tomlin]
Associates SoundEdit files to MacOS events.

SndControl [Riccardo Ettore, 72277.1344@compuserve.com]
Associates 'snd' sounds to MacOS events.

Canon 2 [Glenn Anderson, glenn@otago.ac.nz; Jeff Home, jeff@otago.ac.nz]
Plays AIFF or 'snd' files in foreground or background.

Another Mac play/convert program:

"It's called SoundApp. I wrote it, (franke1@llnl.gov) and it's FreeWare. It will play: SoundCap, SoundEdit, WAVE, VOC, MOD, Amiga IFF (8SVX), Sound Designer, AIFF, AU, Mac Resource, and DVI ADPCM. It can convert all the above to System 7 sound resources (except MOD where just the samples are extracted.) And it will double buffer."

11. File Formats.

Here are some more detailed pieces of info that I received by e-mail. They are reproduced here with virtually no editing.

11.1. AIFF Format (Audio IFF) and AIFC.

This format was developed by Apple for storing high-quality sampled sound and musical instrument info; it is also used by SGI and several professional audio packages (sorry, I know no names). An extension, called AIFC or AIFF-C, supports compression (see below).

The specification is very long and allows for lots of different features. It is beyond the scope of this FAQ to list its format here, but there are pointers listed below for further information. If someone would like to make a short summary of the file format for the simple linear data type, I would be happy to place it here.
There is a BinHex'ed MacWrite version of the AIFF spec available by anonymous ftp at ftp://ftp.cwi.nl/pub/audio/AudioIFF1.3.hqx. But you may be better off with the AIFF-C specs, see below.

I have made available a text version of the AIFF-C specification on my web page at http://www.cnpbagwell.com/aiff-c.txt and a postscript version is available from ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z.

11.2. The NeXT/Sun audio file format.

Here's the complete story on the file format, from the NeXT documentation. (Note that the "magic" number is ((int)0x2e736e64), which equals ".snd".) Also, at the end, I've added a little document that someone posted to the net a couple of years ago, that describes the format in a bit-by-bit fashion rather than from C.

I received this from Doug Keislar, NeXT Computer. This is also the Sun format, except that Sun doesn't recognize as many format codes. I added the numeric codes to the table of formats and sorted it.

SNDSoundStruct: How a NeXT Computer Represents Sound

The NeXT sound software defines the SNDSoundStruct structure to represent sound. This structure defines the soundfile and Mach-O sound segment formats and the sound pasteboard type. It's also used to describe sounds in Interface Builder. In addition, each instance of the Sound Kit's Sound class encapsulates a SNDSoundStruct and provides methods to access and modify its attributes.

Basic sound operations, such as playing, recording, and cut-and-paste editing, are most easily performed by a Sound object. In many cases, the Sound Kit obviates the need for in-depth understanding of the SNDSoundStruct architecture. For example, if you simply want to incorporate sound effects into an application, or to provide a simple graphic sound editor (such as the one in the Mail application), you needn't be aware of the details of the SNDSoundStruct. However, if you want to closely examine or manipulate sound data you should be familiar with this structure.
The SNDSoundStruct contains a header, information that describes the attributes of a sound, followed by the data (usually samples) that represents the sound. The structure is defined (in sound/soundstruct.h) as:

    typedef struct {
        int magic;          /* magic number SND_MAGIC */
        int dataLocation;   /* offset or pointer to the data */
        int dataSize;       /* number of bytes of data */
        int dataFormat;     /* the data format code */
        int samplingRate;   /* the sampling rate */
        int channelCount;   /* the number of channels */
        char info[4];       /* optional text information */
    } SNDSoundStruct;

SNDSoundStruct Fields

magic

magic is a magic number that's used to identify the structure as a SNDSoundStruct. Keep in mind that the structure also defines the soundfile and Mach-O sound segment formats, so the magic number is also used to identify these entities as containing a sound.

dataLocation

It was mentioned above that the SNDSoundStruct contains a header followed by sound data. In reality, the structure only contains the header; the data itself is external to, although usually contiguous with, the structure. (Nonetheless, it's often useful to speak of the SNDSoundStruct as the header and the data.) dataLocation is used to point to the data. Usually, this value is an offset (in bytes) from the beginning of the SNDSoundStruct to the first byte of sound data. The data, in this case, immediately follows the structure, so dataLocation can also be thought of as the size of the structure's header. The other use of dataLocation, as an address that locates data that isn't contiguous with the structure, is described in "Format Codes," below.

dataSize

dataSize is the size of the sound data in bytes (not including the size of the SNDSoundStruct).

dataFormat

dataFormat is a code that identifies the type of sound. For sampled sounds, this is the quantization format. However, the data can also be instructions for synthesizing a sound on the DSP. The codes are listed and explained in "Format Codes," below.
samplingRate

samplingRate is the sampling rate (if the data is samples). Three sampling rates, represented as integer constants, are supported by the hardware:

    Constant          Sampling Rate (samples/sec)
    SND_RATE_CODEC    8012.821   (CODEC input)
    SND_RATE_LOW      22050.0    (low sampling rate output)
    SND_RATE_HIGH     44100.0    (high sampling rate output)

channelCount

channelCount is the number of channels of sampled sound.

info

info is a NULL-terminated string that you can supply to provide a textual description of the sound. The size of the info field is set when the structure is created and thereafter can't be enlarged. It's at least four bytes long (even if it's unused).

Format Codes

A sound's format is represented as a positive 32-bit integer. NeXT reserves the integers 0 through 255; you can define your own format and represent it with an integer greater than 255. Most of the formats defined by NeXT describe the amplitude quantization of sampled sound data:

    Value  Code                               Format
     0     SND_FORMAT_UNSPECIFIED             unspecified format
     1     SND_FORMAT_MULAW_8                 8-bit mu-law samples
     2     SND_FORMAT_LINEAR_8                8-bit linear samples
     3     SND_FORMAT_LINEAR_16               16-bit linear samples
     4     SND_FORMAT_LINEAR_24               24-bit linear samples
     5     SND_FORMAT_LINEAR_32               32-bit linear samples
     6     SND_FORMAT_FLOAT                   floating-point samples
     7     SND_FORMAT_DOUBLE                  double-precision float samples
     8     SND_FORMAT_INDIRECT                fragmented sampled data
     9     SND_FORMAT_NESTED                  ?
    10     SND_FORMAT_DSP_CORE                DSP program
    11     SND_FORMAT_DSP_DATA_8              8-bit fixed-point samples
    12     SND_FORMAT_DSP_DATA_16             16-bit fixed-point samples
    13     SND_FORMAT_DSP_DATA_24             24-bit fixed-point samples
    14     SND_FORMAT_DSP_DATA_32             32-bit fixed-point samples
    15     ?
    16     SND_FORMAT_DISPLAY                 non-audio display data
    17     SND_FORMAT_MULAW_SQUELCH           ?
    18     SND_FORMAT_EMPHASIZED              16-bit linear with emphasis
    19     SND_FORMAT_COMPRESSED              16-bit linear with compression
    20     SND_FORMAT_COMPRESSED_EMPHASIZED   A combination of the two above
    21     SND_FORMAT_DSP_COMMANDS            Music Kit DSP commands
    22     SND_FORMAT_DSP_COMMANDS_SAMPLES    ?

[Some new ones supported by Sun.
This is all I currently know. --GvR]

    23     SND_FORMAT_ADPCM_G721
    24     SND_FORMAT_ADPCM_G722
    25     SND_FORMAT_ADPCM_G723_3
    26     SND_FORMAT_ADPCM_G723_5
    27     SND_FORMAT_ALAW_8

Most formats identify different sizes and types of sampled data. Some deserve special note:

o SND_FORMAT_DSP_CORE format contains data that represents a loadable DSP core program. Sounds in this format are required by the SNDBootDSP() and SNDRunDSP() functions. You create a SND_FORMAT_DSP_CORE sound by reading a DSP load file (extension ".lod") with the SNDReadDSPfile() function.

o SND_FORMAT_DSP_COMMANDS is used to distinguish sounds that contain DSP commands created by the Music Kit. Sounds in this format can only be created through the Music Kit's Orchestra class, but can be played back through the SNDStartPlaying() function.

o SND_FORMAT_DISPLAY format is used by the Sound Kit's SoundView class. Such sounds can't be played.

o SND_FORMAT_INDIRECT indicates data that has become fragmented, as described in a separate section, below.

o SND_FORMAT_UNSPECIFIED is used for unrecognized formats.

Fragmented Sound Data

Sound data is usually stored in a contiguous block of memory. However, when sampled sound data is edited (such that a portion of the sound is deleted or a portion inserted), the data may become discontiguous, or fragmented. Each fragment of data is given its own SNDSoundStruct header; thus, each fragment becomes a separate SNDSoundStruct structure. The addresses of these new structures are collected into a contiguous, NULL-terminated block; the dataLocation field of the original SNDSoundStruct is set to the address of this block, while the original format, sampling rate, and channel count are copied into the new SNDSoundStructs.

Fragmentation serves one purpose: It avoids the high cost of moving data when the sound is edited. Playback of a fragmented sound is transparent; you never need to know whether the sound is fragmented before playing it.
However, playback of a heavily fragmented sound is less efficient than that of a contiguous sound. The SNDCompactSamples() C function can be used to compact fragmented sound data. Sampled sound data is naturally unfragmented. A sound that's freshly recorded or retrieved from a soundfile, the Mach-O segment, or the pasteboard won't be fragmented. Keep in mind that only sampled data can become fragmented. >From mentor.cc.purdue.edu!purdue!decwrl!ucbvax!ziploc!eps Wed Apr 4 23:56:23 EST 1990 Article 5779 of comp.sys.next: Path: mentor.cc.purdue.edu!purdue!decwrl!ucbvax!ziploc!eps >From: eps@toaster.SFSU.EDU (Eric P. Scott) Newsgroups: comp.sys.next Subject: Re: Format of NeXT sndfile headers? Message-ID: <445@toaster.SFSU.EDU> Date: 31 Mar 90 21:36:17 GMT References: <14978@phoenix.Princeton.EDU> Reply-To: eps@cs.SFSU.EDU (Eric P. Scott) Organization: San Francisco State University Lines: 42 In article <14978@phoenix.Princeton.EDU> bskendig@phoenix.Princeton.EDU (Brian Kendig) writes: >I'd like to take a program I have that converts Macintosh sound files >to NeXT sndfiles and polish it up a bit to go the other direction as >well. Two people have already submitted programs that do this (Christopher Lane and Robert Hood); check the various NeXT archive sites. > Could someone please give me the format of a NeXT sndfile >header? 
"big-endian"

            0       1       2       3
        +-------+-------+-------+-------+
     0  | 0x2e  | 0x73  | 0x6e  | 0x64  |   "magic" number
        +-------+-------+-------+-------+
     4  |                               |   data location
        +-------+-------+-------+-------+
     8  |                               |   data size
        +-------+-------+-------+-------+
    12  |                               |   data format (enum)
        +-------+-------+-------+-------+
    16  |                               |   sampling rate (int)
        +-------+-------+-------+-------+
    20  |                               |   channel count
        +-------+-------+-------+-------+
    24  |       |       |       |       |   (optional) info string

    28 = minimum value for data location

data format values can be found in /usr/include/sound/soundstruct.h

Most common combinations:

                    sampling   channel   data
                    rate       count     format
    voice file      8012       1         1 =  8-bit mu-law
    system beep     22050      2         3 = 16-bit linear
    CD-quality      44100      2         3 = 16-bit linear

11.3. IFF/8SVX Format.

The following email describes the IFF/8SVX format:

Newsgroups: alt.binaries.sounds.d,alt.sex.sounds
Subject: Format of the IFF header (Amiga sounds)
Message-ID: <2509@tardis.Tymnet.COM>
From: jms@tardis.Tymnet.COM (Joe Smith)
Date: 23 Oct 91 23:54:38 GMT
Followup-To: alt.binaries.sounds.d
Organization: BT North America (Tymnet)

The first 12 bytes of an IFF file are used to distinguish between an Amiga picture (FORM-ILBM), an Amiga sound sample (FORM-8SVX), or other file conforming to the IFF specification. The middle 4 bytes are the count of bytes that follow the "FORM" and byte count longwords. (Numbers are stored in M68000 form, high order byte first.)

------------------------------------------

FutureSound audio file, 15000 samples at 10.000KHz, file is 15048 bytes long.

    0000: 464F524D 00003AC0 38535658 56484452 FORM..:.8SVXVHDR
           F O R M    15040  8 S V X  V H D R
    0010: 00000014 00003A98 00000000 00000000 ......:.........
                20    15000        0        0
    0020: 27100100 00010000 424F4459 00003A98 '.......BODY..:.
         10000 1 0      1.0  B O D Y    15000

    0000000..03 = "FORM", identifies this as an IFF format file.
    FORM+00..03 (ULONG) = number of bytes that follow. (Unsigned long int.)
    FORM+04..07 = "8SVX", identifies this as an 8-bit sampled voice.
    ????+00..03 = "VHDR", Voice8Header, describes the parameters for the BODY.
    VHDR+00..03 (ULONG) = number of bytes to follow.
    VHDR+04..07 (ULONG) = samples in the high octave 1-shot part.
    VHDR+08..0B (ULONG) = samples in the high octave repeat part.
    VHDR+0C..0F (ULONG) = samples per cycle in high octave (if repeating), else 0.
    VHDR+10..11 (UWORD) = samples per second. (Unsigned 16-bit quantity.)
    VHDR+12 (UBYTE) = number of octaves of waveforms in sample.
    VHDR+13 (UBYTE) = data compression (0=none, 1=Fibonacci-delta encoding).
    VHDR+14..17 (FIXED) = volume. (The number 65536 means 1.0 or full volume.)

    ????+00..03 = "BODY", identifies the start of the audio data.
    BODY+00..03 (ULONG) = number of bytes to follow.
    BODY+04..NNNNN = Data, signed bytes, from -128 to +127.

    0030: 04030201 02030303 04050605 05060605
    0040: 06080806 07060505 04020202 01FF0000
    0050: 00000000 FF00FFFF FFFEFDFD FDFEFFFF
    0060: FDFDFF00 00FFFFFF 00000000 00FFFF00
    0070: 00000000 00FF0000 00FFFEFF 00000000
    0080: 00010000 000101FF FF0000FE FEFFFFFE
    0090: FDFDFEFD FDFFFFFC FDFEFDFD FEFFFEFE
    00A0: FFFEFEFE FEFEFEFF FFFFFEFF 00FFFF01

This small section of the audio sample shows the numbers ranging from -4 (0xFC) to +8 (0x08).

Warning: Do not assume that the BODY starts 48 bytes into the file. In addition to "VHDR", chunks labeled "NAME", "AUTH", "ANNO", or "(c) " may be present, and may be in any order. You will have to check the byte count in each chunk to determine how many bytes to skip.

11.4. US Federal Standard 1016 availability.

From: jpcampb@afterlife.ncsc.mil (Joe Campbell)

The U.S. DoD's Federal-Standard-1016 based 4800 bps code excited linear prediction voice coder version 3.2 (CELP 3.2) Fortran and C simulation source codes are available for worldwide distribution (on DOS diskettes, but configured to compile on Sun SPARC stations) from NTIS and DTIC. Example input and processed speech files are included.
A Technical Information Bulletin (TIB), "Details to Assist in Implementation of Federal Standard 1016 CELP," and the official standard, "Federal Standard 1016, Telecommunications: Analog to Digital Conversion of Radio Voice by 4,800 bit/second Code Excited Linear Prediction (CELP)," are also available. This is available through the National Technical Information Service: NTIS U.S. Department of Commerce 5285 Port Royal Road Springfield, VA 22161 USA (703) 487-4650 The "AD" ordering number for the CELP software is AD M000 118 (US$ 90.00) and for the TIB it's AD A256 629 (US$ 17.50). The LPC-10 standard, described below, is FIPS Pub 137 (US$ 12.50). There is a $3.00 shipping charge on all U.S. orders. The telephone number for their automated system is 703-487-4650, or 703-487-4600 if you'd prefer to talk with a real person. (U.S. DoD personnel and contractors can receive the package from the Defense Technical Information Center: DTIC, Building 5, Cameron Station, Alexandria, VA 22304-6145. Their telephone number is 703-274-7633.) The following articles describe the Federal-Standard-1016 4.8-kbps CELP coder (it's unnecessary to read more than one): Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal Processing, Academic Press, 1991, Vol. 1, No. 3, p. 145-155. Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The DoD 4.8 kbps Standard (Proposed Federal Standard 1016)," in Advances in Speech Coding, ed. Atal, Cuperman and Gersho, Kluwer Academic Publishers, 1991, Chapter 12, p. 121-133. Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The Proposed Federal Standard 1016 4800 bps Voice Coder: CELP," Speech Technology Magazine, April/May 1990, p. 58-64. The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400 bps linear prediction coder (LPC-10) was republished as a Federal Information Processing Standards Publication 137 (FIPS Pub 137). 
It is described in: Thomas E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10," Speech Technology Magazine, April 1982, p. 40-49. There is also a section about FS-1015 in the book: Panos E. Papamichalis, Practical Approaches to Speech Coding, Prentice-Hall, 1987. The voicing classifier used in the enhanced LPC-10 (LPC-10e) is described in: Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/Unvoiced Classification of Speech with Applications to the U.S. Government LPC-10E Algorithm," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1986, p. 473-6. Copies of the official standard "Federal Standard 1016, Telecommunications: Analog to Digital Conversion of Radio Voice by 4,800 bit/second Code Excited Linear Prediction (CELP)" are available for US$ 5.00 each from: GSA Federal Supply Service Bureau Specification Section, Suite 8100 470 E. L'Enfant Place, S.W. Washington, DC 20407 (202)755-0325 Realtime DSP code for FS-1015 and FS-1016 is sold by: John DellaMorte DSP Software Engineering 165 Middlesex Tpk, Suite 206 Bedford, MA 01730 USA 1-617-275-3733 1-617-275-4323 (fax) dspse.bedford@channel1.com DSP Software Engineering's FS-1016 code can run on a DSP Research's Tiger 30 (a PC board with a TMS320C3x and analog interface suited to development work). DSP Research 1095 E. Duane Ave. Sunnyvale, CA 94086 USA (408)773-1042 (408)736-3451 (fax) From: tobiasr@monolith.lrmsc.loral.com (Richard Tobias) For U.S. FED-STD-1016 (4800 bps CELP) _realtime_ DSP code and information about products using this code using the AT&T DSP32C and AT&T DSP3210, contact: White Eagle Systems Technology, Inc. 1123 Queensbridge Way San Jose, CA 95120 (408) 997-2706 (408) 997-3584 (fax) rjjt@netcom.com From: Cole Erskine [paraphrased] Analogical Systems has a _real-time_ multirate implementation of U.S. Federal Standard 1016 CELP operating at bit rates of 4800, 7200, and 9600 bps on a single 27MHz Motorola DSP56001. 
Source and object code is available for a one-time license fee.

FREE, _real-time_ demonstration software for the Ariel PC-56D is available for those who already have such a board by contacting Analogical Systems. The demo software allows you to record and play back CELP files to and from the PC's hard disk.

    Analogical Systems
    2916 Ramona Street
    Palo Alto, CA 94306
    Tel: +1 (415) 323-3232
    FAX: +1 (415) 323-4222

11.5. Creative Voice (VOC) file format.

From: galt@dsd.es.com

(byte numbers are hex!)

    HEADER (bytes 00-19)
    Series of DATA BLOCKS (bytes 1A+) [Must end w/ Terminator Block]

---------------------------------------------------------------

HEADER:
-------
    byte #   Description
    ------   ------------------------------------------
    00-12    "Creative Voice File"
    13       1A (eof to abort printing of file)
    14-15    Offset of first datablock in .voc file (std 1A 00 in Intel Notation)
    16-17    Version number (minor,major) (VOC-HDR puts 0A 01)
    18-19    1's complement of Ver. # + 1234h (VOC-HDR puts 29 11)

---------------------------------------------------------------

DATA BLOCK:
-----------

Data Block: TYPE(1-byte), SIZE(3-bytes), INFO(0+ bytes)

NOTE: Terminator Block is an exception -- it has only the TYPE byte.
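As a rough illustration of the header layout above, here is a small C sketch that checks the 1Ah-byte header of a .voc file that has already been read into memory. The function name voc_check_header is made up for this example and does not come from any Creative Labs library. It assumes the check word is the bitwise complement of the version number plus 1234h, since that is what reproduces the sample bytes quoted above (version 0A 01, check 29 11).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from any vendor library): validate the VOC
   header held in h[0..len-1].  Returns the offset of the first data
   block, or -1 if the header does not look valid. */
long voc_check_header(const unsigned char *h, size_t len)
{
    static const char sig[] = "Creative Voice File"; /* bytes 00-12 hex */
    uint16_t offset, version, check;

    if (len < 0x1A)
        return -1;
    if (memcmp(h, sig, 19) != 0 || h[0x13] != 0x1A)
        return -1;

    /* Two-byte fields are in Intel order: low byte first. */
    offset  = (uint16_t)(h[0x14] | (h[0x15] << 8));
    version = (uint16_t)(h[0x16] | (h[0x17] << 8));
    check   = (uint16_t)(h[0x18] | (h[0x19] << 8));

    /* Assumed rule: check = (bitwise NOT of version) + 1234h, mod 65536. */
    if ((uint16_t)(~version + 0x1234) != check)
        return -1;

    return (long)offset;
}
```

A caller would read at least the first 26 bytes of the file, pass them in, and then seek to the returned offset to find the first data block.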
    TYPE  Description     Size (3-byte int)   Info
    ----  -----------     -----------------   -----------------------
    00    Terminator      (NONE)              (NONE)
    01    Sound data      2+length of data    *
    02    Sound continue  length of data      Voice Data
    03    Silence         3                   **
    04    Marker          2                   Marker# (2 bytes)
    05    ASCII           length of string    null terminated string
    06    Repeat          2                   Count# (2 bytes)
    07    End repeat      0                   (NONE)
    08    Extended        4                   ***

    *Sound Info Format:        **Silence Info Format:
    ---------------------      ----------------------------
    00   Sample Rate           00-01  Length of silence - 1
    01   Compression Type      02     Sample Rate
    02+  Voice Data

    ***Extended Info Format:
    ---------------------
    00-01  Time Constant: Mono:   65536 - (256000000/sample_rate)
                          Stereo: 65536 - (256000000/(2*sample_rate))
    02     Pack
    03     Mode: 0 = mono
                 1 = stereo

Marker# -- Driver keeps the most recent marker in a status byte

Count# -- Number of repetitions + 1. Count# may be 1 to FFFE for 0 - FFFD repetitions, or FFFF for endless repetitions

Sample Rate -- SR byte = 256-(1000000/sample_rate)

Length of silence -- in units of sampling cycle

Compression Type -- of voice data

    8-bits    = 0
    4-bits    = 1
    2.6-bits  = 2
    2-bits    = 3
    Multi DAC = 3+(# of channels) [interesting-- this isn't in the developer's manual]

Detailed description of new data blocks (VOC files version 1.20 and above):

(Source is fax from Barry Boone at Creative Labs, 405/742-6622)

BLOCK 8 - digitized sound attribute extension, must precede block 1. Used to define stereo, 8 bit audio

    BYTE bBlockID;        // = 8
    BYTE nBlockLen[3];    // 3 byte length
    WORD wTimeConstant;   // time constant = same as block 1
    BYTE bPackMethod;     // same as in block 1
    BYTE bVoiceMode;      // 0-mono, 1-stereo

Data is stored left, right

BLOCK 9 - data block that supersedes blocks 1 and 8. Used for stereo, 16 bit.

    BYTE bBlockID;          // = 9
    BYTE nBlockLen[3];      // length 12 plus length of sound
    DWORD dwSamplesPerSec;  // samples per second, not time const.
    BYTE bBitsPerSample;    // e.g., 8 or 16
    BYTE bChannels;         // 1 for mono, 2 for stereo
    WORD wFormat;           // see below
    BYTE reserved[4];       // pad to make block w/o data
                            // have a size of 16 bytes

Valid values of wFormat are:

    0x0000  8-bit unsigned PCM
    0x0001  Creative 8-bit to 4-bit ADPCM
    0x0002  Creative 8-bit to 3-bit ADPCM
    0x0003  Creative 8-bit to 2-bit ADPCM
    0x0004  16-bit signed PCM
    0x0006  CCITT a-Law
    0x0007  CCITT u-Law
    0x0200  Creative 16-bit to 4-bit ADPCM

Data is stored left, right

11.6. RIFF WAVE (.WAV) file format.

RIFF is a format by Microsoft and IBM that is in little-endian byte order. WAVE is RIFF's equivalent of AIFF, and its inclusion in Microsoft Windows 3.1 has made it important to know about.

Rob Ryan was kind enough to send me a description of the RIFF format. Unfortunately, it is too big to include here (27 k), but I've made it available for anonymous ftp at ftp://ftp.cwi.nl/pub/audio/RIFF-format.

Conor Frederick Prischmann points to ftp://ftp.ircam.fr/pub/music/.

The following is an overly simple description of a WAV file and will generally only work when it contains PCM data. This isn't so bad, since that's what 90% of people want to work with.

WAVe file format (Microsoft)
----------------------------

Wave files are a part of a file interchange format, called RIFF, created by Microsoft. The format basically is composed of a collection of data chunks. Each chunk has a 32-bit Id field, followed by a 32-bit chunk length, followed by the chunk data. Note that values are in Intel form (i.e., little-endian notation, least significant byte first).
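To make the chunk layout concrete, here is a minimal C sketch that walks the chunks of a RIFF/WAVE file image held in memory and looks one up by its 4-character Id. The names rd_le32 and riff_find_chunk are invented for this example. Two details are not stated in the description above and come from the RIFF specification: the outer 'RIFF' chunk begins with the form type 'WAVE', and a chunk with an odd length is followed by one pad byte that is not counted in its length field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value stored in Intel order (least significant byte first). */
static uint32_t rd_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: find chunk `id` (4 characters) in a RIFF/WAVE image.
   On success returns the offset of the chunk data and stores its length
   in *size; returns 0 if the chunk is missing or the file is malformed. */
size_t riff_find_chunk(const unsigned char *buf, size_t len,
                       const char *id, uint32_t *size)
{
    size_t pos = 12;                      /* skip 'RIFF', size, 'WAVE' */

    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;

    while (pos + 8 <= len) {
        uint32_t n = rd_le32(buf + pos + 4);

        if (n > len - pos - 8)            /* chunk runs past end of file */
            break;
        if (memcmp(buf + pos, id, 4) == 0) {
            *size = n;
            return pos + 8;
        }
        pos += 8 + (size_t)n + (n & 1);   /* skip data plus any pad byte */
    }
    return 0;
}
```

Searching for the chunks this way, rather than assuming fixed offsets, keeps a reader working when a file carries extra chunks between 'fmt ' and 'data'.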
The format for a wave file is as follows:

    Offset  Description
    ------  -----------
    0x00    chunk id 'RIFF'
    0x04    chunk size (32-bits)
    0x08    wave chunk id 'WAVE'
    0x0C    format chunk id 'fmt '
    0x10    format chunk size (32-bits)
    0x14    format tag (currently pcm)
    0x16    number of channels 1=mono, 2=stereo
    0x18    sample rate in hz
    0x1C    average bytes per second
    0x20    number of bytes per sample
              1 = 8-bit mono
              2 = 8-bit stereo or 16-bit mono
              4 = 16-bit stereo
    0x22    number of bits in a sample
    0x24    data chunk id 'data'
    0x28    length of data chunk (32-bits)
    0x2C    Sample data

Notes
-----

1. Lengths do not include the chunk Id or the length bytes. e.g.: if the data length is 1204 then the length of sample data is 1204 and not 1204-(4+4)

2. For samples with more than 1 channel, the channel 0 value of a given sample comes first, followed by the channel 1 value of the same sample; then the next sample follows. e.g.: for 8-bit stereo the samples will be sample0left, sample0right, sample1left, sample1right, etc.

3. 8-bit samples are stored in excess-128 notation. This means that the value 0 is stored as 128, a value of 1 is stored as 129, a value of -1 is stored as 127 and so on.

4. 16-bit samples are stored as 2's complement signed numbers.

Sample C Structure
------------------

    #include <stdint.h>

    typedef uint16_t word;    /* must be exactly 16 bits */
    typedef uint32_t dword;   /* must be exactly 32 bits */

    struct WAVEheader {
        char  ckID[4];           /* chunk id 'RIFF' */
        dword ckSize;            /* chunk size */
        char  wave_ckID[4];      /* wave chunk id 'WAVE' */
        char  fmt_ckID[4];       /* format chunk id 'fmt ' */
        dword fmt_ckSize;        /* format chunk size */
        word  formatTag;         /* format tag currently pcm */
        word  nChannels;         /* number of channels */
        dword nSamplesPerSec;    /* sample rate in hz */
        dword nAvgBytesPerSec;   /* average bytes per second */
        word  nBlockAlign;       /* number of bytes per sample */
        word  nBitsPerSample;    /* number of bits in a sample */
        char  data_ckID[4];      /* data chunk id 'data' */
        dword data_ckSize;       /* length of data chunk */
    };

11.7. u-law and A-law definitions.
[Adapted from information provided by duggan@cc.gatech.edu (Rick Duggan) and davep@zenobia.phys.unsw.EDU.AU (David Perry)]

u-LAW (really mu-LAW) is

              sgn(m)   (     |m |)       |m |
       y=    ------- ln( 1+ u|--|)       |--| =< 1
             ln(1+u)   (     |mp|)       |mp|

A-law is

             |     A    (m )                 |m |    1
             |  ------- (--)                 |--| =< -
             |  1+ln A  (mp)                 |mp|    A
       y=    |
             | sgn(m) (        |m |)    1    |m |
             | ------ ( 1+ ln A|--|)    - =< |--| =< 1
             | 1+ln A (        |mp|)    A    |mp|

Typical values are u=100 and 255, and A=87.6; mp is the peak message value and m is the current quantised message value. (The formulae get simpler if you substitute x for m/mp and sgn(x) for sgn(m); then -1 <= x <= 1.)

Converting from u-law to A-law is in a sense "lossy" since there are quantizing errors introduced in the conversion.

"..the u-LAW used in North America and Japan, and the A-law used in Europe and the rest of the world and international routes.."

References:

Modern Digital and Analog Communication Systems, B.P.Lathi., 2nd ed. ISBN 0-03-027933-X

Transmission Systems for Communications, Fifth Edition, by Members of the Technical Staff at Bell Telephone Laboratories. Bell Telephone Laboratories, Incorporated. Copyright 1959, 1964, 1970, 1982

A note on the resolution of u-law by Frank Klemm:

8 bit u-law has the same lowest magnitude as 12 bit linear, and 12 bit u-law the same as 16 bit linear.

    Device/Coding                      Resolution         Resolution
                                       on maximal level   on low level
    8 bit linear                        8                  8
    8 bit ulaw                          6                 12     (used for digital telephone)
    12 bit linear                      12                 12
    12 bit ulaw                        10                 16     (used in DAT/Longplay)
    16 bit linear                      16                 16

    estimated for some analog techniques:
    tape recorder (HiFi DIN)            8                  9     (no problem today)
    tape recorder (semiprofessional)   10.5               13.5

11.8. AVR File Format.

From: hyc@hanauma.Jpl.Nasa.Gov (Howard Chu)

A lot of PD software exists to play Mac .snd files on the ST. One other format that seems pretty popular (used by a number of commercial packages) is the AVR format (from Audio Visual Research).
This format has a 128 byte header that looks like this:

    char magic[4]="2BIT";
    char name[8];     /* null-padded sample name */
    short mono;       /* 0 = mono, 0xffff = stereo */
    short rez;        /* 8 = 8 bit, 16 = 16 bit */
    short sign;       /* 0 = unsigned, 0xffff = signed */
    short loop;       /* 0 = no loop, 0xffff = looping sample */
    short midi;       /* 0xffff = no MIDI note assigned,
                         0xffXX = single key note assignment
                         0xLLHH = key split, low/hi note */
    long rate;        /* sample frequency in hertz */
    long size;        /* sample length in bytes or words (see rez) */
    long lbeg;        /* offset to start of loop in bytes or words.
                         set to zero if unused. */
    long lend;        /* offset to end of loop in bytes or words.
                         set to sample length if unused. */
    short res1;       /* Reserved, MIDI keyboard split */
    short res2;       /* Reserved, sample compression */
    short res3;       /* Reserved */
    char ext[20];     /* Additional filename space, used
                         if (name[7] != 0) */
    char user[64];    /* User defined. Typically ASCII message. */

11.9. The Amiga MOD Format.

From: norlin@mailhost.ecn.uoknor.edu (Norman Lin)

MOD files are music files containing 2 parts:

(1) a bank of digitized samples
(2) sequencing information describing how and when to play the samples

MOD files originated on the Amiga, but because of their flexibility and the extremely large number of MOD files available, MOD players are now available for a variety of machines (IBM PC, Mac, Sparc Station, etc.)

The samples in a MOD file are raw, 8 bit, signed, headerless, linear digital data. There may be up to 31 distinct samples in a MOD file, each with a length of up to 128K (though most are much smaller; say, 10K - 60K). An older MOD format only allowed for up to 15 samples in a MOD file; you don't see many of these anymore. There is no standard sampling rate for these samples. [But see below.]

The sequencing information in a MOD file contains 4 tracks of information describing which, when, for how long, and at what frequency samples should be played.
This means that a MOD file can have up to 31 distinct (digitized)
instrument sounds, with up to 4 playing simultaneously at any given
point. This allows a wide variety of orchestrational possibilities,
including use of voice samples or creation of one's own instruments
(with appropriate sampling hardware/software). The ability to use one's
own samples as instruments is a flexibility that other music
files/formats do not share, and is one of the reasons MOD files are so
popular, numerous, and diverse.

15 instrument MODs, as noted above, are somewhat older than 31
instrument MODs and are not (at least not by me) seen very often
anymore. Their format is identical to that of 31 instrument MODs
except:

    (1) Since there are only 15 samples, the information for the last
        (15th) sample starts at byte 440 and goes through byte 469.
    (2) The songlength is at byte 470 (contrast with byte 950 in 31
        instrument MOD)
    (3) Byte 471 appears to be ignored, but has been observed to be
        127. (Sorry, this is from observation only)
    (4) Byte 472 begins the pattern sequence table (contrast with byte
        952 in a 31 instrument MOD)
    (5) Patterns start at byte 600 (contrast with byte 1084 in 31
        instrument MOD)

"ProTracker," an Amiga MOD file creator/editor, is available for ftp
everywhere as pt??.lzh.

From: Apollo Wong

From: M.J.H.Cox@bradford.ac.uk (Mark Cox)
Newsgroups: alt.sb.programmer
Subject: Re: Format for MOD files...
Message-ID: <1992Mar18.103608.4061@bradford.ac.uk>
Date: 18 Mar 92 10:36:08 GMT
Organization: University of Bradford, UK

wdc50@DUTS.ccc.amdahl.com (Winthrop D Chan) writes:
>I'd like to know if anyone has a reference document on the format of the
>Amiga Sound/NoiseTracker (MOD) files. The author of Modplay said he was going
>to release such a document sometime last year, but he never did.
>If anyone

I found this one, which covers it better than I can explain it - if you
use this in conjunction with the documentation that comes with Norman
Lin's Modedit program it should pretty much cover it.

Mark J Cox

/***********************************************************************
Protracker 1.1B Song/Module Format:
-----------------------------------

Offset  Bytes  Description
------  -----  -----------
   0     20    Songname. Remember to put trailing null bytes at the end...

Information for sample 1-31:

Offset  Bytes  Description
------  -----  -----------
  20     22    Samplename for sample 1. Pad with null bytes.
  42      2    Samplelength for sample 1. Stored as number of words.
               Multiply by two to get real sample length in bytes.
  44      1    Lower four bits are the finetune value, stored as a signed
               four bit number. The upper four bits are not used, and
               should be set to zero.
               Value:  Finetune:
                 0        0
                 1       +1
                 2       +2
                 3       +3
                 4       +4
                 5       +5
                 6       +6
                 7       +7
                 8       -8
                 9       -7
                 A       -6
                 B       -5
                 C       -4
                 D       -3
                 E       -2
                 F       -1
  45      1    Volume for sample 1. Range is $00-$40, or 0-64 decimal.
  46      2    Repeat point for sample 1. Stored as number of words offset
               from start of sample. Multiply by two to get offset in bytes.
  48      2    Repeat Length for sample 1. Stored as number of words in
               loop. Multiply by two to get replen in bytes.

Information for the next 30 samples starts here. It's just like the info
for sample 1.

Offset  Bytes  Description
------  -----  -----------
  50     30    Sample 2...
  80     30    Sample 3...
   .
   .
   .
 890     30    Sample 30...
 920     30    Sample 31...

Offset  Bytes  Description
------  -----  -----------
 950      1    Songlength. Range is 1-128.
 951      1    Well... this little byte here is set to 127, so that old
               trackers will search through all patterns when loading.
               Noisetracker uses this byte for restart, but we don't.
 952    128    Song positions 0-127. Each hold a number from 0-63 that
               tells the tracker what pattern to play at that position.
1080      4    The four letters "M.K." - This is something Mahoney & Kaktus
               inserted when they increased the number of samples from
               15 to 31. If it's not there, the module/song uses 15 samples
               or the text has been removed to make the module harder to
               rip. Startrekker puts "FLT4" or "FLT8" there instead.

Offset  Bytes  Description
------  -----  -----------
1084    1024   Data for pattern 00.
   .
   .
   .
xxxx  Number of patterns stored is equal to the highest patternnumber
      in the song position table (at offset 952-1079).

Each note is stored as 4 bytes, and all four notes at each position in
the pattern are stored after each other.

00 -  chan1  chan2  chan3  chan4
01 -  chan1  chan2  chan3  chan4
02 -  chan1  chan2  chan3  chan4
etc.

Info for each note:

 _____byte 1_____   byte2_    _____byte 3_____   byte4_
/                \ /      \  /                \ /      \
0000          0000-00000000  0000          0000-00000000

Upper four    12 bits for    Lower four    Effect command.
bits of sam-  note period.   bits of sam-
ple number.                  ple number.

Periodtable for Tuning 0, Normal
  C-1 to B-1 : 856,808,762,720,678,640,604,570,538,508,480,453
  C-2 to B-2 : 428,404,381,360,339,320,302,285,269,254,240,226
  C-3 to B-3 : 214,202,190,180,170,160,151,143,135,127,120,113

To determine what note to show, scan through the table until you find
the same period as the one stored in byte 1-2. Use the index to look
up in a notenames table.

This is the data stored in a normal song. A packed song starts with the
four letters "PACK", but i don't know how the song is packed: You can
get the source code for the cruncher/decruncher from us if you need it,
but I don't understand it; I've just ripped it from another tracker...

In a module, all the samples are stored right after the patterndata.
To determine where a sample starts and stops, you use the sampleinfo
structures in the beginning of the file (from offset 20). Take a look
at the mt_init routine in the playroutine, and you'll see just how it
is done.
Lars "ZAP" Hamre/Amiga Freelancers
***********************************************************************/

-- Mark J Cox ----- Bradford, UK ---

PS: A file with even *much* more info on MOD files, compiled by Lars
Hamre, is available from ftp.cwi.nl:/pub/audio/MOD-info. Enjoy!

11.10. The Sample Vision Format.

From: "tim.dorcas@enest.com"

First, Sample Vision is a program used by professional musicians to
send and receive samples via a MIDI interface to the PC. While on the
PC, you can edit several parameters including loop points, pitch, time
compression, normalize, sample rate, etc. The list of supported
samplers includes: AKAI {S700,X700,S900,S950,S612,S1000/1100},
Casio {FZ1,FZ10M,FZ20M}, Ensoniq {EPS,EPS16,ASR10,Mirage},
Emu {Emax,EmaxII}, Korg {DSS1,DSM1,T workstation}, Oberheim DPX-1,
Peavey DPM-3, Roland {S10,MKS100,S220,S50,S330,S550}, Sequential
Circuits Prophet 2000/2002, Sample Dump Standard devices, Yamaha TX16W.

The .smp format breaks down like this:

    Offset  Size    Description
    0000    18      'SOUND SAMPLE DATA '      ASCII FILE ID
    0018    04      '2.1 '                    ASCII FILE VERSION
    0022    60      USER COMMENTS             60 ASCII CHARACTERS
    0082    30      SAMPLE NAME               LEFT JUSTIFIED 30 ASCII CHARACTERS
    0112    04      SAMPLE SIZE               SAMPLE DATA COUNT IN WORDS
    0116    ??      SAMPLE DATA               1 WORD PER SAMPLE, LEAST SIGNIFICANT
                                              BYTE FIRST, LSW FIRST; SIGNED 16 BIT
                                              INTEGERS
    ??      02(DW)  RESERVED
    ??      04(DD)  LOOP 1 START              USE SAMPLE COUNT NOT BYTE COUNT
    ??      04(DD)  LOOP 1 END
    ??      01(DB)  LOOP 1 TYPE               0=LOOP OFF,1=FORWARD,2=FORWARD/BACKWARD
    ??      02(DW)  LOOP 1 COUNT              TIMES TO EXECUTE LOOP BEFORE NEXT LOOP
            THERE ARE SEVEN MORE IDENTICAL LOOP STRUCTURES FOR A TOTAL OF 8
    ??      10      MARKER 1 NAME             ASCII MARKER NAME
    ??      04(DD)  MARKER 1 POSITION         FFFF MEANS UNUSED
            THERE ARE SEVEN MORE IDENTICAL MARKER STRUCTURES FOR A TOTAL OF 8
    ??      01(DB)  MIDI UNITY PLAYBACK NOTE  MIDI NOTE TO PLAY THE SAMPLE AT ITS
                                              ORIGINAL PITCH
    ??      04(DD)  SAMPLE RATE IN HERTZ
    ??      04(DD)  SMPTE OFFSET IN SUBFRAMES
    ??      04(DD)  CYCLE SIZE                SAMPLE COUNT IN ONE CYCLE OF THE
                                              SAMPLED SOUND. -1 IF UNKNOWN

    (DD) 4 BYTES, LS BYTE FIRST, LS WORD FIRST
    (DW) 2 BYTES, LS BYTE FIRST
    (DB) 1 BYTE

That's about it. One thing I have noticed is that Sample Vision only
writes seven loop structures to file as opposed to the eight structures
it claims are written.

11.11. Tandy Deskmate .snd Format Notes.

From: Jeffrey L. Hayes

Tandy .snd files are created by Sound.pdm, a program that came with the
proprietary DeskMate environment. They are used by Music.pdm to create
music modules (.sng files). DeskMate Sound and Music require the Tandy
sound chip.

There is a program to convert RIFF WAVE and other 8-bit PCM formats to
.snd, Conv2snd, by Kenneth Udut. Conv2snd v.2.00 comes with Snd2wav,
which converts .snd to RIFF WAVE.

There are two types of DeskMate .snd files, sound files and instrument
files. Both contain 8-bit unsigned PCM samples. Sound files are
simpler. These are garden-variety sample files with a fixed-length
header giving the name of the sound, the recording frequency, and the
length of the sound. Sound files may be recorded at 5500Hz, 11kHz or
22kHz.

Instrument files contain samples as well as frequency and looping
information used by Music.pdm to represent an instrument. Instrument
files provide for attack, sustain, and decay with several samples
having different implied frequencies and being used by Music.pdm to
represent the instrument in different pitch ranges. Up to 16 different
notes (with 16 different samples) can be contained in one instrument
file. Instrument files are always recorded at 11kHz.

Both sound files and instrument files may be compressed in one of two
ways, "music" compression or "speech" compression, or they may be
uncompressed. I don't know the compression algorithms, but simple file
comparison reveals that "music" and "speech" compression are almost
identical.
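As a rough illustration of how the fixed header and note records given
in sections 11.11.1 and 11.11.2 map onto code, here is a minimal Python
sketch. It assumes little-endian byte order (the usual PC convention;
the format notes do not state it), and the function name and dictionary
keys are made up for this example, not taken from any Tandy software.

```python
import struct

# Layouts follow the offset tables in sections 11.11.1 and 11.11.2.
# "<" (little-endian, no padding) is an assumption, not a documented fact.
FIXED_HEADER = struct.Struct("<BBBB10sH")  # 16 bytes
NOTE_RECORD = struct.Struct("<BB2s6I")     # 28 bytes


def parse_deskmate_snd(data):
    """Return (header, notes) for an old-format DeskMate .snd file image."""
    ident, compression, note_count, instrument, raw_name, rate = (
        FIXED_HEADER.unpack_from(data, 0))
    if ident != 0x1A:
        raise ValueError("ID byte is not 1Ah; not a DeskMate .snd file")
    header = {
        "compression": compression,        # raw code from byte 1
        "note_count": note_count,          # 1 for a sound file
        "instrument": instrument,
        "is_sound_file": instrument == 0,  # byte 3 is 0 only for sound files
        "name": raw_name.split(b"\0", 1)[0].decode("ascii", "replace"),
        "rate": rate,                      # samples per second
    }
    notes = []
    for i in range(note_count):
        (pitch, pitch_flag, note_range, sample_offset, packed_length,
         _unknown, sample_count, sustain_start, sustain_end) = (
            NOTE_RECORD.unpack_from(data, FIXED_HEADER.size + i * NOTE_RECORD.size))
        notes.append({
            "pitch": pitch,                  # 0FFh if not set
            "pitch_flag": pitch_flag,
            "range": (note_range[0], note_range[1]),
            "sample_offset": sample_offset,  # zero-relative file offset
            "packed_length": packed_length,  # 0 if uncompressed
            "sample_count": sample_count,
            "sustain_start": sustain_start,
            "sustain_end": sustain_end,
        })
    return header, notes
```

For an uncompressed file, the samples of a note are then the
sample_count bytes starting at sample_offset, since each sample is one
unsigned byte.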
The DeskMate .snd file header consists of 16 bytes of fixed header
information followed by one or more 28-byte note records. The sample
information, which may be compressed, follows the header.

11.11.1. DeskMate .snd File Format - Fixed Header.

    offset  size      what
    ------  ----      ----
    0       byte      1Ah (.snd ID byte)
    1       byte      Compression code: 0 = no compression;
                      1 = music compression; 2 = sound compression.
    2       byte      Number of notes in the instrument file. 1 if sound
                      file.
    3       byte      Instrument number. 0 if sound file; 0FFh if
                      instrument file with no number set. Valid instrument
                      numbers in an instrument file are 1 to 32. Use this
                      field to distinguish a sound file from an instrument
                      file.
    4       10 bytes  Sound or instrument name. Filled on the right with
                      nulls if less than 10 characters.
    0Eh     word      Sampling rate in samples per second. Note that
                      although a sampling rate other than 5500, 11000 and
                      22000 can be entered here, Sound.pdm will not
                      actually play at other rates.
    10h     variable  Note records begin, 28 bytes each. Number of records
                      given in byte 2 above.

11.11.2. DeskMate .snd File Format - Note Record.

    0       byte      Pitch of the note: 1 = A1 in American Standard
                      Pitch; 2 = A#1; etc. A1 is lowest note allowed;
                      highest note allowed is B6 (3Fh). Sound files have
                      0FFh here; so do instrument files with no note set.
                      Note that Sound.pdm does not designate notes in the
                      standard manner to the user. Although A1 and B6 in
                      Sound.pdm are the same as A1 and B6 in standard
                      pitch, Sound.pdm starts octaves at A rather than at
                      C (as is standard). Thus, middle C, C4 in standard
                      pitch, is C3 in Sound.pdm.
    1       byte      Sound files, and instrument files with no pitch set,
                      have 0 here. If the pitch is set, this byte is 0FFh.
    2       2 bytes   Range of the note, first byte is lower limit, second
                      is higher limit. Byte encoding as for offset 0
                      (i.e., 01h to 3Fh). Sound files have FF FF here; so
                      do instrument files with no range set.
    4       dword     Offset in the file where samples for this note begin
                      (zero-relative), after compression if that was done.
    8       dword     If compressed, the length of the compressed data in
                      the file for this note. Uncompressed files have 0
                      here.
    0Ch     4 bytes   Unknown. Set to zero.
    10h     dword     Number of samples in the note, after decompression
                      if necessary.
    14h     dword     Number of sample at start of sustain region for the
                      note, relative to the first (zeroth) sample of the
                      note. For sound files, or if sustain is not set,
                      this field is 0.
    18h     dword     Number of sample at end of sustain region for the
                      note, relative to the first (zeroth) sample of the
                      note. For sound files, or if sustain is not set,
                      this field is 0.

11.11.3. New Tandy .Snd File Format.

This is the new .snd file format used on the 2500-series. From
information provided by John Ball (john.ball@two-t.com).

Like the old format, the new format header consists of a fixed part
followed by one or more sample descriptors. The fixed part is 114
bytes; the sample descriptors are 46 bytes each. Samples are still
8-bit unsigned PCM.

Fixed header:

    offset  size      what
    0       10 bytes  ASCIIZ name of sound.
    0Ah     34 bytes  unknown
    2Ch     2 bytes   New .snd ID: 1Ah 80h.
    2Eh     word      Number of samples in file.
    30h     word      Sound (instrument) number.
    32h     16 bytes  unknown
    42h     word      Compression code (0 = no compression, 1 = music
                      compression, 2 = speech compression).
    44h     20 bytes  unknown
    58h     word      Sampling rate in Hz.
    5Ah     24 bytes  unknown
    72h     variable  Sample descriptors begin.

Sample descriptors (number given by word at 2Eh above):

    offset  size      what
    0       dword     Link to next sample descriptor (offset in file of
                      next sample descriptor record). 0 if last.
    4       2 bytes   unknown
    6       byte      Pitch of note (01h-3Fh), 01 = A1 in American
                      Standard Pitch; 0FFh if not set.
    7       byte      unknown (compare old .Snd format; value is 00 or FF,
                      but seemingly unrelated to pitch setting)
    8       2 bytes   Range of note. First byte is lower limit, second is
                      higher limit. Values as for byte at offset 6 above;
                      FF FFh if not set.
    0Ah     dword     Offset in file of start of sound data for this
                      sample.
    0Eh     dword     Length of sample sound data in bytes.
    12h     dword     Uncompressed length of sound data (number of
                      samples).
    16h     24 bytes  unknown

------------------------------------------------------------------------

11.12. Miscellaneous Formats.

From: bil@ccrma.Stanford.EDU (Bill Schottstaedt)

I thought you might find some of this information amusing -- a few
header formats I didn't find in your great audio file formats
documentation. Some taken from the AFsp sources, or sox, or local
ancient documentation. I also have short descriptions of BICSF,
NeXT/Sun, AIFF, RIFF, SMP, VOC, and so on, plus full descriptions of
the 2 Sound Designer formats, if you're interested.

/* ------------------------------------ NIST ---------------------------------
 *
 *   0: "NIST_1A"
 *   8: data_location as ASCII representation of integer
 *        (apparently always " 1024")
 *  16: start of complicated header -- full details available upon request
 *
 *  here's an example:
 *
 *  NIST_1A
 *  1024
 *  database_id -s5 TIMIT
 *  database_version -s3 1.0
 *  utterance_id -s8 aks0_sa1
 *  channel_count -i 1
 *  sample_count -i 63488
 *  sample_rate -i 16000
 *  sample_min -i -6967
 *  sample_max -i 7710
 *  sample_n_bytes -i 2
 *  sample_byte_format -s2 01
 *  sample_sig_bits -i 16
 *  end_head
 */

/* ------------------------------------ SNDT ---------------------------------
 *
 * this taken from sndrtool.c (sox-10):
 *   0:     "SOUND"
 *   6:     0x1a
 *   8-11:  0
 *   12-15: nsamples
 *   16-19: 0
 *   20-23: nsamples
 *   24-25: srate
 *   26-27: 0
 *   28-29: 10
 *   30-31: 4
 *   32-> : "- File created by Sound Exchange"
 *   .->95: 0
 */

/* ------------------------------------ ESPS ---------------------------------
 *
 *   16:  0x00006a1a or 0x1a6a0000
 *   136: if not 0, chans + format = 32-bit float
 *   144: if not 0, chans + format = 16-bit linear
 *
 * from AFgetInfoES.c:
 *
 *       Bytes     Type    Contents
 *      8 -> 11    --     Header size (bytes)
 *     12 -> 15    int    Sampled data record size
 *     16 -> 19    int    File identifier
 *     40 -> 65    char   File creation date
 *    124 -> 127   int    Number of samples (may indicate zero)
 *    132 -> 135   int    Number of doubles in a data record
 *    136 -> 139   int    Number of floats in a data record
 *    140 -> 143   int    Number of longs in a data record
 *    144 -> 147   int    Number of shorts in a data record
 *    148 -> 151   int    Number of chars in a data record
 *    160 -> 167   char   User name
 *    333 -> H-1   --     Generic header items, including "record_freq"
 *                        {followed by a "double8"}
 *      H -> ...   --     Audio data
 */

/* ------------------------------------ INRS ---------------------------------
 *
 * from AFgetInfoIN.c:
 *
 * INRS-Telecommunications audio file:
 *       Bytes     Type    Contents
 *      0 ->  3    float  Sampling Frequency (VAX float format)
 *      6 -> 25    char   Creation time (e.g. Jun 12 16:52:50 1990)
 *     26 -> 29    int    Number of speech samples in the file
 * The data in an INRS-Telecommunications audio file is in 16-bit integer
 * format.
 *
 */

/* old Mus10, SAM formats, just for completeness
 *
 * These were used for sound data on the PDP-10s at SAIL and CCRMA in the
 * 70's and 80's.
 * The word length was 36-bits.
 *
 * "New" format as used by nearly all CCRMA software pre-1990:
 *
 *   WD 0 - '525252525252
 *   WD 1 - Clock rate in Hz (PDP-10 36-bit floating point)
 *   WD 2 - #samples per word,,pack-code
 *          (has # samples per word in LH, pack-code in RH)
 *            0 for 12-bit fixed point
 *            1 for 18-bit fixed point
 *            2 for 9-bit floating point incremental
 *            3 for 36-bit floating point
 *            4 for 16-bit sambox fixed point, right justified
 *            5 for 20-bit sambox fixed point
 *            6 for 20-bit right-adjusted fixed point (sambox SAT format)
 *            7 for 16-bit fixed point, left justified
 *            N>9 for N bit bytes in ILDB format
 *   WD 3 - # channels
 *            1 for MONO
 *            2 for STEREO
 *            4 for QUAD
 *   WD 4 - Maximum amplitude (if known)
 *            is a floating point number
 *            is zero if not known
 *            is maximum magnitude (abs value) of signal
 *   WD 5 - number of Sambox ticks per pass
 *          (inverse of Sambox clock rate, sort of)
 *   WD 6 - Total #samples in file.
 *          If 0 then #wds_in_file*#samps_per_wd assumed.
 *   WD 7 - Block size (if any). 0 means sound is not blocked.
 *   WDs '10-'77 Reserved for EDSND usage
 *   WDs '100-'177 Text description of file (in ASCIZ format)
 *
 *
 * "Old" format
 *
 *   WD 0 - '525252525252
 *   WD 1 - Clock rate
 *          has code in LH, actual INTEGER rate in RH
 *          code=0 for 6.4Kc (or anything else)
 *              =1 for 12.8Kc, =2 for 25.6Kc, =3 for 51.2Kc
 *              =5 for 102.4Kc, =6 for 204.8Kc
 *   WD 2 - pack
 *            0 for 12 bit
 *            1 for 16 bit (18 bit)
 *            2 for 9 bit floating point incremental
 *            3 for 36-bit floating point
 *            N>9 for N bit bytes in ILDB format
 *          has # samples per word in LH.
 *   WD 3 - # channels
 *            1 for MONO
 *            2 for STEREO
 *            4 for QUAD
 *   WD 4 - Maximum amplitude (if known)
 *            is a floating point number
 *            is zero if not known
 *            is maximum magnitude (abs value) of signal
 *   WDs 5-77 Reserved for future expansion
 *   WDs 100-177 Text description of file (in ASCIZ format)
 */
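
To make the NIST layout from earlier in this section a little more
concrete, here is a minimal Python sketch that reads the text header.
It handles only the two field types that appear in the example, -i
(integer) and -sN (string of N characters), and keeps anything else as
raw text. The function name is hypothetical, and the sketch assumes the
data_location field at byte 8 is padded with whitespace, which it
strips before converting.

```python
def parse_nist_header(data):
    """Parse a NIST_1A text header from the raw bytes of a file.

    Returns a dict of header fields plus "data_location", the byte
    offset at which the audio data starts.
    """
    if data[:7] != b"NIST_1A":
        raise ValueError("missing NIST_1A magic string")

    # Bytes 8..15: data offset as an ASCII integer (1024 in the example).
    data_location = int(data[8:16].decode("ascii").strip())
    fields = {"data_location": data_location}

    # Byte 16 onward: one "name -type value" entry per line.
    text = data[16:data_location].decode("ascii", "replace")
    for line in text.splitlines():
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, kind, value = parts
        if kind == "-i":
            fields[name] = int(value)
        elif kind.startswith("-s"):
            # -sN: string of N characters; fall back to the whole value
            # if no length is given.
            length = int(kind[2:]) if kind[2:].isdigit() else len(value)
            fields[name] = value[:length]
        else:
            fields[name] = value  # other types left as text
    return fields
```

With the returned dictionary, the sample data can be located by seeking
to fields["data_location"]; how to interpret it then depends on
entries such as sample_n_bytes and sample_byte_format.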