ADX (file format): Difference between revisions

AndreR908 (talk | contribs)
Minor correction
There aren't any references on the page.
Tag: section blanking
 
(144 intermediate revisions by 86 users not shown)
Line 1:
{{Short description|File format family developed by CRI Middleware}}
{{More citations needed|date=May 2025}}
{{Infobox software
|name=CRI ADX
|logo=ADX logo.png
|developer=[[CRI Middleware]]
|released=1996
|platform=[[Cross-platform]]
|genre=[[Codec]] / [[File format]]
|license=[[Proprietary software|Proprietary]]
|website={{URL|http://www.cri-mw.com/}}
}}
'''CRI ADX''' is a [[Proprietary software|proprietary]] audio container and compression format developed by [[CRI Middleware]] specifically for use in [[video games]]; it is derived from [[ADPCM]] but with [[lossy]] compression. Its most notable feature is a looping function that has proved useful for background sounds in various games that have adopted the format, including many games for the [[Sega Dreamcast]] as well as some [[PlayStation 2]], [[GameCube]] and [[Wii]] games. One of the first games to use ADX was ''[[Burning Rangers]]'', on the [[Sega Saturn]]. Notably, the [[Sonic the Hedgehog|''Sonic the Hedgehog'' series]] since the Dreamcast generation and the majority of [[Sega]] games for home video consoles and PCs since the Dreamcast continue to use this format for sound and voice recordings.
The ADX toolkit also includes a sibling format, AHX, which uses a variant of [[MPEG-2]] audio intended specifically for voice recordings and a packaging archive, AFS, for bundling multiple CRI ADX and AHX tracks into a single container file.
 
Version 2 of the format (ADX2) uses the HCA and HCA-MX extensions, which are usually bundled into container files with the file extensions ACB and AWB.
ADX is a lossy [[proprietary]] audio storage and compression format developed by [[CRI Middleware]] specifically for use in [[video games]]. The format is similar in principle to [[ADPCM]] but offers smaller storage sizes. The sound quality is quite impressive given the extremely small sample size used by the format. The format also provides a looping feature that has proved useful for background music in various games that have adopted the format, such as the [[Dreamcast]] and later generation [[Sonic the Hedgehog]] games from [[SEGA]].
The AWB extension is not to be confused with the [[Adaptive Multi-Rate Wideband|audio format with the same extension]]; an AWB file mostly contains the binary data for the HCA files.
 
==General File Format overview==
CRI ADX is a lossy audio format, but unlike formats such as [[MP3]], it does not apply a [[psychoacoustics|psychoacoustic model]] to the sound to reduce its complexity. The ADPCM model instead stores samples by recording the ''error'' relative to a prediction function, which means more of the original signal survives the encoding process. Accuracy of the representation is traded for size by using small sample sizes, usually 4 bits. The human auditory system's tolerance for the noise this causes makes the loss of accuracy barely noticeable.
The ADX format's specification is not freely available; however, the internal structure of the most significant elements that make up the format has been described in various places on the internet. The information given here may be incomplete but is sufficient to build a working [[codec]] or [[transcoder]].
 
Like other encoding formats, CRI ADX supports sample rates of up to 96000 Hz; however, the output sample depth is locked at 16 bits, generally due to the lack of precision caused by the small sample sizes. The file format itself can represent up to 255 channels, but in practice there seems to be an implicit limit of stereo (2 channel) audio. The only particularly distinctive feature that sets CRI ADX apart from other ADPCM formats is the integrated looping functionality, which lets an audio player optionally skip backwards after reaching a single specified point in the track to create a coherent loop. Hypothetically, this functionality could be used to skip forwards as well, but that would be redundant since the audio could simply be clipped with an editing program instead.
The format is inherently [[big-endian]] even when used on [[little-endian]] architectures such as the [[Xbox|original Xbox]] or [[x86]] computers. The basic structure is outlined below:
 
For playback aside from CRI Middleware's in-house software, there are a few plugins for WinAmp and also WAV conversion tools. [[FFmpeg]] also has CRI ADX support implemented, but its decoder is hard-coded, so it can only properly decode 44100&nbsp;Hz ADX files.
 
==Technical description==
The CRI ADX specification is not freely available, however the most important elements of the structure have been reverse engineered and documented in various places on the web. As a side note, the AFS archive files that CRI ADXs are sometimes packed in are a simple variant of a [[tar (file format)|tarball]] which uses numerical indices to identify the contents rather than names.
 
The ADX disk format is defined in [[big-endian]]. The identified sections of the main header are outlined below:
{| class="wikitable"
|+ File header layout outline
!
!0
Line 27 ⟶ 41:
!8
!9
!A
!B
!C
!D
!E
!F
|-
!0x0
|0x80
|0x00
|colspan="2"|Copyright Offset
|Encoding Type
|Block Size
|Sample Bitdepth
|Channel Count
|colspan="4"|Sample Rate
|colspan="4"|Total Samples
|-
!0x10
|colspan="2"|Highpass Frequency
|Version
|Flags
|colspan="2"|Loop Alignment Samples (v3)
|colspan="2"|Loop Enabled (v3)
|colspan="4"|Loop Enabled (v3)
|colspan="4"|Loop Begin Sample Index (v3)
|-
!0x20
|colspan="4"|Loop Begin Byte Index (v3)
|colspan="4"|Loop Enabled (v4)<br />Loop End Sample Index (v3)
|colspan="4"|Loop Begin Sample Index (v4)<br />Loop End Byte Index (v3)
|colspan="4"|Loop Begin Byte Index (v4)
|-
!0x30
|colspan="4"|Loop End Sample Index (v4)
|colspan="4"|Loop End Byte Index (v4)
|colspan="8"|Zero or more bytes empty space
|-
!???
|colspan="16"|[CopyrightOffset - 2] ASCII (unterminated) string: "(c)CRI"
|-
!...
|colspan="16"|[CopyrightOffset + 4] Audio data starts here
|}
Fields labelled "Unknown" contain either unknown data or are apparently just reserved (i.e. filled with null bytes). Fields labelled with 'v3' or 'v4' but not both are considered "Unknown" in the version they are not marked with. This header may be as short as 20 bytes ({{Mono|0x14}}), as determined by the copyright offset, which implicitly removes support for a loop since those fields are not present.
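The fixed part of the header described above can be read with a few big-endian helper functions. The following is only a minimal sketch: the structure, its field names and the function <code>adx_parse_basic_header</code> are illustrative assumptions rather than part of any official CRI interface, and only the first 20 bytes (the fields common to every version) are read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure; the field names follow the table labels,
   they are not taken from any official CRI header file. */
typedef struct {
    uint16_t copyright_offset;   /* audio data starts at copyright_offset + 4 */
    uint8_t  encoding_type;
    uint8_t  block_size;
    uint8_t  sample_bitdepth;
    uint8_t  channel_count;
    uint32_t sample_rate;
    uint32_t total_samples;
    uint16_t highpass_frequency;
    uint8_t  version;
    uint8_t  flags;
} adx_basic_header;

/* Read a big-endian 16-bit value regardless of host endianness. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit value regardless of host endianness. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, or -1 if the buffer is shorter than the minimum
   20-byte header or does not start with the bytes 0x80 0x00. */
int adx_parse_basic_header(const uint8_t *data, size_t size, adx_basic_header *out)
{
    assert(data != NULL && out != NULL);
    if (size < 0x14 || data[0] != 0x80 || data[1] != 0x00)
        return -1;
    out->copyright_offset   = read_be16(data + 0x02);
    out->encoding_type      = data[0x04];
    out->block_size         = data[0x05];
    out->sample_bitdepth    = data[0x06];
    out->channel_count      = data[0x07];
    out->sample_rate        = read_be32(data + 0x08);
    out->total_samples      = read_be32(data + 0x0C);
    out->highpass_frequency = read_be16(data + 0x10);
    out->version            = data[0x12];
    out->flags              = data[0x13];
    return 0;
}
```

The loop fields are deliberately left out, because their offsets depend on the version and they are absent altogether when the header is only 20 bytes long.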
 
The "Encoding Type" field should contain one of:
* '''0x02''' for CRI ADX with pre-set prediction coefficients
* '''0x03''' for Standard CRI ADX
* '''0x04''' for CRI ADX with an exponential scale
* '''0x10''' or '''0x11''' for AHX
The "Version" field should contain one of:
* '''0x03''' for CRI ADX 'version 3'
* '''0x04''' for CRI ADX 'version 4'
* '''0x05''' for a variant of CRI ADX 4 without looping support
When decoding AHX audio, the version field does not appear to have any meaning and can be safely ignored.
 
Files with encoding type '2' use 4 possible sets of prediction coefficients as listed below:
{| class="wikitable"
|+ Coefficients
!
! Coefficient 0
! Coefficient 1
|-
| Set 0
| {{Mono|0x0000}}
| {{Mono|0x0000}}
|-
| Set 1
| {{Mono|0x0F00}}
| {{Mono|0x0000}}
|-
| Set 2
| {{Mono|0x1CC0}}
| {{Mono|0xF300}}
|-
| Set 3
| {{Mono|0x1880}}
| {{Mono|0xF240}}
|}
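The values {{Mono|0xF300}} and {{Mono|0xF240}} in the table only make sense as negative numbers when read as signed 16-bit integers. The sketch below stores the four sets that way and converts them to floating point. It assumes the coefficients are fixed-point values with 12 fractional bits (so that 4096 represents 1.0); that scaling is an assumption made for illustration and is not stated in the table, and the names used are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Raw table values reinterpreted as signed two's complement 16-bit integers:
   0xF300 becomes -0x0D00 and 0xF240 becomes -0x0DC0. */
static const int16_t adx_preset_coefficients[4][2] = {
    { 0x0000,  0x0000 },   /* Set 0 */
    { 0x0F00,  0x0000 },   /* Set 1 */
    { 0x1CC0, -0x0D00 },   /* Set 2 */
    { 0x1880, -0x0DC0 },   /* Set 3 */
};

/* Returns coefficient 'which' (0 or 1) of the given set as a real number.
   ASSUMPTION: 12 fractional bits, i.e. the raw value is divided by 4096. */
double adx_preset_coefficient(unsigned set, unsigned which)
{
    assert(set < 4 && which < 2);
    return adx_preset_coefficients[set][which] / 4096.0;
}
```

Under that assumption, set 2 corresponds to the coefficient pair (1.796875, −0.8125).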
The version mark field should be equal to 01F40400<sub>16</sub> ([[hexadecimal]]) for 'version' 4, or 01F40300<sub>16</sub> for 'version' 3. Fields labelled "Unknown" contain unknown data or otherwise appear to be reserved (i.e. filled with null bytes). Fields labelled with v3 or v4 but not both are "Unknown" in the version they are not marked with.
 
===Sample format===
CRI ADX encoded audio data is broken into a series of 'blocks', each containing data for only one channel. The blocks are then laid out in 'frames', which consist of one block from every channel in ascending order. For example, in a stereo (2 channel) stream this would consist of Frame 1: left channel block, right channel block; Frame 2: left, right; etc. Blocks are almost always 18 bytes in size and contain 4-bit samples, though other sizes are technically possible. An example of such a block looks like this:
{| class="wikitable"
|+ Audio block layout table
!0
!1
Line 95 ⟶ 147:
!17
|-
|colspan="2"|Predictor/Scale
|colspan="16"|32 4-bit samples
|}
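The frame layout can be turned into a simple address calculation. The sketch below is illustrative only: the function and parameter names are assumptions, channels are counted from 0, and <code>audio_start</code> stands for the byte offset at which the audio data begins.

```c
#include <assert.h>
#include <stddef.h>

/* Number of samples stored in one block: everything after the 2-byte
   predictor/scale field, divided by the size of one sample in bits. */
size_t adx_samples_per_block(size_t block_size, size_t sample_bitdepth)
{
    assert(block_size > 2 && sample_bitdepth > 0);
    return (block_size - 2) * 8 / sample_bitdepth;
}

/* Byte offset of the block that holds sample 'sample_index' of 'channel'.
   A frame holds one block per channel, in ascending channel order. */
size_t adx_block_offset(size_t audio_start, size_t block_size,
                        size_t sample_bitdepth, size_t channel_count,
                        size_t sample_index, size_t channel)
{
    assert(channel < channel_count);
    size_t frame = sample_index / adx_samples_per_block(block_size, sample_bitdepth);
    return audio_start + (frame * channel_count + channel) * block_size;
}
```

With the usual 18-byte blocks and 4-bit samples, each block holds 32 samples, so in a stereo stream sample 32 of the left channel lives in the third block of the audio data.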
Be aware that the two-byte Predictor/Scale field is read as a 16-bit [[Signedness|unsigned]] [[big-endian]] integer.
 
The predictor index is a 3-bit integer that specifies which prediction coefficient set should be used to decode that block, while the scale is a 13-bit [[Signedness|unsigned]] integer ([[big-endian]] like the header) which is essentially the amplification of all the samples in that block. Each sample in the block must be decoded in bit-stream order, that is, from the most significant bit downwards. For example, when the sample size is 4 bits:
=== Decoding Samples ===
As noted above, each sample consists of 4 bits: the high 4 bits of a [[byte]] (more specifically, an [[octet (computing)|octet]]) are the first sample and the low 4 bits are the second.
{| class="wikitable"
|+ Sample Byte layout table
!7
!6
Line 116 ⟶ 167:
|}
 
The samples themselves are not stored in reverse order. Each sample is signed, so for this example the value can range between −8 and +7 (which will be multiplied by the scale during decoding). Although any bit depth between 1 and 255 is made possible by the header, it is unlikely that one-bit samples would ever occur as they can only represent the values {0, 1}, {−1, 0} or {−1, 1}, none of which are particularly useful for encoding music.
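Because few processors handle 4-bit numbers natively, each sample has to be sign-extended into a wider integer before it is scaled. The following is a minimal sketch of one way to do this for sample sizes below 32 bits, together with a helper that splits a byte into its two 4-bit samples, high bits first. The function names are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 'bits' bits of 'value' as a two's complement number.
   Flipping the sign bit and then subtracting its weight maps the unsigned
   range 0..2^bits-1 onto the signed range -2^(bits-1)..2^(bits-1)-1. */
int32_t adx_sign_extend(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits < 32);
    uint32_t mask = (UINT32_C(1) << bits) - 1;
    uint32_t sign = UINT32_C(1) << (bits - 1);
    value &= mask;
    return (int32_t)(value ^ sign) - (int32_t)sign;
}

/* Split one byte into two signed 4-bit samples: the high 4 bits are the
   first sample and the low 4 bits are the second. */
void adx_unpack_byte(uint8_t byte, int32_t out[2])
{
    out[0] = adx_sign_extend((uint32_t)(byte >> 4), 4);
    out[1] = adx_sign_extend((uint32_t)(byte & 0x0F), 4);
}
```

For example, the byte {{Mono|0x8F}} unpacks to the samples −8 and −1.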
The sample decoding method is reasonably straightforward (demonstrated in [[C99]]):
<syntaxhighlight lang="c">
int_fast32_t sample;
uint_least8_t sample_4bit;

/* ... Get 4-bit sample ... */
/* current_channel is counted from 1 here; blocks are 18 bytes and hold 32 samples each */
data_index = audio_data_start + (sample_index / 32) * num_channels * 18 + (current_channel - 1) * 18 + 2 + sample_index % 32 / 2;
sample_4bit = raw_data[data_index];
if (sample_index % 2) /* If the sample index [starting at 0] is odd then we are decoding a second sample */
    sample_4bit &= 0x0F;
else /* Otherwise it is a primary sample */
    sample_4bit >>= 4;

/* ... Decode 4-bit sample ... */
sample = sample_4bit;
if (sample_4bit & 8) sample -= 16; /* Check the 4th bit (the sign), if negative then adjust for larger variable */

sample *= block_scale * volume; /* Scale up the sample and amplify */
sample += previous_sample * 0x7298; /* Incorporate previous sample data */
sample -= second_previous_sample * 0x3350; /* Incorporate previous previous sample data */
sample >>= 14; /* Downshift the sample by 14 bits */
if (sample > 32767) /* Round-off the sample within the valid range for a 16-bit signed sample */
    sample = 32767;
else if (sample < -32768)
    sample = -32768;

second_previous_sample = previous_sample; /* Update the previous samples for the current channel */
previous_sample = sample;
</syntaxhighlight>
Acquiring the desired byte looks more complex than it really is. The long calculation can be replaced by a byte counter that is incremented for every 'audio frame' processed, but the entire calculation is shown for clarity. It is assumed that the entire file has been either [[mmap]]-ed or read into memory, although progressive reading (i.e. [[Streaming media|streaming]]) of the file is entirely possible as well. The calculation starts at the beginning of the audio section of the ADX file and moves to the 'audio frame' currently being processed: it finds the number of frames already processed and skips one 18-byte block per channel for each of them. It then moves to the block that belongs to the channel currently being processed (e.g. when reading the right channel, the left channel block must be skipped), skips the 2-byte scale value at the start of that block, and finally proceeds to the desired byte within it. The division by 2 in the last term is needed because each 8-bit (unsigned) byte of the array holds 2 samples.

===CRI ADX decoding===
An encoder for ADX can also be built by simply flipping the code to run in reverse. The code samples are written using [[C99]].

Before a 'standard' CRI ADX can be either encoded or decoded, the set of prediction coefficients must be calculated. This is generally best done in the initialisation stage:
<syntaxhighlight lang="c">
#include <math.h>

#ifndef M_PI
#define M_PI acos(-1.0)
#endif

double coefficient[2];
double a, b, c;
a = sqrt(2.0) - cos(2.0 * M_PI * ((double)adx_header->highpass_frequency / adx_header->sample_rate));
b = sqrt(2.0) - 1.0;
c = (a - sqrt((a + b) * (a - b))) / b; // (a+b)*(a-b) = a*a-b*b, however the simpler formula loses accuracy in floating point

coefficient[0] = c * 2.0;
coefficient[1] = -(c * c);
</syntaxhighlight>
This code calculates prediction coefficients for predicting the current sample from the 2 previous samples. Once the decoding coefficients are known, the stream can be decoded:
<syntaxhighlight lang="c">
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h> // ntohs() (POSIX)

// ADX_header, coefficient[], bitstream_seek(), bitstream_read() and sign_extend() are assumed to be provided elsewhere
static int32_t* past_samples; // Previously decoded samples from each channel, zeroed at start (size = 2*channel_count)
static uint_fast32_t sample_index = 0; // sample_index is the index of sample set that needs to be decoded next
static ADX_header* adx_header;

// buffer is where the decoded samples will be put
// samples_needed states how many sample 'sets' (one sample from every channel) need to be decoded to fill the buffer
// looping_enabled is a boolean flag to control use of the built-in loop
// Returns the number of sample 'sets' in the buffer that could not be filled (EOS)
unsigned decode_adx_standard( int16_t* buffer, unsigned samples_needed, bool looping_enabled )
{
    unsigned const samples_per_block = (adx_header->block_size - 2) * 8 / adx_header->sample_bitdepth;
    int16_t scale[ adx_header->channel_count ];

    if (looping_enabled && !adx_header->loop_enabled)
        looping_enabled = false;

    // Loop until the requested number of samples are decoded, or the end of file is reached
    while (samples_needed > 0 && sample_index < adx_header->total_samples)
    {
        // Calculate the number of samples that are left to be decoded in the current block
        unsigned sample_offset = sample_index % samples_per_block;
        unsigned samples_can_get = samples_per_block - sample_offset;

        // Clamp the samples we can get during this run if they won't fit in the buffer
        if (samples_can_get > samples_needed)
            samples_can_get = samples_needed;

        // Clamp the number of samples to be acquired if the stream isn't long enough or the loop trigger is nearby
        if (looping_enabled && sample_index + samples_can_get > adx_header->loop_end_index)
            samples_can_get = adx_header->loop_end_index - sample_index;
        else if (sample_index + samples_can_get > adx_header->total_samples)
            samples_can_get = adx_header->total_samples - sample_index;

        // Calculate the bit address of the start of the frame that sample_index resides in and record that location
        unsigned long started_at = (adx_header->copyright_offset + 4 +
                        sample_index / samples_per_block * adx_header->block_size * adx_header->channel_count) * 8;

        // Read the scale values from the start of each block in this frame
        for (unsigned i = 0 ; i < adx_header->channel_count ; ++i)
        {
            bitstream_seek( started_at + adx_header->block_size * i * 8 );
            scale[i] = ntohs( bitstream_read( 16 ) );
        }

        // Pre-calculate the stop value for sample_offset
        unsigned sample_endoffset = sample_offset + samples_can_get;

        // Save the bitstream address of the first sample immediately after the scale in the first block of the frame
        started_at += 16;
        while ( sample_offset < sample_endoffset )
        {
            for (unsigned i = 0 ; i < adx_header->channel_count ; ++i)
            {
                // Predict the next sample
                double sample_prediction = coefficient[0] * past_samples[i*2 + 0] + coefficient[1] * past_samples[i*2 + 1];

                // Seek to the sample offset, read and sign extend it to a 32-bit integer
                // Implementing sign extension is left as an exercise for the reader
                // The sign extension will also need to include an endian adjustment if there are more than 8 bits
                bitstream_seek( started_at + adx_header->sample_bitdepth * sample_offset +
                                adx_header->block_size * 8 * i );
                int_fast32_t sample_error = bitstream_read( adx_header->sample_bitdepth );
                sample_error = sign_extend( sample_error, adx_header->sample_bitdepth );

                // Scale the error correction value
                sample_error *= scale[i];

                // Calculate the sample by combining the prediction with the error correction
                int_fast32_t sample = sample_error + (int_fast32_t)sample_prediction;

                // Update the past samples with the newer sample
                past_samples[i*2 + 1] = past_samples[i*2 + 0];
                past_samples[i*2 + 0] = sample;

                // Clamp the decoded sample to the valid range for a 16-bit integer
                if (sample > 32767)
                    sample = 32767;
                else if (sample < -32768)
                    sample = -32768;

                // Save the sample to the buffer then advance one place
                *buffer++ = sample;
            }
            ++sample_offset;  // We've decoded one sample from every block, advance block offset by 1
            ++sample_index;   // This also means we're one sample further into the stream
            --samples_needed; // And so there is one less set of samples that need to be decoded
        }

        // Check if we hit the loop end marker, if we did we need to jump to the loop start
        if (looping_enabled && sample_index == adx_header->loop_end_index)
            sample_index = adx_header->loop_start_index;
    }

    return samples_needed;
}
</syntaxhighlight>
Most of the above should be straightforward [[C (programming language)|C]] code. The '<code>ADX_header</code>' pointer refers to the data extracted from the header as outlined earlier; it is assumed to have already been converted to the host endianness. This implementation is not intended to be optimal, and external concerns such as the specific method for sign extension and the method of acquiring a bitstream from a file or network source have been ignored. Once it completes, the output ''buffer'' holds the requested sets of samples (pairs, for example, if the stream is stereo), less any that could not be filled. The decoded samples will be in host-endian standard interleaved [[Pulse-code modulation|PCM]] format, i.e. left 16-bit, right 16-bit, left, right, etc. Finally, if looping is not enabled, or not supported, the function returns the number of sample sets in the buffer that were not used. The caller can test whether this value is non-zero to detect the end of the stream and drop the unused spaces or write silence into them if necessary.
 
====Encryption====
The highest bit of the 4-bit sample is the [[sign bit]] (negative when set, [[two's complement]]), so the number has to be converted into an 8-bit or larger signed integer for proper arithmetic handling, as few processors can handle 4-bit numbers natively. The next stage is to multiply the sample by the scale, which gives it a rational amplitude, and then amplify it by a volume; values between 0 and 2000<sub>16</sub> tend to work best for the volume, but it can go higher. The next 2 steps include information from the previous two samples to bring the sample in line with the others. Be aware that the previous samples carry over across block boundaries, but separate sample sets must be kept for each channel (the values start at 0 on the first audio frame). Lastly, the sample is divided by 16384 using a downshift and then clamped to the 16-bit signed sample range (−32768 to 32767).
CRI ADX supports a simple encryption scheme which [[bitwise XOR|XOR]]s values from a [[Linear congruential generator|linear congruential]] [[pseudorandom number generator]] with the block scale values. This method is computationally inexpensive to decrypt (in keeping with CRI ADX's real-time decoding) yet renders the encrypted files unusable. The encryption is active when the "Flags" value in the header is '''0x08'''. As XOR is symmetric the same method is used to decrypt as to encrypt. The encryption key is a set of three 16-bit values: the multiplier, increment, and start values for the linear congruential generator (the modulus is 0x8000 to keep the values in the 15-bit range of valid block scales). Typically all ADX files from a single game will use the same key.
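The scheme can be sketched as follows. This is an illustrative outline only: the structure and function names are assumptions, and the exact ordering (whether a scale is XORed before or after the generator advances) as well as the handling of silent blocks are not specified here, so the code simply XORs the current generator value into each scale and then steps the generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key layout: the three 16-bit generator parameters. */
typedef struct {
    uint16_t start;       /* initial generator value */
    uint16_t multiplier;
    uint16_t increment;
} adx_key;

/* XOR one generator value into each block scale, in stream order.
   Because XOR is symmetric, the same call both encrypts and decrypts.
   ASSUMPTION: the scale is XORed first and the generator advanced after. */
void adx_xor_scales(uint16_t *scales, size_t count, adx_key key)
{
    assert(scales != NULL || count == 0);
    uint32_t state = key.start & 0x7FFFu;
    for (size_t i = 0; i < count; ++i) {
        scales[i] ^= (uint16_t)state;
        /* Linear congruential step with modulus 0x8000 (15-bit values). */
        state = (state * key.multiplier + key.increment) & 0x7FFFu;
    }
}
```

Calling the function twice with the same key restores the original scale values, which is why a single routine serves for both directions.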
 
The encryption method is vulnerable to [[known-plaintext attack]]s. If an unencrypted version of the same audio is known the random number stream can be easily retrieved and from it the key parameters can be determined, rendering every CRI ADX encrypted with that same key decryptable. The encryption method attempts to make this more difficult by not encrypting silent blocks (with all sample nybbles equal to 0), as their scale is known to be 0.
The decoded samples from each channel will still need to be interleaved together to form raw [[PCM]] audio data suitable for output into a sound card.
 
Even if the encrypted CRI ADX is the only sample available, it is possible to determine a key by assuming that the scale values of the decrypted CRI ADX must fall within a "low range". This method does not necessarily find the key used to encrypt the file, however. While it can always determine keys that produce an apparently correct output, errors may exist undetected. This is due to the increasingly random distribution of the lower bits of the scale values, which becomes impossible to separate from the randomness added by the encryption.
== Sources ==
*[http://www.cri-mw.co.jp/index_e.htm CRI Middleware] (Under Products &gt; ADX)
*[http://hcs64.com/in_cube.html in_cube WinAMP codec with source (supports ADX)]
*[http://www.geocities.co.jp/Playtown/2004/dcdev/index.html Dreamcast utilities including ADX converters with source]
*[http://wiki.multimedia.cx/index.php?title=CRI_ADX_ADPCM CRI ADX Description from multimedia.cx Wiki]
 
===AHX decoding===
AHX is an implementation of [[MPEG-1 Audio Layer II|MPEG2 audio]] and the decoding method is basically the same as the standard, making it possible to simply demultiplex the stream from the ADX container and feed it through a standard MPEG Audio decoder like [[mpg123]]. The CRI ADX header's "sample rate" and "total samples" usually match the original audio, but other fields such as the block size and sample bit depth will usually be zero, as will the looping fields.
 
==External links==
* [https://web.archive.org/web/20071019013843/http://www.cri-mw.com/products/product_adx_e.htm ADX product page] at [https://web.archive.org/web/20060814150626/http://www.cri-mw.com/ CRI Middleware website]
* [http://hcs64.com/vgmstream.html vgmstream WinAMP codec with source (supports ADX)]
* [https://web.archive.org/web/20090318103258/http://www.geocities.co.jp/Playtown/2004/dcdev/index.html Dreamcast utilities including ADX converters with source] (2009-10-24)
* [http://wiki.multimedia.cx/index.php?title=CRI_ADX_ADPCM CRI ADX Description from multimedia.cx Wiki]
* [https://archive.today/20121225122438/http://vgmstream.wiki.sourceforge.net/ADX ADX technical description on vgmstream Wiki]
 
{{DEFAULTSORT:Adx (File Format)}}
[[Category:Audio codecs]]
[[Category:Lossy compression algorithms]]
[[Category:Computer file formats]]
[[Category:Articles with example C code]]
 
[[ja:CRI_ADX]]