SoundTouch audio processing library v1.3.2pre

SoundTouch library Copyright (c) Olli Parviainen 2002-2008

1. Introduction

SoundTouch is an open-source audio processing library that allows changing the sound tempo, pitch and playback rate parameters independently from each other, i.e.:

Sound tempo can be increased or decreased while maintaining the original pitch
Sound pitch can be increased or decreased while maintaining the original tempo
Change playback rate that affects both tempo and pitch at the same time
Choose any combination of tempo/pitch/rate

1.1 Contact information

Author email: oparviai 'at' iki.fi

SoundTouch WWW page: http://www.surina.net/soundtouch

2. Compiling SoundTouch

Before compiling, note that you can choose between 16bit integer and floating point sample data formats. See section 3.1, "Supported sample data formats", for more information.

2.1. Building in Microsoft Windows

Project files for Microsoft Visual C++ 6.0 and Visual C++ .NET are supplied with the source code package. Please note that the SoundTouch library uses processor-specific optimizations for Pentium III and AMD processors. With Visual Studio 6.0, supporting these optimizations requires the processor pack upgrade to be installed. The processor pack upgrade can be downloaded from the Microsoft site at this URL:

http://msdn.microsoft.com/vstudio/downloads/tools/ppack/default.aspx

If the above URL is unavailable or removed, go to http://msdn.microsoft.com and perform a search with keywords processor pack.

Visual Studio .NET supports required instructions by default and thus doesn't require installing the processor pack.

To build the binaries with the Visual C++ 6.0 compiler, either run the "make-win.bat" script or open the appropriate project files in the source code directories with Visual Studio. The final executable will appear under the "SoundTouch\bin" directory. The make-win.bat script creates the "bin" and "lib" directories automatically; if you use the Visual Studio IDE instead, create these directories manually in the SoundTouch package root before building.

C++ compilers other than Visual C++ can also be used, but the project files or makefiles then have to be adapted accordingly. The performance optimizations are written in Visual C++ compatible syntax and may or may not be compatible with other compilers. If using a GCC (GNU Compiler Collection) package such as DJGPP or Cygwin, please see the next section for instructions.

2.2. Building in GNU platforms

The SoundTouch library can be compiled on practically any platform supporting the GNU compiler (GCC) tools. SoundTouch has been tested with gcc version 3.3.4, but it shouldn't be very specific about the gcc version. Assembler-level performance optimizations for GNU platforms are currently available on x86 platforms only; on other processor platforms they are automatically disabled and replaced with standard C routines.

To build and install the binaries, run the following commands in SoundTouch/ directory:

./configure -	Configures the SoundTouch package for the local environment.
make -	Builds the SoundTouch library & SoundStretch utility.
make install -	Installs the SoundTouch & BPM libraries to /usr/local/lib and SoundStretch utility to /usr/local/bin. Please notice that 'root' privileges may be required to install the binaries to the destination locations.

NOTE: At the time of release the SoundTouch package has been tested to compile on the GNU/Linux platform. However, in the past new gcc versions have not always been compatible with the assembler settings used in the optimized routines. If you have problems getting the SoundTouch library compiled, try the workaround of disabling the optimizations by editing the file "include/STTypes.h" and removing the following definition there:

#define ALLOW_OPTIMIZATIONS 1

3. About implementation & Usage tips

3.1. Supported sample data formats

The sample data format can be chosen between 16bit signed integer and 32bit floating point values, the default is 32bit floating point.

In Windows environment, the sample data format is chosen in file "STTypes.h" by choosing one of the following defines:

#define INTEGER_SAMPLES for 16bit signed integer
#define FLOAT_SAMPLES for 32bit floating point

In GNU environment, the floating sample format is used by default, but integer sample format can be chosen by giving the following switch to the configure script:

./configure --enable-integer-samples

The sample data can have either a single (mono) or double (stereo) audio channel. Stereo data is interleaved so that the data values alternate between the left channel and the right channel. Note that while it would be possible in theory to process stereo sound as two separate mono channels, this isn't recommended because processing the channels separately would lose the phase coherency between the channels, which would consequently ruin the stereo effect.
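The interleaved layout can be illustrated with a minimal sketch. The helper name interleaveStereo is hypothetical and not part of the SoundTouch API, and the sketch assumes that the left sample comes first in each pair, as in WAV files:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SoundTouch): merges separate left/right
// buffers into one interleaved stereo buffer laid out as L0, R0, L1, R1, ...
// Assumes the left sample comes first in each pair.
std::vector<float> interleaveStereo(const std::vector<float>& left,
                                    const std::vector<float>& right)
{
    assert(left.size() == right.size());
    std::vector<float> out;
    out.reserve(left.size() * 2);
    for (std::size_t i = 0; i < left.size(); ++i) {
        out.push_back(left[i]);   // even index: left channel
        out.push_back(right[i]);  // odd index: right channel
    }
    return out;
}
```

A buffer built this way keeps both channels in one stream, so they can be processed together rather than as two separate mono signals.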

Sample rates between 8000-48000 Hz are supported.

3.2. Processing latency

The processing and latency constraints of the SoundTouch library are:

Input/output processing latency for the SoundTouch processor is around 100 ms. This is when time-stretching is used. If the rate transposing effect alone is used, the latency requirement is much shorter, see section 'About algorithms'.
Processing CD-quality sound (16bit stereo sound with 44100 Hz sample rate) in real-time or faster is possible starting from processors equivalent to Intel Pentium 133 MHz or better, if using the "quick" processing algorithm. If not using the "quick" mode or if floating point sample data are being used, several times more CPU power is typically required.
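To get a feel for what a latency of around 100 ms means in buffer terms, the following sketch converts a duration in milliseconds to a number of samples per channel. The function name msToSamples is an assumption made for illustration; it is not a SoundTouch function:

```cpp
#include <cassert>

// Hypothetical helper: number of samples (per channel) that correspond to
// 'ms' milliseconds at the given sample rate. Integer division truncates.
int msToSamples(int sampleRateHz, int ms)
{
    return (sampleRateHz * ms) / 1000;
}
```

With this arithmetic, 100 ms corresponds to 4410 samples per channel at 44100 Hz and to 800 samples per channel at 8000 Hz.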

3.3. About algorithms

SoundTouch provides three seemingly independent effects: tempo, pitch and playback rate control. These three controls are implemented as combination of two primary effects, sample rate transposing and time-stretching.

Sample rate transposing affects both the audio stream duration and pitch. It's implemented simply by converting the original audio sample stream to the desired duration by interpolating from the original audio samples. In SoundTouch, linear interpolation with anti-alias filtering is used. Theoretically a higher-order interpolation provides a better result than 1st order linear interpolation, but in audio applications linear interpolation together with anti-alias filtering performs subjectively about as well as higher-order filtering would.

Time-stretching means changing the audio stream duration without affecting its pitch. SoundTouch uses WSOLA-like time-stretching routines that operate in the time domain. Compared to sample rate transposing, time-stretching is a much heavier operation and also requires a longer processing "window" of sound samples used by the processing algorithm, thus increasing the algorithm input/output latency. Typical i/o latency for the SoundTouch time-stretch algorithm is around 100 ms.
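The interpolation step can be sketched as follows. This is a simplified illustration for mono data, not the SoundTouch implementation: the function name transposeLinear is hypothetical, and the anti-alias filter that SoundTouch applies is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: resamples a mono buffer by linear interpolation.
// rate > 1.0 shortens the stream (higher pitch when played back at the
// original sample rate); rate < 1.0 lengthens it (lower pitch).
// NOTE: no anti-alias filtering is done here, unlike in SoundTouch.
std::vector<float> transposeLinear(const std::vector<float>& in, double rate)
{
    std::vector<float> out;
    if (in.size() < 2 || rate <= 0.0) return out;
    const double last = static_cast<double>(in.size() - 1);
    for (std::size_t n = 0; ; ++n) {
        const double pos = static_cast<double>(n) * rate;  // read position
        if (pos > last) break;
        const std::size_t i = static_cast<std::size_t>(pos);
        const double frac = pos - static_cast<double>(i);
        const float next = (i + 1 < in.size()) ? in[i + 1] : in[i];
        // Weighted average of the two neighbouring input samples.
        out.push_back(static_cast<float>((1.0 - frac) * in[i] + frac * next));
    }
    return out;
}
```

For example, a rate of 2.0 keeps every second input sample, while a rate of 0.5 inserts one interpolated value between each pair of input samples.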

Sample rate transposing and time-stretching are then used together to produce the tempo, pitch and rate controls:

'Tempo' control is implemented purely by time-stretching.
'Rate' control is implemented purely by sample rate transposing.
'Pitch' control is implemented as a combination of time-stretching and sample rate transposing. For example, to increase pitch the audio stream is first time-stretched to longer duration (without affecting pitch) and then transposed back to original duration by sample rate transposing, which simultaneously reduces duration and increases pitch. The result is original duration but increased pitch.
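The pitch control arithmetic described above can be sketched like this. The names PitchPlan and planPitchShift are hypothetical, and the sketch assumes the usual equal-tempered relation in which one semitone corresponds to a frequency ratio of 2^(1/12); it only computes the two factors and does not process audio.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration of how a pitch change splits into two stages.
struct PitchPlan {
    double stretchFactor;  // duration multiplier applied by time-stretching
    double rateFactor;     // playback-rate multiplier applied by transposing
};

// Assumes one semitone equals a frequency ratio of 2^(1/12).
PitchPlan planPitchShift(double semitones)
{
    const double ratio = std::pow(2.0, semitones / 12.0);
    // Stretch the duration by 'ratio', then transpose at 'ratio' times the
    // rate: the duration returns to 1.0 and the pitch changes by 'ratio'.
    return PitchPlan{ratio, ratio};
}
```

For a shift of +12 semitones both factors are 2.0: the stream is first stretched to twice its duration and then transposed at double rate, which restores the original duration and doubles the pitch.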

3.4 Tuning the algorithm parameters

The time-stretch algorithm has a few parameters that can be tuned to optimize sound quality for a particular application. The current default parameters have been chosen by iterative if-then analysis (read: "trial and error") to obtain the best subjective sound quality in pop/rock music processing, but in applications processing different kinds of sound the default parameter set may give a sub-optimal result.

The time-stretch algorithm default parameter values are set by these #defines in file "TDStretch.h":

#define DEFAULT_SEQUENCE_MS     82
#define DEFAULT_SEEKWINDOW_MS   28
#define DEFAULT_OVERLAP_MS      12

These parameters affect the time-stretch algorithm as follows:

DEFAULT_SEQUENCE_MS: This is the default length of a single processing sequence in milliseconds, which determines how the original sound is chopped in the time-stretch algorithm. Larger values mean fewer sequences are used in processing. In principle a larger value sounds better when slowing down the tempo but worse when increasing the tempo, and vice versa.
DEFAULT_SEEKWINDOW_MS: The default length in milliseconds of the seeking window, used by the algorithm that seeks the best possible overlapping location. This determines how wide a sample "window" the algorithm can search to find an optimal mixing location when the sound sequences are linked back together.

The bigger this window setting is, the higher the possibility to find a better mixing position becomes, but at the same time large values may cause a "drifting" sound artifact because neighboring sequences can be chosen at more uneven intervals. If there's a disturbing artifact that sounds as if a constant frequency was drifting around, try reducing this setting.
DEFAULT_OVERLAP_MS: Overlap length in milliseconds. When the sound sequences are mixed back together to form again a continuous sound stream, this parameter defines how much the ends of the consecutive sequences will overlap with each other.

This parameter shouldn't be that critical. If you reduce the DEFAULT_SEQUENCE_MS setting by a large amount, you might wish to try a smaller value for this one.
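The role of the overlap length can be illustrated with a small cross-fade sketch. The function name overlapLinear is hypothetical, and the linear fade ramps are an assumption made for illustration; this text does not state the exact mixing curve that SoundTouch uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: mixes the end of the previous sequence with the
// beginning of the next one over an overlap of prevTail.size() samples.
// ASSUMPTION: linear fade-out / fade-in ramps are used for the mix.
std::vector<float> overlapLinear(const std::vector<float>& prevTail,
                                 const std::vector<float>& nextHead)
{
    assert(prevTail.size() == nextHead.size());
    const std::size_t n = prevTail.size();
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Weight of the next sequence grows from 0 towards 1.
        const float w = static_cast<float>(i) / static_cast<float>(n);
        out[i] = (1.0f - w) * prevTail[i] + w * nextHead[i];
    }
    return out;
}
```

A longer overlap means more samples pass through this mixing loop for each sequence boundary.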

Notice that these parameters can also be set during execution time with functions "TDStretch::setParameters()" and "SoundTouch::setSetting()".

The table below summarizes how the parameters can be adjusted for different applications:

Parameter name	Default value magnitude	Larger value affects...	Smaller value affects...	Music	Speech	Effect in CPU burden
SEQUENCE_MS	Default value is relatively large, chosen for slowing down music tempo	Larger value is usually better for slowing down tempo. Growing the value decelerates the "echoing" artifact when slowing down the tempo.	Smaller value might be better for speeding up tempo. Reducing the value accelerates the "echoing" artifact when slowing down the tempo	Default value usually good	A smaller value than default might be better	Increasing the parameter value reduces computation burden
SEEKWINDOW_MS	Default value is relatively large, chosen for slowing down music tempo	Larger value eases finding a good mixing position, but may cause a "drifting" artifact	Smaller value reduces the possibility of finding a good mixing position, but also reduces the "drifting" artifact.	Default value usually good, unless a "drifting" artifact is disturbing.	Default value usually good	Increasing the parameter value increases computation burden
OVERLAP_MS	Default value is relatively large, chosen to suit with above parameters.		If you reduce the "sequence ms" setting, you might wish to try a smaller value.			Increasing the parameter value increases computation burden

3.5 Performance Optimizations

General optimizations:

The time-stretch routine has a 'quick' mode that substantially speeds up the algorithm but may degrade the sound quality by a small amount. This mode is activated by calling SoundTouch::setSetting() function with parameter id of SETTING_USE_QUICKSEEK and value "1", i.e.

setSetting(SETTING_USE_QUICKSEEK, 1);

CPU-specific optimizations:

Intel MMX optimized routines are used with compatible CPUs when 16bit integer sample type is used. MMX optimizations are available both in Win32 and GNU/x86 platforms. Compatible processors are Intel PentiumMMX and later; AMD K6-2, Athlon and later.
Intel SSE optimized routines are used with compatible CPUs when floating point sample type is used. SSE optimizations are currently implemented for Win32 platform only. Processors compatible with the SSE extension are Intel processors starting from Pentium-III, and AMD processors starting from Athlon XP.
AMD 3DNow! optimized routines are used with compatible CPUs when floating point sample type is used, but the SSE extension isn't supported. 3DNow! optimizations are currently implemented for Win32 platform only. These optimizations are used in AMD K6-2 and Athlon (classic) CPUs; better performing SSE routines are used with AMD processors starting from Athlon XP.

4. SoundStretch audio processing utility

SoundStretch audio processing utility
Copyright (c) Olli Parviainen 2002-2005

SoundStretch is a simple command-line application that can change tempo, pitch and playback rates of WAV sound files. This program is intended primarily to demonstrate how the "SoundTouch" library can be used to process sound in your own program, but it can as well be used for processing sound files.

4.1. SoundStretch Usage Instructions

SoundStretch Usage syntax:

soundstretch infile.wav outfile.wav [switches]

Where:

"infile.wav"	is the name of the input sound data file (in .WAV audio file format).
"outfile.wav"	is the name of the output sound file where the resulting sound is saved (in .WAV audio file format). This parameter may be omitted if you don't want to save the output (e.g. when only calculating BPM rate with '-bpm' switch).
[switches]	are one or more control switches.

Available control switches are:

-tempo=n	Change the sound tempo by n percent (n = -95.0 .. +5000.0 %)
-pitch=n	Change the sound pitch by n semitones (n = -60.0 .. +60.0 semitones)
-rate=n	Change the sound playback rate by n percent (n = -95.0 .. +5000.0 %)
-bpm=n	Detect the Beats-Per-Minute (BPM) rate of the sound and adjust the tempo to meet 'n' BPMs. If this switch is defined, the "-tempo=n" switch value is ignored. If "=n" is omitted, i.e. switch "-bpm" is used alone, the program just calculates and displays the BPM rate but doesn't adjust tempo according to the BPM value.
-quick	Use quicker tempo change algorithm. Gains speed but loses sound quality.
-naa	Don't use anti-alias filtering in sample rate transposing. Gains speed but loses sound quality.
-license	Displays the program license text (LGPL)
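The percentage values of the "-tempo" and "-rate" switches can be read as multipliers, as in the sketch below. The function names are hypothetical and not taken from the SoundStretch source; the sketch only assumes that a change of n percent means multiplying the original tempo or rate by (1 + n/100).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: converts a change in percent to a multiplier,
// e.g. +12.5 % -> 1.125 and -95 % -> 0.05.
double percentToFactor(double percent)
{
    return 1.0 + percent / 100.0;
}

// Hypothetical helper: resulting duration after a tempo or rate change of
// 'percent', for an input lasting 'seconds'. A faster tempo gives a
// shorter output.
double changedDuration(double seconds, double percent)
{
    return seconds / percentToFactor(percent);
}
```

Under this reading, the allowed range of -95.0 .. +5000.0 % corresponds to multipliers between 0.05 and 51.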

Notes:

The numerical switch values can be entered using either integer (e.g. "-tempo=123") or decimal (e.g. "-tempo=123.45") numbers.
The "-naa" and/or "-quick" switches can be used to reduce CPU usage while compromising some sound quality.
The BPM detection algorithm works by detecting repeating low-frequency (<250 Hz) sound patterns and thus works with most rock/pop music with a bass or drum beat. The BPM detection doesn't work on pieces such as classical music without distinct, repeating bass frequency patterns. Also pieces with varying tempo, varying bass patterns or very complex bass patterns (jazz, hiphop) may produce odd BPM readings.

In cases when the bass pattern drifts a bit around a nominal beat rate (e.g. the drummer is drunk again :), the BPM algorithm may report an incorrect harmonic, such as one-half or one-third of the correct BPM value; in such a case the system could for example report a BPM value of 50 or 100 instead of the correct BPM value of 150.

4.2. SoundStretch usage examples

Example 1

The following command increases tempo of the sound file "originalfile.wav" by 12.5% and saves result to file "destinationfile.wav":

soundstretch originalfile.wav destinationfile.wav -tempo=12.5

Example 2

The following command decreases the sound pitch (key) of the sound file "orig.wav" by two semitones and saves the result to file "dest.wav":

soundstretch orig.wav dest.wav -pitch=-2

Example 3

The following command processes the file "orig.wav" by decreasing the sound tempo by 25.3% and increasing the sound pitch (key) by 1.5 semitones. Result is saved to file "dest.wav":

soundstretch orig.wav dest.wav -tempo=-25.3 -pitch=1.5

Example 4

The following command detects the BPM rate of the file "orig.wav" and adjusts the tempo to match 100 beats per minute. Result is saved to file "dest.wav":

soundstretch orig.wav dest.wav -bpm=100
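The tempo adjustment implied by the "-bpm=n" switch can be sketched as simple arithmetic. The function name tempoChangeForBpm is hypothetical, and this is an illustration of the relation between the detected and target BPM values rather than the actual SoundStretch code:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: tempo change in percent needed to bring a detected
// BPM rate to the target BPM rate. Positive values speed the sound up.
double tempoChangeForBpm(double detectedBpm, double targetBpm)
{
    return (targetBpm / detectedBpm - 1.0) * 100.0;
}
```

For instance, if the detected rate were 125 BPM, reaching 100 BPM would take a tempo change of -20 %.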

5. Change History

5.1. SoundTouch library Change History

v1.3.2:

Bugfixes: Using uninitialized variables, GNU build scripts, compiler errors due to 'const' keyword mismatch.
Some source code cleanup

v1.3.1:

Changed static class declaration to GCC 4.x compiler compatible syntax.
Enabled MMX/SSE-optimized routines also for GCC compilers. Earlier the MMX/SSE-optimized routines were written in compiler-specific inline assembler; now these routines have been migrated to compiler intrinsic syntax, which allows compiling the same MMX/SSE-optimized source code with both Visual C++ and GCC compilers.
Set floating point as the default sample format and added switch to the GNU configure script for selecting the other sample format.

v1.3.0:

Fixed tempo routine output duration inaccuracy due to rounding error
Implemented separate processing routines for integer and floating point arithmetic to allow improvements to floating point routines (earlier, algorithms mostly optimized for integer arithmetic were also used for floating point samples)
Fixed a bug that distorts sound if sample rate changes during the sound stream
Fixed a memory leak that appeared in MMX/SSE/3DNow! optimized routines
Reduced redundant code pieces in MMX/SSE/3DNow! optimized routines vs. the standard C routines.
MMX routine incompatibility with new gcc compiler versions
Other miscellaneous bug fixes

v1.2.1:

Added automake/autoconf scripts for GNU platforms (courtesy of David Durham)
Fixed SCALE overflow bug in rate transposer routine.
Fixed 64bit address space bugs.
Created a 'soundtouch' namespace for SAMPLETYPE definitions.

v1.2.0:

Added support for 32bit floating point sample data type with SSE/3DNow! optimizations for Win32 platform (SSE/3DNow! optimizations currently not supported in GCC environment)
Replaced 'make-gcc' script for GNU environment by master Makefile
Added time-stretch routine configurability to SoundTouch main class
Bugfixes

v1.1.1:

Moved SoundTouch under lesser GPL license (LGPL). This allows using SoundTouch library in programs that aren't released under GPL license.
Changed MMX routine organization so that MMX optimized routines are now implemented in classes that are derived from the basic classes having the standard non-mmx routines.
MMX routines to support gcc version 3.
Replaced windows makefiles by script using the .dsw files

v1.01:

"mmx_gcc.cpp": Added "using namespace std" and removed "return 0" from a function with void return value to fix compiler errors when compiling the library in Solaris environment.
Moved file "FIFOSampleBuffer.h" to "include" directory to allow accessing the FIFOSampleBuffer class from external files.

v1.0:

Initial release

5.2. SoundStretch application Change History

v1.3.0:

Simplified accessing WAV files with floating point sample format.

v1.2.1:

Fixed 64bit address space bugs.

v1.2.0:

Added support for 32bit floating point sample data type
Restructured the BPM routines into separate library
Fixed big-endian conversion bugs in WAV file routines (hopefully :)

v1.1.1:

Fixed bugs in WAV file reading & added byte-order conversion for big-endian processors.
Moved SoundStretch source code under 'example' directory to highlight difference from SoundTouch stuff.
Replaced windows makefiles by script using the .dsw files
Output file name isn't required if output isn't desired (e.g. if using the switch '-bpm' in plain format only)

v1.1:

Fixed "Release" settings in Microsoft Visual C++ project file (.dsp)
Added beats-per-minute (BPM) detection routine and command-line switch "-bpm"

v1.01:

Initial release

6. Acknowledgements

Kudos to these people who have submitted bugfixes since SoundTouch v1.3.1:

Arthur A.: Bugfix
Stanislav Brabec / Takashi Iwai
Jason Garland

Moral greetings to all earlier contributors as well!

7. LICENSE

SoundTouch audio processing library
Copyright (c) Olli Parviainen

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA