FastMPEG: time-scale modification of audio in the bit-rate-compressed domain

Michele Covell, Malcolm Slaney, and Art Rothstein

Many devices have hardware that efficiently performs MPEG decompression and then sends the analog waveform directly to the user. Such an architecture makes it difficult to change the playback speed of the audio, because the processor only has access to the compressed audio stream. As a result, many devices that handle compressed audio (e.g., digital VCRs and MP3 decoders) have to mute the audio when the user is fast-forwarding.

Time-scale modification (TSM) in the bit-rate-compressed domain would allow these systems to play back the audio stream at various rates. FastMPEG provides this capability: it time-compresses and expands MPEG audio streams while they are still bit-compressed.

For example, FastMPEG takes two compressed MPEG audio packets and returns one new packet with 2:1 time-compressed audio. We can use this algorithm for both speedup and slowdown, and for a wide variety of rate changes (e.g., 3:2, 2:1, 3:1, and other ratios as needed). The compression/expansion rate can be changed interactively with no noticeable delay in response. The resulting audio stream does have minor acoustic artifacts, but it is still far superior to offering no audio at all.
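The in:out packet ratios quoted above translate directly into playback-rate factors. The following is a minimal sketch of that arithmetic, assuming every packet carries the same audio duration; the function names `rate_change` and `output_duration` are hypothetical and are not part of FastMPEG.

```python
from fractions import Fraction


def rate_change(packets_in: int, packets_out: int) -> Fraction:
    """Speed factor when `packets_in` input packets become `packets_out` output packets.

    Assumes equal-duration packets. Values above 1 mean speedup,
    values below 1 mean slowdown.
    """
    if packets_in <= 0 or packets_out <= 0:
        raise ValueError("packet counts must be positive")
    return Fraction(packets_in, packets_out)


def output_duration(input_seconds: float, packets_in: int, packets_out: int) -> float:
    """Duration of the time-scaled audio for a given in:out ratio."""
    return input_seconds * packets_out / packets_in


# The ratios mentioned in the text, applied to one minute of audio.
for n_in, n_out in [(3, 2), (2, 1), (3, 1)]:
    factor = rate_change(n_in, n_out)
    seconds = output_duration(60.0, n_in, n_out)
    print(f"{n_in}:{n_out} -> {float(factor):.2f}x, 60 s of audio plays in {seconds:.0f} s")
```

Using exact fractions keeps ratios such as 5:3 from being rounded before they are compared or combined.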

The most appropriate approach to TSM in the bit-compressed domain depends on the system capacity and the application needs. If only a very small amount of processor capacity is available for the TSM task, then "snippet omission" (dropping small groups of samples) is the most appropriate approach. If the audio being modified has more than one dominant pitch, "fast playback" (redistributing the low-frequency-band energy across all the frequency bands) may be the most appropriate approach. Finally, if there is a single dominant pitch (e.g., someone talking), then "SOLA-like processing" is the best approach. See our paper for more details: Michele Covell, Malcolm Slaney, Art Rothstein. "FastMPEG: Time-scale modification of audio in the bit-rate-compressed domain." Proceedings of the 2001 International Conference on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake City, UT, pp. 3261-3264.
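Of the three approaches, snippet omission is the simplest to illustrate. The sketch below shows only the keep/drop pattern on a generic sequence of "units". What a unit is in the compressed stream (which group of samples is dropped, and how the packet is rebuilt) is described in the paper and is not modeled here; the function name `snippet_omission` and its parameters are assumptions made for this illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def snippet_omission(units: Sequence[T], keep: int, drop: int) -> List[T]:
    """Keep `keep` consecutive units, skip the next `drop`, and repeat.

    A unit is an opaque item standing in for a small group of samples.
    The resulting speed factor is (keep + drop) / keep.
    """
    if keep <= 0 or drop < 0:
        raise ValueError("need keep > 0 and drop >= 0")
    period = keep + drop
    return [u for i, u in enumerate(units) if i % period < keep]


groups = list(range(12))                     # stand-ins for 12 sample groups
print(snippet_omission(groups, keep=1, drop=1))  # 2:1 speedup pattern
print(snippet_omission(groups, keep=2, drop=1))  # 3:2 speedup pattern
```

Because nothing is blended at the cut points, this pattern costs almost no computation, which matches its role as the option for systems with very little spare processor capacity.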

Results from various TSM approaches on the bit-rate-compressed audio

Our paper describes three different approaches to TSM: snippet omission, fast playback, and SOLA-like processing. Short audio examples for all three approaches are available in the following table. All of the audio was sped up by a factor of two, by applying TSM in the MPEG1 layer-2 bit-rate-compressed domain to this excerpt from Tom's Diner.

NOTE: All the processed files on this page started life as MP2 files (like MP3, but without the temporal window switching option) and have been converted to .wav files for easy playback on the web. To avoid "stutter" in the playback of these files, please "pause" the playback until the full file has downloaded to your computer.

TSM approach used to speed the audio up by 2
Snippet omission
Fast playback
SOLA-like processing

Results from SOLA-like approach on the bit-rate-compressed audio

Our SOLA-like approach produces the best quality results for single-pitch audio. Our implementation allows us to speed up or slow down the audio by rational factors. Several examples are given in the following table. All of the modified audio was generated using the SOLA-like approach to time-scale modification in the MPEG1 layer-2 bit-rate-compressed domain.

Sped up (in:out ratio)         Original     Slowed down (in:out ratio)
2.0x    1.7x    1.5x    1.2x                1.2x    1.5x
2:1     5:3     3:2     5:4    tomsdiner    4:5     2:3
2:1     5:3     3:2     5:4    morrow       4:5     2:3
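
For readers unfamiliar with SOLA (synchronized overlap-add), the underlying idea can be sketched in the time domain: successive frames are overlapped at the offset where they correlate best, then cross-faded, so that pitch periods line up across the splice. The sketch below works on decoded PCM samples and uses a fixed output spacing with a shifted read position. It is not the compressed-domain method from our paper, and the function name `sola_time_domain` and all parameter values are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def sola_time_domain(x: Sequence[float], speed: float, frame: int = 400,
                     overlap: int = 100, search: int = 50) -> List[float]:
    """Overlap-add time scaling with a correlation search (speed > 1 shortens)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not 0 < overlap < frame:
        raise ValueError("need 0 < overlap < frame")
    synth_hop = frame - overlap                      # output advance per frame
    analysis_hop = max(1, round(synth_hop * speed))  # input advance per frame
    out = list(x[:frame])
    pos = analysis_hop
    while pos + search + frame <= len(x):
        tail = out[-overlap:]
        # Find the read offset whose first samples best match the output tail.
        best_k, best_score = 0, -math.inf
        for k in range(search + 1):
            seg = x[pos + k:pos + k + overlap]
            num = sum(a * b for a, b in zip(tail, seg))
            den = math.sqrt(sum(b * b for b in seg)) or 1.0
            if num / den > best_score:
                best_k, best_score = k, num / den
        start = pos + best_k
        # Linear cross-fade over the overlap, then append the rest of the frame.
        for i in range(overlap):
            w = i / overlap
            out[i - overlap] = (1.0 - w) * out[i - overlap] + w * x[start + i]
        out.extend(x[start + overlap:start + frame])
        pos += analysis_hop
    return out


rate = 8000  # samples per second, chosen only for this test tone
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(2 * rate)]
fast = sola_time_domain(tone, speed=2.0)     # corresponds to a 2:1 in:out ratio
slow = sola_time_domain(tone, speed=2 / 3)   # corresponds to a 2:3 in:out ratio
print(len(tone), len(fast), len(slow))
```

The correlation search is what makes this family of methods a good fit for single-pitch audio such as speech: with one dominant pitch there is a clear best alignment between the frames being spliced.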