Batch Audio Conversion for IVR (Interactive Voice Response) – WMA, WAV 8bit u-Law

Recently I recorded and edited some voice cues for a savings bank’s phone support centre. I’ve done this before but this was the first time I’d been asked to convert output for playback on the end-client’s IVR system.

Apparently there are a couple of common formats in use, depending on which IVR tech is deployed. Windows Media Audio (*.wma) is one. The other is PCM .wav, but using a particular encoding scheme known as µ-law (pronounced ‘mew’ law, or more colloquially ‘you’ law) and a very low sample rate. For more detailed information on µ-law encoding, see this Wikipedia entry.

Neither of these formats is in common use anymore, especially 8kHz u-law WAV. As such, I had a tricky time doing the necessary batch conversion from high-res edited audio files. For batch audio processing I normally use the batch module in iZotope RX, but that software doesn’t support either WMA or u-law WAV.

For my own reference, and in case this may be helpful to others in the future, detailed below are the methods I landed on. They use free, multi-platform command line tools. These aren’t especially complicated but because the programs don’t have a graphical user interface (GUI), users inexperienced with command-line tools may feel intimidated. I’ll try to make this easy.

If you’re on a Mac you can use the Homebrew package manager to download and install the tools required. If you use Windows then I think you can use the native Windows Package Manager (winget) in PowerShell to the same end. I’ll go into detail on the Mac route because that’s my platform of choice. The same basic methods will apply to Windows and Linux users as well.

WAV – 8-bit, 8kHz, u-Law – Format Overview

This format is optimized for very low filesize/bandwidth and predates the kind of psychoacoustic compression technology at play in newer formats like mp3.

Here is a sample IVR file before conversion. It’s 4sec long and the filesize is 601kb. The format is WAV, 24bits, 48kHz.

Voice recording before compressing for IVR playback.

And here is the same clip converted to 8bit, 8kHz WAV, u-law encoding:

Actually, I had to re-convert this file to signed 16-bit PCM for it to play back correctly on the embedded player.

The resultant clip is only 33kB, a mere 5% of the original file’s size. These days the space saving is negligible, but it wasn’t so long ago that storing and transmitting audio was a serious technological hurdle.
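The size difference follows from the data rate of uncompressed audio: sample rate × bytes per sample × channels. Here is a rough sketch of that arithmetic in the shell. It assumes a mono clip, and the variable names are only for illustration:

```shell
# Bytes per second = sample rate (Hz) x bits per sample / 8 x channels.
# Assumes a mono clip; change "channels" to 2 for stereo.
channels=1
source_rate=$(( 48000 * 24 / 8 * channels ))   # 24-bit, 48kHz WAV
ulaw_rate=$(( 8000 * 8 / 8 * channels ))       # 8-bit, 8kHz u-law WAV

echo "Source: ${source_rate} bytes/sec"
echo "u-law:  ${ulaw_rate} bytes/sec"
echo "Ratio:  $(( source_rate / ulaw_rate )) to 1"
```

That gives 144000 versus 8000 bytes per second, an 18 to 1 ratio, so the converted clip should come out at around 5.5% of the original size, in line with the file sizes above.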

To demonstrate the effect that low bit depth, low sample rate u-law encoding has on a more broadband signal, here’s an example of a high resolution solo piano recording. The source clip is 24bit, 96kHz, stereo. ~35MB. (Hosted here as FLAC to save bandwidth)

And here it is converted to a mono 8bit 8kHz clip, u-law encoding:

Thank you to my friend Artun Mikiciyan for allowing me to use his recording of En avril à Paris. Check out his YouTube channel for the full performance and others!

Tools Involved

These are the tools we’ll be using:

Homebrew (or your preferred package manager, depending on your OS)
FFmpeg
SoX

Downloading/Installing the Software

First, open the macOS Terminal app using Spotlight or by navigating to macOS/Applications/Utilities in the Finder.

We’ll begin by installing the Homebrew package manager, a tool for downloading, installing and updating software via the macOS Terminal. Installing Homebrew is as simple as pasting a line of text into a Terminal window and hitting return. Visit the Homebrew website and follow the very simple instructions at the top of the page.

You’ll see a flurry of text while Homebrew is installed. You may be asked for permission to install the Xcode Command Line Tools, an optional macOS component from Apple. You should accept. When the operation is complete you should see some text confirming the installation or alerting you to any issues. Afterward, installing a new app is as simple as typing:

brew install appname

So let’s install FFmpeg in this manner. Type the following into the Terminal and hit the return key:

brew install ffmpeg

Your terminal will look something like this while it does its thing:

Next let’s install the sox command in the same manner:

brew install sox

The brilliant thing about using a package manager is that it pulls together all the disparate open-source dependencies that the commands need to function and installs them alongside so that when it’s done its thing, everything should just work.
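If you’d like to confirm that both tools actually landed on your system before going further, something like the following should do it. This is only a sketch: check_tool is a throwaway helper name made up for this example, and the brew hint assumes you installed via Homebrew as above.

```shell
# Print whether a command is available on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
    return 1
  fi
}

for tool in ffmpeg sox; do
  check_tool "$tool" || echo "  try running: brew install $tool"
done
```

You can also run ffmpeg -version or sox --version to see exactly which release was installed.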

And that’s it. Now we can move on to the task at hand, which is crushing our meticulously prepared audio down to something straight out of the late 80s. Whether your output format will be .wma or u-law WAV, I’ll assume that you have your pre-conversion audio ready to go in a directory, saved as .wav. For the purposes of this demonstration, I’ll create a directory in the Home Folder of my user account and name it “convert”. This is where I’ll copy my batch of audio to be converted. Inside that folder I’ll create another and name it “output”, which is where the converted files will end up.
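If you’d rather set up those folders from the Terminal than in the Finder, here is a minimal sketch. It assumes the same “convert” and “output” names used in this walkthrough; change them to suit.

```shell
# Create ~/convert and ~/convert/output in one go.
# -p creates parent folders as needed and doesn't complain if they already exist.
mkdir -p "$HOME/convert/output"

# Move into the working folder and confirm where we are.
cd "$HOME/convert"
pwd
```

After this, copy your source .wav files into the “convert” folder.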

WAV – 8-bit, 8kHz, u-Law – File Conversion Using sox

For the u-law conversion we’ll be using sox. Here is an example command for converting a single file:

sox INPUT.wav --rate 8000 --channels 1 --encoding u-law OUTPUT.wav

sox is the name of the command

INPUT.wav will be the input clip, before conversion. Change this to match your own filename.

--rate 8000 sets the rate of sox’s outputted file to 8000 Hertz (8kHz).

--channels 1 defines the channel count as 1 (mono). If the input file contains multiple channels they will be mixed down to mono.

--encoding u-law sets the encoding type to u-law, and finally:

OUTPUT.wav will be the name of the output file. Adjust to your needs.
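After converting, it’s worth checking that the file really came out as 8kHz, mono, u-law. One way is soxi, the file-info command that gets installed alongside sox. The sketch below wraps it in a small helper. The name inspect_audio is made up for this example, OUTPUT.wav is the placeholder filename from above, and the two guard checks are only there so it fails politely if the tool or file is missing.

```shell
# Print the sample rate, channel count, bit depth and encoding of a file.
inspect_audio() {
  if ! command -v soxi >/dev/null 2>&1; then
    echo "soxi not found (it is installed together with sox)"
    return 1
  fi
  if [ ! -f "$1" ]; then
    echo "no such file: $1"
    return 1
  fi
  echo "rate:     $(soxi -r "$1") Hz"
  echo "channels: $(soxi -c "$1")"
  echo "bits:     $(soxi -b "$1")"
  echo "encoding: $(soxi -e "$1")"
}

inspect_audio OUTPUT.wav || true
```

Running soxi OUTPUT.wav on its own prints the same details in one summary.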

And here is a command that loops through all the .wav files in the current directory, converts to u-law 8kHz and outputs the converted files to a subdirectory named output, leaving the filenames intact:

for file in *.wav; do sox "$file" --rate 8000 --channels 1 --encoding u-law ./output/"$file"; done

Note: if ./output doesn’t yet exist, the command will throw an error. You can create the subdirectory from the Terminal with the command mkdir ./output
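Here is a slightly more defensive sketch of the same loop, spread over several lines for readability. It creates the output folder if needed and skips quietly when the current folder contains no .wav files (otherwise the shell would hand the literal text *.wav to sox). The sox options are the same ones described above.

```shell
# Make sure the output folder exists; -p means no error if it already does.
mkdir -p ./output

for file in *.wav; do
  # If nothing matched, the pattern stays as the literal "*.wav"; skip it.
  [ -e "$file" ] || continue
  sox "$file" --rate 8000 --channels 1 --encoding u-law ./output/"$file"
done
```

You can paste the whole block into the Terminal at once.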

If you aren’t comfortable navigating to the desired directory in the Terminal yourself, an easy way is to locate it in the Finder, right-click the desired folder and in the context menu go: Services > New Terminal At Folder as seen below.

The result should look something like this:

WMA – File Conversion using ffmpeg

WMA, or Windows Media Audio, is not as esoteric as u-law encoded WAV but as a format it’s getting long in the tooth. It isn’t natively supported on macOS, which is to say none of the apps bundled with the operating system can decode the format. For playback, VLC does the job. For converting a single file at a time the free open-source software Audacity will work, and while it’s possible to convert multiple files at a time in Audacity, it’s cumbersome. If sox supports .wma encoding (and it might), I couldn’t figure out how to get it working correctly. That’s where FFmpeg comes in.

FFmpeg is a suite of libraries for processing multimedia files of all kinds and is included as a component of many other, more elaborate toolsets such as Audacity. Like sox, FFmpeg doesn’t have a GUI of its own so we’ll be interfacing with it using the Terminal.

I couldn’t find any reference to WMA at all in the FFmpeg documentation so I first reached for Audacity. However, in the Audacity audio export dialog if you select WMA as the output format it’s made clear that ffmpeg is doing the work behind the scenes.

This gave me the idea to just try .wma as an output format using ffmpeg directly in the command line and it worked. Here is the basic command for doing a single file conversion:

ffmpeg -i INPUT.wav OUTPUT.wma

Pretty simple, right? The wma encoder bundled with ffmpeg defaults to the same sample rate as the input (i.e. no resampling applied) and a bitrate of 128kbps. If a specific bitrate is required, the user can pass that argument to the encoder as follows:

ffmpeg -i INPUT.wav -b:a 256k OUTPUT.wma

where -b:a 256k tells the program that we’d like to set the audio bitrate (-b:a) to 256kbps.
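If the IVR system also expects a particular sample rate or channel count, ffmpeg can set those in the same command. The sketch below is an assumption-laden example rather than a recipe. The 44.1kHz, mono, 128kbps values are placeholders (use whatever your IVR vendor specifies), to_wma is a made-up helper name, and it names ffmpeg’s wmav2 encoder explicitly rather than relying on the default.

```shell
# Convert one file to WMA with explicit settings.
#   -ar   output sample rate in Hz
#   -ac   output channel count
#   -c:a  audio encoder to use
#   -b:a  audio bitrate
to_wma() {
  # $1 = input .wav, $2 = output .wma
  ffmpeg -i "$1" -ar 44100 -ac 1 -c:a wmav2 -b:a 128k "$2"
}

# Only run if ffmpeg is installed and the placeholder input file exists.
if command -v ffmpeg >/dev/null 2>&1 && [ -f INPUT.wav ]; then
  to_wma INPUT.wav OUTPUT.wma
fi
```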

And so we can loop through each .wav file in a directory in the same manner as we did above with sox:

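for file in *.wav; do ffmpeg -i "$file" ./output/"${file%.wav}.wma"; done

And what you’ll get is a .wma audio file in your ./output directory for each .wav file present in the original directory. The ${file%.wav} part strips the .wav extension from each filename, so that clip.wav becomes clip.wma rather than clip.wav.wma.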
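As with sox, here is a more defensive multi-line sketch of the batch conversion. It creates ./output if needed, skips when there are no .wav files present, and names each output by swapping the .wav extension for .wma. The 128k bitrate is just the default made explicit; adjust or remove it as required.

```shell
# Make sure the output folder exists.
mkdir -p ./output

for file in *.wav; do
  # If nothing matched, the pattern stays as the literal "*.wav"; skip it.
  [ -e "$file" ] || continue
  # clip.wav -> ./output/clip.wma
  out="./output/${file%.wav}.wma"
  ffmpeg -i "$file" -b:a 128k "$out"
done
```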

I hope this basic walkthrough has been informative. If you need a more thorough guide to the Mac Terminal and command-line computing in general, well, I’m probably not the right person for that. That being said, if you’re having difficulty then I welcome your questions in the comments below.