Introduction

Are you ready to dive into the fascinating world of audio processing with Python? Recently, a colleague sparked my interest in music-retrieval applications and the use of Python for audio processing tasks. As a result, I’ve put together an introductory post that will leave you awestruck with the power of Python’s Librosa library for extracting wave features commonly used in research and application tasks such as gender prediction, music genre prediction, and voice identification. But before tackling these complex tasks, we need to understand the basics of signal processing and how they relate to working with WAV files. So, buckle up and get ready to explore the ins and outs of spectral features and their extraction - an exciting journey you won’t want to miss!

Audio storage and processing

What is an audio signal?

An audio signal is a representation of sound waves in the air. These sound waves are captured by a microphone and converted into an electrical signal, which can then be stored and manipulated digitally.

To store an audio signal digitally, the analogue electrical signal is first sampled at regular intervals, typically at 44,100 samples per second for CD-quality audio. Each sample is represented as a binary number with a certain bit depth, such as 16 bits. The higher the bit depth, the more accurately the analogue signal’s amplitude can be represented.
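The two steps described above, sampling and quantisation, can be sketched with NumPy. This is a minimal illustration rather than how any particular recorder works: the 440 Hz tone and the 10 ms duration are assumptions chosen only to keep the example small.

```python
import numpy as np

sample_rate = 44_100      # samples per second (CD quality)
bit_depth = 16            # bits per sample
duration = 0.01           # seconds of signal to generate (assumed value)
frequency = 440.0         # hypothetical test tone, in Hz

# Sampling: evaluate the continuous sine wave at regular time steps.
n_samples = round(sample_rate * duration)
t = np.arange(n_samples) / sample_rate
analogue = np.sin(2 * np.pi * frequency * t)      # amplitudes in [-1, 1]

# Quantisation: map each amplitude to the nearest 16-bit integer level.
max_level = 2 ** (bit_depth - 1) - 1              # 32767 for 16 bits
quantised = np.round(analogue * max_level).astype(np.int16)

# Converting back shows the small rounding error that quantisation introduces.
reconstructed = quantised / max_level
max_error = np.max(np.abs(analogue - reconstructed))
print(n_samples, quantised.dtype, max_error)
```

The rounding error is bounded by half a quantisation step, which is why a higher bit depth represents the amplitude more accurately.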

The binary numbers are then stored in a digital audio file format like WAV or MP3. Some formats, such as WAV, usually store the samples uncompressed, while others compress the audio to reduce file size. This compression can be lossless (as in FLAC), meaning that no audio data is lost, or lossy (as in MP3), meaning that some audio data is discarded while maintaining acceptable audio quality.

When the digital audio file is played back, the binary numbers are converted back into an analogue electrical signal by a digital-to-analogue converter, which can then be amplified and played through a speaker or headphones to produce sound waves in the air.

Audio file formats

Audio can be stored in files using different formats, depending on the application and the user’s requirements. Some of the most common formats used for storing audio in files include:

MP3: This compressed audio format is widely used for music playback and streaming. It offers high-quality audio with relatively small file sizes, making it a popular choice for storing and sharing music files.
WAV: This uncompressed audio format provides high-quality audio with no loss of fidelity. It is commonly used for recording and editing audio files, as well as for creating audio CDs.
AAC: This compressed audio format is similar to MP3 but offers better sound quality at lower bitrates. It is commonly used for streaming audio and video content.
FLAC: This lossless compressed audio format provides high-quality audio with no loss of fidelity. It is commonly used for storing and sharing high-resolution audio files.
OGG: This compressed audio format is commonly used for streaming audio and video content, and it offers high-quality audio with relatively small file sizes.
AIFF: This uncompressed audio format provides high-quality audio with no loss of fidelity. It is commonly used for recording and editing audio files on Apple computers.

The choice of format depends on factors such as the audio quality, the file size, and the compatibility with the playback device or software.

Python libraries for audio processing

There are several Python libraries for audio processing, each with its own features and capabilities. Here are some of the most popular and widely used ones:

NumPy is a fundamental library in Python for numerical computing. It provides the array type that most audio libraries build on, together with numerical operations on arrays such as element-wise arithmetic and the FFT (Fast Fourier Transform).
SciPy is built on top of NumPy and provides additional scientific and technical computing functionalities, including digital signal processing (DSP), Fourier analysis, and filter design.
Librosa is a library for analysing and processing audio signals. It includes functionality for feature extraction, beat tracking, pitch estimation, and more.
Pydub is a simple and easy-to-use library for working with audio files in Python. It allows you to load, manipulate, and save various audio file formats, including MP3, WAV, and AIFF.
Soundfile is a library for reading and writing sound files. It supports various file formats, such as WAV, FLAC, and OGG, and provides a simple and straightforward interface for working with audio data.
PyAudio provides a Python interface to the PortAudio library, a cross-platform library for audio input and output. It allows you to record and playback audio in real-time and supports various input and output devices.
FFmpeg is a command-line tool for manipulating video and audio files. Several Python packages, including moviepy and ffmpeg-python, wrap FFmpeg and provide a simple interface for working with it from Python.

Overall, selecting the best library for audio processing depends on the specific use case and the project’s requirements.

In this post, I focus on Librosa, which provides a great starting point for audio processing in Python. I will also use sounddevice, soundfile, wave and, of course, NumPy!

I am affiliated with and recommend the following fantastic books for learning Python and mastering your audio processing and digital music programming skills.

Introduction to Digital Music with Python Programming: Learning Music with Code. This book offers beginners a foundation in music and coding, demonstrating how they can enhance creative expression and streamline production processes. Through interactive examples covering rhythm, chords, and melody, the book teaches core programming concepts without requiring prior experience in music or coding.
Authors: Michael S. Horn, Melanie West, Cameron Roberts. Paperback. Publication date: 7 Feb. 2022. Number of pages: 262. Language: English. Publisher: Focal Press, First Edition. ISBN-13: 978-0367470821.
The Python Audio Cookbook: Recipes for Audio Scripting with Python. This book is an important guide for those wanting to use Python in sound and multimedia projects. It explains audio synthesis techniques and GUI development in easy-to-understand terms, helping both beginners and experienced programmers create exciting audio projects.
Author: Alexandros Drymonitis. Paperback. Publication date: 18 Dec. 2023. Number of pages: 298. Language: English. Publisher: Focal Press, First Edition. ISBN-13: 978-1032480114.

Installing required libraries

First, you’ll need to install a few libraries to work with audio files in Python. Besides librosa, NumPy and SciPy (see scipy.signal) are useful for audio processing. We can also use the sounddevice library to play our sound and soundfile to save our audio files. All of these can be installed using pip. Additionally, we can use the wave module from the Python standard library, which provides an interface for working with WAV files.

pip install librosa
pip install numpy
pip install soundfile
pip install sounddevice

As usual, we import the required libraries before we start coding.

import librosa
import numpy as np
import soundfile as sf
import wave
import sounddevice as sd

Working with WAV files

WAV for audio storage

WAV files have the extension .wav and can be played on most media players, including Windows Media Player, iTunes, and VLC Media Player. WAV is a standard file format for storing high-quality audio and is supported by many devices and audio applications. WAV files are uncompressed, keeping the raw audio data without losing quality. This results in large file sizes but ensures the audio quality is preserved.
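To get a feeling for how large uncompressed audio is, the size of the raw sample data can be computed directly from the sample rate, bit depth, channel count, and duration. The helper name pcm_size_bytes is made up for this illustration, and the result ignores the small WAV header.

```python
def pcm_size_bytes(sample_rate, bit_depth, channels, seconds):
    """Size of the raw, uncompressed PCM data (excluding the WAV header)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate * bytes_per_sample * channels * seconds

# One minute of CD-quality stereo: 44.1 kHz, 16 bits, 2 channels.
one_minute_cd = pcm_size_bytes(44_100, 16, 2, 60)
print(f"{one_minute_cd:,} bytes (~{one_minute_cd / 1_000_000:.1f} MB)")
# prints: 10,584,000 bytes (~10.6 MB)
```

So one minute of CD-quality stereo takes roughly 10 MB, which is why compressed formats are popular for sharing music.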

WAV files are often used in professional audio applications such as recording studios and sound production, where high-quality audio is required. The WAV format is flexible: it supports mono and stereo audio, bit depths such as 8-bit and 16-bit, and different sample rates. This makes WAV files popular for audio storage, especially for high-quality audio applications.
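The flexibility mentioned above can be seen with the standard-library wave module, which lets us set the number of channels, the sample width, and the sample rate when writing a file. The sketch below writes a few silent WAV files to memory and reads their parameters back. The helper write_silence and the chosen parameter combinations are assumptions made for this illustration.

```python
import io
import wave

def write_silence(channels, sample_width, sample_rate, n_frames):
    """Write silent PCM frames to an in-memory WAV file and return the buffer."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)   # in bytes: 1 = 8-bit, 2 = 16-bit
        wav_file.setframerate(sample_rate)
        # 8-bit WAV samples are unsigned (silence is 128), 16-bit are signed (silence is 0).
        silent_byte = b"\x80" if sample_width == 1 else b"\x00"
        wav_file.writeframes(silent_byte * (n_frames * channels * sample_width))
    buffer.seek(0)
    return buffer

# (channels, sample width in bytes, sample rate) combinations to try
for channels, sample_width, sample_rate in [(1, 1, 8_000), (2, 2, 44_100), (1, 2, 48_000)]:
    buffer = write_silence(channels, sample_width, sample_rate, n_frames=100)
    with wave.open(buffer, "rb") as wav_file:
        print(wav_file.getnchannels(), wav_file.getsampwidth() * 8, wav_file.getframerate())
```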

Recording voice

The following example records voice using the sounddevice library. The recording is saved as a WAV file with soundfile in the next section. Please note that you can also use pyaudio, a popular library for recording and playing audio.

import sounddevice as sd

# Set the sampling frequency and duration of the recording
sampling_frequency = 44100
duration = 5  # in seconds

# Record audio
print("Recording...")
audio = sd.rec(int(sampling_frequency * duration), samplerate=sampling_frequency, channels=1)
sd.wait()  # Wait until recording is finished
print("Finished recording")

The sample rate is the number of samples taken per second, that is, how many times per second the audio signal is measured.

The sample rate determines how finely the signal is resolved in time and, as a consequence, the highest frequency that can be represented: a signal sampled at a given rate can only capture frequencies up to half that rate (the Nyquist frequency). A higher sample rate means the audio signal is sampled more frequently, so higher frequencies are preserved. A lower sample rate captures less high-frequency detail.

Standard sample rates include 44.1 kHz, 48 kHz, and 96 kHz. The most commonly used sample rate for music is 44.1 kHz, used in CDs and considered a standard for high-quality audio.

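It’s important to note that changing the sample rate of an audio signal affects both its sound and its size. Recording at a higher sample rate captures more high-frequency detail and produces a larger file. Decreasing the sample rate discards frequencies above half the new rate and produces a smaller file. Resampling an existing recording to a higher rate makes the file larger but cannot restore detail that was never captured.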
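One way to make the effect of the sample rate concrete is the Nyquist limit: a tone above half the sample rate cannot be represented and instead shows up at a lower, "aliased" frequency. The small helper below (apparent_frequency is a name invented for this sketch) computes where a pure tone would appear after sampling, assuming no anti-aliasing filter is applied.

```python
def apparent_frequency(tone_hz, sample_rate):
    """Frequency at which a pure tone appears after sampling at sample_rate."""
    nearest_multiple = round(tone_hz / sample_rate) * sample_rate
    return abs(tone_hz - nearest_multiple)

sample_rate = 44_100
nyquist = sample_rate / 2   # 22,050 Hz: the highest representable frequency

for tone_hz in (1_000, 20_000, 30_000):
    status = "preserved" if tone_hz <= nyquist else "aliased"
    print(tone_hz, "->", apparent_frequency(tone_hz, sample_rate), status)
```

At 44.1 kHz, the 1 kHz and 20 kHz tones are kept as they are, while the 30 kHz tone folds back to 14.1 kHz. In practice, recording hardware and resampling routines filter out such frequencies before sampling.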

Saving an audio file

To save our recording, we can use soundfile’s write function as follows.

import soundfile as sf

# Save the recorded audio to a WAV file
sf.write('voice.wav', audio, sampling_frequency)

Together with the recording step above, this code records 5 seconds of audio using the default microphone and saves it as voice.wav with a sample rate of 44.1 kHz. By default, soundfile writes WAV files with 16-bit depth. You can adjust the duration variable to change the length of the recording and the file name passed to sf.write to change the name of the saved file.
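If soundfile is not available, the same result can be approximated with NumPy and the standard-library wave module. The sketch below assumes the recording is a float array with values in [-1, 1] and shape (frames, channels), which is, as far as I know, what sounddevice returns by default. The function name float_to_int16_wav is hypothetical, and a generated sine wave stands in for a real recording so the example runs without a microphone.

```python
import io
import wave

import numpy as np

def float_to_int16_wav(audio, sample_rate, target):
    """Write float audio in [-1, 1] to target (path or file object) as 16-bit PCM."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:                       # treat a flat array as mono
        audio = audio[:, np.newaxis]
    clipped = np.clip(audio, -1.0, 1.0)       # avoid integer overflow on loud samples
    pcm = np.round(clipped * 32767).astype("<i2")
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(pcm.shape[1])
        wav_file.setsampwidth(2)              # 2 bytes = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())   # rows are frames, so channels interleave

# Stand-in for a one-second mono recording (assumed 440 Hz tone).
sampling_frequency = 44_100
t = np.arange(sampling_frequency) / sampling_frequency
audio = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32).reshape(-1, 1)

buffer = io.BytesIO()                         # use a file name such as 'voice.wav' to write to disk
float_to_int16_wav(audio, sampling_frequency, buffer)

buffer.seek(0)
with wave.open(buffer, "rb") as wav_file:
    print(wav_file.getnchannels(), wav_file.getsampwidth(), wav_file.getframerate(), wav_file.getnframes())
```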

Playing an audio file

To play audio in Python, we can use the sounddevice library:

sd.play(audio, sampling_frequency)
sd.wait()

In this example, we use the play() function to play the audio array at the given sampling frequency, and then we use wait() to block until the sound has finished playing.

Loading WAV files

To load a WAV file, we can use the “wave” module:

with wave.open('voice.wav', 'rb') as wav_file:
    channels_number, sample_width, framerate, frames_number, compression_type, compression_name = wav_file.getparams()
    frames = wav_file.readframes(frames_number)
    audio_signal = np.frombuffer(frames, dtype='<i2')

channels_number, sample_width, framerate, frames_number, compression_type, compression_name

(1, 2, 44100, 220500, 'NONE', 'not compressed')

In this example, we open the voice.wav file in read-only mode (‘rb’), and then we extract some metadata from the file using the getparams() method. We then read all the audio frames into a bytes object and convert them to a NumPy array with the frombuffer() method, specifying the data type as <i2 (little-endian 16-bit signed integers).
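The loading code above yields a flat array of integers, which is fine for a mono file. For stereo files the samples of the two channels are interleaved, and many analysis routines expect floating-point values, so a slightly more general loader can be useful. The following is a sketch under the assumption that the file is 16-bit PCM. The function name read_wav_as_float is made up for this example, and a tiny in-memory stereo file is used so it runs without any audio on disk.

```python
import io
import wave

import numpy as np

def read_wav_as_float(source):
    """Read a 16-bit PCM WAV into a float32 array of shape (frames, channels)."""
    with wave.open(source, "rb") as wav_file:
        channels = wav_file.getnchannels()
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        framerate = wav_file.getframerate()
        frames = wav_file.readframes(wav_file.getnframes())
    # Interleaved samples (L, R, L, R, ...) become one row per frame.
    samples = np.frombuffer(frames, dtype="<i2").reshape(-1, channels)
    return samples.astype(np.float32) / 32768.0, framerate

# Build a tiny stereo file in memory: left channel at +0.5, right channel at -0.5.
stereo = np.array([[16384, -16384]] * 4, dtype="<i2")
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(2)
    wav_file.setsampwidth(2)
    wav_file.setframerate(8_000)
    wav_file.writeframes(stereo.tobytes())
buffer.seek(0)

signal, framerate = read_wav_as_float(buffer)
print(signal.shape, framerate, signal[0])
```

Passing a file name such as 'voice.wav' instead of the buffer should work the same way, since wave.open accepts both paths and file objects.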

If you prefer using Jupyter notebooks or Google Colab, you can also play the audio files using the Audio function in the IPython.display.

from IPython.display import Audio

Audio(audio_signal, rate=sampling_frequency)

Librosa use cases


Audio Signal Processing with Python's Librosa