An example application: removing unwanted noise from audio

An example application: removing unwanted noise from audio#

In this notebook, we will analyze sound waves in the frequency domain. We will demonstrate how modifying signals in the frequency domain can effectively “purify” them.

For frequency domain analysis, we will begin by examining the Discrete-Time Fourier Transform (DTFT). However, we will find that the DTFT is not very suitable for our purposes. The DTFT analyzes signals that exist across the entire time axis, extending from negative infinity to positive infinity. Since in this task we are dealing with practical signals that have a finite duration, the DTFT’s assumption of infinite signal length is not realistic.

To overcome this limitation, we will introduce the Discrete Fourier Transform (DFT). The DFT performs a frequency domain analysis similar to the DTFT, but specifically for signals that are time-limited. This makes it a more practical tool for our denoising task in this notebook.

Let us create an A4 tone, which has 440 Hz (cycles/second) frequency. It is the A above middle C in the piano. It is also the dial tone used in our phones.

import numpy as np

sampleRate = 24000 # samples per second
frequency = 440    # this is the frequency of note A4
duration = 1       # a second

n = np.linspace(0, duration, int(sampleRate * duration))  
x = np.sin(frequency * 2 * np.pi * n)  #  Has frequency of 440Hz   

Here we created a signal in the form \(x[n] = sin(2\pi f n)\), where \(f\) is the frequency of the signal in cycles per second, or Hertz (Hz). The sample rate defines how many times per second a continuous signal is measured to create a digital representation. The value we use here (\(24000\)) is a typical value for audio WAV files.

Here is how the signal looks like:

from matplotlib import pyplot as plt
plt.stem(x[:100]); # we are just plotting the first 100 samples for visibility
_images/67df874b4432319debb7a1b451845e7662eeea59be2c0cf7b90eeb10b97a4534.png

And, here is how this signals sounds:

import sounddevice as sd
sd.play(x, sampleRate) 

Now let us pollute this nice tone with another high-pitch tone and with some random noise. For this, we will create a very high frequency signal and add normal random noise to it:

noise_freq = 7902 # this is the note B8
noise = 0.3*np.sin(noise_freq * 2 * np.pi * n) + 0.1*np.random.normal(0,1,len(x)) #  Has frequency of 440Hz

plt.stem(noise[:100])
<StemContainer object of 3 artists>
_images/22a4af34aa310c58a309471b43bae8adb6c8ec8c0e236d854c44672a7e5cd7f0.png

Here is how this signal sounds:

sd.default.reset()
sd.play(noise, sampleRate) 

Now we create the polluted signal by adding the A4 tone with the noise we just created. Below we both plot and play it. Together with the A4 tone, you should be able to hear the high-pitch tone and a noisy “hiss” sound.

y = x + noise 

plt.stem(y[:100]);

sd.play(y)
_images/3011faa7f5f4c22d46b2f2b12d50db130387e3e186122c32535e6a6f658b1f5d.png

One can develop a time-domain algorithm to smooth this polluted signal, that is, to remove both the high-pitch tone and the noisy “hiss”. However, we will use the frequency domain analysis to carry out this task.

Let us remember the discrete time Fourier transform. The Fourier transform representation of a signal enables us to decompose an aperiodic discrete time signal into its frequency components, which are embedded in the signal. A discrete time function, \(x[n]\), can be uniquely represented as a weighted integral of complex exponential function by the DTFT synthesis equation:

\[ x[n] = \frac1{2\pi}\int_{2\pi}X(e^{j\omega})e^{j\omega n}d\omega, \]

where the weight, called the Fourier transform, is a continuous function of frequency, which can be uniquely obtained from the time domain function by the DTFT analysis equation:

\[ \quad X(e^{j\omega}) = \sum_{n=-\infty}^{\infty}x[n]e^{-j\omega n}. \]

The analysis equation shows us how to obtain the Fourier transform, \(X(e^{j\omega})\) of \(x[n]\), which represents the signal as a function of frequencies, in the frequency domain.

Below you can find an implementation to compute DTFT. The DTFT has a unique characteristic: it takes a discrete time function as input but produces a continuous time function as output. Throughout this website, we have primarily used SymPy for continuous-time operations and NumPy for discrete-time signal processing. However, mixing these two libraries to implement DTFT is not trivial. Therefore, we will utilize NumPy for our DTFT implementation due to its more extensive capabilities. Of course, we will have to discretize the frequencies to represent a continuous domain.

# compute DTFT
def dtft(x):    
    # here we discretize the continuous frequency axis (between -pi and +pi) to
    # 75 equally spaced intervals. The choice of 75 is arbitrary. The higher the
    # better but the higher it is, computation becomes slower. 
    w = np.linspace(-np.pi, np.pi, 75) 
    
    X = np.zeros(len(w), dtype=np.complex64)
    mag = np.zeros(len(w))
    for i in range(len(w)): 
        X[i] = 0
        # in theory, the following loop should run from minus infinity to plus
        # infinity
        for n in range(len(x)):
            X[i] += x[n]*np.exp(-1j*w[i]*n)
        mag[i] = np.abs(X[i])

    # return the Fourier transform (X), which consists of complex numbers; the
    # magnitude of Fourier transform and the discrete values of frequencies 
    return X, mag, w

Let’s use this implementation to analyze the pure tone A4 that we started with. Below we compute the DTFT of x and plot its magnitude spectrum.

X,mag,w = dtft(x)
fig = plt.figure()
plt.plot(w,mag)
plt.xlabel(r'$\omega$')
plt.ylabel(r'$|X(e^{j\omega})|$');
fig.axes[0].set_xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi])
fig.axes[0].set_xticklabels([r'$-\pi$', r'$-\pi/2$', 0, r'$\pi/2$', r'$\pi$']);
_images/a9845fb6d16362bde598066ef44eacf46c157bd85bab690c45db70121052429e.png

As you can see, some low frequency components (near zero) have large magnitudes. And there are small non-zero activations in other frequencies. You might ask: x consists of only one frequency (440Hz), why don’t we see only one (actually two, it should be symmetric around zero) non-zero components? This would be true it x was periodic but it is not. It is not periodic because it does not extend to minus infinity to plus infinity. It is a time-limited signal. Due to this time-limited nature, we see many frequency components to have non-zero magnitudes.

Let us also have a look at the magnitude spectrums of the noise.

Xn,magn,w = dtft(noise)
fig = plt.figure()
plt.plot(w,magn)
plt.xlabel(r'$\omega$')
plt.ylabel(r'$|X_{noise}(e^{j\omega})|$');
fig.axes[0].set_xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi])
fig.axes[0].set_xticklabels([r'$-\pi$', r'$-\pi/2$', 0, r'$\pi/2$', r'$\pi$']);
_images/17002962ca7ab346887386a5aee35ca7da73b0e8349484e15b3c11412fefeee2.png

It has non-zero magnitudes all over the frequency domain. This is due to the standard normal noise. Let us look at the spectrum of only the high-pitch tone.

hp = 0.5*np.sin(noise_freq * 2 * np.pi * n) 
Xhp,maghp,w = dtft(hp)
fig = plt.figure()
plt.plot(w,maghp)
plt.xlabel(r'$\omega$')
plt.ylabel(r'$|X_{high-pitch}(e^{j\omega})|$');
fig.axes[0].set_xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi])
fig.axes[0].set_xticklabels([r'$-\pi$', r'$-\pi/2$', 0, r'$\pi/2$', r'$\pi$']);
_images/27221360b6b01959cdf27b5fe14b183455c1492922d4fa025af0e44ed57b5f39.png

This plot clearly shows two high-frequency (around 2 and -2) components having large magnitudes. And, here is the spectrum of the polluted signal y.

Y,magy,w = dtft(y)
fig = plt.figure()
plt.plot(w,magy)
plt.xlabel(r'$\omega$')
plt.ylabel(r'$|Y(e^{j\omega})|$');
fig.axes[0].set_xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi])
fig.axes[0].set_xticklabels([r'$-\pi$', r'$-\pi/2$', 0, r'$\pi/2$', r'$\pi$']);
_images/3a8c5b77552f36157ce8199e7e3ce94e85bc576d2aa38d15f5d9cdb4311b6b7a.png

Remember that our goal was to recover x from y. Looking at the Fourier transform specrum of y above, it does not look possible to discern the contributions of x and y in this plot.

To tackle this problem, we wil introduce the Discrete Fourier Transform (DFT), which is specially designed to analyze time-limited signals, such as x.

Discrete Fourier Transform (DFT)#

The DFT computes frequency coefficients for a time-limited input signal, whose samples are equally spaced apart. The following is the analysis equation of DFT:

\[ X_k = \sum_{n=0}^{N-1} x[n] e^{-j 2 \pi \frac{k}{N} n}. \]

Compare this to the analysis equation of DTFT:

\[ \quad X(e^{j\omega}) = \sum_{n=-\infty}^{\infty}x[n]e^{-j\omega n}. \]

The major difference between the two is that in DFT, the sum runs from 0 to N (hence, time-limited). And, the DFT returns a sequence of complex numbers (\(X_k\)) but the DTFT returns a complex-valued continuous function (\(X(e^{j\omega})\)).

At this point, it should be trivial to implement the analysis equation of DFT. We leave this as an exercise for the reader. Numpy provides a function to compute DFT. In the following, we will use numpy’s fft to compute the DFT.

Let us compute and plot the DFT of x, the pure A4 tone.

Xk = np.fft.fft(x)
freq = np.fft.fftfreq(n.shape[-1])
plt.plot(freq, np.abs(Xk))
plt.xlabel('k/N')
plt.ylabel(r'$|X_k|$');
_images/06f6876a19cffaaca41bb48091d2dd72199604e0827c4c16a47aa7b6f30eda35.png

As you can see, there are two large magnitude components for a certain low frequency (around 0, for +k and -k). And now, let us observe the DFT magnitude spectrum of the noise.

sp = np.fft.fft(noise)
freq = np.fft.fftfreq(n.shape[-1])
plt.figure(), plt.plot(freq, np.abs(sp));
print(np.abs(sp[:100]))
[ 4.75698709 13.29513495  4.57641489 13.19883897  9.89873933 21.65890634
 20.8151662  16.55482307  7.146025    4.56764041 22.65371838 29.25204489
 13.49045264  6.2059436   6.17461822  4.97786722 27.83266909 16.55784768
 11.57663734 19.84686108 16.43398781 20.5285779   8.45403072 16.95319067
 25.168836   28.06530893 32.75206405 27.03324919  4.36217594 12.60555754
 18.64621713 21.71638382  0.87713456 18.34325865  4.40219255  8.23504041
 20.75169185  0.59601926 23.38352737  3.12605506 14.69029871  3.61659405
  3.91791543 19.33755481 26.28004969 23.84169626 19.99609848 16.33469857
 24.27884342 21.35762222 19.83556345 22.68876844 24.1330642   2.7626167
 12.23616205 13.42094127 26.62307021 18.67352335 25.06748246 21.26652853
 11.34807624 28.91179724 30.87217813 10.70659842 26.273807   21.2211305
  6.01464403 16.81548664  7.89549051 16.81920299  1.81531847 12.57704291
 10.68299321  5.41885516 26.47997138 40.40686686 15.05078859 22.78504208
  7.02156692 15.15295542 15.62508956 16.02506689  5.66209384 10.54810694
 18.47975912 15.49258887 11.78567514 12.84033065  3.63947214  4.46910491
 17.29035643  8.01493133  3.56405924 15.19377273  5.52487503  5.74407201
  6.84191015 12.86899382 16.86887993 27.65560794]
_images/e8f65a4b795e3e1b0b96c7526c07920c904618ac366364cebb4fda68a1ee2c56.png

The noise is characterised by two large magnitude high frequency components. And also, small but non-zero magnitudes all over the spectrum, as printed above. The DFT magnitude spectrum of y is as follows.

Yk = np.fft.fft(y)
freq = np.fft.fftfreq(n.shape[-1])
plt.figure(), plt.plot(freq, np.abs(Yk));
plt.xlabel('k/N')
plt.ylabel(r'$|Y_k|$');
_images/20e2b2d9fb438a9c48de1aaddf2e767edd0c80a56e4c02e55c2e52f4d3ce66b1.png

This plot clearly shows that y consists of one low frequency and one high frequency component. To keep the low frequency component only, we can zero out everything that has a magnitude less than 3000. Below we do this modification and reconstruct back the signal using inverse DFT, which should remove the noise in y. You can find the plot of denoised y (compare this with the original plot of y at the beginning of this notebook) and listen to it. It should sound like the pure A4 tone you listenede to above.

Yk[np.abs(Yk)<3000] = 0

x_recons = np.real(np.fft.ifft(Yk))
plt.plot(np.real(x_recons[:200]))
plt.title('denoised y')

sd.default.reset()
sd.play(np.real(x_recons), sampleRate)
_images/248e1fb26e392a96f497bcd8ba5651e16238d3c4a331033ce3267257c6fb2a98.png