Why is the Mel spectrogram called a spectrum?
The Fourier transform converts a signal from the time domain into the frequency domain. The result is called a spectrum. This is possible because a signal can be decomposed into a set of sine and cosine waves that add up to the original signal, a remarkable result known as Fourier's theorem.
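To make the time-to-frequency idea concrete, here is a minimal sketch that builds a signal from two known sinusoids and recovers their frequencies with a naive discrete Fourier transform. The function name `dft`, the sample rate, and the test frequencies are illustrative assumptions, not taken from the source. A real application would use an FFT library instead of this O(n^2) loop.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Assumed toy setup: one second of audio sampled at 64 Hz, so bin k maps to k Hz.
sample_rate = 64
signal = [
    math.sin(2 * math.pi * 5 * t / sample_rate)
    + 0.5 * math.cos(2 * math.pi * 12 * t / sample_rate)
    for t in range(sample_rate)
]

spectrum = dft(signal)

# Keep only the first half (the second half mirrors it for real-valued input).
magnitudes = [abs(c) / len(signal) for c in spectrum[: len(signal) // 2]]

# The two strongest bins should be the two frequencies we mixed in.
peaks = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)[:2]
print(sorted(peaks))
```

The two largest magnitudes sit at the frequencies of the sine and cosine components, which is the sense in which the spectrum describes what the signal is made of.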
How are frequencies converted to the mel scale?
We perform a mathematical operation on frequencies to convert them to the mel scale. A mel spectrogram is a spectrogram where the frequencies are converted to the mel scale. I know, right? Who would’ve thought?
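The text does not say which operation is used. One widely used variant is the formula m = 2595 * log10(1 + f / 700), and the sketch below assumes that variant. Other definitions of the mel scale exist, and the helper names `hz_to_mel` and `mel_to_hz` are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed 2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps in mels correspond to wider and wider steps in Hz.
mel_points = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
hz_points = [mel_to_hz(m) for m in mel_points]
hz_steps = [b - a for a, b in zip(hz_points, hz_points[1:])]

for m, f in zip(mel_points, hz_points):
    print(f"{m:7.1f} mel -> {f:8.1f} Hz")
```

Under this formula 1000 Hz maps to roughly 1000 mel, and the spacing in Hz grows as the frequency rises, which matches the idea that listeners distinguish low pitches more finely than high ones.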
How do you reconstruct an audio signal from a spectrogram?
Start with x_0, a random vector with the same length as the audio signal. For me, a few iterations were sufficient to get a result that sounded all right. The absolute error relative to the original signal was nevertheless quite high.
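The excerpt does not name the iteration it uses. As an illustration only, the sketch below uses the Griffin-Lim procedure, a well-known method that also starts from a random signal: it repeatedly takes the STFT of the current estimate, keeps its phase, swaps in the known magnitudes, and inverts. This may not be the method the answer had in mind. All names (`griffin_lim`, `stft`, `istft`, `magnitude_error`) and all sizes are assumptions, and the naive DFT is there only to keep the example free of dependencies.

```python
import cmath
import math
import random

def dft(frame, inverse=False):
    """Naive (inverse) DFT, O(n^2); fine for the tiny frames used here."""
    n = len(frame)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(frame[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def stft(x, n_fft, hop):
    w = hann(n_fft)
    return [dft([x[s + i] * w[i] for i in range(n_fft)])
            for s in range(0, len(x) - n_fft + 1, hop)]

def istft(frames, n_fft, hop, length):
    """Windowed overlap-add inverse, normalised by the summed squared window."""
    w = hann(n_fft)
    num = [0.0] * length
    den = [0.0] * length
    for j, frame in enumerate(frames):
        seg = dft(frame, inverse=True)
        for i in range(n_fft):
            num[j * hop + i] += seg[i].real * w[i]
            den[j * hop + i] += w[i] ** 2
    return [a / b if b > 1e-8 else 0.0 for a, b in zip(num, den)]

def magnitude_error(x, target_mag, n_fft, hop):
    """Squared distance between the STFT magnitudes of x and the target."""
    return sum((abs(c) - m) ** 2
               for frame, mags in zip(stft(x, n_fft, hop), target_mag)
               for c, m in zip(frame, mags))

def griffin_lim(target_mag, length, n_fft, hop, n_iter, seed=0):
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(length)]  # x_0: random start
    for _ in range(n_iter):
        frames = stft(x, n_fft, hop)
        # Keep the current phase estimate, impose the known magnitudes.
        fixed = [[m * cmath.exp(1j * cmath.phase(c)) for c, m in zip(frame, mags)]
                 for frame, mags in zip(frames, target_mag)]
        x = istft(fixed, n_fft, hop, length)
    return x

# Toy example: the "spectrogram" is the STFT magnitude of a short sine wave.
n_fft, hop, length = 16, 4, 128
original = [math.sin(2 * math.pi * 0.1 * t) for t in range(length)]
target = [[abs(c) for c in frame] for frame in stft(original, n_fft, hop)]

start = griffin_lim(target, length, n_fft, hop, n_iter=0)
recon = griffin_lim(target, length, n_fft, hop, n_iter=20)
start_err = magnitude_error(start, target, n_fft, hop)
final_err = magnitude_error(recon, target, n_fft, hop)
print(start_err, final_err)
```

The magnitude error shrinks over the iterations, but the recovered samples need not match the original ones, because the spectrogram does not fix the phase. That fits the observation above that the result can sound fine while the absolute error stays high.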
When did Stevens and Volkmann create the mel scale?
In 1937, Stevens, Volkmann, and Newman proposed a unit of pitch such that equal distances in pitch sounded equally distant to the listener. This is called the mel scale.
Which is the best Mel spectrogram for deep learning?
Mel Spectrograms work well for most audio deep learning applications. However, for problems dealing with human speech, like Automatic Speech Recognition, you might find that MFCCs (Mel Frequency Cepstral Coefficients) sometimes work better. These essentially take Mel Spectrograms and apply a couple of further processing steps, typically a discrete cosine transform of the log-scaled mel energies, keeping only the first few coefficients.
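As a rough sketch of those further steps, the function below takes one frame of mel-band energies, applies a logarithm, and computes an unnormalised DCT-II, keeping the first few coefficients. The function name, the small offset inside the logarithm, and the lack of normalisation are assumptions made for illustration. Audio libraries differ in scaling and in extras such as liftering.

```python
import math

def mfcc_from_mel_frame(mel_energies, n_coeffs):
    """Log-compress one mel frame and take the first n_coeffs DCT-II terms.

    Unnormalised DCT-II; the 1e-10 offset (an assumption) avoids log(0).
    """
    log_mel = [math.log(e + 1e-10) for e in mel_energies]
    n = len(log_mel)
    return [
        sum(log_mel[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        for k in range(n_coeffs)
    ]

# Hypothetical frame with 8 mel bands whose energy decays with frequency.
frame = [math.exp(-0.5 * band) for band in range(8)]
coeffs = mfcc_from_mel_frame(frame, n_coeffs=4)
print(coeffs)
```

The DCT summarises the overall shape of the log spectrum in a handful of numbers, which is why MFCCs are much more compact than the mel spectrogram they come from.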
Is there a way to augment audio data?
Just like with images, there are several techniques to augment audio data as well. This augmentation can be done either on the raw audio before producing the spectrogram, or on the generated spectrogram. Augmenting the spectrogram usually produces better results. The normal transforms you would use for an image don't apply to spectrograms, because flipping or rotating a spectrogram changes the sound it represents.
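The text does not name a specific spectrogram transform. One commonly used option is to blank out a random band of frequencies and a random span of time steps, an approach popularised as SpecAugment. The sketch below assumes that approach, with a spectrogram stored as a list of rows (one row per mel band). The function name, parameters, and fill value are hypothetical.

```python
import random

def mask_spectrogram(spec, max_freq_width, max_time_width, rng, fill=0.0):
    """Return a copy of spec with one frequency band and one time span masked.

    spec is a list of rows (mel bands), each a list of time frames.
    Widths are drawn uniformly from 0..max width; the input is not modified.
    """
    n_mels = len(spec)
    n_frames = len(spec[0])
    out = [row[:] for row in spec]

    # Frequency mask: blank out a few consecutive mel bands.
    f_width = rng.randint(0, max_freq_width)
    f_start = rng.randint(0, n_mels - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [fill] * n_frames

    # Time mask: blank out a few consecutive frames in every band.
    t_width = rng.randint(0, max_time_width)
    t_start = rng.randint(0, n_frames - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = fill

    return out

# Toy 8-band, 20-frame spectrogram filled with ones.
rng = random.Random(0)
spec = [[1.0] * 20 for _ in range(8)]
augmented = mask_spectrogram(spec, max_freq_width=2, max_time_width=4, rng=rng)
for row in augmented:
    print("".join("#" if v else "." for v in row))
```

Because the mask positions are drawn at random on each call, the same clip produces a slightly different training example every epoch, while the unmasked regions still represent the original sound.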