Recently we received an email from iSpiny, a company developing a bird song recognizer, who were having problems with wind noise corrupting recordings and giving inaccurate results. The company was interested in using our code for real-time wind noise detection to indicate when high levels of wind noise would cause problems with their algorithm. So, while not directly related to audio quality, it shows that our research has wider possible applications. As we understand it, the wind noise detector is now being used within the mobile bird song recognizer app. For more information see the following site:
If you are interested in using the algorithm in your own application, there is the offline batch detector here,
as well as a real-time method implemented for iPhone (contact us for details).
Our work on the perception and automated detection of microphone wind noise has been published in the Journal of the Acoustical Society of America. The paper discusses how wind noise is perceived by listeners, and uses this information to form the basis of a wind noise detector / meter for analyzing audio files. You can access the journal here:
Or, if you don’t have access, the paper will also be available here (within the next couple of days).
If you want to run the wind noise detection algorithm, you can do so using the code here.
We have developed an algorithm which is able to measure the level of wind noise in your recordings. This algorithm is the result of research carried out for our project, in which we ran perceptual studies on the effect of microphone wind noise on the sound quality of recordings. We then developed an algorithm which analyses audio files, detects wind noise and predicts the level of degradation to audio quality.
This program is useful to people who have a lot of audio files and want to sort through them quickly to find versions of recordings without wind noise, or who want to quickly locate regions in recordings which are free of problems. A possible application of this technology is to collect together many recordings of an outdoor concert and, without having to listen to every recording, piece together the best quality files.
The program has been uploaded to GitHub. It is a command line program written in C/C++ and needs to be compiled first.
From here you can download and compile the program to use in your own applications.
We would love to hear how you get along using the program and what you have been using it for. We will be expanding the program to detect other common recording errors too.
Good news! Today sees the launch of the project’s first ever app – The Good Recorder. Absolutely free and available now via the iTunes store, or click here.
What is The Good Recorder?
The Good Recorder is a sound recording app (currently only for iOS 7 devices) designed to help users achieve high quality audio recordings by monitoring for common recording errors and providing feedback about them. Currently the app incorporates findings and algorithms from our previous work with wind noise. The plan is to further develop the app with auto-detection of handling noise and distortion as our research in these areas progresses.
Array of microphones used to capture wind noise
After developing a microphone wind noise detector which is trained on simulated examples of wind noise (see my ICME conference paper), rigorous proof of the algorithm’s success (or failure!) is required. In fact, the reviewers of the aforementioned paper suggested this. To that end I packed a car full of microphone stands, cables, preamps and a number of recording devices, and set off to collect some examples of wind noise.
The requirement for the location for collecting these examples is that there are very low levels of background noise. I found a location up on Rivington Pike, north of Manchester. There was a road which was closed for repair, which was ideal as it meant no traffic. After a couple of false starts and some help from a kindly local man, I found a good location with no road, rail, urban or air traffic noise. I located a place away from trees, which can create a surprisingly loud level of rustling noise, and set my microphones up.
I was using an Edirol R-44 to capture four channels of audio onto an SD card at a 44.1 kHz sampling frequency. I set up two measurement microphones (one with a windshield), a Shure SM58 dynamic microphone, a Zoom H2 recorder and an iPhone taped to a stand. Though one of my microphones sported a windshield, the conditions were particularly blustery with 20 mph winds, so wind noise was present on all recordings. This made it all the more important that the background sound level was as low as possible, as I intend to compute the wind noise level assuming that the background noise level is negligible.
- recording device used, 4 channels
- Calibration was carried out on the two measurement microphones by placing a calibrator on each, playing a 1 kHz tone at around 94 dB and recording these sounds. I can now calibrate my recordings so that I can present data as the actual sound pressure levels recorded by these two microphones. Calibrating the other devices is a little trickier: a 1 kHz tone was played back over a loudspeaker at a distance of approximately 1 m and recorded on all devices simultaneously. As I now know the true sound pressure level from the calibrated measurement microphones, I can compute the true level of this tone and use it to calibrate the other microphones to within a few decibels. To remove wind noise, a narrow band-pass filter centered on 1 kHz is applied. Clearly there is some error due to the location of the microphones and residual wind noise present within the pass-band, but this is not a significant problem.
- Several hours later I am rather cold but have the data. Now back to Salford to set up my validation procedure.
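The two-step calibration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, synthetic sine tones stand in for the real recordings, and the narrow band-pass filter around 1 kHz is left out.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (0 dB SPL)

def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def calibration_gain(tone_recording, tone_level_db=94.0):
    """Factor that converts raw sample values to pascals, given a
    recording of a tone whose true level is tone_level_db (dB SPL)."""
    target_pa = P_REF * 10 ** (tone_level_db / 20)
    return target_pa / rms(tone_recording)

def spl_db(samples, gain):
    """Level in dB SPL of a recording, given its calibration gain."""
    return 20 * math.log10(rms(samples) * gain / P_REF)

def sine(amplitude, freq_hz=1000.0, fs=44100, seconds=0.1):
    """Synthetic test tone standing in for a real recording."""
    n = int(fs * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / fs) for t in range(n)]

# Step 1: measurement microphone recorded with the 94 dB calibrator attached.
mic_gain = calibration_gain(sine(0.5), 94.0)

# Step 2: the loudspeaker tone is recorded on both devices at once.
# The calibrated microphone gives the true level of the tone...
speaker_level = spl_db(sine(0.05), mic_gain)
# ...which is then used as the reference for the uncalibrated device.
phone_gain = calibration_gain(sine(0.2), speaker_level)
```

Once `phone_gain` is known, `spl_db(recording, phone_gain)` gives an approximate sound pressure level for any recording made on that device with the same settings.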
So we have been working on a number of things recently. We have finished our web experiments, in which we looked at the influence of wind noise on the perceptual quality of speech. For this experiment people were asked to listen to samples of recordings with added wind noise and rate the quality, attempt to repeat what was said and rate the difficulty of the task. We varied the wind noise sample in terms of level and ‘gustiness’. We are analyzing the data at the moment, attempting to understand how level and gustiness relate to sound quality for this particular case.
Wind noise Detector
In addition to these subjective tests we have developed a ‘wind noise detector’. This algorithm listens to an audio stream and detects the presence of wind noise. The detector compresses the information within the audio stream by extracting ‘audio features’. Audio features are efficient representations of sounds. The amount of data required to represent an uncompressed digital audio stream is very large, and building a detector which uses the raw audio stream directly is simply not practical. Therefore features must be extracted which can represent the information present in the stream much more efficiently. Luckily, an understanding of how sound is processed by the human auditory system gives us a way of compressing the information stream, throwing away the perceptually unimportant parts while keeping the salient features. This is how MP3 and other compression methods achieve their high compression ratios. See the later section for more information on feature extraction.
Teaching a machine to detect wind noise
The wind noise was simulated based on a number of realistic models. This allowed us to generate a huge range of possible examples. The scheme adopted was a supervised learning one. This is where a set of audio features is extracted and a target value (the wind noise level) is associated with this feature vector. A large number of examples are generated and classified according to wind noise level. A support vector machine is then trained to classify between two groups, where one group contains features from wind noise above a certain level and the other below. A support vector machine (SVM) is a binary classifier where the objective is to find a line (or a plane or hyperplane, depending on the number of dimensions of the features) which can be drawn in the feature space to separate the two groups. A number of SVMs are trained using different wind noise levels as thresholds. Three thresholds are chosen so that four classes are defined: high, medium, low and undetectable. Three SVMs are trained and their outputs combined using a decision tree. The results are very promising, with simulated data showing detection rates of 87% and real-world tests also showing good promise.
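As a rough illustration of how three binary decisions can yield four classes, here is a small Python sketch. The threshold values, the function names and the exact order of the tree are assumptions made for this example, not the values or structure used in our detector, and the SVMs themselves are replaced by plain booleans.

```python
# Hypothetical wind noise level thresholds (in dB), one per binary classifier.
THRESHOLDS_DB = (40.0, 55.0, 70.0)

def binary_targets(level_db, thresholds=THRESHOLDS_DB):
    """Training targets for the three classifiers: is the wind noise
    level of this example above each threshold?"""
    return tuple(level_db > t for t in thresholds)

def combine_decisions(above_low, above_mid, above_high):
    """Simple decision tree over the three binary outputs, checking the
    highest threshold first."""
    if above_high:
        return "high"
    if above_mid:
        return "medium"
    if above_low:
        return "low"
    return "undetectable"

# Example: label a few simulated wind noise levels.
labels = [combine_decisions(*binary_targets(level)) for level in (30.0, 45.0, 60.0, 80.0)]
```

In a real system each boolean would come from a trained SVM applied to the feature vector, so the three outputs can disagree; checking the highest threshold first is one possible way of resolving that.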
Audio Features – Mel-Frequency Cepstrum coefficients
The audio feature representation called Mel-Frequency Cepstrum Coefficients (MFCC) is commonly used in speech recognition to compress the information stream prior to the recognition stage. The MFCC is a spectral representation of a signal over a (usually short, e.g. 20 ms) time period. A spectral representation means that, rather than representing the signal in the time domain (i.e. how the pressure fluctuates over time), the representation shows the levels of the different frequency components within the analysis time period (often referred to as a window). The ‘Mel’ part refers to the frequencies over which the spectrum is evaluated. A Fourier transform has linearly spaced frequency components; however, this is not how the human auditory system performs. The human system is sensitive over a logarithmic scale; in other words, a change in frequency for a low pitched sound is much more noticeable than the same change at a higher pitch. The Mel scale attempts to represent how the human auditory system represents pitch.
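To make the warping concrete, here is a short Python sketch using one commonly quoted formula for the Mel scale, m = 2595 log10(1 + f/700). Several variants of the scale exist, and this one is chosen only for illustration; it is not necessarily the one used in our detector.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one common variant of the scale)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

# The same 100 Hz change covers more mels at low frequencies than at high ones.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(5100.0) - hz_to_mel(5000.0)
```

Here `low_step` comes out several times larger than `high_step`, which is the numerical counterpart of a change in pitch being more noticeable at low frequencies.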
Cepstrum – The cepstrum is a representation of a signal where the inverse Fourier transform of the log spectrum is computed. A property of the logarithm is that processes that were previously multiplicative become additive, which enables the component parts of signals to be separated more easily. For example, a speech spectrum can be thought of as the product of the spectrum of the speech source and that of the vocal tract. The vocal tract produces resonances or ‘formants’. By computing the cepstrum, the formant and speech source components can be separated out, where low ‘quefrency’ components represent the spectral envelope of the formants and higher components represent the speech source.
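The multiplicative-to-additive property can be checked numerically. The Python sketch below computes a real cepstrum (inverse transform of the log magnitude spectrum) and shows that the cepstrum of two signals filtered together is the sum of their individual cepstra. The two short sequences are arbitrary stand-ins for a source and a filter, and a direct DFT is used only to keep the example dependency-free; in practice an FFT library would be used.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (slow, but fine for short inputs)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum of x."""
    n = len(x)
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def circular_convolve(a, b):
    """Circular convolution, equivalent to multiplying the two DFTs."""
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]

# Arbitrary stand-ins for a 'source' and a 'filter' (both have non-zero spectra).
source = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
tract = [1.0, -0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
combined = circular_convolve(source, tract)

c_sum = [a + b for a, b in zip(real_cepstrum(source), real_cepstrum(tract))]
c_combined = real_cepstrum(combined)
max_error = max(abs(a - b) for a, b in zip(c_sum, c_combined))
```

`max_error` is at the level of floating point rounding, confirming that the two contributions simply add in the cepstral domain.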
Therefore the Mel-frequency cepstrum is a representation of the spectral envelope of a signal where the frequency scale is warped to be representative of the human auditory system. Typically this reduces the data in a 20 ms window sampled at 44.1 kHz from 882 samples to 12 MFCCs. This is a very efficient representation and much of the salient information is preserved.
So for the past few months we have been investigating microphone wind noise. We chose microphone wind noise as it came very high in our online survey into the main issues that can degrade audio quality. The survey is ongoing, so do please take the time to complete it if you are interested.
To investigate microphone wind noise, the first task is to understand how it is generated. Luckily, significant research has already been carried out to this end, so a thorough literature review was carried out. The dominant source of wind noise in outdoor microphones is turbulent velocity fluctuations in the wind, which interact with the microphone and are converted to pressure fluctuations. There are other, less significant factors which can contribute. For example, when the microphone is embedded in a device that is placed in a flow, this can cause vortex shedding and other resonant-type behaviors.