Following the he(a)rd: How much should we trust the crowd when it comes to quality in audio?

Audio quality research often involves manipulating a known facet of a recording (such as distortion level, bit rate, and so on) and seeing what effect it has on people’s ratings of quality. Unfortunately, however, the simple act of requesting a quality rating can change the way people would normally listen to the recording. Recently we’ve been considering alternative ways of approaching this problem.

If, for instance, we could find another measure that predicted quality reasonably well, we might not have to ask directly for people’s ratings. And if this implicit measure of quality could be found quickly and freely, in data that already exists, we might have any number of new and exciting avenues to pursue.

With this in mind, we are currently running a range of experiments to explore how well quality ratings are related to other common metrics found on the web – measures such as the number of views of a YouTube video, the number of ‘likes’ on Soundcloud, or the number of downloads on FreeSound, and so on.  (In fact, if you’ve a few minutes to spare, I’d encourage you to go and take part in our experiments by clicking here to compare footage from Glastonbury, and here to compare nature sounds.)

An important assumption of this approach is that there is some intrinsic quality in recordings which over time causes them to become more popular than other equally available samples. In other words, we assume that the cream will always rise to the top.

But do some recordings really rise to the top solely because they are of higher quality? When you search for something on YouTube, what is it that influences your decision to select one particular video out of tens, hundreds, maybe thousands of possible alternatives? Usually in this scenario it is most convenient to put our trust in the wisdom of the crowd: we don’t have time to wade through all the videos on offer, so we simply look at which clips are already the most viewed and assume that they are the clips most worth watching. But how then do we know that popular videos become popular because they are of superior quality, and not simply because of a feedback loop where more and more people click on popular videos, however good (or bad) they happen to be?

  • Charlie Bit My Finger: The most viewed YouTube video of all time. But is it also the best? (Or at least the best in the ‘category’ of babies biting fingers…)

I recently stumbled across an interesting paper (Salganik & Watts, 2008; NB. interested readers should also see Salganik, Dodds & Watts, 2006) where the authors investigated pretty much exactly this question. The paper is quite dense but at its core is an elegantly simple question – how is a song’s future popularity affected when we provide false information about how popular it has been in the past?

Salganik and colleagues asked 48 unsigned, unknown bands to provide a song for a new website which would be free for anyone to stream or download. Visitors to the site were free to play, rate, and download as many or as few of the songs as they wanted. Over time some songs emerged as more popular than others and the list of songs took the shape of a music chart. The songs were presented in rank order of the number of downloads (download numbers were presented alongside the song names) and after a few hundred visitors to the site a fairly stable rank order was established.

So far, pretty standard. But this is where things get interesting. At this point visitors to the site were randomly assigned to different versions of the rankings. One version was business as usual – no change in presentation of the songs occurred. In an alternate version the song names in the rankings were completely inverted. So the song which was least popular, #48, swapped places with #1 at the top of the charts. Song #47 swapped places with song #2, and so on.
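The inversion described above can be sketched in a few lines of Python. This is only an illustration: the song titles are made-up placeholders and `invert_chart` is a hypothetical helper name, not anything taken from the paper.

```python
def invert_chart(ranked_songs):
    """Return the chart as displayed after inversion.

    The song at position i swaps places with the song at position
    n + 1 - i, which is the same as reversing the whole list.
    """
    return list(reversed(ranked_songs))


# Hypothetical chart of 48 placeholder titles, most downloaded first.
true_chart = [f"song_{rank:02d}" for rank in range(1, 49)]
shown_chart = invert_chart(true_chart)

print(shown_chart[0])   # song_48 (least downloaded, now shown at the top)
print(shown_chart[1])   # song_47
print(shown_chart[-1])  # song_01 (most downloaded, now shown at the bottom)
```

Inverting the inverted chart gives back the original order, which is a quick sanity check that nothing has been lost or duplicated.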

To the naive participant, the songs which had previously had the fewest downloads now appeared to be the most downloaded, and vice versa. So, if people choose what to download on the basis of which is ‘best’ in terms of quality, the displayed ranking should be unimportant – we should still observe the original #1 (now shown at #48) being downloaded more than the original #48 (now shown at #1), and so on. Alternatively, if people are choosing what to download on the basis of the social influence of what other people (appear to) download most, then we should observe the songs at the top of the charts retain their positions despite previously being the least downloaded of all.

What was actually observed was something of a mix of these two scenarios.  Broadly, download behaviour was influenced by both the intrinsic quality of the songs and the social cues provided by the herd. The songs which were previously least popular received a large boost in their download numbers once they occupied the positions at the top of the chart. However the songs which were previously most popular defied their new lowly rankings and also continued to be downloaded at a decent rate. The overall effect was to distort the original rank order of the songs entirely – the correlation between the real-world rankings and the rankings after inversion was very poor.
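One common way to put a number on how well two rank orders agree is Spearman's rank correlation. The sketch below is an assumption on my part rather than a description of the authors' analysis (this post doesn't say which statistic they used), and the five-song rankings are invented purely for illustration.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two tie-free rankings of the same items."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of equal length, at least 2 items")
    # Sum of squared differences between each item's two rank positions.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Invented rank positions for five songs (not data from the study).
original = [1, 2, 3, 4, 5]
unchanged = [1, 2, 3, 4, 5]   # final ranking identical to the original
inverted = [5, 4, 3, 2, 1]    # final ranking exactly reversed
scrambled = [3, 1, 4, 5, 2]   # final ranking only loosely related

print(spearman_rho(original, unchanged))            # 1.0
print(spearman_rho(original, inverted))             # -1.0
print(round(spearman_rho(original, scrambled), 2))  # 0.2
```

A value near 1 would mean the original order survived, a value near -1 would mean the displayed (inverted) order won out, and a value near 0 is what "very poor" agreement looks like.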

Several other interesting findings also emerged from the experiment. The overall number of plays and downloads decreased markedly after the rankings were switched, likely because visitors were using the first few songs as a benchmark for the quality of the rest of the chart. Having assumed the standard would decrease further down the list, people persevered through fewer tunes. That said, the likelihood of a visitor listening to a song was not a simple linear function of its rank position. Unsurprisingly, the top tunes were more likely to be listened to than middle-ranking tunes (about 6 times more likely), but the very bottom tunes were also 3 times more likely to be listened to than those in the middle. The authors of the study suggest this might reflect a form of anti-conformism (a deliberate rejection of the behaviour of the crowd) or could simply reflect the relative salience of the top and bottom positions of the list compared to positions in the middle.

The experiment perhaps implies that the raw number of plays (or listens, or downloads, etc.) may be less informative than the proportion of those who played the sample and also went on to ‘like’ (or dislike) it, download it, share it, and so on. This is something we will consider in our search for a reliable alternative metric for estimating quality. Two YouTube clips might both have a million views, for example, but if one has 200,000 likes compared to the other with only 2,000 likes, that could be highly informative.
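To make the arithmetic in that example concrete, here is a minimal sketch of the proportion in question. The function name `like_rate` is hypothetical, and the figures are just the illustrative ones from the paragraph above, not real YouTube data.

```python
def like_rate(likes, views):
    """Proportion of views that went on to become a 'like'."""
    if views <= 0:
        raise ValueError("views must be positive")
    return likes / views


# The two illustrative clips: same view count, very different like counts.
clip_a = like_rate(200_000, 1_000_000)
clip_b = like_rate(2_000, 1_000_000)

print(f"{clip_a:.1%}")  # 20.0%
print(f"{clip_b:.1%}")  # 0.2%
```

On raw views the two clips are indistinguishable, but the like rate separates them by a factor of 100.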

Finally, another study in the news lately seems relevant and worthy of a brief mention in this post. Mann, Faria, Sumpter and Krause (2013) investigated the dynamics of social cues (or social ‘contagion’ as they describe the phenomenon) when audiences applaud. Why is it that some people applaud longer than others? How do we judge how long to clap for? In that study they fitted many different models to real-world observations of an audience’s applause after different events. They found that the most important factor in how likely an individual is to start clapping is simply how many other people in the audience are already clapping. The same simple principle applied in reverse for the cessation of applause.

Spatial proximity was not found to be important; what mattered was a more global cue, such as the overall volume of clapping in the room. In other words, it didn’t matter so much whether the person next to you was clapping or not, but roughly how many people in the whole room were clapping. But most interestingly, in relation to the discussion above, the authors note that “randomness in the audience interactions can sometimes result in unusually strong or weak levels of appreciation, independent of the quality of the presentation” [emphasis mine].
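As a rough illustration of that kind of contagion, here is a toy simulation of applause starting up. It is not the model fitted by Mann and colleagues: the function name, the update rule, and every parameter value (`base_rate`, `social_weight`, audience size) are assumptions chosen only to show the idea that the chance of joining in grows with the fraction of the whole room already clapping.

```python
import random


def simulate_applause_onset(audience_size, base_rate, social_weight, steps, seed):
    """Toy model of applause onset (illustrative assumptions throughout).

    At each time step, every silent person starts clapping with probability
    base_rate + social_weight * (fraction of the room already clapping).
    Returns the number of people clapping after each step.
    """
    rng = random.Random(seed)
    clapping = 0
    history = []
    for _ in range(steps):
        fraction = clapping / audience_size
        p_start = min(1.0, base_rate + social_weight * fraction)
        silent = audience_size - clapping
        # Each silent person decides independently whether to join in.
        new_clappers = sum(1 for _ in range(silent) if rng.random() < p_start)
        clapping += new_clappers
        history.append(clapping)
    return history


# Same audience and same 'performance', different random seeds.
for seed in (1, 2, 3):
    print(simulate_applause_onset(100, 0.02, 0.5, 10, seed))
```

Running it with different seeds gives applause that builds at different speeds even though nothing about the imaginary performance has changed, which is the point the authors make about randomness.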

The strongest influence on how long an audience applauds for is simply the social influence of the crowd, not necessarily how good the performance was. When it comes to whether or not we download a free song, however, it seems wise to put some trust in the crowd but not to forget to consult our own ears as well. We’ll report back with our own findings soon…



Skip to the end…

  • We are investigating whether other, indirect, measures can help us learn something about quality. You can help us out by taking part here and here.
  • When researchers manipulated download numbers, visitors to a website were influenced enough to start downloading previously unpopular songs.
  • However, the previously most popular songs continued to be downloaded at a decent rate, despite appearing to be the least popular.
  • Charts of downloads, listens, and so on appear to be the outcome of a combination of both social influence and genuine intrinsic quality in samples.