At this point, you’ve probably already heard that the new FiveThirtyEight has some problems. It seems a little unsporting to pile on, but if Nate Silver is gonna be a dick, I’d like to make a nomination for FiveThirtyEight’s worst piece yet.
The piece, a blog post by Mona Chalabi, is titled “The Fastest Rapper In the Game” and attempts to identify that rapper. Although more light-hearted than predicting who’s going to win the midterm elections or the NCAA tournament, it’s an empirical question where good data analysis would provide a satisfying answer. Sadly, despite coming to the reasonable conclusion that Twista raps faster than other rappers, the piece is an utter disaster, making serious errors at nearly every step of the analysis.
Before diving into her analysis, Chalabi shares a chart from Rap Genius’s Rap Stats (Rap Genius’s equivalent of Google N-Grams), comparing usage of the words “girls” and “money” over time, then makes the following critique:
“But here at FiveThirtyEight, it’s our job to spot axis labels. You’ll see that while the trends above may seem big, they actually represent less than 0.1 percent of all raps. That doesn’t bode well for Rap Genius research on artists’ rap speed.”
The reason the axis stops at 0.1 percent is that there are a lot of words in the English language, and even very common nouns like “girls” and “money” are going to amount to a tiny percentage of the total words used in all songs in Rap Genius’s database. FiveThirtyEight is blaming Rap Genius for the fact that rappers don’t have tiny vocabularies.
And what is this “Rap Genius research on artists’ rap speed” that FiveThirtyEight is so convinced is wrong? Here’s the next sentence:
“FiveThirtyEight is keen to develop accurate rap research, so we tested the 10 fastest rappers according to Rap Genius to see whether their rankings were justified.”
If you click that link, you’ll get a post on a Rap Genius forum by a user with the account name DeLillo. It’s a ranking of ten rappers that DeLillo makes clear is merely his opinion. To clarify, what FiveThirtyEight describes as research by Rap Genius turns out to be some dude’s opinion on Rap Genius’s public forums.
Keep in mind we haven’t even begun the actual analysis. FiveThirtyEight takes DeLillo’s list of ten fast rappers and claims to look at each rapper’s most popular song on Rap Genius. They then count the number of words in the song and divide by the song’s length. Chalabi briefly mentions one of the major issues with this methodology: that there are parts of songs in which a rapper isn’t rapping. This flaw is probably fatal, since most rap songs feature more than one rapper, a chorus, and lyric-free seconds of music, and the percentage of a song that features actual rapping by the lead artist probably doesn’t correlate with the speed at which said rapper raps (if anything, you’d expect faster rappers to get through their parts more quickly).
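To make the problem concrete, here is a rough sketch in Python. Every number in it is made up for illustration: the two songs, the word counts, and the timings are hypothetical, not taken from Rap Genius or from FiveThirtyEight’s post. It compares the words-per-song-second metric described above with a rate computed only over the seconds the lead artist is actually rapping.

```python
from dataclasses import dataclass


@dataclass
class Song:
    """A hypothetical song; all values below are invented for illustration."""
    total_words: int      # every word in the lyrics (lead verses, guests, chorus)
    song_seconds: float   # full running time of the track
    lead_words: int       # words rapped by the lead artist only
    lead_seconds: float   # seconds during which the lead artist is rapping


def naive_rate(song: Song) -> float:
    """Words in the whole song divided by the whole song's length."""
    return song.total_words / song.song_seconds


def lead_rate(song: Song) -> float:
    """Lead artist's words divided by the time the lead artist spends rapping."""
    return song.lead_words / song.lead_seconds


# Hypothetical fast rapper: short, dense verses on a track full of
# guests, chorus, and instrumental breaks.
song_a = Song(total_words=500, song_seconds=240, lead_words=300, lead_seconds=60)

# Hypothetical slower rapper who raps through most of the track.
song_b = Song(total_words=600, song_seconds=240, lead_words=480, lead_seconds=160)

for name, song in [("A", song_a), ("B", song_b)]:
    print(f"Song {name}: naive {naive_rate(song):.2f} words/s, "
          f"lead-only {lead_rate(song):.2f} words/s")
```

With these invented numbers, the naive metric ranks the slower rapper first, even though the other one raps faster whenever he is actually on the mic. Measuring the lead-only rate would require timing each verse, which the lyrics alone don’t give you.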
But Chalabi glosses over the biggest methodological problem: that determining the fastest rapper based on a single song is akin to trying to pick the best player in the NBA by only looking at yesterday’s box scores. And since the metric she chose is such a poor approximation of speech rate, Chalabi isn’t just restricting her scope to yesterday’s box scores, she’s only looking at rebounds.
After baselessly criticizing Rap Genius (a site, it should be said, that is incredibly easy to criticize on merit), and presenting a completely useless methodology, the piece even fails to execute its own stated methodology. Chalabi claims to have picked the most popular song per Rap Genius for each artist, “excluding ones in which the artists were merely featured”. Chalabi doesn’t mention what song she uses for each artist on her list, other than for her winner, Twista, for whom she used “Mista Tung Twista”. But if you look at the most popular songs for Twista on Rap Genius, “Mista Tung Twista” is nowhere to be seen. According to Rap Genius, the most popular Twista song that he wasn’t merely featured on was “Slow Jamz”, which you probably think of as a Kanye song, but is also on Twista’s album Kamikaze.
There are more mistakes. Chalabi cites Rap Genius as her only source, but Rap Genius doesn’t appear to include song lengths on its website, so those numbers seem to have come from somewhere else. She also incorrectly labels Joey Bada$$ as an amateur rapper. The post ends with two lines of rapping that are just as successful as the rest of it.
Thank god FiveThirtyEight’s job is merely “to spot axis labels”.