Incentives, Cheating Teachers and Suspicious DNK Accounts
Last week Steve Levitt was in Jacksonville and gave a lecture of sorts at a local university. It was really good and, like his Freakonomics book, pretty darn thought-provoking. Steve's lecture consisted mostly of him telling really interesting stories, stopping to point out some of the economic principles along the way. He only briefly touched on it in the lecture, but his book has a chapter where he talks about a group of teachers who were caught cheating in Chicago's public school system. Here is the abstract for a paper Steve published on the subject ...
We develop an algorithm for detecting teacher cheating that combines information on unexpected test score fluctuations and suspicious patterns of answers for students in a classroom. Using data from the Chicago Public Schools, we estimate that serious cases of teacher or administrator cheating on standardized tests occur in a minimum of 4-5 percent of elementary school classrooms annually. Moreover, the observed frequency of cheating appears to respond strongly to relatively minor changes in incentives. Our results highlight the fact that incentive systems, especially those with bright line rules, often induce behavioral distortions such as cheating. Statistical analysis, however, may provide a means of detecting illicit acts, despite the best attempts of perpetrators to keep them clandestine. - Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating
I thought it was an interesting example of how incentives can affect behavior. Teachers had a financial incentive to cheat - the better the students scored, the bigger the teacher's paycheck. And guess what ended up happening - some teachers cheated. Steve uncovered a pattern in the standardized test data that he used to identify the teachers (the cheating teachers would answer the last 10 questions or so for students who couldn't answer all of the questions in the given time period).
Identifying Suspicious DotNetKicks Accounts
I know it's not quite as sensational as catching cheating teachers, but I thought it might be interesting to use a similar concept to identify phony DotNetKicks accounts (DNK is a lot like digg and reddit, except it focuses on .Net topics and there aren't any down mods). Getting 'kicked' to the DNK homepage can result in a nice boost to traffic as well as a quality incoming link, so there is clearly an incentive to get your posts on the front page. And it only takes 6 kicks total, so you wouldn't really need to set up too many fake accounts to do this. I figure there must be at least one person out there who has set up a handful of fake accounts and uses them to kick their stories to the homepage. So what the heck, why not try to identify them.
So I used DNK's JsonServices to extract the last ~1,500 front page and upcoming page stories (dating back to the end of March 2008). Then I compiled a list of the users who kicked these stories and moved all of this data into 3 tables in my local SQL Server: User, Story and Kicked.
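For anyone who wants to play along at home, here is a minimal sketch of what those three tables might look like. It uses Python's built-in sqlite3 rather than SQL Server so it runs anywhere. Only the table names (User, Story, Kicked) come from my setup above; every column name, including kick_rank (the order in which a story received its kicks), is an assumption made for illustration.

```python
import sqlite3

# Hypothetical schema: only the table names come from the post,
# the columns are assumptions for illustration.
SCHEMA = """
CREATE TABLE "User" (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE Story (
    story_id INTEGER PRIMARY KEY,
    domain   TEXT NOT NULL,   -- e.g. the host part of the story URL
    title    TEXT
);
CREATE TABLE Kicked (
    story_id  INTEGER NOT NULL REFERENCES Story(story_id),
    user_id   INTEGER NOT NULL REFERENCES "User"(user_id),
    kick_rank INTEGER NOT NULL,  -- 1 = first kick the story received
    PRIMARY KEY (story_id, user_id)
);
"""


def create_db(path=":memory:"):
    """Create the three tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load(conn, users, stories, kicks):
    """Bulk-insert rows: users=(id, name), stories=(id, domain, title),
    kicks=(story_id, user_id, kick_rank)."""
    conn.executemany('INSERT INTO "User" (user_id, username) VALUES (?, ?)', users)
    conn.executemany("INSERT INTO Story (story_id, domain, title) VALUES (?, ?, ?)", stories)
    conn.executemany("INSERT INTO Kicked (story_id, user_id, kick_rank) VALUES (?, ?, ?)", kicks)
    conn.commit()
```

Storing the kick order on the Kicked row is just one way to do it; a timestamp per kick would work equally well, as long as you can tell which kicks came first.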
Then I placed the kicks into 2 buckets - promotion kicks and supporter kicks. A promotion kick is a kick for a story that helps it land on the front page (so it is one of the story's first 6 kicks). A supporter kick is a kick for a story that has already made it to the DNK front page. Then, to identify a list of suspect user accounts, I broke down the promotion kicks by domain to see which accounts have strong relationships to which domains. If at least 70% of a user's promotion kicks go to a single domain, I mark that user as a candidate for a phony account. Nothing too scientific, but it was interesting to see the relationships that started to appear between accounts and domains.
Upcoming Story: A story that has 6 or fewer kicks
Front Page Story: A story that has more than 6 kicks
Promotion Kick: A kick for an Upcoming Story
Supporter Kick: A kick for a Front Page Story
Phony Account: A DNK account with at least 70% of its promotion kicks going to stories for a single domain
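The definitions above boil down to a pretty small amount of logic. Here is a rough sketch in Python (not the SQL I actually ran) that classifies each kick and flags accounts that cross the 70% threshold. The input format, a list of (user, domain, kick_rank) tuples, is an assumption, and so is the min_promotion_kicks parameter, which is a hypothetical knob for ignoring accounts with only a kick or two. Its default of 1 applies no minimum at all.

```python
from collections import Counter, defaultdict

PROMOTION_CUTOFF = 6   # a story's first 6 kicks count as promotion kicks
SUSPECT_SHARE = 0.70   # share of promotion kicks going to a single domain


def kick_type(kick_rank):
    """Classify a kick by its position among the story's kicks (1-based)."""
    return "promotion" if kick_rank <= PROMOTION_CUTOFF else "supporter"


def suspect_accounts(kicks, min_promotion_kicks=1):
    """Return {user: (top_domain, share)} for accounts over the threshold.

    kicks: iterable of (user, domain, kick_rank) tuples (assumed format).
    min_promotion_kicks: hypothetical filter, not part of the original rule.
    """
    # Count each user's promotion kicks per domain; supporter kicks are ignored.
    per_user = defaultdict(Counter)
    for user, domain, kick_rank in kicks:
        if kick_type(kick_rank) == "promotion":
            per_user[user][domain] += 1

    suspects = {}
    for user, domains in per_user.items():
        total = sum(domains.values())
        if total < min_promotion_kicks:
            continue
        top_domain, top_count = domains.most_common(1)[0]
        share = top_count / total
        if share >= SUSPECT_SHARE:
            suspects[user] = (top_domain, share)
    return suspects
```

With no minimum, an account with a single promotion kick scores 100% for that domain, so in practice you would probably want to raise min_promotion_kicks before reading much into the output.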
Here are some of the results. I was only interested in seeing if I could identify suspicious accounts and I don't really want to humiliate anyone publicly - so I am not publishing any individual account names or domains. Sorry.
The screenshots below show the results of a handful of queries; the table immediately below describes the values each column holds.
| Column | Description |
|---|---|
| kicked_count | Total number of kicks by the user for the time period |
| promoter_count | Total number of promotion kicks for the time period |
| promoter_kicks | Total number of promotion kicks for the provided domain |
| supporter_kicks | Total number of supporter kicks for the provided domain |
| front_page_promoter_kicks | Number of promoter kicks for the provided domain that contributed to a story making it to the front page |
- 16 users have 100% of their promotion kicks going to a single domain
Check out the account with a kicked_count of 66. This guy has kicked a total of 66 stories; 45 of these kicks are promotion kicks and the remaining 21 are supporter kicks. But ALL 66 of this guy's kicks are for a single domain. He has never kicked a story for another domain.
- 17 users have between 80% and 99% of their promotion kicks going to a single domain.
Same stuff here, just a little bit more variance in the number of domains the user is supporting.
- 8 users have between 70% and 79% of their promotion kicks going to a single domain
- These 41 users have a strong relationship to 17 domains.
It's tough to show here because I have all of the domains hard-coded to xxxx.com, but if I group by the domain and sum up the measures, you will see that these 41 different user accounts are only kicking stories for 17 different domains.
- A total of 1,197 promotional kicks are represented by these 41 users.
That is roughly 13.5% of all promotional kicks over the observed time period.
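The roll-up behind these numbers is simple enough to sketch. The Python below is an illustration, not the queries I ran: it assumes you already have one row per flagged account as a (user, top_domain, share, promotion_kicks) tuple, and both that format and the function names are made up for the example.

```python
from collections import defaultdict


def bucket(share):
    """Map a single-domain share (0.0 to 1.0) to the ranges used above."""
    if share >= 1.0:
        return "100%"
    if share >= 0.80:
        return "80-99%"
    if share >= 0.70:
        return "70-79%"
    return None  # below the threshold, not a suspect


def summarize(accounts, all_promotion_kicks):
    """accounts: iterable of (user, top_domain, share, promotion_kicks)."""
    buckets = defaultdict(int)
    domains = set()
    flagged = 0
    kicks = 0
    for _user, domain, share, promotion_kicks in accounts:
        label = bucket(share)
        if label is None:
            continue
        buckets[label] += 1
        domains.add(domain)
        flagged += 1
        kicks += promotion_kicks
    return {
        "buckets": dict(buckets),
        "accounts": flagged,
        "domains": len(domains),
        "promotion_kicks": kicks,
        "share_of_all": kicks / all_promotion_kicks,
    }
```

Working backwards from the figures above, 1,197 kicks at roughly 13.5% implies something on the order of 8,900 promotion kicks overall, though that total is an inference and not a number I pulled from the data.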
There is an incentive for publishers that use DNK to set up phony accounts that will help get stories from certain domains onto the front page. So guess what happens ... it would appear that a certain number of people are doing this.
That's it. Enjoy!