Incentives, Cheating Teachers and Suspicious DNK Accounts

Last week Steve Levitt was in Jacksonville and gave a lecture of sorts at a local university.  It was really good and, like his book Freakonomics, pretty darn thought-provoking.  Steve's lecture consisted mostly of him telling really interesting stories, stopping to point out some of the economic principles along the way.  He only briefly touched on it in the lecture, but his book has a chapter where he talks about a group of teachers who were caught cheating in Chicago's public school system.  Here is the abstract for a paper Steve published on the subject ...

We develop an algorithm for detecting teacher cheating that combines information on unexpected test score fluctuations and suspicious patterns of answers for students in a classroom. Using data from the Chicago Public Schools, we estimate that serious cases of teacher or administrator cheating on standardized tests occur in a minimum of 4-5 percent of elementary school classrooms annually. Moreover, the observed frequency of cheating appears to respond strongly to relatively minor changes in incentives. Our results highlight the fact that incentive systems, especially those with bright line rules, often induce behavioral distortions such as cheating. Statistical analysis, however, may provide a means of detecting illicit acts, despite the best attempts of perpetrators to keep them clandestine. - Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating

I thought it was an interesting example of how incentives can affect behavior.  Teachers had a financial incentive to cheat - the better the students scored, the bigger the teacher's paycheck.  And guess what ended up happening - some teachers cheated.  Steve uncovered a pattern in the standardized test data that he used to identify the teachers (the cheating teachers would answer the last 10 questions or so for students who couldn't answer all of the questions in the given time period).

 

Identifying Suspicious DotNetKicks Accounts

I know it's not quite as sensational as catching cheating teachers, but I thought it might be interesting to use a similar concept to identify phony DotNetKicks accounts (DNK is a lot like digg and reddit, except it focuses on .NET topics and there aren't any down mods).  Getting 'kicked' to the DNK homepage can result in a nice boost to traffic as well as a quality incoming link, so there is clearly an incentive to get your posts on the front page.  And it only takes 6 kicks total, so you wouldn't really need to set up too many fake accounts to do this.  I figure there must be at least one person out there who has set up a handful of fake accounts and uses them to kick their stories to the homepage.  So what the heck, why not try to identify them.

So I used DNK's JsonServices to extract the last ~1,500 front page and upcoming page stories (dating back to the end of March 2008).  Then I compiled a list of the users who kicked these stories and moved all of this data into 3 tables in my local SQL Server database: User, Story and Kicked.
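For anyone who wants to try a similar setup, a minimal sketch of the three tables might look like the following.  It uses Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere, and every column name (user_id, story_id, url, domain, kick_rank) is an assumption; the post only names the tables.

```python
import sqlite3

# Hypothetical schema: the post names the tables (User, Story, Kicked)
# but not their columns, so every column below is an assumption.
SCHEMA = """
CREATE TABLE User  (user_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Story (story_id INTEGER PRIMARY KEY, url TEXT NOT NULL, domain TEXT NOT NULL);
CREATE TABLE Kicked (
    user_id   INTEGER NOT NULL REFERENCES User(user_id),
    story_id  INTEGER NOT NULL REFERENCES Story(story_id),
    kick_rank INTEGER NOT NULL,  -- 1 for the story's first kick, 2 for the second, ...
    PRIMARY KEY (user_id, story_id)
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database holding the three empty tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Storing the order in which each kick arrived (kick_rank here) is what makes it possible to tell a story's first 6 kicks apart from the later ones.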

image

 

Then I placed the kicks into 2 buckets - promotion kicks and supporter kicks.  A promotion kick is a kick that helps a story land on the front page (so it is one of the story's first 6 kicks).  A supporter kick is a kick for a story that has already made it to the DNK front page.  Then, to identify a list of suspect user accounts, I broke down the promotion kicks by domain to see which accounts have strong relationships to which domains.  If at least 70% of a user's promotion kicks go to a single domain, I mark the account as a candidate phony account.  Nothing too scientific, but it was interesting to see the relationships that started to appear between accounts and domains.
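The bucketing and the 70% rule described above can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not the queries actually used for the post: the Kick record, its kick_rank field and the suspects() helper are all hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple

PROMOTION_KICKS = 6   # from the post: a story's first 6 kicks are promotion kicks
SUSPECT_PERCENT = 70  # from the post: 70% of promotion kicks going to one domain


class Kick(NamedTuple):
    user: str
    domain: str
    kick_rank: int  # hypothetical field: 1 = the story's first kick


def is_promotion(kick: Kick) -> bool:
    """A kick counts as a promotion kick if it is among the story's first 6."""
    return kick.kick_rank <= PROMOTION_KICKS


def suspects(kicks: Iterable[Kick]) -> Dict[str, Tuple[str, float]]:
    """Return {user: (top_domain, share)} for users at or above the threshold."""
    per_user = defaultdict(Counter)
    for kick in kicks:
        if is_promotion(kick):  # supporter kicks are ignored for this rule
            per_user[kick.user][kick.domain] += 1

    flagged = {}
    for user, domains in per_user.items():
        top_domain, top = domains.most_common(1)[0]
        total = sum(domains.values())
        # Integer comparison avoids floating point surprises at exactly 70%.
        if top * 100 >= SUSPECT_PERCENT * total:
            flagged[user] = (top_domain, top / total)
    return flagged
```

A user with 7 promotion kicks for one domain and 3 for another sits right at the 70% cut-off and is flagged; a user who only ever casts supporter kicks never shows up at all.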

 

Definitions

Upcoming Story: A story that has 6 or fewer kicks

Front Page Story: A story that has more than 6 kicks

Promotion Kick: A kick for an Upcoming Story

Supporter Kick: A kick for a Front Page Story

Phony Account: A DNK account with 70% or more of its promotion kicks going to stories from a single domain

 

Results

Here are some of the results.  I was only interested in seeing if I could identify suspicious accounts and I don't really want to humiliate anyone publicly - so I am not publishing any individual account names or domains.  Sorry.

The screenshots below show the results for a handful of queries.  The table immediately below describes the values the columns hold.

Column: Description
kicked_count: Total number of kicks by the user for the time period
promoter_count: Total number of promotion kicks by the user for the time period
promoter_kicks: Total number of promotion kicks for the provided domain
supporter_kicks: Total number of supporter kicks for the provided domain
front_page_promoter_kicks: Number of promotion kicks for the provided domain that contributed to a story making it to the front page
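To make the column definitions concrete, here is a rough sketch of how those five measures could be computed per user and domain.  The input layout (a tuple of user, domain, kick_rank and a made_front_page flag) and the domain_report() name are assumptions made for illustration; the real numbers in the screenshots came from SQL queries that are not shown in the post.

```python
from collections import defaultdict


def domain_report(kicks, promotion_limit=6):
    """Build the report columns for every (user, domain) pair.

    kicks: iterable of (user, domain, kick_rank, made_front_page) tuples,
    a hypothetical layout where kick_rank 1 is the story's first kick.
    """
    totals = defaultdict(lambda: {"kicked_count": 0, "promoter_count": 0})
    per_domain = defaultdict(lambda: {
        "promoter_kicks": 0,
        "supporter_kicks": 0,
        "front_page_promoter_kicks": 0,
    })

    for user, domain, kick_rank, made_front_page in kicks:
        totals[user]["kicked_count"] += 1
        row = per_domain[(user, domain)]
        if kick_rank <= promotion_limit:      # promotion kick
            totals[user]["promoter_count"] += 1
            row["promoter_kicks"] += 1
            if made_front_page:               # the story did reach the front page
                row["front_page_promoter_kicks"] += 1
        else:                                 # supporter kick
            row["supporter_kicks"] += 1

    # Merge the per-user totals into each (user, domain) row.
    return {key: {**totals[key[0]], **row} for key, row in per_domain.items()}
```

The first two columns are per-user totals and repeat on every row for that user, while the last three are specific to one user and one domain.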

 

  • 16 users have 100% of their promotion kicks going to a single domain 

Check out the account with a kicked_count of 66.  This guy has kicked a total of 66 stories: 45 of these kicks are promotion kicks and the remaining 21 are supporter kicks.  But ALL 66 of this guy's kicks are for a single domain.  He has never kicked a story for another domain.

image

  • 17 users have between 80% and 99% of their promotion kicks going to a single domain. 

Same stuff here, just a little bit more variance in the number of domains the user is supporting.

image 

  • 8 users have between 70% and 79% of their promotion kicks going to a single domain

image

  • These 41 users have a strong relationship to 17 domains.

It's tough to show here because I have all of the domains hard-coded to xxxx.com, but if I group by the domain and sum up the measures, you will see that these 41 different user accounts are only kicking stories for 17 different domains.

image 

  • A total of 1,197 promotional kicks are represented by these 41 users. 

That is roughly 13.5% of all promotional kicks over the observed time period.
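The tallies above can be sanity-checked with a little arithmetic.  The sketch below is only an illustration: the band() helper and its integer cut-offs are assumptions about how the percentages were bucketed, and the implied total is approximate because the 13.5% figure is itself rounded.

```python
from typing import Optional


def band(top_domain_kicks: int, promotion_kicks: int) -> Optional[str]:
    """Bucket a user by the share of promotion kicks going to their top domain."""
    if promotion_kicks == 0:
        return None
    if top_domain_kicks == promotion_kicks:
        return "100%"
    if top_domain_kicks * 100 >= 80 * promotion_kicks:
        return "80-99%"
    if top_domain_kicks * 100 >= 70 * promotion_kicks:
        return "70-79%"   # shares between 79% and 80% also land here
    return None           # below the 70% threshold, not flagged


# Figures quoted in the results above.
USERS_PER_BAND = {"100%": 16, "80-99%": 17, "70-79%": 8}
FLAGGED_PROMOTION_KICKS = 1197
FLAGGED_SHARE = 0.135  # "roughly 13.5%"

flagged_users = sum(USERS_PER_BAND.values())
implied_total = round(FLAGGED_PROMOTION_KICKS / FLAGGED_SHARE)

print(flagged_users)   # 41
print(implied_total)   # 8867
```

So the three bands add up to the 41 users mentioned above, and the data set holds somewhere in the neighborhood of 8,900 promotion kicks in total.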

 

Conclusion

There is an incentive for publishers that use DNK to set up phony accounts that will help get stories from certain domains on the front page.  So guess what happens ... it would appear that a certain number of people are doing this.

 

That's it.  Enjoy!


Comments


Posted by: Kirk Clawson on November 5, 2008 07:52 PM

Interesting. Did you strip the domains of any further URL info before this analysis? There are obviously community sites (asp.net for instance) that have hundreds if not thousands of blogs. If I kicked ten of those blogs would that show up in your analysis as ten kicks for the same domain?

Interesting read

Posted by: Dave Ward

I bet my girlfriend is in that list.

Posted by: Seth Petry-Johnson on November 6, 2008 08:10 AM

I thought this was a thought-provoking and well-written post.

Of course, some people might create multiple accounts to not only kick up their own content, but also that of their favorite bloggers. If those people favor a handful of domains, rather than just their own, they won't show up in your analysis. But that's probably outside the scope of what you were trying to do in a single blog post...

My only question is, how many DNK accounts are you going to create to kick this up to the homepage? :)

Posted by: Jason on November 6, 2008 09:02 AM

Very interesting spike. Have you shared this with DNK directly? It would be in their best interest to reduce self promotion.

Posted by: Scott

Nice article. Now would you like to be kicked? haha.

Posted by: Mike Hall

I admit to having multiple accounts on DotNetKicks. The trick isn't that you only have to use one account, it's that you don't multi-kick from each account. So no matter how many accounts I use, I only kick a story *once*. It's called "scruples", some have them, some don't.

http://define.com/scruples

Posted by: Simone

I guess some of my colleagues are on the list because they probably kick only my posts... not phony accounts tho.

Posted by: PeterK on November 6, 2008 12:04 PM

Source code please :)

Posted by: John S.

If I recall correctly, 6 kicks is not a hard and fast number. The "score" is based on kicks + comments, which may throw your analysis off.
