Landing: Athabasca University

Handful of “highly toxic” Wikipedia editors cause 9% of abuse on the site | Ars Technica

This article reports on a very interesting study (linked from the article) of aggressive comments on Wikipedia pages. The study found that fewer than half of all personal attacks on the site come from anonymous users. Conventional wisdom is right that anonymous comments are six times more likely to be attacks than those from identified users, yet over half the attacks came from logged-in users. In fact, 30% came from registered users who had made over 100 contributions to the site, and 9% came from just 34 vocal and aggressive users.

Wikipedia is about collections of people inhabiting a shared virtual space, bound together by little more than shared interests: what Terry Anderson and I describe as sets, as opposed to networks or groups. Unlike in a social network, it is the subject rather than the people that attracts participants, although this varies a little from page to page. Some pages are collaboratively edited by identifiable groups, a few drive, or are driven by, social ties, and some people just like editing Wikipedia in general. It is a complex social site with many layers, motives, and sub-communities, so many patterns are at play, but the set is the dominant social form. Wikipedia's highly structured processes play some of the role of conventional group rules and structures in bringing order to this, especially in guiding people towards consistent and reliable outcomes, but networked social ties and commitments to anything group-like are generally very weak, whether or not people are identifiable.

Although people's reasons for engaging with Wikipedia are diverse and multi-faceted, common sense suggests that most of those who edit pages (especially those who contribute a lot) probably feel quite strongly about their topics. A combination of strong feelings and weak social ties is something of a recipe for anger when disagreements arise about the content of a page. As a follow-up to this study, it would be interesting to see whether there is a correlation between how much someone edits a page and their tendency to attack others. I'm guessing there might be. Of course, there will be many other factors to consider: the study reveals that attacks are highly clustered, for instance. An attacking comment is 22 times more likely to occur near another attacking comment.

The study was performed as a benchmark to test the reliability of an algorithm to detect attacks, and to suggest ways it might be improved. The model used turned out to be pretty good, though I hope that it won't lead to a tool that automatically takes action without human intervention: like many analytics tools, this kind of system is useful when used to informate, not to automate.

The methodology is interesting in itself. The researchers used CrowdFlower to outsource the coding of Wikipedia comments to 4,000 workers, so that each of the 100,000 comments in the study could be annotated by 10 different people. The coding is therefore likely to be highly reliable. As the article notes, Wikipedia makes all talk-page comments from 2004-2015 available via Figshare, so anyone could perform analyses to extend or test this work, or to look for other patterns.


  • Daryl Campbell February 15, 2017 - 12:57pm

    Hi Dr. Dron,

    Interesting topic. Your comment about having reservations if this were automated reminds me of Linus Torvalds's concerns when his lieutenants wanted to use automated code merges. He resisted for a while and then wrote Git. It's now commonplace to trust code merges with a high degree of confidence.

    What concerns you about an automated system for detecting and responding to attacks when there are known patterns? Assume, of course, that these changes still appear in the wiki history and that there is some way to appeal. (I'm not enticing you to write this tool though ;-) )


  • Jon Dron February 15, 2017 - 1:54pm

    Machines might be very good at *identifying* problems, and that's great, but humans are needed to react to and deal with them: there are infinitely many ways to do that, and always countless opportunities to heal rifts and make things positive again. It's about humans socializing with humans, and the smartest AI in the world does not yet (and likely never will) know what it is like to be a human, so it will not be able to respond creatively or appropriately to that unique social context.