A couple of months ago, I left my job as a math professor and joined Insight Data Science in order to start a new career in data science. Why on earth would I do that, you ask? I enjoyed my job as a professor, and I was even (arguably) good at it. I got to do fun things with students. The truth is simply that I wanted to work on real problems with good people, and Insight has been so much fun that I am sad it is now coming to a close (though I am excited to see where I will land next).
Our first task at Insight was to select a project. There were some consulting projects available, but I ended up deciding to strike out on my own and try to better understand trolls on Twitter. I’ve been a Twitter user for several years, and I am well aware of the problems that trolls cause for users of Twitter and other social media.
I decided I wanted to take a machine learning approach to finding trolls. My first stab at the problem was to use shared block lists. Twitter allows users to block other users, and in the past couple of years, Twitter has started allowing users to share these block lists. There is one service helping with this sharing called @blocktogether. So I collected multiple blocklists (as a result, I currently block 250K users), and my hope was that I could use these to curate a list of trolls. I was able to identify a number of interesting things about blocked users vs non-blocked users. For instance, blocked users tweeted less per day on average but had more followers, so apparently we are not following the rule of “don’t feed the trolls.”
A few things got in the way. I found that @blocktogether capped the number of users it would allow me to download from each list. But even worse, I began to suspect that even without my download problems, I would still struggle to find a sufficient number of users appearing on large numbers of blocklists (Twitter is a very big place, after all). I would also have no way of knowing whether a user appeared on multiple blocklists because users independently decided to block them, or simply because the blocklists were being shared.
So, I decided I had to come up with my own definition of trolling and use a rules-based approach to identify trolls. My first stab was to define trolling as repeatedly mentioning a user in tweets with negative sentiment. I identified these trolling tweets and built a machine learning model to predict which users were trolls. The model did fine, but as I looked through the users it flagged, I realized I was doing a good job of finding arguments on Twitter, but a less good job of finding users I would truly consider trolls.
As I thought about the problem, I landed on one older example of trolling that helped me change my perspective. This was the case of Robin Williams’s daughter, who posted a tribute to her father and was set upon by a pair of trolls telling her things like “I hope you choke and have to scream out for help but no one helps you and they just watch you choke.” I looked back on other examples I had of real trolling and realized that most of them involved saying “you”: these were personal attacks that needed the second-person pronoun. At that point, I changed my criteria for trolling: at least two mentions of the same person, negative sentiment, and the word “you” (or similar words like “your”).
With these criteria, I was able to find a number of users that I consider trolls. These were users on my blocklist who also engaged in trolling behavior in their last 200 tweets. That is, they mentioned a specific user at least twice, used “you” language in those tweets, and those tweets had negative sentiment (I used the VADER Python package to score sentiment). Out of about 10K users on blocklists, 44% had trolled by this definition. I also needed non-troll “human” tweets, so I grabbed a random collection of people who were on positive lists (so that they were about as engaged as the trolls) using the Twitter API. It turns out that about 7% of those had engaged in trolling behavior, so I threw those users out.
From these categorized users, I collected tweets without mentions (since I had used the mentions to determine whether they were trolls or not). I split those tweets into a training and a test set so that I could see whether I could predict the trolls from their no-mention tweets. I turned the tweets into a bag of words and bigrams, vectorized them using TF-IDF, and fed the vectors into a logistic regression model. My model was a pretty good predictor, and it gave me the collection of words that were most predictive, which was illuminating. Basically, trolls are talking about Gamergate (still!) and politics.
Now, here are my words of caution. I need to update this model regularly, because the topics trolls are talking about are going to change, so my model needs to change to catch today’s trolls. I also still need to refine the model. It looks for negative sentiment, but saying “I’m really angry at you” isn’t trolling; it’s expressing a negative feeling. I need to search for hate speech, and I have not yet implemented this in the model. I would also love to run some tests to see how little text I can use and still identify a troll. What if I just grabbed a couple of tweets plus the user’s description? Is that enough? And finally, if I wanted to really deploy this model to catch trolls, I would need to be sure that it errs on the side of avoiding false positives in order to protect the free speech of Twitter users. I have not explored different decision thresholds for this model (although note that it has a nice ROC curve).
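To illustrate what exploring thresholds could look like, here is a small sketch with made-up labels and scores (not my model’s actual output): sweep the ROC curve and keep the most permissive threshold whose false positive rate stays under a chosen cap, so that few innocent users get flagged.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up held-out labels (1 = troll) and classifier scores, for illustration.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.10, 0.20, 0.15, 0.30, 0.45, 0.70, 0.55, 0.80, 0.90, 0.85])

fpr, tpr, thresholds = roc_curve(y_true, scores)

# Among operating points where the false positive rate (innocent users
# flagged as trolls) is at most 10%, take the one catching the most trolls.
ok = fpr <= 0.10
best_threshold = thresholds[ok][np.argmax(tpr[ok])]
```

The 10% cap is arbitrary here; the point is that the cap on false positives, not overall accuracy, is what a deployed troll-catcher should be tuned to.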