Twitter truth-telling

Slate reports on new research that points to the potential of machine-aided Twitter reading, that is, initial vetting of tweets for veracity based on certain features:

A 2010 paper from Yahoo Research analyzed tweets from that year’s 8.8 Chile earthquake and found that legitimate news—such as word that the Santiago airport had closed, that a supermarket in Concepcion was being looted, and that a tsunami had hit the coastal town of Iloca—propagated on Twitter differently than falsehoods, like the rumor that singer Ricardo Arjona had died or that a tsunami warning had been issued for Valparaiso. One key difference might sound obvious but is still quite useful: The false rumors were far more likely to be tweeted along with a question mark or some other indication of doubt or denial.

Building on that work, the authors of the 2010 study developed a machine-learning classifier that uses 16 features to assess the credibility of newsworthy tweets. Among the features that make information more credible:

– Tweets about it tend to be longer and include URLs.

– People tweeting it have higher follower counts.

– Tweets about it are negative rather than positive in tone.

– Tweets about it do not include question marks, exclamation marks, or first- or third-person pronouns.
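To make the list above concrete, here is a rough Python sketch of how a few of these signals could be pulled out of a single tweet. It is not the researchers' classifier: the function name `extract_features`, the pronoun lists, and the URL pattern are all assumptions made for illustration. It covers only the surface features named above and leaves out tone, which would need a separate sentiment model.

```python
import re

# Hypothetical word lists; the excerpt does not give the study's actual lists.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
THIRD_PERSON = {"he", "him", "his", "she", "her", "hers",
                "they", "them", "their", "theirs"}

# Simple assumed pattern for links; real tweets may need something sturdier.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_features(text, follower_count):
    """Return a dict of surface features for one tweet (illustrative only)."""
    # Strip URLs first so a '?' inside a query string is not read as doubt.
    stripped = URL_PATTERN.sub(" ", text)
    words = re.findall(r"[a-z]+", stripped.lower())
    return {
        "length": len(text),
        "has_url": bool(URL_PATTERN.search(text)),
        "follower_count": follower_count,
        "has_question_mark": "?" in stripped,
        "has_exclamation_mark": "!" in stripped,
        "has_first_person": any(w in FIRST_PERSON for w in words),
        "has_third_person": any(w in THIRD_PERSON for w in words),
    }


if __name__ == "__main__":
    print(extract_features("Santiago airport closed http://example.com/news", 5000))
    print(extract_features("Is it true he died?!", 12))
```

A real classifier would feed a dictionary like this, for many tweets about the same story, into a trained model. This sketch stops at feature extraction and makes no claim about how the study weighted each signal.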

Several of those findings were echoed in another recent study from researchers at India’s Institute of Information Technology who also found that credible tweets are less likely to contain swear words and significantly more likely to contain frowny emoticons than smiley faces.

But won’t many chronic Twitter liars simply absorb these lessons and tailor their tweets to trick the new algorithm? (Say that last part five times fast.)