Tuesday, 31 August 2010

Capturing Influence in Twitter

One of the most interesting questions in social networks is who influences whom. This question obviously also applies to Twitter. Politicians, activists, companies, celebrities, religious leaders or advertisers might use Twitter to spread short messages to the world in order to influence people in one way or another. Hence, one becomes curious about who has a lot of influence in the world of Twitter. Twinfluence.com is one well-known example of an application that tries to measure the influence of Twitter users. By now, researchers in IR and the social sciences are also looking at this field and have made some interesting observations.

But before going into the analysis of who influences whom on Twitter, it is necessary to address the question of what influence actually means. Cha et al. [1] provide a definition taken from the Merriam-Webster dictionary: "influence is the power or capacity of causing an effect in indirect or intangible ways". Leavitt et al. [2] define influence in the context of Twitter "as the potential of an action of a user to initiate a further action by another user". These definitions are still quite vague, but Cosley et al. [3] state that there actually is no crisp definition of the concept. They consider the adoption of behaviour as an observable result of influence between users. However, they looked at Wikipedia as a social network of editors contributing to articles, not at Twitter. For Twitter it would be more difficult to define behaviours.

So, what did these studies look at, and how do they eventually define and measure influence?

The concrete definition of influence of Leavitt et al. [2] is based on actions. For Twitter they defined actions based on the tweets and came up with four categories: retweets (of the form: RT @username), replies (@username at the beginning), mentions (@username in the middle) and attributions (via @username in the middle). The authors also provide a further classification for the first two actions: retweets are more content-oriented actions, while replies are conversation-oriented. This classification follows a line of thought that divides users into more content-oriented and more conversation-oriented users. Content-oriented users spread their messages, while conversation-oriented users employ Twitter to communicate with their followers.
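The four categories can be approximated with a few regular expressions. The following is only a rough sketch of such a classifier, not the code used by Leavitt et al.; the function name classify_action, the exact patterns and the extra "original" label for tweets without any @username are my own assumptions.

```python
import re

# Assumed pattern for a Twitter handle: "@" followed by word characters.
USERNAME = r"@\w+"


def classify_action(tweet: str) -> str:
    """Assign a tweet to one of the four action types (hypothetical helper)."""
    text = tweet.strip()
    # Retweet: "RT @username" anywhere in the tweet.
    if re.search(r"\bRT\s+" + USERNAME, text):
        return "retweet"
    # Attribution: "via @username" anywhere in the tweet.
    if re.search(r"\bvia\s+" + USERNAME, text, flags=re.IGNORECASE):
        return "attribution"
    # Reply: the tweet starts with "@username".
    if re.match(USERNAME, text):
        return "reply"
    # Mention: "@username" somewhere else in the tweet.
    if re.search(USERNAME, text):
        return "mention"
    # No reference to another user at all (label is an assumption).
    return "original"


for example in ["RT @alice great post", "@bob thanks!", "reading @carol today"]:
    print(example, "->", classify_action(example))
```

The order of the checks matters: a tweet such as "@bob RT @alice ..." is counted as a retweet here, which is one possible convention among several.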

Using these types of actions, Leavitt et al. examined which of them occur among the followers of twelve popular Twitter users. For instance, they looked at the ratio of retweets, replies and mentions among all reactions to a particular user, and at how often followers reacted to the tweets of the popular users with a content-centric response (retweet) or a conversational response (reply). The numbers were put in relation to the total number of original tweets posted by the observed users. As a result, they observed that some Twitter users who are highly involved in social media create more conversation with their followers. Others, mainly celebrities, have more passive followers who rarely reply to or retweet their messages.
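To make the two kinds of ratios concrete, here is a small sketch under my own assumptions: the function reaction_profile and the example counts are invented for illustration and are not taken from the report.

```python
def reaction_profile(retweets, replies, mentions, original_tweets):
    """Return (share of each reaction type, reactions per original tweet)."""
    counts = {"retweet": retweets, "reply": replies, "mention": mentions}
    total = sum(counts.values())
    # Share of each action type among all reactions to the user.
    shares = {k: (v / total if total else 0.0) for k, v in counts.items()}
    # Reactions of each type relative to the user's own original tweets.
    per_tweet = {
        k: (v / original_tweets if original_tweets else 0.0)
        for k, v in counts.items()
    }
    return shares, per_tweet


# Made-up counts for one observed user.
shares, per_tweet = reaction_profile(
    retweets=50, replies=30, mentions=20, original_tweets=10
)
print(shares)
print(per_tweet)
```

A user whose reply share dominates would fall on the conversational side, while a high retweet share points to a content-oriented audience.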

Cha et al. [1] instead compared retweets (of both forms: RT @username and via @username), mentions and the indegree of users. They state that the indegree (that is, the number of followers) measures a user's popularity, the number of retweets captures the ability to generate content that gets passed along, and the number of mentions captures the ability to engage others in discussion and conversation. An analysis with Spearman's rank correlation revealed that indegree is not related to the other two factors, while retweets and mentions are more strongly correlated with each other. Cha et al. also analyzed the numbers of retweets and mentions for three selected topics and found that influence differs depending on the topic.
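For readers who have not used Spearman's rank correlation: it is the Pearson correlation computed on the ranks of the values rather than on the values themselves. The sketch below is a plain-Python illustration with invented numbers for five hypothetical users; it is not the analysis code or data of Cha et al., and the helper names are my own.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented counts for five hypothetical users.
followers = [1000000, 50000, 20000, 5000, 800]
retweets = [120, 900, 40, 300, 15]
mentions = [200, 1100, 60, 250, 10]
print("followers vs retweets:", spearman(followers, retweets))
print("retweets vs mentions: ", spearman(retweets, mentions))
```

Because only ranks enter the calculation, a single account with a huge follower count does not dominate the result the way it would with a correlation on raw counts.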

Romero et al. [4,5] incorporate the passivity of users in the calculation of influence. Passive users merely consume incoming messages, but do not propagate information to the network. This means the influence of a user is determined not only by the number of followers, but also by the willingness of the followers to pass messages on.
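The intuition can be illustrated with a deliberately simplified calculation. This is not the algorithm of Romero et al.; it is a toy weighting of my own in which each follower counts with the fraction of received messages they forward. The function names and all numbers are assumptions.

```python
def forwarding_rate(received, forwarded):
    """Fraction of received messages a follower passes on (0.0 if none received)."""
    return forwarded / received if received else 0.0


def weighted_audience(followers):
    """Sum of forwarding rates over a list of (received, forwarded) pairs.

    A fully passive follower contributes 0, a follower who forwards
    everything contributes 1.
    """
    return sum(forwarding_rate(r, f) for r, f in followers)


# Two made-up users: many passive followers versus few active ones.
celebrity = [(100, 0), (100, 0), (100, 1), (100, 0)]
blogger = [(100, 50), (100, 40)]
print("celebrity:", weighted_audience(celebrity))
print("blogger:  ", weighted_audience(blogger))
```

In this toy example the user with only two followers ends up with the larger weighted audience, because those followers actually pass messages on.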

In general, the research on influence in Twitter focuses on users and their social networks. The tweets themselves are merely considered as types of actions that reflect the influence a user has over their followers. This seems suitable for grasping the overall influence of a user, but not the influence of the tweets themselves.


[1] Measuring User Influence in Twitter: The Million Follower Fallacy
Meeyoung Cha, Hamed Haddadi, Fabrício Benevenuto, Krishna P. Gummadi, AAAI Conference on Weblogs and Social Media (ICWSM), 2010

[2] The Influentials: New Approaches for Analyzing Influence on Twitter
Alex Leavitt, Evan Burchard, David Fisher, Sam Gilbert, Web Ecology Project, 2 September 2009

[3] Sequential Influence Models in Social Networks
Dan Cosley, Daniel Huttenlocher, Jon Kleinberg, Xiangyang Lan, Siddharth Suri, AAAI Conference on Weblogs and Social Media (ICWSM), 2010

[4] Ethan Bauley, What makes a tweet influential? New HP Labs social media research may provide answers http://h30507.www3.hp.com/t5/Data-Central/What-makes-a-tweet-influential-New-HP-Labs-social-media-research/ba-p/81855 (20 August 2010)

[5] Daniel Romero, Wojciech Galuba, Sitaram Asur, Bernardo Huberman, Influence and Passivity in Social Media, 2010

Friday, 6 August 2010

Twitter Corpora

Applying IR techniques to Twitter is very much in fashion right now. I can see several reasons why:
  1. Twitter is a successful phenomenon of the Web (2.0). Millions of users write and read tweets. Hence, as there are lots of messages and lots of users, there is a need to manage the information in Twitter.
  2. Tweets are challenging for text IR. First of all, they are short, so the terms in a tweet are extremely sparse. Second, the messages are not formulated as well as classical documents. The scope of the medium, the length restrictions and the typically non-professional writers create a mixture of abbreviations, misspellings and slang expressions.
  3. Time plays an important role. Tweets typically report on something that is happening now. Older tweets are outdated to such an extent that one could consider them irrelevant.
For research purposes, there are a few Twitter corpora available on the web. One is the corpus collected by Munmun De Choudhury. It contains tweets, information about users and the social graph of following relations. Another well-known data collection from Twitter is the quite large social graph compiled by Haewoon Kwak. However, the latter does not contain any messages.

One thing I found missing was a corpus with messages of closely connected users. The collection of Munmun has several users and their tweets, but the connections between those users are not very dense. Only a few users have more than ten followers listed.

Analysing the graph and individual messages is interesting. But the network of messages will probably be even more fascinating.