National Science Foundation (NSF)

Project Team Members:

Northwestern University - EECS Dept.

Understanding, Analyzing, and Retrieving Knowledge from Social Media


Social media has become one of the most popular ways for users to communicate and share their interests without being in the same geographical location. With the rapid growth of social media sites such as Facebook, LinkedIn, and Twitter, customer reviews (e.g. Amazon, CNET), and blogs (e.g. WSJ Blogs), there is a vast amount of user-generated content. Our goal is to retrieve valuable nuggets of knowledge from this huge amount of data and help users make informed decisions. To achieve this goal, the specific objectives of the project are as follows.

Our current progress on this project falls into two broad categories.

Community Mining

The rapid evolution of modern social networks motivates the design and understanding of networks based on users' interests. Using popular social media such as Facebook and Twitter, we present a new perspective that brings out more meaningful information about these networks. Instead of using the traditional user networks of Facebook and Twitter (friend and follower links), we deduce user-interest-based networks from posts, comments, and tweets. Our approach closely captures the relations found in static networks and also finds affiliations that are constantly evolving due to temporal or spatial activities. Further, we develop a new approach for mining communities in order to understand and analyze the structure of social networks. Together, our user-interest-based model and community extraction algorithm can be used to identify target communities in the context of business requirements. Figure 1 shows several such focused communities, which belong to categories such as Technology (c231), Consumer Merchandise (c232), Retail (c243), Travel and Leisure (c244), Food (c248), and Baby Products (c251). We found many more such focused communities. Note that existing approaches place most of these focused communities in one large community, which does not reflect the structure of the network.

Figure 1. Partial dendrogram showing communities in Facebook.
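The idea of deducing a network from what users write, rather than from their friend or follower links, can be illustrated with a minimal sketch. This is not the project's model: the function name interest_network, the stop-word list, and the min_shared threshold are all hypothetical choices made only for illustration. It links two users whenever their posts share enough terms.

```python
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "and", "to", "of", "is", "in", "my", "i", "for"}


def interest_network(posts, min_shared=2):
    """Build a toy interest-based network.

    posts: dict mapping a user name to a list of text strings.
    Returns a dict mapping (user_a, user_b) to the number of shared terms,
    keeping only pairs that share at least min_shared terms.
    """
    terms = {}
    for user, texts in posts.items():
        words = set()
        for text in texts:
            for word in text.lower().split():
                word = word.strip(".,!?#@\"'")
                if word and word not in STOP_WORDS:
                    words.add(word)
        terms[user] = words

    edges = {}
    for u, v in combinations(sorted(terms), 2):
        shared = terms[u] & terms[v]
        if len(shared) >= min_shared:
            edges[(u, v)] = len(shared)
    return edges


posts = {
    "ann": ["Love my new phone camera"],
    "bob": ["phone camera review"],
    "cat": ["best pasta recipe"],
}
edges = interest_network(posts)
print(edges)
```

In this toy data, ann and bob are linked because both write about "phone" and "camera", while cat stays unconnected. A realistic system would likely need better term extraction and weighting than plain word overlap.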

In addition, we explore newer methods for extracting communities from a given network. In many real networks, vertices may belong to more than one group, and such groups form overlapping communities. Social networks are the classic example: an individual usually belongs to several circles at the same time, from work colleagues to family, sports associations, and so on. Finding such overlapping communities is a challenging problem and is not supported by traditional community detection algorithms. We devised a heuristic algorithm based on hierarchical pruning for finding the maximum clique of a given graph. The details of the maximum clique algorithm and its source code are available online. We are currently working on extending the clique algorithm to identify communities in massive networks. Building on the clique algorithm, we also devised a clique-based community detection algorithm that is capable of finding overlapping communities. Figure 2 shows some of the communities detected. We see two isolated communities, one for popular singers and another for retail chains and products. We also see a community for news channels and politics, and a community of MSNBC and popular TV shows. The highlight of our algorithm is that it allows a node to be a member of more than one community, giving an overlapping community structure. Although the "news channels and politics" and "MSNBC and TV shows" communities are not directly related and have different members, they share a common member.

Figure 2. Some Facebook communities detected by our clique-based community finder.
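To make the maximum clique problem and the role of pruning concrete, here is a minimal sketch of a generic branch-and-bound search. It is a textbook-style illustration, not the project's hierarchical pruning heuristic, and the names build_adj and max_clique are hypothetical. The sketch skips any vertex or branch that cannot possibly lead to a clique larger than the best one found so far.

```python
def build_adj(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def max_clique(adj):
    """Return one maximum clique of the graph as a sorted list of nodes."""
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        # Visit high-degree candidates first; they are more likely to help.
        for v in sorted(candidates, key=lambda n: len(adj[n]), reverse=True):
            # Prune: even adding every remaining candidate cannot beat best.
            if len(clique) + len(candidates) <= len(best):
                return
            candidates = candidates - {v}
            clique.append(v)
            expand(clique, candidates & adj[v])
            clique.pop()

    for v in sorted(adj, key=lambda n: len(adj[n]), reverse=True):
        # Prune: a vertex of degree d lies in a clique of size at most d + 1.
        if len(adj[v]) + 1 <= len(best):
            continue
        expand([v], set(adj[v]))
    return sorted(best)


# Toy graph: a triangle {a, b, c} and a 4-clique {c, d, e, f} sharing node c.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("c", "d"), ("c", "e"), ("c", "f"),
    ("d", "e"), ("d", "f"), ("e", "f"),
]
adj = build_adj(edges)
result = max_clique(adj)
print(result)
```

In the toy graph, node c belongs to both the triangle and the 4-clique, which is the kind of shared membership that clique-based methods can expose as overlapping communities. The search above is exact and exponential in the worst case, so it is only suitable for small graphs.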

Text Mining

An enormous number of messages is published on social media sites each day. For example, Twitter processes 230 million tweets (messages of up to 140 characters) a day (twitterstats). This explosion of textual messages can cause information overload. Our goal is to design systems that can analyze and summarize social media content. The current work encompasses two main themes:


Related Work

DiscKNet: Discovering Knowledge from Scientific Research Networks.

Source Code

Fast algorithms to find maximum clique in massive graphs

Sentiment Service (API)

Register for a trial version of the sentiment service (API)


Facebook comments, Tweets, Amazon reviews sample datasets

© 2011 Robert R. McCormick School of Engineering and Applied Science, Northwestern University

Last Updated: 2015-02-22 10:04:49 -0600 (Sun, 22 Feb 2015)