Report ID
2016-02
Report Authors
Theodore Georgiou, Amr El Abbadi, Xifeng Yan
Report Date
Abstract

Automatic analysis and summarization of Online Social Media makes it possible to understand what social media users discuss and what is happening in the real world. Trend Discovery, the extraction of trending topics, is used in marketing campaigns and by companies to identify customer interests and potential new markets. While there is a plethora of techniques to identify trending topics, little attention has been paid to the characteristics of the underlying population that participates in a trend. The users who mention a topic define a multivariate vector of demographics and user characteristics, which can offer insights into the communities, both latent and obvious, that are focused on that topic. We propose a novel algorithmic framework for the efficient and scalable extraction of the combination of user characteristics that defines a community interested in a topic. Potential results include cases like "female residents of Boston, MA, that support the Democratic party, are focused on the activist topic #FreeJustina" or "young adults, living in the US, are focused on topic #NavyYardShooting with a Negative sentiment". Such topics may or may not be significantly popular, but community extraction highlights the importance of, and focused interest in, a topic even if it is less popular than other topics with a more diffuse audience. The proposed framework can support any number of attributes and scales linearly. We assessed our algorithm's accuracy and efficiency on synthetic and real Twitter data, and the results show high accuracy and efficiency.

Document
2016-02.pdf (505.38 KB)