Research

My current research focuses on designing dynamic graph analysis and modeling. In the past, I worked on several projects, including massive social graph embedding, differential privacy in graphs and crowdturfing measurement analysis.

Dynamic graph analysis and modeling


Data confidentiality policies at major social network providers have severely limited researchers’ access to large-scale datasets. The biggest impact has been on the study of network dynamics, where researchers have studied citation graphs and content-sharing networks, but few have analyzed detailed dynamics in the massive social networks that dominate the web today. We study the growth and evolution in large social graph, e.g. Renren. By understanding the dynamics in this network, we try to identify the key dynamic features and then model the dynamic graphs.


Crowdturfing measurement and analysis


Popular Internet services in recent years have shown that remarkable things can be achieved by harnessing the power of the masses using crowd-sourcing systems. However, crowd-sourcing systems can also pose a real challenge to existing security mechanisms deployed to protect Internet services. We crawled the two largest crowdturfing websites, zhubajie and sandaha . We measure the work flow and revenue flow in the two crowdturfing systems. With all the tasks history and related social network data, we analyzed information cascade on OSNs performed by crowdtrufing systems. By implementing end-to-end experiment, we measured the effectiveness of crowdturfing systems.We published our measurement work in WWW 2012


Pygmalion: Differential privacy on graphs


Continuing success of research on social and computer networks requires open access to realistic measurement datasets. While these datasets can be shared, generally in the form of social or Internet graphs, doing so often risks exposing sensitive user data to the public. Unfortunately, current techniques to improve privacy on graphs only target specific attacks, and have been proven to be vulnerable against powerful de-anonymization attacks. We develop a differentially private graph model, Pygmalion. This model extracts a graph's detailed structure into degree correlation statistics, introduces noise into the resulting dataset and generates a synthetic graph. This model can make tradeoff between strength of privacy protection and maintaining structural similarity to the original graph by tuning the noise added into the graph structure statistics. We published one paper in IMC 2011.


Rigel: Massive social graph embedding


Analysis of large networks is a critical component of many of today’s application environments. The arrival of massive network graphs with hundreds of millions of nodes presents a unique challenge to graph analysis applications. Most of these applications rely on computing distances between node pairs, which for large graphs can take minutes to compute using traditional algorithms such as breadth-first-search (BFS). In this project, we explore the design space of graph coordinate systems, a new approach that accurately approximates node distances in constant time by embedding graphs into coordinate spaces. We show that a hyperbolic embedding produces relatively low distortion error, and propose Rigel, a hyperbolic graph coordinate system. Rigel produces significantly more accurate results than prior systems, and is naturally parallelizable across compute clusters, allowing it to provide accurate results for graphs up to 43 million nodes. And Rigel’s functionality can be easily extended to locate shortest paths between node pairs. This system has been used by Renren, Zynga, Google.
We published papers on WOSN 2010, CollaborateCom 2011. You can also check our project website (link) for more details.