My current research focuses on designing dynamic graph analysis and modeling. In the past, I worked on several projects, including massive social graph embedding, differential privacy in graphs and crowdturfing measurement analysis.
Dynamic graph analysis and modeling
Data confidentiality policies at major social network providers have severely limited researchers’ access to large-scale datasets. The biggest impact has been on the study of network dynamics, where researchers have studied citation graphs and content-sharing networks, but few have analyzed detailed dynamics in the massive social networks that dominate the web today. We study the growth and evolution in large social graph, e.g. Renren. By understanding the dynamics in this network, we try to identify the key dynamic features and then model the dynamic graphs.
Crowdturfing measurement and analysis
Popular Internet services in recent years have shown that remarkable things can be achieved by harnessing the power of the masses using crowd-sourcing systems. However, crowd-sourcing systems can also pose a real challenge to existing security mechanisms deployed to protect Internet services. We crawled the two largest crowdturfing websites, zhubajie and sandaha . We measure the work flow and revenue flow in the two crowdturfing systems. With all the tasks history and related social network data, we analyzed information cascade on OSNs performed by crowdtrufing systems. By implementing end-to-end experiment, we measured the effectiveness of crowdturfing systems.We published our measurement work in WWW 2012
Pygmalion: Differential privacy on graphs
Continuing success of research on social and computer networks requires open access to realistic measurement datasets. While these datasets can be shared, generally in the form of social or Internet graphs, doing so often risks exposing sensitive user data to the public. Unfortunately, current techniques to improve privacy on graphs only target speciﬁc attacks, and have been proven to be vulnerable against powerful de-anonymization attacks. We develop a differentially private graph model, Pygmalion. This model extracts a graph's detailed structure into degree correlation statistics, introduces noise into the resulting dataset and generates a synthetic graph. This model can make tradeoff between strength of privacy protection and maintaining structural similarity to the original graph by tuning the noise added into the graph structure statistics. We published one paper in IMC 2011.
Rigel: Massive social graph embedding
Analysis of large networks is a critical component
of many of today’s application environments. The arrival of massive network graphs
with hundreds of millions of nodes presents
a unique challenge to graph analysis applications. Most of these
applications rely on computing distances between node pairs,
which for large graphs can take minutes to compute using
traditional algorithms such as breadth-ﬁrst-search (BFS). In this project, we explore the design space
of graph coordinate systems, a new approach that accurately
approximates node distances in constant time by embedding
graphs into coordinate spaces. We show that a hyperbolic
embedding produces relatively low distortion error, and propose
Rigel, a hyperbolic graph coordinate system. Rigel produces
signiﬁcantly more accurate results than prior systems, and is
naturally parallelizable across compute clusters, allowing it to
provide accurate results for graphs up to 43 million nodes. And Rigel’s
functionality can be easily extended to locate shortest paths between node
pairs. This system has been used by Renren, Zynga, Google.