Graph Coordinate Systems

    Node distance is a fundamental metric for graph analysis and network applications. Important node distances, such as shortest path and random walk distances, are computationally expensive and difficult to scale on big real graphs. To address the scalability problem, we propose an efficient and practical solution - graph coordinate systems. By embedding graphs into a coordinate space, our system can accurately estimate node distances seven orders of magnitude faster than traditional methods, which is in microseconds. Three papers about graph coordinate systems have been published in VLDB'13, CollaborateCom'11, and WOSN'10. Code and more details can be found here.

    This project is funded by DARPA. To our best knowledge, our system has also been deployed at several social networking and gaming companies, such as Renren, Zynga, and Google.


Dynamic graph analysis and models

(1) Analyze dynamics in a large online social network at multiple scales [IMC'12]

A number of interrelated processes drive dynamics in social networks, while many of the prior studies focused on capturing dynamics as a single process. In this work, we are interested in the question “how are individual user dynamics influenced by processes at different scales?” To answer this question, we conduct detailed analysis on the first 2-year dynamic trace of Renren network. We measure this dynamic graph at individual user level, user community level, and global network level. The measurement shows that user actions are not only driven by dynamic processes at user level, but also significantly influenced by the events at community and network levels.


(2) Self-similarity analysis on dynamic graphs [In submission]

This work aims to model both structural and temporal properties in dynamic social networks. Our work is inspired by self-similarity analysis on network traffic. Self-similarity refers to that the relative variance or volatility of traffic traces stays similar across different time scales. The existence of self-similarity in the observed social network dynamics will fundamentally change the way we view and consider dynamic graph models. In this work, we find that self-similar properties exit in the edge creation process of online social networks, including Renren and Facebook. Based on our observations, we propose a dynamic graph model that captures both temporal and structural properties in dynamic online social networks. This is the first model that can produce time-stamped network traces uniquely defining the formation and evolution of a social network.

The two projects are part of the project "graph dynamics", which is supported in part by NSF Proposal: IIS-1321083, titled "Analysis and Models of Social Network Structure, Growth and Dynamics."


Graph Privacy and Security

(1) Pygmalion: differential privacy in graphs [IMC'11]

Big real graphs usually contain sensitive information. Sharing these graphs often raise the risks of leaking users’ private data. Current anonymization techniques only address the privacy problems caused by specific attacks, and are vulnerable to powerful de-anonymization attacks. Our work seeks a solution to sharing meaningful graph datasets while preserving privacy. To navigate the tradeoff between strength of privacy and graph structural similarity, we design Pygmalion, a differentially-private graph model.


(2) Graph Watermarks [In Submission]

The owners of big real graph data are facing a real challenge: how to share sensitive graphs with collaborators and authorized users. Current solutions provide limit privacy, but require modifications to graph structure. In this work, we propose a novel solution, graph watermarks, which are small unique graphs tailored for a given graph and a secure user key, serving as a deterrent against data leakage. We provide robust schemes to create, embed, and extract watermarks. We also develop techniques to defend against potential attacks.


Analysis of Popular Complex Networks

(1) Crowdturfing Networks [WWW'12]

Crowdsourcing networks provide a new power to solve complex problems. How- ever, the success of the crowdsourcing networks also poses a real challenge to existing security mechanisms: malicious activities are generated by human beings instead of automated programs. In this work, we study malicious crowdsourcing networks, referred to as crowdturfing networks. Through measurement results from our detailed data collected from these crowdturfing systems, we find surprising evidence showing that not only do these systems exist in a number of countries, but also they are rapidly growing in both user base and total revenue. Our results suggest that campaigns on these networks are highly effective at reaching users, and their continuing growth poses a concrete threat to online communities both in the US and elsewhere


(2) Geosocial Networks
[Measurement results in HotNets-XII], [Modeling work in submission]

Mobile networking researchers have long searched for large-scale realistic mobility traces, which have remained elusive for both privacy and logistical reasons. Re- cently, researchers have begun to use geosocial mobility traces, such as Foursquare checkin traces, because of their availability and scale. But are these traces reflective of users’ true mobility patterns? In this work, we answer this question by performing a large user mobility study, and comparing a ground-truth of user mobility (via GPS data) to a Foursquare dataset for the same users. Our measurement results show that 75% of Foursquare checkin events are extraneous checkins generated by users to achieve in-system rewards, and checkin events only capture 10% of user actual visited locations. Based on this observation, we develop a method to detect extraneous checkins, and design an algorithm to recover missing locations.


Topology Generation for Fabric Computing Systems

[MSR Cambridge Intern project], [In submission]

Fabric computing sys- tems are complete self-contained rack-scale clusters that support general-purpose com- modity data center workloads. By connecting the switches co-located with processors in the fabric system, users can customize network topologies, such as 2D or 3D Torus. In this project, we analyze the performance of various workloads on a wide range of fab- ric topologies, and study the workload assignment problem on fixed fabric topologies. Through our simulation, we find that no single topology performs best, and workload optimization on a given topology provides limited improvement. To optimize the perfor- mance for a given workload, we propose a practical solution to customize the topology of a fabric computing system.