Projects
Parker: Storing the Social Graph
A scalable distributed storage system designed specifically for social networking applications. It addresses the scalability and performance issues common in other approaches. A specialized API makes application programming easier and less error-prone. Optimal routing techniques provide high performance and high availability in the face of increasing load as well as machine failures.
Paper [PDF] - Presentation [PPT]
Scaling Into The Cloud
We analyze three platform-agnostic algorithms for dynamically scaling cloud resources based on load. To compare these algorithms effectively, we developed a novel scoring metric based on availability and the standard cost model provided by cloud hosting services.
Our results show that dynamic provisioning provides marked improvements over static allocation in terms of cost with minimal drop in availability. In addition, scaling algorithms that model past usage to predict future workloads tend to respond better to sharp changes in traffic.
Paper [PDF]
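To illustrate the difference between reacting to current load and predicting future load, here is a minimal sketch. It is not one of the three algorithms from the paper, and it does not use the paper's scoring metric: the two policies, the linear-trend forecast, the per-server capacity, and the flat hourly cost are all assumptions made for this example.

```python
import math

def reactive(history, capacity_per_server):
    """Provision for the most recently observed load (hypothetical policy)."""
    return max(1, math.ceil(history[-1] / capacity_per_server))

def predictive(history, capacity_per_server, window=3):
    """Extrapolate the linear trend of the last `window` samples one step ahead
    (hypothetical policy, stands in for 'modeling past usage')."""
    recent = history[-window:]
    if len(recent) < 2:
        return reactive(history, capacity_per_server)
    slope = (recent[-1] - recent[0]) / (len(recent) - 1)
    forecast = max(0.0, recent[-1] + slope)
    return max(1, math.ceil(forecast / capacity_per_server))

def simulate(loads, policy, capacity_per_server, cost_per_server_hour=1.0):
    """Return (availability, cost) for a policy over a load trace.

    Availability is the fraction of requested load that was served;
    cost is server-hours times an assumed flat hourly price.
    """
    served = total = cost = 0.0
    for t in range(1, len(loads)):
        # The policy only sees load observed before time t.
        servers = policy(loads[:t], capacity_per_server)
        served += min(loads[t], servers * capacity_per_server)
        total += loads[t]
        cost += servers * cost_per_server_hour
    return served / total, cost

# Synthetic trace with a sharp ramp; the numbers are illustrative only.
loads = [100, 100, 200, 400, 800, 800]
react_avail, react_cost = simulate(loads, reactive, capacity_per_server=100)
pred_avail, pred_cost = simulate(loads, predictive, capacity_per_server=100)
print(f"reactive:   availability={react_avail:.3f} cost={react_cost}")
print(f"predictive: availability={pred_avail:.3f} cost={pred_cost}")
```

On this synthetic ramp the trend-following policy serves a larger share of the load but pays for more server-hours, which is the trade-off a combined availability and cost score has to weigh.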
Social Networking Spam: Will You Be My Friend?
We begin by discussing the recent trend towards spam on social networking web sites. We then study the performance of three types of classifiers for detecting spam on these sites. Although Naïve Bayes has been a mainstay in text categorization and spam filtering, we compare its performance to that of a boosting algorithm and a decision tree algorithm. Boosting shows improvements over Naïve Bayes, but decision trees provide more accurate spam detection than both, and they achieve strong results using a very small number of features.
Paper [PDF]
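For readers unfamiliar with the Naïve Bayes baseline mentioned above, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing. It is not the paper's implementation or feature set: the class name, the whitespace tokenizer, and the four toy messages are assumptions made up for this example, and it does not cover the boosting or decision tree classifiers.

```python
import math
from collections import Counter

class NaiveBayes:
    """Toy multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training messages
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in text.lower().split():
                counts[word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus sum of log likelihoods (add-one smoothing).
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy messages, for illustration only.
train_docs = [
    "add me as a friend click here",
    "free photos click my profile",
    "see you at lunch tomorrow",
    "happy birthday old friend",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("click here for free photos"))
print(model.predict("see you at lunch"))
```

A real evaluation would use a labeled corpus and held-out data; four messages are only enough to show the mechanics.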