Probase: Building a Probabilistic Ontology from the Web

Monday, August 9, 2010 - 10:41am


Tuesday, Aug. 17, 2010
11:00 AM – 12:00 PM
Computer Science Conference Room, Harold Frank Hall Rm. 1132

HOST: Amr El Abbadi

SPEAKER: Haixun Wang
Microsoft Research

Title: Probase: Building a Probabilistic Ontology from the Web


The dream of the Semantic Web is to develop one ontology in one
language covering everything that exists. The reality is that we
have a large number of ontologies, each focusing on a small domain,
and they are extremely hard to integrate. Recently, considerable
interest has been devoted to universal ontologies, either
automatically constructed or built by community effort. However,
they still have limited scope. For example, Freebase, the best-known
taxonomy built by community effort, contains about 1,500 concepts,
which is a far cry from “covering everything that exists.”
In this talk, I will present a universal, probabilistic taxonomy that
is more comprehensive than any of the existing taxonomies
today. Currently, it contains over 2 million concepts harnessed
automatically from a corpus of 1.68 billion web pages and 2 years’
worth of search log data. Unlike traditional knowledge bases that
treat knowledge as black and white, it enables probabilistic
interpretations of the information it contains. The probabilistic
nature also enables it to incorporate heterogeneous information in a
natural way. I will present the details of how the core taxonomy, which
contains hypernym-hyponym relationships, is constructed, and how it
models knowledge’s inherent uncertainty, ambiguity, and
inconsistency. I will also discuss potential applications, such as
understanding user intent, that can benefit from the taxonomy.


Haixun Wang joined Microsoft Research Asia in Beijing, China in 2009,
and he leads research in databases, the semantic web, graph data
processing systems, and distributed query processing. Before joining
Microsoft, he was a research staff member at IBM T. J. Watson Research
Center for 9 years. He was Technical Assistant to Stuart Feldman (Vice President of
Computer Science of IBM Research) from 2006 to 2007, and Technical
Assistant to Mark Wegman (Head of Computer Science of IBM Research) from
2007 to 2009. He received the Ph.D. degree in computer science from the
University of California, Los Angeles in 2000. He has published more
than 120 research papers in refereed international journals and
conference proceedings. He was PC Vice Chair of KDD’10, ICDM’09, SDM’08,
and KDD’08, and he served as demo/workshop/sponsor Chair of various
conferences, including SIGMOD’08, ICDM’08, ICDE’09, ICDM’11, etc. He
serves on the editorial boards of IEEE Transactions on Knowledge and Data
Engineering (TKDE) and the Journal of Computer Science and Technology
(JCST). He is an adjunct professor at Nanjing University and Renmin
University of China.