ChemXSeer: A Digital Library for Chemical Kinetics Data & Scientific Literature

Wednesday, January 7, 2009 - 9:39am

3:30 – 4:30
Computer Science Conference Room, Harold Frank Hall Rm. 1132


SPEAKER: Prasenjit Mitra
College of Information Sciences and Technology, Pennsylvania State University

Title: ChemXSeer: A Digital Library for Chemical Kinetics Data &
Scientific Literature


Scientists have digital documents and experimental data that they want
to publish, link, and share. ChemXSeer is an ongoing NSF-funded project
that aims to establish a digital library for documents and data related
to chemical kinetics. This talk will introduce the architecture and
algorithms deployed for the following components:

(a) Data extraction:
    (i) TableSeer: This tool automatically identifies tables in digital
    documents and extracts the contents of their cells, which are then
    stored in a queryable table in a database. TableSeer also extracts
    table metadata and uses a novel ranking function to search for
    tables relevant to user queries.
    (ii) Extraction of data from two-dimensional plots in figures in
    digital documents using image processing techniques.
(b) Chemical Entity Search: We seek to improve search capabilities for
    chemists and propose a domain-specific search engine built on an
    extract-index-rank framework. Our tool identifies chemical formulae
    and chemical names, uses hierarchical Conditional Random Fields to
    distinguish them from general-language terms, and tags them. Novel
    similarity scores, ranking functions, and search methods are
    introduced to enable searching for chemical entities.
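The abstract does not say how ChemXSeer's similarity scores are defined.
Purely as an illustration of the kind of problem involved, the sketch
below compares two chemical formulae by composition. It is a simple
baseline under stated assumptions, not the speaker's method: formulae are
parsed with a regular expression (not the hierarchical CRF tagger named
above), only flat formulae without parentheses or charges are handled,
element symbols are not validated, and the names parse_formula and
formula_similarity are hypothetical.

```python
import re
from collections import Counter

# One element symbol (an uppercase letter plus an optional lowercase
# letter) followed by an optional count, e.g. "C", "H5", "Cl2".
# Assumption: symbols are not checked against the periodic table.
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Return element counts for a flat formula such as "C2H5OH".

    Hypothetical helper: handles flat formulae only (no parentheses,
    hydrates, or charges) and raises ValueError on anything else.
    """
    counts = Counter()
    pos = 0
    for match in _ELEMENT.finditer(formula):
        if match.start() != pos:  # unmatched characters before this symbol
            raise ValueError(f"unparseable formula: {formula!r}")
        symbol, number = match.groups()
        counts[symbol] += int(number) if number else 1
        pos = match.end()
    if not formula or pos != len(formula):  # empty input or trailing junk
        raise ValueError(f"unparseable formula: {formula!r}")
    return counts


def formula_similarity(a: str, b: str) -> float:
    """Weighted Jaccard score over element counts, in [0, 1].

    Hypothetical baseline: 1.0 means identical composition, which
    cannot tell isomers apart (e.g. ethanol vs. dimethyl ether).
    """
    ca, cb = parse_formula(a), parse_formula(b)
    elements = set(ca) | set(cb)
    shared = sum(min(ca[e], cb[e]) for e in elements)
    total = sum(max(ca[e], cb[e]) for e in elements)
    return shared / total


print(formula_similarity("C2H5OH", "C2H6O"))  # 1.0
print(formula_similarity("CH4", "C2H6"))      # 0.625
```

The weighted Jaccard score is used here only because it respects atom
counts and needs no tuning; two different ways of writing ethanol score
1.0, but so would any isomer with the same composition, which is one
reason a domain-specific search engine needs more than a plain
composition match.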


Prasenjit Mitra is an assistant professor at the College of Information
Sciences and Technology at the Pennsylvania State University. He
received his Ph.D. in Electrical Engineering from Stanford University in
2004, a Master of Science degree in Computer Science from The University
of Texas at Austin in December 1994, and a Bachelor of Technology (with
Honors) degree in Computer Science and Engineering from the Indian
Institute of Technology, Kharagpur, in May 1993. Starting in 1995, he
spent five years as a senior member of the technical staff in the Server
Technologies division of Oracle Corporation in Redwood Shores, CA,
developing database management system software.