Talk (1): Men Also Like Shopping: Reducing Social Bias Amplification in Natural Language Processing
Machine learning techniques have played a major role in natural language processing systems. These techniques are often based on data-driven approaches, in which an automated system learns how to make decisions from the statistics and diagnostic information in collected data. Despite their success in various applications, these methods run the risk of discovering and exploiting societal biases present in the underlying data. For instance, an automatic resume filtering system may inadvertently select candidates based on their gender and race due to implicit associations between applicant names and job titles, potentially causing the system to perpetuate unfairness. Without properly quantifying and reducing the reliance on such correlations, broad adoption of these models can magnify stereotypes or implicit biases. In this talk, I will describe a collection of results on quantifying and reducing gender bias in natural language processing models, including word embeddings and models for visual semantic role labeling and coreference resolution.
Talk (2): Jointly Learning Representations for Low Resource Information Extraction
There is abundant knowledge carried in the form of natural language texts, such as social media posts, scientific research literature, and medical records, and it grows at an astonishing rate. Yet this knowledge is mostly inaccessible to computers and overwhelming for human experts to absorb. Information extraction (IE) processes raw texts to produce machine-understandable structured information, thus dramatically increasing the accessibility of knowledge through search engines, interactive AI agents, and medical research tools. However, traditional IE systems assume abundant human annotations for training high-quality machine learning models, which is impractical when deploying IE systems to a broad range of domains, settings, and languages. In this talk, I will present how to leverage the distributional statistics of characters and words, annotations for other tasks and domains, and linguistic and problem structures to combat the problem of inadequate supervision and conduct information extraction with scarce human annotations.
Kai-Wei Chang is an assistant professor in the Department of Computer Science at the University of California, Los Angeles. He has published broadly in machine learning and natural language processing. His research has mainly focused on designing machine learning methods for handling large and complex data. He has been involved in developing several machine learning libraries, including LIBLINEAR, Vowpal Wabbit, and Illinois-SL. He was an assistant professor at the University of Virginia in 2016-2017. He obtained his Ph.D. from the University of Illinois at Urbana-Champaign in 2015 and was a post-doctoral researcher at Microsoft Research in 2016. Kai-Wei was awarded the EMNLP Best Long Paper Award (2017), the KDD Best Paper Award (2010), and the Yahoo! Key Scientific Challenges Award (2011). Additional information is available at http://kwchang.net.
Nanyun Peng is a computer scientist at the Information Sciences Institute. She received her Ph.D. from Johns Hopkins University under the supervision of Professor Mark Dredze. She is broadly interested in Natural Language Processing, Machine Learning, and Information Extraction. Her research focuses on using deep learning for information extraction with scarce human annotations. Nanyun is the recipient of the Johns Hopkins University 2016 Fred Jelinek Fellowship. She holds a master's degree in Computer Science and BAs in Computational Linguistics and Economics, all from Peking University.