CMPSC 190I Introduction to Natural Language Processing

Quarter

Spring 2023

Instructor/s

William Wang

Course Type

Special Topics Course

Course Area

Intelligent and Interactive Systems

Enrollment Code

07898

Location

ILP 1101

Units

4

Day and Time

T/R 2-3:15pm

Course Description

Have you ever used intelligent virtual assistants such as OpenAI ChatGPT, Google Now, Apple Siri, Amazon Alexa, or Microsoft Cortana? What are the technologies behind such systems? How did IBM's Watson beat top human Jeopardy! players? Or are you just curious about how Google Translate works? Understanding human language is an important goal for Artificial Intelligence, and this course introduces fundamental theories and practical applications in Natural Language Processing (NLP). In particular, this course will focus on the design of basic machine learning algorithms (e.g., classification and structured prediction) for core NLP problems. The course concentrates on the mathematical, statistical, and computational foundations of NLP.

Throughout the course, we will cover classic lexical, syntactic, and semantic processing topics in NLP, including language modeling, sentiment analysis, part-of-speech tagging, parsing, word sense disambiguation, distributional semantics, question answering, information extraction, and machine translation. The parallel theme on machine learning algorithms for NLP will focus on classic supervised, semi-supervised, and unsupervised learning models, including naive Bayes, logistic regression, hidden Markov models, maximum entropy Markov models, conditional random fields, feed-forward neural networks, recurrent neural networks, and convolutional neural networks. We will also study the implicit assumptions made by each machine learning model and weigh the pros and cons of these modern statistical tools for solving NLP problems. A key emphasis of this course is the empirical and statistical analysis of large text corpora and the distillation of useful, structured knowledge from large collections of unstructured documents.

Prerequisites:
Good programming skills and knowledge of data structures (e.g., CS 130A)
Basic understanding of automata and parsing (e.g., CS 138)
Advanced knowledge of machine learning (CS 165B), artificial intelligence (CS 165A), linear algebra, probability, and calculus