This course is designed to introduce students to the fundamental concepts and ideas in natural language processing (NLP). It develops an understanding of both the algorithms available for the processing of linguistic information and the underlying computational properties of natural languages.
Introduction to NLP: Knowledge in speech and language processing – Ambiguity – Models and algorithms – Language, thought and understanding.
Regular Expressions and Automata: Regular expressions – Finite-State automata. Morphology and Finite-State Transducers: Survey of English morphology – Finite-State morphological parsing – Combining FST lexicon and rules – Lexicon-Free FSTs: The Porter stemmer – Human morphological processing.
Word classes and part-of-speech tagging: English word classes, Tag sets for English, Part-of-speech tagging, Rule-based and stochastic part-of-speech tagging, Transformation-based tagging. Context-Free Grammars for English: Constituency, Rules and Trees – Sentence-level constructions, The noun phrase, Coordination, The verb phrase and subcategorization, Auxiliaries, Spoken language syntax – Grammar equivalence and normal form, Finite-State and Context-Free grammars, Grammars and human processing.
Features and Unification: Feature structures, Unification of feature structures, Feature structures in the grammar, Implementing unification, Parsing with unification constraints – Types and Inheritance. Lexicalized and Probabilistic Parsing: Probabilistic context-free grammars, Problems with PCFGs – Probabilistic lexicalized CFGs – Dependency Grammars – Human parsing.
Representing Meaning: Computational desiderata for representations, Meaning structure of language, First-order predicate calculus, Linguistically relevant concepts – Related representational approaches, Alternative approaches to meaning. Semantic Analysis: Syntax-Driven semantic analysis – Attachments for a fragment of English – Integrating semantic analysis into the Earley parser – Idioms and compositionality – Robust semantic analysis. Lexical Semantics: Relations among lexemes and their senses, WordNet: A database of lexical relations, The internal structure of words, Creativity and the lexicon.
Word Sense Disambiguation and Information Retrieval: Selectional restriction-based disambiguation – Robust word sense disambiguation, Information retrieval, Other information retrieval tasks. Natural Language Generation: Introduction, Architecture, Surface realization, Discourse planning. Machine Translation: Language similarities and differences, The transfer metaphor, The Interlingua idea: Using meaning, Direct translation, Using statistical techniques – Usability and system development.