Syllabus
Unit 1
Introduction: History of NLP, study of human languages, ambiguity, phases in natural language processing, applications. Textual sources and formats. Linguistic resources: Introduction to the corpus, elements of a balanced corpus (examples: TreeBank, PropBank, WordNet, VerbNet, etc.). Word-level analysis: Regular expressions, morphological parsing, types of morphemes, tokenization, n-grams, stemming, lemmatization, spell checking. Management of linguistic data with NLTK.
Unit 2
Syntactic Analysis: Lexemes, phonemes, phrases and idioms, word order, agreement, tense, aspect and mood, context-free grammar, and spoken language syntax. Parsing: Unification, probabilistic parsing. Part-of-speech tagging: Rule-based POS tagging, stochastic POS tagging, transformation-based tagging (TBL), handling of unknown words, named entities, and multi-word expressions.
Semantic Analysis: Meaning representation, semantic analysis, lexical semantics. WordNet: Similarity measures, synsets and hypernyms. Word sense disambiguation: Selectional restrictions, machine learning approaches, dictionary-based approaches.
Unit 3
Discourse: Reference resolution, constraints on co-reference, an algorithm for pronoun resolution, text coherence, discourse structure. Information Retrieval: Types of information retrieval models, Boolean model, vector space model (Word2Vec, BERT), improving user queries. Machine Translation: EM algorithm, discriminative learning, deep representation learning, generative learning.
Applications of NLP: Machine translation, document summarization, sentiment analysis, ChatGPT4.
Textbook(s) / Reference(s)
Textbook(s)
Jurafsky D, Martin JH. “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition”, Second Edition, Pearson Publication; 2013.
Reference(s)
- Allen J. “Natural Language Understanding”, Second Edition, Pearson Education; 2002.
- Bharati A, Sangal R, Chaitanya V. “Natural Language Processing: A Paninian Perspective”, PHI; 2000.
- Tiwary U S, Siddiqui T. “Natural Language Processing and Information Retrieval”, Oxford University Press; 2008.
- Bird S, Klein E, Loper E. “Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit”, O’Reilly; 2009 (website 2018).
Evaluation Pattern
Evaluation Pattern: 70:30
Assessment | Internal | End Semester
Midterm | 20 |
Continuous Assessment – Theory (*CAT) | 10 |
Continuous Assessment – Lab (*CAL) | 40 |
**End Semester | | 30 (50 Marks; 2 hours exam)
*CAT – Can be Quizzes, Assignments, and Reports
*CAL – Can be Lab Assessments, Project, and Report
**End Semester can be theory examination/ lab-based examination/ project presentation