Introduction to Natural Language Processing
News:
- 26.3. - A paper copy of the exercise solutions will be distributed at the last lecture; after that, you can pick one up from my office.
- 26.3. - Some hints for the exam are here and the exercise bonuses are here
- 19.3. - The tasks for the fifth exercise session (to be held 25.3.) are now available here.
- 19.3. - The slides for BioNLP were added
- 17.3. - The slides for Information Extraction were added
- 12.3. - The slides for word sense disambiguation were added
- 5.3. - The project assignment is now available here.
- 5.3. - The slides for text categorization were added
- 4.3. - The tasks for the fourth exercise session (to be held 11.3.) are now available here.
- 26.2. - The slides for information retrieval were added
- 26.2. - The tasks for the third exercise session (to be held 4.3.) are now available here.
- 20.2. - The slides for dependency and link grammars were added
- 13.2. - The slides for probabilistic context-free grammars were added
- 11.2. - The slides for feature structures and unification were added
- 11.2. - The tasks for the second exercise session (to be held 19.2.) are now available here.
- 6.2. - The slides for syntactic analysis were added.
- 28.1. - The slides for Hidden Markov Models and POS-tagging were added.
- 26.1. - The tasks for the first exercise session are now available here.
- 22.1. - From now on until further notice, the Thursday lectures will be held in the Pharmacity auditorium. The time remains the same.
- 20.1. - The slides for introduction, morphology, and n-gram models were released. Follow the link at the bottom of the page.
Schedule:
Thursday, 10-12, Auditorio (Pharmacity) (It's the big auditorium on the ground floor of Pharmacity.)
Friday, 12-14, Etäluokka (2138, DataCity) [map]
First lecture: Thu, 22.1.2004
Last lecture: Fri, 26.3.2004
Lecturer:
Filip Ginter
Literature:
The Jurafsky and Martin book is available in three copies in the library (two copies in the course-book library, one copy in the humanities library). The Manning and Schütze book is available (one copy) in the IT library in DataCity.
Exam
Some hints for the exam are available here.
Exam dates:
Exercises
The bonus points earned in the exercises are here. The exercise solutions will be distributed at the last lecture; after that, you can pick up a copy from my office.
Grading:
You can gain up to 100 points during the course: 20 points for the project and 80 points in the examination. You must gain at least 50 points to pass the course. The points are translated to grades according to the following table:
| points | grade |
| --- | --- |
| 0-49 | failed |
| 50-55 | 1 |
| 56-61 | 1.25 |
| 62-66 | 1.5 |
| 67-72 | 1.75 |
| 73-77 | 2 |
| 78-83 | 2.25 |
| 84-88 | 2.5 |
| 89-94 | 2.75 |
| 95- | 3 |
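If you want to check your own grade, the table can be read as a lookup on the lower bound of each band. The following is only a minimal sketch, not an official grading tool: the names `LOWER_BOUNDS`, `GRADES`, and `grade_for` are made up for illustration, and it assumes the point total is a whole number that already includes any bonus points.

```python
from bisect import bisect_right

# Lower bound of each passing band and the matching grade,
# copied from the grading table (hypothetical names, for illustration only).
LOWER_BOUNDS = [50, 56, 62, 67, 73, 78, 84, 89, 95]
GRADES = [1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3]


def grade_for(points):
    """Return the grade for a whole-number point total, or None for a fail."""
    if points < 0:
        raise ValueError("points must be non-negative")
    # Count how many band lower bounds are <= points.
    band = bisect_right(LOWER_BOUNDS, points)
    return None if band == 0 else GRADES[band - 1]


if __name__ == "__main__":
    for total in (49, 50, 72, 95):
        print(total, grade_for(total))
```

Totals above 95 (possible with bonus points) fall into the open-ended last band and map to grade 3.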
Project:
A reasonably sized project will be assigned during February. The project will be a programming assignment. A project that fulfills all the requirements receives 20 points. A project that clearly goes beyond the requirements and exhibits some creativity can receive up to 20 bonus points. The project assignment is here.
Lectures:
A rough list of topics covered in the course. The list is still preliminary and may change.
- Introduction, motivation, history
- Morphology
  - Two-level morphology and finite state transducers
  - Lexicon-free morphology: The Porter stemmer
- N-gram models of language, smoothing techniques
- Hidden Markov models
- Part-of-speech tagging
  - HMM taggers
  - The Brill tagger
  - The ENGCG tagger
- Constituent and dependency grammars of English
  - A brief introduction to the structures of English
  - Constituent vs. dependency formalisms
- Context-free grammars and parsers
  - Top-down, bottom-up parsing
  - The Earley algorithm
- Feature structures and unification
- Dependency and lexicalized grammars and parsers
- Probabilistic context-free grammars
- Semantic analysis
- Word sense disambiguation (WSD), information extraction (IE), information retrieval (IR)
  - Features and basic algorithms for WSD
  - Pattern-based IE
  - Vector space model and IR
  - Latent semantic indexing (if time permits)
- Natural language processing of biological texts
Course materials:
The course materials are here.
The materials are available only to students who have signed up for the course. You are not allowed to redistribute the files. Use your student number (a five-digit number) as both the username and the password in the login window. If you have problems logging in or cannot sign up for the course for whatever reason, email me or visit me in my office.