Formal languages and natural languages
Guy Perrier
Master in Cognitive Science
Specialization :  Natural Language Processing
Université de Lorraine

The aim of the course is to familiarize students with some mathematical and algorithmic tools used in the domain of formal languages and oriented towards natural language processing.

The plan of the course is the following :
  1. Introduction to Linguistics and Computational Linguistics (pdf slides)
    First, we address some basic concepts in linguistics and then we highlight what makes natural languages closer to formal languages such as programming languages and what differentiates them.
  2. Finite State methods (pdf slides)
    We start by familiarizing ourselves with mathematical concepts and computational technologies related to these two notions: regular languages, finite state automata and transducers. Then, we illustrate their application to the processing of the low levels of natural language with the implementation of part of speech tagging using transducers.
  3. Context Free Grammars (pdf slides)
    We study Context Free Grammars from a formal point of view. As application domain, we choose to model the syntax of natural languages. We compare context free languages with regular languages by putting them in the Chomsky hierarchy of languages. In connection with this hierarchy, we study the limits of formal languages for representing natural languages.
    Then, we present two parsing methods for natural languages (CKY and Earley) with context-free grammars, which use the efficient technology of tabulation.
  4. Feature structures and unification (pdf slides)
    Starting from context-free grammars, we show how to enrich them with feature structures and the associated composition mechanism, unification.
  5. Regular Tree Grammars (pdf slides)
    We introduce Regular Tree Grammars as a framework, in which the first class citizens are trees and not strings. Context Free Grammars are viewed as a special kind of regular tree grammar. Paired with a notion of interpretation, RTGs are enriched as Interpreted RTGs and are used for the study of various existing formalisms in a new unified light.

Bibliography