Formal languages and natural languages
Guy
Perrier
Master in Cognitive Science
Specialization : Natural Language Processing
Université de Lorraine
The
aim of the course is to
familiarize
students with some mathematical and algorithmic tools used in the
domain of formal languages and oriented towards natural language
processing.
The plan of the course is the
following
:
- Introduction to Linguistics and
Computational Linguistics (pdf slides)
First, we address some basic concepts in linguistics and then we
highlight what makes natural languages closer to formal languages such
as programming languages and what differentiates them.
- Finite State methods (pdf slides)
We
start by familiarizing ourselves with mathematical concepts and
computational technologies related to these two notions: regular
languages, finite state automata and transducers. Then, we illustrate
their application to the processing of the low levels of natural
language with the implementation of part of speech
tagging using transducers.
- Context Free Grammars (pdf slides)
We study Context Free Grammars from a formal point of view. As application domain,
we choose to model the syntax of natural languages. We compare context
free languages with regular languages by putting them in the Chomsky
hierarchy of languages. In connection with this hierarchy, we study the limits
of formal languages for representing natural languages.
Then, we present two parsing methods for natural languages (CKY and Earley)
with context-free grammars, which use the efficient technology of
tabulation. - Feature structures and
unification (pdf slides)
Starting from context-free grammars, we show how to enrich them with
feature structures and the associated composition mechanism,
unification.
- Regular Tree Grammars (pdf slides)
We
introduce Regular Tree Grammars as a framework, in which the first
class citizens are trees and not strings. Context Free Grammars are
viewed as a special kind of regular tree grammar. Paired with a notion
of interpretation, RTGs are enriched as Interpreted RTGs and are used
for the study of various existing formalisms in a new unified light.
Bibliography
- D. Jurasfsky & J. H. Martin - Speech and Language Processing -
Prentice Hall, 2000
- E. Roche & Y. Schabes - Finite-State Language Processing
- MIT Press, 1997
- J. E. Hopcroft & R. Motwani & J; D. Ullman - Introduction
to Automata Theory, Languages and Computation - Pearson
International Edition, 2007