Basic of formal language theory
Basic of formal language theory
Roughly, there are two dual views of languages:
The recognition point view.
M, (an automaton) that takes a string, w, as input and returns two possible answers:
- Yes, the string w is accepted, which means that w belongs to the language, L, that we are trying to define.
- No, the string w is rejected, which means that w does not belong to the language, L.
The generation point of view.
formalisms that specify a language in terms of rules that allow the generation of “legal” strings. The most common formalism is that of a formal grammar
we are typically considering two things:
- The syntax , i.e., what are the “legal” strings in that language (what are the “grammar rules”?).
- The semantics of strings in the language, i.e., what is the meaning (or interpretation) of a string.
we will only be dealing with syntax!
Alphabets, Strings, Languages
a language is a set of strings
Definition 2.1. An alphabet Σ is any finite set.
We often write . The are called the symbols of the alphabet.
A string is a finite sequence of symbols. Technically, it is convenient to define strings as
functions. For any integer n ≥ 1, let
and for , let
Definition 2.2. Given an alphabet Σ, a string over Σ (or simply a string) of length n is any function
empty string, or null string
Definition 2.3. Given an alphabet Σ, given any two strings and , the concatenation (also written ) of u and v is the string
, defined such that
Definition 2.4. Given an alphabet Σ, given any two strings u, v ∈ Σ∗ we define the following notions as follows:
- prefix
- suffix
- substring
We say that u is a proper prefix (suffix, substring) of v iff u is a prefix (suffix, substring) of v and .
Definition 2.5. Given an alphabet Σ, assumed totally ordered such that, lexicographic ordering:
reversal
Definition 2.6. Given an alphabet Σ, a language over Σ (or simply a language) is any subset L of Σ∗. If Σ ̸= ∅, there are uncountably many languages.
Finite, Infinite, Countable, and Uncountable Sets
The number n is uniquely determined. It is called the cardinality (or size) of X and is denoted by |X|.
Definition 2.7. A set X is countable (or denumerable) if there is an injection from X into N.
Theorem 2.1. (Cantor, 1873) For every set X, there is no surjection from X onto .
We will begin with the family of regular languages, and then proceed to the context-free
languages
Operations on Languages
- union
- intersection
- difference
- relative complement
- complement
Definition 2.8. Given an alphabet Σ, for any two languages L1, L2 over Σ, the concatenation L1L2 of L1 and L2 is the language
We define the reversal of a language L ⊆ Σ∗ as
Definition 2.9. Given an alphabet Σ, for any language L over Σ,
The Kleene +-closure of L is the language
The Kleene ∗-closure of L is the language
- if ,
- if , then
homomorphism h
这俩都是 infinite union