Formal Language : Basic terms
Symbols, Alphabets, Strings, and Languages
So, what exactly is a language, in
the formal sense? It’s not quite the same as English or Spanish. We’re going to
break it down into its most basic components.
2.1 Symbol
The most fundamental part is the symbol.
A symbol is just a basic, indivisible unit. Think of it as a letter in the
alphabet, or a digit in a number system.
Examples:
The letter ‘a’
The digit ‘1’
The symbol ‘#’
Even emojis
can be symbols!
2.2 Alphabets
Next, we have an alphabet. An
alphabet is simply a finite, non-empty set of symbols. We often use the Greek
letter Sigma (Σ) to represent an
alphabet.
Examples:
i.
Σ = {‘a’, ‘b’, ‘c’} (A small
alphabet of lowercase letters)
ii.
Σ = {0, 1} (The binary alphabet
– super important for computers!)
iii.
Σ = {A, B, C, …, Z}
o Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
2.3 Strings
Now, things get more interesting. A string
is a finite
sequence of symbols chosen from an alphabet. The symbols are placed
one after another.
Examples (using the alphabet Σ = {a, b}):
i.
"a" (A string of length 1)
ii.
"ab" (A string of length 2)
iii.
"baa" (A string of length 3)
iv.
"aabbaba" (A longer string)
The
Empty String:
There’s also a special string called the empty string,
which has no symbols. We represent it with the Greek letter Epsilon
(ε) or sometimes lambda (λ). It’s like an empty word.
The key is that a string is always
formed from a single, well-defined alphabet.
Valid String: A string is valid
if all
of its symbols are drawn from the same, predefined alphabet.
Invalid String (in this
context): A string is invalid if it contains symbols that do not belong to the
specified alphabet.
Example to illustrate your
point:
Let’s say we have two alphabets:
·
Σ1 = {a, b}
·
Σ2 = {0, 1}
If we try to create a string "a1", it
would be considered invalid with respect to either Σ1 or Σ2, because it contains symbols from both
alphabets. To be valid, "a1" would need to be constructed from an
alphabet that includes {a, b, 0, 1}.
Length of a String:
The length of a string is simply the number of
symbols in the string. We denote the length of a string ‘w’ as |w|.
i.
|ε| = 0
ii.
|a| = 1
iii.
|ab| = 2
iv.
|baa| = 3
Lexicographic Order
Lexicographic order for strings is essentially
the same as dictionary order.
Strings are compared character by character from
left to right.
The order is determined by the underlying order
of the alphabet’s symbols.
For example, if the alphabet is {a, b, c} and
a< b < c then "abc" comes before
"acb" in lexicographic order.
Shorter strings always come before longer
strings.
If two strings have the same length, they are
ordered lexicographically.
Example: Over the alphabet {0, 1}, the order is (epsilon,
0, 1, 00, 01, 10, 11, 000, …), where epsilon
represents the empty string.
What is Language?
2.4 Language
Finally, we get to the definition of a formal
language. A formal language is simply a set of strings formed from a specific
alphabet.
Important: A language can be finite or infinite!
A language L is a subset of Σ*, where Σ is an alphabet and Σ* is the Kleene star of
Σ. This means that every string
in the language L is formed by concatenating symbols from the alphabet Σ.
Key
Concepts:
Powers of an Alphabet (Σk): "The set Σk is the set of all
strings of length exactly k formed using symbols from Σ."
Example: If Σ = {a, b}, then Σ2 = {aa,
ab, ba, bb}.
Given an alphabet Σ, the set Σk represents the set
of all strings of length exactly k, formed using symbols from Σ." In other words, we’re taking the alphabet
and combining its symbols to make strings of a specific length k.
More Examples:
i.
Σ
= {0, 1}
ii.
Σ0 = {ε} (The set of all strings of length 0. Only the empty string.)
iii.
Σ1 = {0, 1}
iv.
Σ2 = {00, 01, 10, 11}
v.
Σ3 = {000, 001, 010, 011, 100,
101, 110, 111}
The Kleene
Star (Σ*): "The Kleene star Σ* is the set of all
possible strings (of any finite length, including the empty string ε) that can be formed using symbols from Σ."
Mathematically: Σ* = Σ0 ∪ Σ1 ∪ Σ2 ∪ Σ3 ∪ …
The Kleene
Plus (Σ+): "The Kleene plus Σ+ is the set of all
possible strings (of length 1 or more) that can be formed using symbols from Σ."
Mathematically: Σ+ = Σ1 ∪ Σ2 ∪ Σ3 ∪ …
Σ+ = Σ* – {ε}