The pumping lemma
0^n 1^n
Whenever a character is raised to a power whilst considering strings
for finite state machines, the ^n
denotes simple
repetitions of that character within the string, such that when n
= 4
, 0^4 1^4 = 00001111
. So it is that the set of
strings '0^n 1^n' is an expression of all strings that have that number
n of 0
's followed by the same number of
1
's.
This machine is readily identifiable as having infinite states, as any attempt to draw it demonstrates; It is clearly the case that for every value of n, the machine grows towards infinity with n.
The set of strings 0^n 1^n
can be proven to be
irregular using the pumping lemma.
For any machine that attempts to show 0^n 1^n
to be
regular, we can choose a string to test it with, that is longer than
the machine has states; When this is the case; For that machine to
parse the string, it absolutely must loop somewhere along its pattern,
in order to finish upon the final state.
Suppose a machine that has 24 states, reading the string 0^24
1^24
. Since there are 24 0's in the first half of this string,
the machine will need to loop during these 0's, in order to complete;
At the very least one state will be used twice. Taking note of when
that state occurs for the second time and counting how many characters
have passed before during and after it, we learn how many characters
are in each of three sections; The string could be divided as
follows:
0^17, loop(0^3), 0^4, 1^24
Now using that same machine, what will it do when given a string that has its loop section doubled?
0^17, loop(0^3), loop(0^3), 0^4, 1^24
The loop is an exact repetition and as such the machine will treat
this string in exactly the same way as the prior, except that this time
it will loop twice instead of once. However, the machine will arrive in
the exact same state at the end of the loops in both cases. If the
first string ends up in a final state, then it follows that so will the
second string complete; This is very problematic for a machine that is
only suppose to accept 0^n 1^n
.
This looping of the machine over the string is said to be pumping, and thus named the pumping lemma.
Used to demonstrate that a string is of a set that is not regular, when pumping cannot occur. A string that does show the pumping property is not always of a regular set.
Anything that grows in a linear way is likely to be regular, anything that grows faster than linear can never be regular.
A formal definition
What is the pumping lemma?
If L is a regular set,
then
∀ z in L, where |z| >= n
∃ n = # of states in machine for L
∃ v,w,x such that z=vwx, such that |vw| <= n, |w| >= 1
∀ i >= 0 vw^ix is also in L.
If L
is a regular set, then for every string
z
in L
where |z|
the number of
symbols in z is >= n
the #
states in
machine for L
.
There is a way to split that string into three parts, before the loop, the loop itself and then the rest of the string with all consequent loops included.
The string now has three parts v,w,x,
such that
z=vwx
. The loop occurs, vw
, within the first
n
symbols and the loop w
must be >=
1
.
For every i >= 0
, vw^ix
is also in
L
. The repeated concatenation of w
in
vw
remains within L
giving the same final
state.
When i = 0
the string is correct also, this should also
be considered when checking whether or not a set is acceptable by a
state machine; We are effectively cutting the string shorter and
testing if the new shortened string will pass the conditions of the
machine.
Defining the negative pumping property
∀ n = # states in any machine for L.
∃ z in L, |z| >= n
∀ v,w,x where z = vwx, and |vw| <= n, w >= 1
∃ i >= 0 such that vw^ix is not in L.
For every n
in the set of states L
, there
exists as string in L
, whose symbols are >=
n
, such that for every v,w,x,
where z =
vwx
, for every possible way there is to split vwx into three
pieces, and vw <= n and w >= 1
, there exists some i
>= 0
such that vw^ix
is not within
L
.
The converse is not always true, if you have a set and it repeatedly pumps loops, displaying the pumping property; This is not conclusive proof that the set is regular.
Examples
First we take the example of a set that displays the pumping property that is not a regular set; The palindrome.
Palindromes
0^(k/2) 1 0^(k/2)
A machine has k states. 0^(k/2) 1 0^(k/2)
this is
certainly a palindrome with more than k states.
if z = 0^(k/2) 1 0^(k/2), z >= k symbols
v = 0^(k/2), w = 1, x = 0^(k/2)
note that |vw| <= k and |w| >= 1
No matter how many times we loop over i, we still get a palindrome; It does satisfy the pumping property; We can only conclude that this attempt to show that the pumping property will fail, every single value of i that we might try will fail.
Selecting w a 1 was the reason that this machine worked for this string, but this is not proof that the string is part of a regular set, as we will see.
The pumping property does hold for this string, but we have selected a bad string to start with; This does not show that the pumping property holds for all cases, only that it does in this case.
0^k 1 0^k
if z = 0^k 1 0^k, z >= k symbols
and I note that w = 0^m, for some m between 1 and k
No matter how the string is split into three parts; It is no longer possible to select a value of w that is 1, the only thing that can possibly be done is to select a value for w that is within the part of the string that is all 0's.
let i = 2: vw^2x = 0^(k+m) 1 0^k
This cannot be a palindrome; Hence the set of palindromes is not a regular set.
There are different kinds of pumping lemmas but they are all the same idea; Basicaly that regular sets have these loops, and they regularly can be pumped up. Anything that can not be regularly pumped up is not a regular set.
There are things that can be pumped up, but that are still not
regular, as such this is not a characterisation of all of the regular
sets; It is actualy very hard to get the required 'if and only if' that
results in a definition. There is an 'if and only if' for alphabets
with a single symbol, 0
. There is a characterisation for
all alphabets over the single character set 0
, that are
regular, a very linear characterisation.
0^k^2
For an alphabet of strings that contains a square number of
0
's.
Of note: anything raised to the power of 0
, in this
case 0^0
, gives the identity element of the mathematical
structure; The element of this structure is an empty string.
0^k^2
is not regular.let
n
be the number of states in a hypothetical machine
for 0^k^2
.For
0^n^2
(which is in L the language) and |0^n^2| &tg;= n
Let 0^n^2 = vwx, |vw| >= n, |w| >= 1
Let |w| = m <= n
Picking a value for i to test, we are again very lucky in our choice, and find the appropriate case that will fail.
i = 2, vw^2x = 0^(n^2+m)
We can state that
0^(n^2+m) != 0^k^2
for any k, because ...n^2 < n^2+m < (n+1)^2
Thus
n^2+m
cannot be a square if it is stuck in between two
squares.
o^k, k = composite
This case is an example of a set that has the pumping property but is not a regular set.
The compliment of this set which is o^p
prime, can be
shown to not be regular by using the pumping lemma. Having demonstrated
this, how then do we prove that 0^c composite is not regular? The
composite set being the compliment of the prime, as such it is
impossible for the set to be regular; A proof by contradiction due to
the closure properties.
z = 0^(2n) = vwx
If w
is 2
, then we will be adding two
0
's to an already even number as 2n
is even;
The result will never be a prime number. If the string were for
3n
, then we could pick 3 0
's and the same
will result.
For this example which is obviously always composite, the pumping
property still holds; no matter how many times i is pumped the
resulting value of w will always be a composite and also aways be even
and thus never a prime. We see then that it is both composite, but also
that the set is not regular. The pumping property can be used to show
that o^p
, prime is not regular (shown later). How do we,
with this information, conclude that the example for a composite is
also not regular? By noticing that these two sets are the compliment of
each other, therefore if o^p is not regular it follows that o^k can not
be regular either.
This is an example of a closure, if the pumping lemma is not working to prove that a set is not regular, perhaps its composite set will, by way of this proof by contradiction using the properties of the closure; We have seen how closures can be used to prove that a set is not contained within the regular.
equal 0's and 1's, using an intersection
Having already seen that 0^n 1^n
is not regular, by way
of the pumping lemma; let us then assume that we know that this is not
regular; How would we show that for an equal number of 0
's
and 1
's the set is not regular?; The string can be mixed
up in any which way, but there must be and equal number of
0
's and 1
's, not necessarily in order of all
0
's and then all 1
's.
We could prove this using the pumping lemma, but will take this opportunity to demonstrate the same using this closure technique.
0
's and 1
's, intersected
with 0^* 1^*
0^n 1^n = equal(0's, 1's) ⋂ 0^* 1^*
If we were to take all of the strings that have equal
0
's and 1
's and all of the strings that start
with 0
's and end with 1
's and ask what do
they have in common; They have 0^n 1^n
in common.
Regular sets are closed under intersection, therefor as we
know that 0^* 1^*
is regular and we know that 0^n
1^n
is not regular, as such equal(0's, 1's)
is
also not regualr, if it were then the intersection of the two would
have to be regular. It is the same idea, here using closure over
intersection, the pumping lemma is a direct method and the closure
property is indirect.
0^210100100101100 or 0^2|0|00|00|0||00
The number of states are listed first the number of 0
's
in the string and the seperator is 1
. Starting from the
start state, for a 0
we would go to state number
1
(one 0), and for a 1
to state number
2
(2 0
's), then another seperator.
0^2|0|00|00|0||00
a double line at the end depicting the
final state. As such we can construct a binary string for any satate
machine.
This string is recognisably correct as it has the correct number of
portions, for each of the two states there are two states shown, thus
there are four states in total. 0^3|0|0|
is clearly
incorect as it should have 9 states.
Is it possible to write a finite state machine that is capable of detecting whether or not a finite state machine is correct or not, that can recognise itself; Is this set regular or not regular? It is not regular because finite state machines are so limited.
Showing this somewhat more formaly. Let n be the number of states in the machine. Choosing a leagal finite state machine with more than n symbols in it, so that it must be pumped up...
Turing machines however, are able to determin whether or not a finite state machine is correct. Turing machines are able to recognise if a machine is a turing machine, but are not powerful enough to recognise any interseting properties about turing machines, they cannot know anything about other state machines.
Linear grammar
This next topic will link together the trio of regular expressions deterministic and nondeterministic state machines; Adding a new idea, a new way of looking at sets. A grammatical way of looking at sets. As the levels go up, the grammar way of looking at a set becomes at least as important, if not more important than examining sets with machines. This really is a different way of looking at a set. Although more important to the next level up, in describing languages and compilers, it is far more transparent and easy to understand at this level.
The best way to explain this is with an example: Consider this
machine that will accept an odd number of 1
's.
Constructing a grammar, a grammar is formalism that neither accepts nor rejects things, it generates them. Anything that the grammar generates the machine will accept, and anything that the machine would fail, the grammar will not be able to generate. To do this we will use each state of the machine to represent a nonterminal symbol, and each transition arrow to represent a terminal symbol from our language.
The following are productions of this grammar.
A -> 0A
A -> 1B
B -> 0B
B -> 1A
Using these productions to generate some strings. Normaly one of these symbols will be denoted as being the start symbol, usualy S for start, but we have used A with our smaller quanitiy of states. This is known as a derrivation, a sequence of substitutions in the grammar, one after the other.
A->0A->01B->010B
The capital symbols are called non terminals, we have nothing yet to show the final state. From our machine it is apparent that B can also terminate, we need to add this to complete our grammar.
A -> 0A
A -> 1B
B -> 0B
B -> 1A
B -> e
->010
The intermedeary steps between our starting point and its acception of the final state, with the capital letters, are sometimes known as 'sentential forms'.
This is known as a linear grammar, the definition being that it has a single capital letter on the left hand side of every production and the right side is a very special form, it has a single terminal symbol, and a single non terminal symble, in our case for our language a binary state and a capital letter.
This kind of linear grammar is sometimes known as a left linear grammar. The same procedure also works for nondeterministic finite state machines.
We have just demonstrated a specal type of grammar, this is a context free grammar. A context free grammar have single non terminals on the left, they can have anything on the right, our construction here is a special case of a context free grammar in which the right side is constrained, a linear grammar.
Converting a grammar into a machine
S -> 0A
A -> 1B
A -> 0B
A -> 0S
B -> 1S
A -> 1
B -> 0
S -> 1
A -> e
We are drawing a nondeterministic state machine, grammars are by
their nature nondeterministic, the states A -> 1
and
the others that do not have a capital letter on the right hand side,
did not appear in our previous example. They are to be considered as
state changes that lead to a dead state as shown in the state machine
drawing of our grammar.