The pumping lemma

0^n 1^n

Whenever a character is raised to a power whilst considering strings for finite state machines, the ^n denotes simple repetitions of that character within the string, such that when n = 4, 0^4 1^4 = 00001111. So it is that the set of strings '0^n 1^n' is an expression of all strings that have that number n of 0's followed by the same number of 1's.

This machine is readily identifiable as having infinite states, as any attempt to draw it demonstrates; It is clearly the case that for every value of n, the machine grows towards infinity with n.

An attempt
				to draw an infinite state machine.
Fig. 03-01

The set of strings 0^n 1^n can be proven to be irregular using the pumping lemma.

For any machine that attempts to show 0^n 1^n to be regular, we can choose a string to test it with, that is longer than the machine has states; When this is the case; For that machine to parse the string, it absolutely must loop somewhere along its pattern, in order to finish upon the final state.

Suppose a machine that has 24 states, reading the string 0^24 1^24. Since there are 24 0's in the first half of this string, the machine will need to loop during these 0's, in order to complete; At the very least one state will be used twice. Taking note of when that state occurs for the second time and counting how many characters have passed before during and after it, we learn how many characters are in each of three sections; The string could be divided as follows:

0^17, loop(0^3), 0^4, 1^24

Now using that same machine, what will it do when given a string that has its loop section doubled?

0^17, loop(0^3), loop(0^3), 0^4, 1^24

The loop is an exact repetition and as such the machine will treat this string in exactly the same way as the prior, except that this time it will loop twice instead of once. However, the machine will arrive in the exact same state at the end of the loops in both cases. If the first string ends up in a final state, then it follows that so will the second string complete; This is very problematic for a machine that is only suppose to accept 0^n 1^n.

This looping of the machine over the string is said to be pumping, and thus named the pumping lemma.

Used to demonstrate that a string is of a set that is not regular, when pumping cannot occur. A string that does show the pumping property is not always of a regular set.

Anything that grows in a linear way is likely to be regular, anything that grows faster than linear can never be regular.

A formal definition

What is the pumping lemma?

If L is a regular set,
then
∀ z in L, where |z| >= n
∃ n = # of states in machine for L
∃ v,w,x such that z=vwx, such that |vw| <= n, |w| >= 1
∀ i >= 0 vw^ix is also in L.

If L is a regular set, then for every string z in L where |z| the number of symbols in z is >= n the # states in machine for L.

There is a way to split that string into three parts, before the loop, the loop itself and then the rest of the string with all consequent loops included.

The string now has three parts v,w,x, such that z=vwx. The loop occurs, vw, within the first n symbols and the loop w must be >= 1.

For every i >= 0, vw^ix is also in L. The repeated concatenation of w in vw remains within L giving the same final state.

When i = 0 the string is correct also, this should also be considered when checking whether or not a set is acceptable by a state machine; We are effectively cutting the string shorter and testing if the new shortened string will pass the conditions of the machine.

Defining the negative pumping property

∀ n = # states in any machine for L.
∃ z in L, |z| >= n
∀ v,w,x where z = vwx, and |vw| <= n, w >= 1
∃ i >= 0 such that vw^ix is not in L.

For every n in the set of states L, there exists as string in L, whose symbols are >= n, such that for every v,w,x, where z = vwx, for every possible way there is to split vwx into three pieces, and vw <= n and w >= 1, there exists some i >= 0 such that vw^ix is not within L.

The converse is not always true, if you have a set and it repeatedly pumps loops, displaying the pumping property; This is not conclusive proof that the set is regular.

Examples

First we take the example of a set that displays the pumping property that is not a regular set; The palindrome.

Palindromes

0^(k/2) 1 0^(k/2)

A machine has k states. 0^(k/2) 1 0^(k/2) this is certainly a palindrome with more than k states.

if z = 0^(k/2) 1 0^(k/2), z >= k symbols
v = 0^(k/2), w = 1, x = 0^(k/2)
note that |vw| <= k and |w| >= 1

No matter how many times we loop over i, we still get a palindrome; It does satisfy the pumping property; We can only conclude that this attempt to show that the pumping property will fail, every single value of i that we might try will fail.

Selecting w a 1 was the reason that this machine worked for this string, but this is not proof that the string is part of a regular set, as we will see.

The pumping property does hold for this string, but we have selected a bad string to start with; This does not show that the pumping property holds for all cases, only that it does in this case.

0^k 1 0^k

if z = 0^k 1 0^k, z >= k symbols
and I note that w = 0^m, for some m between 1 and k

No matter how the string is split into three parts; It is no longer possible to select a value of w that is 1, the only thing that can possibly be done is to select a value for w that is within the part of the string that is all 0's.

let i = 2: vw^2x = 0^(k+m) 1 0^k

This cannot be a palindrome; Hence the set of palindromes is not a regular set.

There are different kinds of pumping lemmas but they are all the same idea; Basicaly that regular sets have these loops, and they regularly can be pumped up. Anything that can not be regularly pumped up is not a regular set.

There are things that can be pumped up, but that are still not regular, as such this is not a characterisation of all of the regular sets; It is actualy very hard to get the required 'if and only if' that results in a definition. There is an 'if and only if' for alphabets with a single symbol, 0. There is a characterisation for all alphabets over the single character set 0, that are regular, a very linear characterisation.

0^k^2

For an alphabet of strings that contains a square number of 0's.

Of note: anything raised to the power of 0, in this case 0^0, gives the identity element of the mathematical structure; The element of this structure is an empty string.

claim 0^k^2 is not regular.

let n be the number of states in a hypothetical machine for 0^k^2.
For 0^n^2 (which is in L the language) and |0^n^2| &tg;= n
Let 0^n^2 = vwx, |vw| >= n, |w| >= 1
Let |w| = m <= n

Picking a value for i to test, we are again very lucky in our choice, and find the appropriate case that will fail.

Selecting i = 2, vw^2x = 0^(n^2+m)
We can state that 0^(n^2+m) != 0^k^2 for any k, because ...
n^2 < n^2+m < (n+1)^2
Thus n^2+m cannot be a square if it is stuck in between two squares.

o^k, k = composite

This case is an example of a set that has the pumping property but is not a regular set.

For all numbers that are composite numbers.

The compliment of this set which is o^p prime, can be shown to not be regular by using the pumping lemma. Having demonstrated this, how then do we prove that 0^c composite is not regular? The composite set being the compliment of the prime, as such it is impossible for the set to be regular; A proof by contradiction due to the closure properties.

z = 0^(2n) = vwx

If w is 2, then we will be adding two 0's to an already even number as 2n is even; The result will never be a prime number. If the string were for 3n, then we could pick 3 0's and the same will result.

For this example which is obviously always composite, the pumping property still holds; no matter how many times i is pumped the resulting value of w will always be a composite and also aways be even and thus never a prime. We see then that it is both composite, but also that the set is not regular. The pumping property can be used to show that o^p, prime is not regular (shown later). How do we, with this information, conclude that the example for a composite is also not regular? By noticing that these two sets are the compliment of each other, therefore if o^p is not regular it follows that o^k can not be regular either.

This is an example of a closure, if the pumping lemma is not working to prove that a set is not regular, perhaps its composite set will, by way of this proof by contradiction using the properties of the closure; We have seen how closures can be used to prove that a set is not contained within the regular.

equal 0's and 1's, using an intersection

Having already seen that 0^n 1^n is not regular, by way of the pumping lemma; let us then assume that we know that this is not regular; How would we show that for an equal number of 0's and 1's the set is not regular?; The string can be mixed up in any which way, but there must be and equal number of 0's and 1's, not necessarily in order of all 0's and then all 1's.

We could prove this using the pumping lemma, but will take this opportunity to demonstrate the same using this closure technique.

Equal 0's and 1's, intersected with 0^* 1^*
0^n 1^n = equal(0's, 1's) ⋂ 0^* 1^*

If we were to take all of the strings that have equal 0's and 1's and all of the strings that start with 0's and end with 1's and ask what do they have in common; They have 0^n 1^n in common.

Regular sets are closed under intersection, therefor as we know that 0^* 1^* is regular and we know that 0^n 1^n is not regular, as such equal(0's, 1's) is also not regualr, if it were then the intersection of the two would have to be regular. It is the same idea, here using closure over intersection, the pumping lemma is a direct method and the closure property is indirect.

How to turn finite state machines into binary.
An image
				depicting how to change a finite state machine into binary.
Fig. 03-02
0^210100100101100 or 0^2|0|00|00|0||00

The number of states are listed first the number of 0's in the string and the seperator is 1. Starting from the start state, for a 0 we would go to state number 1 (one 0), and for a 1 to state number 2 (2 0's), then another seperator. 0^2|0|00|00|0||00 a double line at the end depicting the final state. As such we can construct a binary string for any satate machine.

This string is recognisably correct as it has the correct number of portions, for each of the two states there are two states shown, thus there are four states in total. 0^3|0|0| is clearly incorect as it should have 9 states.

Is it possible to write a finite state machine that is capable of detecting whether or not a finite state machine is correct or not, that can recognise itself; Is this set regular or not regular? It is not regular because finite state machines are so limited.

Showing this somewhat more formaly. Let n be the number of states in the machine. Choosing a leagal finite state machine with more than n symbols in it, so that it must be pumped up...

Turing machines however, are able to determin whether or not a finite state machine is correct. Turing machines are able to recognise if a machine is a turing machine, but are not powerful enough to recognise any interseting properties about turing machines, they cannot know anything about other state machines.

Linear grammar

This next topic will link together the trio of regular expressions deterministic and nondeterministic state machines; Adding a new idea, a new way of looking at sets. A grammatical way of looking at sets. As the levels go up, the grammar way of looking at a set becomes at least as important, if not more important than examining sets with machines. This really is a different way of looking at a set. Although more important to the next level up, in describing languages and compilers, it is far more transparent and easy to understand at this level.

The best way to explain this is with an example: Consider this machine that will accept an odd number of 1's.

An image
				depicting how to change a finite state machine into binary.
Fig. 03-02

Constructing a grammar, a grammar is formalism that neither accepts nor rejects things, it generates them. Anything that the grammar generates the machine will accept, and anything that the machine would fail, the grammar will not be able to generate. To do this we will use each state of the machine to represent a nonterminal symbol, and each transition arrow to represent a terminal symbol from our language.

The following are productions of this grammar.

A -> 0A
A -> 1B
B -> 0B
B -> 1A

Using these productions to generate some strings. Normaly one of these symbols will be denoted as being the start symbol, usualy S for start, but we have used A with our smaller quanitiy of states. This is known as a derrivation, a sequence of substitutions in the grammar, one after the other.

A->0A->01B->010B

The capital symbols are called non terminals, we have nothing yet to show the final state. From our machine it is apparent that B can also terminate, we need to add this to complete our grammar.

A -> 0A
A -> 1B
B -> 0B
B -> 1A
B -> e

->010

The intermedeary steps between our starting point and its acception of the final state, with the capital letters, are sometimes known as 'sentential forms'.

This is known as a linear grammar, the definition being that it has a single capital letter on the left hand side of every production and the right side is a very special form, it has a single terminal symbol, and a single non terminal symble, in our case for our language a binary state and a capital letter.

This kind of linear grammar is sometimes known as a left linear grammar. The same procedure also works for nondeterministic finite state machines.

We have just demonstrated a specal type of grammar, this is a context free grammar. A context free grammar have single non terminals on the left, they can have anything on the right, our construction here is a special case of a context free grammar in which the right side is constrained, a linear grammar.

Converting a grammar into a machine

S -> 0A
A -> 1B
A -> 0B
A -> 0S
B -> 1S
A -> 1
B -> 0
S -> 1
A -> e

We are drawing a nondeterministic state machine, grammars are by their nature nondeterministic, the states A -> 1 and the others that do not have a capital letter on the right hand side, did not appear in our previous example. They are to be considered as state changes that lead to a dead state as shown in the state machine drawing of our grammar.

A state
				machine that depicts the above grammar.
Fig. 03-04