Minimising finite state machines

One of the really nice things about finite state machines is that whenever you create one, you can minimise it to a unique optimal state machine. If several people develop different finite state machines that solve the same particular problem, then all of those finite state machines, once minimised, will share the exact same minimal form.

The same can not be said of Turing machines, nor of the mid level pushdown automata; There is no minimum machine amongst these programs, at these levels, but for finite state machines there is.

The presence of these minimal machines really helps us to understand state machines well. They are at the core of decision algorithms about finite state machines, our understanding of which is helped by minimise them.

Intuition About State Machines

We can imagine ourselves inside of a network of caves when in a finite state machine, each room of the cave has little doors to passages out of the room, with the doors marked with 0's and 1's; From inside this environment as we move form room to room, there can be specific good or bad things found inside of each room, very similar to the kind of things that we find in dungeon style adventure games. If we would like to make a map of the cave whilst moving about inside of it, in so doing we might take the same path several times without realising that we have done so, the result of which is a huge map that works just fine, except that it has duplicate data in its structure where we have not realised that we are walking in places that we have been before. As we start to realise that parts of the cave on our map have been duplicated, we can reduce it to its more optimal real form.

The initial map is an unoptimised state machine, and the actual map is the minimised, optimal state machine.

If rooms with food in them are shown as final state nodes, with a double circle, rooms without resources a single circle; we cannot confuse them, they can not be merged together in any way whilst still maintaining the same functionality of the machine. But what about the other states, suppose we consider C and D, if a string that finds us at state C advances with either a 1 or a 0, does it differ at all from the same situation were we in state D? If all that we want to know is whether or not we will accept a string, and all of the strings from D and all of the strings from C both are either accepted or rejected in the same way, then there is no reason to have the be different places or different states. Are there any possible strings that can distinguish D from C? If there are, we need to define with some rigour how we are going to decide this.

A is very easily differentiated from E by the empty string, from A we will not accept an empty string whereas from E we can, therefore A and E are clearly not the same state.

What we do next is just keep increasing the string lengths to distinguish the machines state, moving up to single symbol strings to distinguish the machines state then two then three then four. It turns out that if you can distinguish two different states, in this machine with six states you will need a string of six symbols or less, to distinguish it. After that, you will always find a loop and it will be possible to reduce that string to a shorter one, to make the distinction. So we do not need to go on for ever looking for strings that determine replication of state, we only need to go up to strings of length six.

We can make up a table to define all of our possible state combinations, in so doing find 15 state pairs, placing an X in every square that represents a pair of states that are distinguishable one from the other.

All of the X's represent the combinations of final states with non final states, the states that we know already as distinguishable. If we can fill this table up with all of the distinguished states, any remaining states can be collapsed into new states that are also unique.

Taking a blank table cell defined by a pair of undetermined states, say A and F, how can we decide whether A and F are distinguishable from one another? Taking one symbol leading from A and F say 0, where do we end up upon leaving each of the two states; B and E respectively. Are B and E distinguishable in our table? Not yet ... Had they have been distinguishable then we could deduce that A and F are also unique. Looking at 1 from both A and F, we go to D and C in our table we can see that they are not yet distinguishable. If however at some point later on we determine that they are, then we need to come back to change our evaluation of D and C. In order to achieve this, in our verifying algorithm we need to add a reminder as to which verifications have already been made, we can add to these verified cells the pairs of nodes that have been checked and linked so that if we find that cell to be distinguished, we might easily return to the designated cells marking those as distinguished also.

What does A D depend upon; B E with 0 and on A D with 1. The auto reference of A D to itself is already a designation of self similarity, as such this can be ignored. But it might be distinguished, if B and E are distinguished so we can also add B,E to the cell A,D. We can represent the states dependencies by depicting them with a directed graph.

Continuing on, what does A,C depend upon; both A and C lead to B which differentiates nothing but they also lead to D and F and as such require that A,C be marked inside of the cell D,F.

Equivalent Transitive Relationships

Once we have looked at all of the empty squares and filled in any dependencies found we can look again at our picture and use the table to collapse it down into a more refined form, making all of the states that are indistinguishable from one another, into a single state.

There is a lot of math behind this operation, as yet unmentioned, this process is entirely based upon the equivalence relationship between states, deciding whether one state is distinguishable from another is exactly that, an equivalence relation. These are equivalent transitive relationships, we should never have any problems reducing our machine to a minimised machine, once these relationships have been clearly established.

We see also that C,F depend upon B,E, what does this mean; We will not go back and put C,F in the B,E line, that would be a cycle, we never go back to add things to cells only when moving forwards, going back would mead that we have found a cycle and that we are not ever going to get a distinguishable characteristic.

So we leave that and we move on to C,D it depends upon A,F which is already marked above so we are not going to put anything there either. The same also for B,E so we are not going to put anything there, so we move on ...

D,F depends on E,E so nothing distinguished there, and it depends on A,C which is already marked, so we have ended up finding all these cycles, none of which were dependant upon any of the cells that have X's in them.

We have now covered our entire table and have not found any cells that distinguish themselves, other than those already marked with an X. We will see examples later on in which we will find distinguishing features but for now we will see how to collapse our original state machine down into its minimised form.

Collapsing The Equivalents States

This means that any of our cells that do not have dependencies marked within them, are the same state, A,F are the same state, A,D are the same and also A,C.

We have now checked all of our cells except for B,E, B,E becomes another class, different from A,C,D,F but B and E remain undifferentiated from each other.

A,C,D,F all satisfy one class of equivalence, where as B,E is of another, with this knowledge in mind we can now redraw our state machine with only two nodes, A,C,D,F and B,E.

It is now much easier to see that our state machine is a machine that will determine whether a string has an odd number of 0's.

When you are defining an architecture for a computer that you are building, you will have a big finite state machine that represents the microcode, defined without worrying about it being minimal, you would cleanly define it to fit your needs then before implementing it, turning it into hardware where you want it to be small and compact, before that you will minimise it.

Effectively what we have done here is to construct our dependency graph using the table, all of the states that we put X's in within the table are completely disconnected from the dependency graph, what we are really trying to do when minimising the machine, is to construct the dependency graph starting from the X's and then moving backwards on the arrows to the dependant cells or node combinations, to see which we can reach.

A computer could be used to make the dependency graph, the nodes are pairs of vertices and the edges are whether they are dependant upon each other by single strings. If you wanted to know whether they are dependant upon each other for a length of more than one string, you just follow a path of length two. C,D is dependant upon C,F by a separation of two strings, all that we need to do to find out whether or not any of the nodes in our graph are dependant upon any others is to take the nodes that have X's and flood them back through the graph along the arrows, any nodes that they can reach should also get an X, anything that they can reach is also distinguishable.

directed
graph of dependant states with distinguished states

Our traversal of the graph whilst filling in the table, is in effect a backwards traversal, we are looking at the nodes that depend upon others and marking the dependency in the child node, so that we can later return to mark those parent nodes as being distinguished only if we find a related node to the child that distinguishes it.

Another Machine

This time round we will build up the dependency graph first and then begin to fill in the table as we go along, with no distinguished states already crossed.

We will start off by creating a list of nodes that represent distinguishable pairs, no two nodes can be the same node when one is a final state and the other a normal node, as such final states are clearly distinguishable from non final states, giving us:

All of which become crosses in out table.

Continuing on to create our directive graph of pairs of state, the pair A,D, what does that depend upon; B,D, which is not in our distinguished list, we can add that to our graph.

Moving on to A,C, on a 0 it goes to B,F, for the first time we have arrived at a state that is distinguishable, on a 0 from C we arrive at state F where as on a 0 from A we arrive at the B. So we can add A,C, as being linked to B,F, as previously mentioned by proxy this means that A,C, is also distinguished. In this algorithm, we do not bother checking the 1 if it has already been distinguished by the 0, we carry on.

Now A,B, this goes to B,E, which is distinguishable, so A,B, will also get an X. B,D, depends upon D,D, indistinguishable, and it depend on D,E, which is distinguished.

B,D, is shown to be distinguished by D,E, as it is already in our graph, when we cross it out we can also trace back to all nodes that have lead towards it and mark those as distinguished too. This is why we write in associated nodes throughout the table, facilitating the task when a branch is shown to be distinguished. Now that B,D, gets an X so we can establish from our table that A,D, should also get an X.

B,C, next, what does it depend upon; E,F, which is not yet defined, and D,D, so we need to add E,F, to both the graph and the table, if the graph ever shows us that E,F is distinguished we can follow the path back to B,C. The table is particular a way for us to store the edges back to previous nodes.

C,D, depends upon D,F, and D,D, now D,F, has an X so now C,D, gets an X too.

Only one more remains, E,F, which depends upon itself and as such is not distinguishable.

Now E and F can collapse merging into each other, as can B and C, giving us this more refined machine, the minimum machine.

Analysing now what we have done, we have created two graphs one with all of the nodes that are clearly distinguishable ore dependant upon nodes that are, and a graph that is independent from that with all of the nodes that are dependant upon each other but not upon any of the nodes with X's those parts of the original machine with states that are indistinguishable from one another.

This is really a graph searching problem, an algorithm of discrete math in disguise, though it may look like a dynamic programming table.

The algorithm that we have just worked thought is effective in O(n^3) time, there also exists a solution to this problem that is achievable in O(n log n) time.

n(log(n))Minimiseing DFM
Minimiseing DFM

What questions can we answer about FSM's, about regular sets?

Sometimes known as decision algorithms, it is really only algorithms that answer yes or no but the inputs are binary strings that represent finite state machines. Some questions about finite state machines might be: Does it accept an infinite number of things? Does this FSM accept any string of length 10? Do these two FSM's accept the same language? Do this pair of FSM's have anything in common that they accept?

These questions would be solved by writing a program, the input to the program would be a finite state machine, or several and binary strings. The most common question asked of a FSM is simply whether or not it accepts a particular input string.

UNIX has a built in utility called LEX that is specifically designed for this task.

Membership

Are two FSM's equal?