Tuesday, February 26, 2013

Warshall’s Algorithm: Who’s Connected to Who?



Warshall’s Algorithm: Who’s Connected to Who?


            I’m currently taking a class in which we are going over some of the popular algorithms in the history of Computer Science and implementing them ourselves using Java. The first algorithm I was introduced to is called Warshall’s algorithm and it’s used for computing transitive closure and was originally published in 1959. If you are to look this up, you will most likely find references to the Floyd-Warshall algorithm, Floyd’s algorithm, etc. This is because a few years later the original algorithm was expanded upon, and I plan on going over that soon, but for now we will focus on the algorithm at hand.
            So first of all, what is transitive closure? And why do we care? Transitive closure is found by calculating all of the ways in which the elements inside of a set can connect to each other through direct and transitive relations. An example of when a transitive relation occurs would be if point A connects to point B and point B connects to point C, then point A connects to point C through a transitive relation. A direct relation would be the connections, A -> B, and B -> C. This relation of points can be thought of as a network of nodes and we are trying to find out any way in which each pair of nodes can connect either directly or by first bouncing through other nodes to create a path.
            Say there are four nodes in our network and we are representing them as the Set (A, B, C, D). We then declare a Relation of the set R = {(A,A),(A,C),(B,A),(C,B),(C,D),(D,A),(D,D). This relation can be represented using a 4x4 square matrix. If N is the amount of elements in the set, then the matrix used for representation will be of size N by N. The first row of the matrix for our relation R will represent each possible connection from point A to the other points. Looking at R, you can see that A connects to A and C from (A, A) and (A,C). To represent this, the first row of the matrix will be 1010. The matrix representation or what is commonly known as the adjacency matrix for the relation R will be:
1 0 1 0
1 0 0 0
0 1 0 1
1 0 0 1

            For a relation this small you can probably look through manually and figure out whether it’s possible for each point to reach all of the others, but for large networks it would take forever. So this is why we have an algorithm to help us out. To find the find the transitive closure of this relation we need to go from R to R*. R* is basically R raised to an exponential power at which raising it any further will not affect the value of the elements within. Raising a matrix to the power of two finds all of the connections that are separated by one intermediate point. Raising a matrix to the power of three will find any connections separated by two points, and so on. For a matrix the size of four, the highest you need to go is the fourth power, for a matrix the size N the highest power you need to go is N, because after that point you can see from matrix multiplication that the values don’t change and if you’re looking for connections within a network you’ll begin creating a cycles after that point rather than finding new connections since you've already traveled through all of the other vertices.
            Matrix multiplication, going from R to R^2, to R^3…R*, is probably the easiest method you can look at to visualize a procedural path to reach the transitive closure of a relation. Warshall’s algorithm takes a more efficient approach and is a bit trickier to take on by hand and I will just explain a portion of the process. You can see the pseudo-code of the process below taken from Rosen, Kenneth H., Discrete Mathematics and its Applications.

           
           
Looking at the pseudo-code you can see that the for-loops are stepping through each element of the size N matrix N times. First it’s going to iterate through j as i remains fixed at one. According to Pearson Addison-Wesley, “If an element in row i and column j is 0 in R(k-1), it has to be changed to 1 in R(k) if and only if the element in its row i and column k and the element in its column j and row k are both 1’s in R(k-1).”
            Implementing Warshall’s algorithm in Java is pretty straightforward if you follow the pseudo-code. The hardest part for me was just figuring out how to read and write matrices, but if you’ve done that before you’ll probably pick this up quickly. First we need to handle our imports, create a class, and declare our variables.





            I’ve declared our input adjacency matrix, which is a two-dimensional array, the size of our matrix, and the String filename which we will use identify an input file. Now I will show my implementation of Warshall’s algorithm.





            I’ve created a method that takes in a matrix. If you want to be proper you can copy the matrix you take in so that you don’t overwrite it and return the copy matrix. Now I will create a print method that just prints our matrix so that we can test out our code and see if it’s working.



            This print method just goes through each element of each row and column and prints it to screen. I also added a space between elements a new line at the end of each row to make it more easily readable, all stuff you can play around with until it’s pretty. Next up I’m going to create a Main class that will read in a matrix from filename specified from user input and print the original matrix along with the transitive closure matrix to screen.






            Alright, so now we have a little functioning program that reads in an input file, prints the input file, computes transitive closure and prints the transitive closure matrix. As you may have noticed in my code I am taking the matrix size from the first line in the input file so be sure to setup your input file the right way if you’re testing this out. So now I am going to create an input file containing the matrix from earlier and try this out. First I’ll show the input file contents and then the output of the program.

Filetest.txt-
4
1 0 1 0
1 0 0 0
0 1 0 1
1 0 0 1



            And if everything works out and there isn't any syntax errors, this is what we should get! So using this algorithm you can take in a matrix of whatever size (processing time does begin to stack up when taking in a matrix size 1000) and see what connections exist either directly or through intermediate vertices within a network, graph, or relation. And as we can see with the matrix we were working with earlier, it is possible for each node in this network to connect to each of the other nodes.