Simplex method

(This thread was started in response to this post.)

The Simplex method is a popular method for solving linear programming problems. The idea behind the algorithm is simple. In 2D, each linear inequality in your problem is just a line where you are allowed to pick points on one side but not the other. Putting several such inequalities together, you cut out a many-sided polygon like the following:
http://www.keymath.com/Images/keymat...s/06/DAA61.gif

If your linear system has more than two variables, you can cut out a similar shape in higher dimensions called a "polytope". It can be shown that the objective function of your linear programming problem is maximized or minimized at the vertices of the resulting polytope (see Thm. 2.5 from (1)). The Simplex method takes advantage of the fact that there are only finitely many vertices to test. In principle, one could compute each vertex (see below for a method) and find out which one maximizes the objective function. The Simplex method is smarter. It starts at an arbitrary vertex and travels to another adjacent vertex where the objective function is higher. After a relatively small number of iterations, this process terminates at a vertex where the objective function is maximized.

To actually perform this computation, we need a way to easily compute adjacent vertices. What follows is a method for computing all vertices; this turns out to also give adjacent vertices easily.

A general linear programming problem can be formulated as follows: we have n variables (x1, ..., xn) = x. We have an objective function c1*x1 + ... + cn*xn = c^T x which we want to maximize or minimize subject to some constraints on the allowed values of x. Each constraint is of the form a_j1*x1 + ... + a_jn*xn ~ bj where ~ is =, <=, or >=. Such a problem can be converted into an equivalent, "standard" form by modifying constraints and variables. For instance, the constraint x1 <= 10 can be converted to equality (which is algebraically simpler) by introducing an extra "slack" variable, x0 = 10 - x1, and an extra constraint, x0 >= 0. Then x1 + x0 = 10 is equivalent to x1 <= 10. Through this and similar operations, we end up with the standard form of the constraints given by Ax = b and x >= 0 where A is a matrix with n rows (one per variable, after modification) and m columns (one per constraint, after modification). This process is detailed in (1), section 2.1, and in (2). An important note: if the rows of A are linearly dependent, some constraint is a consequence of the others and can be dropped. The rows of A can then be taken to be linearly independent. In general the rank of A is taken to be m, the number of rows.

One can characterize all vertices x = (x1, ..., xn) of a polytope of a linear program in standard form as follows. Denote the ith column of the A matrix by ai. Arbitrarily choose (n-m) of the xi to be 0. Make a new matrix B out of the columns ai corresponding to the m remaining xi's. Note that each ai is a length m column vector, so B is an m by m square matrix. If B is not invertible, we cannot make a vertex this way. If B is invertible, we may compute the m remaining xi, collected in the vector x_B, by using x_B = B^-1 b. The resulting x vector is then such a vertex, *provided* the components of x_B are all non-negative (remember we need x >= 0). Moreover, all vertices arise in this manner (see Thm. 2.3 and Thm. 2.4 of (1) for proof). At this point we in principle have a very slow method for solving linear programs--convert to standard form, find all vertices, and figure out which one maximizes the objective function. There can be a great many ways to pick (n-m) variables to zero out, though, which makes this method completely unworkable for virtually all real problems.

A "pivot operation" converts a vertex x into an adjacent vertex x'. In the notation of the previous paragraph, we had chosen (n-m) of the xi to be 0. In pivoting, we keep all but 1 of these choices the same when computing x'. The resulting matrix B' is the same as B except that one column is different. First, we must ensure x' >= 0--that is, that we stayed on the polytope. Second, for a pivot to be worthwhile, the objective function must increase in going from x to x'. These conditions can be restated more algorithmically, as is done in Thm. 2.6 of (1). As one might suspect, if no pivot satisfies both conditions, we've found our global optimum (see Thm. 2.7 of (1)).

The Simplex algorithm fundamentally reduces to the above. There is a lot of "bookkeeping" involved, though. To keep track of all the relevant variables in a "cleaner" format, a tableaux is used. There is also an issue with standard form: it's unclear whether or not any starting vertex even exists (i.e. if the linear program has a solution), and if one does, how does one go about finding it? Computing with tableauxs and dealing with starting vertices are discussed at length in Chapter 3 of (1). The terminology used is defined in the previous chapters. If you want, I can summarize that chapter as well, though for my own personal knowledge I'm pretty satisfied with the above. Another form, the feasible canonical form, gives a trivial starting vertex. In principle, I could convert that form's trivial starting vertex back to standard form and use pivoting through the application of Thm. 2.7 of (1) to compute the next adjacent vertex, which essentially finishes off the algorithm. The only remaining issues should be computational black boxes--like computing matrix inverses or perhaps testing for their existence. Tableauxs are presumably a clean way of doing essentially this, but from a purely computational perspective. Since I don't actually need to implement the algorithm, I'm personally fine without them. The really clever bits of the algorithm seem to end here.

References:
(1). Dr. Juncheng Wei, Lecture Notes available from http://www.math.cuhk.edu.hk/~wei/LP.html. See chapters 1, 2, and 3. Note that there are various small errors (typographic or grammatical) and some omissions (like glossing over rank conditions much of the time). These have not been vetted for publishing, but seem quite good regardless.
(2). Wikipedia's Simplex Algorithm page, http://en.wikipedia.org/wiki/Simplex_algorithm. The explanation is very much lacking in details. This is really not suitable to learn from exclusively, though it does give some examples and touches on the important aspects of the algorithm. In particular note the Cycling section, which discusses a potential (though apparently rare) pitfall of standard Simplex.