tag:blogger.com,1999:blog-50574077892391340382014-10-04T20:09:13.325-07:00CMU Advanced AlgorithmsAnupamnoreply@blogger.comBlogger26125tag:blogger.com,1999:blog-5057407789239134038.post-78106427856265249652009-11-23T10:59:00.000-08:002009-11-23T11:26:52.690-08:00The Final Three LecturesToday's lecture and the two lectures next week are the final three lectures of the course. Today we will do a bit more on approximation algorithms; in particular, we will discuss the max-cut problem.<br /><br />Next week we'll talk a bit more about online algorithms via solving LPs, and also about using the ideas behind online learning algorithms to solve LPs offline.Anupamnoreply@blogger.com1tag:blogger.com,1999:blog-5057407789239134038.post-52599463686478298352009-11-18T13:52:00.000-08:002009-11-18T14:16:47.214-08:00Lecture 21The deterministic integrality gap instance is super-simple: take all the d-bit strings that have at least d/2 1's in them. There are d sets, one for each dimension, with the set S_i containing all strings with the i-th bit 1.<br /><ul><li>Now OPT is at least d/2: else we would have covered fewer than d/2 positions, and hence missed some element.</li><li>But the LP solution picks each x_S to a fraction 2/d: hence each element is fractionally covered to at least 1. </li></ul>For d = log n, the universe has n elements. So this gives an Omega(log n) integrality gap. You should try to see whether you can get an integrality gap larger than this. Also, try to fill in the details of the randomized construction I sketched in class.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-6930397549603287082009-11-16T18:04:00.000-08:002009-11-16T18:39:53.040-08:00Lectures 20 and 21 (today and Wednesday)Danny lectured today on <span style="font-weight: bold;">online algorithms</span>, in particular, on ski-rental, list-update, and paging/k-server problems.
Some directions to think about, before we briefly return to online algorithms in a post-Thanksgiving lecture.<br /><ul><li>It is an easy exercise to show that in ski-rental, you cannot do better than 2-r/p using any deterministic algorithm. (I always think of r=1, in which case this is 2-1/p.) <span style="font-style: italic;">Can you get a randomized algorithm that has a competitive ratio bounded away from 2, independent of p? </span><br /><br />For example: consider the algorithm that buys at day 2p/3 with probability 1/2, and at day p with the remaining probability 1/2. What is the competitive ratio for this algorithm?<br /><ul><br /><li>If the sequence has length less than 2p/3, we are optimal.<br /></li><li>If the sequence has length i in [2p/3,p), the algorithm's expected cost is 0.5(2p/3+p) + 0.5(i), and the optimal cost is i---giving us c(alg)/c(opt) = 5p/(6i) + 1/2. For i in this interval, this ratio is bounded above by (5/4+1/2) = 1.75.</li><br /><li>If the sequence has length >= p, the algorithm's expected cost is 0.5(2p/3+p) + 0.5(2p-1) <= 5p/6 + p = 11p/6. Hence c(Alg)/c(opt) <= 11/6. </li><br /></ul>The worst of the three options is 11/6, which is the competitive ratio for this algorithm. This shows that one can do strictly better than 2 (independent of the value of p) using randomization. <span style="font-style: italic;">How much better can you get for the ski-rental problem using randomization? How do you show lower bounds for randomized online algorithms?</span> More about this in an upcoming lecture.<br /></li><br /><li>Note that the analysis for randomized algorithms computes the expected cost of the online algorithm for every fixed sequence, and compares that to the best offline algorithm's cost on the same sequence. It is as though the adversary chooses the sequence sigma knowing the algorithm but not the random choices made by the online algorithm---such an adversary is known as an <span style="font-style: italic;">oblivious</span> adversary.
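This per-sequence accounting is easy to check numerically. Here is a small sketch (my own illustration, not from lecture; the function names are made up) for the buy-at-day-2p/3-or-day-p algorithm above, with r=1. It charges a full p days of rent when buying at day p (rather than the p-1 used above), which only makes the algorithm look worse, and the worst-case ratio still comes out to exactly 11/6.

```python
from fractions import Fraction

def expected_cost(p, i):
    """Expected cost on a season of length i, for the algorithm that rents
    through day T and then buys (paying T + p), where T = 2p/3 or T = p,
    each with probability 1/2. Renting costs r = 1 per day."""
    total = Fraction(0)
    for T in (Fraction(2 * p, 3), Fraction(p)):
        cost = i if i < T else T + p   # rent i days, or rent T days then buy
        total += Fraction(1, 2) * cost
    return total

def worst_ratio(p):
    """Worst ratio of expected cost to offline optimum min(i, p), over all
    season lengths i (the ratio is constant once i exceeds p)."""
    return max(expected_cost(p, i) / min(i, p) for i in range(1, 2 * p))

print(worst_ratio(300))  # 11/6, strictly better than 2 - 1/p
```

Playing with the buy thresholds and probabilities in this sketch is a quick way to experiment with the question above; the best possible randomized ratio turns out to be e/(e-1), about 1.58.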
We will discuss more about adversary models in a future post.<br /></li><br /><li>For the paging problem where you have a set of n pages, but can only store k pages in a cache, the generalization of the cat-and-mouse game is clear: we now have n-k mice and n positions. It's fun to think about what the algorithm for this case should be.<br /></li></ul>For Wednesday's lecture, we will do a quick introduction to <span style="font-weight: bold;">approximation algorithms</span>. If you are already familiar with the topic, this lecture may be less useful to you---but it seemed wrong to do a course on algorithms without at least touching upon this topic, and talking about some of the concepts used and the techniques commonly employed.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-44730371791076968952009-11-11T14:11:00.000-08:002009-11-12T06:50:58.905-08:00Lecture 19 notesI think I was over-simplifying the proof of Fact 2 (the expected number of violated constraints by a random r-set is at most md/r, where m is the total number of constraints, d the dimension and r the sample size): however, here is a just-as-simple proof of that fact.<br /><br />Note that given a set R, a constraint <span style="font-style: italic;">h</span> (not in R) is violated by the optimal point for R only if <span style="font-style: italic;">h</span> is in the optimal basis for (R union <span style="font-style: italic;">h</span>). Take a random <span style="font-style: italic;">h</span> outside a random <span style="font-style: italic;">r</span>-set R---now <span style="font-style: italic;">h</span> is a random element of (R union <span style="font-style: italic;">h</span>)---and hence <span style="font-style: italic;">h</span> is one of the <span style="font-style: italic;">d</span> basis constraints with probability <span style="font-style: italic;">d/(r+1)</span>. 
Hence the expected number of violated constraints is <span style="font-style: italic;">(m-r)d/(r+1)</span> <span style="font-style: italic;"><= md/r</span>. QED.<br /><br />This is a "backwards-analysis" style proof: instead of considering what happens when we add a random constraint <span style="font-style: italic;">h</span> to a random <span style="font-style: italic;">r</span>-set R, we consider what happens when we remove a random constraint from a random <span style="font-style: italic;">(r+1)</span>-set R'.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-39853977699365119142009-11-10T15:07:00.000-08:002009-11-10T15:09:01.013-08:00Homework #3There was a bug in problem #4b of the homework (thanks, Srivatsan): I've removed that part.Anupamnoreply@blogger.com2tag:blogger.com,1999:blog-5057407789239134038.post-53944981525528490792009-11-06T06:34:00.001-08:002009-11-10T14:17:47.033-08:00Long-delayed Posts II: Cutting PlanesThis is a vast topic dealing with the following question: you are given a convex polytope P in R^n (say, as the intersection of some halfspaces), and you only care about the integer points inside P (i.e., S = P intersect Z^n). 
How do you come up with some halfspace h such that S lies in h, but some portion of P lies outside the halfspace (and hence is "cut off")?<br /><br />Ideally you want to find a set H of halfspaces so that <span style="font-style: italic;">P'(P,H) = P intersect (intersection of halfspaces in H) </span>equals <span style="font-style: italic;">convex-hull(S)</span> --- note that this P' is the smallest convex body that can contain all of S.<br /><br />As a simple illustration, if you have the following inequalities<br /><div style="text-align: center; font-style: italic;">3x + 4y <= 17 2x + 6y <= 22 x, y >= 0<br /></div>Graphing this, we get:<br /><br />Adding the first two inequalities gives us<br /><div style="text-align: center; font-style: italic;">5x + 10y <= 39 </div>or<br /><div style="text-align: center; font-style: italic;">x + 2y <= 7.80 </div><br />But since we only care about integer values of x,y, we can round down the RHS, and get the following "valid" inequality:<br /><div style="text-align: center;"><span style="font-style: italic;">x+2y <= 7</span>.<br /></div><br />Similarly, the fact that x>=0 along with the second inequality implies that 6y <= 22, and using the same rounding-down idea, we get another "valid" inequality: <div style="text-align: center;"><span style="font-style: italic;">y <= 3</span>.<br /><div style="text-align: left;">And similarly (using the first inequality, and y >= 0), one gets<br /><div style="text-align: center;"><span style="font-style: italic;">x <= 5</span>.<br /></div>Note that each of these cuts off some portion of the original polytope; though we are still left with a fractional vertex (5, 0.5). (How should we cut this fractional vertex off?)<br /><br />Note that we used some simple rules-of-thumb to generate these inequalities that were valid for all integer points S inside P, but cut off some portions of the polytope. 
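Each of the derivations above follows the same recipe: take a nonnegative combination of the inequalities that makes the left-hand-side coefficients integral, then round down the right-hand side. Here is a small sketch of that recipe (my own illustration; the helper name cg_cut is made up) applied to the example system.

```python
from fractions import Fraction
from math import floor

# The example system: each row (a1, a2, b) means a1*x + a2*y <= b.
# The last two rows encode x >= 0 and y >= 0 as -x <= 0 and -y <= 0.
ROWS = [(3, 4, 17), (2, 6, 22), (-1, 0, 0), (0, -1, 0)]

def cg_cut(multipliers):
    """Take a nonnegative combination of ROWS with the given multipliers;
    if the left-hand side comes out integral, floor the right-hand side.
    The resulting inequality is valid for every integer point of P."""
    assert all(u >= 0 for u in multipliers)
    a1 = sum(Fraction(u) * r[0] for u, r in zip(multipliers, ROWS))
    a2 = sum(Fraction(u) * r[1] for u, r in zip(multipliers, ROWS))
    b = sum(Fraction(u) * r[2] for u, r in zip(multipliers, ROWS))
    assert a1.denominator == a2.denominator == 1   # need an integral LHS
    return int(a1), int(a2), floor(b)

# 1/5 of each of the first two rows:  x + 2y <= floor(39/5) = 7
print(cg_cut([Fraction(1, 5), Fraction(1, 5), 0, 0]))   # (1, 2, 7)
# 1/6 of the second row + 1/3 of -x <= 0:  y <= floor(22/6) = 3
print(cg_cut([0, Fraction(1, 6), Fraction(1, 3), 0]))   # (0, 1, 3)
# 1/3 of the first row + 4/3 of -y <= 0:  x <= floor(17/3) = 5
print(cg_cut([Fraction(1, 3), 0, 0, Fraction(4, 3)]))   # (1, 0, 5)
```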
One can ask: <span style="font-style: italic;">are there automated procedures to generate such valid inequalities in a systematic fashion? Are these procedures to generate these cutting planes "complete" --- do they eventually leave us with the convex hull of S? How much time do these procedures take?</span><br /><br />You can refer to entire books on the subject (e.g., Schrijver's book on <a href="http://www.amazon.com/Theory-Integer-Programming-Alexander-Schrijver/dp/0471982326">Linear and Integer </a><a href="http://www.amazon.com/Theory-Integer-Programming-Alexander-Schrijver/dp/0471982326">Programming</a>). There has been much interest in some of these advanced techniques: the Lovasz-Schrijver method, the Sherali-Adams method, and the Lasserre method. We may blog about this sometime later, but before we end, let us see how the simple idea of rounding down the right-hand side applies to the matching polytope.<br /><br /><span style="font-weight: bold;">The Matching Polytope</span><br /><br />One can use the same ideas as above to get Edmonds' matching polytope formulation from the naive matching constraints: start with<br /><div style="text-align: center;"><span style="font-style: italic;">x(delta v) <= 1 for all v</span><br /><span style="font-style: italic;">x_e >= 0</span><br /></div><br />Now take a set A of vertices, and add up the constraints x(delta v) <= 1 for v in A.
You get <div style="text-align: center;"><span style="font-style: italic;">2 \sum_{e subsetof A} x_e + \sum_{e in delta A} x_e <= |A|.</span><br /></div><br />Now since the x_e's are non-negative, drop the second summation to get<br /><div style="text-align: center;"><span style="font-style: italic;">\sum_{e subsetof A} x_e <= |A|/2.</span><br /></div><br />But since all integer solutions give an integer value on the left, you can replace the right by<br /><div style="text-align: center;"><span style="font-style: italic;">\sum_{e subsetof A} x_e <= floor(|A|/2).</span><br /></div><br /></div></div>Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-76848592715562543042009-11-05T10:40:00.000-08:002009-11-05T14:50:05.986-08:00Integrality of the Bipartite Perfect Matching PolytopeJust to elaborate on what Yuan mentioned in class yesterday: (yet) another way to prove the integrality of the bipartite perfect matching polytope BPM is via the following fact:<br /><br /><div style="text-align: center;"><span style="font-style: italic;">Given a Delta-regular bipartite graph, it can be decomposed into Delta perfect matchings.</span><br /><div style="text-align: left;"><br />(The proof is via Hall's theorem: show that there exists a perfect matching in any regular bipartite graph, remove this perfect matching from the graph, and repeat.)<br /><br />Suppose there was a non-integer vertex x* of BPM, note that since it is defined by some set of tight constraints with integer coefficients, this vector x* must be rational. Now you can scale the numbers by some integer C to get an integer vector Cx*. But this integer vector can be viewed as a bipartite multigraph where every vertex i has degree C(\sum_j x*_ij) = C(1) = C. 
Now we can use the fact above: this vector Cx* can be written as the sum of C perfect matchings.<br /><br />Hence x* can be written as (1/C) times this sum of perfect matchings, i.e., as a convex combination of perfect matchings, which means that it cannot itself be a vertex (extreme point) of the BPM.<br /></div></div>Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-31988857595456569752009-11-05T08:13:00.000-08:002009-11-05T10:34:27.508-08:00Lecture 17: Separation for Odd CutsYesterday we reduced the separation problem for the matching polytope to the following problem: <span style="font-style: italic;"><br /><br /></span><div style="text-align: center;"><span style="font-style: italic;">given an undirected graph G, find a minimum cut (A,A')<br />such that A has odd cardinality,<br />and A does not contain the special vertex s.</span><br /></div><br />We showed that if the weight of the edges crossing such a min-cut was strictly less than 1, then <span style="font-style: italic;">A</span> would correspond to a violated constraint in the (exponentially-sized) LP. And one could find the min-cut <span style="font-style: italic;">(B, B')</span> in the graph in polynomial time. But what if the side <span style="font-style: italic;">B</span> not containing the special vertex <span style="font-style: italic;">s</span> in this min-cut turned out to be of even cardinality? Well, it turns out we need a little more work.<br /><br />To handle this, we do the following: first, we recurse on <span style="font-style: italic;">G/B</span> (which is the graph obtained by shrinking the set <span style="font-style: italic;">B</span> to a single node), and on <span style="font-style: italic;">G/B'</span> obtained by shrinking <span style="font-style: italic;">B</span>' to a single node.
In the latter, we treat the new node obtained from shrinking <span style="font-style: italic;">B' </span>as the "special" node.<br /><br />Why would one of the two recursions succeed in finding a minimum odd cut? Suppose <span style="font-style: italic;">(A, A') </span>was a minimum odd cut we were looking for (with <span style="font-style: italic;">s</span> not in <span style="font-style: italic;">A</span>). Recall that <span style="font-style: italic;">(B, B') </span>was the min-cut but <span style="font-style: italic;">B</span> was even. And s lies in neither A nor B.<br /><ul><li>If <span style="font-style: italic;">A</span> lay completely inside <span style="font-style: italic;">B</span> or <span style="font-style: italic;">B'</span>, we'd find it in one of the recursive calls.</li><li>If A contains B, we'd find it in the recursive call on G/B.<br /></li><li>Suppose <span style="font-style: italic;">A</span> intersects both <span style="font-style: italic;">B</span> and <span style="font-style: italic;">B'</span> but does not contain B. Then consider the 4 parts: A cap B, A cap B', A' cap B, and A' cap B'. |A| is odd, |B| is even. So either |A cap B| is odd, or |A cap B'| is odd.<br /><br /><ol><li>Say it's the former: |A cap B| is odd. A' cap B' contains s, so it's non-empty.<br /><br />Now we bust out the following fact (which is called <span style="font-style: italic;">submodularity</span> of the cut function): if cut(X) is the weight of all edges going out of some set X,<br /><div style="text-align: center;">cut(X) + cut(Y) >= cut(X cap Y) + cut(X cup Y).<br /></div><br />So cut(A) + cut(B) >= cut(A cap B) + cut(A cup B) = cut(A cap B) + cut(A' cap B').<br /><br />cut(A) = min-odd-cut.<br />cut(B) = min-cut.<br />cut(A' cap B') >= min-cut, since A' cap B' is non-empty.<br />cut(A cap B) >= min-odd-cut.<br /><br />Putting these together, the only way to satisfy submodularity is when we have equality everywhere.
so cut(A cap B) = min-odd-cut, and we will find it recursively.<br /><br /></li><li>If it's the latter: |A cap B'| is odd. Since A does not contain all of B, A' cap B is non-empty. Now apply submodularity to A and B', and repeat the same argument.</li></ol><br />This shows that at least one of the recursive calls to G/B or G/B' still has an odd-cut with the same value as cut(A), and hence we will find it recursively.<br /></li></ul>Finally, we need to bound the runtime of the entire procedure, but that can be bounded by poly(n).Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-67222311775835004822009-11-05T07:39:00.000-08:002009-11-05T08:12:36.717-08:00Long-delayed Posts I: the Hirsch ConjectureIn Lecture 16, when we'd mentioned the simplex algorithm, I'd mumbled something about the diameter of polytopes and the Hirsch conjecture --- here are some details.<br /><br />If you consider a polyhedron defined in <span style="font-style: italic;">d</span> dimensions by the intersection of <span style="font-style: italic;">n</span> half-spaces, then the Hirsch conjecture says that the diameter of the 1-skeleton of such a polyhedron is at most <span style="font-style: italic;">n-d</span>. We've been using n dimensions (variables) and m half-spaces (constraints), but I will stick with the <span style="font-style: italic;">n,d </span>notation --- you can do the translation.<span style="font-style: italic;"></span> Kalai and Kleitman showed (in <a href="http://www.ams.org/bull/1992-26-02/S0273-0979-1992-00285-9/S0273-0979-1992-00285-9.pdf">this two-page paper</a>) that the diameter is at most <span style="font-style: italic;">n^{2 + log d}</span> --- and this is essentially the best result known.<br /><br />This problem was of interest to us, because if the diameter of some LP polytope were indeed large, then the simplex algorithm starting from a bad vertex might require a large number of steps. 
Of course, we only care about polynomiality, so we would be happy with the polynomial Hirsch conjecture: the diameter of the 1-skeleton is at most poly(<span style="font-style: italic;">n,d</span>).<br /><br />Some more resources:<br /><ul><li>A series of blog posts on <a href="http://gilkalai.wordpress.com/">Gil Kalai's blog</a>: <a href="http://gilkalai.wordpress.com/2009/07/17/the-polynomial-hirsch-conjecture-a-proposal-for-polymath3/">here is one</a> of them. His posts contain a huge wealth of information about the conjecture, its variants and generalizations.<br /></li><li>A recent <a href="http://front.math.ucdavis.edu/0907.1186">survey by Santos and Kim</a>.</li></ul>Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-61912961785136533592009-10-28T16:24:00.001-07:002009-10-28T17:03:08.102-07:00Lecture 15 notesSome notes about today's lecture:<br /><ol><li>I mentioned that there are integrality results for some natural LPs for MST, branchings, shortest paths, and matchings. Basically, this means that one can write LPs for these problems such that the basic solutions (i.e., vertices of the resulting polytope) are all 0-1 vectors. It was all a bit too fast towards the end, and so we will try to cover integrality for LPs in another lecture later.<br /><br /></li><li>For the branchings LP to make sense for negative edge weights, we should throw in a constraint that the total number of arcs chosen is at most n-1 (or that at most one arc should be chosen from the out-arcs of each non-root vertex). Else negative edge weights might cause the LP to choose too many edges, or even to choose edges to an extent greater than 1.<br /><br /></li><li>Srivatsan wondered about the fact that the LP we wrote for branchings was exponentially sized: what's up with that? Two answers to this.<br /><br />One: we can use the LP purely for analytical purposes.
If we use Edmonds' cycle shrinking algorithm from Lecture 5, we end up showing an integer primal solution of some value, and a feasible dual solution of the same value. And since these primal and dual solutions are feasible, and of equal value, both must be optimal!<br /><br />Second: wait until next lecture, when we will show how to solve such exponentially sized LPs under certain conditions. For now, just observe that if you are given a feasible (fractional) solution x, and it does not satisfy all the constraints, you can find some violated constraint in polynomial time. (In case there is a set S such that strictly less than 1 unit of x_e crosses it, there is a min-cut separating some vertex from the root with value strictly less than 1.) Note that if x is not in the branching polytope, this efficient procedure gives us a hyperplane that separates x from the polytope, and hence is called a "separation oracle" for this polytope.<br /><br /></li><li>Finally, I mentioned that Edmonds' result shows that <span style="font-style: italic;">every </span>vertex of the polytope defined by the branching LP is integral. This is a slightly subtle point, so I'd like to elaborate.<br /><br />Edmonds shows that, given edge-costs, the branching found by his algorithm wrt these edge costs is also an optimal LP solution for those edge-costs (since he gives a branching that is feasible for the primal, and a matching set of weights that are feasible for the dual).<br /><br />Now suppose the branching polytope has a fractional vertex (i.e., an extreme point <span style="font-style: italic;">z </span>of this polytope such that not every coordinate of <span style="font-style: italic;">z</span> is 0 or 1). One can show (using convexity, etc.) that there exists some cost vector <span style="font-style: italic;">c</span> for which this extreme point <span style="font-style: italic;">z</span> is the unique optimal solution to the branching LP.
But that would contradict the fact that for those edge costs, Edmonds' algorithm would find an integral optimal solution.</li></ol>Next Monday, we'll give a quick overview of simplex, and then some details about the ellipsoid algorithm.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-85483571312757886552009-10-28T06:02:00.000-07:002009-10-28T06:13:35.940-07:00Notes on Lecture 14A proof of the inequality we used (courtesy Danny):<br /><br />To prove: Log[1 + b] >= 2 b/(2 + b) for b>=0<br />Note that it's equal when b=0.<br />You take the derivative of<br />Log[1 + b] - 2 b/(2 + b), and get<br />b<sup class="moz-txt-sup">2</sup>/((1 + b) (2 + b)<sup class="moz-txt-sup">2</sup>). This is non-negative<br />for b>=0. QED.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-41198115983692865772009-10-26T18:36:00.000-07:002009-10-26T18:39:39.599-07:00Wednesday's LectureFor Wednesday's lecture (it will be on basics of LPs and duality), have a quick look over the proof of optimality of branchings we did way back in Lecture 5. We'll then give some algorithms to solve LPs in the next couple of lectures.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-87632363895179945912009-10-24T09:07:00.001-07:002009-10-24T09:08:47.609-07:00Monday's LectureSorry to change things around, but I'll cover some probabilistic inequalities (a.k.a. large-deviation bounds, concentration bounds, Chernoff-type bounds, whatever you might want to call them) in Monday's lecture. Wednesday we'll finally get started on LPs...Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-28669509276166183892009-10-21T20:08:00.000-07:002009-10-21T20:11:48.453-07:00Clarification of Today's LectureI've posted the complete and correct argument that I should have used in today's lecture. It's <a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f09/www/handouts/link-cut-trees.txt">here</a>. 
I suggest you read it right now while the lecture is still fresh in your mind.Danny Sleatorhttp://www.blogger.com/profile/03392361142435753387noreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-9424637111432730152009-10-19T17:30:00.000-07:002009-10-19T17:37:32.530-07:00Notes on MatchingsI've posted a whole bunch of notes on matchings on the course webpage.<br /><br />Also, if you're interested in blocking flows and would like some more details (or if you just got lost in the discussion today (sorry!)), have a look at Chapter 18 of Dexter Kozen's <a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f09/www/handouts/cmuonly/kozen-maxflow.pdf">book</a> (Google book version <a href="http://books.google.com/books?id=L_AMnf9UF9QC&lpg=PP1&dq=kozen%20algorithms&client=firefox-a&pg=PA84#v=onepage&q=&f=false">here</a>). It describes Dinic's algorithm that uses blocking flows, and also gives a faster O(n^3) time version. Using the dynamic trees that Danny will talk about on Wednesday, one can greatly speed up blocking-flow computations, as well as implementations of push-relabel.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-86288740886443458802009-10-17T19:28:00.000-07:002009-10-17T19:29:57.570-07:00Monday's lectureSince Danny has to miss Monday's lecture, we'll cover some stuff about matchings in graphs instead, and come back to dynamic trees and speeding up maxflows on Wednesday.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-66983665063790157972009-10-13T15:00:00.000-07:002009-10-13T15:05:51.946-07:00Lecture 10 NotesI've added some notes on push-relabel on the webpage. 
Also, a slight change in schedule: Danny will be lecturing tomorrow on dynamic (and splay) trees and faster implementations of push-relabel.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-83325362299617269942009-10-09T15:09:00.000-07:002009-10-09T15:20:55.832-07:00Max-flow notesI assume that most of you are familiar with the notation used for flows on graphs, as well as some of the basic results. However, in case you'd like to brush up on some details, we've just posted some <a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f09/www/handouts/cmuonly/kozen-maxflow.pdf">notes</a> off the webpage. (CMU/Pitt only, though parts of the book are available through <a href="http://books.google.com/books?id=L_AMnf9UF9QC&lpg=PP1&dq=kozen%20algorithms&client=firefox-a&pg=PA84#v=onepage&q=&f=false">Google books</a>.)<br /><br />In particular, please have a quick read over the first two chapters for Monday.<br /><ul><li>Lecture 16: gives the notation (flows, residual graphs, augmenting paths), as well as Ford and Fulkerson's proof that max-flow = min-cut using augmenting paths.<br /><br /></li><li>Lecture 17: gives examples where augmenting paths might take a long time (or converge to a non-optimal flow in graphs with non-integer capacities). Then gives two heuristics of Edmonds and Karp to ensure fast convergence.</li></ul>If you feel like reading the third chapter too (which gives a max-flow algorithm using blocking flows), go ahead --- but I won't rely on that for Monday.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-48744122525771193482009-10-05T13:56:00.000-07:002009-10-05T14:11:31.621-07:00Lecture 8 notesT(n) = sqrt(n) T(sqrt(n)) + sqrt(n) and T(4) = 1 does solve to T(n) = O(n).<br />Just guess T(n) = n - sqrt(n). <br /><br />Note that I chose the base case to make life easy for myself; choosing the more natural base case T(1) = 1 will not really change anything but the constants.
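If you want to convince yourself numerically, here is a small sketch (my own, not from lecture) that evaluates the recurrence on the tower n = 4, 16, 256, 65536, ... (so that sqrt(n) stays an integer) and checks the guess n - sqrt(n) as an upper bound:

```python
from functools import lru_cache
from math import isqrt

@lru_cache(maxsize=None)
def T(n):
    """T(n) = sqrt(n) T(sqrt(n)) + sqrt(n), with base case T(4) = 1.
    Call only on n of the form 4^(2^k), so sqrt(n) is always exact."""
    if n == 4:
        return 1
    s = isqrt(n)
    assert s * s == n
    return s * T(s) + s

n = 16
while n <= 4 ** 16:
    # The guess: T(n) <= n - sqrt(n), and hence T(n) = O(n).
    assert T(n) <= n - isqrt(n)
    print(n, T(n))
    n = n * n
```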
OTOH, if you chose T(n) = sqrt(n) T(sqrt(n)) + n, it would give you T(n) = O(n log log n). But that's a different story, it's not what we needed today.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-63423981343266259162009-10-04T07:30:00.000-07:002009-10-04T07:36:57.913-07:00HW2 clarifications/corrections.For problem 1(c), we'd be happy with an implicit representation of the solution. (Thanks, Yuan).<br /><br />For problem 2, we're looking for a di-diameter of O(alpha(n)), not just alpha(n). (Thanks, Jeremiah.)Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-3052124406886160632009-09-28T16:00:00.000-07:002009-09-28T19:53:11.445-07:00Notes from Lecture #6Some points from today's lecture:<br /><br />Firstly, here's a discussion of the pointer machine versus the RAM machine models from the Handbook of Data Structures (just a convenient online ref, you can check out the wikipedia articles on these machine models, or check out other books):<br /><iframe style="border: 0px none ;" src="http://books.google.com/books?id=fQVZy1zcpJkC&lpg=PT704&ots=aDslSnRmtO&dq=ram%20model%20pointer%20machine&pg=PT704&output=embed" frameborder="0" height="500" scrolling="no" width="500"></iframe><br />As you can see, random access of arrays (given i, access A[i]) is possible in unit time on a RAM machine, but not in the pointer machine model.<br /><br />Secondly, we went over Seidel's algorithm for <span style="font-weight: bold;">U</span>ndirected <span style="font-weight: bold;">U</span>nweighted <span style="font-weight: bold;">APSP</span> really fast at the end, so here's the description again:<br /><br /><span style="font-weight: bold;">UUAPSP</span>(adjacency matrix A for G)<br /><ol><li>if A is the adjacency matrix for the complete graph K_n, then return d = J_n - I_n, where J_n is the n*n all-ones matrix and I_n is the identity matrix.<br /></li><li>compute B, the adjacency matrix for G^2. 
(This can be done by setting B_ij = 1 whenever (A^2 + A)_ij > 0, and takes one matrix multiply.)</li><li>D = UUAPSP(B)</li><li>compute D*A, this takes another matrix multiply.<br /></li><li>let d_ij = 2D_ij - <span style="font-weight: bold;">1</span>( (DA)_ij < (D_ij * degree_j) ).<br /></li><li>return the matrix (d_ij)</li></ol>Note that if the original graph is connected, the recursion bottoms out in O(log n) calls. And each call requires two matrix multiplies (computing A^2 and D*A), and another O(n^2) work. Hence the total runtime is O(M(n) log n), where M(n) is time for one matrix multiply.<br /><br />For the correctness, (DA)_ij = \sum_k D_ik A_kj = sum_{k neighbor of j} D_ik, which we're comparing to D_ij * degree_j in step 5,<br /><br />We know by Claim 1 that d_ij (the distance in G) should either be 2D_ij or 2D_ij-1, where D_ij is the distance in G^2.<br /><br />Suppose d_ij should be 2D_ij. In this case, each term in the summation is at least D_ij by Claim 2, and hence the sum is at least D_ij * degree_j. In this case, we (correctly) do not subtract anything and set d_ij = 2D_ij.<br /><br />On the other hand, suppose d_ij should be 2D_ij - 1. Now Claim 3 implies that each term in the summation is at most D_ij, and at least one term is strictly less than D_ij. Hence the sum is less than D_ij * degree_j. In this case we (correctly) subtract 1, and set d_ij = 2D_ij - 1.<br /><br />This completes the proof of correctness of Seidel's algorithm for computing the distances. The algorithm to compute (an implicit description of) the shortest paths will be covered as part of the homeworks that accompany this fine television series.<br /><br />In next lecture, we will cover some other shortest path algorithms: e.g., using the van Emde Boas priority queues, and Goldberg's scaling algorithm to compute valid potentials faster than Bellman Ford. 
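Going back to Seidel's algorithm for a moment: the UUAPSP recursion above fits in a dozen lines. Here is my own sketch (not lecture code), using numpy for the two matrix multiplies per level; it assumes the graph is connected and undirected, given as a 0/1 integer adjacency matrix.

```python
import numpy as np

def seidel_apsp(A):
    """Seidel's UUAPSP: all-pairs distances in a connected, undirected,
    unweighted graph, given its 0/1 adjacency matrix A (integer dtype)."""
    n = A.shape[0]
    # Step 1: if G is the complete graph K_n, return J_n - I_n.
    if np.all(A + np.eye(n, dtype=int) >= 1):
        return np.ones((n, n), dtype=int) - np.eye(n, dtype=int)
    # Step 2: adjacency matrix of G^2: B_ij = 1 iff (A^2 + A)_ij > 0, i != j.
    B = ((A @ A + A) > 0).astype(int)
    np.fill_diagonal(B, 0)
    # Step 3: recurse to get the distance matrix D of G^2.
    D = seidel_apsp(B)
    # Steps 4-5: d_ij = 2 D_ij - 1 exactly when (DA)_ij < D_ij * degree_j.
    DA = D @ A
    deg = A.sum(axis=1)
    return 2 * D - (DA < D * deg[None, :]).astype(int)

# Path on 4 vertices: 0 - 1 - 2 - 3.
P4 = np.array([[0, 1, 0, 0],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [0, 0, 1, 0]])
print(seidel_apsp(P4))
```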
And the homework should be out by tomorrow or Wednesday.Anupamnoreply@blogger.com2tag:blogger.com,1999:blog-5057407789239134038.post-43641710153682351662009-09-23T17:13:00.000-07:002009-09-23T17:36:15.593-07:00Lecture 5 notesSome notes on today's lecture:<br /><ol><li>The algorithm we saw today was independently due to Chu and Liu (1965), Edmonds (1967) and Bock (1971).<br /><br /></li><li>The best running time for the min-cost rooted arborescence is apparently <span style="font-style: italic;">O(m + n log n)</span> due to <a href="http://dx.doi.org/10.1007/BF02579168">Gabow, Galil, Spencer and Tarjan</a>, in the same paper that gave the <span style="font-style: italic;">O(m log \beta(m,n))</span> algorithm for undirected MSTs.<br /><br /></li><li>The variant of Prim's algorithm Or had mentioned fails on this example:<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UQAz_TLIRYU/Srq9Bvn3KjI/AAAAAAAAA5s/y4c6a93aZLQ/s1600-h/or-ex.gif"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 213px; height: 171px;" src="http://2.bp.blogspot.com/_UQAz_TLIRYU/Srq9Bvn3KjI/AAAAAAAAA5s/y4c6a93aZLQ/s320/or-ex.gif" alt="" id="BLOGGER_PHOTO_ID_5384824142136027698" border="0" /></a></li><li>I didn't explicitly say why the final two claims proved optimality for the branching F*, here is the reason --- if there were a branching F' that had smaller cost, then by the second claim it would have cost strictly less than \sum_S w*_S. 
But since w* was a valid weight function, that would violate the first claim.<br /></li></ol>Anupamnoreply@blogger.com1tag:blogger.com,1999:blog-5057407789239134038.post-44643640019425713582009-09-22T07:01:00.000-07:002009-09-22T07:02:35.197-07:00HW1: updatesSome changes to problem 2; thanks to Jeremiah for asking the question. Please have a look at the new PDF file...Anupamnoreply@blogger.com1tag:blogger.com,1999:blog-5057407789239134038.post-68296971106349725732009-09-20T15:59:00.000-07:002009-09-21T10:44:14.855-07:00Union-Find Notes (Part 1)I've written some <a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f09/www/handouts/union-find1.txt">notes</a> on Danny's lecture last Wednesday. It doesn't have all the proofs yet (I'll add them soon), but hopefully it will be useful to recap things for lecture tomorrow, when Danny will finish the analysis.Anupamnoreply@blogger.com0tag:blogger.com,1999:blog-5057407789239134038.post-20196801945835355252009-09-14T19:13:00.000-07:002009-09-14T19:14:24.671-07:00Homework 1 postedI've posted the first assignment on the course website.<br />http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f09/www/Danny Sleatorhttp://www.blogger.com/profile/03392361142435753387noreply@blogger.com3