
Given an undirected unweighted graph $G = (V, E)$, a $(1 \pm \epsilon)$ cut sparsifier $H = (V, F, w)$ satisfies the following:

  • $F \subseteq E$
  • For all $\emptyset \subsetneq S \subsetneq V$ it holds that $\delta_w(H, S) \in [(1 - \epsilon)\delta_{\mathbf{1}}(G, S),(1 + \epsilon)\delta_{\mathbf{1}}(G, S)]$

This is a sparsification that preserves cuts. Compare this to the distance spanner, which preserves shortest-path distances.

Some remarks:

  • $\delta_w(G, S)$ is the cut function: the sum of $w(e)$ over edges $e$ that go between $S$ and $V \setminus S$
  • The original definition of cut sparsifier does not have $F \subseteq E$, but all algorithms in our scope satisfy that condition, so we will just state it in the definition.
  • We assume that the input graph is unweighted.
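To make the definition concrete, here is a brute-force checker for the $(1\pm\epsilon)$ condition. It is exponential in $n$, so it is only for tiny sanity-check examples; the names `cut_value` and `is_cut_sparsifier` are my own, not from any library.

```python
from itertools import combinations

def cut_value(edges, weight, S):
    """delta_w(G, S): total weight of the edges with exactly one endpoint in S."""
    S = set(S)
    return sum(weight[e] for e in edges if (e[0] in S) != (e[1] in S))

def is_cut_sparsifier(n, E, F, w, eps):
    """Brute-force check of the (1 +/- eps) condition over all non-trivial cuts.

    Exponential in n, so only meant for tiny sanity-check graphs.
    """
    ones = {e: 1 for e in E}
    for r in range(1, n):  # all non-empty proper subsets S of {0, ..., n-1}
        for S in combinations(range(n), r):
            g = cut_value(E, ones, S)
            h = cut_value(F, w, S)
            if not (1 - eps) * g <= h <= (1 + eps) * g:
                return False
    return True
```

On the triangle, for example, keeping edges $(0,1)$ and $(1,2)$ at weight $1.5$ passes with $\epsilon = 0.5$ but fails with $\epsilon = 0.2$, since the cut around vertex $1$ is overestimated by a factor $1.5$.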

4. Non-uniform sampling

In part 1 of the article, we discussed two shortcomings of uniform sampling:

  • It doesn't sparsify if $k$ is large.
  • $k$ cannot be efficiently computed.

We will get back to the second issue later; for now, let's focus on the first. Consider the following dumbbell graph, where two large complete graphs ($K_n$) are connected by a single edge:

Note that every sparsifier must keep the edge between the two copies of $K_n$, while the edges inside each $K_n$ are unimportant. This motivates non-uniform sampling - we should sample important edges with higher probability and unimportant edges with lower probability.

We need to know how important each edge is. One idea is to use the minimum cut between the two endpoints of an edge to estimate its importance - in other words, $p_e = \frac{c \log n}{\epsilon^2 k(e)}$, where $k(e) = k(u, v)$ is the min $u$-$v$ cut for $e = (u, v)$. This makes sense since edges across smaller cuts are more important. For the above example, this yields a sparsifier. But in general, it is not obvious that the resulting graph is sparse, or that it preserves all cuts. It turns out that both are true:

Non-uniform sampling with cuts. Sample each edge with $p_e = \frac{c \log n}{\epsilon^2 k(e)}$, where $k(u, v)$ is the min $u$-$v$ cut.
Theorem 1. The resulting graph has $O(\frac{n \log n}{\epsilon^2})$ edges whp.
Theorem 2 (FHHP11). The resulting graph preserves all cuts.
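As an illustrative sketch of this sampling rule (not an efficient construction), the min $u$-$v$ cut can be computed by Edmonds-Karp max-flow; edges with capped $p_e = 1$ are kept deterministically, and a kept edge gets weight $1/p_e$ so every cut is preserved in expectation. The helper names and the constant $c = 1$ are my own choices.

```python
import math
import random
from collections import defaultdict, deque

def min_uv_cut(n, edges, s, t):
    """Value of the min s-t cut = max s-t flow (Edmonds-Karp).

    Each undirected unit edge becomes two directed arcs of capacity 1."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent = {s: None}            # BFS for an augmenting path
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        v = t                          # push one unit of flow along the path
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

def sample_by_connectivity(n, edges, eps, c=1.0, rng=random.Random(0)):
    """Keep edge e = (u, v) with p_e = min(1, c log n / (eps^2 k(u, v))),
    reweighted by 1/p_e so each cut is preserved in expectation."""
    H = {}
    for u, v in edges:
        k = min_uv_cut(n, edges, u, v)
        p = min(1.0, c * math.log(n) / (eps * eps * k))
        if rng.random() < p:
            H[(u, v)] = 1.0 / p
    return H
```

On very small graphs every $p_e$ is capped at $1$, so the "sparsifier" is the graph itself; the rule only starts discarding edges once $k(e)$ exceeds $\frac{c \log n}{\epsilon^2}$.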

Theorem 1 is not hard to prove, and we will come back to it later, but Theorem 2 was an open problem for at least a decade - the original pioneers of non-uniform sampling did not know how to prove it. Instead, they resorted to a different choice of $k(e)$, which we call the strength between two vertices.

Definition ($k$-strong subgraph). A $k$-strong subgraph is a $k$-connected vertex-induced subgraph.
Definition ($k$-strong component). A $k$-strong component is a maximal $k$-strong subgraph.

This definition looks artificial at first glance. However, the following simple algorithm for computing $k$-strong components motivates the intuition. Given a graph $G$, if its min cut is at least $k$, the whole graph is a $k$-strong component. Otherwise, divide the graph into two pieces $S, V\setminus S$ by finding any cut of size $<k$ between them, and recursively compute the $k$-strong components of $G[S]$ and $G[V\setminus S]$.

We can see the following:

  • Each resulting partition is a $k$-strong subgraph (Trivial)
  • No $k$-strong subgraph can go across more than one partition (Otherwise, you can find a cut of size $<k$ in it - consider the first time where that subgraph was in more than one partition)
  • Each $k$-strong subgraph is contained in at most one partition
  • Each resulting partition is a $k$-strong component
  • Every $k$-strong component is computed by this algorithm, regardless of the choice of cut we made (as long as the size $<k$)

If we modify our algorithm to always divide the graph along a minimum cut, it becomes equivalent to the $k$-strong component algorithm for all $k$ simultaneously - specifically, we can read off the partition for a given $k$ by stopping the recursion once a piece's min cut reaches $k$. Then we can see that the $k$-strong components form a _laminar_ family - they have a tree structure.

Now we are ready to define $k(e)$ and the non-uniform sampling with strengths.

Definition (strength of edge). For an edge $e$, the strength of an edge $k(e)$ is the maximum $k$ where both endpoints of the edge belong to some $k$-strong component.
Non-uniform sampling with strengths. Sample each edge with $p_e = \frac{c \log n}{\epsilon^2 k(e)}$. If $p_e > 1$, cap it at $1$.
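For tiny graphs, $k(e)$ can be computed straight from the definition: maximize the min cut over all vertex-induced subgraphs containing both endpoints. This doubly brute-force sketch (function names are mine, and it is exponential, unlike the efficient constructions in the literature) is only for checking examples by hand.

```python
from itertools import combinations

def min_cut_value(vs, es):
    """Global min cut of (vs, es) by trying all 2^(n-1) splits; 0 if disconnected."""
    vs = sorted(vs)
    if len(vs) < 2:
        return 0
    rest = vs[1:]
    best = len(es) + 1
    for mask in range(2 ** len(rest)):       # side S always contains vs[0]
        S = {vs[0]} | {rest[i] for i in range(len(rest)) if mask >> i & 1}
        if len(S) == len(vs):                # skip the trivial cut S = vs
            continue
        best = min(best, sum(1 for u, v in es if (u in S) != (v in S)))
    return best

def edge_strengths(n, edges):
    """k(e) from the definition: the largest min cut over all vertex-induced
    subgraphs containing both endpoints of e (tiny graphs only)."""
    strengths = {}
    verts = list(range(n))
    for u, v in edges:
        best = 0
        for r in range(2, n + 1):
            for U in combinations(verts, r):
                U = set(U)
                if u in U and v in U:
                    sub = [f for f in edges if f[0] in U and f[1] in U]
                    best = max(best, min_cut_value(U, sub))
        strengths[(u, v)] = best
    return strengths
```

On the dumbbell of two triangles joined by a bridge, every induced subgraph containing the bridge's endpoints has the bridge as a cut edge, so the bridge gets strength $1$ while the triangle edges get strength $2$.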

The following lemma proves that non-uniform sampling with strengths is sparse in expectation - concentration quickly follows from a Chernoff bound.


Lemma 3. $\sum_{e \in E} \frac{1}{k(e)} \le (n-1)$
Proof. We apply induction on $n$. For a graph on $n$ vertices, let $\kappa$ be its min cut. If $\kappa > 0$, the $\kappa$ edges crossing a minimum cut each have $k(e) \ge \kappa$, so together they contribute at most $1$ to the sum. If $\kappa = 0$, nothing is contributed. By the inductive hypothesis, the edges inside the two remaining induced subgraphs contribute at most $n-2$ in total. $\blacksquare$
Corollary. This proves Theorem 1: the strength $k(e)$ is at most the min $u$-$v$ cut $k(u, v)$, so summing $\frac{1}{k(u,v)}$ over edges gives at most $\sum_{e} \frac{1}{k(e)} \le n-1$, and hence the expected number of sampled edges is $O(\frac{n \log n}{\epsilon^2})$. The difficulty lies in showing that all cuts are preserved.
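As a quick sanity check of Lemma 3, consider the complete graph $K_n$. Every edge has strength $n-1$, since $K_n$ itself is $(n-1)$-edge-connected and no induced subgraph does better, so

$$\sum_{e \in E} \frac{1}{k(e)} = \binom{n}{2} \cdot \frac{1}{n-1} = \frac{n}{2} \le n-1 \qquad (n \ge 2),$$

comfortably within the bound.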


Then we will show that this sampling preserves all cuts.


Theorem 4. Non-uniform sampling with strength preserves all cuts.
Proof. We first use a trick to convert our weighted sparsifier to an unweighted one, so that we can use Chernoff's bounds.

Let $q = \frac{\epsilon^2}{c \log n}$; each edge in our sparsifier will have weight $q \times k(e)$, and we pick it with probability $\frac{1}{q \times k(e)}$. We will consider graphs $F_1, F_2, \ldots, F_{|E|}$ on the same vertex set, where $F_i$ contains exactly the edges with $k(e) \geq i$. In each $F_i$, we pick each edge with probability $\frac{1}{q\times k(e)}$ and give it weight $q$. We can see:

  • For each cut $S$, the expected cut values of the $F_i$ sum to the cut value in the original graph.
  • For each cut $S$, if the cut value is concentrated around its expectation in every $F_i$, it is preserved in $G$.
    • A random choice for an edge with strength $k(e)$ corresponds to coupled choices in $F_1, \ldots, F_{k(e)}$. If each $F_i$ preserves all cuts, we can sum them up. Note that the choices across the $F_i$ are not independent, but this doesn't matter: expectation is linear, and whp in each of the poly(n) subproblems implies whp in total (union bound).
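The per-edge accounting behind the first bullet can be written out explicitly: an edge $e$ with strength $k(e)$ appears in $F_1, \ldots, F_{k(e)}$, and in each it is kept with probability $\frac{1}{q\,k(e)}$ at weight $q$, so

$$\mathbb{E}[\text{total weight of } e] = \sum_{i=1}^{k(e)} q \cdot \frac{1}{q\,k(e)} = 1,$$

matching $e$'s unit weight in $G$. In the actual sparsifier the single coupled choice gives weight $q\,k(e)$ with probability $\frac{1}{q\,k(e)}$ - the same expectation.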

So, what we want to argue is the following:

Claim. For each $i$, if we sample the edges of $F_i$ with probability $p_e$, then every cut is concentrated around its expectation ($1\pm \epsilon$) with high probability.

Proof. If every cut in the sampled graph has expected size at least $\frac{1}{q}$, we can apply the argument in Ch3.1 of the previous article, and we are done. Let $S$ be a non-trivial cut and $\delta(S) = |E(S, V \setminus S)|$ its size in $F_i$. If the cut is empty, we are done. Otherwise, every edge $e$ in the cut has strength at most $\delta(S)$: its $k(e)$-strong component is $k(e)$-edge-connected and all of that component's edges lie in $F_i$, so at least $k(e)$ edges of $F_i$ cross $S$. Hence each cut edge has sampling probability at least $\frac{1}{q \times \delta(S)}$. Multiplying by $\delta(S)$, the expected size is at least $\frac{1}{q}$, and we are done. If $\delta(S)$ is so small that every $p_e$ is capped at $1$, then every edge of the cut is sampled deterministically and no randomization argument is necessary.

Remark. Now, let's go back and see why we used strength instead of connectivity. The concept of connectivity sometimes gets weird. In the graph below, edge $0$-$1$ has connectivity $3$, while all other edges have connectivity $2$. In the only non-trivial cut of $F_3$, the connectivity ($3$) is not at most the size of the cut ($1$), so the argument above breaks down.


5. Spectral Sparsification

So far, we have seen two choices of cost function that work: connectivity and strength. A third choice also works: effective resistance.

Definition. Given a graph $G$, the effective resistance $R(u, v)$ between $u$ and $v$ is the minimum of $\sum_{e \in E} f(e)^2$ over all unit flows $f$ from $u$ to $v$.

Let's look at several properties of this function. Note that the signed sum of $f(e)$ along any cycle is zero: at the minimizer, the derivative of the energy along the cycle must vanish. As a consequence, there is a potential $P(v)$ for every vertex $v \in V$ such that $f(u, v) = P(v) - P(u)$. One way to derive this is to fix a spanning forest and compute $P$ along the forest - each non-forest edge then has no choice but to agree, by the zero-cycle-sum property. Here, $P(v)$ is the voltage and $f(u, v)$ is the current, and then the familiar laws all make sense:

  • First Kirchhoff - the sum of current for each vertex should be zero. (Flow conservation)
  • Second Kirchhoff - the sum of voltage should be zero throughout the cycle. (Potential exists)
  • Ohm's Law - $V = IR$ ($f(u, v) = P(v) - P(u)$, since $R = 1$)

This analogy to a physical system may make more sense for weighted graphs: there, the effective resistance is defined by minimizing $\sum_{e \in E} c(e) f(e)^2$, where $c(e)$ plays the role of the resistance of edge $e$ in the circuit.
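The energy minimum above also has a standard closed form via the Laplacian pseudoinverse, $R(u, v) = (\chi_u - \chi_v)^T L^{+} (\chi_u - \chi_v)$. A small numerical sketch (assuming numpy is available; function names are mine):

```python
import numpy as np

def laplacian(n, edges):
    """Unweighted graph Laplacian L = D - A."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return L

def effective_resistance(n, edges, u, v):
    """R(u, v) = (e_u - e_v)^T L^+ (e_u - e_v), the pseudoinverse form
    of the energy-minimization definition in the text."""
    Lp = np.linalg.pinv(laplacian(n, edges))
    x = np.zeros(n)
    x[u], x[v] = 1.0, -1.0
    return float(x @ Lp @ x)
```

It reproduces the physics: two unit resistors in series (the path $0-1-2$) give $R(0,2) = 2$, and a triangle gives $R(0,1) = \frac{2}{3}$ (one resistor in parallel with a series pair).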

Now, we will introduce (without proof) the following result:

Non-uniform sampling with resistance (Spectral Sparsifier). Sample each edge with $p_e = \frac{c \log n}{\epsilon^2} R(e)$. (Unlike connectivity and strength, $R(e)$ is small for unimportant edges, so it goes in the numerator.)
Theorem 5 (SS08). The resulting graph has $O(\frac{n \log n}{\epsilon^2})$ edges whp and preserves all cuts.

Actually, the guarantee is somewhat stronger. Letting $L(G)$ be the Laplacian matrix of $G$ and $L(H)$ that of the sparsifier, we have:

Theorem 6 (Reason why it is called spectral). For all $x : V \rightarrow \mathbb{R}$, $(1 - \epsilon) x^TL(G)x \le x^T L(H) x \le (1+\epsilon)x^T L(G) x$

Since $x^T L(G) x = \sum_{(i, j) \in E} w(i, j) (x_i - x_j)^2$, taking $x$ to be the $0/1$ indicator vector of $S$ recovers the cut value, so this implies a cut sparsifier.
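The indicator-vector claim is easy to check numerically; a minimal sketch (numpy assumed, name is mine):

```python
import numpy as np

def quad_form_cut(n, edges, S):
    """x^T L x for the 0/1 indicator x of S; this equals the cut value
    |E(S, V - S)|, since each crossing edge contributes (1 - 0)^2 = 1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    x = np.zeros(n)
    for s in S:
        x[s] = 1.0
    return float(x @ L @ x)
```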

Spectral sparsifiers are a key ingredient of linear-algebraic graph algorithms, and they are in a sense equivalent to fast Laplacian solvers: a fast algorithm for one implies one for the other. Currently the dependency runs toward the solver side, as most fast Laplacian solvers use spectral sparsification or similar combinatorial subroutines.

6. Algorithm for sparsification

Karger's original algorithm for cut sparsification was somewhat involved, but over time people obtained quite simple algorithms, some of which even work for spectral sparsifiers. The main idea: large cuts concentrate well under sampling, while small cuts do not - but small cuts consist of few edges, so they are relatively easy to cover deterministically. For example, any $t$ disjoint spanning trees will preserve all cuts of size at most $t$.

You don't even have to compute disjoint spanning trees. Given a graph $G$, repeat the following $t$ times: compute a spanning forest, add it to the sparsifier, and remove it from $G$. Call the union a $t$-bundle spanning tree of $G$. It can be computed via the HdLT algorithm, for example. Now every cut of size at most $t$ has all of its edges inside the $t$-bundle, so it is preserved exactly.

For the remaining edges, we sample half of them and recursively compute the sparsifier.
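The peel-and-recurse scheme can be sketched as follows, with plain union-find spanning forests instead of HdLT (so this is a static, illustrative version; the names, the deterministic seed, and the weight doubling on each level - which keeps every cut preserved in expectation - are my choices):

```python
import random

def spanning_forest(n, edges):
    """One spanning forest via union-find; returns (forest, leftover edges)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    forest, leftover = [], []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
        else:
            leftover.append((u, v))
    return forest, leftover

def sparsify(n, edges, t, rng=random.Random(0)):
    """Recursive scheme from the text: peel off a t-bundle of spanning forests
    (kept at the current weight), sample half of the rest at double weight,
    and repeat on what survives. Returns {edge: weight}."""
    H = {}
    weight = 1.0
    edges = list(edges)
    while edges:
        for _ in range(t):                  # t edge-disjoint spanning forests
            forest, edges = spanning_forest(n, edges)
            for e in forest:
                H[e] = weight
        # sample half of the remaining edges, doubling their weight
        edges = [e for e in edges if rng.random() < 0.5]
        weight *= 2.0
    return H
```

When the bundle already swallows the whole graph (as on a triangle with $t = 2$), the output is just the input with weight $1$, which is the correct degenerate behavior.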

We can prove the following:
Theorem 7. Executing the above algorithm for $t = O(\log (n)/\epsilon^2)$ gives a $(1\pm \epsilon)$ cut sparsifier of size $O(\frac{n \log^2 n}{\epsilon^2})$.

Partial Proof. We will use a slightly more naive analysis: we prove that $t = O(\log^3 (n)/\epsilon^2)$ gives such a sparsifier.

Given a graph $G$, we define the elevation $\text{ele}(G)$ of $G$ as follows: take a $\frac{2 c \log n}{\epsilon^2}$-bundle spanning tree and add it to $\text{ele}(G)$ with weight $1$; then sample half of the remaining edges and add them to $\text{ele}(G)$ with weight $2$. We show that $\text{ele}(G)$ is a $(1\pm\epsilon)$ cut sparsifier of $G$. Clearly $\text{ele}(G)$ preserves the expectation of every cut. Consider a cut of $G$. If all of its edges are in $\text{ele}(G)$ with weight $1$, we are done. Otherwise, some edge of the cut is outside the bundle, so the cut has at least $\frac{2 c \log n}{\epsilon^2} + 1$ edges. If every edge of this cut were a weight-$2$ edge kept with probability $1/2$, we could apply Lemma 2 (from the previous article) and be done. And fixing some edges deterministically at weight $1$ - that is, exactly at their expected value - only helps the concentration, which is exactly what happens in our computation of $\text{ele}(G)$.

Our algorithm can be described as repeatedly picking a subset of edges and replacing it with its elevation. Let $\epsilon^\prime$ be the choice of epsilon there. As shown above, each such step increases the multiplicative error by $\epsilon^\prime$ for all cuts, so we end up with a $(1\pm\epsilon^\prime)^{\log n}$ error. Taking $\epsilon^\prime = \frac{\epsilon}{2 \log n}$, we obtain a $(1\pm\epsilon)$ cut sparsifier.

Remark. I believe that a little more careful analysis can remove a $\log^2(n)$ factor for the size. One idea is to show that the error blowup is in geometric terms, like $\epsilon^\prime, \epsilon^\prime / 2, \epsilon^\prime / 4, \ldots$.

Note that the only properties of spanning trees we used are that they are sparse and that they saturate every small cut. Distance spanners can do the same: we can compute a $t$-bundle $\alpha$-stretch spanner, and the proof works out similarly. In fact, using a spanner gives us a stronger guarantee:

Theorem 8 (Kou14). Executing the above algorithm with a $t = O(\frac{\log^2 n}{\epsilon^2})$-bundle $O(\log n)$-stretch spanner gives a $(1\pm\epsilon)$ spectral sparsifier of size $O(\frac{n \log^3 n}{\epsilon^2})$.
