School of Computer Science researchers have won the Best Paper award at ACM-SIAM Symposium on Discrete Algorithms (SODA) for their new approach to solving linear systems.
Assistant Professor Richard Peng and Professor Santosh Vempala, who holds the Frederick Storey Chair in Computing, discovered that by combining symbolic computing tools with random matrix theory, sparse linear systems can be solved slightly faster than directly invoking matrix multiplication / inversion.
SODA is the premier conference in algorithms and is known to be highly competitive. This year’s conference received 637 submissions and accepted 180 papers. Although up to three papers can be selected, the program committee chose just one paper for the best paper award.
The history of linear systems
Solving a linear system is one of the most foundational computational problems in science and engineering.
“This problem has been around for a long time, even before computers,” said Vempala. “Much of modern computation runs on linear systems — including recommendation engines like web search, machine learning, and scientific computing simulations.”
Algorithms for solving systems of linear equations are taught as early as grade school, with the most common ones involving manipulating matrices, which in turn represent coefficients on the variables. Yet as their applications become more complex, these methods become less efficient as the matrices become more costly to manipulate.
“The matrix-based algorithms work fine when you have a room full of mechanical calculators trying to solve a system on 50 variables, where the matrix fits on a single large blackboard,” said Peng. “One of the big differences in modern data is this matrix gets really big: instead of 50x50, it’s a billion x billion. Most of the entries start off as 0s, so it’s often not even feasible to write down dense matrices such as the inverse due to storage constraints.”
One example of a large sparse matrix is modern networks. The matrix for a website has every vertex link to another website, creating a gigantic matrix. This matrix is also naturally sparse: it’s mostly made up of 0s. Similar situations also arise in everything from weather simulation to measuring road traffic. As a result, tremendous computing power, including supercomputers, has been devoted to solving linear systems and problems related to them. For a long time, problems related to solving linear systems routinely consumed the most computational power, until the recent emergence of blockchains.
Since the 1970s, computer scientists have preferred using iterative methods to solve large sparse linear systems due to the difficulty of storing dense matrices corresponding to inverses. Surprisingly little is known about the best possible runtime guarantees of these algorithms on arbitrary systems; analyzing iterative methods is particularly complex because more and more bits need to be maintained with each step to ensure overall accuracy.
Speedups via batching
Peng and Vempala realized they could solve the systems using fewer steps by considering many partial solutions in parallel. Instead of iterating one vector at a time, they iterate on a block of s vectors. This reduces the number of iterations, and effectively the overhead in bit complexity, from n to n / s.
On the other hand, the interactions between the batches of s vectors across different steps become more complex. Instead of a single value, they become s-by-s blocks.
More surprisingly, the interactions across steps are also structured. In the Gram matrix, which measures the pair-wise projections between steps, the interactions between steps that add up to the same value are exactly the same. Effectively steps 1 and 7, steps 2 and 6, and steps 3 and 5 all have the same interactions.Turning this idea into an algorithm relies on generalized tools developed from symbolic computing, namely for Hankel matrices, to work in this more general block setting.
Symbolic computing methods are limited because they only work under exact arithmetic. Many previous works studied systems over finite fields, where there are no round-off errors. So existing literature does not address the bit-complexity issue at the crux of the runtime efficiency of iterative methods.
In order to bound the bit complexity of intermediate steps, a major undertaking in this research became lower bounding how close the block Krylov space is to being degenerate, or equivalently, how stable is the computation. A simplified version of this phenomenon arises inpicking n random numbers between 0 and 1: the closest pair should be roughly n-2 apart. For the block Krylov method, however, this analysis has to be done to the batches of high-dimensional vectors that arise throughout the iterative steps.
The research critically builds upon ideas developed from smoothed analysis: the study of near-degeneracies in problems under small random perturbations. This is a field of study going back two decades, originating with the study of perturbed linear programs.
The new algorithm is a significant contribution to not only linear systems solving, but also the understanding of optimization and random matrix theory. “The conjecture was that you cannot solve a system faster than matrix multiplication,” said Vempala. “The fact you can go beyond that opens up entire new lines of research.”
The researchers will present the paper, Solving Sparse Linear Systems Faster than Matrix Multiplication, at SODA Jan. 10 through 13.
Image: A block Krylov space, K, formed from a set of starting vectors B and a matrix A (above), and the block-Hankel structure of its Gram matrix with equal blocks highlighted in red.