TITLE: Algorithmic Challenges of Modern Distributed Data
Growing volumes of collected and processed data, together with constraints on throughput, real-time response, and privacy, increasingly lead to computing systems in which no single processing unit holds the entire data set. Both cloud computing and computing at the edge pose new and unique algorithmic challenges. How does one minimize the number of communication rounds, the total information exchanged, and surplus computation while still providing answers in a timely manner?
Apart from giving an overview of the variety of challenges that arise, this talk will focus on new techniques developed for processing frameworks similar to MapReduce and Spark. We will show how to obtain significant improvements over direct implementations of classic or straightforward algorithms for combinatorial optimization problems on graphs and for computing PageRank. Our techniques allow for compressing several rounds of computation into exponentially fewer rounds of MapReduce computation. In particular, this line of work has led to the first approximate maximum matching algorithms with sublogarithmic round complexity when the space per machine is linear or sublinear in the number of nodes in the graph.
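To make the notion of a "direct implementation" concrete, the following is a minimal single-process sketch of PageRank by power iteration, written in a map/reduce style so that each iteration corresponds to one simulated round. It illustrates only the straightforward baseline whose round count the talk aims to reduce, not the round-compression techniques themselves. The function names (`map_phase`, `reduce_phase`, `pagerank`), the damping factor of 0.85, and the uniform redistribution of rank from nodes without out-edges are assumptions made for this illustration.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: every node sends an equal share of its rank to each out-neighbor."""
    for node, neighbors in graph.items():
        # Emit a zero contribution so that every node appears in the reduce step.
        yield node, 0.0
        if neighbors:
            share = ranks[node] / len(neighbors)
            for neighbor in neighbors:
                yield neighbor, share


def reduce_phase(pairs, n, damping, dangling_mass):
    """Reduce step: sum the contributions per node and apply the damping formula."""
    sums = defaultdict(float)
    for node, value in pairs:
        sums[node] += value
    return {
        node: (1.0 - damping) / n + damping * (total + dangling_mass / n)
        for node, total in sums.items()
    }


def pagerank(graph, rounds=50, damping=0.85):
    """Run `rounds` power-iteration steps; each step is one simulated round.

    Assumes `graph` maps every node to a list of out-neighbors and that
    every neighbor also appears as a key of `graph`.
    """
    n = len(graph)
    ranks = {node: 1.0 / n for node in graph}
    for _ in range(rounds):
        # Rank held by nodes without out-edges is spread uniformly (an assumption).
        dangling_mass = sum(ranks[v] for v, nbrs in graph.items() if not nbrs)
        ranks = reduce_phase(map_phase(graph, ranks), n, damping, dangling_mass)
    return ranks


if __name__ == "__main__":
    example = {"a": ["b", "c"], "b": ["c"], "c": []}
    print(pagerank(example))
```

In this baseline the number of rounds equals the number of iterations, which is the cost that compressing several steps into fewer MapReduce rounds is meant to avoid.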
This is a rapidly developing area of research, and many intriguing questions remain open.
Krzysztof Onak is a research scientist in the Mathematics of AI group at the IBM T.J. Watson Research Center. His main research interests concern big data computation with limited resources, including algorithms for modern parallel and distributed systems, sublinear-time algorithms, and streaming. Onak received his master’s degree from the University of Warsaw and his Ph.D. from the Massachusetts Institute of Technology. Before joining IBM, he was a Simons Postdoctoral Fellow at Carnegie Mellon University.