2021-03-01 | Peilin Zhong:Processing Massive Datasets via Efficient Data Representations



The computation on massive data has been the foundation of many important breakthroughs in machine learning and data mining. In this talk, I will show how to process massive data via efficient data representations.

As a concrete example, I will present and dive into a recent result (STOC'20) that develops a (1+\varepsilon)-approximate parallel algorithm for computing shortest paths in undirected graphs, using \poly(\log n) parallel time and m\poly(\log n) work/total space for n-nodes m-edges graphs. For (1+\varepsilon)-approximation, all prior algorithms with \poly(\log n) parallel time require at least \Omega(mn^{c}) work/total space for some constant c>0. Improving this long-standing upper bound obtained by Cohen (STOC'94) has been open for more than 25 years. We developed several new tools of independent interest. One of them is an efficient representation of the input graph --- low hop emulator --- a \poly(\log n)-approximate emulator graph in which every shortest path has at most O(\log\log n) hops (edges).

In the talk, I will also give a brief overview of how to use efficient data representations to develop algorithms for other modern machine learning tasks such as training generative adversarial networks.



31  10:00--11:00



Peilin Zhong is a fifth year Ph.D. student in computer science at Columbia University (Under supervision of Alexandr Andoni, Clifford Stein, and Mihalis Yannakakis). He was part of the Yao Class at Tsinghua University and graduated in 2016 with a bachelor’s degree in engineering. He is one of the recipients of the 2019 Google PhD fellowship.

His research interests are mainly in algorithm design and analysis. In particular, he is interested in parallel and massively parallel algorithms, sketching, streaming algorithms, graph algorithms, machine learning, high dimensional geometry, metric embedding, and other algorithms related to large-scale data computation.



Zoom ID: 65928464685