Speaker: Geoffrey D. Sanders
Staff Scientist, Center for Applied Scientific Computing,
Lawrence Livermore National Laboratory
Thursday, March 15, 2018
Location: Klaus Advanced Computing Building, Room 1116E
Time: 11:00am – 12:00pm
Host: Eisha Nathan
Distributed Algorithms for Pattern Matching in Massive Labelled Graphs
Subgraph pattern matching is important graph analytic with application to social networks, bioinformatics, and general complex network analysis. Broadly speaking, the task is to detect where a small labelled template graph G0 (e.g. 10 vertices) exists within a large labelled background graph G (e.g. billions of vertices). We present a new distributed memory pattern matching approach that aggressively prunes vertices and edges that cannot participate in an exact match. This family of algorithms proceeds iteratively by checking local and semi-local vertex and edge constraints, until the set of possible matching structure is no longer reduced. Under certain assumptions on G0 (unique labels and no edge participating in more than one cycle) we prove this approach yields the subgraph that contains all possible matches. For more general G0, a full check is required for full precision, on the reduced subgraph of G.
Fully precise pattern matching algorithms are fundamentally expensive: known worst-case performance has polynomial complexity with the power being the order of the number vertices in G0. However, when G is sparse and near matches of G0 are rare within G, our distributed algorithms are highly-precise and typically run in near-linear time. We present HPC weak and strong scaling studies at scales orders of magnitude larger than previous pattern matching approaches (background graphs with up to one trillion edges on thousands of processors).
It is difficult to validate pattern matching algorithms on massive real-world graphs, as there is typically no ground truth available. Recently, Kepner et al. proposed nonstochastic Kronecker graph generation to build massive scale-free graphs with ground truth for benchmarking and validating a wide class of graph algorithms. We extend these graph generators to build edge and vertex labelled graphs with ground truth number of specific template patterns and prove several mathematical results that aide in this design.
Geoffrey Sanders is a Staff Scientist in the Computational Mathematics group at the Center for Applied Scientific Computing. Geoff’s current research focus is on developing distributed graph analysis techniques for the efficient computation of challenging data mining tasks that involve topology and metadata jointly.
A native of Reno, Nevada, Geoffrey earned his Bachelor’s degree in Mathematics at the University of California, San Diego in 2002, his Master’s in Applied Mathematics at CU Boulder in 2005, and his PhD at the same institution in 2008.