Jiajia Li is a recent Ph.D. graduate of the School of Computational Science and Engineering (CSE) with a knack for optimizing tensor algorithms and creating memorable, pun-filled paper titles. Li’s latest work, Hierarchical Storage of Sparse Tensors, nicknamed HiCOO, showcases both of these talents.
HiCOO was awarded the prestigious title of best student paper at this year’s International Conference for High Performance Computing, Networking, Storage, and Analysis, commonly referred to as Supercomputing (SC).
Winning the best student paper award at this well-established and increasingly competitive conference program, which boasted an impressive 288 technical paper submissions with 68 ultimately being accepted this year, is no small feat.
For Li, however, who holds a second Ph.D. in computer architecture, working out how to improve a process is both half the battle and half the fun of research.
“For me, it isn’t about the outcome, but rather what I can explore in the process along the way,” she said. “Tensor algorithms are a way to break down data by organizing or viewing it in a certain manner to find what connects the different factors. Sparse tensors are just a specific kind of tensor.”
For sparse tensor computations, there is tension among storage, speed, and flexibility. With traditional methods, you can usually get just two of these three traits. For example, a computation can be fast and flexible at the price of more storage. Or, it can be compact and flexible, but also slow.
“We wanted to create a storage format for tensors that was all three: compact, fast, and flexible,” said Li.
The authors of HiCOO, who include Li and CSE Associate Professors Rich Vuduc and Jimeng Sun, were inspired by the work of Lawrence Berkeley National Lab Staff Scientist Aydin Buluç, which focuses on sparse matrices, the lower-dimensional analogues of sparse tensors.
“We saw an opportunity to extend those ideas to sparse tensors,” explained Li. “We believe HiCOO is smaller, faster, and simpler to update [when the data is changing] than state-of-the-art alternatives. So, we think it will be easier to use in tensor libraries, tools, and data mining applications, which today include e-commerce, healthcare, security, and deep learning, to name a few.”
Vuduc said, “An example of this use comes from Jimeng Sun’s work, where his team is trying to find structure in electronic health records (EHRs). There is always a way to organize the data along certain dimensions. For example, a patient could be considered one dimension, the diagnosis they receive would be a second, and their treatment a third.”
As one could imagine, these dimensions can be combined in a vast number of ways. That is why the ability to store data of this size in a way that respects order, computes quickly, and uses as little storage as possible is critically needed.
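To make the patient, diagnosis, and treatment example concrete, here is a minimal Python sketch of a three-dimensional sparse tensor stored as a plain coordinate (COO) list, where each nonzero entry carries one index per dimension plus a value. The `Entry` type, the helper functions, the tensor shape, and all of the numbers are hypothetical and purely illustrative; they are not taken from Sun’s data or from the HiCOO paper.

```python
from collections import namedtuple

# Hypothetical record type: one index per dimension (mode) plus a value.
Entry = namedtuple("Entry", ["patient", "diagnosis", "treatment", "value"])

# Made-up toy data: each entry records how often a (patient, diagnosis,
# treatment) combination occurred. Every other combination is implicitly zero.
coo_entries = [
    Entry(patient=0, diagnosis=2, treatment=1, value=3.0),
    Entry(patient=0, diagnosis=5, treatment=4, value=1.0),
    Entry(patient=7, diagnosis=2, treatment=1, value=2.0),
]

# Assumed tensor shape: patients x diagnoses x treatments.
shape = (1000, 200, 50)


def density(entries, shape):
    """Fraction of tensor cells that are actually stored."""
    total_cells = 1
    for dim in shape:
        total_cells *= dim
    return len(entries) / total_cells


def sum_over_mode(entries, mode):
    """Sum the stored values grouped by the index along one mode."""
    totals = {}
    for entry in entries:
        key = entry[mode]
        totals[key] = totals.get(key, 0.0) + entry.value
    return totals


# Example: total counts per diagnosis (mode 1).
per_diagnosis = sum_over_mode(coo_entries, mode=1)
```

Only three of the ten million possible cells are stored here, which is what makes the tensor sparse. A flat list like this can be traversed in any order, but it repeats a full set of indices for every nonzero entry.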
“Basically, the formats that already existed could give you two of those areas: the competition could either have a small data structure that took less storage, or, if you wanted to access the data in a different order, it could, but it would be slower,” said Li. “HiCOO breaks the mold of its predecessors by proposing a new storage format for sparse tensors [called Hierarchical COOrdinate] that accomplishes all three.”
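The article does not spell out the layout, so the following Python sketch shows only the general idea that a hierarchical coordinate format suggests: group nonzero entries into small blocks, store each block’s coordinates once, and keep only a short within-block offset for every entry. The function names, the dictionary keys (`bptr`, `binds`, `einds`, `vals`), the block size, and the simple sorted block order are all assumptions made for illustration, not the authors’ implementation.

```python
from collections import defaultdict


def to_blocked_coo(entries, block_bits=2):
    """Group COO nonzeros into blocks with side length 2**block_bits.

    entries: iterable of (index_tuple, value) pairs.
    Returns block pointers, block indices, in-block offsets, and values.
    """
    mask = (1 << block_bits) - 1
    blocks = defaultdict(list)
    for idx, val in entries:
        block = tuple(i >> block_bits for i in idx)  # high bits: which block
        offset = tuple(i & mask for i in idx)        # low bits: where inside it
        blocks[block].append((offset, val))

    bptr, binds, einds, vals = [0], [], [], []
    for block in sorted(blocks):  # plain sorted order, chosen for simplicity
        binds.append(block)       # block coordinates are stored once per block
        for offset, val in sorted(blocks[block]):
            einds.append(offset)  # small offsets are stored per nonzero
            vals.append(val)
        bptr.append(len(vals))    # marks where the next block starts
    return {"bptr": bptr, "binds": binds, "einds": einds, "vals": vals}


def from_blocked_coo(tensor, block_bits=2):
    """Rebuild the full (index_tuple, value) pairs from the blocked layout."""
    out = []
    for b, block in enumerate(tensor["binds"]):
        for k in range(tensor["bptr"][b], tensor["bptr"][b + 1]):
            idx = tuple(
                (bi << block_bits) | ei
                for bi, ei in zip(block, tensor["einds"][k])
            )
            out.append((idx, tensor["vals"][k]))
    return out


# Hypothetical toy tensor with four nonzeros that fall into two 4x4x4 blocks.
toy = [((0, 1, 2), 1.0), ((1, 0, 3), 2.0), ((5, 4, 6), 3.0), ((4, 5, 7), 4.0)]
blocked = to_blocked_coo(toy)
```

Because each within-block offset needs only `block_bits` bits per dimension, the per-entry index cost shrinks as more nonzeros share a block, while the entries can still be visited block by block in any dimension order.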
Read the PNNL press release of HiCOO here.