TITLE: Design Principles For Cluster Computing Systems
Cluster computing systems – i.e., collections of parallel interconnected machines and the infrastructure software running atop them – drive many important applications such as search, data analytics, and drug discovery, and are at the heart of key innovations in computer science and beyond.
Today, these cluster systems are facing constant disruption. The emergence of new complex analytics algorithms, performance-hungry distributed applications, and new cluster computing use cases is stretching cluster systems beyond the original targets they were designed to meet. The growing mismatch between application needs and cluster system designs is hurting application performance and robustness, and curtailing future application innovation.
In this talk, I will describe four general design principles for cluster systems that enable them to offer excellent performance to applications and support innovation while gracefully accommodating disruptions. These principles shed light on the need for: (a) software stacks that enable the execution strategy for a cluster application (e.g., when and where to execute application components) to adapt over time; (b) systems that manage cluster applications’ memory (or “intermediate state”) as a separate first-class entity; (c) domain-specific abstractions and tools to ensure cluster networks provably meet applications’ requirements; and (d) a principled division of labor between cluster software and hardware to accelerate application performance. I will illustrate the importance of these principles using examples of cluster systems we’ve built based upon them.
Aditya Akella is a professor of computer sciences and an H. I. Romnes Faculty Fellow at UW-Madison. He received his B.Tech. from IIT Madras (2000) and his Ph.D. from CMU (2005). His research spans the computer systems area, with a focus on formal methods applied to networks, hardware acceleration, serverless computing, and systems for big data. Akella’s research has been incorporated into several real-world systems, including content distribution networks, data analytics stacks, and production datacenter networks. Akella has received many awards, including Professor-of-the-Year (2017 and 2019), Vilas Associate (2017), IRTF Applied Networking Research Prize (2015), ACM SIGCOMM Rising Star Award (2014), NSF CAREER Award (2008), and several best paper awards. Akella co-leads CloudLab (http://cloudlab.us), a testbed for foundational cloud research.