
Hi, I am Simon.


I am a Computer Science Ph.D. student at Stanford University. Currently in my first year, I am rotating with Prof. Azalia Mirhoseini in the Scaling Intelligence Lab and Prof. Christopher Ré in the Hazy Research Lab.

I studied Electrical Engineering and Computer Sciences during my undergrad at Berkeley. I was fortunate to be involved in the SLICE Lab working with Prof. Sophia Shao and the RISE Lab working with Prof. Ion Stoica.

I am broadly interested in computer systems and machine learning. Most recently, I spent some time pre-training language models at Cohere. Previously, I designed GPUs at Apple, scaled out distributed systems at Anyscale, and made cars drive themselves at NVIDIA DRIVE.

If you are interested in my journey, please check out the rest of this site. Feel free to contact me at simonguo [@] stanford dot edu.

Resume / CV

Research

I believe the future of computing will be specialized and distributed, making intelligence both scalable and ubiquitous. To that end, my research spans the stack, including efficient learning methods, hardware-software co-design, parallelism and efficient compilation, and systems for machine learning. I have published at machine learning and computer systems conferences.

KernelBench: Can LLMs Write Efficient GPU Kernels?

Anne Ouyang*, Simon Guo*, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini

* indicates equal contribution

To appear at DL4C (Best Paper) and SSI-FM Workshops at International Conference on Learning Representations (ICLR), 2025

Benchmark and environment to evaluate LLMs' ability to generate efficient GPU kernels


arXiv
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Üstün, Acyr Locatelli

Conference on Neural Information Processing Systems (NeurIPS), 2024
NGSM (Spotlight) and ES-FoMo-II Workshops at International Conference on Machine Learning (ICML), 2024

Upcycling MoE with Mixture-of-Attention for more efficient MoE pre-training


arXiv
Parallelism in Bundle Adjustment for SLAM

Simon Zirui Guo, Yakun Sophia Shao

ACM Student Research Competition at IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

Speed up SLAM by exploiting structural sparsity and custom kernels on Tensor Cores


Extended Abstract
Gemmini: An Open-Source, Full-System DNN Accelerator Design and Evaluation Platform

Hasan Genc, Seah Kim, Vadim Vadimovich Nikiforov, Simon Zirui Guo, Borivoje Nikolić, Krste Asanović, Yakun Sophia Shao

First Workshop on Open-Source Computer Architecture Research (OSCAR) at ACM/IEEE International Symposium on Computer Architecture (ISCA), 2022

Design space exploration for DNN accelerators across the stack


Workshop Presentation
D3: A Dynamic Deadline-Driven Approach for Building Autonomous Vehicles

Ionel Gog, Sukrit Kalra, Peter Schafhalter*, Joseph E. Gonzalez, Ion Stoica

* I worked on this project as an undergraduate research assistant with this author

In Proceedings of the European Conference on Computer Systems (EuroSys), 2022

OS for self-driving cars and robots


ACM