TPUs Go Brrr
I am a first-year Computer Science Ph.D. student at Stanford University. I am currently rotating with Prof. Azalia Mirhoseini in the Scaling Intelligence Lab, Prof. Christopher Ré in the Hazy Research Lab, and Prof. Tatsunori Hashimoto.
I studied Electrical Engineering and Computer Sciences as an undergraduate at Berkeley, where I was fortunate to work in the SLICE Lab with Prof. Sophia Shao and in the RISE Lab with Prof. Ion Stoica.
I am broadly interested in computer systems and machine learning. Most recently, I spent some time pre-training language models at Cohere. Previously, I designed GPUs at Apple, scaled out distributed systems at Anyscale, and made cars drive themselves at NVIDIA DRIVE.
If you are interested in my journey, please check out the rest of this site. Feel free to contact me at simonguo [@] stanford dot edu.
I believe the future of computing will be specialized and distributed, enabling ubiquitous intelligence at scale. To that end, my research spans the stack, including efficient learning methods, hardware-software codesign, parallelism and efficient compilation, and systems for machine learning. I have published at machine learning and computer systems conferences.
Anne Ouyang*, Simon Guo*, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini
* indicates equal contribution
To appear at International Conference on Machine Learning (ICML), 2025
DL4C (Best Paper) & SSI-FM Workshop at International Conference on Learning Representations (ICLR), 2025
Benchmark and environment to evaluate LLMs' ability to generate efficient GPU kernels
Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Üstün, Acyr Locatelli
Conference on Neural Information Processing Systems (NeurIPS), 2024
NGSM (Spotlight) and ES-FoMo-II Workshop at International Conference on Machine Learning (ICML), 2024
Upcycling MoE with Mixture-of-Attention for more efficient MoE pre-training
Simon Zirui Guo, Yakun Sophia Shao
ACM Student Research Competition at IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022
Speed up SLAM by exploiting structural sparsity and custom kernels on Tensor Cores
Hasan Genc, Seah Kim, Vadim Vadimovich Nikiforov, Simon Zirui Guo, Borivoje Nikolić, Krste Asanović, Yakun Sophia Shao
First Workshop on Open-Source Computer Architecture Research (OSCAR) at ACM/IEEE International Symposium on Computer Architecture (ISCA), 2022
Design Space Exploration for DNN accelerators across the stack
Ionel Gog, Sukrit Kalra, Peter Schafhalter*, Joseph E. Gonzalez, Ion Stoica
* worked as undergraduate research assistant for author
In Proceedings of European Conference on Computer Systems (EuroSys), 2022
OS for self-driving cars and robots