Research

I believe the future of computing will be specialized and distributed, enabling intelligence that is ubiquitous and operates at scale. To that end, my research spans the stack, including efficient learning methods, hardware-software codesign, parallelism and efficient compilation, and systems for machine learning. I have published at machine learning and computer systems conferences.

Kevin: Multi-Turn RL for Generating CUDA Kernels

Carlo Baronio*, Pietro Marsella*, Ben Pan*, Simon Guo, Silas Alberti

EXAIT & ES-FoMo-III Workshop at International Conference on Machine Learning (ICML), 2025

Multi-Turn RL Training for Generating CUDA Kernels

KernelBench: Can LLMs Write Efficient GPU Kernels?

Anne Ouyang*, Simon Guo*, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini

* indicates equal contribution

International Conference on Machine Learning (ICML), 2025
DL4C (Best Paper) & SSI-FM Workshop at International Conference on Learning Representations (ICLR), 2025

Benchmark and environment for evaluating LLMs' ability to generate efficient GPU kernels

arxiv

BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Üstün, Acyr Locatelli

Conference on Neural Information Processing Systems (NeurIPS), 2024
NGSM (Spotlight) and ES-FoMo-II Workshop at International Conference on Machine Learning (ICML), 2024

Upcycling MoE with Mixture-of-Attention for more efficient MoE pre-training

arxiv

Parallelism in Bundle Adjustment for SLAM

Simon Zirui Guo, Yakun Sophia Shao

ACM Student Research Competition at IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

Speed up SLAM by exploiting structural sparsity and custom kernels on Tensor Cores

Extended Abstract

Gemmini: An Open-Source, Full-System DNN Accelerator Design and Evaluation Platform

Hasan Genc, Seah Kim, Vadim Vadimovich Nikiforov, Simon Zirui Guo, Borivoje Nikolić, Krste Asanović, Yakun Sophia Shao

First Workshop on Open-Source Computer Architecture Research (OSCAR) at ACM/IEEE International Symposium on Computer Architecture (ISCA), 2022

Design Space Exploration for DNN accelerators across the stack

Workshop Presentation

D3: A Dynamic Deadline-Driven Approach for Building Autonomous Vehicles

Ionel Gog, Sukrit Kalra, Peter Schafhalter*, Joseph E. Gonzalez, Ion Stoica

* worked as undergraduate research assistant for author

In Proceedings of the European Conference on Computer Systems (EuroSys), 2022

OS for self-driving cars and robots

ACM