Samarth Narang

ML Compiler Engineer & Software Developer

I'm a passionate ML Compiler Engineer currently working at Qualcomm, specializing in developing high-performance AI inference solutions for edge devices. With a strong background in compiler construction, machine learning, and systems programming, I focus on optimizing AI workloads for on-device execution.


Outside of work, I try (but fail) to keep life balanced with my hobbies:

  • 🍜 Major foodie
  • 🏋️‍♂️ Regular at the gym
  • ✈️ Always planning my next trip
  • ⚽ Sports enthusiast (following almost every sport!)
  • ♟️ Former competitive chess player

Work Experience

ML Compiler Engineer
Qualcomm
January 2024 - Present
  • One of the core contributors to a new MLIR-based compiler stack for Qualcomm’s AI inference on Hexagon NPUs. Played a key role in prototyping and benchmarking the Triton-based compilation path.
  • Part of the team that extended a sequence of MLIR passes, including multi-level tiling, fusion, vectorization, and multithreading, culminating in lowering to LLVM for NPU backend codegen.
  • Working on tiling algorithms used by Qualcomm's proprietary Hexagon NPU Compiler to facilitate on-device AI inference.
  • Periodically resolve critical performance issues that customers encounter when running LLMs on Qualcomm NSPs.

Machine Learning Engineer
DeGirum
June 2023 - January 2024

ML Compiler Backend Development

  • Collaborated on the development of an AI compiler for DeGirum's hardware accelerator, focusing on performance and on broadening the range of compatible models.
  • Implemented SIMD parallelism, vector processing, and pipeline strategies to minimize data movement and memory footprint.
  • Expanded the existing compiler to support Large Language Models (LLMs) through in-depth understanding of Transformer-based architectures.
  • Benchmarked CPU cores using an FPGA to offload certain operations during real-time execution of models.

ML Deployment Infrastructure

  • Compiled ML models using the DeGirum compiler and orchestrated their seamless integration within the DeGirum Ecosystem.
  • Leveraged Flask API for robust and efficient integration, ensuring smooth communication between components.
  • Developed comprehensive unit tests utilizing the PyTest framework.
  • Implemented CI/CD pipelines to establish an automated workflow for deployment and testing processes.

Software Engineering Co-op
DeGirum
May 2022 - May 2023

Machine Learning Team (08/16/22 - 05/30/23)

  • Worked extensively with the PyTorch framework to compile deep learning models on company-specific compilers.
  • Facilitated quantization of models from FP32 to UInt8 precision to expand the variety of model choices.
  • Successfully ported 132 models (both quantized and floating-point) from the popular timm repository to DeGirum's Model Zoo.
  • Created custom Python modules and packages to facilitate code reuse and maintainability.

Embedded SW Team (02/01/22 - 08/15/22)

  • Built a UART interface monitor using RISC-V Assembly for field engineers as a debug tool.
  • Developed ROM code routines for MBIST operations.
  • Redesigned existing MBIST testing code to decrease test completion time by 40%.
  • Implemented a series of embedded C tests for Pre-Silicon RTL validation.
  • Refactored and expanded functionality of the Verilog Test Bench.

Education

Master of Science, Computer Science
University of Texas at Austin
August 2024 - August 2025
GPA: 3.94

Relevant Coursework: Compiler Construction and Implementation of Programming Languages, Virtualization, NLP, Reinforcement Learning, Generative AI in Deep Learning, Optimization

Bachelor of Science, Computer Science and Mathematics
University of Massachusetts Amherst
August 2020 - May 2023
GPA: 3.98

Awards: Summa Cum Laude Honors, Dean's List (all semesters), Chancellor's Scholarship ($16,000/year)

Relevant Coursework: Operating Systems, Computer Networking, Computer Systems, Algorithms and Data Structures, Machine Learning, Programming in JavaScript, Regression Analysis

Publications

Tensor Evolution: A Framework for Fast Evaluation of Tensor Computations using Recurrences

Authors: Javed Absar, Samarth Narang, Muthu Baskaran

Conference: 6th Compilers for Machine Learning Workshop, at CGO 2025

Software Projects

LLVM Compiler Infrastructure

Open-source contributor with several contributions to MLIR, Clang, LLVM optimizations, and Flang.

CaptureMyHippo

A non-conventional social media application that allows users to record their memories for family and loved ones. Built with a serverless architecture using AWS services.

Skills: Flutter, Python, Flask, AWS Lambda, AWS DynamoDB

NoFinishLine

iOS application for tracking workouts and searching an integrated workout database with levels, categories, and instructions. Features a RESTful backend with AWS integration.

Skills: Flutter, Python, Flask, AWS Lambda, AWS DynamoDB, Boto3

CryptoPriceAlert

Mobile application for cryptocurrency traders to set price alerts on their favorite assets. Integrates with major trading platforms' APIs for real-time price data.

Skills: Flutter, Python, Flask, Firebase

Technical Skills

Proficient

C, C++, Python, MLIR, LLVM, PyTorch, Assembly, Git, Unix, Flutter

Familiar

Java, RISC-V ISA, SQL, TensorFlow