I'm a passionate ML Compiler Engineer currently working at Qualcomm, specializing in developing high-performance AI inference solutions for edge devices. With a strong background in compiler construction, machine learning, and systems programming, I focus on optimizing AI workloads for on-device execution.
Outside of work, I try (and often fail) to keep life balanced with my hobbies:
- 🍜 Major foodie
- 🏋️‍♂️ Regular at the gym
- ✈️ Always planning my next trip
- ⚽ Sports enthusiast (following almost every sport!)
- ♟️ Former competitive chess player
Work Experience

- One of the core contributors to a new MLIR-based compiler stack for Qualcomm's AI inference on Hexagon NPUs. Played a key role in prototyping and benchmarking the Triton-based compilation path.
- Part of the team that extended a sequence of MLIR passes, including multi-level tiling, fusion, vectorization, and multithreading, culminating in lowering to LLVM for NPU backend codegen.
- Working on tiling algorithms used by Qualcomm's proprietary Hexagon NPU Compiler to facilitate on-device AI inference.
- Periodically handle critical performance issues faced by customers while running LLMs on Qualcomm NSPs.

ML Compiler Backend Development
- Collaborated on the development of an AI compiler for DeGirum's hardware accelerator, focusing on performance and on broadening the range of compatible models.
- Implemented SIMD parallelism, vector processing, and pipeline strategies to minimize data movement and memory footprint.
- Extended the existing compiler to support Large Language Models (LLMs), drawing on an in-depth understanding of Transformer-based architectures.
- Benchmarked CPU cores using an FPGA to offload certain operations during real-time model execution.
ML Deployment Infrastructure
- Compiled ML models using the DeGirum compiler and orchestrated their seamless integration within the DeGirum Ecosystem.
- Used a Flask API to integrate components and handle communication between them.
- Developed comprehensive unit tests utilizing the PyTest framework.
- Implemented CI/CD pipelines to establish an automated workflow for deployment and testing processes.

Machine Learning Team (08/16/22 - 05/30/23)
- Worked extensively with the PyTorch framework to compile deep learning models on company-specific compilers.
- Enabled quantization of models from FP32 to UInt8 precision, expanding the range of available models.
- Successfully ported 132 models (both quantized and floating-point) from the popular timm repository to DeGirum's Model Zoo.
- Created custom Python modules and packages to facilitate code reuse and maintainability.
Embedded SW Team (02/01/22 - 08/15/22)
- Built a UART interface monitor using RISC-V Assembly for field engineers as a debug tool.
- Developed ROM code routines for MBIST operations.
- Redesigned existing MBIST testing code to decrease test completion time by 40%.
- Implemented a series of embedded C tests for Pre-Silicon RTL validation.
- Refactored and expanded functionality of the Verilog Test Bench.
Education

Relevant Coursework: Compiler Construction and Implementation of Programming Languages, Virtualization, NLP, Reinforcement Learning, Generative AI in Deep Learning, Optimization

Awards: Summa Cum Laude Honors, Dean's List (all semesters), Chancellor's Scholarship ($16,000/year)
Relevant Coursework: Operating Systems, Computer Networking, Computer Systems, Algorithms and Data Structures, Machine Learning, Programming in JavaScript, Regression Analysis
Publications
Tensor Evolution: A Framework for Fast Evaluation of Tensor Computations using Recurrences
Authors: Javed Absar, Samarth Narang, Muthu Baskaran
Conference: 6th Compilers for Machine Learning Workshop, at CGO 2025
Software Projects
LLVM Compiler Infrastructure
Open-source contributor with several contributions to MLIR, Clang, LLVM optimizations, and Flang.
CaptureMyHippo
A non-conventional social media application that allows users to record their memories for family and loved ones. Built with a serverless architecture using AWS services.
NoFinishLine
iOS application for tracking workouts and searching an integrated workout database with levels, categories, and instructions. Features a RESTful backend with AWS integration.
CryptoPriceAlert
Mobile application for cryptocurrency traders to set price alerts on their favorite assets. Integrates with major trading platforms' APIs for real-time price data.