Performance

Practical patterns, zero-copy views, and safe mutation loops for faster LLVM passes.

Table of Contents
Why LLVM ships its own containers
Core Value Types (which own nothing)
  StringRef
  ArrayRef / MutableArrayRef
  Twine
Small-Size Optimized Containers
  SmallVector<T, N>
  SmallString
  SmallPtrSet<T*, N>
“Hashy” Workhorses
  DenseMap<KeyT, ValueT> / DenseSet
  Custom Keys
  providing DenseMapInfo<Key> with:
  StringMap
  Erasing While Iterating
Arenas, Uniquing, and more
  BumpPtrAllocator
  FoldingSet
Error handling the LLVM way
IR-Centric Must-Knows
  Traversal Idioms
  Mutation Safety
  CFG Helpers
Range and Iterator (Halloween!) Candy
Choosing the Right Data Structure (A Decision Matrix)
Common “Shooting Yourself in the Foot” Pitfalls
Micro-Benchmarks
  Compile and Run
Conclusion

When to use SmallVector vs std::vector, why DenseMap feels like cheating, how StringRef & ArrayRef avoid copies, and the iterator tricks that make LLVM code elegant and fast. ...

SIMD, warps, occupancy, coalescing, shared memory, spills, and matrix units, mapped to real compiler decisions.

Table of Contents
TL;DR
1. Execution Model, Decoded
  SIMT vs SIMD (why is it confusing?)
  Warps/Wavefronts
  CTA (Cooperative Thread Array) / Workgroup
  Occupancy (It’s not a religion)
2. Memory Hierarchy (where performance is won and lost)
  Coalesced Access (the golden rule)
  Shared Memory (on-chip scratchpad)
  Spills (the invisible tax)
3. Math Units: Matrix Engines, Precision, and Shapes
4. Scheduling and Latency Hiding
  Warp Scheduling
  Divergence and Predication
5. Vendor Term Crosswalk
6. Checklists you will actually use
7. Quick Reference Cheat Sheet

GPUs aren’t mysterious, just picky. Most performance cliffs are not about the math; they’re about how warps step, how memory is fetched, and how often registers spill. This post decodes the jargon; to be candid, it is also me “spilling” my notes and trying to explain things to myself. ...


Data Structure and Iterator Kung Fu in LLVM

Demystifying GPU Terminology: A Compiler Engineer’s Field Guide