# Lumina Framework

## Overview
A modular benchmarking and profiling framework for mixed-modal and multi-modal LLMs on CPU and GPU; includes paper artifacts.

Lumina orchestrates repeatable LLM workloads over a llama.cpp backend and collects hardware counters via profilers such as ncu, nsys, perf, and PAPI. It includes automation to sweep sequence lengths, batch sizes, and quantization levels, producing publication-quality charts. This work also underpins my first-author paper, *Beyond the Shadows: A Deep Dive into Profiling Modern Mixed-Modal and Multi-Modal Transformer Models*, accepted at SAMOS 2025. The paper includes experiment configurations, reproducibility notes, and traces analyzing cache behavior, attention efficiency, and memory bandwidth across heterogeneous hardware.
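The sweep automation described above can be sketched as a cross-product of sweep parameters, with each scenario assigned a deterministic seed for reproducibility. This is a minimal illustration, not Lumina's actual API; the `Scenario` record and field names are assumptions.

```python
import itertools
from dataclasses import dataclass

# Hypothetical scenario record; field names are illustrative,
# not Lumina's actual schema.
@dataclass(frozen=True)
class Scenario:
    seq_len: int
    batch_size: int
    quant: str
    seed: int

def build_sweep(seq_lens, batch_sizes, quants, base_seed=42):
    """Enumerate the full cross-product of sweep parameters,
    assigning each scenario a deterministic seed so any run
    can be reproduced exactly."""
    scenarios = []
    params = itertools.product(seq_lens, batch_sizes, quants)
    for i, (s, b, q) in enumerate(params):
        scenarios.append(Scenario(seq_len=s, batch_size=b, quant=q, seed=base_seed + i))
    return scenarios

sweep = build_sweep([512, 2048], [1, 8], ["Q4_K_M", "Q8_0"])
print(len(sweep))  # 2 * 2 * 2 = 8 scenarios
```

Freezing the cross-product and the seed assignment up front means every chart in the paper can be traced back to an exact scenario tuple.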
## Features
- Reproducible benchmarking with seeded scenarios
- GPU/CPU counter collection and aggregation
- Quantization and context-length sweeps
- Publication-ready plots and exports
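Counter collection and aggregation across repeated runs might look like the sketch below, which computes a mean and spread per counter. The counter names and record shape are assumptions for illustration, not Lumina's actual output format.

```python
import statistics
from collections import defaultdict

def aggregate_counters(runs):
    """Aggregate per-run counter readings into summary statistics.

    runs: list of dicts mapping counter name -> measured value,
    one dict per repeated run of the same scenario.
    Returns a dict mapping counter name -> {"mean", "stdev"}.
    """
    by_counter = defaultdict(list)
    for run in runs:
        for name, value in run.items():
            by_counter[name].append(value)
    return {
        name: {"mean": statistics.mean(vals), "stdev": statistics.pstdev(vals)}
        for name, vals in by_counter.items()
    }

# Hypothetical readings from two repeats of one scenario.
runs = [
    {"cache-misses": 1000, "instructions": 5_000_000},
    {"cache-misses": 1100, "instructions": 5_100_000},
]
summary = aggregate_counters(runs)
print(summary["cache-misses"]["mean"])  # 1050.0
```

Reporting a spread alongside the mean makes it easy to flag noisy counters before they end up in a plot.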