Lumina Framework
Modular benchmarking and profiling framework for multi-modal and mixed-modal LLMs on CPUs and GPUs, with artifacts for the accompanying paper.

Overview
Lumina orchestrates repeatable LLM workloads on a llama.cpp backend and collects hardware counters through standard profilers: NVIDIA Nsight Compute (ncu), Nsight Systems (nsys), Linux perf, and PAPI. It also automates sweeps over sequence lengths, batch sizes, and quantization levels, and produces publication-quality charts from the results. This work underpins my first-author paper, Beyond the Shadows: A Deep Dive into Profiling Modern Mixed-Modal and Multi-Modal Transformer Models, accepted at SAMOS 2025. The paper includes experiment configurations, reproducibility notes, and traces analyzing cache behavior, attention efficiency, and memory bandwidth across heterogeneous hardware.
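As a rough illustration of how such a sweep can be automated, the sketch below enumerates one benchmark invocation per (sequence length, batch size, quantization) combination. The runner name and flags follow llama.cpp's llama-bench tool, but the exact command line, parameter values, and model file naming here are assumptions for illustration, not Lumina's actual interface.

```python
from itertools import product

# Illustrative sweep parameters; real experiments would load these from a config.
SEQ_LENS = [512, 1024, 2048]
BATCH_SIZES = [1, 8]
QUANTS = ["Q4_K_M", "Q8_0"]  # quantization levels map to different GGUF files

def build_commands(model_dir="models"):
    """Enumerate one benchmark command per (seq_len, batch, quant) combination."""
    cmds = []
    for seq, batch, quant in product(SEQ_LENS, BATCH_SIZES, QUANTS):
        cmds.append([
            "llama-bench",                        # llama.cpp's benchmark binary (assumed)
            "-m", f"{model_dir}/model-{quant}.gguf",  # hypothetical model file layout
            "-p", str(seq),                       # prompt (sequence) length
            "-b", str(batch),                     # batch size
            "-o", "json",                         # machine-readable output for aggregation
        ])
    return cmds

commands = build_commands()
print(len(commands))  # 3 seq lens x 2 batch sizes x 2 quants = 12 sweep points
```

Each command list can then be passed to subprocess.run and the JSON output aggregated across the sweep.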
Key Features
Reproducible benchmarking with seeded scenarios
GPU/CPU counter collection and aggregation
Quantization and context-length sweeps
Publication-ready plots and exports
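For CPU-side counter collection, perf's machine-readable mode (perf stat -x,) emits one CSV line per event, with the value, unit, and event name in the first three fields. A minimal parser for aggregating such output might look like the following; the field layout matches perf's CSV format, but the aggregation logic is a sketch under that assumption, not Lumina's actual code.

```python
def parse_perf_csv(text):
    """Parse `perf stat -x,` output into {event: value}, skipping uncounted events."""
    counters = {}
    for line in text.strip().splitlines():
        fields = line.split(",")
        if len(fields) < 3:
            continue  # comment or malformed line
        value, _unit, event = fields[0], fields[1], fields[2]
        if value in ("<not supported>", "<not counted>"):
            continue  # event unavailable or not scheduled on this machine
        counters[event] = int(value)
    return counters

# Two sample lines in perf's -x, format: value,unit,event,run-time,percentage,...
sample = (
    "1234567,,cache-misses,2000000,100.00,,\n"
    "987654321,,instructions,2000000,100.00,,"
)
print(parse_perf_csv(sample))
```

Per-run dictionaries like this can be merged across the sweep points and exported alongside the GPU counters for plotting.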