# Lumina Framework

## Overview
A modular benchmarking and profiling framework for mixed-modal and multi-modal LLMs on CPU and GPU; includes paper artifacts.

Lumina orchestrates repeatable LLM workloads over a llama.cpp backend and collects hardware counters via profilers such as ncu, nsys, perf, and PAPI. It includes automation to sweep sequence lengths, batch sizes, and quantization levels, producing publication-quality charts. This work also underpins my first-author paper, *Beyond the Shadows: A Deep Dive into Profiling Modern Mixed-Modal and Multi-Modal Transformer Models*, accepted at SAMOS 2025. The paper includes experiment configurations, reproducibility notes, and traces analyzing cache behavior, attention efficiency, and memory bandwidth across heterogeneous hardware.
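The sweep automation described above can be sketched as a cross-product of sweep parameters, with each scenario assigned a deterministic seed for reproducibility. This is a minimal illustration, not Lumina's actual API; the `Scenario` record and field names are assumptions.

```python
import itertools
from dataclasses import dataclass

# Hypothetical scenario record; field names are illustrative,
# not Lumina's actual schema.
@dataclass(frozen=True)
class Scenario:
    seq_len: int
    batch_size: int
    quant: str
    seed: int

def build_sweep(seq_lens, batch_sizes, quants, base_seed=42):
    """Enumerate the full cross-product of sweep parameters,
    assigning each scenario a deterministic seed so any run
    can be reproduced exactly."""
    scenarios = []
    params = itertools.product(seq_lens, batch_sizes, quants)
    for i, (s, b, q) in enumerate(params):
        scenarios.append(Scenario(seq_len=s, batch_size=b, quant=q, seed=base_seed + i))
    return scenarios

sweep = build_sweep([512, 2048], [1, 8], ["Q4_K_M", "Q8_0"])
print(len(sweep))  # 2 * 2 * 2 = 8 scenarios
```

Freezing the cross-product and the seed assignment up front means every chart in the paper can be traced back to an exact scenario tuple.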
## Features
- Reproducible benchmarking with seeded scenarios
- GPU/CPU counter collection and aggregation
- Quantization and context-length sweeps
- Publication-ready plots and exports
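Counter collection and aggregation across repeated runs might look like the sketch below, which computes a mean and spread per counter. The counter names and record shape are assumptions for illustration, not Lumina's actual output format.

```python
import statistics
from collections import defaultdict

def aggregate_counters(runs):
    """Aggregate per-run counter readings into summary statistics.

    runs: list of dicts mapping counter name -> measured value,
    one dict per repeated run of the same scenario.
    Returns a dict mapping counter name -> {"mean", "stdev"}.
    """
    by_counter = defaultdict(list)
    for run in runs:
        for name, value in run.items():
            by_counter[name].append(value)
    return {
        name: {"mean": statistics.mean(vals), "stdev": statistics.pstdev(vals)}
        for name, vals in by_counter.items()
    }

# Hypothetical readings from two repeats of one scenario.
runs = [
    {"cache-misses": 1000, "instructions": 5_000_000},
    {"cache-misses": 1100, "instructions": 5_100_000},
]
summary = aggregate_counters(runs)
print(summary["cache-misses"]["mean"])  # 1050.0
```

Reporting a spread alongside the mean makes it easy to flag noisy counters before they end up in a plot.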