oruccakir

Lumina Framework

Modular benchmarking and profiling framework for multi- and mixed-modal LLMs across CPU and GPU; includes paper artifacts.

Overview

Lumina orchestrates repeatable LLM workloads on the llama.cpp backend and collects hardware counters via profilers: ncu (NVIDIA Nsight Compute), nsys (NVIDIA Nsight Systems), Linux perf, and PAPI. It includes automation to sweep sequence lengths, batch sizes, and quantization levels, producing publication-quality charts. This work also underpins my first-author paper, Beyond the Shadows: A Deep Dive into Profiling Modern Mixed-Modal and Multi-Modal Transformer Models, accepted at SAMOS 2025. The paper includes experiment configurations, reproducibility notes, and traces analyzing cache behavior, attention efficiency, and memory bandwidth across heterogeneous hardware.
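The sweep automation described above can be sketched as a cross-product of seeded scenarios. This is an illustrative sketch, not Lumina's actual API: the `Scenario` dataclass, `make_scenarios` helper, and the parameter values are assumptions for demonstration.

```python
# Hypothetical sketch of sweep automation: enumerate seeded benchmark
# scenarios over sequence length x batch size x quantization level.
# All names and values here are illustrative, not Lumina's real API.
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    seq_len: int
    batch_size: int
    quant: str
    seed: int

def make_scenarios(seq_lens, batch_sizes, quants, seed=42):
    """Build the full cross-product of sweep parameters with a fixed
    seed so the whole grid is reproducible run-to-run."""
    return [Scenario(s, b, q, seed)
            for s, b, q in itertools.product(seq_lens, batch_sizes, quants)]

# 2 sequence lengths x 2 batch sizes x 2 quantization levels = 8 runs
grid = make_scenarios([512, 2048], [1, 8], ["Q4_K_M", "Q8_0"])
```

Pinning the seed per scenario, rather than per process, is what lets individual grid points be re-run in isolation and still match the original results.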

Key Features

1. Reproducible benchmarking with seeded scenarios
2. GPU/CPU counter collection and aggregation
3. Quantization and context-length sweeps
4. Publication-ready plots and exports
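The counter collection and aggregation step (feature 2) can be sketched as reducing per-run counter samples to per-scenario means. This is a minimal sketch under assumptions: the triple-based sample format, the `aggregate` function, and the counter names are illustrative, not Lumina's actual data model.

```python
# Hypothetical sketch of counter aggregation: fold per-run hardware
# counter samples (e.g. values exported from perf, ncu, or PAPI) into
# per-scenario means. Field names and IDs are illustrative.
import statistics
from collections import defaultdict

def aggregate(samples):
    """samples: iterable of (scenario_id, counter_name, value) triples.
    Returns {scenario_id: {counter_name: mean_value}} across repeats."""
    buckets = defaultdict(lambda: defaultdict(list))
    for scenario, counter, value in samples:
        buckets[scenario][counter].append(value)
    return {s: {c: statistics.mean(vs) for c, vs in counters.items()}
            for s, counters in buckets.items()}

# Two repeats of one counter plus a second counter for one scenario.
runs = [
    ("seq512_b1_Q4", "cache-misses", 1.0e6),
    ("seq512_b1_Q4", "cache-misses", 1.2e6),
    ("seq512_b1_Q4", "instructions", 4.0e9),
]
summary = aggregate(runs)
```

Averaging over repeats before plotting keeps the publication-ready exports (feature 4) stable against run-to-run counter noise.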

Technologies

Python, C++, llama.cpp, ggml, Transformers, Benchmarking, VLMEvalKit, Profiling