
Transformers.cpp

C++ · LibTorch · Transformers · Attention · LLMs · Tokenizers · Inference

Overview

Reconstructing modern multi-modal and mixed-modal Transformer architectures with LibTorch in C++, without Python dependencies.

A C++ library for Transformer-based LLMs, featuring multi-modal and mixed-modal support, optimized attention, and hybrid CPU/GPU inference for high-speed, low-memory deployment.
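
To illustrate the hybrid CPU/GPU idea, here is a minimal, hypothetical sketch of device placement with LibTorch; the two Linear layers stand in for the real encoder and decoder modules, whose names and interfaces in Transformers.cpp may differ.

```cpp
#include <torch/torch.h>
#include <iostream>

// Minimal sketch of hybrid CPU/GPU placement with LibTorch.
// The Linear layers are stand-ins for the actual encoder/decoder modules.
int main() {
    torch::Device cpu(torch::kCPU);
    torch::Device gpu = torch::cuda::is_available()
                            ? torch::Device(torch::kCUDA)
                            : torch::Device(torch::kCPU);  // fall back to CPU-only

    // Keep one stage on the CPU and the compute-heavy stage on the GPU.
    torch::nn::Linear encoder(512, 512);
    torch::nn::Linear decoder(512, 32000);
    encoder->to(cpu);
    decoder->to(gpu);

    torch::NoGradGuard no_grad;
    auto hidden = encoder->forward(torch::rand({1, 512}));
    auto logits = decoder->forward(hidden.to(gpu));  // move activations between devices
    std::cout << logits.sizes() << std::endl;        // [1, 32000]
}
```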

Features

  • Manual LibTorch implementations of Llama, Llava, Mistral, DeepseekV3, LlavaNext, and CLIPVision
  • Optimized attention with multi-query (MQA) and grouped-query (GQA) attention (a sketch follows this list)
  • Dynamic key-value (KV) cache support
  • Multi-modal and mixed-modal hybrid CPU/GPU inference
  • Flexible tokenization: BPE, SentencePiece, and HuggingFace-compatible tokenizers
  • Supports Vision Transformers using OpenCV
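
To make the attention bullet concrete, below is a hedged, self-contained sketch of grouped-query attention in LibTorch; the function name, tensor layout, and the omission of masking and KV-cache handling are simplifications for illustration, not the library's actual interface.

```cpp
#include <torch/torch.h>
#include <cmath>

// Sketch of grouped-query attention (GQA): several query heads share one K/V head.
// q:    [batch, n_q_heads,  seq, head_dim]
// k, v: [batch, n_kv_heads, seq, head_dim], where n_kv_heads divides n_q_heads.
// With n_kv_heads == 1 this reduces to multi-query attention (MQA).
torch::Tensor grouped_query_attention(torch::Tensor q,
                                      torch::Tensor k,
                                      torch::Tensor v) {
    const int64_t groups = q.size(1) / k.size(1);

    // Expand the shared K/V heads so each query head sees its group's K/V.
    k = k.repeat_interleave(groups, /*dim=*/1);
    v = v.repeat_interleave(groups, /*dim=*/1);

    const double scale = 1.0 / std::sqrt(static_cast<double>(q.size(-1)));
    auto scores = torch::matmul(q, k.transpose(-2, -1)) * scale;  // [b, h, seq, seq]
    auto probs  = torch::softmax(scores, /*dim=*/-1);
    return torch::matmul(probs, v);                               // [b, h, seq, head_dim]
}
```

Fewer K/V heads also mean a proportionally smaller KV cache, which is what makes MQA and GQA attractive for low-memory inference.
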
GitHub Repository

Researching the Internals

Before starting my internship at BSC, I spent long nights researching the internals of Transformer architectures. The papers pinned on the board behind me are my handwritten diagrams of attention flows, embedding layers, and decoder stacks. This early exploration laid the foundation for the work I would later pursue at BSC and eventually expand into my Transformers.cpp project.

No Rest, Just Transformers.cpp

The day after I finished my internship at BSC in Barcelona, I flew back to Turkey and immediately started working on the Transformers.cpp project at the Kasırga Microprocessors Lab. No pause, no break, just straight into building C++ implementations of LLaMA, LLaVA, and more. This photo at the tech center marks the very first morning of that new chapter.

First LLaMA Milestone

This night marks the moment I successfully implemented the full LLaMA architecture in my Transformers.cpp project. After weeks of debugging attention, RoPE, and the KV cache system, everything finally came together. Once I saw the outputs match Hugging Face's reference, I took a break to breathe and enjoy the night, a small pause after a huge milestone.

Let them speak.

People talked a lot, but in the end the results stayed, and that's what really mattered.
