Transformers.cpp
Reconstructing modern multi-modal and mixed-modal Transformer architectures with LibTorch in C++, without Python dependencies.

Overview
A C++ library for Transformer-based LLMs, featuring multi-modal and mixed-modal support, optimized attention, and hybrid CPU/GPU inference for high-speed, low-memory deployment.
Key Features
1. Manual LibTorch implementations of Llama, Llava, Mistral, DeepseekV3, LlavaNext, and CLIPVision
2. Optimized attention with MQA (multi-query) and GQA (grouped-query)
3. Dynamic key-value (KV) cache support
4. Multi-modal and mixed-modal hybrid CPU and GPU inference
5. Flexible tokenization: BPE, SentencePiece, HuggingFace-compatible
6. Supports Vision Transformers using OpenCV
Technologies
C++ · LibTorch · Transformers · Attention · LLMs · Tokenizers · Inference