Transformers.cpp
C++ · LibTorch · Transformers · Attention · LLMs · Tokenizers · Inference
Overview
Reconstructing modern multi-modal and mixed-modal Transformer architectures with LibTorch in C++, without Python dependencies.
A C++ library for Transformer-based LLMs, featuring multi-modal and mixed-modal support, optimized attention, and hybrid CPU/GPU inference for high-speed, low-memory deployment.
Features
- Manual LibTorch implementation of Llama, Llava, Mistral, DeepseekV3, LlavaNext, and CLIPVision
- Optimized attention with MQA and GQA
- Dynamic key-value (KV) cache support
- Multi-modal and mixed-modal hybrid CPU/GPU inference
- Flexible tokenization: BPE, SentencePiece, and HuggingFace-compatible tokenizers
- Supports Vision Transformers using OpenCV