oruccakir

Transformers.cpp

Reconstructing modern multi-modal and mixed-modal Transformer architectures with LibTorch in C++, without Python dependencies.

Overview

A C++ library for Transformer-based LLMs, featuring multi-modal and mixed-modal support, optimized attention, and hybrid CPU/GPU inference for high-speed, low-memory deployment.

Key Features

1. Manual LibTorch implementations of Llama, Llava, Mistral, DeepseekV3, LlavaNext, and CLIPVision
2. Optimized attention with Multi-Query Attention (MQA) and Grouped-Query Attention (GQA)
3. Dynamic Key-Value (KV) cache support
4. Multi-modal and mixed-modal hybrid CPU and GPU inference
5. Flexible tokenization: BPE, SentencePiece, and HuggingFace-compatible tokenizers
6. Vision Transformer support using OpenCV

Technologies

C++ · LibTorch · Transformers · Attention · LLMs · Tokenizers · Inference