The performance analytics of Colossal-AI demonstrate that our software is the fastest and most cost-efficient solution for your deep learning infrastructure needs.

Top performance results in a nutshell

  • faster training time
  • inference acceleration
  • larger batch sizes
  • lower GPU memory consumption
  • larger model size on the same hardware
  • longer sequence length
  • saved GPU resources
  • fine-tuning speedup
  • suffices for model development



PyTorch is a machine learning framework for Python.
  • 120x larger model sizes with Colossal-AI

Microsoft DeepSpeed is a deep learning optimization library.
  • 3x higher throughput with Colossal-AI

NVIDIA NeMo Megatron is a framework to build and deploy LLMs.
  • 5x faster training with Colossal-AI


ViT model

The Vision Transformer, or ViT, is a model for image classification that employs a Transformer-like architecture over patches of the image. ViTs are being adopted in a wide range of computer vision tasks, from image classification to object detection and segmentation.

Colossal-AI vs. Megatron

  • Achieve 14x larger batch sizes with Colossal-AI
  • and 5x faster training for ViT
[Chart: Scaling ViT with GPU RAM & Throughput]

GPT-3 model

Trained on text from the internet, GPT-3 generates realistic human-like text. It has been used to produce articles, poetry, stories, news reports, and dialogue from only a small amount of input text.

Colossal-AI vs. Megatron

  • You can save 50% of your GPU resources with Colossal-AI
  • and achieve a 10.7% acceleration
[Chart: Performance on GPT-3]
[Chart: Colossal-AI for GPT-3]

GPT-2 model

GPT-2, short for "Generative Pretrained Transformer 2", is an unsupervised, transformer-based deep learning language model created by OpenAI in February 2019 for the single purpose of predicting the next word(s) in a sentence.

Colossal-AI vs. Megatron

  • You benefit from 11x lower GPU memory consumption with Colossal-AI
  • and superlinear scaling efficiency due to its tensor parallelism
[Chart: Scaling GPT-2 with TFLOPs]
[Chart: Scaling GPT-2 with GPU RAM]
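Tensor parallelism, credited above for the GPT-2 memory savings, can be sketched conceptually: a layer's weight matrix is split column-wise across devices, so each GPU stores and multiplies only a fraction of the parameters. The NumPy sketch below illustrates the idea only; it is not Colossal-AI's actual API.

```python
import numpy as np

# Conceptual sketch of tensor parallelism (not Colossal-AI's implementation):
# a linear layer's weight matrix is split column-wise across N "devices",
# so each device stores and multiplies only 1/N of the parameters.

def column_parallel_matmul(x, weight, num_devices):
    """Split `weight` into column shards, multiply each shard separately,
    then concatenate the partial outputs -- equivalent to x @ weight."""
    shards = np.split(weight, num_devices, axis=1)   # one shard per device
    partials = [x @ shard for shard in shards]       # runs concurrently on real hardware
    return np.concatenate(partials, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of activations
w = rng.standard_normal((8, 16))   # full weight matrix

full = x @ w
parallel = column_parallel_matmul(x, w, num_devices=4)
assert np.allclose(full, parallel)

# Each device holds w.size / 4 parameters, which is why per-GPU memory
# drops roughly in proportion to the tensor-parallel degree.
print(w.size, w.size // 4)  # 128 32
```

On real hardware the shards live on different GPUs and the concatenation is a collective communication step, but the arithmetic equivalence is exactly the one asserted above.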

Colossal-AI vs. PyTorch

  • Scale to a 24x larger model size on the same hardware
[Chart: Scaling GPT-2 with Model Size]

Colossal-AI vs. DeepSpeed

  • Benefit from a 3x speedup on the same computing devices
[Chart: Scaling GPT-2 with Throughput]

BERT model

BERT is an open-source, transformer-based machine learning model for natural language processing (NLP). BERT is designed to help computers understand the meaning of ambiguous language in text by using surrounding text to establish context.

Colossal-AI vs. Megatron

  • Colossal-AI propels you to 2x faster training
  • or enables you to run a 50% longer sequence length
[Chart: Scaling BERT with Max Batch Size]
[Chart: Scaling BERT with Pipeline Parallelism]
[Chart: Scaling BERT with Sequence Length]
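Pipeline parallelism, one of the techniques behind these BERT results, splits a model's layers into stages on different devices and streams microbatches through them. Its efficiency hinges on keeping the pipeline "bubble" small. A minimal sketch of the standard GPipe-style bubble formula (a general property of pipeline schedules, not a Colossal-AI-specific API):

```python
# With S pipeline stages and M microbatches, a GPipe-style schedule leaves
# each stage idle for S - 1 of its M + S - 1 time slots -- the "bubble".

def pipeline_bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Fraction of time a stage sits idle in a GPipe-style schedule."""
    return (num_stages - 1) / (num_microbatches + num_stages - 1)

# With few microbatches most of the pipeline sits idle...
print(f"{pipeline_bubble_fraction(4, 1):.0%}")   # 75%
# ...but with many microbatches utilization approaches 100%.
print(f"{pipeline_bubble_fraction(4, 16):.0%}")  # 16%
```

This is why pipeline-parallel training scales with batch size: more microbatches per step shrink the idle fraction without adding parameters to any single GPU.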

GPT-3 model

Colossal-AI vs. NVIDIA FastTransformer

  • Colossal-AI enables you to achieve 50% inference acceleration on the same hardware infrastructure
[Chart: GPT-3 Inference, Padding = 128, TP = 2]
[Chart: GPT-3 Inference, Padding = 128, TP = 4]

OPT model

Meta AI has introduced OPT (Open Pre-trained Transformers), a large language model with billions of parameters. It can be used to generate creative text, solve simple math problems, and answer reading comprehension questions.

Colossal-AI vs. DeepSpeed

  • With Colossal-AI, a 45% speedup when fine-tuning OPT is possible
  • at a low cost in lines of code
[Chart: 1 GPU Performance on OPT]
[Chart: 8 GPU Performance on OPT]

Don't wait, accelerate!

Speed up and scale deep learning with Colossal-AI.

Try open source