Live Webinar: June 25th

Join our Builder's Roundtable to learn all about fine-tuning LLMs

Efficient GenAI inference

Build and scale production applications on the latest optimized models and fine tunes

Try APIs Free

Self-hosted Demo

Text Gen API

LLMs for chat, summarization, and structured output

Media Gen API

Diffusion models for stunning image and video

OctoStack

Turnkey GenAI stack in your environment

Innovators Choose OctoAI

“Working with the OctoAI team, we were able to quickly evaluate the new model, validate its performance through our proof of concept phase, and move the model to production. Mixtral on OctoAI serves a majority of the inferences and end player experiences on AI Dungeon today.”

Read story

Nick Walton

CEO & Co-Founder Latitude

GenAI production stack: SaaS or in your environment

OctoAI is built on systems and compilation technologies we’ve pioneered, such as XGBoost, TVM, and MLC, giving you an enterprise system that runs in our SaaS or in your private environment.

Diagram of OctoAI GenAI systems stack showing OctoAI's solutions, models, and AI serving stack powered by broad hardware

Enterprise-grade inference

Achieve AI Independence

Free yourself from any single model, model provider, cloud, or hardware setup.

Optimize Performance & Cost

Run GenAI inference at the lowest price and latency on our optimized serving layer.

Future Proof Applications

Rapidly iterate with new models and infrastructure without rearchitecting anything.

Customize Freely

Mix and match models, fine tunes, and AI assets at the model serving layer.

SOC 2 Type II certified

Your data security and privacy are a top priority for OctoAI. We continually invest in the security capabilities and practices of our platform and processes.

Learn more

OctoAI is SOC 2 Type II certified as of fall 2023

New Solution

OctoStack from OctoAI: GenAI in your environment

OctoStack is a turnkey GenAI serving stack to run your optimized models in your environment on your GPUs. Lower your total cost of ownership and deploy models with greater agility while ensuring data privacy.

Learn more

Overview diagram of how OctoStack by OctoAI would work in your infrastructure environment

What’s New at OctoAI

Customer & Product Updates

Fine-tuned Mistral 7B delivers over 60x lower costs and comparable quality to GPT 4

May 9, 2024

7 minutes

Visit the blog

Latest Models

Hermes 2 Pro Llama 3

The first fine-tune from Nous Research, which uses an updated version of the OpenHermes 2.5 dataset. This model is great for conversational and reasoning tasks in your AI apps. Function calling support for this model is coming soon!

Chat

Coding

Experimental

Llama 3 Instruct

The most recent release from Meta. This model is instruction-tuned for chat and optimized for helpfulness and safety. It outperforms many open-source chat models on common benchmarks.

Chat

Coding

Mixtral-8x22B Instruct

Strong mathematics and coding capabilities, with a 64K-token context window that allows precise information recall from large documents. It can be used for chat, question answering, and other instruction-based tasks. Fluent in English, French, Italian, German, and Spanish.

Chat

Coding

Mixtral 8x22B fine-tuned

Over the coming weeks we will feature the newest and strongest fine-tunes from the community. Check back often to see which fine-tune is available for testing. After testing several fine-tuned versions of this model, we will select the top performer to host persistently on OctoAI.

Chat

Experimental

See all models

Customer & Product Updates

A Framework for Selecting the Right LLM

Jun 11, 2024

4 minutes

GitView launches AI code review analysis for engineering teams using OctoAI

Jun 4, 2024

2 minutes

30 Days of Llama 3: Newest Member of the Herd is Living up to the Hype

May 17, 2024

3 minutes

Demos & Webinars

Selecting the right GenAI model for production

In this on-demand webinar, our engineers walk through the steps of model evaluation and testing, when to use checkpoints vs. LoRAs, and how to get the best results.

How to Bring GenAI to your Datastore

Watch and learn how our engineering experts build data workflows on your Snowflake data with OctoStack, leverage RAG, and integrate GenAI into your data pipeline.

Webinar

On-demand

OctoStack

OctoStack: GenAI in your Environment

Watch and learn how OctoStack delivers a turnkey GenAI stack in your environment to run your models next to your data — privately and securely.

Webinar

On-demand

OctoStack

Build a GenAI video generation pipeline

Learn how to generate great-looking one-minute videos with open-source GenAI models on OctoAI, all for under $3.

LLM

Text-to-image

Image to Video

View all demos & webinars

Your choice of models and fine tunes

Start building in minutes. Gain the freedom to run any model or checkpoint on our efficient API endpoints.

Try APIs Free

Self-hosted Demo

octoai asset create \
  --name checkpoint-panda \
  --upload-from-hf-repo NeuralNovel/Panda-7B-v0.1 \
  --engine text/mistral-7b-instruct \
  --data-type fp16 \
  --format safetensors \
  --type checkpoint \
  --transfer-api sts
