All LLM Models

Browse five LLMs with VRAM requirements, quantization options, and hardware compatibility notes.

Understanding LLM VRAM Requirements

How much VRAM you need depends on model size and quantization level. Quantization stores model weights at reduced numeric precision, trading a small quality loss for significantly lower VRAM usage. For example, a 7B-parameter model needs ~14 GB at FP16 (2 bytes per weight) but only ~4 GB at Q4_K_M quantization (roughly 4.5 bits per weight).
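The rule of thumb above can be sketched as a small calculation. This is a rough weights-only estimate (it ignores KV cache and runtime overhead), and the bits-per-weight figures for the quantized formats are approximations, not exact spec values:

```python
# Approximate bits per weight; the quantized values are assumptions based on
# common GGUF formats, not exact numbers.
BITS_PER_WEIGHT = {"FP16": 16, "Q8_0": 8.5, "Q4_K_M": 4.5}

def weight_size_gb(params_billion: float, quant: str) -> float:
    """Approximate VRAM needed for the weights alone, in GB.

    1 billion params at 1 byte each is ~1 GB, so the math reduces to
    params (billions) * bits / 8.
    """
    return params_billion * BITS_PER_WEIGHT[quant] / 8

print(round(weight_size_gb(7, "FP16"), 1))    # ~14.0 GB
print(round(weight_size_gb(7, "Q4_K_M"), 1))  # ~3.9 GB
```

In practice, add a couple of GB on top of these figures for the KV cache and activations, scaling with context length.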

Model List

Mixtral 8x7B Instruct v0.1

Mistral AI · 46.7B

693.5K downloads · 4.6K likes

Mixtral 8x7B Instruct v0.1 is Mistral AI's flagship Mixture-of-Experts model, combining eight expert networks of 7 billion parameters each for 46.7B total parameters while activating only about 12.9 billion per token. This sparse architecture delivers performance that rivals much larger dense models at a fraction of the inference cost, excelling at reasoning, code generation, and multilingual tasks. Because the full weights must still be loaded into memory, you will need around 24–48 GB of VRAM depending on quantization level, making it best suited to multi-GPU desktop setups or high-VRAM workstation cards. If your hardware can accommodate it, Mixtral offers one of the best performance-per-active-parameter ratios available for local deployment.
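The key point with sparse MoE models is that activating ~12.9B parameters per token speeds up inference but does not shrink the memory footprint: all 46.7B weights stay resident. A quick sketch of the arithmetic, using assumed bits-per-weight values for common quantization formats:

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone; the runtime adds KV cache on top."""
    return params_billion * bits_per_weight / 8

# All 46.7B parameters must fit in memory even though only ~12.9B fire per token.
# Bits-per-weight figures for the quantized formats are assumptions.
for name, bpw in [("Q4_K_M", 4.5), ("Q8_0", 8.5), ("FP16", 16)]:
    print(f"{name}: ~{weight_size_gb(46.7, bpw):.1f} GB")
# Q4_K_M: ~26.3 GB
# Q8_0:   ~49.6 GB
# FP16:   ~93.4 GB
```

This is where the 24–48 GB guidance comes from: 4-bit quantization lands near the low end of the range, while 8-bit sits at the high end.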


Mistral 7B v0.1

Mistral AI · 7B

539.9K downloads · 4.1K likes

Mistral 7B v0.1 is the original base model from Mistral AI that helped reshape expectations for small open-weight language models when it launched in late 2023. As a pretrained foundation model without instruction tuning, it is designed for fine-tuning, research, and custom downstream tasks rather than direct conversational use. With 7 billion parameters and support for grouped-query attention and sliding-window attention, it remains a popular starting point for practitioners building specialized models. Its modest VRAM requirements of roughly 6 GB at 4-bit quantization keep it accessible on a wide range of consumer GPUs.


Mistral 7B Instruct v0.3

Mistral AI · 7.2B

1.8M downloads · 2.5K likes

Mistral 7B Instruct v0.3 is the latest instruction-tuned release of Mistral AI's original 7-billion-parameter model, delivering meaningful improvements in instruction following, function calling, and multilingual support over its predecessors. With an extended 32K-token vocabulary and refined chat capabilities, v0.3 remains one of the most capable sub-10B models available. At 7.2 billion parameters it sits comfortably in the sweet spot for local inference, running well on GPUs with 6–8 GB of VRAM at 8-bit quantization and even on 4 GB cards at 4-bit. It is an excellent default choice for anyone getting started with local LLMs who wants strong conversational performance without heavy hardware.


Mistral 7B Instruct v0.1

Mistral AI · 7B

448.7K downloads · 1.8K likes

Mistral 7B Instruct v0.1 was the first instruction-tuned variant of the original Mistral 7B, fine-tuned for conversational and instruction-following tasks. While it has since been superseded by v0.2 and v0.3, it remains a solid lightweight chat model and an important milestone in the open-weight model ecosystem. Its hardware requirements are identical to the base Mistral 7B, running smoothly on GPUs with as little as 6 GB of VRAM when quantized. Users seeking the best Mistral 7B experience should generally prefer the newer v0.3 release, but v0.1 is still useful for reproducibility and benchmarking purposes.


Mistral Small 24B Instruct 2501

Mistral AI · 24B

178.1K downloads · 955 likes

Mistral Small 24B Instruct is Mistral AI's January 2025 release targeting the mid-range parameter sweet spot. At 24 billion parameters it sits between lightweight 7B models and heavier 70B-class offerings, delivering strong instruction-following, reasoning, and coding performance without demanding top-tier hardware. The model fits comfortably on a single GPU with 16–24 GB of VRAM at common quantization levels, making it an attractive option for users with cards like the RTX 4090 or RTX 3090 who want a noticeable step up from 7B models. It strikes an appealing balance between quality and resource requirements for serious local use.
