All LLM Models

Browse five LLMs with VRAM requirements, quantization options, and hardware compatibility notes.

Understanding LLM VRAM Requirements

How much VRAM you need depends on model size and quantization level. Quantization stores model weights at reduced numeric precision, trading a small quality loss for significantly lower VRAM usage. For example, a 7B-parameter model needs ~14 GB at FP16 (2 bytes per weight) but only ~4 GB at Q4_K_M quantization (roughly 4.5 bits per weight).
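The rule of thumb above can be sketched as a small calculation. This is a rough weights-only estimate (it ignores KV cache and runtime overhead), and the bits-per-weight figures for the quantized formats are approximations, not exact spec values:

```python
# Approximate bits per weight; the quantized values are assumptions based on
# common GGUF formats, not exact numbers.
BITS_PER_WEIGHT = {"FP16": 16, "Q8_0": 8.5, "Q4_K_M": 4.5}

def weight_size_gb(params_billion: float, quant: str) -> float:
    """Approximate VRAM needed for the weights alone, in GB.

    1 billion params at 1 byte each is ~1 GB, so the math reduces to
    params (billions) * bits / 8.
    """
    return params_billion * BITS_PER_WEIGHT[quant] / 8

print(round(weight_size_gb(7, "FP16"), 1))    # ~14.0 GB
print(round(weight_size_gb(7, "Q4_K_M"), 1))  # ~3.9 GB
```

In practice, add a couple of GB on top of these figures for the KV cache and activations, scaling with context length.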

Model List

Mixtral 8x7B Instruct v0.1

Mistral AI · 46.7B

693.5K downloads · 4.6K likes

Mixtral 8x7B Instruct v0.1 is Mistral AI's flagship Mixture-of-Experts model, combining eight expert networks of 7 billion parameters each for 46.7B total parameters while activating only about 12.9 billion per token. This sparse architecture delivers performance that rivals much larger dense models at a fraction of the inference cost, excelling at reasoning, code generation, and multilingual tasks. Because the full weights must still be loaded into memory, you will need around 24–48 GB of VRAM depending on quantization level, making it best suited to multi-GPU desktop setups or high-VRAM workstation cards. If your hardware can accommodate it, Mixtral offers one of the best performance-per-active-parameter ratios available for local deployment.
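The key point with sparse MoE models is that activating ~12.9B parameters per token speeds up inference but does not shrink the memory footprint: all 46.7B weights stay resident. A quick sketch of the arithmetic, using assumed bits-per-weight values for common quantization formats:

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone; the runtime adds KV cache on top."""
    return params_billion * bits_per_weight / 8

# All 46.7B parameters must fit in memory even though only ~12.9B fire per token.
# Bits-per-weight figures for the quantized formats are assumptions.
for name, bpw in [("Q4_K_M", 4.5), ("Q8_0", 8.5), ("FP16", 16)]:
    print(f"{name}: ~{weight_size_gb(46.7, bpw):.1f} GB")
# Q4_K_M: ~26.3 GB
# Q8_0:   ~49.6 GB
# FP16:   ~93.4 GB
```

This is where the 24–48 GB guidance comes from: 4-bit quantization lands near the low end of the range, while 8-bit sits at the high end.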


Mistral 7B v0.1

Mistral AI · 7B

539.9K downloads · 4.1K likes

Mistral 7B v0.1 is the original base model from Mistral AI that helped reshape expectations for small open-weight language models when it launched in late 2023. As a pretrained foundation model without instruction tuning, it is designed for fine-tuning, research, and custom downstream tasks rather than direct conversational use. With 7 billion parameters and support for grouped-query attention and sliding-window attention, it remains a popular starting point for practitioners building specialized models. Its modest VRAM requirements of roughly 6 GB at 4-bit quantization keep it accessible on a wide range of consumer GPUs.


Mistral 7B Instruct v0.3

Mistral AI · 7.2B

1.8M downloads · 2.5K likes

Mistral 7B Instruct v0.3 is the latest instruction-tuned release of Mistral AI's original 7-billion-parameter model, delivering meaningful improvements in instruction following, function calling, and multilingual support over its predecessors. With an extended 32K-token vocabulary and refined chat capabilities, v0.3 remains one of the most capable sub-10B models available. At 7.2 billion parameters it sits comfortably in the sweet spot for local inference, running well on GPUs with 6–8 GB of VRAM at 8-bit quantization and even on 4 GB cards at 4-bit. It is an excellent default choice for anyone getting started with local LLMs who wants strong conversational performance without heavy hardware.


Mistral 7B Instruct v0.1

Mistral AI · 7B

448.7K downloads · 1.8K likes

Mistral 7B Instruct v0.1 was the first instruction-tuned variant of the original Mistral 7B, fine-tuned for conversational and instruction-following tasks. While it has since been superseded by v0.2 and v0.3, it remains a solid lightweight chat model and an important milestone in the open-weight model ecosystem. Its hardware requirements are identical to the base Mistral 7B, running smoothly on GPUs with as little as 6 GB of VRAM when quantized. Users seeking the best Mistral 7B experience should generally prefer the newer v0.3 release, but v0.1 is still useful for reproducibility and benchmarking purposes.


Mistral Small 24B Instruct 2501

Mistral AI · 24B

178.1K downloads · 955 likes

Mistral Small 24B Instruct is Mistral AI's January 2025 release targeting the mid-range parameter sweet spot. At 24 billion parameters it sits between lightweight 7B models and heavier 70B-class offerings, delivering strong instruction-following, reasoning, and coding performance without demanding top-tier hardware. The model fits comfortably on a single GPU with 16–24 GB of VRAM at common quantization levels, making it an attractive option for users with cards like the RTX 4090 or RTX 3090 who want a noticeable step up from 7B models. It strikes an appealing balance between quality and resource requirements for serious local use.
