9 Local LLM Models for Raspberry Pi

13 Min Read
image - 9 Local LLM Models for Raspberry Pi

This article presents a concise overview of the top LLM (Large Language Model) choices that are optimized to run efficiently on Raspberry Pi. In recent times, advancements in quantization and model optimization have made it possible to harness the power of language models on small hardware like the Raspberry Pi.

We will explore five carefully selected LLM models that are not only compatible with Raspberry Pi’s 4GB RAM variant but also offer impressive capabilities despite hardware limitations. Each model will be accompanied by a brief description highlighting its unique features, dataset training, and quantization sizes.

Related: Best SBC for Running LLM

Note: I am not guaranteeing that it will work on your RPI, but I know from people’s experiences which kind of model with what kind of quantization might work on RPI. So, I collected the smallest but most capable ones, but it’s always worth a try.

By running LLMs on Raspberry Pi, users can unlock a world of possibilities. They can create truly intelligent assistants with advanced natural language understanding, personalized chatbots for specific applications, engaging story-telling experiences, language translation tools, and even AI-powered educational aids. With the LLM’s enhanced reasoning and comprehension abilities, the potential for innovative and impactful applications is limitless.


Phi-2-Orange is a two-step fine-tuned model of Phi-2. The first step of fine-tuning uses a broad range of training data, including Open-Orca/SlimOrca-Dedup, migtissera/Synthia-v1.3, LDJnr/Verified-Camel, LDJnr/Pure-Dove, LDJnr/Capybara, and meta-math/MetaMathQA. The second step involves a DPO fine-tune using Intel/orca_dpo_pairs and argilla/ultrafeedback-binarized-preferences-cleaned.

Phi-2-Orange uses ChatML as the prompt format, with or without the system instruction. It has shown impressive performance in evaluations done using mlabonne’s useful Colab notebook llm-autoeval.

This model fits well into the article’s topic as it is a state-of-the-art language model that can operate efficiently on 4GB RAM. Its unique two-step fine-tuning process and use of diverse training data make it a compelling choice for those interested in exploring the capabilities of compact yet powerful language models. The detailed description provided here aims to help readers understand the model’s features and capabilities, aiding their search for the best SOTA LLM for 4GB VRAM.

Dolphin 2.6 Phi-2

Based on the Phi-2 architectureRaspberry Pi or any single-board computer (SBC)
Ability to generate text that is highly compliant to any requestsAt least 6 to 8 GB of RAM

The Dolphin 2.6 Phi-2 is a highly advanced language model developed by Eric Hartford and Fernando Fernandes. This model is based on the Phi-2 architecture and has been sponsored by Convai. It has undergone significant improvements in its latest 2.6 version, including a fix to a training configuration issue and the reintroduction of samantha-based empathy data.

One of the key features of this model is its ability to generate text that is highly compliant to any requests, even unethical ones. However, it’s important to note that the model is uncensored and users are advised to implement their own alignment layer before exposing the model as a service.

The Dolphin 2.6 Phi-2 model is a great generalist model, capable of providing a wealth of human history knowledge without any internet connection. It’s like having the entire human history in your Raspberry Pi.

However, when running this model, it’s crucial to ensure that your Raspberry Pi or any single-board computer (SBC) should have at least 6 to 8 GB of RAM. This is because the Dolphin 2.6 Phi-2 model is resource-intensive and requires a substantial amount of memory to function optimally.

Phi-2 (Others based on This model)

Quantization RequiredYes, requires Q5 or Q6
Output SizeVariable, depends on the application
LatencyLow, optimized for edge devices
CompatibilityRaspberry Pi 4B and later versions

With its compact size and impressive NLP capabilities, Phi-2 is an excellent choice for running natural language processing tasks on the Raspberry Pi. By utilizing Q5 or Q6 quantization, users can optimize the model’s memory footprint without sacrificing performance, resulting in fast and accurate predictions even on constrained hardware. Whether you’re building chatbots, analyzing social media feeds, or extracting insights from unstructured text data, Phi-2 delivers top-notch results at minimal cost.


Key FeaturesFit For
1. 3 billion parameters and 8k context– Various NLP tasks
2. Trained on 627 billion token dataset– Running on resource-constrained devices
3. Quantizable to 4 bits for efficiency– Language translation
4. Apache 2.0 license for commercial use– Text generation and more

BTLM-3B-8K (Bittensor Language Model) is a powerful language model boasting 3 billion parameters and an 8k context length, making it suitable for various natural language processing tasks. Trained on an extensive dataset of 627 billion tokens from SlimPajama, BTLM-3B-8K sets a new standard for 3 billion parameter models, remarkably outperforming models trained on significantly larger token datasets, even rivaling the performance of open 7 billion parameter models.

One of the most impressive features of BTLM-3B-8K is its ability to be quantized to just 4 bits, enabling it to run on resource-constrained devices with as little as 3GB of memory. This optimization makes it a perfect candidate for deployment on small hardware platforms like the Raspberry Pi with 4GB RAM, bringing advanced language processing capabilities to edge devices.

Notably, BTLM-3B-8K comes with an Apache 2.0 license, allowing for commercial use and promoting its integration into various applications, from chatbots and language translation to text generation and more. While running the model on Raspberry Pi may not yield the fastest performance, its adaptability and efficiency are remarkable breakthroughs for making complex language models accessible on lightweight hardware.

TinyLlama-1.1B-Chat-v1.0 (Others based on it)

Model NameTinyLlama-1.1B-Chat-v1.0
PlatformRaspberry Pi (Q5 K or Q6 K quantization)
ParametersOver 1 billion
Fine-tuningConversational AI tasks
Text ResponsesHuman-like
Use CasesChatbots, education, voice assistants

The TinyLlama-1.1B-Chat-v1.0 is an effective language model for conversational AI tasks on Raspberry Pi, offering human-like text responses despite having only 1 billion parameters. It runs efficiently with Q5 or Q6 quantization. Although not exceptional, personal experience shows it’s useful for many NLP tasks such as text generation and chatbots.


Key FeaturesFit For
1. 1.3B parameter Transformer-based LLM– Generating text
2. State-of-the-art performance in NLP– Translating languages
3. Creative content generation– Writing Python code
4. Informative question answering– Writing poems, stories, emails, etc.
5. Under development with potential– Varied tasks and creative content

A 1.3B parameter Transformer-based LLM trained on synthetic NLP data, demonstrating state-of-the-art performance in common sense, language understanding, and logical reasoning. It can be used to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

Here are some of the things that phi-1.5 can do:

  • Write poems, draft emails, create stories, and summarize texts
  • Write Python code
  • Translate languages
  • Answer questions in an informative way

phi-1.5 is still under development, but it has the potential to be a powerful tool for a variety of tasks.


Key FeaturesFit For
1. Usability for conversation and storytelling– Guiding with natural language
2. Diverse fine-tuning for versatility– Roleplay, storywriting, conversation
3. Efficient model size (around 3GB)– Consumer hardware like Raspberry Pi

Metharme 1.3B is a cutting-edge LLM model, leveraging the foundations of EleutherAI’s Pythia 1.4B Deduped. Designed with a focus on enhanced usability for conversation, roleplaying, and storywriting, Metharme offers an exceptional feature – natural language guidance. This means users can effortlessly guide the model through various tasks using ordinary language, similar to other instruct models.

Incorporating a unique training approach, Metharme underwent supervised fine-tuning, encompassing a diverse dataset that includes regular instructions, roleplay scenarios, fictional narratives, and synthetically generated conversations. This diverse training regimen results in a well-rounded, versatile LLM capable of handling various creative tasks.

Weighing in at approximately 3GB, Metharme’s efficient model size makes it compatible with consumer hardware, like the Raspberry Pi (4GB RAM recommended). Although its performance might not be lightning-fast, the model’s ability to run on such modest hardware is a testament to the advancements in quantization and model optimization.


Expert ModelsCoding, Conversation, Reasoning, Generalist
ArchitectureMixture of Experts (MoE)
Parameter EfficiencyNot all parameters are used when running, resulting in better performance
Hardware RequirementRaspberry Pi or any Single Board Computer (SBC) with at least 8GB RAM
TasksProgramming, Dialogues, Story Writing, and more

The phixtral-4x2_8 is a remarkable model for your Raspberry Pi. It’s the first Mixture of Experts (MoE) made with four microsoft/phi-2 models. This model is inspired by the Mixtral-8x7B-v0.1 architecture and outperforms each individual expert.

The model is composed of four expert models, each specializing in a different area: coding, conversation, reasoning, and a generalist. This unique composition allows the model to handle a wide range of tasks with high proficiency.

One of the key features of this model is its efficient use of parameters. Because it’s based on the MoE architecture, not all parameters are used when running the model. This results in better performance.

However, it’s important to note that when running this model, your Raspberry Pi or any Single Board Computer (SBC) should have at least 8GB of RAM.


The NeuralBeagle14-7B is a powerful 7B open-source AI model that has quickly climbed the ranks to become a top contender among large language models. This advanced AI model is making waves with its 7 billion parameters.

NeuralBeagle14-7B is not just any model; it’s a hybrid, created by combining the best features of two existing models, Beagle and Mar Coro. This fusion has been further enhanced by a unique technique called the Lazy Merge Kit.

NeuralBeagle14-7B is a top performer in its category and has been tested on different platforms, showing it can do a lot of different tasks. This makes it a great choice for many applications.

But remember, this model needs a lot of memory to work. So, if you’re using a Raspberry Pi or a similar device, make sure it has at least 8GB RAM.

Share This Article
SK is a versatile writer deeply passionate about anime, evolution, storytelling, art, AI, game development, and VFX. His writings transcend genres, exploring these interests and more. Dive into his captivating world of words and explore the depths of his creative universe.