9 Local LLM Models For Raspberry Pi

This article presents a concise overview of the top LLM (Large Language Model) choices that are optimized to run efficiently on Raspberry Pi. In recent times, advancements in quantization and model optimization have made it possible to harness the power of language models on small hardware like the Raspberry Pi.

Contents

We will explore five carefully selected LLM models that are not only compatible with Raspberry Pi’s 4GB RAM variant but also offer impressive capabilities despite hardware limitations. Each model will be accompanied by a brief description highlighting its unique features, dataset training, and quantization sizes.

Related: Best SBC for Running LLM

Note: I am not guaranteeing that it will work on your RPI, but I know from people’s experiences which kind of model with what kind of quantization might work on RPI. So, I collected the smallest but most capable ones, but it’s always worth a try.

By running LLMs on Raspberry Pi, users can unlock a world of possibilities. They can create truly intelligent assistants with advanced natural language understanding, personalized chatbots for specific applications, engaging story-telling experiences, language translation tools, and even AI-powered educational aids. With the LLM’s enhanced reasoning and comprehension abilities, the potential for innovative and impactful applications is limitless.

Phi-2-Orange

Phi-2-Orange is a two-step fine-tuned model of Phi-2. The first step of fine-tuning uses a broad range of training data, including Open-Orca/SlimOrca-Dedup, migtissera/Synthia-v1.3, LDJnr/Verified-Camel, LDJnr/Pure-Dove, LDJnr/Capybara, and meta-math/MetaMathQA. The second step involves a DPO fine-tune using Intel/orca_dpo_pairs and argilla/ultrafeedback-binarized-preferences-cleaned.

Phi-2-Orange uses ChatML as the prompt format, with or without the system instruction. It has shown impressive performance in evaluations done using mlabonne’s useful Colab notebook llm-autoeval.

This model fits well into the article’s topic as it is a state-of-the-art language model that can operate efficiently on 4GB RAM. Its unique two-step fine-tuning process and use of diverse training data make it a compelling choice for those interested in exploring the capabilities of compact yet powerful language models. The detailed description provided here aims to help readers understand the model’s features and capabilities, aiding their search for the best SOTA LLM for 4GB VRAM.

Check

Dolphin 2.6 Phi-2

Features	Requirements
Based on the Phi-2 architecture	Raspberry Pi or any single-board computer (SBC)
Ability to generate text that is highly compliant to any requests	At least 6 to 8 GB of RAM

The Dolphin 2.6 Phi-2 is a highly advanced language model developed by Eric Hartford and Fernando Fernandes. This model is based on the Phi-2 architecture and has been sponsored by Convai. It has undergone significant improvements in its latest 2.6 version, including a fix to a training configuration issue and the reintroduction of samantha-based empathy data.

One of the key features of this model is its ability to generate text that is highly compliant to any requests, even unethical ones. However, it’s important to note that the model is uncensored and users are advised to implement their own alignment layer before exposing the model as a service.

The Dolphin 2.6 Phi-2 model is a great generalist model, capable of providing a wealth of human history knowledge without any internet connection. It’s like having the entire human history in your Raspberry Pi.

However, when running this model, it’s crucial to ensure that your Raspberry Pi or any single-board computer (SBC) should have at least 6 to 8 GB of RAM. This is because the Dolphin 2.6 Phi-2 model is resource-intensive and requires a substantial amount of memory to function optimally.

Check Out

Phi-2 (Others based on This model)

Specification	Description
Quantization Required	Yes, requires Q5 or Q6
Output Size	Variable, depends on the application
Latency	Low, optimized for edge devices
Memory	8GB LPDDR4x SDRAM
Compatibility	Raspberry Pi 4B and later versions

With its compact size and impressive NLP capabilities, Phi-2 is an excellent choice for running natural language processing tasks on the Raspberry Pi. By utilizing Q5 or Q6 quantization, users can optimize the model’s memory footprint without sacrificing performance, resulting in fast and accurate predictions even on constrained hardware. Whether you’re building chatbots, analyzing social media feeds, or extracting insights from unstructured text data, Phi-2 delivers top-notch results at minimal cost.

Check Out

BTLM-3B-8K

Key Features	Fit For
1. 3 billion parameters and 8k context	– Various NLP tasks
2. Trained on 627 billion token dataset	– Running on resource-constrained devices
3. Quantizable to 4 bits for efficiency	– Language translation
4. Apache 2.0 license for commercial use	– Text generation and more

BTLM-3B-8K (Bittensor Language Model) is a powerful language model boasting 3 billion parameters and an 8k context length, making it suitable for various natural language processing tasks. Trained on an extensive dataset of 627 billion tokens from SlimPajama, BTLM-3B-8K sets a new standard for 3 billion parameter models, remarkably outperforming models trained on significantly larger token datasets, even rivaling the performance of open 7 billion parameter models.

One of the most impressive features of BTLM-3B-8K is its ability to be quantized to just 4 bits, enabling it to run on resource-constrained devices with as little as 3GB of memory. This optimization makes it a perfect candidate for deployment on small hardware platforms like the Raspberry Pi with 4GB RAM, bringing advanced language processing capabilities to edge devices.

Notably, BTLM-3B-8K comes with an Apache 2.0 license, allowing for commercial use and promoting its integration into various applications, from chatbots and language translation to text generation and more. While running the model on Raspberry Pi may not yield the fastest performance, its adaptability and efficiency are remarkable breakthroughs for making complex language models accessible on lightweight hardware.

Check Out

TinyLlama-1.1B-Chat-v1.0 (Others based on it)

Specification	Description
Model Name	TinyLlama-1.1B-Chat-v1.0
Platform	Raspberry Pi (Q5 K or Q6 K quantization)
Parameters	Over 1 billion
Fine-tuning	Conversational AI tasks
Text Responses	Human-like
Use Cases	Chatbots, education, voice assistants

The TinyLlama-1.1B-Chat-v1.0 is an effective language model for conversational AI tasks on Raspberry Pi, offering human-like text responses despite having only 1 billion parameters. It runs efficiently with Q5 or Q6 quantization. Although not exceptional, personal experience shows it’s useful for many NLP tasks such as text generation and chatbots.

Check Out

phi-1.5

Key Features	Fit For
1. 1.3B parameter Transformer-based LLM	– Generating text
2. State-of-the-art performance in NLP	– Translating languages
3. Creative content generation	– Writing Python code
4. Informative question answering	– Writing poems, stories, emails, etc.
5. Under development with potential	– Varied tasks and creative content

A 1.3B parameter Transformer-based LLM trained on synthetic NLP data, demonstrating state-of-the-art performance in common sense, language understanding, and logical reasoning. It can be used to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

Here are some of the things that phi-1.5 can do:

Write poems, draft emails, create stories, and summarize texts
Write Python code

Translate languages
Answer questions in an informative way

phi-1.5 is still under development, but it has the potential to be a powerful tool for a variety of tasks.

Check Out

Metharme_1.3B

Key Features	Fit For
1. Usability for conversation and storytelling	– Guiding with natural language
2. Diverse fine-tuning for versatility	– Roleplay, storywriting, conversation
3. Efficient model size (around 3GB)	– Consumer hardware like Raspberry Pi

Metharme 1.3B is a cutting-edge LLM model, leveraging the foundations of EleutherAI’s Pythia 1.4B Deduped. Designed with a focus on enhanced usability for conversation, roleplaying, and storywriting, Metharme offers an exceptional feature – natural language guidance. This means users can effortlessly guide the model through various tasks using ordinary language, similar to other instruct models.

Incorporating a unique training approach, Metharme underwent supervised fine-tuning, encompassing a diverse dataset that includes regular instructions, roleplay scenarios, fictional narratives, and synthetically generated conversations. This diverse training regimen results in a well-rounded, versatile LLM capable of handling various creative tasks.

Weighing in at approximately 3GB, Metharme’s efficient model size makes it compatible with consumer hardware, like the Raspberry Pi (4GB RAM recommended). Although its performance might not be lightning-fast, the model’s ability to run on such modest hardware is a testament to the advancements in quantization and model optimization.

Check Out

Check Out (GGML)

Phixtral-4x2_8

Feature/Requirement	Description
Expert Models	Coding, Conversation, Reasoning, Generalist
Architecture	Mixture of Experts (MoE)
Parameter Efficiency	Not all parameters are used when running, resulting in better performance
Hardware Requirement	Raspberry Pi or any Single Board Computer (SBC) with at least 8GB RAM
Tasks	Programming, Dialogues, Story Writing, and more

The phixtral-4x2_8 is a remarkable model for your Raspberry Pi. It’s the first Mixture of Experts (MoE) made with four microsoft/phi-2 models. This model is inspired by the Mixtral-8x7B-v0.1 architecture and outperforms each individual expert.

The model is composed of four expert models, each specializing in a different area: coding, conversation, reasoning, and a generalist. This unique composition allows the model to handle a wide range of tasks with high proficiency.

One of the key features of this model is its efficient use of parameters. Because it’s based on the MoE architecture, not all parameters are used when running the model. This results in better performance.

However, it’s important to note that when running this model, your Raspberry Pi or any Single Board Computer (SBC) should have at least 8GB of RAM.

Check Out

NeuralBeagle14-7B

The NeuralBeagle14-7B is a powerful 7B open-source AI model that has quickly climbed the ranks to become a top contender among large language models. This advanced AI model is making waves with its 7 billion parameters.

NeuralBeagle14-7B is not just any model; it’s a hybrid, created by combining the best features of two existing models, Beagle and Mar Coro. This fusion has been further enhanced by a unique technique called the Lazy Merge Kit.

NeuralBeagle14-7B is a top performer in its category and has been tested on different platforms, showing it can do a lot of different tasks. This makes it a great choice for many applications.

But remember, this model needs a lot of memory to work. So, if you’re using a Raspberry Pi or a similar device, make sure it has at least 8GB RAM.

Check Out

9 Local LLM Models for Raspberry Pi

Phi-2-Orange

Dolphin 2.6 Phi-2

Phi-2 (Others based on This model)

BTLM-3B-8K

TinyLlama-1.1B-Chat-v1.0 (Others based on it)

phi-1.5

Metharme_1.3B

Phixtral-4x2_8

NeuralBeagle14-7B

Recent

Hallucination in LLM is Advantage

Best Open Source TTS

8 Best LLM For Low End Smartphone (1 – 4 GB RAM)

6 Best Mamba Based LLM (Open Source)

Where imagination meets innovation