8 FOSS LLMs That You Can Run on Your PC

Are you tired of relying on cloud-based services for your AI needs? Want to take control of your own data and privacy? Free and open-source (FOSS) LLMs are language models that you can run directly on your personal computer. Here are a few reasons to try them:

  • No network round trip: responses are generated on your own machine instead of on distant servers.
  • Enhanced privacy and security: your data stays on your own device.
  • Greater flexibility in use cases: FOSS LLMs can be used for a wide range of applications, from productivity tools to creative projects.
  • No dependence on internet connectivity: once downloaded, the models keep working offline.

How to run these models

You can run these models with the Oobabooga web UI (text-generation-webui). Most of them are published in two versions, one for GPUs and one for CPUs, so you don't need a powerful GPU to run them.

Orca Mini v2 (7B, 13B)

Orca Mini v2 is a significant step up from its predecessor, Orca Mini v1, and fixes the limitations that held back the first version's usability. Developed in collaboration with Eric Hartford, it builds on an uncensored LLaMA-13b model. It is fine-tuned on datasets enriched with explanations, drawing on the WizardLM, Alpaca, and Dolly-V2 datasets, and its dataset construction follows the approach described in the Orca research paper. The Orca Mini v2 family comprises two models, with 7 billion and 13 billion parameters.


StableLM-Base-Alpha-3B-v2

StableLM-Base-Alpha-3B-v2 is a compact language model that builds on the original StableLM Alpha models. This iteration introduces architectural improvements such as SwiGLU (Shazeer, 2020) and is trained on higher-quality data sources. Its 4096-token context length lets it take in longer passages at once.
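To make the SwiGLU mention concrete, here is a minimal pure-Python sketch of a SwiGLU feed-forward layer as defined in Shazeer (2020): a Swish-activated gate multiplied element-wise with a second linear projection, then projected back down. The helper names, the toy sizes, and the weights are all made up for illustration; this is not StableLM's actual implementation.

```python
import math

def swish(z: float) -> float:
    """Swish with beta = 1 (also called SiLU): z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """FFN_SwiGLU(x) = (Swish(x W_gate) * (x W_up)) W_down, biases omitted."""
    gate = [swish(z) for z in matvec(x, w_gate)]
    up = matvec(x, w_up)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(hidden, w_down)

# Toy sizes: model width 2, hidden width 3. The weights are arbitrary.
x = [1.0, -2.0]
w_gate = [[0.5, -1.0, 0.0], [1.0, 0.5, 0.0]]
w_up = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
w_down = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = swiglu_ffn(x, w_gate, w_up, w_down)
print(y)
```

The gate lets the layer scale each hidden unit by a learned, input-dependent factor instead of applying one fixed activation to a single projection.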

Key enhancements include the use of high-quality data such as RefinedWeb and C4 in place of The Pile v2 Common-Crawl scrape. Notably, sampling web text at a higher rate during training (71% instead of 35%) led to noticeable improvements in downstream performance, which shows how much data choices matter for compact models.


Dolly-v2-12b

Dolly-v2-12b is an open-source large language model trained on the Databricks machine learning platform and licensed for commercial use. It is based on pythia-12b and is capable of instruction-following behavior.

What makes Dolly-v2-12b unique is its fine-tuning data: databricks-dolly-15k, a set of roughly 15k instruction/response records written by Databricks employees in the capability domains from the InstructGPT paper. The domains include brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. Although it is not a state-of-the-art model, Dolly-v2-12b shows surprisingly high-quality instruction-following behavior that is not characteristic of the foundation model on which it is based.
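As a rough illustration of what an instruction/response record looks like and how it might be turned into a prompt, here is a small sketch. The field names, the Alpaca-style template, and the example record are all assumptions made up for this example; they are not taken from databricks-dolly-15k or from the Dolly training code.

```python
from dataclasses import dataclass

# Field names and template wording are assumptions for illustration only.
@dataclass
class InstructionRecord:
    instruction: str
    response: str
    category: str       # e.g. "summarization" or "open QA"
    context: str = ""   # optional reference text, used by closed QA and similar tasks

def build_prompt(record: InstructionRecord) -> str:
    """Render the instruction (and optional context) in an Alpaca-style layout."""
    parts = ["### Instruction:", record.instruction]
    if record.context:
        parts += ["", "### Context:", record.context]
    parts += ["", "### Response:", ""]
    return "\n".join(parts)

# A made-up record, not taken from the real dataset.
example = InstructionRecord(
    instruction="Summarize the text in one sentence.",
    context="Dolly-v2-12b is an instruction-following model based on pythia-12b.",
    response="Dolly-v2-12b is a pythia-12b model tuned to follow instructions.",
    category="summarization",
)
prompt = build_prompt(example)
print(prompt)
```

During fine-tuning, the model would see the prompt followed by the record's response; at inference time it is given only the prompt and generates the response itself.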

The model is also available in smaller sizes, including dolly-v2-7b, a 6.9 billion parameter model based on pythia-6.9b, and dolly-v2-3b, a 2.8 billion parameter model based on pythia-2.8b. These smaller models can run on personal computers and still deliver high-quality results.


StableLM-Tuned-Alpha

StableLM-Tuned-Alpha is a suite of 3B and 7B parameter decoder-only language models designed specifically for chat-like applications.

Built on top of the StableLM-Base-Alpha models, the StableLM-Tuned-Alpha models have been further fine-tuned on various chat and instruction-following datasets, which makes them well suited to open-source community chat applications. They are available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, which means anyone can use them for free for non-commercial purposes, provided they give attribution and share derivatives under the same terms.

Because that license is NonCommercial, the StableLM-Tuned-Alpha models are open but cannot be used in commercial products. Businesses that want to build commercial chat applications should look at the StableLM-Base-Alpha models instead, which are released under CC BY-SA 4.0, a license that permits commercial use, and fine-tune them on data they are allowed to use.

These models can also be quite resource-intensive, with model sizes ranging from 2.5GB to almost 10GB. A midrange to high-end GPU may be needed to run them effectively, which can be a hurdle for individuals or organizations with limited computing resources.
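A quick way to judge whether a model will fit on your hardware is to estimate the memory its weights need from the parameter count and the number of bits stored per weight. The sketch below does that arithmetic; the function name is made up for this example, and the estimate covers weights only, so real usage is higher once activations and the attention cache are included.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Nominal parameter counts; real checkpoints differ slightly.
for name, n_params in [("3B", 3e9), ("7B", 7e9)]:
    for bits in (16, 8, 4):
        size = weight_memory_gb(n_params, bits)
        print(f"{name} model at {bits}-bit: about {size:.1f} GB")
```

Halving the bits per weight halves the weight footprint, which is why quantized builds are the usual choice for consumer GPUs.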


Vicuna-13b-GPTQ

Vicuna-13b-GPTQ is one of the latest open-source models to make waves in the text generation community. This local model has 13 billion parameters, making it one of the most powerful LLMs available for non-commercial use.

One of the standout features of Vicuna-13b-GPTQ is its performance: in its authors' preliminary evaluation, with GPT-4 as the judge, Vicuna-13B reached roughly 90% of ChatGPT's quality. This makes it an excellent choice for those looking for a capable text generation tool for a range of applications.

While Vicuna-13b-GPTQ is a highly capable model, it does require a reasonably capable GPU to run effectively. For those who want to generate high-quality text quickly, the investment is worth it.

Although the model is open source, it is important to note that it is not allowed for commercial usage. However, this does not detract from its value as one of the best models available for roleplaying and other text generation tasks.


ChatGLM-6B

ChatGLM-6B is an open-source bilingual language model with 6.2 billion parameters. Thanks to quantization, it can be deployed locally on consumer-grade graphics cards, which makes it accessible to users who don't have expensive hardware.

ChatGLM-6B is built on the General Language Model (GLM) framework and is optimized for Chinese question answering and dialogue. The model has been trained on approximately 1 trillion tokens of Chinese and English text, supplemented by supervised fine-tuning, feedback bootstrap, and reinforcement learning with human feedback. This combination lets the model generate answers that are in line with human preferences, despite having only 6.2 billion parameters.

At the INT4 quantization level, ChatGLM-6B can run on GPUs with only 6GB of memory, so even users with limited resources can deploy it on their personal computers. While the model is optimized for Chinese language processing, it is bilingual and can also be used for English language tasks.
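To give a feel for what INT4 quantization means, here is a generic sketch of symmetric 4-bit quantization: each weight is mapped to an integer in the range -8 to 7 plus one shared scale factor. This is a textbook illustration under simplifying assumptions (a single scale for the whole list, made-up weights), not ChatGLM-6B's actual quantization code. For scale, 6.2 billion parameters at 4 bits each come to about 3.1 GB of weights, which is how the model fits on a 6GB card.

```python
def quantize_int4(weights):
    """Map floats to integers in [-8, 7] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [v * scale for v in q]

# Made-up weights for illustration.
weights = [0.12, -0.70, 0.33, 0.05, -0.21]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

The rounding error per weight is at most half the scale, which is the price paid for storing each weight in 4 bits instead of 16.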

One of the most impressive aspects of ChatGLM-6B is its ability to generate high-quality responses despite its relatively small size. This is due to its optimization for Chinese QA and dialogue, as well as the various training techniques used to supplement the data. The model can be fine-tuned for specific tasks, making it a versatile tool for a wide range of applications.

Cerebras-GPT 13B

The Cerebras-GPT family of language models is a groundbreaking development in the field of AI research. Designed to facilitate research into LLM scaling laws, these models utilize open architectures and datasets, making them ideal for use by researchers around the world.

Of particular interest is the Cerebras-GPT 13B model. As the largest and most complex model in the family, this model provides researchers with a powerful tool for tackling complex language tasks. With a total of 13 billion parameters, the Cerebras-GPT 13B model is capable of generating highly sophisticated text outputs.

One of the key advantages of the Cerebras-GPT family is how it was scaled. All models in the family were trained in accordance with the Chinchilla scaling laws, which balance model size against the amount of training data so that a given training compute budget yields the best possible model. This makes the family a consistent reference point for researchers studying how performance changes with scale.
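The Chinchilla recipe can be turned into back-of-the-envelope arithmetic. The sketch below uses two widely quoted rules of thumb, both of which are approximations assumed here rather than figures from this article: roughly 20 training tokens per parameter, and roughly 6 floating-point operations per parameter per token.

```python
TOKENS_PER_PARAM = 20      # rule-of-thumb Chinchilla ratio (assumption)
FLOPS_PER_PARAM_TOKEN = 6  # common training-cost approximation (assumption)

def chinchilla_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens."""
    return TOKENS_PER_PARAM * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in floating-point operations."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens

n_params = 13e9  # Cerebras-GPT 13B
n_tokens = chinchilla_tokens(n_params)
flops = training_flops(n_params, n_tokens)
print(f"tokens: {n_tokens:.3g}, training FLOPs: {flops:.3g}")
```

Under these assumptions a 13 billion parameter model calls for about 260 billion training tokens, and the compute grows with the square of the model size because the token count scales with the parameter count.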

The Cerebras-GPT 13B model was trained on the Andromeda AI supercomputer, which comprises 16 CS-2 wafer-scale systems. This allowed for efficient scaling of training across nodes using simple data parallelism. Additionally, Cerebras’ weight streaming technology simplifies the training of LLMs by disaggregating compute from model storage, further increasing training efficiency.


WizardLM

WizardLM is an open-source language model fine-tuned on automatically evolved instructions. Its core contributors are preparing a model trained on the full set of evolved instructions (approximately 300k). In the meantime, the team has released the 7B version of WizardLM, which is trained on 70k evolved instructions. Although this version outperforms ChatGPT on high-complexity instructions, it lags behind ChatGPT on the test set as a whole, and WizardLM is still in its early stages of development.

The team behind WizardLM is constantly working to improve the model, train on larger scales, add more training data, and innovate more advanced large-model training methods. They are also looking for highly motivated students to join them as interns to create more intelligent AI together.

There are two versions of WizardLM available, one for CPU and one for GPU, and both can be used with the Oobabooga web UI. If you try the hosted WizardLM-7B demos instead, spread your use across Demo 1 to Demo 4 as evenly as possible so that no single demo becomes overloaded and slow to respond. The demos currently support only single-turn conversations in English, with support for other languages planned for the future.

Sujeet Kumar
SK is a versatile writer deeply passionate about anime, evolution, storytelling, art, AI, game development, and VFX. His writings transcend genres, exploring these interests and more. Dive into his captivating world of words and explore the depths of his creative universe.
