In recent years, the field of natural language processing has witnessed significant advancements, thanks in part to the development of large language models (LLMs). These models have revolutionized the way we process, understand, and generate human language.
One of the most exciting aspects of LLMs is their potential to drive innovation and create value in various industries. As a result, many companies and organizations have developed their own commercial open-source LLMs. These models are available to the public under open-source licenses, allowing developers and researchers to use and build upon them freely.
In this article, we will explore the best commercial open-source LLMs currently available. We will provide an overview of each model, including its unique features, training data, and performance on various natural language processing tasks. Whether you are a researcher, developer, or industry professional, this article will provide valuable insights into the state-of-the-art LLMs and how they can benefit your work.
These are the best commercial open-source LLMs:
1. Falcon 40B and 7B
Falcon is an exciting new addition to the world of open source AI models. Initially, Falcon had royalty requirements for commercial use, but it has now been fully open sourced, making it accessible to a wider range of users. This model comes in two variants: Falcon 40B and Falcon 7B, referring to their respective parameter counts of 40 billion and 7 billion.
One of the standout features of Falcon is its impressive performance, surpassing Meta's LLaMA model in several areas. With Falcon, you have a powerful base model that can be fine-tuned to cater to your specific needs. This flexibility allows users to adapt the model for various applications and domains, making it highly versatile.
What sets Falcon apart is its extensive training on diverse datasets in multiple languages. This model has been exposed to a wide range of linguistic and cultural contexts, enabling it to grasp nuances and subtleties across different languages. Whether you're working with English, Spanish, French, or any other supported language, Falcon is designed to deliver reliable and accurate results.
It's worth noting that Falcon operates under the Apache 2.0 license, ensuring that users have the freedom to modify and distribute the model as per their requirements. Additionally, the model has been trained largely on the RefinedWeb dataset, a curated web corpus whose composition is publicly documented, which means you can explore the specifics of the data it was trained on. This transparency allows users to gain insights into the training process and better understand the underlying foundations of Falcon's capabilities.
Overall, Falcon 40B and 7B present highly capable open source AI models that provide a strong starting point for various natural language processing tasks. Their versatility, extensive language support, and diverse training data make them valuable assets for developers and researchers alike.
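If you want to try Falcon as a starting point, a minimal sketch of loading the 7B variant with the Hugging Face transformers library might look like the following. The tiiuae/falcon-7b repo id, dtype, and generation settings are illustrative assumptions, not part of the original write-up:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"  # assumed Hugging Face repo id for the 7B base model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision to fit the model on a single large GPU
    device_map="auto",
    trust_remote_code=True,       # Falcon shipped custom modeling code at release time
)

prompt = "Open-source language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

From here, the usual next step is to fine-tune the base checkpoint on your own domain data rather than using it as-is.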
2. MPT-7B
MPT-7B is a powerful decoder-style transformer, one of the MosaicPretrainedTransformer (MPT) models, pretrained from scratch on 1 trillion (1T) tokens of English text and code. Developed by MosaicML, MPT-7B features a modified transformer architecture optimized for efficient training and inference.
The architectural changes in MPT-7B include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases (ALiBi). These modifications allow MPT models to be trained with high throughput efficiency and stable convergence, resulting in faster training times and more accurate results. Additionally, MPT models can be served efficiently with both standard HuggingFace pipelines and NVIDIA's FasterTransformer.
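As a rough sketch of what serving MPT-7B through a standard Hugging Face pipeline could look like (the mosaicml/mpt-7b repo id and the GPT-NeoX tokenizer follow the public model card; treat the exact settings as illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

name = "mosaicml/mpt-7b"  # assumed repo id for the base model

# MPT's custom architecture (ALiBi, optimized layers) ships as remote code
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True
)
# MPT-7B reuses the EleutherAI GPT-NeoX tokenizer rather than shipping its own
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

generate = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(generate("MPT-7B was pretrained on", max_new_tokens=40, do_sample=True)[0]["generated_text"])
```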
MPT-7B is licensed for commercial use, making it a valuable tool for companies and organizations that require advanced language processing capabilities. With its strong pretraining and optimized architecture, MPT-7B is a top-tier option for natural language processing tasks.
3. MPT-7B-StoryWriter-65k+
MPT-7B-StoryWriter-65k+ is a large language model that specializes in reading and writing fictional stories with an exceptional context length. This model was developed by finetuning the MPT-7B model with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. MPT-7B-StoryWriter-65k+ can also extrapolate beyond the 65k tokens thanks to ALiBi at inference time. This model is released under the Apache 2.0 license, which allows for its free use and distribution by anyone. With its exceptional story-writing capabilities and extensive context length, MPT-7B-StoryWriter-65k+ is a valuable tool for anyone interested in creating engaging and intricate fictional stories.
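Because ALiBi lets the model extrapolate past its 65k-token training length, the context window can be raised at load time. A hedged sketch, using the mosaicml/mpt-7b-storywriter repo id and an illustrative larger sequence length:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b-storywriter"  # assumed repo id for the finetuned model

config = AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # illustrative value above the 65k training length; ALiBi extrapolates

model = AutoModelForCausalLM.from_pretrained(
    name, config=config, torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer("It was a dark and stormy night, and", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```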
4. Dolly-v2 7B and 12B
Dolly-v2 is a family of large language models, including the 7B and 12B models. Developed by Databricks, the Dolly-v2-12b model is an instruction-following model trained on the Databricks machine learning platform. It is licensed for commercial use and is based on the Pythia-12b model.
The Dolly-v2-12b model was trained on approximately 15,000 instruction/response fine-tuning records generated by Databricks employees in capability domains such as brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. While it may not be a state-of-the-art model, Dolly-v2-12b does exhibit surprisingly high-quality instruction-following behavior that is not characteristic of the foundation model on which it is based.
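To get a feel for that instruction-following behavior, a minimal sketch using a Hugging Face text-generation pipeline could look like this (the databricks/dolly-v2-12b repo id follows the public release; the prompt and settings are illustrative):

```python
import torch
from transformers import pipeline

# Dolly-v2 ships a custom instruction pipeline as remote code, hence trust_remote_code=True
generate = pipeline(
    "text-generation",
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

result = generate("Explain the difference between open QA and closed QA in one sentence.")
print(result[0]["generated_text"])
```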
Dolly-v2-12b is released under a permissive open-source license, and its instruction-tuning dataset (databricks-dolly-15k) is available under Creative Commons Attribution-ShareAlike (CC BY-SA); both allow commercial use.
5. StableLM 3b and 7b
StableLM 3b and 7b are two decoder-only language models developed by Stability AI and pre-trained on a diverse collection of English datasets. These models have 3 billion and 7 billion parameters, respectively, and have a sequence length of 4096, which allows them to push beyond the context window limitations of existing open-source language models.
While StableLM 3b and 7b are not as capable as some other large language models, they can be fine-tuned on your own datasets to achieve greater capability. As decoder-only models, they are intended primarily for text generation tasks, such as open-ended language generation or machine translation.
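A hedged sketch of basic text generation with the 7B base checkpoint (the stabilityai/stablelm-base-alpha-7b repo id and sampling settings are illustrative assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "stabilityai/stablelm-base-alpha-7b"  # assumed repo id for the 7B base checkpoint

tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Open-source language models matter because", return_tensors="pt").to(model.device)
tokens = model.generate(**inputs, max_new_tokens=48, do_sample=True, temperature=0.7, top_p=0.95)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```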
The base model checkpoints for StableLM 3b and 7b are licensed under the Creative Commons license (CC BY-SA-4.0). This means that if you use these models, you must give credit to Stability AI, provide a link to the license, and indicate if changes were made.
6. OpenLLaMA
OpenLLaMA 7B is a large language model developed as a part of the OpenLLaMA project, which aims to replicate the success of the Meta LLaMA model. The model is trained on 300 billion tokens and exhibits comparable performance to the original LLaMA and GPT-J models across a majority of tasks, while outperforming them in some tasks.
One quirk of the OpenLLaMA checkpoints is their sensitivity to the beginning-of-sequence (BOS) token. The project team has found that many existing LLaMA implementations do not prepend the BOS token (id=1) at generation time, which can degrade results. They therefore recommend always prepending the BOS token when using the 200B-token checkpoint. The newer 300B-token checkpoint is less sensitive and can be used either way.
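For the BOS-sensitive checkpoint, the safest pattern is to build the input ids yourself and put the BOS token in front. A minimal sketch (the openlm-research/open_llama_7b repo id is used here purely for illustration; substitute whichever checkpoint you are running):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

name = "openlm-research/open_llama_7b"  # illustrative repo id; use your actual checkpoint

tokenizer = LlamaTokenizer.from_pretrained(name)
model = LlamaForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

prompt = "Q: What is the largest animal?\nA:"
# Tokenize without special tokens, then prepend BOS (id=1) explicitly
ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids
bos = torch.tensor([[tokenizer.bos_token_id]])
input_ids = torch.cat([bos, ids], dim=1).to(model.device)

output = model.generate(input_ids=input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```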
The OpenLLaMA checkpoints are released under the Apache 2.0 license, which permits free use, modification, and redistribution, including for commercial purposes.