In recent years, the field of natural language processing has witnessed significant advancements, thanks in part to the development of large language models (LLMs). These models have revolutionized the way we process, understand, and generate human language.
One of the most exciting aspects of LLMs is their potential to drive innovation and create value in various industries. As a result, many companies and organizations have developed their own commercial open-source LLMs. These models are available to the public under open-source licenses, allowing developers and researchers to use and build upon them freely.
In this article, we will explore the best commercial open-source LLMs currently available. We will provide an overview of each model, including its unique features, training data, and performance on various natural language processing tasks. Whether you are a researcher, developer, or industry professional, this article will provide valuable insights into the state-of-the-art LLMs and how they can benefit your work.
MPT-7B
MPT-7B is a powerful decoder-style transformer, one of the MosaicPretrainedTransformer (MPT) models, pretrained from scratch on 1 trillion tokens of English text and code. Developed by MosaicML, MPT-7B features a modified transformer architecture optimized for efficient training and inference.
The architectural changes in MPT-7B include performance-optimized layer implementations and the elimination of context-length limits by replacing positional embeddings with Attention with Linear Biases (ALiBi). These modifications allow MPT models to be trained with high throughput efficiency and stable convergence, resulting in faster training. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA's FasterTransformer.
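To make the ALiBi idea concrete, here is a minimal sketch (illustrative only, not MosaicML's implementation): rather than adding positional embeddings, each attention head subtracts a head-specific linear penalty proportional to the query-key distance.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ...
    (this schedule assumes num_heads is a power of two)."""
    ratio = 2 ** (-8 / num_heads)
    return [ratio ** (i + 1) for i in range(num_heads)]

def alibi_bias(slope, seq_len):
    """Bias added to causal attention scores: -slope * (query_pos - key_pos)."""
    return [[-slope * (q - k) for k in range(q + 1)] for q in range(seq_len)]

slopes = alibi_slopes(8)          # head 0 gets 0.5, head 7 gets 2**-8
bias = alibi_bias(slopes[0], 4)   # distant keys receive larger penalties
```

Because the penalty is a simple linear function of distance, it is defined for any sequence length, which is why ALiBi-based models can extrapolate past their training context.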
MPT-7B is released under the Apache 2.0 license, which permits commercial use, making it a valuable tool for companies and organizations that need advanced language processing capabilities. With its strong pretrained weights and optimized architecture, MPT-7B is a top-of-the-line option for natural language processing tasks.
MPT-7B-StoryWriter-65k+
MPT-7B-StoryWriter-65k+ is a large language model that specializes in reading and writing fictional stories with an exceptionally long context. It was created by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. Thanks to ALiBi, the model can also extrapolate beyond 65k tokens at inference time. It is released under the Apache 2.0 license, which allows free use and distribution. With its story-writing capabilities and extensive context length, MPT-7B-StoryWriter-65k+ is a valuable tool for anyone interested in creating engaging, intricate fictional stories.
Dolly-v2 7B and 12B
Dolly-v2 is a family of instruction-following large language models available in 7B and 12B parameter sizes. Developed by Databricks and trained on the Databricks machine learning platform, Dolly-v2-12b is licensed for commercial use and is based on EleutherAI's Pythia-12b model.
The Dolly-v2-12b model was trained on approximately 15,000 instruction/response fine-tuning records generated by Databricks employees in capability domains such as brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. While it may not be a state-of-the-art model, Dolly-v2-12b does exhibit surprisingly high-quality instruction-following behavior that is not characteristic of the foundation model on which it is based.
Dolly-v2-12b itself is released under the permissive MIT license, while its databricks-dolly-15k training dataset is licensed under CC-BY-SA; both allow commercial use.
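As an instruction-following model, Dolly-v2 expects its input wrapped in a fixed prompt template. The sketch below mirrors the "### Instruction: / ### Response:" format used by Databricks' example pipeline; treat the exact wording as illustrative, since it may differ between releases.

```python
def build_dolly_prompt(instruction):
    """Wrap a plain instruction in a Dolly-style prompt template
    (illustrative; check the model card for the exact format)."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

prompt = build_dolly_prompt("Suggest three names for a coffee shop.")
```

The completed prompt is passed to the model for generation; everything the model emits after the "### Response:" marker is taken as its answer.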
StableLM 3b and 7b
StableLM 3b and 7b are two decoder-only language models developed by Stability AI and pretrained on a diverse collection of English datasets. The models have 3 billion and 7 billion parameters, respectively, and a sequence length of 4096 tokens, pushing beyond the 2048-token context windows common among open-source language models at the time.
While StableLM 3b and 7b are not as capable as some larger language models, they can be finetuned on your own datasets for greater capability. As decoder-only models, they are intended primarily for text generation tasks.
The base model checkpoints for StableLM 3b and 7b are licensed under the Creative Commons license (CC BY-SA-4.0). This means that if you use these models, you must give credit to Stability AI, provide a link to the license, and indicate if changes were made.
OpenLLaMA 7B
OpenLLaMA 7B is a large language model developed as part of the OpenLLaMA project, which aims to provide an openly licensed reproduction of Meta's LLaMA model. The model is trained on 300 billion tokens and performs comparably to the original LLaMA and GPT-J models across a majority of tasks, while outperforming them on some.
One quirk of the OpenLLaMA checkpoints is their sensitivity to the beginning-of-sentence (BOS) token. The project team has found that many existing LLaMA implementations do not prepend the BOS token (id=1) at generation time, which can degrade results. They therefore recommend always prepending the BOS token when using the 200B-token checkpoint. The newer 300B-token checkpoint is less sensitive to the BOS token and can be used either way.
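In practice, "prepending the BOS token" just means ensuring that token id 1 starts the encoded sequence before generation. A minimal sketch of that guard (a hypothetical helper; real tokenizers such as HuggingFace's handle this through their bos_token_id and add_special_tokens settings):

```python
BOS_TOKEN_ID = 1  # LLaMA-family beginning-of-sentence token

def ensure_bos(token_ids):
    """Prepend the BOS token unless the sequence already starts with it."""
    if not token_ids or token_ids[0] != BOS_TOKEN_ID:
        return [BOS_TOKEN_ID] + list(token_ids)
    return list(token_ids)

print(ensure_bos([523, 9092]))     # [1, 523, 9092]
print(ensure_bos([1, 523, 9092]))  # already has BOS; unchanged
```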
The OpenLLaMA checkpoints are released under the Apache 2.0 license, which permits free use, modification, and commercial distribution.