15 Best Local LLMs for Coding [SOTA]


Large language models (LLMs) are AI models trained on massive datasets of text and code. That training lets them generate text, translate between languages, write many kinds of creative content, and answer questions informatively.

In recent years, interest in LLMs for coding has grown, because they can generate code, help debug it, and even write entire programs.


WaveCoder-Ultra-6.7B

Out of all the code-crunching champs we tested, WaveCoder-Ultra-6.7B by Microsoft is a real powerhouse. It is an instruction-following model: it was trained on instruction data built from useful code snippets, so it responds to plain-language requests about code. WaveCoder-Ultra-6.7B handles four major coding tasks:

  • Code Generation: Need some fresh code written? Just tell this thing what you want, and it’ll whip it up for you in no time.
  • Code Summary: Don’t have time to untangle a giant mess of code? WaveCoder-Ultra-6.7B can break it down into a clear and short summary for you.
  • Code Translation: Talking to a computer in the wrong language? No problem! This LLM can translate your code from one programming language to another.
  • Code Repair: Got a bug in your code acting like a gremlin? WaveCoder-Ultra-6.7B can find and fix those errors for you, like a code-cleaning superhero.

The results show that WaveCoder-Ultra-6.7B scores a high 79.9 on HumanEval, a benchmark that checks whether model-written Python functions pass a set of unit tests. It also does well in related areas, like explaining code (scoring 45.7) and fixing it (scoring 52.3).
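HumanEval-style scores are usually reported as pass@k: the chance that at least one of k sampled solutions passes the tests. Here is a minimal sketch of the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn and c the number that passed. The per-problem results below are made up for illustration and are not WaveCoder's.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # subset of k samples contains no passing solution.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over all problems, expressed as a percentage."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (n, c) results for three problems, 10 samples each.
results = [(10, 10), (10, 3), (10, 0)]
print(round(benchmark_score(results, k=1), 1))
```

With k=1 this reduces to the plain fraction of passing samples, averaged over problems.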

Sure, it might not be the absolute best at everything (looking at you, GPT-4), but WaveCoder-Ultra-6.7B is a great option because it focuses specifically on code.


CodeQwen1.5-7B-Chat

CodeQwen1.5-7B-Chat is another cool addition to the local LLM scene for coding. It’s a decoder-only transformer model based on the powerful Qwen1.5 and tuned specifically to tackle code.

CodeQwen1.5-7B-Chat boasts several strengths:

  • Superior Code Generation: It excels at generating code, making it a valuable asset for programmers seeking to streamline their workflow.
  • Long Context Understanding: With a context window of a whopping 64K tokens, CodeQwen1.5-7B-Chat can handle complex coding tasks that require understanding larger codebases.
  • Multilingual Mastery: Supporting a staggering 92 programming languages, CodeQwen1.5-7B-Chat transcends language barriers for developers.
  • Beyond Code: Its capabilities extend beyond just code generation. CodeQwen1.5-7B-Chat impresses with its ability to handle text-to-SQL translation and even tackle bug fixes.
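To get a feel for what a 64K-token window means in practice, here is a rough sketch that checks which source files fit into one prompt. The 4-characters-per-token ratio and the output reserve are assumptions for illustration only. Real counts depend on the model's tokenizer.

```python
CONTEXT_WINDOW = 64 * 1024  # 64K tokens, the window size quoted above
CHARS_PER_TOKEN = 4         # rough assumption; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Crude token estimate using ceiling division on character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def pack_files(files: dict[str, str], reserve_for_output: int = 2048) -> list[str]:
    """Greedily pick files, in the given order, that fit the prompt budget."""
    budget = CONTEXT_WINDOW - reserve_for_output
    chosen = []
    for name, source in files.items():
        cost = estimate_tokens(source)
        if cost <= budget:
            chosen.append(name)
            budget -= cost
    return chosen


# Hypothetical project: two small files and one that is far too large.
files = {"a.py": "x" * 40_000, "big.py": "y" * 400_000, "b.py": "z" * 8_000}
print(pack_files(files))
```

For real use, swap `estimate_tokens` for a count from the model's own tokenizer.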

Deepseek Coder

Deepseek Coder is a top-tier family of language models built for coding tasks. It comes in several sizes, ranging from 1.3 billion to 33 billion parameters. These models have achieved state-of-the-art performance among open models on various coding benchmarks.

Deepseek Coder’s unique strength lies in its dedicated focus on coding. It was trained from scratch on a vast dataset comprising 87% code and 13% natural language in English and Chinese, so it can work with prompts and comments in both languages.

One standout feature is its support for project-level code completion and infilling, thanks to pre-training on a project-level code corpus. Whether you’re working with different programming languages or tackling various coding challenges, Deepseek Coder is a reliable choice for enhancing your coding productivity.
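Infilling works by showing the model the code before and after a gap, then asking it to generate the middle. Here is a minimal sketch of how such a prompt can be assembled in prefix-suffix-middle order. The sentinel strings are placeholders invented for this example. Every model defines its own special tokens, so check the model card before using this with a real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FimTokens:
    """Sentinel strings marking the three parts of an infilling prompt."""
    prefix: str
    suffix: str
    middle: str


def build_fim_prompt(before: str, after: str, tokens: FimTokens) -> str:
    """Arrange the code around the gap so the model continues with the middle."""
    return f"{tokens.prefix}{before}{tokens.suffix}{after}{tokens.middle}"


# Placeholder sentinels (hypothetical, not any real model's tokens).
PLACEHOLDER = FimTokens("[PREFIX]", "[SUFFIX]", "[MIDDLE]")

prompt = build_fim_prompt(
    before="def add(a, b):\n    return ",
    after="\n\nprint(add(1, 2))\n",
    tokens=PLACEHOLDER,
)
print(prompt)
```

Whatever the model generates after the final sentinel is inserted at the gap.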

WizardCoder-Python-34B-V1.0 (7 to 34B)

WizardCoder-Python-34B-V1.0 builds on Code Llama – Python, the language-specialized Code Llama variant that was fine-tuned on 100 billion tokens of Python code. It took second place on the HumanEval benchmark with a pass@1 score of 73.2, ahead of GPT-4 (67.0, March 15, 2023 version), ChatGPT-3.5 (72.5), and Claude2 (71.2).

Python is a heavily benchmarked language for code generation, and Python and PyTorch play pivotal roles in the AI community, so this specialization adds real utility. For developers and AI practitioners working on Python-based projects, WizardCoder-Python-34B-V1.0 is an excellent choice for efficient, accurate code generation.

Model Name: WizardCoder-Python-34B-V1.0
Training Data: 100 billion tokens of Python code
Benchmark Performance: Second place with a score of 73.2, outperforming GPT-4, ChatGPT-3.5, and Claude2
Utility: Efficient and accurate code generation for Python-based projects
Use Cases: Developers and AI practitioners for precise coding solutions


Phind-CodeLlama-34B-v1

Phind-CodeLlama-34B-v1 is an open-source coding language model fine-tuned from CodeLlama-34B on a proprietary internal Phind dataset. It achieves a 67.6% pass@1 rate on HumanEval. A sibling model, fine-tuned from CodeLlama-34B-Python on the same dataset, reaches 69.5%.

To support the credibility of these results, Phind applied OpenAI’s decontamination methodology to its dataset. The dataset itself is made up of instruction-answer pairs rather than plain code completion examples. The model was trained for two epochs on around 80,000 carefully curated programming problems and solutions. Using DeepSpeed ZeRO 3 and Flash Attention 2, training took just three hours on 32 A100-80GB GPUs, with a sequence length of 4096 tokens.
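The training figures above translate into a small compute budget by fine-tuning standards. Here is a quick back-of-the-envelope sketch using only the numbers quoted in the text. The 80,000 figure is approximate, so treat the results as rough.

```python
def training_budget(gpus: int, hours: float, epochs: int, examples: int) -> dict:
    """Derive simple totals from a fine-tuning run's headline numbers."""
    gpu_hours = gpus * hours
    examples_seen = epochs * examples
    return {
        "gpu_hours": gpu_hours,
        "examples_seen": examples_seen,
        "examples_per_gpu_hour": examples_seen / gpu_hours,
    }


# Figures from the text: 32 GPUs, 3 hours, 2 epochs, ~80,000 problems.
budget = training_budget(gpus=32, hours=3, epochs=2, examples=80_000)
print(budget["gpu_hours"])      # 96 GPU-hours in total
print(budget["examples_seen"])  # 160000 examples processed
```

That works out to roughly 1,700 examples per GPU-hour.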

Model Name: Phind-CodeLlama-34B-v1
Specialization: None specified
Performance: 67.6% pass rate at rank 1 on HumanEval, 69.5% for CodeLlama-34B-Python
Training Data: Fine-tuned on proprietary datasets
Reliability: Decontamination methodology for credibility
Training Details: Trained for two epochs on 80,000 programming problems and solutions
Technology Used: DeepSpeed ZeRO 3 and Flash Attention 2 for efficient training


Moe-2x7b-QA-Code

Moe-2x7b-QA-Code is a language model built for answering questions and handling code-related queries. It uses a Mixture of Experts (MoE) architecture and has been trained on a wide range of technical data, including documentation, forums, and code repositories, which helps it give accurate, context-aware answers.

It earns its place on this list because it specializes in code-related queries and, being open-source, is accessible to everyone. If you’re looking for a helpful tool for coding questions, Moe-2x7b-QA-Code is a great choice.
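In a Mixture of Experts model, a small router decides which expert sub-networks handle each input, so only part of the model runs at a time. Here is a toy sketch of top-k routing with two experts. The experts and gate weights are invented for illustration and say nothing about Moe-2x7b-QA-Code's actual internals.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_weights, top_k=1):
    """Route vector x to the top_k experts and mix their outputs."""
    # Router: one logit per expert, the dot product of its gate row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the chosen experts
    out = [0.0] * len(x)
    for i in chosen:
        for j, yj in enumerate(experts[i](x)):
            out[j] += (probs[i] / norm) * yj
    return out, chosen


# Two toy experts (hypothetical): one doubles the input, one negates it.
experts = [lambda v: [2 * t for t in v], lambda v: [-t for t in v]]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]

out, chosen = moe_forward([3.0, 1.0], experts, gate_weights, top_k=1)
print(chosen, out)
```

With `top_k=1` only the winning expert runs. With `top_k=2` both run and their outputs are blended by the router's probabilities.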


WizardCoder-15B

WizardCoder-15B is a large language model (LLM) designed specifically for coding tasks. It was developed with the Evol-Instruct method, adapted to the domain of code-related instructions. This specialized training approach helps the model understand coding instructions and generate code accurately.

The foundation of WizardCoder-15B lies in the fine-tuning of the Code LLM, StarCoder, which has been widely recognized for its exceptional capabilities in code-related tasks. By utilizing a newly created instruction-following training set, WizardCoder has been tailored to provide unparalleled performance and accuracy when it comes to coding.

Although WizardCoder-15B is primarily designed for coding tasks and not intended for conversation or other logic-based tasks, its coding capabilities are on par with, if not better than, renowned LLMs like ChatGPT, Bard, and Claude. Whether you need assistance in code generation, understanding programming concepts, or debugging, WizardCoder-15B is a reliable and powerful tool that can significantly enhance your coding experience.

Model Name: WizardCoder-15B
Specialization: Coding tasks
Training Method: Evol-Instruct method
Foundation: Fine-tuned from StarCoder
Performance: Comparable to ChatGPT, Bard, and Claude for coding tasks
Use Cases: Code generation, understanding programming concepts, debugging

Stable Code 3B

Stable Code 3B is a state-of-the-art Large Language Model (LLM) developed by Stability AI. It is a 3 billion parameter model that allows accurate and responsive code completion. This model is on par with models such as CodeLLaMA 7b that are 2.5 times larger.

One of the key features of Stable Code 3B is its ability to operate offline even without a GPU on common laptops such as a MacBook Air. This model was trained on software engineering-specific data, including code. It offers more features and significantly better performance across multiple languages.

Stable Code 3B also supports Fill in the Middle capabilities (FIM) and expanded context size. It is trained on 18 programming languages (selected based on the 2023 StackOverflow Developer Survey) and demonstrates state-of-the-art performance on the MultiPL-E metrics across multiple programming languages tested.

In the range of models with 3 billion to 7 billion parameters, Stable Code 3B stands out as one of the best due to its high-level performance and compact size. It is 60% smaller than CodeLLaMA 7b while offering similar performance, making it a top choice for developers seeking efficient and effective code completion tools.
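Model size matters for local use mainly because the weights have to fit in memory. Here is a rough sketch of the arithmetic: parameters times bytes per parameter. It covers the weights only and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The precision table is a general rule of thumb, not a spec for any particular runtime.

```python
# Approximate storage cost per parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 10^9 bytes)."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[precision]


for name, size in [("3B model", 3.0), ("7B model", 7.0)]:
    row = ", ".join(
        f"{p}: {weight_memory_gb(size, p):.1f} GB" for p in BYTES_PER_PARAM
    )
    print(f"{name} -> {row}")
```

By this estimate a 3B model needs about 6 GB at fp16 and about 1.5 GB at 4-bit, which is why it can fit on an ordinary laptop.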

Code Llama (7B, 13B, 34B)

Code Llama offers a trio of powerful open source coding language models, available in 7B, 13B, and 34B parameter configurations. Each variant is trained on an impressive 500B tokens of code and code-related data, ensuring robust performance. Notably, the 7B and 13B models possess fill-in-the-middle (FIM) capabilities, facilitating seamless code insertion and immediate code completion tasks.

These models cater to diverse requirements. The 7B model is suitable for single GPU deployment, prioritizing efficiency. The 34B model, while slightly more resource-intensive, delivers superior coding assistance, enhancing code quality. Furthermore, Code Llama extends its functionality through two specialized variations: Code Llama – Python and Code Llama – Instruct.

Code Llama – Python stands out as a language-specific adaptation, fine-tuned on an extensive 100B tokens of Python code. This specialization enhances its aptitude for generating Python code, a widely used language in the coding landscape. As Python and PyTorch hold significance in the AI community, this tailored model is poised to provide enhanced utility and performance for users.

Model Name: Code Llama (7B, 13B, 34B)
Parameter Configurations: 7B, 13B, 34B
Training Data: 500 billion tokens of code and code-related data
Features: FIM capabilities for code insertion and completion
Specializations: Code Llama – Python and Code Llama – Instruct
Specialization Details: Code Llama – Python fine-tuned on 100 billion tokens of Python code


OctoCoder

OctoCoder is an AI coding language model with 15.5 billion parameters. It is the result of instruction tuning StarCoder on the CommitPackFT and OASST datasets, as described in the OctoPack research paper. OctoCoder is a polyglot, proficient in over 80 programming languages, which makes it a versatile tool for a wide range of coding tasks. At this size it needs more memory than the smaller models on this list, so how well it runs locally depends on your hardware and configuration.
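The idea behind training on commit data is that a commit already looks like an instruction: the message says what to do, and the diff shows the result. Here is a minimal sketch of turning one commit into a prompt/target pair. The field names and the prompt template are hypothetical and are not the format used in the OctoPack paper.

```python
from dataclasses import dataclass


@dataclass
class CommitSample:
    message: str   # commit message, used as the instruction
    old_code: str  # file contents before the commit
    new_code: str  # file contents after the commit (the training target)


def to_training_pair(sample: CommitSample) -> tuple[str, str]:
    """Turn one commit into a (prompt, target) pair. Template is illustrative."""
    prompt = (
        f"Instruction: {sample.message.strip()}\n"
        f"Code before:\n{sample.old_code.rstrip()}\n"
        "Code after:\n"
    )
    return prompt, sample.new_code


# A made-up commit for demonstration.
sample = CommitSample(
    message="Fix off-by-one in range\n",
    old_code="for i in range(1, n):\n    total += i\n",
    new_code="for i in range(1, n + 1):\n    total += i\n",
)
prompt, target = to_training_pair(sample)
print(prompt + target)
```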

Model Name: OctoCoder
Parameters: 15.5 billion
Training Data: Trained on CommitPackFT and OASST datasets
Proficiency: Proficient in over 80 programming languages
Resource Requirements: Varies across hardware setups, configurable for different systems

Redmond-Hermes-Coder 15B

Redmond-Hermes-Coder 15B is an advanced open source coding language model that has been fine-tuned on a comprehensive dataset of over 300,000 instructions. Developed by Nous Research in collaboration with Teknium, Karan4D, Redmond AI, and other contributors, this model represents the forefront of code generation technology.

Built upon the WizardCoder framework, Redmond-Hermes-Coder 15B outperforms its predecessor, the Nous-Hermes model based on Llama, in terms of coding proficiency. While it may not match the pure code benchmarks set by WizardCoder, such as HumanEval, it showcases exceptional capabilities in generating high-quality code.

Preliminary experiments have shown that Redmond-Hermes-Coder 15B achieved a score of 39% on HumanEval, while WizardCoder attained a score of 57%. However, ongoing efforts are being made to further improve its performance.

What sets Redmond-Hermes-Coder 15B apart is its versatility beyond coding tasks. In various non-code applications, including writing assignments, it has demonstrated promising results, even surpassing WizardCoder. This broadens its utility and makes it a valuable tool for both coding and writing endeavors.

Model Name: Redmond-Hermes-Coder 15B
Training Data: Comprehensive dataset of over 300,000 instructions
Collaborators: Nous Research, Teknium, Karan4D, Redmond AI, and others
Framework: Built upon the WizardCoder framework
Performance: Achieved a score of 39% on HumanEval, versatile for non-code applications
Ongoing Improvements: Efforts underway to enhance performance


Phi-1

Phi-1, developed by Microsoft, is a compact yet powerful open source coding language model with 1.3 billion parameters. Designed for Python programming, it’s known for its efficiency on resource-constrained hardware. Trained on diverse data sources, including StackOverflow and code contests, Phi-1 achieves over 50% accuracy on simple Python coding tasks. It has limits, though: its training data leans on a small set of common Python packages, so it is less reliable with other libraries, and there’s a slight risk of it replicating scripts found online. Still, Phi-1 offers a valuable solution for Python developers, balancing power and resource efficiency.

Model Name: Phi-1
Parameters: 1.3 billion
Training Data: Diverse sources, including StackOverflow and code contests
Performance: Over 50% accuracy on Python coding tasks
Resource Efficiency: Efficient on resource-constrained hardware

DeciCoder 1B

DeciCoder 1B stands out as one of the smallest AI coding models covered in this article. With just 1 billion parameters, it’s designed specifically for code completion tasks. It was trained on the Python, Java, and JavaScript subsets of the StarCoder training dataset using a Fill-in-the-Middle training objective. It uses Grouped Query Attention, in which groups of query heads share key/value heads to cut memory use and speed up inference, and it has a context window of 2048 tokens.
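Grouped Query Attention lets several query heads share one key/value head, which shrinks the key/value cache the model keeps during generation. Here is a small sketch of the head mapping and the resulting cache size. The head counts, layer count, and head dimension are made-up illustration values and not DeciCoder's real configuration.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Index of the shared key/value head that a given query head reads from."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of the key/value cache; the leading 2 covers keys plus values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration: 8 query heads sharing 2 KV heads.
mapping = [kv_head_for_query_head(q, n_q_heads=8, n_kv_heads=2) for q in range(8)]
print(mapping)

full = kv_cache_bytes(n_layers=20, n_kv_heads=8, head_dim=64, seq_len=2048)
grouped = kv_cache_bytes(n_layers=20, n_kv_heads=2, head_dim=64, seq_len=2048)
print(full // grouped)  # the cache shrinks by the group size
```

In this toy setup, going from 8 key/value heads to 2 cuts the cache to a quarter of its size.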

What’s intriguing is that its architecture is the result of Deci’s innovative Neural Architecture Search-based technology, AutoNAC. This model’s strength lies not just in its capabilities, but in its accessibility too—it’s optimized to comfortably run on most everyday hardware, ensuring ease of use for developers.

Model Name: DeciCoder 1B
Parameters: 1 billion
Specialization: Code completion
Training Data: Code subsets from Python, Java, and JavaScript
Feature: Grouped Query Attention for efficient inference
Context Window: 2048 tokens
Architecture: Result of Deci’s Neural Architecture Search-based technology


StableCode-Instruct-Alpha-3B

StableCode-Instruct-Alpha-3B is an open source coding language model with a 3 billion parameter decoder-only architecture. What sets it apart is its focus on instruction-tuned code generation across a diverse range of programming languages, which gives developers an open alternative in a landscape dominated by proprietary coding models.

Its language coverage is guided by the Stack Overflow developer survey: the model is trained on the programming languages that topped that survey. And unlike its proprietary counterparts, it is open source, so developers can engage with it, contribute, and improve its capabilities collaboratively.

Model Name: StableCode-Instruct-Alpha-3B
Parameters: 3 billion
Architecture: Decoder-only architecture
Focus: Instruction-tuned code generation across diverse languages
Open Source: Collaborative development and enhancement capabilities
Language Coverage: Top languages from the Stack Overflow developer survey


CodeGen2.5-7B

CodeGen2.5-7B is an open-source autoregressive language model designed for program synthesis. It builds upon its predecessor, CodeGen2, and is trained on the StarCoderData dataset for 1.4 trillion tokens. Despite its smaller size, CodeGen2.5-7B delivers results competitive with larger models like StarCoderBase-15.5B.

This LLM offers infilling capabilities, allowing it to generate code to fill in missing or incomplete sections. It supports multiple programming languages, making it versatile for developers. Additionally, it has been further trained on Python tokens to enhance its language proficiency, resulting in the CodeGen2.5-7B-mono variant.

For research purposes, CodeGen2.5-7B-instruct is available, which continues training from CodeGen2.5-7B-mono using instruction data. However, it’s important to note that this variant is only intended for research and not licensed for general use.

CodeGen2.5-7B and its mono variant are released under the Apache-2.0 license, so developers and researchers can use and build on them; the research-only instruct variant is the exception. With competitive performance and support for multiple programming languages, CodeGen2.5-7B is a strong choice for program synthesis tasks.

SK is a versatile writer deeply passionate about anime, evolution, storytelling, art, AI, game development, and VFX. His writings transcend genres, exploring these interests and more. Dive into his captivating world of words and explore the depths of his creative universe.