ServiceNow, Hugging Face and NVIDIA have once again combined their efforts to advance a generative artificial intelligence (AI) platform specifically trained to generate higher-quality code than rival approaches based on general-purpose large language models (LLMs) such as ChatGPT.
StarCoder2 is a family of LLMs created by the BigCode community led by ServiceNow. It arrives in three sizes: a 3 billion-parameter model trained by ServiceNow, a 7 billion-parameter model trained by Hugging Face and a 15 billion-parameter model built by NVIDIA.
The 3 billion-parameter model matches the performance of the original 15 billion-parameter StarCoder model.
Nicolas Chapados, vice president of research for ServiceNow, said as AI research continues to advance, each successive wave of LLMs becomes more efficient. Each organization, depending on the use case and total cost, can then opt to deploy StarCoder LLMs either on a local machine or in the cloud. The more parameters an LLM has, the more infrastructure resources are needed to run any inference engine based on it.
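For teams weighing that local-versus-cloud decision, a minimal sketch of what local use looks like follows, assuming the smallest checkpoint is published on the Hugging Face Hub under the BigCode organization as bigcode/starcoder2-3b (adjust the model ID and device settings to your environment):

```python
# Minimal sketch: running a StarCoder2 model locally for code completion
# with the Hugging Face transformers library.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoder2-3b"  # smallest of the three StarCoder2 sizes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps memory needs modest
    device_map="auto",           # place weights on a GPU if one is available
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At this size, the model can complete functions on a single workstation-class GPU; the larger 7 billion- and 15 billion-parameter variants trade higher infrastructure requirements for stronger results.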
The StarCoder2 LLMs were trained on 619 programming languages and can be extended to, for example, provide summarizations of code. The foundation of StarCoder2 is a new code dataset, The Stack v2, that is more than seven times larger than the first version of the dataset. Additional training has also enabled the model to understand low-resource programming languages such as COBOL, as well as mathematics and discussions of program source code. Organizations can also fine-tune models with industry- or organization-specific data using open source tools such as the NVIDIA NeMo platform or the Hugging Face transformer reinforcement learning (TRL) library. StarCoder2 was built using data licensed from the Software Heritage digital commons hosted by Inria.
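As a rough illustration of that fine-tuning step, the sketch below uses TRL's SFTTrainer to continue training the 3 billion-parameter model on an internal code dataset. The dataset name is a placeholder, and SFTTrainer's exact arguments differ between TRL releases, so treat this as a starting point rather than a recipe:

```python
# Hedged sketch: supervised fine-tuning of StarCoder2 with Hugging Face TRL.
# "your-org/internal-code" is a hypothetical dataset; by default SFTTrainer
# expects a "text" column, and its keyword arguments vary across TRL versions.
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("your-org/internal-code", split="train")  # placeholder

trainer = SFTTrainer(
    model="bigcode/starcoder2-3b",  # TRL can load a model from its Hub ID
    train_dataset=dataset,
)
trainer.train()
```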
The overall goal is to provide development teams with LLMs that, in addition to being less likely to hallucinate, also generate code with far fewer vulnerabilities, because the generated code is based on examples vetted by the BigCode community, said Chapados.
It’s not clear right now how much of the code finding its way into production environments is being generated by machines versus humans. However, as LLMs continue to advance, the percentage created by machines will undoubtedly increase. The challenge DevOps teams now face is determining how to apply AI to pipelines that will increase in number as codebases not only proliferate but also inevitably grow larger.
In the meantime, DevOps teams would be well-advised to become familiar with the type of LLM used to train the generative AI platforms they rely on to create code. The more general-purpose an LLM is, the more likely it is to generate suboptimal code that DevOps teams will eventually need to replace, especially if that code consumes expensive infrastructure resources inefficiently.
There is, of course, no going back on generative AI at this point. The issue now is keeping track of the provenance of all the code being generated so teams can determine how best to fix it whenever needed.