Meta AI has released Code Llama 70B, a new version of its advanced code generation model. One of the largest open-source AI models for code generation, the new model is a significant upgrade over its predecessor, both faster and more accurate.
Code Llama 70B has been trained on 1 trillion tokens of code and code-related data, twice as many as the smaller Code Llama models, and has a large context window of up to 100,000 tokens, allowing it to process and generate longer and more complex code across a range of languages including C++, Python, PHP and Java.
Based on Llama 2, one of the biggest general-purpose large language models (LLMs) in the world, Code Llama 70B has been fine-tuned for code generation. Like other transformer-based models, it relies on self-attention, a mechanism that lets each token weigh every other token in the input and so helps the model capture relationships and dependencies in code.
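To make the idea of self-attention more concrete, here is a minimal sketch in plain Python. It is a toy illustration, not Meta's implementation: the two-dimensional token vectors are made up, and the learned query, key and value projections, multiple attention heads and causal masking used by real models are all left out as simplifying assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Toy scaled dot-product self-attention.

    Assumption: each token vector serves as its own query, key and value.
    Real transformers apply learned projections before this step.
    """
    dim = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        # Score the current token against every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        # Blend all token vectors according to the attention weights.
        output = [
            sum(w * value[i] for w, value in zip(weights, embeddings))
            for i in range(dim)
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings for the tokens "x", "=", "x".
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

In this toy input, the two identical "x" tokens attend to each other more strongly than to "=", which is the kind of relationship a model can use to track where a variable is used.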
Uphill battle
Another highlight of the new release is CodeLlama-70B-Instruct, a variant fine-tuned to understand natural language instructions and generate code accordingly.
Meta CEO Mark Zuckerberg stated, “The ability to code has also proven to be important for AI models to process information in other domains more rigorously and logically. I’m proud of the progress here, and looking forward to including these advances in Llama 3 and future models as well.”
Code Llama 70B is available for free download under the same license as Llama 2 and previous Code Llama models, allowing both researchers and commercial users to use and modify it.
Despite the improvements, Meta faces an uphill battle to win over developers currently using GitHub Copilot, the number one AI tool for developers, created by GitHub and OpenAI. Many devs are also suspicious of Meta and its data collection processes, and a lot aren’t fans of AI-generated code in the first place. Such code can often require serious debugging, and non-programmers may be happy to use it without understanding it, leading to problems down the line.