Code Llama is a large language model (LLM) that generates code from text prompts.
Two variants of Code Llama have been further refined: Code Llama – Python and Code Llama – Instruct.
Code Llama – Python is a language-specialized version fine-tuned on 100B tokens of Python code. Because Python is so widely used, and so important to the artificial intelligence community, Meta considered it essential to offer a model dedicated to Python code generation.
Code Llama – Instruct is a variant trained with instruction fine-tuning. During this training the model receives a natural language instruction as input, paired with the expected output, which helps it better understand what people want from their prompts. All Code Llama models are specialized for code-specific tasks and are not suitable as a base model for other purposes.
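The pairing of an instruction with its expected output can be pictured with a minimal sketch. The `InstructionExample` class, the `to_training_text` helper, and the "### Instruction / ### Response" template are hypothetical names chosen for illustration. They are not the format Meta used to train Code Llama – Instruct.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    """One hypothetical training record: a request plus the answer we want."""
    instruction: str
    expected_output: str


def to_training_text(example: InstructionExample) -> str:
    """Join instruction and expected output into a single training string.

    The template below is an assumption for illustration only.
    """
    return (
        "### Instruction:\n"
        f"{example.instruction}\n\n"
        "### Response:\n"
        f"{example.expected_output}"
    )


example = InstructionExample(
    instruction="Write a Python function that returns the square of a number.",
    expected_output="def square(x):\n    return x * x",
)
training_text = to_training_text(example)
```

Fine-tuning on many such records is what teaches a model to map a plain-language request to the kind of answer a person expects.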
Benchmark results show that Code Llama outperforms open-source, code-specific LLMs and also Llama 2. For example, Code Llama 34B scored 53.7% on HumanEval and 56.2% on Mostly Basic Python Programming (MBPP), the highest scores among comparable open solutions and on par with ChatGPT.
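To make percentages like these concrete, here is a minimal sketch of how a functional-correctness score can be computed, assuming the score is the share of problems whose generated solution passes that problem's unit tests. The helper names and the toy problems are hypothetical. Real benchmark harnesses run generated code in a sandbox and differ in detail.

```python
from typing import Any, Dict, List, Tuple

# A test case is (positional arguments, expected return value).
TestCase = Tuple[tuple, Any]


def passes_tests(source: str, entry_point: str, tests: List[TestCase]) -> bool:
    """Execute generated source and check the entry point against every test."""
    namespace: Dict[str, Any] = {}
    try:
        # Assumption: trusted toy input. Real harnesses isolate this step.
        exec(source, namespace)
        func = namespace[entry_point]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False


def pass_rate(problems: List[Dict[str, Any]]) -> float:
    """Return the percentage of problems whose completion passes all tests."""
    if not problems:
        return 0.0
    passed = sum(
        passes_tests(p["completion"], p["entry_point"], p["tests"])
        for p in problems
    )
    return 100.0 * passed / len(problems)


# Two toy problems: the first completion is correct, the second is buggy.
toy_problems = [
    {
        "completion": "def add(a, b):\n    return a + b",
        "entry_point": "add",
        "tests": [((1, 2), 3), ((0, 0), 0)],
    },
    {
        "completion": "def double(x):\n    return x + 1",
        "entry_point": "double",
        "tests": [((2,), 4)],
    },
]
score = pass_rate(toy_problems)
```

With one of the two toy solutions passing, `score` is 50.0.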
Code generators have been helping developers with their work for some time. GitHub announced Copilot X, powered by OpenAI's GPT-4 model, in March. Amazon AWS offers a comparable tool, CodeWhisperer, which helps with generating, validating, and updating code. Finally, Google is working on a code-writing tool called AlphaCode, which is still under development and has not yet been released to the public.
Recently, the legal landscape surrounding generative code tools has become contentious. Copilot has been accused in court of violating copyright law by reproducing licensed code. The case highlights the complexities that arise when innovative artificial intelligence technologies intersect with intellectual property rights, and its outcome could have significant implications for the future of code generation tools and how they handle copyrighted code.