
OpenAI Codex (language model)

OpenAI Codex is a large language model developed by OpenAI for translating natural-language prompts into source code. Announced in 2021, it was a modified production version of GPT-3 that was fine-tuned on source code in multiple programming languages, and it served as the original model for GitHub Copilot.

Capabilities
Built on GPT-3, Codex was additionally trained on 159 gigabytes of Python code drawn from 54 million GitHub repositories. A typical use case is for a user to type a comment, such as "//compute the moving average of an array for a given window size", and have the model suggest a block of code that satisfies that prompt. OpenAI stated that Codex could complete approximately 37% of requests in its evaluation set and was intended to make human programmers faster rather than to replace them. According to OpenAI's blog, Codex excels most at "mapping... simple problems to existing code", which the company describes as "probably the least fun part of programming". Jeremy Howard, co-founder of Fast.ai, said, "Codex is a way of getting code written without having to write as much code", and that "it is not always correct, but it is just close enough".

OpenAI claims that Codex can generate code in over a dozen programming languages, including Go, JavaScript, Perl, PHP, Ruby, Shell, Swift, and TypeScript, though it is most effective in Python. According to VentureBeat, OpenAI demonstrations suggested that Codex could keep track of earlier parts of a prompt and use that context to generate working code. In these demonstrations, it was used to create a browser game in JavaScript and to generate data-visualization code using matplotlib.
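For illustration, a completion of the kind Codex might produce for the moving-average comment above could look like the following Python sketch (the function name and error handling are hypothetical, not taken from an actual Codex output):

```python
def moving_average(values, window_size):
    """Compute the simple moving average of a sequence for a given window size."""
    if window_size <= 0 or window_size > len(values):
        raise ValueError("window size must be between 1 and len(values)")
    averages = []
    # Slide the window across the sequence, averaging each slice.
    for i in range(len(values) - window_size + 1):
        window = values[i:i + window_size]
        averages.append(sum(window) / window_size)
    return averages
```

For example, `moving_average([1, 2, 3, 4, 5], 3)` yields `[2.0, 3.0, 4.0]`.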
Limitations and concerns
OpenAI demonstrations also showed weaknesses such as inefficient code and occasional unexpected results in individual examples.

Copyright concerns
The Free Software Foundation expressed concern that code snippets generated by Copilot and Codex could violate copyright, in particular the GPL's condition that derivative works be licensed under equivalent terms. Issues it raised include whether training on public repositories constitutes fair use, how developers could discover infringing generated code, whether trained machine learning models could be considered modifiable source code or a compilation of the training data, and whether machine learning models could themselves be copyrighted and, if so, by whom. An internal GitHub study found that approximately 0.1% of generated code contained direct copies from the training data. In one example, the model output training-data code implementing the fast inverse square root algorithm, including comments and an incorrect copyright notice.
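For context, the fast inverse square root mentioned above is a well-known bit-manipulation trick (popularized by the Quake III Arena source code, originally in C). A Python port sketching the idea, not the reproduced snippet itself, might look like this:

```python
import struct

def fast_inverse_sqrt(x):
    """Approximate 1/sqrt(x) via the classic bit-level hack,
    ported from the original C for illustration."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack('>I', struct.pack('>f', x))[0]
    # The "magic constant" and shift produce a rough first guess.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits back as a float.
    y = struct.unpack('>f', struct.pack('>I', i))[0]
    # One iteration of Newton's method refines the approximation.
    y = y * (1.5 - 0.5 * x * y * y)
    return y
```

For example, `fast_inverse_sqrt(4.0)` returns approximately 0.5.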