=== Initial release ===
The first version of Llama (stylized as LLaMA and sometimes referred to as Llama 1) was announced on February 24, 2023, via a blog post and a paper describing the model's training, architecture, and performance.
=== Leak ===
On March 3, 2023, a torrent containing Llama's weights was uploaded, with a link to the torrent shared on the 4chan imageboard and subsequently spread through online AI communities. On March 4, a pull request was opened to add links to Hugging Face repositories containing the model. On March 20, Meta filed a DMCA takedown request for copyright infringement against a repository containing a script that downloaded Llama from a mirror, and GitHub complied the next day.

Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spam. Others celebrated the model's accessibility, as well as the fact that smaller versions of the model can be run relatively cheaply, suggesting that this would promote further research developments.

=== Llama 2 ===
The model architecture remains largely unchanged from that of Llama 1 models, but 40% more data was used to train the foundation models. Llama 2 includes foundation models and models
fine-tuned for chat. In a further departure from the original version of Llama, all models are released with weights and may be used for many commercial use cases. Because Llama's license enforces an
acceptable use policy that prohibits Llama from being used for some purposes, it is not open source. Meta's use of the term open source to describe Llama has been disputed by the Open Source Initiative (which maintains The Open Source Definition) and others.

=== Code Llama ===
Code Llama is a fine-tune of Llama 2 with code-specific datasets. 7B, 13B, and 34B versions were released on August 24, 2023, with a 70B version released on January 29, 2024. Starting from the Llama 2 foundation models, Meta AI trained them on an additional 500B tokens of code data, followed by a further 20B tokens of long-context data, to create the Code Llama foundation models. These foundation models were then trained on 5B tokens of instruction-following data to create the instruct fine-tunes. A separate foundation model was created for Python code, trained on an additional 100B tokens of Python-only code before the long-context data.
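As a rough illustration of that staged pipeline, the token budgets quoted above can be tallied per variant; the stage names in the following sketch are descriptive labels, not Meta's terminology.

<syntaxhighlight lang="python">
# Illustrative summary (not Meta's code) of the staged token budgets described above.
CODE_LLAMA_INSTRUCT_STAGES = [
    ("code pretraining", 500e9),         # 500B tokens of code data on top of Llama 2
    ("long-context training", 20e9),     # 20B tokens of long-context data
    ("instruction fine-tuning", 5e9),    # 5B tokens of instruction-following data
]

CODE_LLAMA_PYTHON_STAGES = [
    ("code pretraining", 500e9),
    ("Python-only pretraining", 100e9),  # extra 100B tokens of Python code
    ("long-context training", 20e9),
]

for name, stages in [("Code Llama - Instruct", CODE_LLAMA_INSTRUCT_STAGES),
                     ("Code Llama - Python", CODE_LLAMA_PYTHON_STAGES)]:
    total = sum(tokens for _, tokens in stages)
    print(f"{name}: ~{total / 1e9:.0f}B additional tokens beyond Llama 2")
</syntaxhighlight>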
=== Llama 3 ===
On April 18, 2024, Meta released Llama 3 in two sizes: 8B and 70B parameters. The models were pre-trained on approximately 15 trillion tokens of text gathered from "publicly available sources", with the instruct models fine-tuned on "publicly available instruction datasets, as well as over 10M human-annotated examples". Meta AI's testing showed in April 2024 that Llama 3 70B was beating Gemini Pro 1.5 and Claude 3 Sonnet on most benchmarks. Meta also announced plans to make Llama 3 multilingual and multimodal, better at coding and reasoning, and to increase its context window.

Regarding scaling laws, Llama 3 models empirically showed that when a model is trained on more data than the "Chinchilla-optimal" amount, performance continues to scale log-linearly. For example, the Chinchilla-optimal dataset for Llama 3 8B is 200 billion tokens, but performance continued to scale log-linearly up to the 75-times larger dataset of 15 trillion tokens (illustrated in the sketch below). During an interview with Dwarkesh Patel, Mark Zuckerberg said that the 8B version of Llama 3 was nearly as powerful as the largest Llama 2. Zuckerberg stated that the team was surprised the 70B model was still learning even at the end of its 15T-token training run; the decision was made to end training in order to focus GPU power elsewhere.

Llama 3.1 was released on July 23, 2024, in three sizes: 8B, 70B, and 405B parameters.
=== Llama 4 ===
The Llama 4 series was released in 2025. The architecture was changed to a mixture of experts, in which only a fraction of the model's expert sub-networks are activated per input token (sketched below). The models are multimodal (text and image input, text output) and multilingual (12 languages).

• Scout: 17 billion active parameters, 16 experts, a 10M-token context window, and 109B total parameters.
• Maverick: 17 billion active parameters, 128 experts, a 1M-token context window, and 400B total parameters.

The Behemoth model was also announced, but was not released. Meta claimed it had 288 billion active parameters, 16 experts, and around 2T total parameters; it was still in training when Scout and Maverick were released. Maverick was codistilled from Behemoth, while Scout was trained from scratch. The training data included publicly available data, licensed data, and Meta-proprietary data such as publicly shared posts from Instagram and Facebook and people's interactions with Meta AI. The knowledge cutoff was August 2024.

Meta also stated that Llama 4's LMArena benchmark score was achieved using an unreleased "experimental chat version" of the model that was "optimized for conversationality", which differed from the version of Llama 4 released to the public. LMArena indicated that it would change its policies to prevent this incident from reoccurring, and responded, "Meta's interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that 'Llama-4-Maverick-03-26-Experimental' was a customized model to optimize for human preference."
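The mixture-of-experts routing mentioned above can be illustrated with a toy example. The following sketch shows a generic top-k router over small feed-forward "experts"; the dimensions, expert count, and routing details are illustrative assumptions, not Llama 4's actual implementation.

<syntaxhighlight lang="python">
import numpy as np

# Toy mixture-of-experts layer: a router scores every expert for each token,
# but only the top-k experts are actually run per token.
rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 16, 1          # e.g. Scout routes among 16 experts
tokens = rng.standard_normal((8, d_model))     # a batch of 8 token embeddings

router_w = rng.standard_normal((d_model, n_experts))
# each "expert" here is just a single weight matrix, for illustration
experts = rng.standard_normal((n_experts, d_model, d_model)) * 0.02

def moe_layer(x):
    logits = x @ router_w                               # (tokens, n_experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]    # top-k experts per token
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)               # gate weights (softmax over
                                                        # all experts; real systems
                                                        # often renormalize over top-k)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in chosen[t]:                             # only k experts run per token
            out[t] += probs[t, e] * (x[t] @ experts[e])
    return out

print(moe_layer(tokens).shape)   # (8, 64): output shape unchanged, few experts active
</syntaxhighlight>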
== Comparison of models ==
For the training cost column, only the largest model's cost is written by default. For example, "21,000" is the training cost of Llama 2 70B in units of petaFLOPS-day. Also, 1 petaFLOPS-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP. "T" means "trillion" and "B" means "billion".

The following table lists the main model versions of Llama, describing the significant changes included with each version:

== Architecture and training ==