Meta’s Llama 2 Elbows Into a Still-Open Field

Half of ChatGPT 3.5’s size, it’s portable to smartphones and open to interface


Last week, Meta introduced Llama 2, a new large language model with up to 70 billion parameters. The new generative AI system represents a spectacular shot across the bow of OpenAI, which shares few details about most of its AI models, including GPT-3/3.5 and GPT-4. Llama 2’s release—with 40 percent of the parameters of ChatGPT 3.5, according to Wikipedia—included a prominent partnership with Microsoft. And Redmond is not just a nominal partner either, having recently announced support for Llama 2 in Azure and Windows. Meanwhile, Qualcomm says it’s entering the LLM fray as well, unveiling plans to bring Llama 2 to smartphones.

Slightly more contentious is Meta’s claim that Llama 2 is open source. Meta and Microsoft certainly tout the new Llama’s open-source credentials. (Some open-source developers, on the other hand, beg to differ.)


Whatever the sourcing, last week’s developments spell a dramatic expansion in the capability and reach of open-source AI models.

“Oh, this is just way better,” says Aravind Srinivas, cofounder and CEO of the AI search startup Perplexity.ai. “Whether they match GPT 3.5 or not [now], it’s just a matter of time.”

An impressive free online demo, “Llama 2: Fine-tuned and Ready to Chat,” lets anyone try multiple Llama 2 models. Its results are competitive with today’s top chatbots, including ChatGPT and Google Bard. Llama 2 rapidly generates clean, natural text that, although unlikely to win awards, is easy to read and understand. Llama 2 can also recall commonly known facts, generate code, and solve mathematical equations.

Llama 2, like all LLMs, will occasionally generate incorrect or unusable answers, but Meta’s paper introducing Llama 2 claims it’s on par with OpenAI’s GPT 3.5 in academic benchmarks such as MMLU (which measures an LLM’s knowledge across 57 STEM subjects) and GSM8K (which measures an LLM’s ability to solve grade-school math problems).


Meta’s researchers achieved this partially through sheer model size, but that’s only half the story. Llama 2 uses supervised fine-tuning, reinforcement learning from human feedback, and a novel technique called Ghost Attention (GAtt) which, according to Meta’s paper, “enables dialogue control over multiple turns.” Put more simply, GAtt helps Llama 2 generate desired results when asked to work within a specific constraint, as might occur when asked to “act as” a historical figure, or to produce responses within the context of a specific topic, such as architecture.

[Image: Two chat windows showing Llama 2’s “Ghost Attention” in action. One asks the chatbot to answer all questions in the form of a haiku, while the other asks it to provide responses relevant to architecture, if possible.] Llama 2’s “Ghost Attention,” the LLM’s promoters say, helps the model provide conversational results that fit user-defined constraints. Meta
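The core idea behind GAtt can be loosely illustrated in a few lines of Python. This is a simplification, not Meta’s implementation: during fine-tuning, a persistent instruction is synthetically attached to every user turn of a multi-turn dialogue, so the model learns to honor the constraint across the whole conversation rather than only on the turn where it appeared. The function name and dialogue contents here are illustrative.

```python
# Loose sketch of the Ghost Attention data-construction idea: attach a
# persistent instruction to each user turn before fine-tuning, so the
# constraint survives across many turns of dialogue.

def attach_instruction(instruction: str, dialogue: list[str]) -> list[str]:
    """Prepend the persistent instruction to each user turn.

    Turns alternate user/assistant, starting with the user (even indices).
    Assistant turns are left unchanged.
    """
    augmented = []
    for i, turn in enumerate(dialogue):
        if i % 2 == 0:  # user turn
            augmented.append(f"{instruction} {turn}")
        else:           # assistant turn
            augmented.append(turn)
    return augmented

dialogue = [
    "What is a transformer?",         # user
    "A neural network architecture.", # assistant
    "And what is attention?",         # user
]
augmented = attach_instruction("Always answer as a haiku.", dialogue)
```

In Meta’s actual training setup, the instruction is then dropped from earlier turns while the loss still rewards responses that respect it; the sketch above shows only the synthetic-concatenation step.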

These techniques help Llama 2 offer a diverse range of models with solid benchmark performance relative to their size. The largest model, Llama 2 70B (with 70 billion parameters), performs best across all benchmarks, but Meta also provides Llama 2 7B and Llama 2 13B.

Variants with fewer parameters don’t perform as well as Llama 2 70B, but they’re compact enough to run locally on less powerful devices—like smartphones. Qualcomm, a leading producer of smartphone systems-on-a-chip (SoCs), announced a partnership with Meta to have Llama 2 running locally on Qualcomm-powered smartphones “starting in 2024.”

“We are able to use our software tools to compile and optimize the model specifically to run on our Hexagon processor,” says Rodrigo Caruso Neves do Amaral, marketing communications specialist at Qualcomm. “The amount of energy that is saved by running on the device makes a huge impact, whether it’s to the companies that are running these models, or to the consumer that sometimes would have to pay for getting access to these applications.”

Open Source Fits Where Closed Models Can’t

Running a large language model offline on a smartphone is something closed AI models (like OpenAI’s GPT 3.5 and Google’s PaLM 2) can’t offer. This isn’t necessarily due to technical limitations (presumably, OpenAI and Google could build a model suitable for a smartphone) but instead a philosophical divide. OpenAI and Google provide their LLMs as an API: an Internet connection is required to access it, and customers are charged based on use.
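The contrast can be made concrete with a short sketch. The payload below follows the shape of OpenAI’s public chat-completions API as of mid-2023; treat the endpoint and field names as illustrative, and note that no request is actually sent here.

```python
import json

# Closed models: every query is an HTTP call to a hosted endpoint,
# authenticated with an API key and billed per token.
API_URL = "https://api.openai.com/v1/chat/completions"

def build_hosted_request(prompt: str) -> dict:
    """Build an OpenAI-style chat request body (illustrative only)."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_hosted_request("Explain attention in one sentence.")
body = json.dumps(payload)  # what would be POSTed over the network

# Open weights like Llama 2, by contrast, can be downloaded once and run
# locally (e.g. via llama.cpp or Hugging Face transformers), with no
# network connection or per-request charge.
```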

Llama 2, by contrast, was released with a license that allows unlimited, free commercial and academic use. The license doesn’t meet all standards set by the Open Source Initiative, as it includes a clause requiring permission to use Llama 2 in “products or services” with “greater than 700 million monthly active users.” However, this clause is relevant only to Meta’s largest competitors, such as OpenAI and Google. Meta’s Llama 2 models already appear on HuggingFace’s Open LLM leaderboard, with “llama-2-70b-chat-hf” ranked third as of close of Monday, 24 July. (AI developers are quickly exploiting Llama 2’s potential: The top model as of press time, Stability AI’s FreeWilly2, is in fact already based on Llama 2, fine-tuned with a different dataset.)

[Image: A screenshot of HuggingFace’s Open LLM Leaderboard, which ranks open-source LLMs, with “llama-2-70b-chat-hf” in second place.] As of 21 July, the AI aggregator HuggingFace’s Open LLM Leaderboard placed “llama-2-70b-chat-hf” as the second-highest performer among all open LLMs. HuggingFace

Srinivas sees Llama 2’s open-source license as a force multiplier, providing developers and researchers an opportunity to tune the model for their specific needs. “One person can start a fork of Llama 2 where they focus on quantization, another person can start a fork of Llama where they focus on low-rank fine-tuning, …another person can work on the distillation of larger models into smaller models. The progress just accelerates.”

This will prove particularly relevant for developers targeting devices on the edge—such as smartphones. The fact that Llama 2 70B performs well is no great surprise, given the size of the model. But Llama 2’s smaller models also rank well relative to their model size. And most small models that outperform Llama 2 on the Open LLM leaderboard are themselves based on Meta’s prior model, Llama. That suggests Llama 2 will race up the charts as developers in the open-source community apply their talents to Llama 2.

“I think that [Llama 2 7B and Llama 2 13B] are already exciting. ... This is just the start, right? [Meta] put it out, and now people can improve on it,” says Srinivas. “Other frameworks and other engineering layers can be built, and this gives more power to everybody.”
