haifeng_jin

Llama Is Open-Source, But Why?

An analysis of Meta’s open-source large model strategy

June 2024

Training a large language model can cost millions of dollars. Why would Meta spend that much money training a model, only to let everyone use it for free?

This article analyzes Meta’s GenAI and large-model strategy to understand the reasoning behind open-sourcing its large models. We also discuss how this wave of open-source models is similar to, and different from, traditional open-source software.

DISCLAIMER: Whether the Llama models are genuinely open-source falls outside the scope of this article. All information is from public sources.

The illusion of proprietary models

If Meta made its models open-source, how could it make money? People would just build their own services instead of paying for the ones Meta provides, for example, the chatbot on meta.ai, an API based on Llama, or services for fine-tuning and serving the model.

However, you can never prevent people from building their own solutions, and keeping your models proprietary does not change that. Regardless of what you do, others, like Mistral AI, Alibaba, and even Google, have open-sourced their models.

Users build on the best open-source models available. So whether your model is open-source hardly matters unless it stays better than the best open-source models in the long run, and I don't think many companies are that confident.

So you only have two choices: be the first mover and lead the open-source models, or be a follower and release your models later.

Why be the leader of open-source models?

Being the leader of open-source models has many benefits, but the most important is attracting talent.

The GenAI war is a talent competition bottlenecked by computing power. How much computing power you get largely depends on two things: your cash flow and your relationship with Nvidia. How much talent you have is another story.

According to Elon Musk, Google had two-thirds of the AI talent pool in the early 2010s. To counter Google’s power, he and others founded OpenAI. Then some of the best people left OpenAI and founded Anthropic to focus on AI safety. These three companies hold the best and the most AI experts on the job market, and everyone else is hungry for more.

Being the leader of open-source models would help Meta close this talent gap. Open-source models attract talent in two ways.

First, AI experts want to work for Meta. It is exciting to have the whole world use a model you built: it gives your work exposure, amplifies your professional impact, and benefits your future career. So many talented people would like to join.

Second, the AI experts in the community do work for Meta for free. Right after the release of Llama, people started to experiment with it. They help develop new serving techniques to reduce costs, fine-tune the models to discover new applications, and scrutinize them for vulnerabilities to make them safer. For example, according to this article, the community produced instruction tuning, quantization, quality improvements, human evals, multimodality, and RLHF for Llama within a month of its initial release. Delegating this work to the community saves Meta huge amounts of computing and human resources.
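As a rough illustration of the quantization piece of that community work, here is a minimal sketch of loading a Llama checkpoint in 4-bit precision with Hugging Face transformers and bitsandbytes. It assumes the usual Hugging Face stack (transformers, accelerate, bitsandbytes) and uses a Hub model ID purely for illustration; it is not a description of any particular community project.

```python
# Minimal sketch: 4-bit quantized inference with a Llama checkpoint.
# Assumes transformers, accelerate, and bitsandbytes are installed and that
# you have accepted the Llama license for this model ID on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B"  # illustrative choice

# Quantize the weights to 4 bits at load time, cutting memory roughly 4x vs fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)

prompt = "Open-source models matter because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Quantization like this is exactly the kind of contribution Meta gets back from the community without spending its own engineering time on it.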

Iterate fast with the community.

With open-source models, Meta can iterate quickly with the community by directly incorporating their newly developed methods.

How much would it cost Google to adopt a new method from the community? The process has two phases: implementation and evaluation. First, they need to reimplement the method for Gemini, which means rewriting the code in JAX and requires a fair amount of engineering resources. Then, during evaluation, they need to run a list of benchmarks on it, which requires a lot of computing power. Most importantly, it takes time, which keeps them from iterating on the latest techniques as soon as they appear.

Conversely, if Meta wants to adopt a new method from the community, it costs them very little. The community has already run the experiments and benchmarks on the Llama models directly, so not much further evaluation is needed. And the code is written in PyTorch, so they can essentially copy and paste it into their system.

Llama built a flywheel between Meta and the community. Meta brings in the latest technology from the community and rolls out its next-generation model to the community. PyTorch is the common language they speak.

Can they still make money?

The model is open-source. Wouldn’t people just build their own service? Why would they pay Meta for a service built on an open-source model? Of course they will, because the service is difficult to build even with an open-source model.

How do you fine-tune and align the model to your specific application? How do you balance between the service cost and the model quality? Are you aware of all the tricks to fully utilize your GPUs?

The people who know the answers to these questions are expensive to hire. Even with enough people, it is hard to get the computing power to fine-tune and serve the model. Imagine how hard it is to build Meta AI from the open-source Llama model. I would expect hundreds of employees and GPUs to be involved.
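To give a sense of what just the fine-tuning question above involves, here is a hedged sketch of parameter-efficient fine-tuning with LoRA adapters via the peft library. The model ID, target modules, and hyperparameters are illustrative placeholders, and the real expense lies in the data pipeline, evaluation, alignment, and serving infrastructure around a snippet like this.

```python
# Hedged sketch: attach LoRA adapters to a Llama-style model with peft.
# The model ID and hyperparameters are placeholders, not a production recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Meta-Llama-3-8B"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# Train small low-rank adapter matrices instead of all of the base weights.
lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections in Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the parameters are trainable

# From here, a standard PyTorch training loop (or transformers.Trainer) over
# domain-specific data would update only the adapter weights; the evaluation,
# alignment, and serving work is where most of the cost actually sits.
```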

So, it is likely that people will still pay for Meta’s GenAI services, if it offers any in the future.

It’s just like open-source software, but not quite.

The situation is very similar to traditional open-source software. The “free code, paid service” framework still applies: the code, or the model, is free in order to attract more users into the ecosystem; with a larger ecosystem, the owner collects more benefits; and the service built on top of the free code is what makes the profit.

However, it is also NOT quite like open-source software. The main differences are lower user retention and a new type of ecosystem.

Low user retention

Open-source models have lower user retention because migrating to a new model is much easier than migrating to new software.

It is hard to migrate software. PyTorch and Hugging Face have established a strong ecosystem of deep learning frameworks and model hubs. Imagine how hard it would be to shift their dominance even slightly if you created a new framework or model hub to compete with them.

JAX is a good example. It has better support for large-scale distributed training, but onboarding users is hard because its ecosystem and community are smaller, so users get less help when they run into issues. Moreover, the engineering cost of migrating an entire infrastructure to a new framework is too high for most companies.

Open-source models do not have these problems. They are easy to migrate to and require almost no user support, so it is easy for people to switch to the latest and best models. To maintain your leadership in open-source models, you must constantly release new models at the top of the leaderboards. This is also a downside, or at least a challenge, of being the leader in open-source models.

A new type of ecosystem

Open-source models create a new type of ecosystem. Unlike open-source software, which creates ecosystems of contributors and new software built upon them, open-source models create ecosystems of fine-tuned and quantized models, which can be seen as forks of the original model.

As a result, an open-source foundational model doesn’t have to excel at every specific task, because users will fine-tune it for their applications with domain-specific data. The most important requirement for a foundational model is that it meets users’ deployment constraints, such as low inference latency or being small enough to fit on an end device.

This is why each Llama release comes in multiple sizes. Llama-3, for example, comes in three sizes: 8B, 70B, and 400B. Meta wants to make sure every deployment scenario is covered.
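As a back-of-the-envelope check on how those sizes map to deployment targets, the sketch below estimates weight memory at two precisions. It counts weights only, ignoring activations and the KV cache, and the precision choices are illustrative.

```python
# Rough weight-memory estimates for the Llama-3 sizes at two precisions.
# Weights only; activations and the KV cache add more memory on top of this.
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in gigabytes."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for size_b in (8, 70, 400):
    fp16_gb = weight_memory_gb(size_b, 2.0)   # 16-bit weights
    int4_gb = weight_memory_gb(size_b, 0.5)   # 4-bit quantized weights
    print(f"Llama-3 {size_b}B: ~{fp16_gb:.0f} GB in fp16, ~{int4_gb:.0f} GB at 4-bit")

# Prints roughly: 8B -> 16 GB / 4 GB, 70B -> 140 GB / 35 GB, 400B -> 800 GB / 200 GB.
```

Roughly speaking, an 8B model quantized to 4 bits fits on a single consumer GPU, a 70B model fits on one or two datacenter GPUs once quantized, and a 400B-class model needs many GPUs even then, which is exactly the spread of deployment scenarios the different sizes are meant to cover.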

Summary

Even if Meta did not open-source its models, others would open-source theirs. So it is wise for Meta to release early and lead the open-source models, then iterate quickly with the community to improve them and catch up with OpenAI and Google.

When open-sourcing your model, there is little need to worry about people abandoning your services, since there is still a huge gap between a foundational model and a well-built service.

Open-source models are similar to open-source software in that both follow the “free code, paid service” framework, but they differ in user retention and in the type of ecosystem they create.

In the future, I would expect to see more open-source models from more companies. Unlike deep learning frameworks, which converged on PyTorch, open-source models will remain diverse and competitive for a long time.