haifeng_jin

TensorFlow Is Open-Source, But Why?

A peek into Google’s open-source strategy

November 2022

The reason TensorFlow was open-sourced may be quite different from what you think. At the very least, it is quite different from the reasons behind many popular open-source projects today.

I am a software engineer on the TensorFlow/Keras team at Google. This article summarizes my analysis of why TensorFlow was open-sourced.

DISCLAIMER: The article represents the writer’s personal opinion and in no way reflects the opinions and/or ideas of Google or its partners. The entirety of the information presented within this article is sourced exclusively from publicly available materials.

A popular business model of open-source software

The most popular business model for running open-source software is the “free code, paid service” model. The owners use the open-source code to attract users, but somewhere down the road there is always a paid service built on that code to earn the money back from those users. Let us look at two examples.

The first example is a large open-source project, Android. Because Android is open-source, most smartphone manufacturers prefer to use it rather than develop their own operating system. As a result, with most brands of smartphone you buy, you end up using Android, and a large number of users are connected to its ecosystem. Whenever one of these users purchases an app, Android gets its share.

The second example is reveal.js, from a small startup. It is a JavaScript library for making slides, and you need to write JavaScript code to use it. It is open-source and free to use. The company also hosts a premium service on slides.com, a graphical web interface built on the open-source library. Again, the open-source library serves to make a technical impact and attract users, and the premium service serves to make a profit from some of those users.

A similar conspiracy theory around TensorFlow

Similarly, we might expect that TensorFlow is trying to bring more people onto the platform and profit from them somewhere down the road. The most widespread theory is that Google open-sourced TensorFlow to connect users to its cloud services.

It makes perfect sense. TensorFlow is there to make a technical impact and attract users, and the Google Cloud Platform (GCP) would be the best premium service for using TensorFlow. In particular, TPUs, which pushed TensorFlow’s performance to the next level, are exclusive to GCP.

If this is true …

If this theory is true, the transition from open-source TensorFlow to premium TensorFlow services should be buttery smooth, because Google would want to push the conversion rate from free users to paid users as high as possible. We should at least see the following things happening.

First, TPUs should be promoted in the TensorFlow documentation. Promoting TPUs would direct users to GCP, where the TPUs live. However, the TPU instructions are buried quite deep in the website. If you land on the index page, it is hard to find anything related to TPUs unless you use the search bar.

Second, TensorFlow services on GCP should be super easy to use. However, on the AI & ML page of the GCP website, there is only one dedicated TensorFlow product, TensorFlow Enterprise. None of the other products even mentions TensorFlow as a selling point.

Third, Google Colab users should be able to pay more for premium services. Google Colab is the most popular TensorFlow service hosted by Google. Many data scientists and machine learning practitioners use it as their “TensorFlow IDE”. Google Cloud could easily make a profit by offering premium GPU and TPU services in Google Colab. However, Google Colab has extremely strict limits on computing resources, and there is no way to pay more to get more. You are only allowed to use a single GPU or TPU. Even on Colab Pro+, the top tier, your environment may still be preempted after running for 24 hours.

Why Google did not follow this path

There are always conspiracy theories like this about giant companies: there is a Big Brother who oversees everything and makes all the decisions. In reality, this is largely untrue. Google is one of the world’s biggest companies. We cannot think of it as a small company in which all teams work towards a single clear goal set by upper management.

Each division at Google sets its own OKRs. As a big tech company, Google has always lived in fear of being broken up by antitrust lawsuits. In preparation for that apocalypse, Google lets every division set its own OKRs. A side effect is that deep strategic cooperation between divisions is harder, because each division has to prioritize its own OKRs.

TensorFlow and GCP belong to different divisions at Google. TensorFlow was first developed as a tool for internal research and machine learning applications at Google, which makes it more like internal infrastructure software. Cloud services, by contrast, aim at driving external revenue. Each side therefore has its own top-priority goals to take care of, and the two may not collaborate deeply with each other.

Why was it initially open-sourced?

We first need to understand how big companies make small decisions. Most small decisions are made bottom-up rather than top-down. Management operates in the background, and the great ideas come from the engineers who do the actual work. I believe the decision to open-source TensorFlow was not big enough for upper management to coordinate teams across multiple divisions at Google and plan a long-term monetization strategy for it.

It could have been a simple decision, made just because engineers and researchers prefer open source. The engineers who developed TensorFlow prefer open source because it gives their work better exposure. The researchers who used TensorFlow in their research also prefer open source, because sharing their experimental code adds credibility.

Management at Google would approve the open-source request because it aligns with Google’s interests. First, it would not help competitors beat Google. Second, it would strengthen Google’s world-leading role in technology. Third, open source drives product excellence, as a world of developers contributes ideas and code. Fourth, building an open-source community also reaches and locks in more users. Fifth, Google may make a profit from these users in the future, who knows?

Why keep devoting resources to open source?

The previous answer may be boring. The really interesting question is: why does Google keep devoting more resources to building the open-source community for TensorFlow?

By now, you should see that TensorFlow does not have a clear business model for making a profit. Yet Google is happy to spend a lot of resources just to make TensorFlow a more successful open-source project. It hosts online and offline events about TensorFlow. It keeps renovating the official TensorFlow documentation. It spends engineering effort on modularizing TensorFlow, such as splitting Keras and XLA out of the TensorFlow repo just to make community contributions easier. Why is Google throwing so much money away to power something that is not critical?

The answer is simple: it actually is critical. To see why, we need to approach the problem from another perspective. Let’s try to answer a related question: if Google did not care about the open-source community, what would happen?

If Google does not commit resources…

If Google only kept releasing code updates but did nothing to build the open-source community, such as maintaining the documentation, publishing tutorials, and calling for code contributions, here is what would happen. First, TensorFlow would lose a significant number of users. Second, TensorFlow would become less useful to developers outside of Google, because Google would only care about its internal needs rather than the rest of the world. Third, without support from the open-source community, TensorFlow would become harder to use even internally.

If it went down this path, eventually no one outside of Google would use TensorFlow. People in the tech world constantly move between companies, so this trend would eventually penetrate Google: new hires would arrive without TensorFlow experience, and even Google engineers would not want to use it. Then another request would be submitted to upper management, this time to replace TensorFlow with the most successful third-party deep learning framework at the time. Management would approve it, since TensorFlow would not be easy to use anyway.

In the end, Google would lose control of an important piece of infrastructure software that powers everything related to machine learning at Google, which would damage the quality of many Google services. Moreover, Google would need to spend much more money on migrating its services to the new framework. That could cost hundreds of times more than building an open-source community for TensorFlow.

Conclusions

Google did not have a business plan when TensorFlow was first open-sourced. It was a bottom-up decision made by engineers out of self-interest rather than a top-down one. Google keeps investing more resources in TensorFlow to build a successful open-source community for its own sake rather than for profit, simply because the cost of losing the battle of open-source deep learning frameworks is too high.

Therefore, if you are building your deep learning infrastructure, TensorFlow is a trustworthy platform to build upon for the foreseeable future. As a TensorFlow team member, I also find it an ecosystem worth my time to contribute to.