Why did MosaicML, founded 2 years ago with each employee valued at $21 million, sell for $1.3 billion?

Recently, the AI field has seen a wave of investments and acquisitions. Salesforce took part in Anthropic’s $450 million funding round, while Runway raised $141 million. In addition, Snowflake announced the acquisition of Neeva, and Chinese giant Meituan acquired the AI company Light Year for 2.065 billion yuan.

However, the most eye-catching deal is undoubtedly the acquisition of the startup MosaicML. Big data giant Databricks reportedly acquired MosaicML for about $1.3 billion, roughly six times its previous valuation, making it the largest generative AI acquisition in the first half of this year. What justifies such a high valuation for a company that is only two years old and has just over 60 employees?

Databricks acquires MosaicML to accelerate the democratization of generative AI technology

Databricks recently announced that it is acquiring the generative AI startup MosaicML for about $1.3 billion (about 9.3 billion yuan), with the aim of helping enterprises build ChatGPT-like tools.

After the acquisition, MosaicML will become part of the Databricks Lakehouse platform, and its entire team and technology will be integrated into Databricks, providing enterprises with a unified platform to manage data assets and enabling them to build, own, and protect their own generative AI models using their proprietary data.

MosaicML is a very young generative AI company. Founded in San Francisco in 2021, it has disclosed only one round of financing and has just 62 employees. Its valuation in that round was $220 million, which means the acquisition values MosaicML at roughly six times that figure. The transaction is the largest acquisition announced in the generative AI field so far this year. Shortly before it, cloud computing giant Snowflake announced the acquisition of another generative AI company, Neeva. After several months of investment frenzy, large enterprises appear to be starting a wave of large-scale acquisitions of generative AI startups.
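As a rough check on the headline figures, the per-employee price and the valuation multiple can be recomputed from the numbers reported here. The sketch below uses only the approximate values quoted in this article (about $1.3 billion, 62 employees, a $220 million prior valuation); the variable names are illustrative.

```python
# Approximate figures as reported in this article (not audited numbers).
ACQUISITION_PRICE_USD = 1.3e9      # reported purchase price
EMPLOYEES = 62                     # reported headcount
PREVIOUS_VALUATION_USD = 220e6     # valuation in the earlier funding round

# Price paid per employee, in US dollars.
price_per_employee = ACQUISITION_PRICE_USD / EMPLOYEES

# How many times the earlier valuation the purchase price represents.
valuation_multiple = ACQUISITION_PRICE_USD / PREVIOUS_VALUATION_USD

print(f"{price_per_employee / 1e6:.1f} million USD per employee")  # 21.0
print(f"{valuation_multiple:.1f}x the previous valuation")         # 5.9
```

On these inputs the price works out to about $21 million per employee, and the multiple is slightly under six, which is why the increase is usually described as "about sixfold".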

Databricks originated at UC Berkeley and participated in the development of the Apache Spark project. As a data storage and analytics giant, it was valued at $31 billion as of 2022 and helps large companies such as AT&T, Shell, and Walgreens process data. Recently, it open-sourced its own large model, Dolly, which aims to achieve ChatGPT-like results with fewer parameters. After cloud computing took off, the “lakehouse” concept proposed by Spark deeply influenced a group of big data startups. Since its founding in 2013, Databricks has grown rapidly into the hottest data infrastructure company in the world. Last year, Databricks’ annual revenue exceeded $1 billion, and after it completed its latest round of financing in August 2021, its valuation reached $38 billion.

The advantages of MosaicML’s MPT series models

MosaicML’s MPT series models are subclassed from the HuggingFace PreTrainedModel base class and are fully compatible with the HuggingFace ecosystem. MPT-7B, one of MosaicML’s most popular models, has about 7 billion parameters and can handle over 2,000 natural language processing tasks. Its optimized layers include FlashAttention and low-precision layer normalization, which make training 2-7 times faster than traditional methods. Near-linear scaling with compute resources means that models with billions of parameters can be trained in hours rather than days. MosaicML has also released MPT-30B, a new open-source large language model licensed for commercial use, with 30 billion parameters and performance surpassing GPT-3.
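Because the MPT models plug into the HuggingFace ecosystem, loading one looks much like loading any other causal language model. The sketch below is a minimal illustration, not MosaicML's official recipe. It assumes the `transformers` library and PyTorch are installed, that the checkpoint is published on the HuggingFace Hub as `mosaicml/mpt-7b`, and that it is paired with the `EleutherAI/gpt-neox-20b` tokenizer; the helper names `load_mpt` and `generate` are hypothetical. Nothing is downloaded until `load_mpt()` is actually called.

```python
MODEL_ID = "mosaicml/mpt-7b"                 # assumed Hub id of the base model
TOKENIZER_ID = "EleutherAI/gpt-neox-20b"     # assumed tokenizer for MPT-7B


def load_mpt(model_id=MODEL_ID, tokenizer_id=TOKENIZER_ID):
    """Load tokenizer and model through the generic HuggingFace Auto classes."""
    # Imported lazily so this file can be read and imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    # trust_remote_code=True lets transformers run the custom model code
    # that ships with the checkpoint; only enable it for sources you trust.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model


def generate(tokenizer, model, prompt, max_new_tokens=32):
    """Continue `prompt` using the standard generate() API."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (downloads several gigabytes of weights on first run):
#   tokenizer, model = load_mpt()
#   print(generate(tokenizer, model, "Databricks acquired MosaicML because"))
```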

Data source: Evaluation of mainstream models by MT-Bench MosaicML

The advantage of the MPT series models lies in their efficiency and low cost. AI models trained on large amounts of data have grown sharply in complexity, and training one now costs at least millions of dollars. Apart from large companies, most small and medium-sized enterprises cannot afford it. MosaicML’s MPT series models let enterprises train their own language models at lower cost and with higher efficiency, making it easier to apply generative AI technology and achieve better business performance. Most open-source language models can only handle sequences of up to a few thousand tokens (see Figure 1). However, with the MosaicML platform and a single node of 8xA100-40GB, users can easily fine-tune MPT-7B to handle context lengths of up to 65k tokens. The ability to adapt to such extreme context lengths comes from ALiBi, one of the key architectural choices in MPT-7B.
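ALiBi (Attention with Linear Biases) replaces learned position embeddings with a fixed penalty that is added to each attention score and grows linearly with the distance between the query and the key. Because the penalty depends only on distance and is not learned, it remains well defined at lengths never seen during training. The pure-Python sketch below illustrates the bias under two assumptions: the number of heads is a power of two, and the slopes follow the geometric schedule described in the ALiBi paper. The function names are illustrative and are not taken from MosaicML's code.

```python
def alibi_slopes(n_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles a power-of-two head count")
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(n_heads, seq_len):
    """Return bias[h][i][j], the value added to the attention score of
    query position i and key position j in head h.

    Past and current keys (j <= i) get -slope * (i - j); future keys are
    masked with -inf so the attention stays causal.
    """
    neg_inf = float("-inf")
    return [
        [
            [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for slope in alibi_slopes(n_heads)
    ]


# With 8 heads the slopes are 1/2, 1/4, ..., 1/256.
slopes = alibi_slopes(8)
bias = alibi_bias(8, 4)
print(slopes[0], slopes[-1])   # first and last head slope
print(bias[0][3])              # penalties seen by the last query in head 0
```

Heads with steep slopes focus on nearby tokens, while heads with shallow slopes can still attend far back, which is what makes very long contexts workable.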

For example, the full text of “The Great Gatsby” is just under 68k tokens. In one test, the StoryWriter model read “The Great Gatsby” and generated a coda. One of the codas generated by the model is shown in Figure 2. StoryWriter read “The Great Gatsby” in about 20 seconds (about 150,000 words per minute). Because of the long sequence length, its “typing” speed is slower than that of other MPT-7B models, at about 105 words per minute. Although StoryWriter was fine-tuned with a context length of 65k tokens, ALiBi allows the model to run inference on inputs longer than those it was trained on: 68k tokens in the case of “The Great Gatsby”, and up to 84k tokens in testing.

Figure 2: MPT-7B-StoryWriter-65k+ wrote a coda for "The Great Gatsby". The coda was generated by providing the full text of "The Great Gatsby" (about 68k tokens) as input to the model, followed by the word "coda", and letting the model continue generating.
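The reading-speed figures quoted for StoryWriter can be cross-checked with simple arithmetic. The sketch below takes the rounded numbers reported in this article (about 68k tokens, about 20 seconds, about 150,000 words per minute) at face value, so the results are rough estimates rather than measurements; the variable names are illustrative.

```python
# Rounded figures as quoted in this article.
GATSBY_TOKENS = 68_000            # approximate length of the novel in tokens
READ_SECONDS = 20                 # approximate time to ingest the full text
READ_WORDS_PER_MINUTE = 150_000   # quoted reading speed

# Words implied by the quoted reading speed and reading time.
words_read = READ_WORDS_PER_MINUTE * READ_SECONDS / 60

# Throughput while ingesting the prompt, in tokens per second.
tokens_per_second = GATSBY_TOKENS / READ_SECONDS

# Implied average number of tokens per word.
tokens_per_word = GATSBY_TOKENS / words_read

print(f"{words_read:,.0f} words read")
print(f"{tokens_per_second:,.0f} tokens per second")
print(f"{tokens_per_word:.2f} tokens per word")
```

The implied 50,000 words and roughly 1.4 tokens per word are in a plausible range for an English novel, so the quoted figures are at least consistent with one another.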

The popularization of generative AI technology

Generative AI is a branch of artificial intelligence that uses large amounts of data and deep learning algorithms to automatically generate original text, images, computer code, and other content. It has made it more convenient for people to process and analyze data. With the rapid development of big data and artificial intelligence, generative AI has been widely applied in natural language processing, image generation, virtual reality, and other fields. In natural language processing, for example, GPT-4 has become one of the most popular generative AI models and can be used for tasks such as writing articles, translating languages, and answering questions. In the image field, StyleGAN2 can generate high-quality images for use in game development, film and television production, virtual reality, and other areas.

MosaicML’s CEO Naveen Rao previously stated that since 2018, the complexity of AI models trained on large amounts of data has increased dramatically, and it now costs millions of dollars to train a model. Apart from large companies, most small and medium-sized enterprises cannot afford it. After the acquisition, however, the combination of Databricks’ Lakehouse platform and MosaicML’s technology will enable enterprises to train and build generative AI models on their proprietary data simply, quickly, and cost-effectively. Users will keep control and ownership of their data, allowing for customized AI model development. According to Databricks, with the combined platform and technical support of the two companies, the cost of training and using LLMs will fall significantly, to an estimated few thousand dollars. This paves the way for the popularization of generative AI.

The significance of Databricks’ acquisition of MosaicML

The main purpose of Databricks’ acquisition of MosaicML is to accelerate the development and democratization of generative AI technology. By integrating the technologies and resources of the two companies, Databricks can better meet customer needs and provide more efficient and convenient solutions. Specifically, the acquisition will bring the following changes:

1. More efficient large language models

After acquiring MosaicML, Databricks can integrate the MPT series models into its Lakehouse platform, providing customers with more efficient and cost-effective large language models. This will help enterprises better handle natural language processing tasks, improving business efficiency and accuracy.

2. Faster model training speed

MosaicML’s MPT series models train quickly, which will help Databricks offer faster model training services. This is particularly important for enterprises that need to respond quickly to market demands.

3. Greater democratization

Databricks’ acquisition of MosaicML also means that generative AI technology will be further democratized. MosaicML’s MPT series models make it easier for small and medium-sized enterprises to train their own language models, so that they can apply generative AI and achieve better business performance. This will in turn promote the wider adoption of artificial intelligence technology.


Generative AI applications aim to generate original text, images, and computer code from users’ natural language prompts. Since AI startup OpenAI launched the online generative AI chatbot ChatGPT in November last year, interest in this technology has surged. “Every organization should be able to benefit from the AI revolution and have more control over how they use their data. Databricks and MosaicML have an incredible opportunity to democratize AI and make Lakehouse the best place to build generative AI,” said Ali Ghodsi, Co-founder and CEO of Databricks.

The significance of Databricks’ acquisition of MosaicML lies not only in accelerating the development and democratization of generative AI technology, but also in combining the technologies and resources of the two companies to give customers more efficient and convenient solutions. As artificial intelligence develops and spreads, generative AI will play an increasingly important role, and the acquisition reflects how much importance enterprises attach to this direction and how much they are investing in it. Companies like Anthropic and OpenAI license their existing language models to enterprises, which then build generative AI applications on top of them. The strong commercial demand for these models has created opportunities for startups like MosaicML. The back-to-back acquisitions by Snowflake and Databricks show that large technology companies are gradually moving from in-house research and strategic investment to mergers and acquisitions in generative AI.

Reference sources:

This Week in AI: Databricks’ Acquisition of MosaicML