GPT-5 Is Coming: What Dramatic Changes Will It Bring to the AI Industry?

Since the birth of ChatGPT, AI has been evolving rapidly, month by month. The abundance of models and the speed of iteration have made many people feel that humanity is standing at the threshold of AGI.

Recently, a document disclosed by the United States Patent and Trademark Office (USPTO) reveals that OpenAI submitted a trademark application for “GPT-5” on July 18th, and it has been accepted.

Screenshot of the USPTO document

In the first half of this year, AI experts and scholars jointly published open letters calling attention to the potential risks of generative AI, and OpenAI announced that it had no plans to train GPT-5 in the short term.

The lure of the technology, however, has ultimately led humanity to push past those self-imposed boundaries.

In the disclosed application, OpenAI notes that the unreleased GPT-5 will have many capabilities that GPT-4 lacks, and nearly every one of them points toward AGI.

Screenshot of the USPTO document

So, what does this change mean for AI and humanity?

Today, this article will attempt to analyze the potential functions, changes, and impacts of GPT-5 based on the limited information disclosed in OpenAI’s application document.

01 The Road to AGI

In the document disclosed this time, the first change mentioned by OpenAI is the enhancement of multimodal capabilities.

Specifically, the functions of GPT-5 include translating text or speech from one language to another, speech recognition, generating text and speech, etc.

Although the current GPT-4 can already translate between languages, the fact that translation is singled out here suggests it has been further optimized.

So why does OpenAI highlight the translation capability of GPT-5?

Perhaps this is because one prerequisite for GPT to become truly universal is minimizing the cost gap of using large models across different languages.

Earlier research from the University of Oxford showed that, because of the way services such as OpenAI's measure and bill usage, English input and output cost far less than other languages.

Simplified Chinese costs about twice as much as English, Spanish about 1.5 times, and the Shan language of Myanmar about 15 times.

Because languages like Chinese have different, more complex structures, they require more tokens to encode the same content.

For example, with OpenAI's GPT-3 tokenizer, the phrase “your affection” requires only two tokens in English but eight tokens in Simplified Chinese.

This means that using and training models in languages other than English is much more expensive.
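The mechanism behind this gap can be sketched with a toy tokenizer. The vocabulary below is entirely made up for illustration (it is not OpenAI's actual vocabulary), but the imbalance works the same way in real byte-pair tokenizers trained mostly on English: common English words get whole-word tokens, while Chinese text falls back to many small byte-level pieces.

```python
def tokenize(text, vocab):
    """Greedy longest-match segmentation; unknown characters
    fall back to one token per raw UTF-8 byte."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # no vocabulary match: emit byte tokens (ASCII: 1 byte, CJK: 3 bytes)
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens

# Toy English-heavy vocabulary, like a BPE vocab trained mostly on English.
vocab = {"your", " affection"}

en = tokenize("your affection", vocab)  # matches two whole-word tokens
zh = tokenize("你的情意", vocab)          # 4 characters x 3 bytes = 12 byte tokens
print(len(en), len(zh))                 # 2 12
```

The exact ratio differs from the article's real-tokenizer figures (2 vs. 8 tokens), but the shape of the problem is the same: text outside the tokenizer's training distribution costs several times more tokens, and therefore several times more money.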

Crossing this “language barrier” would directly remove a major obstacle to GPT's generalization.

In addition, the speech recognition capability highlighted in the document may look like a minor change, but it is another stepping stone OpenAI has laid for GPT-5 on the road to AGI.

As is well known, pushing large models to the edge and onto end devices has become an increasingly clear trend in their development.

Since July this year, when Qualcomm announced a 1-billion-parameter model that can run on a mobile phone, manufacturers such as Honor and Apple have also announced plans for their own “large model” phones.

Starting with phones, more and more AI data will be processed on-device, in cameras, sensors, and autonomous-driving systems.

In such application scenarios, speech recognition is undoubtedly more convenient and efficient.

For example, an AI language model can let a driver control the vehicle by voice, converting spoken commands into executable operations such as start, stop, accelerate, and brake.
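The intent-matching step of such a pipeline can be sketched minimally. Everything here is hypothetical (the command names and keywords are invented); a real system would sit behind a speech-recognition model and add safety confirmation before acting.

```python
# Hypothetical mapping from recognized speech to vehicle operations.
# This sketch shows only the intent-matching step after speech has
# already been transcribed to text.
COMMANDS = {
    "start": "START_ENGINE",
    "stop": "STOP",
    "accelerate": "ACCELERATE",
    "brake": "BRAKE",
}

def parse_command(transcript):
    """Return the operation for the first known command word, else None."""
    for word in transcript.lower().split():
        if word in COMMANDS:
            return COMMANDS[word]
    return None  # unrecognized: a real system should ask for confirmation

print(parse_command("please accelerate a little"))  # ACCELERATE
print(parse_command("turn up the radio"))           # None
```

A language model's role here would be to replace the brittle keyword table with robust natural-language understanding, so that “speed up a bit” and “accelerate” map to the same operation.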

Likewise, the assistants built into phone operating systems, such as Siri, will increasingly be controlled by voice.

Speech recognition is therefore not just a bonus feature but standard equipment for GPT-5's move onto end devices.

By embedding itself in these devices, GPT-5 will also gain access to more edge-side, non-linguistic data.

After all, large models have nearly exhausted the text data available for training. To take another step toward AGI, this non-text data is crucial.

02 Challenge Expert Models

In addition to the above characteristics, the document submitted by OpenAI also mentions: “GPT-5 may also have the ability to learn, analyze, classify, and respond to data.”

Given the current trajectory of AI development, this likely means that GPT-5 will be able to learn actively, like an intelligent agent.

Such an ability would fundamentally distinguish GPT-5 from models that can only learn passively from data fed to them by humans.

Specifically, the ability to actively learn means that the model can independently select, acquire, and process data based on its own goals and needs, rather than relying solely on human-provided data.

This allows the model to more effectively utilize the information and knowledge in the data, adapt more flexibly to different data environments and task scenarios, and not just passively receive and output data.

Such an ability becomes particularly important when GPT-5 faces unfamiliar, vertical fields.

Some specific fields, such as medicine, law, finance, etc., usually have their own specific terminology, rules, and knowledge systems, which may be difficult for ordinary language models to understand and process.

If GPT-5 can learn actively, it can automatically collect and update relevant data in these fields from the internet; analyze and classify their basic concepts, important principles, and latest trends; and respond to common questions, typical cases, and practical applications.

In this way, GPT-5 can more quickly master the professional knowledge in these fields and complete the corresponding tasks in a more accurate and efficient manner.

And all of this is the key to its progress towards a true general model.

If GPT always has to call on specific “expert models” to solve professional tasks, it cannot be truly “general”: its abilities would vary across fields and depend on those models, coordination between GPT and the experts would add cost, and high-quality service could not be guaranteed in every situation.

Earlier, the outlet SemiAnalysis published leaked details of GPT-4, released in March this year, reporting that OpenAI built GPT-4 from a mixture of expert models.

According to the leaks, GPT-4 uses 16 expert models, each with about 111 billion parameters, and each forward pass routes through two of them.

However, more expert models mean more difficult generalization and convergence.

This is because each expert model has its own parameters and strategies, which are often hard to coordinate, making it difficult for GPT to balance them and “take the overall situation into account”.
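The top-2 routing described in the leak can be illustrated with a minimal sketch. The numbers below (expert count aside) are invented, and the experts are reduced to scalars; in a real mixture-of-experts layer each expert is a full feed-forward network and the router is learned.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top2_route(gate_logits, expert_outputs):
    """Combine the outputs of the two highest-scoring experts,
    weighted by their renormalized gate probabilities."""
    probs = softmax(gate_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    z = sum(probs[i] for i in top2)
    return sum(probs[i] / z * expert_outputs[i] for i in top2)

# 16 experts, as in the leaked GPT-4 configuration.
logits = [0.0] * 16
logits[3], logits[7] = 2.0, 1.0   # the router favors experts 3 and 7
outputs = list(range(16))         # stand-in scalar expert outputs
print(top2_route(logits, outputs))  # a blend of outputs 3 and 7, closer to 3
```

The generalization difficulty the text describes follows from this structure: only two of sixteen experts see any given token, so each expert's parameters are trained on a different slice of the data and must still compose coherently at inference time.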

With the ability to learn actively, GPT-5 could use multimodal understanding and reasoning, along with knowledge graphs and databases, to analyze the data it acquires, then associate and summarize related data with clustering algorithms and classifiers.

In this way, GPT-5 can effectively utilize the information and knowledge in the data according to different data environments and task scenarios.
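The "associate and classify" step mentioned above can be sketched in miniature. The seed topics and keywords here are hand-picked and purely hypothetical; a real system would use learned embeddings rather than word overlap.

```python
# Toy illustration of grouping snippets into the vertical fields the
# article mentions (medicine, finance, law) by word overlap with
# hand-picked seed vocabularies. Hypothetical, not any real pipeline.
SEEDS = {
    "medicine": {"patient", "diagnosis", "treatment"},
    "finance": {"market", "stock", "interest"},
    "law": {"court", "contract", "statute"},
}

def classify(snippet):
    """Assign the topic whose seed words overlap the snippet most."""
    words = set(snippet.lower().split())
    return max(SEEDS, key=lambda topic: len(SEEDS[topic] & words))

print(classify("the court upheld the contract"))           # law
print(classify("interest rates moved the stock market"))   # finance
```

Swapping the overlap score for cosine similarity between embeddings turns this keyword toy into the kind of clustering-and-classification step the document attributes to GPT-5.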

03 Replacing More Jobs

As discussed above, once GPT-5 overcomes language barriers and reaches end devices through convenient speech recognition, it will continuously acquire knowledge across scenarios, fields, and modalities through active learning, and thus advance rapidly toward AGI.

It is foreseeable that when a GPT-5 with such powerful generality spreads into various fields, most vertical-domain large models will gradually fade away, except in a few industries with data barriers (such as healthcare).

After all, a considerable number of expert or vertical large models are essentially the product of companies with insufficient computing power and data: unable to scale the heights of general large models, they settle for second best (this is particularly evident in China).

If a general large model, with its powerful learning ability, can master most industries, who would be willing to tediously switch between different models and bear the costs of training and using different models?

From this perspective, the gradual replacement of expert models by general models is an inevitable historical process on the path to AGI.

Another accompanying phenomenon is the replacement of more specialized, routine work.

With more powerful general models, people will find that the duties of many positions can be merged and consolidated.

Product managers and data analysts are a possible example.

For example, in a project to develop a new product, GPT-5 could search the internet for relevant market research, competitor analyses, user profiles, and other data based on the given product concept or requirements, and load them into its memory.

It would then analyze and understand the acquired data using its multimodal understanding and logical reasoning, along with knowledge graphs and databases.

After organizing the data into categories, GPT-5 would use its language understanding to learn marketing strategies, user feedback, and other information from the dialogue system's feedback, and compare and evaluate them against the given product concept or requirements.

In this way, the positions of product managers and data analysts are efficiently “merged”.

And on the unfinished road to AGI, there are countless other positions that are being merged and replaced in this way.

A more general-purpose GPT-5 is therefore both a boon for human productivity and the prelude to an industry earthquake.

By then, many companies that lack both general-model capabilities and industry barriers will collapse like sandcastles.

And more ordinary individuals, faced with positions that are constantly being replaced, will deeply feel the uncertainty of the times…