Elon Musk believes humans have exhausted the data for AI training

Elon Musk, founder of xAI, said that the artificial intelligence industry has exhausted the supply of human-created data. According to him, "peak data" was reached in 2023, and further development of models will be impossible without a shift to synthetic data, that is, data generated by AI models themselves.

These words echo statements made by Ilya Sutskever, former chief scientist at OpenAI, at the NeurIPS conference in December 2024. Sutskever also noted that the shortage of real-world data requires a rethinking of how modern models are developed.

The benefits of synthetic data

Synthetic data is already used extensively by large companies such as Microsoft, OpenAI and Anthropic. Its popularity is due to a number of advantages: it makes it possible to generate virtually unlimited amounts of training material while reducing development costs. For example, the Palmyra X 004 model from Writer was developed almost entirely on synthetic data and cost 700 thousand dollars. By comparison, a similar OpenAI model cost about 4.6 million dollars.

Some of the most advanced models, including Microsoft's Phi-4, Google's Gemma, and Anthropic's Claude 3.5 Sonnet, were built on a mix of real and synthetic data. Gartner predicted that by 2024, 60 percent of all data used for AI and analytics projects would be synthetic.

Problems and Challenges

The move to synthetic data comes with risks, however. Research shows that over-reliance on such data can degrade model performance, which shows up as less creative and more biased outputs. If the source data used to generate the synthetic material contains errors or limitations, those problems can be amplified and carried over into the models' outputs.

In addition, models trained on synthetic data may become less adaptive and lose the ability to generate original solutions.

The Future of Artificial Intelligence Development

Despite the challenges, synthetic data is opening up new prospects for the development of artificial intelligence. Companies are trying to balance efficiency against quality in order to minimize risk and maintain high standards of model performance.

Musk believes that the transition to synthetic data is inevitable, and emphasizes the importance of quality control and of new approaches to AI training. This trend could be a key step in the evolution of AI technology.
