Will AI ruin AI? The world after the emergence of ChatGPT and the story of "digital pollution"
Hello everyone! I'm John, and I explain the fun side of AI technology in an easy-to-understand way. Recently, "generative AI" (AI that automatically creates text and images), such as ChatGPT, has been rapidly entering our daily lives. It answers questions and drafts text for us, and it is genuinely useful and surprising.
But did you know that experts are whispering that this rapid progress of AI technology may actually hide an unexpected "pitfall" for future AI development? Just as the world's first atomic bomb test affected the world afterward, for better or worse, and brought about irreversible change, the emergence of ChatGPT may have brought a major, irreversible change to our digital world.
Today, let's take a look at this slightly scary but important issue of "AI polluting the digital world."
What exactly is "digital pollution"?
First, let's briefly review how AI gets smart. AI systems, especially large language models like ChatGPT (AI that learns by reading vast amounts of text), learn by "reading" huge quantities of text data on the Internet: blog articles, news, novels, and so on. It's similar to how humans acquire knowledge by reading many books.
However, since the emergence of generative AI such as ChatGPT, the amount of AI-generated text and information on the Internet has been increasing. In itself, that may be a good thing if it means more useful information. The problem, however, begins here.
But what if the AI of the future learns not only from information created by humans, but also from information created by AI? What happens then?
- Information created by AI may still contain errors or lack nuance.
- Furthermore, when an AI learns from sentences it has created, the content may gradually become biased or the quality may decline.
To put it in perspective, it's like clear water (high-quality information created by humans) gradually being mixed with muddy water (information created by AI, some of which is unreliable). It may be fine at first, but gradually the water as a whole becomes muddy... this is the phenomenon known as "digital pollution."
Researchers call this AI-generated data flooding the internet "data pollution" or "model pollution," and they worry that it could be a major obstacle to future AI development.
The scary future of "model collapse"
If an AI keeps learning mainly from data that AI itself has generated (sometimes called "synthetic data"), a phenomenon known as "model collapse" is said to occur.
This is a state in which the more an AI learns, the less intelligent it becomes. It's like how the letters on a document become faded and difficult to read after being copied over and over again. The text generated by an AI may gradually become incomprehensible or start repeating the same thing over and over again.
If something like this really were to happen, the evolution of AI would not only halt but may even regress. It's a bit ironic that just as we've seen such a useful AI emerge, this very AI would end up hindering the growth of future AI.
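As a rough intuition (not the actual training process), model collapse can be sketched with a toy simulation: a "model" that merely estimates word frequencies from its corpus, then "generates" the next generation's corpus by sampling from those frequencies. Rare words that happen not to be sampled vanish and can never come back, so the vocabulary can only shrink from one generation to the next. All names and numbers below are illustrative.

```python
import random
from collections import Counter

def train_generation(corpus, sample_size, rng):
    """'Train' a toy model by counting word frequencies in the corpus,
    then 'generate' a new corpus by sampling from that distribution."""
    counts = Counter(corpus)
    words = list(counts)
    weights = [counts[w] for w in words]
    # A word absent from the corpus has zero probability: once lost, lost forever.
    return rng.choices(words, weights=weights, k=sample_size)

rng = random.Random(0)
# Generation 0: a 'human' corpus of 200 distinct words with a long tail of rare ones.
corpus = [f"word{i}" for i in range(200) for _ in range(1 + i % 3)]

support = [len(set(corpus))]  # how many distinct words survive each generation
for gen in range(20):
    corpus = train_generation(corpus, sample_size=300, rng=rng)
    support.append(len(set(corpus)))

print(support[0], "->", support[-1])  # vocabulary never grows across generations
```

Real language models are vastly more complex, but the mechanism illustrated here, losing the tails of the distribution a little more with each generation, is the core of what researchers mean by model collapse.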
Lessons learned from "low background steel"
There is an interesting analogy for thinking about this issue: the story of a special kind of steel called "low-background steel."
"Low background steel" refers to iron that has been barely affected by radioactive materials released into the atmosphere by nuclear tests since 1945. Examples of this include iron made before nuclear tests began and iron salvaged from old sunken ships.
Why is this iron so important? In fact, iron with extremely low levels of radioactivity is needed to make very precise measuring devices such as Geiger counters (machines that measure radiation levels). Modern steelmaking methods inevitably result in trace amounts of radioactive material from the atmosphere being mixed in, so this "uncontaminated" old iron is treated as extremely valuable.
The same thing may happen in the world of AI.
In other words, purely human-generated data from before the Internet was "contaminated" with AI-generated content could become a very valuable "digital low-background steel" for training future AI. Preserving high-quality, human-created data now may prove important for future AI development.
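One simple (and admittedly crude) way to hoard "digital low-background steel" is to filter a corpus by publication date, keeping only documents that predate ChatGPT's public launch, much as pre-1945 steel predates test fallout. This sketch assumes each document carries a trustworthy `published` timestamp, which in practice is the hard part; the documents and field names are made up for illustration.

```python
from datetime import datetime, timezone

# Cutoff: ChatGPT's public launch (November 30, 2022). A document
# published before this date cannot contain ChatGPT output, just as
# steel forged before 1945 contains no fallout isotopes.
CUTOFF = datetime(2022, 11, 30, tzinfo=timezone.utc)

def is_low_background(doc: dict) -> bool:
    """Keep a document only if its (assumed trustworthy) publication
    timestamp predates the generative-AI era."""
    return doc["published"] < CUTOFF

corpus = [
    {"id": "a", "published": datetime(2019, 5, 1, tzinfo=timezone.utc)},
    {"id": "b", "published": datetime(2023, 2, 1, tzinfo=timezone.utc)},
]
clean = [d for d in corpus if is_low_background(d)]
print([d["id"] for d in clean])  # ['a']
```

A date cutoff is only a first approximation: it also discards perfectly good human writing from after 2022, and it depends entirely on metadata that can be missing or forged. That is why the research directions below go further than a simple timestamp filter.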
What should we do?
This "digital pollution" problem is still in its infancy, and no clear solution is yet in sight, but there are some directions being considered.
- Ensuring high-quality, human-created datasets: research institutions and companies working together to build and preserve large, trusted datasets untainted by AI-generated content.
- AI-generated content identification technology: developing technology to identify text and images created by AI, and either exclude them from training data or handle them with caution.
- Developing new ways of learning: researching new AI training methods that are less affected by synthetic data, or that can make good use of it.
There may not be much each of us can do as individuals, but even small habits, such as not blindly accepting AI-generated information and checking where information comes from, may make a difference.
A word from John
The emergence of ChatGPT is truly groundbreaking and has the potential to greatly change the way we live and work. But every powerful technology has its light and its shadow. This story of "digital pollution" gave me a fresh opportunity to think about how we should engage with AI technology.
In order for AI to continue to be a truly useful partner to us, I think it's important that we pay attention to these issues and pool our wisdom together. I hope that the AI of the future will be smarter and enrich our lives even more, without being burdened by the "leftovers" of today's AI.
This article is based on the following original articles and is summarized from the author's perspective:
The launch of ChatGPT polluted the world forever, like the first atomic weapons tests