As generative AI becomes more prevalent, distinguishing human-made content from artificial creations is increasingly challenging. AI tools now produce highly sophisticated images, videos, and text from simple prompts.
This has led to ongoing disputes between publishers and the developers of these AI tools over issues of copyright infringement.
OpenAI CEO Sam Altman has acknowledged that building tools like ChatGPT inevitably involves using copyrighted content, and copyright law does not currently prohibit using such content to train AI models.
A study conducted by Amazon Web Services (AWS) researchers indicates that 57% of online content is either AI-generated or translated using AI algorithms.
Researchers from Cambridge and Oxford warn that the proliferation of AI-generated content and the heavy reliance on the same sources can lead to a decline in the quality of responses to queries.
The study found that AI-generated responses deteriorated in both value and accuracy with each attempt. Dr. Ilia Shumailov from the University of Oxford commented:
“It is surprising how fast model collapse kicks in and how elusive it can be. At first, it affects minority data—data that is badly represented. It then affects diversity of the outputs and the variance reduces. Sometimes, you observe small improvement for the majority data, that hides away the degradation in performance on minority data. Model collapse can have serious consequences.”
Researchers attribute the decline in chatbot response quality to a feedback loop of AI-generated content. AI models are trained on information scraped from the internet, so when that information is itself AI-generated and inaccurate, the training process becomes flawed. This leads to the production of incorrect answers and the spread of misinformation.
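The feedback loop described above can be illustrated with a toy simulation (this is a sketch for intuition, not the researchers' actual experiment): each "generation" fits a simple Gaussian model to samples drawn from the previous generation's model, mimicking an AI system trained on its predecessor's outputs. The starting data, cluster positions, and sample sizes below are all illustrative assumptions.

```python
import random
import statistics

def collapse_demo(generations=30, sample_size=50, seed=0):
    """Toy model-collapse loop: fit a Gaussian to data, sample new
    'training data' from that fit, refit, and repeat. Returns the
    fitted spread (standard deviation) at each generation."""
    rng = random.Random(seed)
    # Generation 0: "real" data - a majority cluster plus a small,
    # badly represented minority cluster (hypothetical values).
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size - 5)]
    data += [rng.gauss(8.0, 0.5) for _ in range(5)]  # minority data
    spreads = []
    for _ in range(generations):
        mu = statistics.fmean(data)       # fit: sample mean
        sigma = statistics.stdev(data)    # fit: sample spread
        spreads.append(sigma)
        # Next generation trains only on the previous model's outputs,
        # so sampling noise and lost structure compound over time.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return spreads

spreads = collapse_demo()
print(f"spread at generation 0: {spreads[0]:.2f}, "
      f"at generation {len(spreads) - 1}: {spreads[-1]:.2f}")
```

In this sketch the single-Gaussian fit discards the distinct minority cluster after the very first generation, and resampling a finite dataset each round lets the fitted statistics drift away from the original data, which mirrors the variance reduction and loss of minority data the researchers describe.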
The researchers investigated further to identify the underlying cause of the problem and found that the rise in AI-generated articles published online without proper fact-checking was a significant factor.
To explore this, the team took a pre-trained AI-powered wiki and trained it on its own outputs. They quickly observed a noticeable drop in the quality of the information the tool produced.
The study also reveals that the AI tool consistently omitted rare dog breeds from its knowledge base after processing multiple data sets, even though it was initially trained on a comprehensive library of information about dog breeds. This suggests that as AI becomes more prevalent and the volume of AI-generated content increases, the quality of search results is likely to decline.