Apple has come forward to address OpenELM, a model it trained on the contentious Pile data set, following a report that many companies used YouTube video transcripts to train their AI models.
After reports that EleutherAI, the organization behind the Pile, may have used the YouTube Subtitles data set, Apple clarified its position and future plans to reporters. Scraping those subtitles appears to contradict the data use policies of the popular video platform.
Apple emphasized its dedication to the rights of creators and publishers, stating that it provides websites with the option to opt out of their data being used to train Apple Intelligence. This new feature was introduced during WWDC 2024 and is set to be included in iOS 18.
The company also confirmed that it utilizes high-quality data, including licensed data from publishers, stock images, and some publicly available data from the web, to train its models, including those for Apple Intelligence.
It is unclear whether YouTube's transcription data counts as publicly available: it is not completely hidden from view, but it is not intended to be a public resource either.
Apple also develops research models, and OpenELM is one of them: essentially a tool for learning more about language models.
In a paper on OpenELM, researchers mentioned that they trained it on Pile data. However, Apple states that OpenELM is solely for research purposes and is not utilized to power AI features in any Apple devices, including the top iPhones, iPads, and Macs.
Additionally, it seems that OpenELM's time in the spotlight is coming to an end, as Apple has confirmed that it has no plans to build future versions of the model.
While this may provide some comfort to YouTube creators, including tech reporters, whose data was obtained for Pile and utilized in models like Apple’s OpenELM, it does not address the issue that EleutherAI allegedly scraped the data without permission from YouTube or the creators before providing it to companies such as Apple.