Runway’s AI text-to-video generator was trained on thousands of YouTube videos and pirated films, according to a report from 404 Media.
The training data spreadsheet obtained by the outlet contains links to YouTube channels of major entertainment companies such as Netflix, Disney, Nintendo, and Rockstar Games, as well as popular creators like MKBHD, Linus Tech Tips, and Sam Kolder.
Additionally, the spreadsheet includes links to channels owned by news outlets like The New Yorker, The Verge, Reuters, and Wired.
According to a former Runway employee speaking to 404 Media, the channels listed in the spreadsheet were part of a company-wide effort to source high-quality videos for building the AI model.
The list of channels was then fed into a large-scale web crawler, which downloaded the videos through proxies to avoid being blocked by Google.
Runway, an AI startup, has secured significant funding from Google’s parent company Alphabet and Nvidia.
The company has developed impressive tools that enable users to create realistic AI videos and capture specific animation styles.
The latest tool, Gen-3 Alpha, launched in June, can generate videos in any desired style. Like other AI models, Gen-3 Alpha requires a wide range of content for training.
In addition to YouTube channels, 404 Media discovered that Runway’s dataset includes links to piracy sites such as KissCartoon, which offers free access to anime and other animated content.
It remains uncertain whether all the videos in this dataset were used to train the Gen-3 Alpha model, and this may never be clarified.
In a June interview with TechCrunch, Runway co-founder Anastasis Germanidis mentioned that the company uses “curated, internal datasets” for model training, but did not elaborate further.
When asked for a response, Google referred The Verge to a statement from YouTube CEO Neal Mohan, who said in April that using the platform’s videos to train AI goes against its policies.
Runway is not the only AI company whose training data has been linked to YouTube. Earlier this year, OpenAI CTO Mira Murati said she wasn’t sure whether the company’s text-to-video generator, Sora, was trained on YouTube videos.
Additionally, a recent report from Proof News and Wired revealed that Anthropic, Apple, Nvidia, and Salesforce used more than 170,000 YouTube videos to train their AI models.