Nvidia employees allegedly used videos and movies taken from YouTube, Netflix and other sites without obtaining permission from the copyright holders, 404 Media reported this week.
The company reportedly ran 20 to 30 virtual servers on Amazon's cloud platform to download the videos, collecting the equivalent of 80 years of footage every day.
This vast collection of videos was used to train artificial intelligence models for video applications in products such as Omniverse (Nvidia's 3D simulation platform), autonomous driving systems and digital twin technologies. The project, known internally as Cosmos, came to light through leaked communications and screenshots from the Slack groups managing it, shared by a former employee.
The goal of the project was to develop an AI video model similar to OpenAI's Sora. OpenAI has itself reportedly harvested YouTube videos to train its models. Nvidia's efforts, however, extended beyond YouTube to content from Netflix and other services. Employees reportedly questioned the legality of the project and asked their managers to confirm that it was legitimate. They were told that it had been approved at the highest levels of the company.
Nvidia responded to the allegations by claiming that using the videos to train its model fell under "fair use." Google, however, said that using its data to train a video model constitutes a "violation of its terms of service," the same position it took in OpenAI's case. Netflix likewise confirmed that its terms of service do not allow "data scraping."
The incident highlights the tech industry's sense of invulnerability: its ethos of moving fast and not asking permission often leads to problems that more responsible conduct could have avoided. It also shows how much data is required to train AI models, which suggests that similar stories are likely to recur.
Content owners, who invest time, money and creativity in their works, are understandably outraged when tech companies use their content to advance their own products without permission. Given that Nvidia's market value has already surpassed Apple's, one wonders why the company didn't simply pay for the content.