Meta Used Pirated Book Dataset to Train Llama AI

-: FOLLOW US :-  @theinsaneapp

Rightsholders, including record labels, authors, and visual artists, have filed lawsuits against AI companies, alleging that their work was used without authorization or proper compensation.

Several of these lawsuits, including those against tech companies such as Meta and OpenAI, involve the controversial Books3 dataset.

Books3 has a piracy angle: AI researcher Shawn Presser created it in 2020 by scraping the library of the 'pirate' site Bibliotik.

The Books3 dataset, more than 37GB in size, was intended to help AI enthusiasts build better models and to spur innovation.

Large tech companies, including Meta, used the Books3 dataset to improve their language models, contributing to the mainstream AI boom.

The dataset was freely available for years and was used by AI researchers worldwide. Once the AI boom reached the mainstream, however, it drew the attention of rightsholders, who began taking action against it.

Rights Alliance demanded that The Eye and AI company Hugging Face remove Books3, citing copyright infringement.

In its response to a lawsuit, Meta admits that it used portions of the Books3 dataset to train its Llama AI model, but it denies infringing copyright and rejects the claim that it used copyrighted works without permission.

Meta asserts that consent, credit, or compensation is not necessarily required for using copyrighted works to train AI models.

The authors refer to their books in the Books3 dataset as "infringed works," but Meta denies infringing any copyrights.

Meta plans to rely on a fair use defense, arguing that any unauthorized copying of the works constitutes fair use under the law.