Hacker News

OpenAI's models were trained on ebooks from a private ebook torrent tracker, leeched en masse during a freeleech event by people who hated private torrent trackers and wanted to destroy their "economy."

The books were all in EPUB format; they were converted, cleaned to plain text, and hosted on a public data hoarder site.



Have you got some support for this claim?

There are a lot of wild claims about, so while this is plausible, it would be great if there were some evidence backing it.


The NYT claims that OpenAI trained on its material. They argue copyright infringement, although I think another argument might be breach of TOS for scraping the material from their website or archive.

The complaint filing has some references to other training material used by OpenAI, but I didn't dig deeply into what all of it was:

https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...


What's that got to do with this books claim?


Relevant similar behavior.




