We reported back in February on a deal Reddit had done with a then-unnamed AI outfit to open up access to user posts for AI training in return for some $60 million annually. That deal turned out to be with Google, but now Reddit has inked another arrangement, this time with OpenAI.
In a blog post, Reddit outlined the deal but did not disclose how much money was involved. However, it seems there may be differences between the Google and OpenAI deals. With Google, there were explicit references to AI training.
But the OpenAI deal talks about allowing access to "real-time, structured, and unique content from Reddit. This will enable OpenAI’s AI tools to better understand and showcase Reddit content, especially on recent topics."
That implies OpenAI's models will be able to reference, link to or quote Reddit posts, but it does not seem to explicitly include training future versions of OpenAI's ChatGPT models on Reddit posts.
In return, Reddit will be able to bring "new AI-powered features to redditors and mods."
For what it's worth, OpenAI posted exactly the same statement on its website. Well, save for one additional comment at the bottom in smaller font.
"OpenAI Disclosure: Sam Altman is a shareholder in Reddit. This partnership was led by OpenAI’s COO and approved by its independent Board of Directors."
Well, of course Altman is a shareholder in Reddit. It simply wouldn't be compatible with the broader sense of corporate and technological hegemony if the leading figure in AI didn't also have a stake in key social media.
Anyway, it's not hard to see how deals like this make sense. The training value is obvious enough. There's only so much historical content publicly available on the internet. That's already been scraped by existing models and the various AI outfits are said to be running out of new content with which to train their models.
Indeed, so desperate for new input data was OpenAI that it is reported to have transcribed millions of hours of YouTube audio into text to help train GPT-4.
At the same time, the inability of many LLMs to access up-to-date or live information limits their utility. With many models, it's as if the world stopped at some arbitrary cutoff date a year or two ago.
Moreover, whatever you think about giving AI models access specifically to Reddit posts, be that for training or reference purposes, deals like this at least add to transparency. If you post on Reddit, you can't be in any doubt about what your posts will be used for.
Likewise, these Reddit deals with two of the biggest players in AI demonstrate that the latter are willing to pay for access to content instead of just using it all without permission.
Whether they'll be as upfront about all the data they access, or whether they'll only pay for content when its owners are likely to be proactive and litigious, remains to be seen. If you want a broader take on how AI is impacting online content, and it's not pretty, check out Jacob's new wide-ranging overview.