OpenAI and Midjourney to Use Tumblr and WordPress Posts for AI Training

Tumblr and WordPress are reportedly preparing to sell user data to artificial intelligence companies OpenAI and Midjourney. According to 404 Media, Automattic, the parent company of both platforms, is close to finalizing an agreement to supply data for training AI models.

While the precise data involved remains unclear, internal communications from Tumblr's product manager, Cyle Gage, indicate that Automattic may have initially intended to include sensitive information. This potentially includes private posts from public blogs, deleted or suspended blogs, unanswered questions, private messages, explicit content, and material from premium partner blogs. Automattic’s engineering team is reportedly creating a list of post IDs that should be excluded from the agreement.

In response to inquiries about the report, Automattic issued a statement saying, “We will share only public content that’s hosted on WordPress.com and Tumblr from sites that haven’t opted out.” The company emphasized that current law does not require AI companies to honor user opt-out preferences.

Automattic's statement is consistent with its reported data-sharing plans: “We are also working directly with select AI companies as long as their plans resonate with our community’s values: attribution, opt-outs, and control.” The company said it is committed to respecting all opt-out requests and plans to regularly inform partners about users who choose to opt out, asking them to remove that content from both past datasets and future training.

Additionally, Automattic plans to introduce a new opt-out tool that will allow users to block third parties, including AI companies, from utilizing their data. An internal FAQ suggests, “If you opt out from the start, we will block crawlers from accessing your content by adding your site to a disallowed list. If you change your mind later, we will advocate for the removal of content from past sources and future training.”

Despite these assurances, phrasing such as “asking” AI companies to comply raises questions about the effectiveness of these measures. Automattic’s AI lead, Andrew Spittle, has indicated that they will keep partners updated on users who opt out, with hopes that the AI companies will cooperate based on previous discussions.

The emergence of AI data training deals presents new revenue opportunities for online platforms navigating a challenging digital landscape. For instance, Google recently formed a partnership with Reddit to leverage its extensive user-generated content for AI training.
