Bluesky To Sell Your Content To AI Data Miners

So it begins. Hidden in Jay Graber's recent charm offensive is this innocuously framed initiative: Bluesky is weighing a proposal that gives users consent over how their data is used for AI (techcrunch.com/2025/03/10/blue)

Not so fast.

1) Shows they are planning on doing content deals with AI companies.
2) Seems like it is Opt-out vs. Opt-in (see below).
3) It is just a voluntary robots.txt file
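To make point 3 concrete, here is a minimal sketch of how a robots.txt-style signal works in practice, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` user agent, and the example.com URL are all made up for illustration; nothing here comes from Bluesky's proposal. The check only tells a crawler what the site asks for. A crawler that never calls it is not blocked by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "ExampleAIBot" is an invented user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report what robots.txt asks of this user agent.

    This is purely advisory: the crawler has to choose to run this
    check and to honor the answer. Nothing enforces it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved AI crawler would see that it is asked to stay away.
polite = may_fetch(ROBOTS_TXT, "ExampleAIBot", "https://example.com/post/1")

# Any other crawler falls through to the permissive wildcard rule.
other = may_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.com/post/1")

print(polite, other)
```

A scraper that skips `may_fetch` entirely, or changes its user agent string, gets the content anyway, which is why the signal is called voluntary.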

h/t @Lydie tech.lgbt/@Lydie/1141490233448

TechCrunch · Bluesky is weighing a proposal that gives users consent over how their data is used for AI | TechCrunch
Mastodon Migration

@Lydie

Let's get into this. Bluesky is bleeding money and selling your data is the best way they have of "monetizing" you. So why not frame it as a "voluntary" initiative?

Thing is, seems like it will be opt-out. See this github 'proposal': github.com/bluesky-social/prop

"Suppose a Bluesky user does not want any of their public data to be used for generative AI training. They would go in to app settings, find the data reuse preferences section, and configure “Generative AI” to “disallow”.

GitHub · proposals/0008-user-intents at main · bluesky-social/proposals: Bluesky proposal discussions. Contribute to bluesky-social/proposals development by creating an account on GitHub.

@Lydie

Opt-out vs. Opt-in is a crucial thing. 90% of users never change the default setting.

Oh, and when we dig into the github doc, it is not just AI.

"The initial categories described here include:
generative AI
protocol bridging
bulk datasets
public archiving and preservation"

What are these "bulk datasets" that Bluesky would be selling?

Just more 'distributed' Bluesky trickery. And if you don't like it, Jay Graber says you can "fork off": mastodon.online/@mastodonmigra

@Lydie

Let's just try one more thought exercise. Let's say Eugen Rochko told tech media that all content on mastodon.social was going to be sold to AI scrapers and "bulk dataset" brokers, but you had the ability to "opt-out" by checking a box that would insert the robots.txt header.

Can you imagine?

And yet, Jay Graber's announcement flies under the radar. This is what happens when you've constructed a cult of personality around your enshittification. Enough of the gaslighting.

@Lydie

In the comments below some defenders of this opt-out AI scraping change say something like, "It just gives users more control over their content." Like it is a good thing. This is baloney.

You don't need to put a sign on your car saying it is not okay to break into this car. It's your car.

This "control of your own data" argument is nonsense. You have control of your own data, it's yours. All you can be tricked into doing is giving it away.

@Lydie

To be more specific... (Thanks @mackuba)

"Intent preferences would be tri-state: explicitly allow, explicitly disallow, or undefined"

"Realistically, a large majority of users may stick with the default "undeclared" state. In that situation, downstream projects will need to make their own policy decisions around whether content re-use is acceptable"

Which seems even worse, as it signals intent and absolution of any responsibility.
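A rough sketch of why the "undefined" default matters so much, assuming a downstream project has to pick its own rule for users who never touched the setting. The `Intent` enum, the function names, and the 90/5/5 split are illustrative assumptions, not anything taken from the proposal text.

```python
from enum import Enum
from typing import Iterable


class Intent(Enum):
    """Tri-state preference as quoted from the proposal (names are assumed)."""
    ALLOW = "allow"
    DISALLOW = "disallow"
    UNDEFINED = "undefined"


def reuse_permitted(intent: Intent, undefined_means_allowed: bool) -> bool:
    """Apply a downstream project's own policy to a single user's intent."""
    if intent is Intent.ALLOW:
        return True
    if intent is Intent.DISALLOW:
        return False
    # Undefined: the proposal leaves this decision to the downstream project.
    return undefined_means_allowed


def count_reusable(intents: Iterable[Intent], undefined_means_allowed: bool) -> int:
    """Count how many users' content a project would treat as fair game."""
    return sum(reuse_permitted(i, undefined_means_allowed) for i in intents)


# Illustrative population: 90 users never change the default,
# 5 explicitly allow, 5 explicitly disallow.
users = [Intent.UNDEFINED] * 90 + [Intent.ALLOW] * 5 + [Intent.DISALLOW] * 5

opt_out_style = count_reusable(users, undefined_means_allowed=True)
opt_in_style = count_reusable(users, undefined_means_allowed=False)

print(opt_out_style, opt_in_style)
```

With these made-up numbers, a project that treats "undefined" as permission takes content from 95 of 100 users, while one that requires an explicit "allow" takes it from 5. The explicit choices barely move the total; the handling of the default does.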

@mastodonmigration

hm, I have a bridge to bluesky, so I guess I have to do the same setting on my bridged account too?

@Lydie

@di0v0n @Lydie

Great point. If Bluesky does a deal with AI data scrapers they are going to hoover up all the fedi bridged content too no doubt.

@mastodonmigration @di0v0n @Lydie came here to say similar. As enabling the bridge doesn't give us an actual bsky account, I don't see where there could even be an opt-out option that we would be able to access.

@mastodonmigration @Lydie As effective as taking a shower in public and putting up a "please respect my privacy - do not look" sign.

But frankly, using services that inherently require resources and labour to be provided for free, without having an idea how they are being funded, requires either a considerable level of naïveté or utter indifference.

@mastodonmigration @Lydie

If Eugen Rochko did the same as #Bluesky and put the "Social." data up for sale, the users would move to another instance and the value of social would be 0 euros... Thanks for that #fediverse.

@mastodonmigration @Lydie If it were about giving users control, it would be opt-in. Anything less is simply a lie.

@mastodonmigration @Lydie One of the best ways to fuck with AI is to set up a bot that will be obvious to and only to humans, and feed the AI output back to the input.

Enough of this will give the AI the electronic equivalent of an LSD trip. Add war footage for an electronic K-hole (NOT the same as a pi-hole!) or other bad trip.

@mastodonmigration @Lydie It's all explained here below - so bulk datasets is about whether you allow sites like archive.org etc. to store your public posts forever (which they technically can now, but are not sure if they're allowed to):

@mackuba @Lydie

Sorry, but this doesn't seem to clarify anything.

@mastodonmigration @mackuba @Lydie I think this was to be expected... there are a lot of very, very suspicious users over there waiting for something like this to happen... how strange... to see some users who are very interested in the Fediverse but who apparently, and I say only apparently, had more information about where the platform is heading... I've only been there a short while and there is already a whiff of "the same old thing".

@mastodonmigration @Lydie Also, opt-out means they have a window where they sell all your data before you can get to the screen to opt-out

@mastodonmigration @Lydie How would I opt out of having my Mastodon posts used for training AI? Is there even any way to know if someone has set up an instance, followed thousands of people, and is feeding all the posts into an AI?

Mastodon doesn’t allow you to opt out of your data being used to train AI (although some instances have clauses in their terms to prohibit it). In fact if anybody from Threads is following you - or following somebody who boosts one of your posts - Threads’ privacy policy says your data can be used to train Meta’s AI and target ads.

So if Bluesky implements this, they’ll be providing more control than Mastodon does today. How people are getting from there to “they’re going to sell your data!!!!” is mysterious to me. Sure, opt-in would be better but Mastodon doesn’t even have opt-out!

@dogzilla @mastodonmigration @Lydie

@thenexusofprivacy @dogzilla @Lydie

Mastodon is de facto opt-out of your data being used to train AI, because no such rights are explicitly granted. The authorized uses are enumerated in the instance privacy policy and they do not include AI scraping.

Agree with you about the Threads problem and wrote about it extensively at the time.

Yes, selling content is an inference. What do you think the plan is, to simply give it away to AI scrapers? Not sure this would be better, and it makes no sense.

Yes, I think Bluesky’s plan is very much to make it easy for AI scrapers and everybody else to access public data for free. They’ve said so repeatedly, their architecture is optimized for it, it fits in with their belief system, and they have plenty of other ways of making money. Of course they could change their minds, but adding robots.txt-like consent signals doesn’t make that any easier or more likely.

As for the situation on Mastodon, I’m not sure what privacy lawyer told you that and how much time they had spent looking at your instance’s privacy policy, but you might want to get some other expert opinions before giving that advice to others.

@mastodonmigration @dogzilla @Lydie

@thenexusofprivacy @mastodonmigration @dogzilla @Lydie the problem with mastodon is the same as with all other "but it's free information on the internet" arguments of people training AI: There's no laws for it. And no laws doesn't mean it's legal or illegal, it means that legislation has to be made that will solve that question. And this might look very different when an instance doesn't make their local data public without an account, only through their instance, etc.

I certainly agree that we need legislation specifically around the use of data for AI training, and that it's a different situation for public data as opposed to data that's only accessible with an account or via an API. Still, scraping data in violation of the terms of service isn't necessarily legal -- as Solove and Hartzog write in The Great Scrape, "Privacy law regularly protects publicly available data, and privacy principles are implicated even when personal data is accessible to others." Ulrike Hahn's Bridging to Bluesky: The open social web, consent, and GDPR looks at the interactions between the ActivityPub Fediverse and Bluesky; the joint statement from a dozen data protection offices and Kieran McCarthy's Web Scraping for Me, But Not for Thee look at scraping in general.

TL;DR summary: it's complex!

@mastodonmigration @leberschnitzel @dogzilla @Lydie

papers.ssrn.com · The Great Scrape: The Clash Between Scraping and Privacy · Artificial intelligence (AI) systems depend on massive quantities of data, often gathered by “scraping” – the automated extraction of large amounts of data from

@dogzilla @mastodonmigration @Lydie AI companies already do this illegally.

People have already posted screenshots of ChatGPT summarizing the content of other people's fediverse accounts.

@hisham_hm @dogzilla @Lydie

Of course they do. The issue under discussion is whether the platform gives them the authority to do so or not.

@mastodonmigration @hisham_hm @dogzilla @Lydie If we know it is done regardless of the permission to do so, then discussing which platforms allow it or not becomes a bit useless.

Taking a great stance against AI scraping to no effect and having a deal with an AI scraper lead to the same consequences. We should look into that.

@mastodonmigration @dogzilla @Lydie Yes, of course. I'm definitely not saying "well, all is lost because they'll scrape your data anyway". Legal accountability is no joke.

@hisham_hm @mastodonmigration @Lydie Well, how many admins or users have a legal department to draw on? In theory that would restrain corporations or hackers, in practice it really doesn’t. It’s probably a line item in the business plan.

So I’m not sure that in practice BlueSky is that different from Masto, at least for this issue. I’m sure there’s plenty of other metrics

For me, I’ll never rely on a centrally-controlled presence again, but I’ll visit

@mastodonmigration @Lydie Under European data protection laws it must be opt-in, at least for European users - otherwise they open themselves up to potentially catastrophic fines (€20M, or up to 4% of global annual revenue).

So let’s see if they want to spend the time to implement a per country default.

LinkedIn did the same half a year ago, but it was opt-out in the USA, whereas it was an incentivized opt-in for Europeans.