Protecting your online writing
Many platforms like Substack and LinkedIn are quietly using your writing to train LLMs. Learn how to take back control.
A lot of platforms rely on you opting out of allowing bots to scrape your data for AI Large Language Model training. But they don't advertise this. And if you do opt out - it isn't retrospective.
This article is about how to protect your writing so that it brings you most benefit, and is used in the way you want.
I've written a few posts on SEO skills for writers. Alongside these skills, it's important to put the writer at the centre of things. In this post I cover how we can protect our work and stop it being used for AI training without our consent.
In this post:
- How web search is changing
- What it feels like when the SEO works
- The implications of AI search results for writers
- Placing the writer at the centre
- Ways to protect your online writing
- The Takeaway
How web search is changing
I've read about how web search is changing and obviously see it in action every day on Google as do most people. It delivers an AI response for most queries. So currently you still see the classic view - a list of posts - and also an AI response which Gemini seems to write itself.
There are also new search engine companies like Perplexity, which deliver AI-first results with the links as part of the text.
So writing on the web is changing. If people are writing to be discovered in search now, they have to take this into account. That is, if this is the type of writing that they want to do - content they want to be discovered in SERPs (search engine results pages).
So I'd been working on a blog post for my self-catering cottage on Lewis (Macleod Cottage - discounted stays for writers in low season). But that's by the by.
I wrote a longish post about Viking sites on the Isle of Lewis and submitted it to Google Search Console. It was an updated post so Google updated the indexing pretty quickly, in a few minutes.
What it feels like when the SEO works
I'd followed Google's advice in the post of adding H2 headings for each section and jump-links and thinking about keywords and making it active... oh!... that's right, I had an idea about that, I'll tell you later. This helps text appear in the AI search result.
So I did all of that and then on the same day... the same day... I did an incognito search for what I had written, which was pretty specialised and specific, so I had a better chance to find it.
And Google delivered an AI result. And it was very strange. It was the text I had written myself, but altered. Like a letter I had posted from a different version of myself from a (very slightly) alternate Universe. Like I was in a hurry because a blue Josh Brolin was telling me to get a move on because he had things to do.
It had taken what I'd written and reworked it and delivered it to me. This was indeed my aim, to have the blog post be discoverable. But I'd not encountered the whole process before. And I had no idea that a piece of writing could be ingested into a search engine and then used within a matter of hours.
I asked Perplexity and it did the same thing:
MacLeod Cottage has a blog post titled “The Vikings in Ness” that highlights several Viking-related places and themes in the far north of Lewis. It is written from the perspective of a local accommodation provider etc...
The implications of AI search results for writers
And yes, that was the aim. And I do write about how writers can improve their SEO. But it did make me think a little. It's an interesting moment of actually feeling the statement - if you're not paying for something, then you're the product.
Say I didn't want this text online in future, would Google still have it? I've never read the small print and so I imagine they say, we can do whatever we want with what you give us. It's a world where copyright doesn't exist, just a Faustian pact. Just going down to the crossroads.
Now, I had got the result I wanted.
It did make me think twice about posting any fiction online. And it did make me feel a little like a potato in a bucket. I asked some more questions and it just made up things, just wee potato-based hallucinations, but it said things like... the owners of the cottage believe this...
There is another search engine called kagi.com - interestingly, they don't do advertising, they charge a subscription instead. I haven't unpacked what that means for writers. Will the best human writing rise to the top? Will people find your work more easily as there are no sponsored links to get through? Or will their AI answers just take your work, repurpose it and use it? You would just be the ghost in the machine.
Placing the writer at the centre
So what is the Endgame for it all, I wonder. AI has ingested pretty much all that it can. All of the internet is just a part of that. And it is now being trained on AI generated material, which apparently isn't quite so good and well yes of course it isn't. Dante was a character from Wind in the Willows and all that. But now we add to it every day voluntarily, shovelling our writing into it so we can get customers to find us on the web.
And if you've watched the documentary about DeepMind called The Thinking Game (which I really recommend) we're in an interesting situation. DeepMind is learning Pacman one day and then beating chess grandmasters in no time. But this is the interesting thing. When anyone can create text on any subject, of any length, what becomes valuable? Human writing does.
Google are pretty clear now that they don't just want to be ranking AI generated writing - some people call it AI slop. They want EEAT - Experience - Expertise - Authoritativeness - Trustworthiness. (Sidenote - don't they mean 'authority'? Nevertheless they have gone for a word with seventeen letters instead.)
And they're also pretty clear in that the human input we're providing improves their search results, and they reward us by giving us more impressions.
There are ways to protect your writing. Here are some resources, and one of the best ways is using a paywall. But not all paywalls are equal...
Ways to protect your online writing
In one post, I wrote about how writing to the algorithm is changing what we write. I also started thinking more when I wrote this article about the impact on writers. Many writers are finding it harder to make a living, according to multiple surveys.
Here is an article sent to me by Martin at AllSimple hosting called 'How to Stop AI from scraping your website':
The value of original content is growing. Case in point: Google reportedly pays Reddit $60 million a year to license their user-generated content. But as you read this article, AI crawlers are silently scanning websites across the internet, harvesting their content to train large language models (LLMs) and power AI-driven services.
Here are some strategies to protect yourself from this. Some of them won't work, of course. Look how Meta used pirated books from the shadow library LibGen to train their AI model.
Use a paywall for content
A paywall hides the text you want kept back, i.e. text which is just for subscribers. However, it is worth looking into what kind of paywall you have.
Some platforms just use a bit of code to hide the text from the reader who isn't a subscriber. However, this doesn't put a bot up or down. The text is still there in the page's code, so they can still read it.
Ghost and Substack have good paywalls, apparently.
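If you want a rough way to tell the two kinds of paywall apart, here is a minimal Python sketch. It assumes you have saved the raw HTML of a subscriber-only post while logged out, and that you know a phrase which only subscribers should see. The function name and the two sample pages are made up for illustration, they are not from any platform.

```python
def paywall_leaks(raw_html: str, subscriber_phrase: str) -> bool:
    """Return True if the subscriber-only phrase is present in the raw HTML.

    If it is, the paywall is only hiding the text visually, and a bot
    reading the page source can still see it.
    """
    def normalise(text: str) -> str:
        # Collapse whitespace and ignore case so line breaks don't hide a match.
        return " ".join(text.split()).lower()

    return normalise(subscriber_phrase) in normalise(raw_html)


# Hypothetical example pages, invented for this sketch.
# 1. The text is sent to every visitor and hidden with a style rule.
CLIENT_SIDE_PAGE = """
<article>
  <p>Free introduction.</p>
  <div class="paywall" style="display:none">The secret ending of the story.</div>
</article>
"""

# 2. The text is never sent to visitors who are not logged in.
SERVER_SIDE_PAGE = """
<article>
  <p>Free introduction.</p>
  <div class="paywall">Subscribe to keep reading.</div>
</article>
"""

for name, page in [("client-side", CLIENT_SIDE_PAGE), ("server-side", SERVER_SIDE_PAGE)]:
    leaked = paywall_leaks(page, "The secret ending")
    print(name, "paywall leaks subscriber text:", leaked)
```

To try it on a real post, open the page in a private window, use View Source, and paste what you see into the function. This only checks what a simple bot would see in the page source, so treat it as a first look rather than proof.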
Using your robots.txt file to block some AI bots
Your site will have a robots.txt file, although Ghost Pro takes care of all of that. In it, you paste some directives which tell AI bots not to scrape your site for data that they will use for training. It is a request rather than a lock: well-behaved bots honour it, others may ignore it.
It's good to make the distinction between bots used for search and bots used to train LLMs (large language models).
You will find your robots.txt file at yourdomain.com/robots.txt
And it will be a case of adding text to the file. You can find an example in section five of the article - How do I block AI crawlers? However, I would suggest always double checking any code from the internet and getting professional advice.
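As a rough illustration of what that text can look like, here is a minimal Python sketch. It holds an example robots.txt and uses Python's built-in robots.txt reader to check who is blocked. GPTBot, CCBot and Google-Extended are the names those companies publish for their AI-related crawlers, but the list changes often, so check each company's own documentation. The web address and the choice of bots are assumptions for the example, not a complete or recommended list.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: ask some AI training bots to stay away,
# and leave ordinary search crawlers alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str = "https://yourdomain.com/my-post/") -> bool:
    """Return True if the example robots.txt lets this bot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for bot in ["GPTBot", "CCBot", "Google-Extended", "Googlebot"]:
    print(bot, "allowed:", is_allowed(bot))
```

The part between the triple quotes is what would go in the robots.txt file itself. The check at the end is a way to confirm that search bots such as Googlebot are still allowed in while the training bots are asked to keep out.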
Using Cloudflare AI Crawl Control
When you can't access your robots.txt file directly, you can look at using a tool like Cloudflare, which helps block unauthorised scraping of your data for AI training.
Here is a view of some of the bots in AI Crawl Control, a very easy menu to use. (In this case Cloudflare hasn't been switched to go through it yet - an issue in Substack.)

Preventing automatic AI LLM training on Substack
Substack now has a setting where you can opt out of AI training. Go to Settings : Privacy : fifth item down the menu.

Preventing automatic AI LLM training on LinkedIn
As part of the agreement, LinkedIn can also share your data with their partners, such as their owner, Microsoft. If you want to turn this off, go to Settings : Data Privacy : Data for Generative AI improvement.
The Takeaway
I think it's worth thinking about how to have our writing used the way we want. As best we can. For that, we need to approach it with different strategies.
Looking at how efficient the system is, how quickly a search engine ingests one's writing, repurposes it and uses it, it's something to bear in mind as a writer.
We can't rely on companies to tell us. We can't rely on the Government - in the Copyright & Artificial Intelligence Consultation, they have been pushing for opt-out rather than opt-in.
I think personally I am happy to sacrifice some discoverability in order to feel as if I have some agency over how online writing is used.
It's good to write articles which are read and shared. But your writing is hard won and precious, and should be protected.
Thanks for reading.