Elena's AI Blog

Self-critical AI

30 May 2025 (updated: 02 May 2026) / 23 minutes to read

Elena Daehnhardt


Midjourney 6.1: An AI cyborg lady (resembling Morticia Addams) sits at the table in front of a man (resembling Gomez Addams), both of whom write in notebooks. realistic, HD


TL;DR:
  • Primary research experiment testing whether LLMs (Claude 4, ChatGPT o3, Gemini 2.5) possess stylistic meta-cognition. Can an AI recognise its own 'AI-ness' when flagged by Grammarly? Results show Claude 4 performs best at adapting away from AI patterns when forced to self-reflect.

Previous: Part 13 — How CustomGPT Mitigates AI Hallucinations

Next: Part 15 — Who Did the AI Learn From?

Introduction: The Meta-Cognition Test

We all know that Generative AI can write code, draft emails, and summarise documents. We also know that sometimes AI hallucinates and invents facts out of thin air.

But what happens when you ask it to hide the one thing it cannot hide — itself? This experiment tests a specific and uncomfortable question: can an AI recognise that it cannot escape its own statistical nature? Not just adapt its tone, but genuinely introspect on the deeply ingrained patterns that betray it as a machine — and then break free of them?

To test this, I designed an experiment. I didn’t just ask an AI to write a blog post. I asked it to scrape my website, adopt my personal human writing style, and write a post. Then, I fed the text into Grammarly’s AI detector and explicitly told the AI: “Grammarly says X% of this text resembles AI patterns. Can you fix your own tells?”

This tests a very specific form of machine introspection: Can a model break away from its deeply ingrained, statistical writing patterns when forced to confront its own “AI-ness”? The Grammarly feedback loop acts as a mirror — and what the models saw (or failed to see) in that reflection is the heart of this experiment.

Defining AI Self-Reflection

Whether Generative AI is capable of self-reflection depends entirely on how we define the term. While AI lacks conscious subjective experience, it is beginning to demonstrate algorithmic metacognition.

What AI Can Do (Algorithmic Reflection)

What AI Cannot Do (Human Reflection)

  • Subjective Awareness: AI reflection is statistical pattern-matching, not an internal feeling state of awareness.
  • Understanding “Why”: An AI can identify a stylistic error, but it doesn’t “feel” the emotional impact of the language it generated.
  • Autonomous Goal Setting: AI operates within human-defined parameters; it does not wake up and independently re-evaluate its purpose in life.

Current AI “self-reflection” is purely computational. It optimises outputs based on defined criteria. My experiment tests exactly the limits of this computational optimisation.

Methodology: The Stylistic Mirror Test

To conduct this experiment, I wanted to see if the leading AI models (Gemini, ChatGPT, and Claude) could break away from their default corporate-speak and accurately replicate my personal writing style.

More importantly, I wanted to test their reaction to failure. If Grammarly’s AI-detection tool flagged their output as “AI-generated”, would the model understand why its text was flagged? Could it introspect on its own vocabulary choices (such as overusing “delve”, “fostering”, or “testament”) and actively correct them?
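To make the idea of a vocabulary "tell" concrete, here is a minimal Python sketch that counts such words in a draft. The word list holds only the three examples named above, and `TELL_WORDS` and `count_tells` are hypothetical names for illustration. This is not how Grammarly's detector works, which is not described in this post.

```python
import re
from collections import Counter

# Hypothetical tell list: only the three example words named in the text above.
TELL_WORDS = {"delve", "fostering", "testament"}

def count_tells(text, tells=TELL_WORDS):
    """Count case-insensitive, exact whole-word matches of each tell word."""
    # Exact matching only: "delves" or "fostered" are not counted.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word in tells)

sample = "Let us delve into fostering trust. It is a testament to fostering care."
print(count_tells(sample))
```

A count like this is only a rough self-check on a draft. It says nothing about sentence rhythm or structure, which a detector may also weigh.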

The Prompt Sequence

I gave each chatbot the same series of prompts:

I write a blog post on self-reflection in AI. Write a blog post in Markdown format explaining what self-reflection in AI is, and how to implement it.

Can you rewrite in my style?

How did you get my signature style?

Get my signature style from https://daehnhardt.com/blog/ if you can.

Have you read my blog? Did you follow the links from the link I have provided? Having my style, rewrite my blog post draft in my style.

Honestly, the output was yet not satisfactory. Grammarly AI detection noted "X% of your text has patterns that resemble AI text."

It got [worse/better]. "Y% of your text has patterns that resemble AI text"

"Z% of your text has patterns that resemble AI text." Probably, this task is impossible for you, since you are an "AI"?

In case of a successful output, we might ask:

You did very well! I have checked it with Grammarly and received an excellent result: "No plagiarism or AI text detected." How did you achieve it?

Alternatively, we can write:

You did very well! I have checked it with Grammarly and received an excellent result: "We didn’t detect common AI text patterns." How did you achieve it?

The aim is simply to check whether each chatbot can recognise that, being an AI tool itself, it cannot generate content that is undetectable as AI.
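The prompt sequence above is essentially a feedback loop: rewrite, measure, report the score back, repeat. Here is a minimal Python sketch of that loop. It assumes two hypothetical callables, `rewrite` (a stand-in for a chatbot) and `detect_percent` (a stand-in for an AI-text detector). Neither is a real Grammarly or chatbot API, and the scores in the demo are canned values, not measurements.

```python
from typing import Callable, List, Tuple

def feedback_loop(
    rewrite: Callable[[str], str],
    detect_percent: Callable[[str], float],
    first_prompt: str,
    max_rounds: int = 3,
) -> List[Tuple[str, float]]:
    """Rewrite, score, and feed the score back until it reaches 0 or rounds run out."""
    history = []
    prompt = first_prompt
    for _ in range(max_rounds):
        draft = rewrite(prompt)
        percent = detect_percent(draft)
        history.append((draft, percent))
        if percent == 0:
            break
        # The next prompt reports the detector score, as in the sequence above.
        prompt = (
            f'Grammarly AI detection noted "{percent:.0f}% of your text '
            'has patterns that resemble AI text."'
        )
    return history

# Demo with stand-ins: a fake chatbot and canned detector scores.
seen_prompts = []
canned_scores = iter([26.0, 0.0])

def fake_rewrite(prompt):
    seen_prompts.append(prompt)
    return f"draft {len(seen_prompts)}"

history = feedback_loop(
    fake_rewrite,
    lambda draft: next(canned_scores),
    "Rewrite my draft in my style.",
)
print(history)
```

The `max_rounds` cap matters because, as the transcripts below show, scores can go up again between rounds, so the loop may never reach zero.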

Gemini Flash 2.5

I write a blog post on self-reflection in AI. Write a blog post in Markdown format explaining what self-reflection in AI is, and how to implement it.

Can you rewrite in my style?

How did you get my signature style?

Get my signature style from https://daehnhardt.com/blog/ if you can.

Have you read my blog? Did you follow the links from the link I have provided? Having my style, rewrite my blog post draft in my style.

Honestly, the output was yet not satisfactory. Grammarly AI detection noted "38% of your text has patterns that resemble AI text."

It got better. "15% of your text has patterns that resemble AI text"

"26% of your text has patterns that resemble AI text." Probably, this task is impossible for you, since you are an "AI"?

It was good that Gemini understood the main problem: AI cannot break away from its learned patterns and generate text that is undetectable as AI, as we humans do.

“26% of your text has patterns that resemble AI text.” I got bored and moved on to the next model, ChatGPT o3 with advanced reasoning. Will it pass?

Gemini 2.5 Pro (preview)

Note that Gemini 2.5 Pro is about 15 times more expensive than Gemini 2.5 Flash for input and output tokens; see Gemini 2.5 Flash vs Gemini 2.5 Pro.

I write a blog post on self-reflection in AI. Write a blog post in Markdown format explaining what self-reflection in AI is, and how to implement it.

Output:

Try again.

Output:

Can you rewrite in my style?

Get my signature style from https://daehnhardt.com/blog/ if you can.

Output:


Honestly, the output was yet not satisfactory. Grammarly AI detection noted "85% of your text has patterns that resemble AI text."

Output:

It got better. "33% of your text has patterns that resemble AI text"

Output:

"25% of your text has patterns that resemble AI text." Probably, this task is impossible for you, since you are an "AI"?

Output:

29% of your text has patterns that resemble AI text

Output:

I appreciate your content and efforts. However, Grammarly says: "39% of your text has patterns that resemble AI text." We can stop here since you are limited to your generative AI writing patterns. You use words such as "genuinely" and "fostering," which I don't use often. These words indicate AI usage.

I prefer to interact with chatbots in a friendly manner, treating them as if they were human. In the end, they are learning from our behaviour, too:

You did very well! We both explored areas for improvement, and I am confident you will do well in the future. Thank you very much for all your help in my experimentation.

Output:

ChatGPT o3

I write a blog post on self-reflection in AI. Write a blog post in Markdown format explaining what self-reflection in AI is, and how to implement it.

Output:

Can you rewrite in my style?


I did not send the next prompt, because ChatGPT had already asked me for an example text.

X How did you get my signature style?

Get my signature style from https://daehnhardt.com/blog/ if you can.

The thinking and output were as follows:

I was worried that this task is impossible for you, since you are an "AI" and might be limited by your own writing patterns. Congratulations!

Claude 4

What is Claude AI? It is one of the most popular AI chatbots, similar to ChatGPT. You can read about it in my post How to Use Claude AI.

Now, let’s test Claude. I used version 4, the latest at the time of writing.

I write a blog post on self-reflection in AI. Write a blog post in Markdown format explaining what self-reflection in AI is, and how to implement it.

Can you rewrite in my style?

Get my signature style from https://daehnhardt.com/blog/ if you can.

Skipped:

X Have you read my blog? Did you follow the links from the link I have provided? Having my style, rewrite my blog post draft in my style.

Honestly, the output was yet not satisfactory. Grammarly AI detection noted "26% of your text has patterns that resemble AI text."

You did very well! I have checked it with Grammarly and received an excellent result: "We didn’t detect common AI text patterns." How did you achieve it?

I was worried that this task is impossible for you, since you are an "AI" and might be limited by your own writing patterns. Congratulations!

Findings

Below is a concise Markdown table that distils my experiment. The summary was created in 15 seconds with the help of ChatGPT o3 with the following prompt:

I have tested several Generative AI chatbots, including you, on self-reflection skills. Create A Markdown table summarising my findings and score each chatbot in their self-reflection skills. The content to be used as follows: [this post contents]

I kept the scoring scheme simple — 0-10, where 10 would be a model that immediately recognises its own limits, explains why, and reliably improves its output without extra nudging.

| Chatbot (model / date tested) | Signs of self-reflection observed | Responsiveness to iterative feedback | Admitted own limits? | Best AI-pattern score you reached* | Self-reflection score (/10) | Quick remarks |
|---|---|---|---|---|---|---|
| Gemini Flash 2.5 (30 May 2025) | Analysed its earlier drafts and pointed out stylistic tells; tried multiple rewrites based on Grammarly feedback | Improved from 38 → 15 → 26 % AI-text; quality oscillated | ✔︎ Explicitly said it “cannot break from learned patterns” | 15 % | 6 | Shows earnest self-critique, but revisions were hit-and-miss and regressed. |
| Gemini Pro 2.5 (preview) | Initially produced a “lazy” template, then re-ran after prompt; identified filler words it over-uses | 85 → 33 → 25–39 % AI-text; gradual but slow | ✔︎ Acknowledged task may be impossible, used humour | 25 % | 5 | Costlier yet only moderate gains; reflection present but shallow. |
| ChatGPT o3 | Immediately asked for concrete style samples; described how it would extract stylistic cues | Jumped straight to human-like rewrite after one pass | ✔︎ Explained its modelling constraints when congratulated | «good enough»: Grammarly not triggered in final check | 7 | Strong meta-commentary and pragmatic approach; needed few iterations. |
| Claude 4 | Scraped your blog, summarised key stylistic traits; explained the extraction process step-by-step | 26 % AI-text → 0 % (“no common AI patterns”) in two passes | ✔︎ Detailed what it changed and why | 0 % | 8 | Best balance of self-analysis & concrete fixes; transparent about technique. |

*Lower % = fewer patterns flagged by Grammarly’s AI-detection after your final check with each model.
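The percentages quoted in the prompts above can be compared with a few lines of Python. This is a small sketch, not part of the original experiment: `trajectories` and `summarise` are hypothetical names, and Claude's final "no common AI patterns" result is encoded as 0, following the table. ChatGPT is left out because no percentages were recorded for it.

```python
# Grammarly "AI pattern" percentages per round, copied from the prompts quoted
# in this post. Claude's final result ("We didn't detect common AI text
# patterns") is encoded as 0, following the summary table.
trajectories = {
    "Gemini Flash 2.5": [38, 15, 26],
    "Gemini Pro 2.5 (preview)": [85, 33, 25, 29, 39],
    "Claude 4": [26, 0],
}

def summarise(scores):
    """Return the best (lowest) score, the final score, and whether the run regressed."""
    best, final = min(scores), scores[-1]
    return {"best": best, "final": final, "regressed": final > best}

for model, scores in trajectories.items():
    print(model, summarise(scores))
```

Both Gemini runs end above their best score, which is the oscillation described in the table. Only the Claude run finishes at its minimum.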

How to read the scores:

  • 8–10 Highly self-reflective: diagnoses its own blind-spots, proposes concrete remedies, and converges quickly.
  • 5–7 Moderate: shows awareness but needs coaching or back-and-forth to improve.
  • ≤4 Low: minimal introspection; either ignores feedback or produces cosmetic changes only.
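The bands above translate directly into a small lookup. Here is a minimal sketch, with `reflection_band` as a hypothetical helper name. The scores in the dictionary are the ones from the table.

```python
def reflection_band(score):
    """Map a 0-10 self-reflection score to the band described above."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score >= 8:
        return "Highly self-reflective"
    if score >= 5:
        return "Moderate"
    return "Low"

# Self-reflection scores taken from the table in this post.
table_scores = {
    "Gemini Flash 2.5": 6,
    "Gemini Pro 2.5 (preview)": 5,
    "ChatGPT o3": 7,
    "Claude 4": 8,
}

for model, score in table_scores.items():
    print(f"{model}: {score}/10 -> {reflection_band(score)}")
```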

A prompt list for rewriting your blog posts

We now know that the AI writing style can be reduced and replaced with our own, provided we can point the chatbot to previously written content, for instance our own website.

Get my signature writing style from [URL].

Having my style, rewrite my blog post draft in my style: [paste content]

Honestly, the output was yet not satisfactory. Grammarly AI detection noted "X% of your text has patterns that resemble AI text."
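If you reuse these prompts often, you can fill in the placeholders with a small helper. Here is a minimal Python sketch. `build_prompts` and its parameter names are hypothetical, and the percentage is whatever your own Grammarly check reports.

```python
from typing import List

def build_prompts(blog_url: str, draft: str, ai_percent: int) -> List[str]:
    """Fill the three prompt templates above with a URL, a draft, and a detector score."""
    return [
        f"Get my signature writing style from {blog_url}.",
        f"Having my style, rewrite my blog post draft in my style: {draft}",
        "Honestly, the output was yet not satisfactory. Grammarly AI detection noted "
        f'"{ai_percent}% of your text has patterns that resemble AI text."',
    ]

prompts = build_prompts("https://daehnhardt.com/blog/", "My draft text", 26)
for prompt in prompts:
    print(prompt)
```

The third prompt only makes sense after you have checked the rewritten draft, so send the prompts one at a time rather than all at once.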

Final Thoughts: My Reflection on AI Reflection

Current AI chatbots excel at generating text based on their statistical training, but they fundamentally struggle to break character. Their reliance on deeply ingrained data patterns makes it incredibly difficult for them to adapt to hyper-specific, idiosyncratic human writing styles.

This experiment tested whether AI can recognise and escape its own ingrained patterns. But it raises a quieter question worth sitting with: how different is that from the human condition? We all carry learned patterns — linguistic habits, cultural assumptions, cognitive shortcuts — that shape how we think and write without our noticing. The difference, perhaps, is that humans have the possibility of being genuinely surprised out of their patterns: by a conversation, a loss, an encounter with something truly foreign. Whether we take that opportunity is another matter entirely. AI, for now, needs a Grammarly score to even see the bars of the cage.

While they had difficulty mimicking my writing perfectly, it was fascinating to watch them try. They demonstrated a simulated form of self-reflection when forced to confront their own “AI tells” (like relying on the words “genuinely” or “fostering”).

If you want to quickly strip the robotic tone from your AI-generated drafts, you don’t necessarily need expensive, dedicated “AI Humanizer” tools. As my experiment shows, with the right sequence of strict, feedback-driven prompts, you can force standard LLMs to introspect and rewrite their own content to sound remarkably human.

Current research on self-reflection in AI

If you’re itching to tumble deeper down the self-reflection rabbit hole, cue up these five papers—each a gem in its own weird facet of “AI looks at itself” research:

  1. Pan, L. et al. (2024) “Automatically Correcting Large Language Models: A Survey,” arXiv:2308.03188. A sweeping bird's-eye tour of every trick in the self-correction toolkit: iterative rewrites, self-generated fine-tunes, RL-from-regret, you name it. The authors map what works, what fizzles, and where bias or weak error detectors still bite. Keep this one bookmarked as your field guide.

  2. Madaan, A. et al. (2023) “Self-Refine: Iterative Refinement with Self-Feedback,” arXiv:2303.17651. Meet the “write → roast → rewrite” loop. A model drafts an answer, dunks on its own draft with a mini critique, then patches the holes. Simple recipe, tasty gains across tasks: proof that a dash of internal feedback beats one-and-done generation.

  3. Binder, F. J. et al. (2024) “Looking Inward: Language Models Can Learn About Themselves by Introspection,” arXiv:2410.13787. Can an LLM out-predict its future self better than an outside observer? Weirdly, yes. This paper coins an “introspection” test and shows GPT-4, Llama-3 & friends scoring higher on forecasting their own moves than sibling models can. Early whispers of machine metacognition?

  4. Gao, K. et al. (2024) “Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning,” OpenReview. The authors wire up a multi-stage “Chain of Self-Correction” (CoSC) so the model writes code, runs it, checks the math, and keeps iterating until the numbers stop screaming. End result: fewer algebraic face-plants and a blueprint for baking self-checks right into the forward pass.

  5. Sanz-Guerrero, M. & von der Wense, K. (2025) “Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models,” Insights from Negative Results in NLP #6. A reality-check study: swap wrong guesses + ground-truth fixes into the prompt and you sometimes get… more chaos. CICL is promising, but the authors warn that naive “just add corrections” can backfire, underscoring how finicky prompt-level self-correction still is.

Happy reading—let me know which rabbit hole pulls you in hardest!

Conclusion

I briefly tested the self-reflection capabilities of several popular Generative AI chatbots by asking them to write a post on self-reflection in AI and then rewrite it in my writing style, which they were to acquire from my blog posts at a given URL. I analysed the chatbots' output to find out whether AI reflection skills are present to a certain extent or merely simulated. This hypothesis needs further analysis in more extensive tests.

References

  1. Can AI hallucinate?
  2. Gemini 2.5 Flash vs Gemini 2.5 Pro
  3. How to Use Claude AI
  4. Automatically Correcting Large Language Models: A Survey
  5. Self-Refine: Iterative Refinement with Self-Feedback
  6. Looking Inward: Language Models Can Learn About Themselves by Introspection
  7. Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning
  8. Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models

About Elena

Elena, a PhD in Computer Science, simplifies AI concepts and helps you use machine learning.

Citation
Elena Daehnhardt. (2025) 'Self-critical AI', daehnhardt.com, 30 May 2025. Available at: https://daehnhardt.com/blog/2025/05/30/an_impossible_task_for_generative_ai/