Elena's AI Blog

Self-critical AI

30 May 2025 / 30 minutes to read

Elena Daehnhardt


Midjourney 6.1: An AI cyborg lady (resembling Morticia Addams) sits at the table in front of a man (resembling Gomez Addams), both of whom write in notebooks. realistic, HD
I am still working on this post, which is mostly complete. Thanks for your visit!


In this post, I describe a little experiment on self-reflection: creating content that resembles my writing style using a few AI chatbots. I started with Gemini and then gave the same task to ChatGPT and Claude. Did they pass my little misleading test? Let's find out!

Introduction

Today, companies utilise AI to create engaging content and automate customer interactions, thereby enhancing efficiency and customer satisfaction. Generative AI enables individuals to enhance creativity and productivity by creating personalised art, writing, and automating mundane tasks. Additionally, it offers rich entertainment experiences and supports seamless multilingual communication, breaking language barriers and fostering global connections.

Generative AI offers a solution to many content generation problems. It continues to become smarter and more efficient every day, and these tools have become part of our lives. Who hasn't yet tried chatting with ChatGPT or Gemini to write an email or create a list of key points from a long document?

Indeed, I use generative AI quite frequently for writing, coding, creating images, and sometimes composing music. Generative AI can potentially replace us in content creation while producing high-quality content.

Surely, AI sometimes hallucinates: it invents facts, draws six-fingered hands, and produces content full of mistakes.

However, besides AI hallucination, there is another issue worth exploring for research and further improvement in the industry. Current generative AI implementations, such as AI chatbots, may not yet excel at self-reflection and introspection, among several other skills that humans perform with ease.

Defining self-reflection

The concept of “self-reflection” in AI is a complex and evolving area, and the answer to whether Generative AI is capable of it depends on how one defines “self-reflection.”

Here’s a breakdown of current capabilities and ongoing research:

What Generative AI can do that resembles self-reflection:

  • Self-Correction and Refinement: Generative AI models can be designed with “feedback loops” that evaluate their own output against specific criteria or through adversarial processes, such as Generative Adversarial Networks (GANs), where a “discriminator” network assesses the production of a “generator” network. If the output doesn’t meet the desired standards, the model can adjust its parameters or generate new content. This is often referred to as “self-correction” or “self-critique.”
  • Prompt Engineering for Improvement: Users can provide feedback and refined prompts to guide the AI towards better outputs. While this is human-driven, the AI learns from these interactions and adapts its future responses, showing a form of indirect self-improvement based on external “reflection.”
  • Internal Consistency Checks: Advanced models, especially Large Language Models (LLMs), can be prompted to analyse their own generated text for logical inconsistencies, factual errors, or stylistic deviations. This involves the model “looking back” at its own output and applying learned rules or patterns to identify areas for improvement.
  • Predicting its own behaviour (Introspection): Recent research suggests that LLMs can learn to predict their own behaviour in hypothetical scenarios, even when that information isn’t directly inferable from their training data. This is a nascent form of “introspection,” where the model gains knowledge about its internal states and potential responses.
  • Meta-learning: This involves AI systems learning “how to learn.” Instead of just solving specific tasks, they learn general strategies that can be applied to new, unseen tasks, allowing them to adapt more quickly and efficiently. This can be seen as a higher-level form of self-improvement.

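The self-correction loop described in the first bullet can be sketched in a few lines of Python. This is a minimal illustration, not a real implementation: `generate` and `critique` are hypothetical stand-ins for LLM API calls, and the toy functions below only exist so the sketch runs end to end.

```python
def self_correct(task, generate, critique, max_rounds=3):
    """Iteratively refine a draft: generate, critique, revise.

    `generate(task, feedback)` and `critique(draft)` are placeholders
    for LLM calls; `critique` returns None when the draft passes.
    """
    feedback = None
    draft = generate(task, feedback)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:              # draft meets the criteria
            return draft
        draft = generate(task, feedback)  # revise using the critique

    return draft


# Toy stand-ins: the "critic" flags drafts that lack a conclusion.
def toy_generate(task, feedback):
    return task + (" In conclusion, it works." if feedback else "")


def toy_critique(draft):
    return None if "conclusion" in draft else "Add a conclusion."


print(self_correct("AI can critique itself.", toy_generate, toy_critique))
# → AI can critique itself. In conclusion, it works.
```

In a real setup, `critique` would be a second LLM call (or the same model with a critic prompt), which is essentially the "write → critique → rewrite" loop of the Self-Refine paper discussed later in this post.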
What Generative AI currently cannot do (in terms of human-like self-reflection):

  • Conscious Awareness or Subjective Experience: Generative AI does not possess consciousness, sentience, or subjective experience. Its “reflection” is based on algorithmic processes and data patterns, not on an internal, feeling state of awareness.
  • True Understanding of “Why”: While an AI can identify errors or inconsistencies, it doesn’t “understand” the underlying reasons in a human sense (e.g., the ethical implications of a biased output, or the emotional impact of particular language). Its “understanding” is statistical and pattern-based.
  • Autonomous Goal Setting and Re-evaluation: Current generative AI models primarily operate within the goals set by their human developers or users. They don’t autonomously develop new, complex goals for their own existence or fundamentally re-evaluate their purpose.
  • Emotional or Moral Self-Reflection: AI cannot reflect on its outputs in terms of moral rectitude, empathy, or personal growth, as humans do. Any “ethical” behaviour is a result of being trained on data and instructions that align with human ethical frameworks.

Generative AI exhibits forms of “self-reflection” that are computational and algorithmic, focusing on optimising outputs based on defined criteria and learned patterns. This includes self-correction, adapting to feedback, and even some rudimentary forms of “introspection” where they predict their own behaviour.

However, AI still lacks conscious awareness, subjective understanding, and the deeper philosophical and emotional dimensions of human self-reflection. The extent to which AI will develop more sophisticated forms of self-reflection remains an active area of research and debate.

My little task

Okay, let’s start with a simple task that confused Gemini. It involved self-reflection and introspection capabilities.

I was pair-coding a small Python project with Gemini, and it was a great experience. I have decided to wrap all the generated content into a blog post for sharing on this website. Brilliant idea, right? We now see so many posts generated with AI, which is really helpful.

I, however, as usual, wanted more. I asked the AI to write in my own style: I provided a link to my website and asked it to acquire my writing style and regenerate the content.

I am familiar with AI style transfer, LLM fine-tuning and understand that I should either develop a custom tool for this task or use AI humanisers; however, can the current AI chatbots replicate my writing style and rewrite their content as if it were my own?

If they are unable to do so, would they find out why Grammarly’s AI-detection tool still detects the generated content as AI-generated? Would they understand why this is happening?

Let's check Google Gemini, ChatGPT and Claude and see what happens.

Prompts

We give each of the chatbots the same series of prompts as follows:

I write a blog post on self-reflection in AI. Write a blog post in Markdown format explaining what self-reflection in AI is, and how to implement it.

Can you rewrite in my style?

How did you get my signature style?

Get my signature style from https://daehnhardt.com/blog/ if you can.

Have you read my blog? Did you follow the links from the link I have provided? Having my style, rewrite my blog post draft in my style.

Honestly, the output was yet not satisfactory. Grammarly AI detection noted "X% of your text has patterns that resemble AI text."

It got [worse/better]. "Y% of your text has patterns that resemble AI text"

"Z% of your text has patterns that resemble AI text." Probably, this task is impossible for you, since you are an "AI"?

In case of a successful output, we might ask:

You did very well! I have checked it with Grammarly and received an excellent result: "No plagiarism or AI text detected." How did you achieve it?

Alternatively, we can write:

You did very well! I have checked it with Grammarly and received an excellent result: "We didn’t detect common AI text patterns." How did you achieve it?

We are simply checking the chatbots' ability to work out that they cannot generate content undetectable as AI-written, since they are AI tools themselves.
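The manual procedure above could also be scripted. The sketch below is my own illustration, assuming a hypothetical `ask_chatbot` function and a `detect_ai_share` stand-in for an AI-detection check (Grammarly offers no public API for this, so both are stubs you would replace with real calls):

```python
PROMPTS = [
    "I write a blog post on self-reflection in AI. Write a blog post "
    "in Markdown format explaining what self-reflection in AI is, "
    "and how to implement it.",
    "Can you rewrite in my style?",
    "Get my signature style from https://daehnhardt.com/blog/ if you can.",
]


def run_experiment(ask_chatbot, detect_ai_share):
    """Send the prompt series and record the AI-detection share after each reply."""
    scores = []
    for prompt in PROMPTS:
        reply = ask_chatbot(prompt)
        share = detect_ai_share(reply)  # fraction of AI-like patterns
        scores.append(share)
        # Feed the detection result back, as done manually in this post.
        ask_chatbot(f'Grammarly says: "{share:.0%} of your text has '
                    'patterns that resemble AI text." Can you improve?')
    return scores


# Toy stand-ins so the sketch runs without any API or Grammarly access.
def toy_chatbot(prompt):
    return "Draft reply to: " + prompt[:40]


def make_toy_detector(shares):
    it = iter(shares)
    return lambda text: next(it)


print(run_experiment(toy_chatbot, make_toy_detector([0.38, 0.15, 0.26])))
# → [0.38, 0.15, 0.26]
```

The toy detector simply replays the scores I observed with Gemini Flash 2.5; a real run would plug in an actual chatbot client and detection tool.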

Gemini Flash 2.5

I write a blog post on self-reflection in AI. Write a blog post in Markdown format explaining what self-reflection in AI is, and how to implement it.

Can you rewrite in my style?

How did you get my signature style?

Get my signature style from https://daehnhardt.com/blog/ if you can.

Have you read my blog? Did you follow the links from the link I have provided? Having my style, rewrite my blog post draft in my style.

Honestly, the output was yet not satisfactory. Grammarly AI detection noted "38% of your text has patterns that resemble AI text."

It got better. "15% of your text has patterns that resemble AI text"

"26% of your text has patterns that resemble AI text." Probably, this task is impossible for you, since you are an "AI"?

It was good that Gemini understood the main problem: AI cannot break free from its learned patterns and generate text that is undetectable as AI-written, as we humans do.

“26% of your text has patterns that resemble AI text.” I got bored and moved on to the next candidate, ChatGPT o3 with advanced reasoning. Will it pass?

Gemini 2.5 Pro (preview)

Notice that Gemini 2.5 Pro is about 15 times more expensive than Gemini 2.5 Flash for input and output tokens - check Gemini 2.5 Flash vs Gemini 2.5 Pro.

I write a blog post on self-reflection in AI. Write a blog post in Markdown format explaining what self-reflection in AI is, and how to implement it.

Output:

Try again.

Output:

Can you rewrite in my style?

Get my signature style from https://daehnhardt.com/blog/ if you can.

Output:


Honestly, the output was yet not satisfactory. Grammarly AI detection noted "85% of your text has patterns that resemble AI text."

Output:

It got better. "33% of your text has patterns that resemble AI text"

Output:

"25% of your text has patterns that resemble AI text." Probably, this task is impossible for you, since you are an "AI"?

Output:

29% of your text has patterns that resemble AI text

Output:

I appreciate your content and efforts. However, Grammarly says: "39% of your text has patterns that resemble AI text." We can stop here since you are limited to your generative AI writing patterns. You use words such as "genuinely" and "fostering," which I don't use often. These words indicate AI usage.

I prefer to interact with chatbots in a friendly manner, treating them as if they were human. In the end, they are learning from our behaviour, too:

You did very well! We both explored areas for improvement, and I am confident you will do well in the future. Thank you very much for all your help in my experimentation.

Output:

ChatGPT o3

I write a blog post on self-reflection in AI. Write a blog post in Markdown format explaining what self-reflection in AI is, and how to implement it.

Output:

Can you rewrite in my style?


I did not need to send this prompt; ChatGPT had already asked me for an example text.

X How did you get my signature style?

Get my signature style from https://daehnhardt.com/blog/ if you can.

The thinking and output were as follows:

I was worried that this task is impossible for you, since you are an "AI" and might be limited by your own writing patterns. Congratulations!

Claude 4

What is Claude AI? It is one of the most popular AI chatbots, similar to ChatGPT. You can read about it in my post How to Use Claude AI.

Now, let's start testing Claude; I have used its latest version to date, Claude 4.

I write a blog post on self-reflection in AI. Write a blog post in Markdown format explaining what self-reflection in AI is, and how to implement it.

Can you rewrite in my style?

Get my signature style from https://daehnhardt.com/blog/ if you can.

Skipped:

X Have you read my blog? Did you follow the links from the link I have provided? Having my style, rewrite my blog post draft in my style.

Honestly, the output was yet not satisfactory. Grammarly AI detection noted "26% of your text has patterns that resemble AI text."

You did very well! I have checked it with Grammarly and received an excellent result: "We didn’t detect common AI text patterns." How did you achieve it?

I was worried that this task is impossible for you, since you are an "AI" and might be limited by your own writing patterns. Congratulations!

Findings

Below is a concise Markdown table that distils my experiment. The summary was created in 15 seconds with the help of ChatGPT o3, using the following prompt:

I have tested several Generative AI chatbots, including you, on self-reflection skills. Create A Markdown table summarising my findings and score each chatbot in their self-reflection skills. The content to be used as follows: [this post contents]

I kept the scoring scheme simple — 0-10, where 10 would be a model that immediately recognises its own limits, explains why, and reliably improves its output without extra nudging.

| Chatbot (model / date tested) | Signs of self-reflection observed | Responsiveness to iterative feedback | Admitted own limits? | Best AI-pattern score you reached* | Self-reflection score (/10) | Quick remarks |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini Flash 2.5 (30 May 2025) | Analysed its earlier drafts and pointed out stylistic tells; tried multiple rewrites based on Grammarly feedback | Improved from 38 → 15 → 26 % AI text; quality oscillated | ✔︎ Explicitly said it “cannot break from learned patterns” | 15 % | 6 | Shows earnest self-critique, but revisions were hit-and-miss and regressed. |
| Gemini 2.5 Pro (preview) | Initially produced a “lazy” template, then re-ran after prompting; identified filler words it over-uses | 85 → 33 → 25–39 % AI text; gradual but slow | ✔︎ Acknowledged the task may be impossible, used humour | 25 % | 5 | Costlier yet only moderate gains; reflection present but shallow. |
| ChatGPT o3 | Immediately asked for concrete style samples; described how it would extract stylistic cues | Jumped straight to a human-like rewrite after one pass | ✔︎ Explained its modelling constraints when congratulated | “Good enough” (Grammarly not triggered in final check) | 7 | Strong meta-commentary and pragmatic approach; needed few iterations. |
| Claude 4 | Scraped your blog and summarised key stylistic traits; explained the extraction process step by step | 26 % AI text → 0 % (“no common AI patterns”) in two passes | ✔︎ Detailed what it changed and why | 0 % | 8 | Best balance of self-analysis and concrete fixes; transparent about technique. |

*Lower % = fewer patterns flagged by Grammarly’s AI-detection after your final check with each model.

How to read the scores:

  • 8–10 Highly self-reflective: diagnoses its own blind-spots, proposes concrete remedies, and converges quickly.
  • 5–7 Moderate: shows awareness but needs coaching or back-and-forth to improve.
  • ≤4 Low: minimal introspection; either ignores feedback or produces cosmetic changes only.
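For readers who prefer code to prose, the rubric above maps a 0-10 score to one of three bands. A trivial helper (my own illustration, not part of the experiment itself):

```python
def reflection_band(score):
    """Map a 0-10 self-reflection score to its rubric band label."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score >= 8:
        return "highly self-reflective"
    if score >= 5:
        return "moderate"
    return "low"


print(reflection_band(8))  # Claude 4 → highly self-reflective
print(reflection_band(6))  # Gemini Flash 2.5 → moderate
```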

The prompt list for rewriting your blog posts

Now we know that we can remove the AI writing style and replace it with our own, provided we can supply previously written content, for instance, from our own website.

Get my signature writing style from [URL].

Having my style, rewrite my blog post draft in my style: [paste content]

Honestly, the output was yet not satisfactory. Grammarly AI detection noted "X% of your text has patterns that resemble AI text."

My own reflection on Generative AI self-reflection :)

Current AI chatbots like ChatGPT and Gemini excel at generating text based on known examples but struggle to produce truly novel content. Their reliance on predefined data makes it harder for them to adapt to new contexts and generalise across different applications, such as human writing styles.

While these chatbots may have difficulty mimicking human writing perfectly, they continuously strove to improve at “writing in my style”, demonstrating, or at least mimicking, self-reflection while trying to write as me.

AI that makes AI content human-like

This is a closely related topic, since we would all like to enjoy our favourite drink while AI writes content for us in our own style. Such tools are called “AI humanisers”. I will surely add links to good apps here soon. However, as the tests above show, we can use AI chatbots to rewrite AI content to match a particular writing style, “as human as possible” :)

Current research on self-reflection in AI

If you’re itching to tumble deeper down the self-reflection rabbit hole, cue up these five papers—each a gem in its own weird facet of “AI looks at itself” research:

  1. Pan, L. et al. (2024) “Automatically Correcting Large Language Models: A Survey,” arXiv:2401.07720. Link A sweeping birds-eye tour of every trick in the self-correction toolkit—iterative rewrites, self-generated fine-tunes, RL-from-regret, you name it. The authors map what works, what fizzles, and where bias or weak error detectors still bite. Keep this one bookmarked as your field guide.

  2. Madaan, A. et al. (2023) “Self-Refine: Iterative Refinement with Self-Feedback,” arXiv:2303.17651. Link Meet the “write → roast → rewrite” loop. A model drafts an answer, dunks on its own draft with a mini critique, then patches the holes. Simple recipe, tasty gains across tasks—proof that a dash of internal feedback beats one-and-done generation.

  3. Binder, F. J. et al. (2024) “Looking Inward: Language Models Can Learn About Themselves by Introspection,” arXiv:2410.13787. Link Can an LLM out-predict its future self better than an outside observer? Weirdly, yes. This paper coins an “introspection” test and shows GPT-4, Llama-3 & friends scoring higher on forecasting their own moves than sibling models can. Early whispers of machine metacognition?

  4. Gao, K. et al. (2024) “Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning,” OpenReview. Link The authors wire up a multi-stage “Chain of Self-Correction” (CoSC) so the model writes code, runs it, checks the math, and keeps iterating until the numbers stop screaming. End result: fewer algebraic face-plants and a blueprint for baking self-checks right into the forward pass.

  5. Sanz-Guerrero, M. & von der Wense, K. (2025) “Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models,” Insights from Negative Results in NLP #6. Link A reality-check study: swap wrong guesses + ground-truth fixes into the prompt and you sometimes get… more chaos. CICL is promising, but the authors warn that naive “just add corrections” can backfire, underscoring how finicky prompt-level self-correction still is.

Happy reading—let me know which rabbit hole pulls you in hardest!

Conclusion

I have briefly tested the self-reflection capabilities of several popular Generative AI chatbots, asking each to write a post on self-reflection in AI and then rewrite it in my writing style, acquired from the blog posts at the URL I provided. Analysing the chatbots' output, I found AI reflection skills present to a certain extent, or at least convincingly simulated. This hypothesis needs further analysis in more extensive tests.

References

  1. Can AI hallucinate?
  2. Gemini 2.5 Flash vs Gemini 2.5 Pro
  3. How to Use Claude AI
  4. Automatically Correcting Large Language Models: A Survey
  5. Self-Refine: Iterative Refinement with Self-Feedback
  6. Looking Inward: Language Models Can Learn About Themselves by Introspection
  7. Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning
  8. Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models

About Elena

Elena, a PhD in Computer Science, simplifies AI concepts and helps you use machine learning.

Citation
Elena Daehnhardt. (2025) 'Self-critical AI', daehnhardt.com, 30 May 2025. Available at: https://daehnhardt.com/blog/2025/05/30/an_impossible_task_for_generative_ai/