🪞 Reflection Prompting

Reflection prompting is a technique where you ask the AI to review, critique, and improve its own output. After the AI generates an initial response, you ask it: "What could be wrong with your answer?" or "How could this be better?" The AI then identifies weaknesses and produces an improved version. It's a built-in quality check.

Think of it like asking a writer to proofread their own work, but with specific prompts that guide them to look for particular problems.

Why This Matters

AI models can often catch their own mistakes, provided you ask them to look:

  1. Self-correction is powerful: the AI may generate a flawed answer, but when prompted to critique it, it spots the flaw
  2. First drafts are rarely best: just like human writing, AI output improves with revision
  3. Different "modes" of thinking: generating an answer and evaluating an answer use different reasoning patterns, often catching different errors
  4. No external tools needed: unlike fact-checking or web search, reflection works with just the LLM itself
  5. Builds a feedback loop: multiple rounds of reflection can significantly improve quality

Studies of self-refinement techniques report that self-reflection can improve accuracy by roughly 10-25% on reasoning tasks, especially when the AI is asked to check for specific types of errors.

The Reflection Loop

Generate → Reflect → Improve → (optionally Reflect again) → Final Output
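
A minimal way to script this loop is a single multi-turn conversation: the critique and rewrite requests are appended to the same message history, so the model sees its own draft when it critiques it. The sketch below assumes the openai Python SDK with an OPENAI_API_KEY in the environment; the model name and prompt wording are placeholders, and any chat-style client would work the same way.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chat(messages):
    """Send the conversation so far and return the assistant's reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
    )
    return resp.choices[0].message.content

def reflect_and_improve(task):
    messages = [{"role": "user", "content": task}]

    # Generate: produce the first draft
    draft = chat(messages)
    messages.append({"role": "assistant", "content": draft})

    # Reflect: ask for a specific critique of the draft
    messages.append({"role": "user", "content":
        "Review your answer above. List any factual errors, "
        "one-sided claims, and important points you missed."})
    messages.append({"role": "assistant", "content": chat(messages)})

    # Improve: rewrite using the self-critique
    messages.append({"role": "user", "content":
        "Rewrite your original answer, fixing every weakness you listed."})
    return chat(messages)

Returning only the final rewrite keeps the caller simple; hold on to the intermediate draft as well if you want to compare versions before and after reflection.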

Types of Reflection

Error-Checking: "What factual errors might be in your response?"

Completeness: "What important points did you miss?"

Quality: "How could this be clearer, more concise, or better structured?"

Audience Fit: "Would a beginner actually understand this explanation?"

Adversarial: "Try to find the strongest argument against your conclusion."
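
If you script these, it can help to keep one critique template per reflection type. A minimal sketch in Python, reusing the questions above (the wording is a starting point to tune for your domain):

# One critique prompt per reflection type.
REFLECTION_PROMPTS = {
    "error_checking": "What factual errors might be in your response?",
    "completeness": "What important points did you miss?",
    "quality": "How could this be clearer, more concise, or better structured?",
    "audience_fit": "Would a beginner actually understand this explanation?",
    "adversarial": "Try to find the strongest argument against your conclusion.",
}

# Use one as the reflect turn in the loop sketched earlier, e.g.:
# messages.append({"role": "user", "content": REFLECTION_PROMPTS["completeness"]})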

Prompt Example

PHASE 1 - Generate:
Explain why microservices architecture is better than monolithic
architecture for large teams.

PHASE 2 - Reflect:
Now review your explanation above. Answer these questions honestly:
1. Did you present a balanced view, or was it one-sided?
2. Are there scenarios where monoliths are actually better?
3. Did you make any claims without evidence?
4. Would a junior developer understand this?

PHASE 3 - Improve:
Based on your self-critique, write an improved version that
addresses all the weaknesses you identified.

โŒ Bad Exampleโ€‹

Explain microservices vs monoliths.

Problem: One-shot generation with no review. The AI produces a response that might be one-sided, missing nuance, or unclear, and you'd have to catch those issues yourself.

✅ Improved Example

STEP 1: Explain the trade-offs between microservices and monolithic
architecture for a team of 50 developers working on an e-commerce
platform. Cover both advantages and disadvantages of each approach.

STEP 2: Now critically review your explanation:
- Did you give equal weight to both architectures?
- Did you mention operational complexity of microservices?
- Did you mention scalability limits of monoliths?
- Are there any oversimplifications or missing nuances?
- List 3 specific weaknesses in your explanation.

STEP 3: Rewrite your explanation, addressing every weakness from Step 2.
Mark the improved sections with [IMPROVED] so I can see what changed.

Why it works: The reflection step forces the AI to switch from "answer mode" to "critic mode." This change in perspective catches problems the generation phase missed.

Practice Challenge

Use reflection prompting to improve a technical document:

  1. Generate: Ask the AI to write a README for a Todo app REST API
  2. Reflect: Ask the AI to critique it. Is it missing setup instructions? Are the API examples correct? Would a new developer be confused by anything?
  3. Improve: Ask the AI to rewrite addressing its own critique

Compare the version from step 1 with the version from step 3. What improved?
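
To make the comparison concrete, you could script the three steps and print both versions side by side. A sketch reusing the chat() helper from the earlier loop (the prompt wording is illustrative):

messages = [{"role": "user", "content":
    "Write a README for a Todo app REST API."}]
first_draft = chat(messages)
messages.append({"role": "assistant", "content": first_draft})

messages.append({"role": "user", "content":
    "Critique this README: missing setup instructions? incorrect API "
    "examples? anything a new developer would find confusing?"})
messages.append({"role": "assistant", "content": chat(messages)})

messages.append({"role": "user", "content":
    "Rewrite the README, addressing your own critique."})
final_draft = chat(messages)

print("=== STEP 1 ===\n" + first_draft)
print("=== STEP 3 ===\n" + final_draft)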

Real-World Scenario

Self-Reviewing Code Generation:

GENERATE:
Write a Python function that finds all duplicate files in a directory
by comparing their content (not just names). Handle edge cases like
empty files, symbolic links, and permission errors.

REFLECT:
Review the code you just wrote. Check for:
1. Does it handle very large files efficiently? (memory usage)
2. Does it handle symbolic links correctly? (infinite loops?)
3. What happens if the directory contains thousands of files?
4. Are there any security concerns? (symlink attacks, etc.)
5. Is error handling comprehensive?
6. Are there any race conditions?

Score your code 1-10 on each criterion.

IMPROVE:
Rewrite the function addressing every issue scored below 8.
Add comments explaining each improvement.
Add a docstring with usage examples.

This approach consistently produces more robust code than a single generation pass. The reflection phase catches issues like unbounded memory usage, missing error handling, and edge cases that the initial generation often misses.
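
If you automate this scenario, one option is to parse the self-assigned scores and run the improve phase only when something falls below the bar. The sketch below assumes you asked the reflect phase to format each criterion as "criterion: Score: N/10"; that output convention is an assumption you would need to request explicitly in the prompt.

import re

def weak_criteria(reflection, threshold=8):
    """Return criteria the model scored below the threshold.

    Assumes one line per criterion in the form 'criterion: Score: N/10'.
    """
    weak = []
    for line in reflection.splitlines():
        m = re.search(r"(.+?):\s*Score:\s*(\d+)/10", line)
        if m and int(m.group(2)) < threshold:
            weak.append(m.group(1).strip())
    return weak

# Example with a made-up self-review:
review = """Memory usage on large files: Score: 6/10
Symbolic link handling: Score: 9/10
Error handling coverage: Score: 7/10"""
print(weak_criteria(review))  # ['Memory usage on large files', 'Error handling coverage']

Skipping the improve pass when every score clears the threshold saves tokens on output the model already considers solid.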

Multi-Round Reflection

For critical outputs, you can run multiple reflection rounds:

Round 1: Generate initial response
Round 2: Reflect on correctness and completeness
Round 3: Improve based on Round 2
Round 4: Reflect on clarity and audience fit
Round 5: Final polish based on Round 4

Each round should focus on a different quality dimension to avoid diminishing returns.
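
A sketch of such a pipeline, reusing the chat() helper from the earlier loop; the focuses and prompt wording are illustrative:

def multi_round_refine(task, focuses):
    """Run one reflect/improve cycle per quality dimension."""
    messages = [{"role": "user", "content": task}]
    answer = chat(messages)
    messages.append({"role": "assistant", "content": answer})

    for focus in focuses:
        # Reflect on a single dimension so the critique stays focused
        messages.append({"role": "user", "content":
            "Critique your answer above, focusing only on: " + focus})
        messages.append({"role": "assistant", "content": chat(messages)})
        # Improve based on that round's critique
        messages.append({"role": "user", "content":
            "Rewrite your answer, addressing that critique."})
        answer = chat(messages)
        messages.append({"role": "assistant", "content": answer})
    return answer

final = multi_round_refine(
    "Explain database indexing trade-offs.",
    ["correctness and completeness", "clarity and audience fit"],
)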

Interview Question

Q: What is reflection prompting and how does it improve output quality?

A: Reflection prompting is a multi-phase technique where you first ask the AI to generate a response, then explicitly ask it to critique its own output, and finally ask it to improve based on that critique. It works because the generation and evaluation tasks activate different reasoning patterns: the AI may miss an error while generating but catch it while reviewing. I use it by including specific reflection questions like "What errors might exist?" or "What did you miss?" rather than vague "is this good?" prompts. The technique typically improves quality by 10-25% on reasoning tasks. It's most valuable for code generation, technical writing, and analysis where correctness matters. The trade-off is higher token usage since you're generating content multiple times.

Summary
  • Reflection prompting = generate, then critique, then improve
  • Forces the AI to switch between generation and evaluation modes
  • Ask specific reflection questions (not just "is this good?")
  • Types: error-checking, completeness, quality, audience fit, adversarial
  • Improves quality by 10-25% on complex tasks
  • Multiple rounds can focus on different quality dimensions
  • Especially valuable for code, technical docs, and analysis
  • Mark improvements with [IMPROVED] to track what changed
  • Trade-off: more tokens for better quality