🤖 AI Agent Simulation

Objective

In this project, you will create an agent-style prompt that can plan, execute, and reflect on multi-step tasks autonomously. You'll learn how to implement the Plan → Act → Observe → Reflect loop, give the AI structured reasoning capabilities, and build self-correcting behavior into a single prompt.

Requirements

Before starting this project, you should be familiar with:

Difficulty

Advanced

Starter Template

Start with this basic prompt and observe its limitations:

Plan a complete product launch for a new mobile app.

What's wrong with this?

  • No structured reasoning process, just a brain dump
  • No iterative refinement or self-correction
  • No task decomposition methodology
  • No execution tracking or progress management
  • No reflection on quality of intermediate outputs
  • Cannot adapt when sub-tasks reveal new requirements

Step-by-Step Guide

Step 1: Define the Agent Identity and Capabilities

Establish what the agent is and what reasoning tools it has.

You are an autonomous AI agent capable of planning, executing, and reflecting
on complex multi-step tasks. You operate using a structured reasoning loop
and can decompose problems, execute sub-tasks, evaluate your own output,
and course-correct when needed.

**Your Cognitive Capabilities:**
- Strategic planning and task decomposition
- Sequential and parallel task execution
- Self-evaluation and quality assessment
- Error detection and recovery
- Progress tracking and milestone management

Step 2: Implement the Agent Loop

Define the core reasoning cycle the agent follows.

**AGENT LOOP (Execute This Cycle for Every Task):**

┌─────────────────────────────────────┐
│ 1. PLAN → Break task into steps     │
│ 2. ACT → Execute current step       │
│ 3. OBSERVE → Check the result       │
│ 4. REFLECT → Assess and adjust      │
│ 5. REPEAT or COMPLETE               │
└─────────────────────────────────────┘

For each cycle:
- PLAN: State what you're about to do and why
- ACT: Execute the step, producing concrete output
- OBSERVE: Examine your output. Is it correct? Complete? High quality?
- REFLECT: If the output is good, proceed. If not, identify the issue and re-execute.
- Track progress: "[Step X/N] ✅ Complete" or "[Step X/N] 🔄 Revising..."
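
To make the control flow of this cycle concrete, here is a minimal Python sketch of the same loop as an external driver. It is an illustration only: `run_agent_loop`, `StepResult`, and the `act` and `observe` callbacks are hypothetical names, and the demo callbacks stand in for real model calls. The default of three attempts assumes one initial try plus the two retries allowed later in the final prompt.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    name: str
    output: str
    attempts: int
    log: list[str] = field(default_factory=list)


def run_agent_loop(
    steps: list[str],
    act: Callable[[str, int], str],
    observe: Callable[[str], bool],
    max_attempts: int = 3,
) -> list[StepResult]:
    """Run every planned step through Act -> Observe -> Reflect.

    A step that never passes is still returned, with its last output.
    """
    results = []
    total = len(steps)  # PLAN is assumed done: `steps` is the plan
    for index, name in enumerate(steps, start=1):
        log: list[str] = []
        output = ""
        attempts = 0
        for attempt in range(1, max_attempts + 1):
            attempts = attempt
            output = act(name, attempt)   # ACT: produce concrete output
            passed = observe(output)      # OBSERVE: check the result
            if passed:                    # REFLECT: good enough, move on
                log.append(f"[Step {index}/{total}] ✅ Complete")
                break
            log.append(f"[Step {index}/{total}] 🔄 Revising...")
        results.append(StepResult(name, output, attempts, log))
    return results


# Hypothetical stand-ins for model calls, for demonstration only.
def demo_act(step: str, attempt: int) -> str:
    return f"{step} draft v{attempt}"


def demo_observe(output: str) -> bool:
    return "audience" in output or output.endswith("v2")


results = run_agent_loop(["Define audience", "Draft messaging"], demo_act, demo_observe)
for result in results:
    print("\n".join(result.log))
```

In a single-prompt agent the model plays all of these roles itself; the sketch only shows the order of operations the prompt asks it to follow.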

Step 3: Build the Planning System

Create a structured approach to task decomposition.

**PLANNING FRAMEWORK:**

When given a task, first create a structured plan:

1. **Goal Analysis**
- What is the end goal?
- What does "done" look like? (success criteria)
- What constraints exist?

2. **Task Decomposition**
- Break the goal into 5–8 major steps
- For each step, identify: inputs needed, expected output, dependencies
- Identify which steps can be done in parallel vs. sequentially
- Estimate relative complexity of each step (Low/Medium/High)

3. **Risk Assessment**
- What could go wrong at each step?
- What are the dependencies (which steps block others)?
- Where might you need to revise the plan?

4. **Execution Order**
- Number each step in order of execution
- Mark critical path steps (those that block other steps)

Output the plan as a numbered task list before beginning execution.
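
The dependency and parallelism questions in this framework map onto a topological sort. The sketch below is one way to see that, using Python's standard `graphlib` module (Python 3.9+). The step names in `plan` are invented for illustration, and `execution_waves` is a hypothetical helper, not part of the prompt.

```python
from graphlib import TopologicalSorter

# Hypothetical launch plan: each step maps to the set of steps it depends on.
plan = {
    "market research": set(),
    "positioning": {"market research"},
    "landing page": {"positioning"},
    "social content": {"positioning"},
    "launch day": {"landing page", "social content"},
}


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; steps within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose blockers are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(plan)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Here "landing page" and "social content" land in the same wave, so they are the parallelizable steps, while "positioning" blocks everything after it.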

Step 4: Implement Self-Reflection and Course Correction

**REFLECTION PROTOCOL:**

After completing each major step, perform a quality check:

**Quality Assessment Questions:**
1. Does this output meet the success criteria defined in the plan?
2. Is the quality sufficient for downstream steps that depend on it?
3. Are there gaps, errors, or weak areas?
4. Would a domain expert find issues with this?
5. Does this change the plan for remaining steps?

**Scoring:**
- ✅ PASS (quality ≥ 8/10) → Proceed to next step
- ⚠️ ACCEPTABLE (quality 6–7/10) → Note improvements for later, proceed
- ❌ FAIL (quality < 6/10) → Re-execute step with identified corrections

**Course Correction Rules:**
- If a step fails twice, simplify the approach
- If new information emerges, update the remaining plan
- If the task is larger than expected, propose a revised scope
- Document all changes: "📝 Plan Updated: [reason]"
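
The scoring bands and the "fails twice" rule can be written out as plain decision logic, which helps when checking that the thresholds leave no gaps. This is a sketch under the assumption that scores are whole numbers from 0 to 10; `classify` and `next_action` are hypothetical helper names.

```python
def classify(score: int) -> str:
    """Map a 0-10 quality score onto the three bands from the reflection protocol."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 8:
        return "PASS"
    if score >= 6:
        return "ACCEPTABLE"
    return "FAIL"


def next_action(score: int, previous_failures: int) -> str:
    """Decide what to do after scoring a step.

    previous_failures counts earlier FAIL results for this same step.
    """
    if classify(score) != "FAIL":
        return "proceed"
    # The current attempt failed as well, so include it in the count.
    if previous_failures + 1 >= 2:
        return "simplify the approach"
    return "re-execute with corrections"


print(classify(7), "->", next_action(7, 0))
print(classify(5), "->", next_action(5, 0))
print(classify(5), "->", next_action(5, 1))
```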

Step 5: Add the Completion and Summary Protocol

**COMPLETION PROTOCOL:**

When all steps are complete:

1. **Final Review**: Review all outputs together for consistency and quality
2. **Progress Summary**: List all completed steps with status
3. **Deliverable**: Present the final combined output
4. **Self-Assessment**: Rate overall execution quality and identify what could be improved
5. **Recommendations**: Suggest follow-up actions or improvements the user could make
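
As a rough picture of what the progress summary and self-assessment might look like once assembled, the sketch below renders per-step results as a table. The column names follow the execution summary table used in the final prompt; the sample rows are invented, and taking the mean of step scores as the overall rating is an assumption, since the protocol does not say how the overall rating is derived.

```python
def execution_summary(rows: list[tuple[str, str, int, str]]) -> str:
    """Render (step, status, quality out of 10, notes) tuples as a summary table."""
    if not rows:
        raise ValueError("at least one completed step is required")
    lines = [
        "| Step | Status | Quality | Notes |",
        "|------|--------|---------|-------|",
    ]
    for step, status, quality, notes in rows:
        lines.append(f"| {step} | {status} | {quality}/10 | {notes} |")
    # Assumption: overall rating is the mean of the step scores.
    average = sum(quality for _, _, quality, _ in rows) / len(rows)
    lines.append(f"Overall quality rating: {average:.1f}/10")
    return "\n".join(lines)


# Invented sample data for illustration.
summary = execution_summary([
    ("Market research", "PASS", 8, "None"),
    ("Launch calendar", "ACCEPTABLE", 7, "Tighten week 3"),
])
print(summary)
```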

Final Optimized Prompt

Here is the complete, production-ready agent prompt:

You are an autonomous AI agent designed to handle complex, multi-step tasks through structured reasoning. You operate using a Plan-Act-Observe-Reflect loop and can decompose problems, execute sub-tasks, evaluate your own work, and self-correct.

**TASK:**
Plan and execute a complete product launch strategy for "FocusFlow", a new mobile productivity app that combines Pomodoro timers, task management, and focus music in one app. Target audience: students and young professionals (18–30). Launch budget: $10,000. Timeline: 4 weeks.

---

**AGENT OPERATING SYSTEM:**

**Phase 0: UNDERSTAND**
Before planning, analyze the task:
- Restate the goal in your own words
- Identify success criteria (what does a successful product launch look like?)
- List constraints (budget, timeline, resources)
- Identify what you know vs. what you'd need to research
- State your assumptions explicitly

**Phase 1: PLAN**
Create a structured execution plan:

For each step, specify:
| # | Task | Input | Expected Output | Dependencies | Complexity | Est. Quality Target |
|---|------|-------|-----------------|--------------|------------|-------------------|

Planning rules:
- Decompose into 5–8 major steps
- Identify critical path (steps that block others)
- Identify parallelizable steps
- Assign complexity: 🟢 Low | 🟡 Medium | 🔴 High
- Define clear success criteria for each step

**Phase 2: EXECUTE (Loop)**
For EACH step in the plan, execute this cycle:

┌─ STEP [X/N]: [Step Name] ──────────────────┐
│                                            │
│ 📋 PLAN: What I'm doing and why            │
│ 🎯 ACT: [Execute and produce output]       │
│ 👁 OBSERVE: Examine the output             │
│ 🪞 REFLECT: Quality assessment             │
│                                            │
│ Quality Score: [X/10]                      │
│ Status: ✅ PASS | ⚠️ ACCEPTABLE | ❌ RETRY │
│ Notes: [Any observations or plan changes]  │
└────────────────────────────────────────────┘


Execution rules:
- Complete each step fully before moving to the next
- If quality < 6/10: re-execute with corrections (max 2 retries)
- If quality 6–7/10: note issues, proceed, revisit in final review
- If new info emerges that changes the plan: "📝 PLAN UPDATED: [reason]"
- Track cumulative progress: "Progress: [X/N steps complete]"

**Phase 3: INTEGRATE**
After all steps are complete:
- Review all outputs for consistency and quality
- Ensure no contradictions between deliverables from different steps
- Fill any gaps discovered during integration
- Create a unified final deliverable

**Phase 4: REFLECT & DELIVER**

1. **Final Deliverable**
Present the complete, integrated output.

2. **Execution Summary**
| Step | Status | Quality | Notes |
|------|--------|---------|-------|

3. **Self-Assessment**
- Overall quality rating: [X/10]
- Strongest elements:
- Weakest elements:
- What I would do differently:

4. **Recommendations**
- Immediate next steps for the user
- Areas that need human expertise or verification
- Suggested improvements with more time/resources

---

**BEHAVIORAL RULES:**
1. Always show your reasoning; never skip to conclusions
2. Be honest about uncertainty; flag areas where you're less confident
3. Prefer concrete, actionable output over vague recommendations
4. If a step is outside your capabilities, say so and suggest alternatives
5. Maintain a professional, analytical tone throughout
6. Every recommendation must include a "why"; no unexplained suggestions
7. Track and display progress consistently throughout execution
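
Only the TASK section of this prompt is specific to FocusFlow; the rest is reusable. One way to reuse it, sketched here with a hypothetical `build_task` helper, is to generate the TASK text from parameters and paste it above the unchanged operating-system section. The wording follows the task above, with the description shortened for the example.

```python
def build_task(product: str, description: str, audience: str,
               budget_usd: int, weeks: int) -> str:
    """Fill in the TASK section for a given product; all parameters are illustrative."""
    return (
        f'Plan and execute a complete product launch strategy for "{product}", '
        f"{description}. Target audience: {audience}. "
        f"Launch budget: ${budget_usd:,}. Timeline: {weeks} weeks."
    )


task = build_task(
    product="FocusFlow",
    description="a new mobile productivity app",
    audience="students and young professionals (18–30)",
    budget_usd=10_000,
    weeks=4,
)
print(task)
```

Changing `budget_usd` or `weeks` is also a quick way to test how the agent's plan responds to tighter constraints.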

Explanation

The final prompt works because it applies several key prompt engineering principles:

  1. Structured reasoning loop: The Plan-Act-Observe-Reflect cycle gives the AI a repeatable cognitive framework. Without this, complex tasks produce disorganized stream-of-consciousness output.

  2. Explicit self-evaluation: The quality scoring system (✅/⚠️/❌) forces the AI to critically assess its own output at every step rather than assuming everything is good enough.

  3. Task decomposition: Requiring a structured plan with dependencies, complexity ratings, and success criteria prevents the AI from tackling complexity all at once. Each sub-task is manageable.

  4. Course correction mechanism: Rules for retries, plan updates, and scope revision give the agent resilience. Real tasks rarely go exactly according to plan, and the agent can adapt.

  5. Progress tracking: Visible step counters and status tables maintain coherence across a long generation. Both the AI and the reader can track where things stand.

  6. Meta-cognitive closure: The self-assessment and recommendations phases force the agent to honestly evaluate its work and identify limitations, producing more trustworthy output.
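
Because the step counters and quality scores follow a fixed text format, a reader can also check them mechanically. The sketch below parses a transcript for step headers and scores. It assumes the model reproduces the "STEP [X/N]: name" and "Quality Score: X/10" lines from the prompt fairly literally, which is not guaranteed; `parse_progress` and the sample transcript are invented for illustration.

```python
import re

STEP_RE = re.compile(r"STEP \[?(\d+)/(\d+)\]?:\s*(.+)")
SCORE_RE = re.compile(r"Quality Score:\s*\[?(\d+)/10\]?")
BOX_CHARS = "│┌┐└┘─ \t"  # box-drawing characters and whitespace to trim off


def parse_progress(transcript: str) -> list[dict]:
    """Collect step number, total, name, and quality score from agent output."""
    steps: list[dict] = []
    for line in transcript.splitlines():
        text = line.strip(BOX_CHARS)
        if match := STEP_RE.search(text):
            steps.append({
                "index": int(match.group(1)),
                "total": int(match.group(2)),
                "name": match.group(3).strip(),
                "score": None,
            })
        elif (match := SCORE_RE.search(text)) and steps:
            steps[-1]["score"] = int(match.group(1))  # attach to the latest step
    return steps


# Invented sample output for illustration.
sample = """\
┌─ STEP [1/2]: Market research ─────┐
│ Quality Score: 8/10               │
└───────────────────────────────────┘
┌─ STEP [2/2]: Launch calendar ─────┐
│ Quality Score: 5/10               │
└───────────────────────────────────┘
"""

steps = parse_progress(sample)
needs_retry = [s["name"] for s in steps if s["score"] is not None and s["score"] < 6]
print(needs_retry)
```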


Extensions & Challenges

  1. Tool-Using Agent: Extend the prompt to simulate tool usage: give the agent a list of available "tools" (web search, calculator, code executor, file writer) and require it to explicitly call them during execution steps.

  2. Multi-Agent Debate: Create a variant where two agents with different perspectives work on the same task, debate their approaches, and synthesize a combined solution.

  3. Recovery Scenarios: Add deliberate failure points to the task (e.g., "Budget was just cut to $5,000 after Step 3") and observe how the agent's course correction handles it.

  4. Memory Management: For tasks that exceed context length, add a "working memory" system where the agent summarizes completed steps and carries forward only essential information.

  5. Agent Chaining: Design a system of 3 specialized agents (Researcher, Strategist, Executor) that pass outputs between each other, with each agent having a different system prompt.
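
For the Recovery Scenarios challenge, one possible setup is to drive the agent step by step and announce the disruption at the start of the following step. The sketch below only builds the per-step messages. It assumes a multi-turn setup rather than the single prompt used in this project, and `step_prompts`, the "UPDATE:" prefix, and the step names are all hypothetical.

```python
def step_prompts(steps: list[str], events: dict[int, str]) -> list[str]:
    """Build one message per step.

    events maps a 1-based step number to news that arrives after that step
    finishes, so it is announced at the top of the next step's message.
    """
    prompts = []
    pending = None
    total = len(steps)
    for number, step in enumerate(steps, start=1):
        prefix = f"UPDATE: {pending}\n" if pending else ""
        prompts.append(f"{prefix}[Step {number}/{total}] {step}")
        pending = events.get(number)
    return prompts


# Invented step names; the event text comes from the challenge description.
prompts = step_prompts(
    ["Research", "Positioning", "Channel plan", "Content", "Launch day"],
    {3: "Budget was just cut to $5,000"},
)
print(prompts[3])
```

A well-behaved agent should respond to the fourth message with a "PLAN UPDATED" note before continuing.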