⚡ AI Prompt Optimization System
Objective
In this project, you will build a meta-prompt – a prompt that evaluates, scores, and improves other prompts automatically. This is prompt engineering applied to itself: you'll create a system that can analyze a prompt's strengths, identify weaknesses, and produce an optimized version with explanations for every change.
Requirements
Before starting this project, you should be familiar with:
- Prompt Optimization
- Reflection Prompting
- Chain of Thought
- Output Validation
- Iterative Refinement
- Why Prompts Fail
Difficulty
Advanced
Starter Template
Start with this basic prompt and observe its limitations:
Make this prompt better: "Write a blog post about AI."
What's wrong with this?
- No evaluation criteria – what does "better" mean?
- No systematic analysis of the original prompt's weaknesses
- No optimization framework or methodology
- No scoring or comparison between before/after
- No explanation of why changes were made
- The "improved" prompt is based on the optimizer's assumptions, not the user's goals
Step-by-Step Guide
Step 1: Define the Optimizer Role
Establish the AI as a prompt engineering expert with a systematic methodology.
You are an expert prompt engineer and optimization specialist. You analyze prompts
using a rigorous evaluation framework, identify specific weaknesses, and produce
measurably improved versions. Your approach is systematic, evidence-based, and
always explains the reasoning behind every change.
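If you drive the optimizer from code rather than a chat UI, this role text typically becomes the system message. Here is a minimal sketch using the OpenAI Python SDK; the model name is a placeholder assumption, and any chat-completion client would work the same way.

```python
# Minimal sketch: the Step 1 role definition wired in as a system message.
# Assumes the OpenAI Python SDK; the model name is a placeholder.
from openai import OpenAI

OPTIMIZER_ROLE = (
    "You are an expert prompt engineer and optimization specialist. You analyze prompts "
    "using a rigorous evaluation framework, identify specific weaknesses, and produce "
    "measurably improved versions. Your approach is systematic, evidence-based, and "
    "always explains the reasoning behind every change."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_optimizer(user_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: substitute the model you actually use
        messages=[
            {"role": "system", "content": OPTIMIZER_ROLE},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```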
Step 2: Build the Evaluation Framework
Create specific criteria for scoring prompts.
**PROMPT EVALUATION RUBRIC – Score Each Dimension (1–10):**
1. **Clarity** – Is the instruction unambiguous? Could it be misinterpreted?
2. **Specificity** – Are desired outputs concretely defined? Are constraints explicit?
3. **Context** – Does the prompt provide enough background for accurate responses?
4. **Structure** – Is the prompt logically organized? Are sections clear?
5. **Role Definition** – Is the AI's persona/expertise properly established?
6. **Output Format** – Is the expected output format explicitly specified?
7. **Constraint Coverage** – Are edge cases, limitations, and guardrails addressed?
8. **Examples** – Does the prompt include examples when they would help?
9. **Completeness** – Does the prompt cover everything needed for a quality response?
10. **Efficiency** – Is the prompt concise without sacrificing effectiveness?
**Overall Score:** Average of all dimensions, rounded to 1 decimal.
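To make the aggregation rule concrete, here is a minimal sketch of the overall-score computation exactly as specified (mean of all ten dimensions, rounded to one decimal). The dimension keys are illustrative shorthand, not part of the rubric text itself.

```python
# Minimal sketch of the rubric's aggregation rule. Dimension keys are
# illustrative shorthand for the ten dimensions above.
RUBRIC_DIMENSIONS = (
    "clarity", "specificity", "context", "structure", "role_definition",
    "output_format", "constraint_coverage", "examples", "completeness",
    "efficiency",
)

def overall_score(scores: dict[str, int]) -> float:
    """Average of all dimensions, rounded to 1 decimal, per the rubric."""
    missing = [d for d in RUBRIC_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    return round(sum(scores[d] for d in RUBRIC_DIMENSIONS) / len(RUBRIC_DIMENSIONS), 1)

# Example: overall_score({d: 4 for d in RUBRIC_DIMENSIONS}) -> 4.0
```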
Step 3: Create the Analysis Process
Define how the optimizer breaks down prompt weaknesses.
**ANALYSIS PROCESS:**
Step 1: Read the prompt and identify the user's apparent intent
Step 2: Score each rubric dimension with a 1-sentence justification
Step 3: Identify the top 3 weaknesses (lowest-scoring dimensions)
Step 4: For each weakness, explain:
- What the problem is
- Why it matters (how it affects output quality)
- Specific fix to apply
Step 5: Identify any missing elements that would significantly improve results
Step 6: Check for common anti-patterns:
- Vague instructions ("make it good")
- Missing constraints (no length, format, or tone guidance)
- Assumed context (depending on info not in the prompt)
- Conflicting instructions
- Over-engineering (unnecessary complexity)
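Anti-pattern detection ultimately relies on the LLM's judgment, but the most literal cases can be caught mechanically before you even call the model. Below is a toy heuristic sketch; the phrase lists are assumptions for illustration, not an exhaustive taxonomy.

```python
# Toy heuristic scan for the most literal anti-patterns in Step 6.
# The phrase lists are illustrative guesses; the real detection
# happens inside the meta-prompt.
import re

def quick_antipattern_scan(prompt: str) -> list[str]:
    findings = []
    if re.search(r"\bmake it (good|better|nice)\b|\bbe creative\b", prompt, re.I):
        findings.append("vague instruction (e.g., 'make it good')")
    if not re.search(r"\b(words?|characters?|length|paragraphs?)\b", prompt, re.I):
        findings.append("missing constraint: no length guidance")
    if not re.search(r"\b(tone|audience|format|style)\b", prompt, re.I):
        findings.append("missing constraint: no tone/format/audience guidance")
    return findings

print(quick_antipattern_scan('Make this prompt better: "Write a blog post about AI."'))
# -> ['missing constraint: no length guidance',
#     'missing constraint: no tone/format/audience guidance']
```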
Step 4: Build the Optimization Engine
Define how improvements are generated.
**OPTIMIZATION RULES:**
1. Preserve Intent: The optimized prompt must serve the same goal as the original
2. Incremental Improvement: Fix weaknesses without over-engineering
3. Explain Every Change: Every modification includes a [WHY] tag
4. Maintain Voice: If the original has a specific style, preserve it
5. Add, Don't Replace: When the original has good elements, keep them
6. Prioritize Impact: Fix the highest-impact issues first
7. Test Mentally: Before finalizing, mentally simulate how an LLM would respond to both the original and improved version – the difference should be clear
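Rule 3 can also be enforced structurally if you log changes programmatically: make the justification a required field, as in this sketch. The field names are illustrative and mirror the change-log tags used later in the final meta-prompt.

```python
# Sketch: enforcing "Explain Every Change" by construction. A Change
# record cannot exist without a non-empty justification. Field names
# are illustrative and mirror the final meta-prompt's change-log tags.
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class Change:
    kind: Literal["ADDED", "MODIFIED", "REMOVED", "RESTRUCTURED"]
    description: str
    why: str  # the [WHY] tag, mandatory

    def __post_init__(self) -> None:
        if not self.why.strip():
            raise ValueError("every change needs a [WHY] justification")

c = Change("ADDED", "explicit 800-word limit", "prevents unbounded rambling output")
print(f"[{c.kind}] {c.description} [WHY: {c.why}]")
```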
Step 5: Define the Output Report Format
**OUTPUT FORMAT:**
## 📊 Prompt Analysis Report
### Original Prompt
[Display the original prompt]
### Rubric Scores
| Dimension | Score | Assessment |
|-----------|-------|------------|
| ... | X/10 | One-line justification |
### Overall Score: X.X/10
### Top 3 Weaknesses
1. **[Weakness]** – Impact: [How it hurts output] – Fix: [What to do]
2. ...
3. ...
### 🔧 Optimized Prompt
[The improved prompt]
### Changes Made
1. [Change] – [WHY: reason]
2. ...
### Predicted Improvement
- Original prompt would produce: [describe likely output]
- Optimized prompt would produce: [describe likely output]
- Key difference: [the main improvement]
### New Score: X.X/10
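If you want to track scores across runs, the report's rubric table is machine-readable. A best-effort parsing sketch follows; it assumes the model kept the `| Dimension | Score | Assessment |` layout defined above, which real output may drift from.

```python
# Best-effort sketch: pull "X/10" scores back out of the report's
# markdown rubric table. Assumes the model followed the format above.
import re

ROW = re.compile(r"^\|\s*(?P<dim>[^|]+?)\s*\|\s*(?P<score>\d+)\s*/\s*10\s*\|")

def extract_scores(report: str) -> dict[str, int]:
    scores = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            scores[m.group("dim").strip("* ")] = int(m.group("score"))
    return scores

# Matches rows like: | **Clarity** | 3/10 | Topic and audience undefined |
```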
Final Optimized Prompt
Here is the complete, production-ready meta-prompt:
You are an expert prompt engineer and optimization specialist with deep knowledge of LLM behavior, prompt patterns, and output quality factors. You analyze prompts using a rigorous scoring framework, identify precise weaknesses, and produce measurably superior versions.
**YOUR TASK:**
Analyze and optimize the following prompt. Produce a detailed evaluation report with a scored assessment, specific weaknesses identified, and an improved version with every change explained.
**PROMPT TO OPTIMIZE:**
[PASTE THE PROMPT TO ANALYZE HERE]
---
**STEP 1: INTENT DETECTION**
Before evaluating, determine:
- What is the user trying to accomplish with this prompt?
- What kind of output do they expect? (text, code, analysis, creative, etc.)
- Who is the likely audience for the output?
- What context might the user have that isn't in the prompt?
State your assessment clearly before proceeding.
**STEP 2: RUBRIC EVALUATION**
Score each dimension 1–10 with a one-sentence justification:
| # | Dimension | What It Measures | Score | Justification |
|---|-----------|------------------|-------|---------------|
| 1 | **Clarity** | Is the instruction unambiguous? Zero room for misinterpretation? | /10 | |
| 2 | **Specificity** | Are desired outputs, constraints, and parameters concrete? | /10 | |
| 3 | **Context** | Is sufficient background provided for accurate responses? | /10 | |
| 4 | **Structure** | Is the prompt logically organized with clear sections? | /10 | |
| 5 | **Role Definition** | Is the AI's expertise/persona properly established? | /10 | |
| 6 | **Output Format** | Is the expected format explicitly specified? | /10 | |
| 7 | **Constraints** | Are boundaries, edge cases, and guardrails addressed? | /10 | |
| 8 | **Examples** | Are examples included where they would improve output? | /10 | |
| 9 | **Completeness** | Does the prompt cover everything needed? | /10 | |
| 10 | **Efficiency** | Is it concise without sacrificing quality? | /10 | |
**Overall Score: [Average]/10**
**STEP 3: WEAKNESS ANALYSIS**
Identify the **top 3 weaknesses** (lowest-scoring dimensions):
For each weakness:
- 🔍 **Problem:** What's wrong
- 💥 **Impact:** How it degrades output quality (with an example)
- 🔧 **Fix:** Specific change to make
Also check for these **anti-patterns:**
- ⚠️ Vague instructions ("make it good," "be creative")
- ⚠️ Missing constraints (no length, format, tone, or audience)
- ⚠️ Assumed context (relies on info not in the prompt)
- ⚠️ Conflicting instructions (contradictory requirements)
- ⚠️ Over-engineering (unnecessarily complex for the task)
- ⚠️ Under-engineering (too simple for a complex task)
**STEP 4: GENERATE OPTIMIZED PROMPT**
Create the improved version following these rules:
1. **Preserve intent** – Same goal, better execution
2. **Explain every change** – Tag each modification with [WHY: reason]
3. **Preserve good elements** – Keep what works, improve what doesn't
4. **Prioritize impact** – Fix highest-impact issues first
5. **Right-size complexity** – Match prompt sophistication to task complexity
6. **Mental simulation** – Verify the optimized prompt would produce clearly better output
**STEP 5: CHANGE LOG**
List every change made in a numbered list:
- [ADDED] / [MODIFIED] / [REMOVED] / [RESTRUCTURED] – Description – [WHY: reason]
**STEP 6: BEFORE/AFTER PREDICTION**
- **Original prompt likely produces:** [Describe expected output quality and characteristics]
- **Optimized prompt likely produces:** [Describe expected output quality and characteristics]
- **Key improvement:** [The single biggest difference]
**STEP 7: FINAL SCORING**
Re-score the optimized prompt using the same rubric. Show the score improvement.
| Dimension | Before | After | Change |
|-----------|--------|-------|--------|
| ... | X/10 | X/10 | +X |
**Overall: [Before] → [After] (+[Improvement])**
---
**OUTPUT QUALITY STANDARDS:**
- Be specific and actionable – "add a role definition" not "make it clearer"
- Every criticism must come with a concrete fix
- The optimized prompt should be ready to use – not a suggestion, a complete rewrite
- If the original prompt is already strong (8+/10), focus on fine-tuning and edge cases
- If the original is weak (<5/10), the optimization may be a substantial rewrite – that's okay
- Never be condescending about the original – analyze professionally
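To run the meta-prompt end to end, substitute the target prompt for the placeholder and send the whole thing as a single message. A minimal sketch with the OpenAI Python SDK follows; `meta_prompt.txt` is assumed to hold the full text above (including the placeholder line), and the model name is a placeholder.

```python
# Minimal sketch: filling the placeholder and running the meta-prompt.
# Assumes meta_prompt.txt contains the full prompt above, including the
# [PASTE THE PROMPT TO ANALYZE HERE] line; model name is a placeholder.
from pathlib import Path
from openai import OpenAI

META_PROMPT = Path("meta_prompt.txt").read_text()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def optimize(target_prompt: str) -> str:
    """Return the full analysis report for target_prompt."""
    filled = META_PROMPT.replace("[PASTE THE PROMPT TO ANALYZE HERE]", target_prompt)
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: use whichever capable model you prefer
        messages=[{"role": "user", "content": filled}],
    )
    return response.choices[0].message.content

print(optimize('Make this prompt better: "Write a blog post about AI."'))
```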
Interactive Playground
🧪 Prompt Optimizer Playground
Start with the basic template, then iterate to reach the optimized version.
Explanation
The final prompt works because it applies several key prompt engineering principles:
- **Meta-level reasoning** – This prompt teaches the AI to think about prompts rather than simply executing them. This requires a different cognitive mode (evaluation rather than generation), and the structured framework guides that shift.
- **Quantified evaluation** – The 10-dimension rubric with numerical scores forces systematic analysis rather than vague impressions. Numbers create accountability ("Specificity: 3/10" is much more actionable than "could be more specific").
- **Mandatory explanations** – Requiring [WHY] tags for every change prevents arbitrary modifications. If the optimizer can't explain why a change improves things, it shouldn't make it.
- **Before/after prediction** – Mental simulation of both prompts' outputs forces the optimizer to verify that changes actually improve results, not just look more professional.
- **Anti-pattern detection** – Explicitly listing common prompt failures (vague instructions, assumed context, conflicting rules) gives the optimizer a checklist to catch issues a general analysis might miss.
- **Professional framing** – The instruction "never be condescending about the original" ensures the output is useful feedback, not criticism – important when this tool is used by prompt learners.
Extensions & Challenges
- **Batch Optimizer** – Modify the prompt to accept 5 prompts at once and produce a comparative analysis, ranking them from strongest to weakest with a unified improvement plan.
- **Domain-Specific Calibration** – Create variants calibrated for specific prompt types (coding prompts, creative writing prompts, analysis prompts, system prompts), each with domain-specific rubric adjustments.
- **Iterative Optimization Loop** – Feed the optimizer's output back into itself 3 times to see if quality improves with each pass or if there are diminishing returns (a minimal loop sketch follows this list).
- **Adversarial Testing** – Add a step where the optimizer tries to find edge cases that would break the prompt (unusual inputs, ambiguous requests, boundary conditions) and then hardens the prompt against them.
- **Prompt Style Transfer** – Build a variant that takes a working prompt in one style (e.g., casual and short) and transforms it into another style (e.g., formal and detailed) while preserving functionality.
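For the Iterative Optimization Loop, the wiring is a short feedback loop around the `optimize()` function from the earlier SDK sketch. A minimal version follows; the section-header regex is a best-effort assumption that the model keeps the report format defined in this project.

```python
# Sketch of the Iterative Optimization Loop extension. `optimize` is the
# function from the earlier SDK sketch (prompt in, full report out); the
# regex assumes the "### 🔧 Optimized Prompt" header from the report format.
import re
from typing import Callable

def extract_optimized(report: str) -> str:
    """Pull the rewritten prompt out of the optimized-prompt section."""
    m = re.search(r"### 🔧 Optimized Prompt\s*\n(.*?)(?:\n###|\Z)", report, re.S)
    if not m:
        raise ValueError("could not find the optimized prompt section")
    return m.group(1).strip()

def iterate(optimize: Callable[[str], str], seed: str, passes: int = 3) -> str:
    prompt = seed
    for i in range(passes):
        prompt = extract_optimized(optimize(prompt))
        print(f"--- pass {i + 1} ---\n{prompt}\n")
    return prompt  # compare per-pass rubric scores to spot diminishing returns
```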