2026.03.28 2026.03.28

CCA Exam Prep D4: Prompt Engineering and Structured Output Design Patterns

swiftwand

Additionally, “Review conservatively.” You add this single instruction to the system prompt, hoping false positives will decrease. But the results don’t change. The code review tool still flags variable naming preferences, and developers start ignoring review results entirely. If you properly understand CCA exam prompt engineering and structured output, this failure pattern could have been avoided from the start.

In our previous article “CCA Exam D3: Claude Code Configuration & Workflows,” we covered CLAUDE.md hierarchy and custom slash commands. This article moves to Domain 4, covering specific prompt design techniques through structured output, batch processing, and multi-instance review. This domain accounts for 20% of the total score, testing implementation-focused design decisions.

Why Ambiguous Instructions Don’t Work — T4.1: Improving Precision with Explicit Criteria
- The Mechanism by Which False Positives Destroy Trust
Teaching by Pattern — T4.2: Few-Shot Prompting
- Four Principles of Few-Shot Design
- Few-Shot Implementation with Messages API
Structurally Eliminating JSON Syntax Errors — T4.3: tool_use + JSON Schema
Turning Failures into Feedback — T4.4: Validation-Retry Loops
50% Cost Reduction Conditions — T4.5: Batch Processing Strategy
- Message Batches API Characteristics
- Batch Suitability Criteria
The Limits of AI That Can’t Question Itself — T4.6: Multi-Instance Review
D4 Pitfalls: Common Design Mistakes and Mitigations
CCA Exam Practice: Domain 4 Mock Questions
Domain 4 Comprehensive Checklist — Final Pre-Exam Review

忍者AdMax

Why Ambiguous Instructions Don’t Work — T4.1: Improving Precision with Explicit Criteria

For example, the first concept to grasp in the CCA exam prompt engineering and structured output domain is the difference between ambiguous instructions and explicit criteria.

Therefore, abstract instructions like “be conservative” or “review carefully” don’t improve model precision. The model has no definition of “conservative.” Without unambiguous criteria for what to report and what to ignore, interpretations will vary.

Explicit criteria concretely separate what to report from what to skip:

You are a code review assistant. Follow these criteria:

【Report (always flag)】
- Security vulnerabilities (SQL injection, XSS, auth bypass)
- Bugs that could cause production crashes
- Logic errors with data loss risk

【Ignore (do not flag)】
- Variable naming or formatting preferences
- Suggestions to add/remove comments
- Patterns consistently used in the existing codebase

【When uncertain】
- Set confidence field to "medium" and include reasoning

The Mechanism by Which False Positives Destroy Trust

When a code review tool has a 40% false positive rate, developers learn to ignore all results—including genuine security issues. This “cry wolf” effect is more dangerous than missing a few issues. The solution isn’t adjusting temperature; it’s defining explicit boundaries between reportable and ignorable items.

Teaching by Pattern — T4.2: Few-Shot Prompting

Four Principles of Few-Shot Design

Include boundary cases: Show examples of unclear inputs, missing data, and edge cases
Keep it minimal (2–4 examples): More examples don’t always improve accuracy—they can introduce noise
Show reasoning: Include the “why” behind each classification decision
Ensure diversity: Select examples with different patterns and formats

Few-Shot Implementation with Messages API

messages = [
    {
        "role": "user",
        "content": "Extract data from this document: [Invoice A - standard format]"
    },
    {
        "role": "assistant",
        "content": '{"vendor": "ABC Corp", "total": 15000, "currency": "JPY", "confidence": "high"}'
    },
    {
        "role": "user",
        "content": "Extract data from this document: [Invoice B - handwritten scan, partially illegible]"
    },
    {
        "role": "assistant",
        "content": '{"vendor": "XYZ Ltd", "total": null, "currency": "USD", "confidence": "low", "note": "Amount field illegible"}'
    },
    {
        "role": "user",
        "content": "Extract data from this document: [actual document to process]"
    }
]

In conclusion, the second example is critically important. It explicitly shows the pattern “when unreadable, set to null and explain in note.” This structurally reduces the risk of the model fabricating values. Showing the “null when unknown” pattern in few-shot examples is a practical hallucination reduction technique tested in exams.

Key Point

Structurally Eliminating JSON Syntax Errors — T4.3: tool_use + JSON Schema

Method Comparison: Why tool_use Is Superior

Method	Syntax Error Elimination	Schema Enforcement	Implementation Cost
Prompt instruction (“return JSON”)	No guarantee	None	Low
Few-shot examples	Improved but not guaranteed	Weak	Medium
tool_use + JSON Schema	100% guaranteed	Strict	Medium

Three Modes of tool_choice

Mode	Behavior	Use Case
`auto`	Model decides whether to call tools	General conversation with optional tools
`any`	Must call a tool (eliminates text-only responses)	Guaranteed structured output
`tool`	Must call a specific named tool	Fixed pipeline steps

Schema Design Best Practices

tools = [{
    "name": "extract_invoice",
    "description": "Extract structured data from an invoice",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor_name": {"type": "string"},
            "total_amount": {"type": "number"},
            "currency": {
                "type": "string",
                "enum": ["USD", "EUR", "JPY", "other"]
            },
            "currency_detail": {
                "type": "string",
                "description": "Currency code when currency is 'other'"
            },
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "amount": {"type": ["number", "null"]},
                        "confidence": {
                            "type": "string",
                            "enum": ["high", "medium", "low", "unclear"]
                        }
                    }
                }
            }
        },
        "required": ["vendor_name", "total_amount", "currency", "line_items"]
    }
}]

Key Point

Turning Failures into Feedback — T4.4: Validation-Retry Loops

retry-with-error-feedback Pattern

When validation fails, feed the specific error back to the model for correction rather than silently retrying.

for attempt in range(max_retries):
    result = call_model(messages)
    errors = validate(result)
    if not errors:
        return result
    messages.append({
        "role": "user",
        "content": f"The following validation errors were detected:\n{errors}\nPlease fix and re-extract."
    })

Key Point

When Retries Are Effective vs Ineffective

Case	Retry Effective	Reason
JSON format mismatch	Yes	Model can correct format
Missing required field extraction	Yes	Attention prompt improves extraction
Information doesn’t exist in source	No	Information won’t appear no matter how many retries
Facts outside model’s knowledge	No	Hallucination risk increases

Specifically, if the same error persists after 3 retries, the most likely cause is “the information doesn’t exist in the source document.” It’s not a retry count problem.

Semantic Validation

Syntactically correct JSON can still be semantically wrong. When calculated_total (sum of line items) and stated_total (document total) don’t match, rather than auto-selecting one, the correct approach is to flag the discrepancy, preserve both values, and route to human review.

50% Cost Reduction Conditions — T4.5: Batch Processing Strategy

Message Batches API Characteristics

Item	Detail
Cost reduction	50% discount from standard pricing
Processing time	Up to 24 hours (no latency SLA)
Constraint	Multi-turn tool calls not supported
Identification	Match requests and responses via custom_id

Batch Suitability Criteria

Specifically, a frequently tested exam scenario is “which of two workloads should be batched?” The criterion boils down to “does it block developers or users?”

Use Case	Batch Suitable	Reason
Nightly report generation	Yes	No latency requirement, scheduled processing
Weekly audit reports	Yes	No latency requirement, bulk processing
Pre-merge PR checks	No	Blocks developers
Real-time chat responses	No	Immediate response required

Key Point

The Limits of AI That Can’t Question Itself — T4.6: Multi-Instance Review

Structural Limitations of Self-Review

Reviewing output in the same session that generated it creates confirmation bias. The prior reasoning context makes the model unlikely to question its own decisions. This is why CI/CD code reviews should run in independent instances.

Multi-Pass Review Strategy

Pass 1 — Local file-level analysis: Each file is reviewed independently for local issues (bugs, security, style)
Pass 2 — Cross-file integration pass: Aggregates Pass 1 results to detect inter-file dependency issues, API contract violations, type mismatches, and import gaps

On the other hand, when a 14-file PR reviewed in a single pass misses inter-file dependency errors, the improvement is to “split into file-level local analysis + cross-file integration pass.”

Multi-Instance Review Architecture

Similarly, an orchestrator aggregates and judges review results. Instance A handles local analysis of files 1–5, Instance B handles files 6–14, and Instance C handles cross-file integration analysis. Each instance operates in an independent context without prior reasoning. Local analysis runs in parallel for throughput, while integration analysis receives all local results as input.

D4 Pitfalls: Common Design Mistakes and Mitigations

In particular, the CCA exam prompt engineering and structured output domain tests the ability to identify “designs that look correct but are actually inappropriate.” Here we organize common Domain 4 design mistakes and their mitigations.

Mistake 1: Flooding with Few-Shot Examples

In conclusion, the intuition that “more examples = better accuracy” is wrong. Injecting 15 few-shot examples not only consumes the context window but introduces noise from subtle contradictions between examples. The recommendation is 2–4 targeted examples focusing on boundary cases.

Example Count	Effect	Risk
0	Unstable formatting	Low output consistency
2–4	Optimal accuracy and consistency	Requires careful example selection
10+	No improvement or decline	Context pressure, increased noise

Mistake 2: Assuming tool_use + JSON Schema Guarantees Semantic Correctness

tool_use + JSON Schema’s Strict mode eliminates JSON syntax errors. However, it doesn’t guarantee the correctness of field values. Whether an amount field contains 12800 or 128000 cannot be determined by schema alone. The mitigation is implementing semantic validation separately.

Mistake 3: Assuming Retries Always Improve Results

Furthermore, retries are effective for format errors and extraction omissions. But when the source document lacks the information, retrying increases hallucination risk. After 3 failed retries, escalate to human review rather than continuing.

Mistake 4: Relying on Self-Review for Quality Assurance

Self-review within the same session is structurally weak due to confirmation bias. Code review should run in independent instances via CI/CD integration.

Mistake 5: Using Batch API for Real-Time Processing

The Batch API has no latency SLA and can take up to 24 hours. Using it for developer-blocking or user-facing processes is inappropriate.

Mistake 6: Trying to Fix Ambiguous Instructions with Temperature Adjustment

Temperature controls randomness, not precision. If the model is flagging variable naming preferences, lowering temperature won’t help. The fix is explicit criteria defining what to report and what to ignore.

Design Mistake Avoidance Checklist

Checkpoint	Mistake	Correct Approach
Are few-shot examples boundary-focused and minimal?	Bulk injection	2–4 carefully selected boundary cases
Is semantic validation implemented?	Schema-only confidence	Two-layer: syntax + semantic validation
Are retry preconditions checked?	Unconditional retry	Verify source data existence first
Are reviews run in independent instances?	Self-review	CI/CD-integrated independent review
Is batch API suitability assessed?	Batch everything	Sort by blocking criteria
Are output stabilization methods correct?	Temperature adjustment	Explicit criteria or few-shot

CCA Exam Practice: Domain 4 Mock Questions

Therefore, here are mock questions to verify understanding of CCA exam prompt engineering and structured output.

Q1: A code review tool has a 40% false positive rate. What is the most effective first step?

A) Fine-tune the model B) Identify high false-positive categories and add explicit criteria C) Lower temperature D) Limit output tokens

Correct: B — Adding explicit criteria is the most direct and cost-effective improvement.

Q2: Despite detailed prompt instructions, output format varies between requests. Most effective fix?

A) Lengthen system prompt B) Add 2–3 few-shot examples C) Set temperature to 0 D) Increase max_tokens

Correct: B — Few-shot is most effective for format consistency.

Key Point

Q3: You want to completely eliminate JSON syntax errors in structured output. Most appropriate method?

A) Prompt instruction for JSON format B) tool_use + JSON Schema C) Parse output with regex D) Show JSON examples via few-shot

Correct: B — Only tool_use + JSON Schema structurally eliminates syntax errors.

Q4: Primary reason for setting tool_choice to “any”?

A) Cost reduction B) Faster response C) Guaranteed structured output D) Simplified error handling

Correct: C — “any” eliminates text-only responses and guarantees tool calls.

Q5: Same error persists after 3 retries. Most likely cause?

A) Insufficient retry count B) Information doesn’t exist in source document C) Model version is outdated D) Temperature is too high

Correct: B — Persistent errors typically indicate missing source data, not a retry count issue.

Domain 4 Comprehensive Checklist — Final Pre-Exam Review

Ambiguous instructions replaced with explicit criteria (report vs ignore boundaries)
Few-shot examples are minimal (2–4) and include boundary cases
tool_use + JSON Schema used for guaranteed structured output
Semantic validation implemented separately from schema validation
Retry loops include error feedback and source data existence checks
Batch API used only for non-blocking, latency-tolerant workloads
Code review runs in independent instances, not self-review
Multi-pass review strategy for large PRs (local + integration passes)

Key Point: This article covers CCA exam prompt engineering and structured output design patterns. For practical preparation, combine hands-on learning through the Anthropic Academy official course with practice on mock exam sites. Also refer to our Complete Guide (Day 1) for the recommended “three-pillar” study strategy.

For more information, visit Anthropic Prompt Engineering.

ブラウザだけでできる本格的なAI画像生成【ConoHa AI Canvas】

ABOUT ME

CCA Exam Prep D4: Prompt Engineering and Structured Output Design Patterns

Why Ambiguous Instructions Don’t Work — T4.1: Improving Precision with Explicit Criteria

The Mechanism by Which False Positives Destroy Trust

Teaching by Pattern — T4.2: Few-Shot Prompting

Four Principles of Few-Shot Design

Few-Shot Implementation with Messages API

Structurally Eliminating JSON Syntax Errors — T4.3: tool_use + JSON Schema

Method Comparison: Why tool_use Is Superior

Three Modes of tool_choice

Schema Design Best Practices

Turning Failures into Feedback — T4.4: Validation-Retry Loops

retry-with-error-feedback Pattern

When Retries Are Effective vs Ineffective

Semantic Validation

50% Cost Reduction Conditions — T4.5: Batch Processing Strategy

Message Batches API Characteristics

Batch Suitability Criteria

The Limits of AI That Can’t Question Itself — T4.6: Multi-Instance Review

Structural Limitations of Self-Review

Multi-Pass Review Strategy

Multi-Instance Review Architecture

D4 Pitfalls: Common Design Mistakes and Mitigations

Mistake 1: Flooding with Few-Shot Examples

Mistake 2: Assuming tool_use + JSON Schema Guarantees Semantic Correctness

Mistake 3: Assuming Retries Always Improve Results

Mistake 4: Relying on Self-Review for Quality Assurance

Mistake 5: Using Batch API for Real-Time Processing

Mistake 6: Trying to Fix Ambiguous Instructions with Temperature Adjustment

Design Mistake Avoidance Checklist

CCA Exam Practice: Domain 4 Mock Questions

Domain 4 Comprehensive Checklist — Final Pre-Exam Review

3D Generative AI Foundation Models 2026: How Shape-Understanding AI Is Changing 3D Printing

CCA Foundations Exam Complete Guide 2026: The Fastest Path to Claude Certified Architect

CCA Exam Prep D1 Part 2: Multi-Step Workflows, Agent SDK Hooks, and Task Decomposition

Never Wonder "How Much Should I Charge?" Again: AI-Optimized 3D Print Cost Calculation

Your First 3D Print: AI-Assisted Zero-Failure Guide for Beginners

CCA Exam Prep D2: Tool Design and MCP Integration to Secure 18% of the Score