CCA Exam Prep D4: Prompt Engineering and Structured Output Design Patterns

swiftwand

“Review conservatively.” You add this single instruction to the system prompt, hoping false positives will decrease. But the results don’t change. The code review tool still flags variable naming preferences, and developers start ignoring review results entirely. With a solid grasp of the prompt engineering and structured output patterns tested on the CCA exam, this failure can be avoided from the start.

In our previous article “CCA Exam D3: Claude Code Configuration & Workflows,” we covered CLAUDE.md hierarchy and custom slash commands. This article moves to Domain 4, which ranges from specific prompt design techniques to structured output, batch processing, and multi-instance review. This domain accounts for 20% of the total score and tests implementation-focused design decisions.

Why Ambiguous Instructions Don’t Work — T4.1: Improving Precision with Explicit Criteria

The first concept to grasp in the CCA exam’s prompt engineering and structured output domain is the difference between ambiguous instructions and explicit criteria.

Abstract instructions like “be conservative” or “review carefully” don’t improve model precision. The model has no definition of “conservative.” Without unambiguous criteria for what to report and what to ignore, its interpretation varies from request to request.

Explicit criteria concretely separate what to report from what to skip:

You are a code review assistant. Follow these criteria:

【Report (always flag)】
- Security vulnerabilities (SQL injection, XSS, auth bypass)
- Bugs that could cause production crashes
- Logic errors with data loss risk

【Ignore (do not flag)】
- Variable naming or formatting preferences
- Suggestions to add/remove comments
- Patterns consistently used in the existing codebase

【When uncertain】
- Set confidence field to "medium" and include reasoning

The Mechanism by Which False Positives Destroy Trust

When a code review tool has a 40% false positive rate, developers learn to ignore all results—including genuine security issues. This “cry wolf” effect is more dangerous than missing a few issues. The solution isn’t adjusting temperature; it’s defining explicit boundaries between reportable and ignorable items.
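Deciding where to draw those boundaries is easier with data on which categories developers actually dismiss. The following is a minimal sketch, assuming each past finding has been labeled with a `category` and a `dismissed` flag; both field names and the 50% threshold are hypothetical choices for illustration, not part of any review tool.

```python
from collections import defaultdict

def false_positive_rate_by_category(findings):
    """Return {category: share of findings that reviewers dismissed}."""
    totals = defaultdict(int)
    dismissed = defaultdict(int)
    for finding in findings:
        totals[finding["category"]] += 1
        if finding["dismissed"]:
            dismissed[finding["category"]] += 1
    return {cat: dismissed[cat] / totals[cat] for cat in totals}

# Hypothetical labeled sample of past review findings
findings = [
    {"category": "naming", "dismissed": True},
    {"category": "naming", "dismissed": True},
    {"category": "sql_injection", "dismissed": False},
    {"category": "naming", "dismissed": False},
]
rates = false_positive_rate_by_category(findings)

# Categories dismissed at least half the time are candidates for the ignore list
noisy = sorted(cat for cat, rate in rates.items() if rate >= 0.5)
```

Categories that land in `noisy` are the ones worth naming explicitly in the "Ignore" section of the prompt.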

Teaching by Pattern — T4.2: Few-Shot Prompting

Four Principles of Few-Shot Design

  1. Include boundary cases: Show examples of unclear inputs, missing data, and edge cases
  2. Keep it minimal (2–4 examples): More examples don’t always improve accuracy—they can introduce noise
  3. Show reasoning: Include the “why” behind each classification decision
  4. Ensure diversity: Select examples with different patterns and formats

Few-Shot Implementation with Messages API

messages = [
    {
        "role": "user",
        "content": "Extract data from this document: [Invoice A - standard format]"
    },
    {
        "role": "assistant",
        "content": '{"vendor": "ABC Corp", "total": 15000, "currency": "JPY", "confidence": "high"}'
    },
    {
        "role": "user",
        "content": "Extract data from this document: [Invoice B - handwritten scan, partially illegible]"
    },
    {
        "role": "assistant",
        "content": '{"vendor": "XYZ Ltd", "total": null, "currency": "USD", "confidence": "low", "note": "Amount field illegible"}'
    },
    {
        "role": "user",
        "content": "Extract data from this document: [actual document to process]"
    }
]

The second example is the critical one. It explicitly shows the pattern “when unreadable, set to null and explain in note.” This structurally reduces the risk of the model fabricating values. Showing the “null when unknown” pattern in few-shot examples is a practical hallucination reduction technique that the exam tests.
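A message list like the one above can also be generated from example pairs, which makes it easier to keep the examples few and swap boundary cases in and out. This is a rough sketch; `build_few_shot_messages` is a hypothetical helper, not an SDK function.

```python
import json

def build_few_shot_messages(examples, document):
    """Turn (input_text, expected_output) pairs into alternating user/assistant turns,
    then append the real document as the final user turn."""
    messages = []
    for input_text, expected in examples:
        messages.append({
            "role": "user",
            "content": f"Extract data from this document: {input_text}",
        })
        # json.dumps renders Python None as JSON null, matching the boundary example
        messages.append({"role": "assistant", "content": json.dumps(expected)})
    messages.append({
        "role": "user",
        "content": f"Extract data from this document: {document}",
    })
    return messages

examples = [
    ("[Invoice A - standard format]",
     {"vendor": "ABC Corp", "total": 15000, "currency": "JPY", "confidence": "high"}),
    ("[Invoice B - handwritten scan, partially illegible]",
     {"vendor": "XYZ Ltd", "total": None, "currency": "USD", "confidence": "low",
      "note": "Amount field illegible"}),
]
messages = build_few_shot_messages(examples, "[actual document to process]")
```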

Structurally Eliminating JSON Syntax Errors — T4.3: tool_use + JSON Schema

Method Comparison: Why tool_use Is Superior

Method | Syntax Error Elimination | Schema Enforcement | Implementation Cost
--- | --- | --- | ---
Prompt instruction (“return JSON”) | No guarantee | None | Low
Few-shot examples | Improved but not guaranteed | Weak | Medium
tool_use + JSON Schema | 100% guaranteed | Strict | Medium

Three Modes of tool_choice

Mode | Behavior | Use Case
--- | --- | ---
auto | Model decides whether to call tools | General conversation with optional tools
any | Must call a tool (eliminates text-only responses) | Guaranteed structured output
tool | Must call a specific named tool | Fixed pipeline steps
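In a Messages API request, each mode is passed as a small `tool_choice` object. The sketch below only builds that object; `tool_choice_for` is a hypothetical helper name, and the request call itself is left out.

```python
def tool_choice_for(mode, tool_name=None):
    """Build the tool_choice object for a Messages API request."""
    if mode in ("auto", "any"):
        return {"type": mode}
    if mode == "tool":
        if not tool_name:
            raise ValueError("mode 'tool' requires a tool name")
        # The named tool must also appear in the request's tools list
        return {"type": "tool", "name": tool_name}
    raise ValueError(f"unsupported mode: {mode}")

# Force a call to the extract_invoice tool used in this article
forced = tool_choice_for("tool", "extract_invoice")
```

For a fixed extraction step, forcing the named tool removes the possibility of a text-only reply; `any` is the looser choice when several tools are acceptable.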

Schema Design Best Practices

tools = [{
    "name": "extract_invoice",
    "description": "Extract structured data from an invoice",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor_name": {"type": "string"},
            "total_amount": {"type": "number"},
            "currency": {
                "type": "string",
                "enum": ["USD", "EUR", "JPY", "other"]
            },
            "currency_detail": {
                "type": "string",
                "description": "Currency code when currency is 'other'"
            },
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "amount": {"type": ["number", "null"]},
                        "confidence": {
                            "type": "string",
                            "enum": ["high", "medium", "low", "unclear"]
                        }
                    }
                }
            }
        },
        "required": ["vendor_name", "total_amount", "currency", "line_items"]
    }
}]
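Two things remain after the model answers: pulling the tool input out of the response, and checking the rule that lives only in a description (`currency_detail` matters when `currency` is "other"). The following sketch assumes the response content has already been converted to plain dicts shaped like the API's JSON; both helper names are hypothetical.

```python
def extract_tool_input(content_blocks, tool_name):
    """Return the input dict of the first tool_use block for tool_name, or None."""
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    return None

def check_currency_detail(invoice):
    """Cross-field rule that the schema above states only in a description."""
    errors = []
    if invoice.get("currency") == "other" and not invoice.get("currency_detail"):
        errors.append("currency_detail is required when currency is 'other'")
    return errors

# Hypothetical response content, for illustration only
content = [
    {"type": "text", "text": "Extracting the invoice."},
    {"type": "tool_use", "id": "example-id", "name": "extract_invoice",
     "input": {"vendor_name": "ABC Corp", "total_amount": 15000,
               "currency": "other", "line_items": []}},
]
invoice = extract_tool_input(content, "extract_invoice")
errors = check_currency_detail(invoice)
```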

Turning Failures into Feedback — T4.4: Validation-Retry Loops

retry-with-error-feedback Pattern

When validation fails, feed the specific error back to the model for correction rather than silently retrying.

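def extract_with_retry(messages, call_model, validate, max_retries=3):
    """Retry extraction, feeding validation errors back to the model.

    call_model and validate are caller-supplied functions (assumed here).
    """
    for attempt in range(max_retries):
        result = call_model(messages)
        errors = validate(result)
        if not errors:
            return result
        # Keep the failed output in the history so the model sees what to fix
        messages.append({"role": "assistant", "content": str(result)})
        messages.append({
            "role": "user",
            "content": f"The following validation errors were detected:\n{errors}\nPlease fix and re-extract."
        })
    # Retries exhausted: surface the failure instead of looping further
    raise RuntimeError(f"Validation still failing after {max_retries} attempts")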

When Retries Are Effective vs Ineffective

Case | Retry Effective | Reason
--- | --- | ---
JSON format mismatch | Yes | Model can correct format
Missing required field extraction | Yes | Attention prompt improves extraction
Information doesn’t exist in source | No | Information won’t appear no matter how many retries
Facts outside model’s knowledge | No | Hallucination risk increases

If the same error persists after 3 retries, the most likely cause is that the information doesn’t exist in the source document. It’s not a retry count problem, and further retries won’t help.
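That stopping rule can be made explicit in code. The following is a minimal sketch that treats identical consecutive error messages as "the same error"; the function name and the string comparison are assumptions made for illustration.

```python
def next_action(error_history, max_repeats=3):
    """Decide 'retry' or 'escalate' from the validation errors seen so far."""
    if not error_history:
        return "retry"
    last = error_history[-1]
    repeats = 0
    # Count how many times the most recent error has occurred in a row
    for err in reversed(error_history):
        if err != last:
            break
        repeats += 1
    return "escalate" if repeats >= max_repeats else "retry"
```

A format error followed by a different error still counts as progress, so the loop continues; three identical failures in a row suggest the source document is the problem and a human should look at it.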

Semantic Validation

Syntactically correct JSON can still be semantically wrong. When calculated_total (sum of line items) and stated_total (document total) don’t match, rather than auto-selecting one, the correct approach is to flag the discrepancy, preserve both values, and route to human review.
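The total check described above might look like the sketch below. It assumes every line item amount is present (no nulls) and uses a hypothetical tolerance of 0.01; the field names in the returned dict follow the paragraph, and everything else is illustrative.

```python
from decimal import Decimal

def check_totals(line_item_amounts, stated_total, tolerance=Decimal("0.01")):
    """Compare the sum of line items with the document's stated total.

    Neither value is overwritten: both are returned, with a flag for human review.
    """
    calculated_total = sum(
        (Decimal(str(amount)) for amount in line_item_amounts), Decimal("0")
    )
    stated = Decimal(str(stated_total))
    return {
        "calculated_total": calculated_total,
        "stated_total": stated,
        "needs_human_review": abs(calculated_total - stated) > tolerance,
    }

ok = check_totals([100, 250.5], 350.5)
mismatch = check_totals([100, 250.5], 3505)
```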

50% Cost Reduction Conditions — T4.5: Batch Processing Strategy

Message Batches API Characteristics

Item | Detail
--- | ---
Cost reduction | 50% discount from standard pricing
Processing time | Up to 24 hours (no latency SLA)
Constraint | Multi-turn tool calls not supported
Identification | Match requests and responses via custom_id
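Because results are matched by `custom_id` rather than by position, it helps to build the request list and the result lookup around that key. The sketch below works on plain dicts shaped like batch requests and results; the helper names and the model string are placeholders, and submitting the batch is left out.

```python
def build_batch_requests(documents, model, max_tokens=1024):
    """documents maps custom_id -> document text."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": text}],
            },
        }
        for custom_id, text in documents.items()
    ]

def index_results(results):
    """Map custom_id -> result so lookups never rely on result order."""
    return {item["custom_id"]: item["result"] for item in results}

requests = build_batch_requests(
    {"doc-1": "Invoice A text", "doc-2": "Invoice B text"},
    model="placeholder-model-id",  # placeholder, substitute a real model name
)

# Hypothetical results arriving in a different order than the requests
by_id = index_results([
    {"custom_id": "doc-2", "result": {"type": "succeeded"}},
    {"custom_id": "doc-1", "result": {"type": "errored"}},
])
```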

Batch Suitability Criteria

A frequently tested exam scenario is “which of two workloads should be batched?” The criterion boils down to “does it block developers or users?”

Use Case | Batch Suitable | Reason
--- | --- | ---
Nightly report generation | Yes | No latency requirement, scheduled processing
Weekly audit reports | Yes | No latency requirement, bulk processing
Pre-merge PR checks | No | Blocks developers
Real-time chat responses | No | Immediate response required
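The blocking criterion is simple enough to encode as a routing rule. This is a small sketch, assuming each workload has already been tagged with whether someone waits on its result; the tuple layout is a hypothetical convention.

```python
def partition_workloads(workloads):
    """Split (name, blocks_someone) pairs into batch and real-time lists."""
    batch = [name for name, blocks_someone in workloads if not blocks_someone]
    realtime = [name for name, blocks_someone in workloads if blocks_someone]
    return batch, realtime

workloads = [
    ("Nightly report generation", False),
    ("Weekly audit reports", False),
    ("Pre-merge PR checks", True),
    ("Real-time chat responses", True),
]
batch, realtime = partition_workloads(workloads)
```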

The Limits of AI That Can’t Question Itself — T4.6: Multi-Instance Review

Structural Limitations of Self-Review

Reviewing output in the same session that generated it creates confirmation bias. The prior reasoning context makes the model unlikely to question its own decisions. This is why CI/CD code reviews should run in independent instances.

Multi-Pass Review Strategy

  • Pass 1 — Local file-level analysis: Each file is reviewed independently for local issues (bugs, security, style)
  • Pass 2 — Cross-file integration pass: Aggregates Pass 1 results to detect inter-file dependency issues, API contract violations, type mismatches, and import gaps

When a 14-file PR reviewed in a single pass misses inter-file dependency errors, the fix is to split the review into file-level local analysis plus a cross-file integration pass.

Multi-Instance Review Architecture

An orchestrator aggregates and judges the review results. Instance A handles local analysis of files 1–5, Instance B handles files 6–14, and Instance C handles cross-file integration analysis. Each instance operates in an independent context without prior reasoning. Local analysis runs in parallel for throughput, while integration analysis receives all local results as input.
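The shape of that orchestration can be sketched as follows. Both review functions are stand-ins that return strings; in a real pipeline each would start a fresh model session with no shared history. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def review_files(files):
    """Stand-in for an independent instance doing local, per-file review."""
    return [f"local findings for {name}" for name in files]

def integration_review(all_files, local_findings):
    """Stand-in for a separate instance that receives every local result."""
    return (f"cross-file pass over {len(all_files)} files "
            f"using {len(local_findings)} local findings")

def run_review(groups):
    """Run local reviews in parallel, then one integration pass over all results."""
    with ThreadPoolExecutor() as pool:
        # pool.map keeps the results in the same order as the groups
        local = [finding for group in pool.map(review_files, groups)
                 for finding in group]
    all_files = [name for group in groups for name in group]
    return {"local": local, "integration": integration_review(all_files, local)}

files = [f"file_{i}.py" for i in range(1, 15)]
# Instance A gets files 1-5, Instance B gets files 6-14
result = run_review([files[:5], files[5:]])
```

The integration pass only starts once every local review has finished, which is what lets it see the full set of findings.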

D4 Pitfalls: Common Design Mistakes and Mitigations

The CCA exam’s prompt engineering and structured output domain tests the ability to identify “designs that look correct but are actually inappropriate.” This section organizes common Domain 4 design mistakes and their mitigations.

Mistake 1: Flooding with Few-Shot Examples

The intuition that “more examples = better accuracy” is wrong. Injecting 15 few-shot examples not only consumes the context window but also introduces noise from subtle contradictions between examples. The recommendation is 2–4 targeted examples focusing on boundary cases.

Example Count | Effect | Risk
--- | --- | ---
0 | Unstable formatting | Low output consistency
2–4 | Optimal accuracy and consistency | Requires careful example selection
10+ | No improvement or decline | Context pressure, increased noise

Mistake 2: Assuming tool_use + JSON Schema Guarantees Semantic Correctness

tool_use + JSON Schema’s Strict mode eliminates JSON syntax errors. However, it doesn’t guarantee the correctness of field values. Whether an amount field contains 12800 or 128000 cannot be determined by schema alone. The mitigation is implementing semantic validation separately.

Mistake 3: Assuming Retries Always Improve Results

Retries are effective for format errors and extraction omissions. But when the source document lacks the information, retrying increases hallucination risk. After 3 failed retries, escalate to human review rather than continuing.

Mistake 4: Relying on Self-Review for Quality Assurance

Self-review within the same session is structurally weak due to confirmation bias. Code review should run in independent instances via CI/CD integration.

Mistake 5: Using Batch API for Real-Time Processing

The Batch API has no latency SLA and can take up to 24 hours. Using it for developer-blocking or user-facing processes is inappropriate.

Mistake 6: Trying to Fix Ambiguous Instructions with Temperature Adjustment

Temperature controls randomness, not precision. If the model is flagging variable naming preferences, lowering temperature won’t help. The fix is explicit criteria defining what to report and what to ignore.

Design Mistake Avoidance Checklist

Checkpoint | Mistake | Correct Approach
--- | --- | ---
Are few-shot examples boundary-focused and minimal? | Bulk injection | 2–4 carefully selected boundary cases
Is semantic validation implemented? | Schema-only confidence | Two-layer: syntax + semantic validation
Are retry preconditions checked? | Unconditional retry | Verify source data existence first
Are reviews run in independent instances? | Self-review | CI/CD-integrated independent review
Is batch API suitability assessed? | Batch everything | Sort by blocking criteria
Are output stabilization methods correct? | Temperature adjustment | Explicit criteria or few-shot

CCA Exam Practice: Domain 4 Mock Questions

Here are mock questions to verify your understanding of CCA exam prompt engineering and structured output.

Q1: A code review tool has a 40% false positive rate. What is the most effective first step?

A) Fine-tune the model
B) Identify high false-positive categories and add explicit criteria
C) Lower temperature
D) Limit output tokens

Correct: B — Adding explicit criteria is the most direct and cost-effective improvement.

Q2: Despite detailed prompt instructions, output format varies between requests. Most effective fix?

A) Lengthen system prompt
B) Add 2–3 few-shot examples
C) Set temperature to 0
D) Increase max_tokens

Correct: B — Few-shot is most effective for format consistency.

Q3: You want to completely eliminate JSON syntax errors in structured output. Most appropriate method?

A) Prompt instruction for JSON format
B) tool_use + JSON Schema
C) Parse output with regex
D) Show JSON examples via few-shot

Correct: B — Only tool_use + JSON Schema structurally eliminates syntax errors.

Q4: Primary reason for setting tool_choice to “any”?

A) Cost reduction
B) Faster response
C) Guaranteed structured output
D) Simplified error handling

Correct: C — “any” eliminates text-only responses and guarantees tool calls.

Q5: Same error persists after 3 retries. Most likely cause?

A) Insufficient retry count
B) Information doesn’t exist in source document
C) Model version is outdated
D) Temperature is too high

Correct: B — Persistent errors typically indicate missing source data, not a retry count issue.

Domain 4 Comprehensive Checklist — Final Pre-Exam Review

  • Ambiguous instructions replaced with explicit criteria (report vs ignore boundaries)
  • Few-shot examples are minimal (2–4) and include boundary cases
  • tool_use + JSON Schema used for guaranteed structured output
  • Semantic validation implemented separately from schema validation
  • Retry loops include error feedback and source data existence checks
  • Batch API used only for non-blocking, latency-tolerant workloads
  • Code review runs in independent instances, not self-review
  • Multi-pass review strategy for large PRs (local + integration passes)

Key Point: This article covers CCA exam prompt engineering and structured output design patterns. For practical preparation, combine hands-on learning through the Anthropic Academy official course with practice on mock exam sites. Also refer to our Complete Guide (Day 1) for the recommended “three-pillar” study strategy.

For more information, visit Anthropic Prompt Engineering.
