2026.06.19 2026.07.04

MLA-C01 Domain 4 Complete Guide: Monitoring, Maintenance, and Security 24%

swiftwand

Domain 4, ML Solution Monitoring, Maintenance, and Security, makes up 24% of MLA-C01 and is the part learners most often underestimate. It covers keeping a deployed model healthy, controlling cost, and securing the AWS resources around it. This guide walks through the three task statements and the services behind each.



The Big Picture: The Operational Finish, Worth 24%

Task	Theme	What is tested
Task 4.1	Monitor model inference	Drift detection, Model Monitor, A/B testing
Task 4.2	Monitor and optimize infrastructure and cost	Observability tools, right-sizing, purchase options
Task 4.3	Secure AWS resources	IAM least privilege, network isolation, auditing

The Enemy Called Drift: Why Models Quietly Decay

A model that was accurate at launch degrades as the world changes. Data drift is a shift in the input distribution; concept drift is a change in the relationship between inputs and the target. Because the decay is silent, you need automated monitoring rather than waiting for users to complain.
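As a rough illustration of what "a shift in the input distribution" means numerically, the sketch below computes a Population Stability Index (PSI) between a baseline sample and a live sample of one feature. PSI is a generic drift statistic chosen here for illustration; it is not claimed to be what Model Monitor computes internally, and the function name, bin count, and the 0.2 rule-of-thumb threshold are all assumptions.

```python
import math

def population_stability_index(baseline, live, bins=10):
    """PSI between two numeric samples, using equal-width bins over the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bucket_shares(baseline)
    actual = bucket_shares(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))

# Hypothetical data: the live feature has shifted upward by 0.3.
baseline_sample = [i / 100 for i in range(100)]
shifted_sample = [v + 0.3 for v in baseline_sample]

psi_same = population_stability_index(baseline_sample, baseline_sample)
psi_shifted = population_stability_index(baseline_sample, shifted_sample)
drift_suspected = psi_shifted > 0.2  # 0.2 is a common rule of thumb, not an AWS default
```

Identical samples give a PSI of zero, and the value grows as the live distribution moves away from the baseline. Note that a statistic like this only sees data drift; concept drift needs ground-truth labels to detect.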

The Four Monitoring Types of Model Monitor

Monitoring type	What it watches	How it works
Data quality	Drift in input data statistics	Compares a training-time baseline against live input
Model quality	Drop in prediction accuracy	Matches predictions against actual ground-truth labels
Bias drift	Change in bias in live predictions	Monitored periodically with SageMaker Clarify metrics
Feature attribution drift	Change in each feature contribution	Runs Clarify feature-attribution analysis on a schedule
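The model quality row depends on joining predictions with labels that arrive later. The following is a simplified sketch of that idea, assuming predictions and ground truth are both keyed by a shared record ID; the function names, the dictionary layout, and the 0.05 tolerance are hypothetical and do not reproduce Model Monitor's own merge job.

```python
def accuracy_against_ground_truth(predictions, ground_truth):
    """Accuracy over the records that have both a prediction and a label."""
    matched = [
        (predictions[record_id], ground_truth[record_id])
        for record_id in predictions
        if record_id in ground_truth
    ]
    if not matched:
        return None  # no labels have arrived yet
    correct = sum(1 for predicted, label in matched if predicted == label)
    return correct / len(matched)

def quality_violation(accuracy, baseline_accuracy, tolerance=0.05):
    """True when live accuracy has dropped more than `tolerance` below the baseline."""
    return accuracy is not None and accuracy < baseline_accuracy - tolerance

# Hypothetical captured predictions and late-arriving labels, keyed by record ID.
captured = {"a": 1, "b": 0, "c": 1, "d": 1}
labels = {"a": 1, "b": 1, "c": 1}  # the label for "d" has not arrived

live_accuracy = accuracy_against_ground_truth(captured, labels)
violated = quality_violation(live_accuracy, baseline_accuracy=0.9)
```

Records without a label are skipped rather than counted as wrong, which is why model quality monitoring lags behind data quality monitoring in practice.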

Division of Labor with Clarify and A/B Testing

Model Monitor schedules the checks; Clarify supplies the bias and explainability metrics those checks use. To compare a new model against the current one in production, SageMaker production variants let you split traffic for A/B testing and shift weights gradually once the challenger proves itself.
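Traffic splitting between production variants is driven by relative weights: each variant receives its weight divided by the sum of all weights. The sketch below only does that arithmetic on a list shaped like the `ProductionVariants` argument of an endpoint configuration; the variant names, model names, and instance type are placeholders, and no AWS call is made.

```python
def traffic_shares(production_variants):
    """Fraction of traffic each variant receives, from its relative weight."""
    total = sum(v["InitialVariantWeight"] for v in production_variants)
    return {
        v["VariantName"]: v["InitialVariantWeight"] / total
        for v in production_variants
    }

# Placeholder variants: 90% to the current model, 10% to the challenger.
variants = [
    {
        "VariantName": "champion",
        "ModelName": "model-v1",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 9.0,
    },
    {
        "VariantName": "challenger",
        "ModelName": "model-v2",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    },
]

shares = traffic_shares(variants)
```

Shifting traffic gradually then amounts to raising the challenger's weight in steps, for example with the `UpdateEndpointWeightsAndCapacities` API, while watching the per-variant metrics.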

Task 4.2: Observability Tools – CloudWatch, X-Ray, CloudTrail

Separate the three by purpose: CloudWatch collects metrics, logs, and alarms for how the system performs; X-Ray traces requests across distributed components to find latency bottlenecks; CloudTrail records who did what for audit and governance. The exam often asks which one answers a specific operational question.
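To make the CloudWatch side concrete, here is a sketch that builds the arguments for a latency alarm on one endpoint variant. It assumes the `ModelLatency` metric in the `AWS/SageMaker` namespace, which is reported in microseconds; the helper name, the alarm naming scheme, the evaluation window, and the SNS topic ARN are illustrative choices, and the function only returns a dictionary.

```python
def latency_alarm_params(endpoint_name, variant_name, threshold_ms, sns_topic_arn):
    """Keyword arguments for a CloudWatch alarm on average model latency."""
    return {
        "AlarmName": f"{endpoint_name}-{variant_name}-model-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 60,            # seconds per datapoint
        "EvaluationPeriods": 5,  # alarm after 5 consecutive breaching minutes
        "Threshold": threshold_ms * 1000,  # ModelLatency is in microseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }

# Placeholder endpoint, variant, and topic ARN.
alarm = latency_alarm_params(
    "churn-prod",
    "champion",
    threshold_ms=500,
    sns_topic_arn="arn:aws:sns:us-east-1:111122223333:ml-alerts",
)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`. The alarm answers "is the endpoint slow?"; finding where the time goes is the X-Ray question, and finding who changed the endpoint is the CloudTrail question.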

Right-Sizing and Purchase Options: The Performance-Cost Sweet Spot

Match the instance to the workload and choose the right pricing. Use Inference Recommender to right-size endpoints, Spot Instances for fault-tolerant training, Savings Plans for steady usage, and consider Inferentia and Trainium for cost-efficient ML compute. Picking Spot for training but On-Demand or Savings Plans for production endpoints is a common right answer.
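The saving from managed Spot training can be checked with simple arithmetic on two fields that a completed training job reports, `TrainingTimeInSeconds` and `BillableTimeInSeconds`. The helper below is a minimal sketch of that calculation; the function name and the sample numbers are made up for illustration.

```python
def managed_spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved when billable time is shorter than actual training time."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100

# Hypothetical job: one hour of training, billed as 15 minutes.
savings = managed_spot_savings_percent(training_seconds=3600, billable_seconds=900)
```

The same reasoning explains why Spot is a poor fit for a real-time endpoint: an interrupted training job can resume from a checkpoint, while an interrupted endpoint simply drops requests.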

Cost Management Tools: Cost Explorer, Budgets, Trusted Advisor

Cost Explorer visualizes and forecasts spend, AWS Budgets sends alerts when you approach a threshold, and Trusted Advisor flags idle or underused resources. Together they keep an ML platform from quietly overspending.
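The alerting idea behind a budget is just a comparison of spend against percentages of a limit. AWS Budgets performs this evaluation itself, so the sketch below is only the arithmetic, with a hypothetical function name and hypothetical thresholds.

```python
def crossed_thresholds(actual_spend, budget_limit, thresholds_percent=(50, 80, 100)):
    """Alert thresholds (as percentages of the budget) that spend has reached."""
    used_percent = actual_spend / budget_limit * 100
    return [t for t in thresholds_percent if used_percent >= t]

# Hypothetical month: 850 spent against a 1000 budget.
alerts = crossed_thresholds(actual_spend=850, budget_limit=1000)
```

Here the 50% and 80% thresholds have fired but the 100% one has not, which is the window in which right-sizing or shutting down idle endpoints can still keep the month under budget.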

Task 4.3: IAM Least Privilege and SageMaker Role Manager

Security starts with least privilege: grant only the permissions a role needs. SageMaker Role Manager helps build scoped roles from common ML personas, and IAM policies, conditions, and resource scoping keep access tight. The exam rewards the most restrictive option that still works.
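A least-privilege policy names one action and one resource instead of wildcards. The sketch below builds such a policy document for a caller that only needs to invoke a single endpoint; the helper name, Region, account ID, and endpoint name are placeholders.

```python
import json

def invoke_only_policy(region, account_id, endpoint_name):
    """IAM policy document allowing InvokeEndpoint on exactly one endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": (
                    f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name}"
                ),
            }
        ],
    }

# Placeholder values for illustration only.
policy = invoke_only_policy("us-east-1", "111122223333", "churn-prod")
policy_json = json.dumps(policy, indent=2)
```

Compared with `"Action": "sagemaker:*"` on `"Resource": "*"`, this role cannot create, update, or delete anything, which is the kind of contrast exam answer choices tend to hinge on.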

Network Isolation: VPC, Subnets, Security Groups

Run SageMaker in a VPC to control traffic, use private subnets with no internet route for sensitive workloads, and reach AWS services through VPC endpoints (PrivateLink) so data never traverses the public internet. Security groups restrict which traffic can reach each resource, KMS keys encrypt data at rest, and TLS protects data in transit.
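These controls map onto a handful of training job settings. The sketch below collects them into a partial argument dictionary using parameter names from the `CreateTrainingJob` API; the helper name and all IDs are placeholders, and required fields such as the algorithm, role, and resource configuration are deliberately omitted.

```python
def isolated_training_settings(subnet_ids, security_group_ids, kms_key_id, output_s3_uri):
    """Network and encryption settings for a training job (partial, not a full request)."""
    return {
        # Attach the job to private subnets and security groups in your VPC.
        "VpcConfig": {
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        # Block outbound network calls from the training container.
        "EnableNetworkIsolation": True,
        # Encrypt traffic between nodes in distributed training.
        "EnableInterContainerTrafficEncryption": True,
        # Encrypt model artifacts written to S3 with a KMS key.
        "OutputDataConfig": {
            "S3OutputPath": output_s3_uri,
            "KmsKeyId": kms_key_id,
        },
    }

# Placeholder resource IDs for illustration only.
settings = isolated_training_settings(
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
    kms_key_id="alias/ml-artifacts",
    output_s3_uri="s3://example-bucket/training-output/",
)
```

A job running in a private subnet with no internet route also needs VPC endpoints for the services it calls, such as S3, or it will fail to read its input data.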

High-Frequency Checklist: Self-Diagnosis for Exam Day

Conclusion: The Last Piece That Proves ML Keeps Running

Domain 4 is where a model becomes a dependable production system. Drift monitoring, observability, cost control, and security are 24% of the exam and the difference between a demo and a service. Do not skim them.

#AWS #AWS Certification #MLA-C01 #MLOps #SageMaker
