MLA-C01 Domain 4 Complete Guide: Monitoring, Maintenance, and Security (24%)

Domain 4, ML Solution Monitoring, Maintenance, and Security, is 24% of MLA-C01 and the part learners most often underestimate. It covers keeping a deployed model healthy, controlling cost, and securing the AWS resources around it. This guide walks through the three task statements and the services behind each.
- The Big Picture: The Operational Finish, Worth 24%
- The Enemy Called Drift: Why Models Quietly Decay
- The Four Monitoring Types of Model Monitor
- Division of Labor with Clarify and A/B Testing
- Task 4.2: Observability Tools – CloudWatch, X-Ray, CloudTrail
- Right-Sizing and Purchase Options: The Performance-Cost Sweet Spot
- Cost Management Tools: Cost Explorer, Budgets, Trusted Advisor
- Task 4.3: IAM Least Privilege and SageMaker Role Manager
- Network Isolation: VPC, Subnets, Security Groups
- High-Frequency Checklist: Self-Diagnosis for Exam Day
- Conclusion: The Last Piece That Proves ML Keeps Running
The Big Picture: The Operational Finish, Worth 24%
| Task | Theme | What is tested |
| --- | --- | --- |
| Task 4.1 | Monitor model inference | Drift detection, Model Monitor, A/B testing |
| Task 4.2 | Monitor and optimize infrastructure and cost | Observability tools, right-sizing, purchase options |
| Task 4.3 | Secure AWS resources | IAM least privilege, network isolation, auditing |
The Enemy Called Drift: Why Models Quietly Decay
A model that was accurate at launch degrades as the world changes. Data drift is a shift in the input distribution; concept drift is a change in the relationship between inputs and the target. Because the decay is silent, you need automated monitoring rather than waiting for users to complain.
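To make "a shift in the input distribution" concrete, one generic drift statistic is the Population Stability Index (PSI), which compares how a feature's values fall into bins at training time versus in production. The sketch below is only an illustration of the idea and makes no claim about what SageMaker Model Monitor computes internally; the function name, the bin count, and the 0.25 rule-of-thumb threshold are all assumptions.

```python
import math

def population_stability_index(baseline, live, bins=10):
    """PSI between two numeric samples, using equal-width bins over the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the baseline range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        eps = 1e-6  # keeps empty bins from producing log(0)
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(baseline), proportions(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical data: the live feature has shifted upward by 50 units.
baseline_sample = list(range(100))
live_sample = [v + 50 for v in baseline_sample]

psi = population_stability_index(baseline_sample, live_sample)
drift_detected = psi > 0.25  # 0.25 is a commonly quoted rule of thumb, not an AWS default
```

An identical live sample yields a PSI of zero, so the statistic only grows when the binned distributions actually diverge.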
The Four Monitoring Types of Model Monitor
| Monitoring type | What it watches | How it works |
| --- | --- | --- |
| Data quality | Drift in input data statistics | Compares a training-time baseline against live input |
| Model quality | Drop in prediction accuracy | Matches predictions against actual ground-truth labels |
| Bias drift | Change in bias in live predictions | Monitored periodically with SageMaker Clarify metrics |
| Feature attribution drift | Change in each feature contribution | Runs Clarify feature-attribution analysis on a schedule |
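The model quality row depends on joining captured predictions with ground-truth labels that arrive later. A minimal sketch of that join-and-compare step follows, assuming predictions and labels are keyed by a shared record ID. The function name, the dictionary layout, and the accuracy threshold are hypothetical and do not reflect Model Monitor's actual data format.

```python
def model_quality_check(predictions, ground_truth, min_accuracy):
    """Join predictions to labels by record ID and flag an accuracy violation."""
    # Only records that already have a ground-truth label can be scored.
    matched = [(predictions[k], ground_truth[k]) for k in predictions if k in ground_truth]
    if not matched:
        return {"matched": 0, "accuracy": None, "violation": False}
    accuracy = sum(p == y for p, y in matched) / len(matched)
    return {
        "matched": len(matched),
        "accuracy": accuracy,
        "violation": accuracy < min_accuracy,
    }

# Hypothetical captured predictions and late-arriving labels.
captured = {"a": 1, "b": 0, "c": 1, "d": 1}
labels = {"a": 1, "b": 1, "c": 1}  # record "d" has no label yet

report = model_quality_check(captured, labels, min_accuracy=0.8)
```

Records without a label are skipped rather than counted as errors, which is why ground-truth latency affects how quickly a quality drop becomes visible.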
Division of Labor with Clarify and A/B Testing
Model Monitor schedules the checks; Clarify supplies the bias and explainability metrics those checks use. To compare a new model against the current one in production, SageMaker production variants let you split traffic for A/B testing and shift weights gradually once the challenger proves itself.
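As a rough illustration of the traffic split, each production variant receives a share of requests equal to its weight divided by the sum of all variant weights. The sketch below computes those shares and builds the payload shape expected by the boto3 `update_endpoint_weights_and_capacities` call. The variant names, model names, and instance types are hypothetical, and no AWS call is made.

```python
def traffic_shares(variants):
    """Fraction of requests each variant receives, from its relative weight."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}

def desired_weights(weights):
    """Payload for DesiredWeightsAndCapacities when shifting traffic later."""
    return [{"VariantName": name, "DesiredWeight": w} for name, w in weights.items()]

# Hypothetical variants, shaped like ProductionVariants in create_endpoint_config.
variants = [
    {"VariantName": "champion", "ModelName": "model-v1",
     "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
     "InitialVariantWeight": 9.0},
    {"VariantName": "challenger", "ModelName": "model-v2",
     "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
     "InitialVariantWeight": 1.0},
]

shares = traffic_shares(variants)  # challenger starts with about 10% of traffic

# Once the challenger proves itself, shift to an even split.
shift_payload = desired_weights({"champion": 5.0, "challenger": 5.0})
# With credentials configured, this could be passed as:
# boto3.client("sagemaker").update_endpoint_weights_and_capacities(
#     EndpointName="my-endpoint", DesiredWeightsAndCapacities=shift_payload)
```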
Task 4.2: Observability Tools – CloudWatch, X-Ray, CloudTrail
Separate the three by purpose: CloudWatch collects metrics, logs, and alarms for how the system performs; X-Ray traces requests across distributed components to find latency bottlenecks; CloudTrail records who did what for audit and governance. The exam often asks which one answers a specific operational question.
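For the CloudWatch side, a latency alarm on an endpoint might look like the sketch below. It only builds the keyword arguments for the boto3 `put_metric_alarm` call and does not contact AWS. The alarm name pattern, evaluation window, and threshold are assumptions chosen for illustration. `ModelLatency` in the `AWS/SageMaker` namespace is reported in microseconds, hence the conversion.

```python
def latency_alarm_params(endpoint_name, variant_name, threshold_ms):
    """Keyword arguments for cloudwatch.put_metric_alarm on endpoint model latency."""
    return {
        "AlarmName": f"{endpoint_name}-{variant_name}-model-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",  # reported in microseconds
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 60,            # seconds per datapoint
        "EvaluationPeriods": 5,  # alarm after 5 consecutive breaching minutes
        "Threshold": threshold_ms * 1000,  # milliseconds -> microseconds
        "ComparisonOperator": "GreaterThanThreshold",
    }

# Hypothetical endpoint and variant names.
alarm = latency_alarm_params("my-endpoint", "champion", threshold_ms=250)
# With credentials configured:
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
```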
Right-Sizing and Purchase Options: The Performance-Cost Sweet Spot
Match the instance to the workload and choose the right pricing. Use SageMaker Inference Recommender to right-size endpoints, Spot Instances (managed spot training) for fault-tolerant training, Savings Plans for steady usage, and consider AWS Inferentia for inference and Trainium for training as cost-efficient ML compute. Picking Spot for training but On-Demand or Savings Plans for production endpoints is a common right answer.
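The benefit of Spot for training can be expressed as simple arithmetic. For a managed spot training job, SageMaker reports both the total training time and the billable time, and the savings percentage is one minus their ratio. The sketch below is a small helper under that assumption; the function name and the sample numbers are illustrative.

```python
def managed_spot_savings_percent(training_seconds, billable_seconds):
    """Savings from managed spot training: (1 - billable / training) * 100."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100

# Hypothetical job: ran for one hour, billed for 18 minutes.
savings = managed_spot_savings_percent(training_seconds=3600, billable_seconds=1080)
```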
Cost Management Tools: Cost Explorer, Budgets, Trusted Advisor
Cost Explorer visualizes and forecasts spend, AWS Budgets sends alerts when you approach a threshold, and Trusted Advisor flags idle or underused resources. Together they keep an ML platform from quietly overspending.
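A budget alert of the kind described above could be requested roughly as follows. The sketch only assembles the arguments for the boto3 Budgets `create_budget` call and does not send them. The budget name, limit, alert percentage, account ID, and email address are hypothetical placeholders.

```python
def monthly_budget_request(account_id, name, limit_usd, alert_percent, email):
    """Keyword arguments for budgets.create_budget with one email alert."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [{
            "Notification": {
                "NotificationType": "ACTUAL",  # alert on actual, not forecasted, spend
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": float(alert_percent),
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }],
    }

# Hypothetical values: alert at 80% of a 1,000 USD monthly ML budget.
budget_request = monthly_budget_request(
    "123456789012", "ml-platform-monthly", 1000, 80, "ml-team@example.com")
# With credentials configured:
# boto3.client("budgets").create_budget(**budget_request)
```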
Task 4.3: IAM Least Privilege and SageMaker Role Manager
Security starts with least privilege: grant only the permissions a role needs. SageMaker Role Manager helps build scoped roles from common ML personas, and IAM policies, conditions, and resource scoping keep access tight. The exam rewards the most restrictive option that still works.
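As a sketch of what "only the permissions a role needs" can look like, the policy below allows a caller to invoke one specific endpoint and nothing else. The region, account ID, and endpoint name are hypothetical placeholders, and a real role would likely need further statements tailored to its workload.

```python
import json

def invoke_only_policy(region, account_id, endpoint_name):
    """IAM policy document allowing InvokeEndpoint on a single named endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            # Scoped to one endpoint ARN rather than a wildcard resource.
            "Resource": f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name}",
        }],
    }

policy = invoke_only_policy("us-east-1", "123456789012", "my-endpoint")
policy_json = json.dumps(policy, indent=2)
```

Contrast this with `"Action": "sagemaker:*"` on `"Resource": "*"`, which would work but is the kind of overly broad option the exam expects you to reject.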
Network Isolation: VPC, Subnets, Security Groups
Run SageMaker in a VPC to control traffic, use private subnets with no internet route for sensitive workloads, and reach AWS services through VPC endpoints (PrivateLink) so data never traverses the public internet. Security groups, KMS encryption at rest, and TLS encryption in transit complete the picture.
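The VPC settings above map onto a few request fields when a model is created. The sketch below builds the arguments for the boto3 SageMaker `create_model` call with a `VpcConfig` and network isolation enabled, without sending anything to AWS. Every identifier in the example (role ARN, image URI, S3 path, subnet and security group IDs) is a hypothetical placeholder.

```python
def private_model_params(model_name, role_arn, image_uri, model_data_url,
                         subnet_ids, security_group_ids):
    """Keyword arguments for sagemaker.create_model, locked to a VPC."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        # Private subnets and security groups the model containers attach to.
        "VpcConfig": {
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        # Blocks outbound network calls from the container itself.
        "EnableNetworkIsolation": True,
    }

model_params = private_model_params(
    "fraud-model",
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
    "s3://example-bucket/model/model.tar.gz",
    subnet_ids=["subnet-0example1", "subnet-0example2"],
    security_group_ids=["sg-0example"],
)
# With credentials configured:
# boto3.client("sagemaker").create_model(**model_params)
```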
High-Frequency Checklist: Self-Diagnosis for Exam Day
Conclusion: The Last Piece That Proves ML Keeps Running
Domain 4 is where a model becomes a dependable production system. Drift monitoring, observability, cost control, and security are 24% of the exam and the difference between a demo and a service. Do not skim them.





