2026.06.17 2026.07.04

MLA-C01 Domain 2 Complete Guide: ML Model Development (26% of the Exam)

swiftwand

Domain 2, ML Model Development, is worth 26% of MLA-C01 and behaves like a diagnosis-to-prescription exam: you read a symptom in the data or training run and pick the right fix. This guide covers the three task statements – choosing a modeling approach, training and refining, and analyzing performance – with the AWS tools behind each.

The Character of Domain 2: Symptom to Prescription
Task 2.1: Consider Not Building Before You Build
A Map of Built-in Algorithms, from XGBoost to DeepAR
Task 2.2: The Vocabulary of Training – Epochs, Batches, Distributed Training
Fighting Overfitting: Regularization, Dropout, Catastrophic Forgetting
Hyperparameter Tuning: AMT Retires Brute Force
Slimming and Combining Models: Ensembles to the Model Registry
Task 2.3: Choosing Evaluation Metrics, from Confusion Matrix to AUC
Clarify and Model Debugger: Interpretation, Bias, Convergence
High-Frequency Checklist: Self-Diagnosis for Exam Day
Conclusion: Learn Model Development as Diagnostics


The Character of Domain 2: Symptom to Prescription

Task	Theme	What is tested
Task 2.1	Choose a modeling approach	Algorithm fit, when to use pre-built AI services, cost and interpretability
Task 2.2	Train and refine models	Training control, regularization, hyperparameter tuning, versioning
Task 2.3	Analyze model performance	Metric selection, baselines, overfitting detection, convergence debugging

Task 2.1: Consider Not Building Before You Build

The cheapest model is the one you do not train. Before custom modeling, the exam wants you to weigh managed AI services such as Amazon Rekognition (images and video), Amazon Comprehend (text analysis), Amazon Transcribe (speech to text), and Amazon Bedrock (foundation models). If a ready-made service solves the problem, that is often the right answer on cost and time to value. Reserve custom SageMaker modeling for cases those services do not cover.

A Map of Built-in Algorithms, from XGBoost to DeepAR

Algorithm	Task	In a phrase
XGBoost	Classification / regression	Gradient-boosted trees, the first pick for tabular data
Linear Learner	Classification / regression	Fast, interpretable linear baseline
K-Means	Clustering	Unsupervised grouping
PCA	Dimensionality reduction	Compress features, prep for visualization
Random Cut Forest	Anomaly detection	Unsupervised outlier scoring
DeepAR	Time-series forecasting	RNN-based probabilistic forecasts across many series

Task 2.2: The Vocabulary of Training – Epochs, Batches, Distributed Training

Know the levers: an epoch is one full pass over the training data, batch size is the number of samples used for each weight update, and the learning rate sets the step size of that update. For large models and datasets, distributed training splits the work: data parallelism replicates the model on every GPU and feeds each replica a different shard of the data, while model parallelism splits the model itself across devices when it does not fit on one. SageMaker provides libraries for both.
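To make the three levers concrete, here is a minimal sketch of mini-batch gradient descent on a toy linear model. It is plain Python for illustration only, not SageMaker code; the function name `train_linear`, the data, and the chosen values are all assumptions made for this example.

```python
import random

def train_linear(xs, ys, epochs=200, batch_size=4, learning_rate=0.2, seed=0):
    """Fit y ~ w*x + b with mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    updates = 0
    for _ in range(epochs):                      # one epoch = one full pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]   # samples sharing one weight update
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w          # learning rate sets the step size
            b -= learning_rate * grad_b
            updates += 1
    return w, b, updates

# Toy data generated from y = 3x + 1 (hypothetical, noise-free).
xs = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
ys = [3.0 * x + 1.0 for x in xs]
w, b, updates = train_linear(xs, ys)
print(f"w={w:.3f} b={b:.3f} after {updates} weight updates")
```

With 8 samples and a batch size of 4, each epoch produces 2 weight updates, so 200 epochs give 400 updates. Halving the batch size doubles the number of updates per epoch, which is the kind of trade-off the exam expects you to reason about.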

Fighting Overfitting: Regularization, Dropout, Catastrophic Forgetting

When a model memorizes the training set (training loss keeps falling while validation loss rises), reach for L1 and L2 regularization, dropout, early stopping, or more data and augmentation. In transfer learning and fine-tuning, watch for catastrophic forgetting, where new training erases earlier capability. The exam frames these as fixes for a described symptom.
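Two of these fixes are easy to sketch in a few lines. The example below shows L1 and L2 penalty terms and a patience-based early-stopping rule; the function names, the `patience` parameter, and the validation losses are hypothetical values chosen for illustration, not output from a real training job.

```python
def l1_penalty(weights, lam):
    """L1 term added to the loss: lam * sum(|w|). Pushes weights toward exactly zero."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 term added to the loss: lam * sum(w^2). Shrinks large weights."""
    return lam * sum(w * w for w in weights)

def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) as 0-based indices.

    Training stops once validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Hypothetical validation losses: improving, then rising as the model overfits.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
best_epoch, stop_epoch = early_stopping_epoch(val_losses)
print(f"keep the checkpoint from epoch {best_epoch}, stop at epoch {stop_epoch}")
```

Here the validation loss bottoms out at epoch 3 and training halts at epoch 5, two epochs later, so the checkpoint from epoch 3 is the one to keep.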

Hyperparameter Tuning: AMT Retires Brute Force

SageMaker Automatic Model Tuning (AMT) searches the hyperparameter space for you. Grid and random search are the baselines, but Bayesian optimization converges faster by learning from past trials, and Hyperband stops weak runs early. Knowing why Bayesian optimization beats grid search on cost (it typically reaches a good configuration with fewer training jobs) is a common question.
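The cost argument is mostly arithmetic, and a small sketch shows it. The code below does not call AMT and does not implement Bayesian optimization or Hyperband; it only compares how many trials an exhaustive grid needs against a fixed random-search budget. The search space values and `toy_validation_error` are hypothetical stand-ins for real training jobs.

```python
import itertools
import random

# Hypothetical search space using common XGBoost hyperparameter names.
search_space = {
    "eta": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7, 9],
    "subsample": [0.5, 0.7, 0.9, 1.0],
}

def grid_trials(space):
    """Every combination: the count multiplies with each added hyperparameter."""
    return list(itertools.product(*space.values()))

def random_trials(space, budget, seed=0):
    """A fixed budget of randomly sampled combinations."""
    rng = random.Random(seed)
    return [tuple(rng.choice(values) for values in space.values())
            for _ in range(budget)]

def toy_validation_error(eta, max_depth, subsample):
    """Stand-in for the validation error a training job would report."""
    return (eta - 0.1) ** 2 + 0.01 * (max_depth - 5) ** 2 + (subsample - 0.9) ** 2

grid = grid_trials(search_space)
sampled = random_trials(search_space, budget=10)
best_grid = min(grid, key=lambda trial: toy_validation_error(*trial))
best_random = min(sampled, key=lambda trial: toy_validation_error(*trial))
print(f"grid: {len(grid)} jobs, random: {len(sampled)} jobs")
```

Three hyperparameters with four values each already mean 64 training jobs for the grid, and a fourth hyperparameter would make it 256. Random search caps the bill at its budget, and Bayesian optimization goes one step further by using earlier results to choose the next trial.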

Slimming and Combining Models: Ensembles to the Model Registry

Ensembles such as bagging and boosting raise accuracy, while distillation, pruning, and quantization shrink models for cheaper inference. Once a model is ready, the SageMaker Model Registry versions it and gates approval before deployment, linking Domain 2 to the deployment workflow in Domain 3.
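Of the slimming techniques, quantization is the simplest to show in code. The sketch below uses symmetric 8-bit quantization of a weight list, assuming one scale factor for the whole list; the function names and the weights are hypothetical, and production toolchains use more elaborate schemes.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0] * len(weights), 1.0           # nothing to scale
    scale = max_abs / 127
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

# Hypothetical layer weights.
weights = [0.82, -1.27, 0.05, 0.4]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, f"max rounding error: {max_error:.5f}")
```

Each weight now fits in one byte instead of four (for 32-bit floats), at the price of a rounding error of at most half the scale factor per weight.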

Task 2.3: Choosing Evaluation Metrics, from Confusion Matrix to AUC

Pick the metric that matches the business cost. Accuracy misleads on imbalanced data, so reach for precision, recall, F1, and ROC-AUC for classification, and RMSE, MAE, or R-squared for regression. The exam loves scenarios where recall matters more than precision (fraud, disease) or the reverse, and expects you to read the confusion matrix accordingly.
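A worked example shows why accuracy misleads on imbalanced data. The sketch below computes the classification metrics from the four cells of a confusion matrix; the counts describe a hypothetical fraud dataset of 1,000 transactions with 20 fraudulent ones, invented for this illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of flagged cases, how many were real
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of real cases, how many were caught
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical fraud model: catches 5 of 20 frauds and raises 5 false alarms.
metrics = classification_metrics(tp=5, fp=5, fn=15, tn=975)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy comes out at 0.98 even though the model misses 15 of the 20 frauds (recall 0.25). In a fraud or disease scenario, that recall figure is the number the question is really asking about.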

Clarify and Model Debugger: Interpretation, Bias, Convergence

SageMaker Clarify explains predictions with SHAP values and checks for bias both before and after training, while SageMaker Debugger captures tensors during training to diagnose vanishing gradients, overfitting, and stalled convergence. Together they cover interpretability, fairness, and training health.
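To see what a SHAP value represents, it helps to compute exact Shapley values for a tiny model by brute force. This is not how Clarify is invoked and not its implementation; it is a sketch of the underlying idea, and `toy_model`, the instance, and the baseline are hypothetical. Enumerating every feature ordering is only feasible for a handful of features, which is why practical tools rely on approximations.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every order in which features can be switched from baseline to instance."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]             # reveal feature i
            value = predict(current)
            totals[i] += value - previous        # marginal contribution of feature i
            previous = value
    return [total / len(orderings) for total in totals]

def toy_model(features):
    """Hypothetical scoring function with an interaction; the third feature is unused."""
    a, b, _unused = features
    return 2.0 * a - 1.0 * b + a * b

instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
contributions = shapley_values(toy_model, instance, baseline)
print(contributions)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, the interaction term is split evenly between the two features involved, and the unused feature receives zero. Those are the properties that make SHAP values readable as per-feature explanations.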

High-Frequency Checklist: Self-Diagnosis for Exam Day

Conclusion: Learn Model Development as Diagnostics

Domain 2 rewards the engineer who reads symptoms and prescribes the right tool. Internalize the algorithm map, the training levers, the tuning strategies, and the metrics, and 26% of the exam becomes a series of familiar diagnoses.

#AWS #AWS Certification #MLA-C01 #SageMaker #XGBoost
