
MLA-C01 Practice Questions: 30 Questions Weighted by Domain

swiftwand

These 30 practice questions mirror the MLA-C01 domain weighting: 8 for Domain 1, 8 for Domain 2, 7 for Domain 3, and 7 for Domain 4. Each comes with the answer and a short explanation. Treat them as a diagnostic – the domains where you miss most are where your remaining study time should go.


How to Use These Questions: Scoring and the Pass Bar

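The real exam requires a scaled score of 720 out of 1000. Scaled scores do not convert directly to a percentage of correct answers, so treat 70% as a working target rather than an exact cut line. Aim to clear 70% here per domain, not just overall, so a single weak domain does not sink you. Time yourself at about two minutes per question to rehearse exam pace.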
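A per-domain tally can be sketched in a few lines of Python. This is only an illustration: the domain sizes come from this question set, the 70% figure is the practice target used here rather than an official per-domain cut, and the function names are hypothetical.

```python
# Question counts per domain in this 30-question set.
DOMAIN_SIZES = {"Domain 1": 8, "Domain 2": 8, "Domain 3": 7, "Domain 4": 7}

# Practice target used in this article, not an official per-domain pass mark.
PASS_RATE = 0.70


def score_by_domain(correct):
    """Return {domain: (rate, passed)} from a dict of correct-answer counts."""
    report = {}
    for domain, total in DOMAIN_SIZES.items():
        rate = correct.get(domain, 0) / total
        report[domain] = (rate, rate >= PASS_RATE)
    return report


def weak_domains(correct):
    """List the domains that fall below the practice target."""
    return [d for d, (_, ok) in score_by_domain(correct).items() if not ok]


if __name__ == "__main__":
    my_scores = {"Domain 1": 5, "Domain 2": 7, "Domain 3": 6, "Domain 4": 4}
    for domain, (rate, ok) in score_by_domain(my_scores).items():
        print(f"{domain}: {rate:.0%} {'ok' if ok else 'review'}")
    print("Review first:", weak_domains(my_scores))
```

In the sample scores, 22 of 30 is 73% overall, yet Domain 1 and Domain 4 both sit below 70%, which is exactly the case an overall score hides.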

Three Principles for Answering

Domain 1: Data Preparation – 8 Questions

Q1. You need a storage format that allows analytical queries to read only a few columns from a huge dataset while compressing well. Which format?
Answer: Parquet. Columnar storage reads selected columns efficiently and compresses better than row formats like CSV.

Q2. A pipeline ingests clickstream events in real time and must deliver them to S3 with minimal management and light transformation. Which service?
Answer: Amazon Data Firehose. It delivers streaming data to S3 or Redshift with buffering and optional transformation, no consumer code required.

Q3. An analyst wants to clean and profile data visually with no code. Which tool?
Answer: AWS Glue DataBrew. It is the visual, no-code option, unlike Glue jobs or EMR.

Q4. You must keep features identical between training and real-time inference. Which service and which layer serves inference?
Answer: SageMaker Feature Store, online store. The online store gives low-latency reads; the offline store backs batch training.

Q5. You must detect class imbalance before training a credit model. Which tool?
Answer: SageMaker Clarify. It measures pre-training bias metrics on the dataset.
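To make the idea of a pre-training imbalance metric concrete, here is a minimal sketch of the class imbalance (CI) formula as AWS documents it for Clarify, CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the member counts of the two groups being compared. The function name and the sample counts are illustrative assumptions; this does not call Clarify itself.

```python
def class_imbalance(n_a, n_d):
    """Compute CI = (n_a - n_d) / (n_a + n_d).

    The result ranges from -1 to 1. A value of 0 means the two groups are
    the same size; values near +1 or -1 mean one group dominates.
    """
    total = n_a + n_d
    if total == 0:
        raise ValueError("both groups are empty")
    return (n_a - n_d) / total


if __name__ == "__main__":
    # Hypothetical credit dataset: 900 rows in one group, 100 in the other.
    print(class_imbalance(900, 100))
    # A balanced dataset.
    print(class_imbalance(500, 500))
```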

Q6. Sensitive medical data must be labeled by trusted staff only. Which Ground Truth workforce?
Answer: Private workforce. Your own employees label sensitive data rather than the public Mechanical Turk workforce.

Q7. A streaming workload needs schema evolution and record-level binary serialization. Which format?
Answer: Avro. Row-based binary with strong schema-evolution support fits record-oriented streaming.

Q8. Categorical features have no ordinal relationship and feed a linear model. Which encoding?
Answer: One-hot encoding. It avoids implying false order, unlike label encoding, which a linear model would misread as magnitude.
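The difference between the two encodings is easy to see in plain Python. The sketch below is a hand-rolled illustration with hypothetical helper names, not the API of any particular library.

```python
def label_encode(values):
    """Map each category to an integer; this implies an order that may not exist."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values]


def one_hot(values):
    """Return (categories, rows); each row holds a single 1 and no implied order."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


if __name__ == "__main__":
    colors = ["red", "green", "blue", "green"]
    print(label_encode(colors))  # integers a linear model would read as magnitudes
    print(one_hot(colors))       # independent indicator columns
```

With label encoding, "red" ends up numerically twice "green", a relationship a linear model would treat as real. One-hot columns carry no such ordering.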

Domain 2: Model Development – 8 Questions

Q9. A team needs sentiment analysis quickly without building a model. Which choice?
Answer: Amazon Comprehend. A managed NLP service beats custom modeling on time and cost when it fits.

Q10. Which built-in algorithm is the first pick for tabular classification and regression?
Answer: XGBoost. Gradient-boosted trees are the strong default for structured data.

Q11. You need unsupervised anomaly detection on streaming metrics. Which algorithm?
Answer: Random Cut Forest. It scores outliers without labels.

Q12. Training accuracy is high but validation accuracy is poor. Name two fixes.
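Answer: Any two of regularization (L1/L2), dropout, early stopping, or more training data. The symptom is overfitting: the model has memorized the training set instead of learning patterns that generalize.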
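Of these fixes, early stopping is the simplest to sketch. The example below assumes you already have a list of per-epoch validation losses; the function name and the patience value are illustrative choices, not taken from a specific framework.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a list of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs. Epoch indices are zero-based.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through every epoch.
    return len(val_losses) - 1, best_epoch


if __name__ == "__main__":
    # Validation loss falls, then rises as the model starts to overfit.
    losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.8]
    print(early_stopping_epoch(losses))
```

In practice you would restore the weights saved at the best epoch rather than keep the ones from the stopping epoch.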

Q13. You want hyperparameter tuning that converges in fewer trials than grid search. Which strategy?
Answer: Bayesian optimization in SageMaker Automatic Model Tuning (AMT). It learns from past trials to search efficiently.

Q14. Fine-tuning a model on new data erased its earlier capability. What is this called?
Answer: Catastrophic forgetting. Mitigate with lower learning rates, replay of old data, or parameter-efficient fine-tuning.

Q15. A fraud model on highly imbalanced data must catch as many frauds as possible. Which metric matters most?
Answer: Recall. Missing fraud (false negatives) is costly, so recall outweighs raw accuracy.
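A quick worked example shows why accuracy misleads on imbalanced data. The counts below are made up for illustration: 1,000 transactions with 10 frauds, scored by a model that labels everything as legitimate.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Share of actual positives (frauds) that the model caught."""
    positives = tp + fn
    return tp / positives if positives else 0.0


if __name__ == "__main__":
    # Hypothetical model that predicts "legitimate" for every transaction.
    tp, fp, tn, fn = 0, 0, 990, 10
    print(f"accuracy = {accuracy(tp, fp, tn, fn):.2f}")
    print(f"recall   = {recall(tp, fn):.2f}")
```

The do-nothing model scores 99% accuracy while catching no fraud at all, which is why recall is the metric to watch here.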

Q16. You need SHAP-based explanations of individual predictions. Which tool?
Answer: SageMaker Clarify. It provides feature attributions for explainability.

Domain 3: Deployment and Orchestration – 7 Questions

Q17. Traffic is intermittent and unpredictable, and you do not want to pay when idle. Which endpoint?
Answer: Serverless inference. It scales to zero and bills only on use.

Q18. A request carries a 500 MB payload and may take 20 minutes to process. Which endpoint?
Answer: Asynchronous inference. It handles large payloads (up to 1 GB) and long processing with queuing.

Q19. You must score a large, already-collected dataset once, with no persistent endpoint. Which option?
Answer: Batch transform. It runs bulk inference and tears down afterward.

Q20. You host dozens of similar models and want to cut cost by sharing infrastructure. Which approach?
Answer: Multi-model endpoint. It serves many models behind one endpoint, loading on demand.
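The reasoning behind Q17 to Q20 can be condensed into a small decision helper. This is a study aid under stated assumptions: the flag names are hypothetical, the order in which the conditions are checked is my own simplification, and the real-time endpoint fallback is the standard SageMaker option not covered by these four questions.

```python
def choose_inference_option(*, offline_batch=False, large_or_slow=False,
                            many_similar_models=False, intermittent=False):
    """Map a workload description to the inference option from Q17-Q20."""
    if offline_batch:
        # Dataset already collected, scored once, no persistent endpoint.
        return "batch transform"
    if large_or_slow:
        # Large payloads or long-running requests that need queuing.
        return "asynchronous inference"
    if many_similar_models:
        # Many models sharing one endpoint's infrastructure.
        return "multi-model endpoint"
    if intermittent:
        # Unpredictable traffic and no wish to pay while idle.
        return "serverless inference"
    # Assumed fallback for steady, low-latency traffic.
    return "real-time endpoint"


if __name__ == "__main__":
    print(choose_inference_option(intermittent=True))
    print(choose_inference_option(large_or_slow=True))
```

Real exam questions often combine several of these signals, so read for the one constraint that rules the other options out.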

Q21. A team wants to define infrastructure using Python with loops and conditions. Which IaC tool?
Answer: AWS CDK. It expresses infrastructure in a programming language, unlike declarative CloudFormation.

Q22. You want to release a new model to 10% of traffic first and roll back fast if metrics drop. Which strategy?
Answer: Canary deployment. A small slice tests in production before full rollout.
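The mechanics of a canary can be illustrated with a generic sketch: send a fixed share of requests to the new model and roll back if its error rate is clearly worse. This is not how SageMaker implements traffic shifting internally; the function names, the hash-based routing, and the tolerance value are all assumptions made for illustration.

```python
import hashlib


def route(request_id, canary_percent=10):
    """Deterministically send about canary_percent% of requests to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


def should_roll_back(stable_error_rate, canary_error_rate, tolerance=0.02):
    """Roll back when the canary's error rate exceeds stable by more than tolerance."""
    return canary_error_rate > stable_error_rate + tolerance


if __name__ == "__main__":
    ids = [f"req-{i}" for i in range(10000)]
    share = sum(route(r) == "canary" for r in ids) / len(ids)
    print(f"canary share: {share:.1%}")
    print("roll back:", should_roll_back(0.01, 0.05))
```

Hashing the request ID keeps routing stable, so the same request always lands on the same variant while the canary is being evaluated.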

Q23. Which service chain automates build, test, and deploy from a code commit?
Answer: CodePipeline with CodeBuild and CodeDeploy. Pair with SageMaker Pipelines for ML-specific steps and retraining.

Domain 4: Monitoring, Maintenance, and Security – 7 Questions

Q24. Input data statistics have shifted from the training baseline. Which Model Monitor type detects this?
Answer: Data quality monitoring. It compares live input against the training baseline.
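The underlying idea, comparing live input statistics against a training baseline, can be shown with a deliberately simplified check. Model Monitor computes its own baseline statistics and constraints; the mean-shift test, the function names, and the threshold of 3 standard deviations below are stand-in assumptions, not its actual method.

```python
import statistics


def mean_shift_in_stds(baseline, live):
    """Distance between live and baseline means, in baseline standard deviations."""
    base_mean = statistics.fmean(baseline)
    base_std = statistics.pstdev(baseline)
    if base_std == 0:
        raise ValueError("baseline has zero variance")
    return abs(statistics.fmean(live) - base_mean) / base_std


def has_drifted(baseline, live, threshold=3.0):
    """Flag drift when the live mean moves more than `threshold` stds away."""
    return mean_shift_in_stds(baseline, live) > threshold


if __name__ == "__main__":
    training_feature = [10, 12, 11, 9, 13, 10, 12, 11]  # made-up baseline values
    print(has_drifted(training_feature, [11, 10, 12, 11]))  # similar distribution
    print(has_drifted(training_feature, [20, 21, 19, 22]))  # shifted distribution
```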

Q25. You need to know who deleted a SageMaker endpoint and when. Which service?
Answer: AWS CloudTrail. It records API activity for audit, unlike CloudWatch metrics.

Q26. You want to find latency bottlenecks across a distributed inference pipeline. Which service?
Answer: AWS X-Ray. It traces requests across components.

Q27. Training jobs are fault-tolerant and you want the lowest compute cost. Which purchase option?
Answer: Spot Instances. Interruptible capacity at a large discount suits checkpointed training.
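Spot only pays off if a job can resume after an interruption. The sketch below simulates that pattern with a JSON file standing in for a checkpoint; the function name, the file format, and the interrupt_at parameter are hypothetical and exist only to demonstrate save-and-resume.

```python
import json
import os
import tempfile


def train(total_steps, checkpoint_path, interrupt_at=None):
    """Run a fake training loop that resumes from the last saved step."""
    step = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            step = json.load(f)["step"]  # resume where the last run stopped
    while step < total_steps:
        if interrupt_at is not None and step == interrupt_at:
            return step  # simulated Spot interruption
        step += 1  # one unit of "training"
        with open(checkpoint_path, "w") as f:
            json.dump({"step": step}, f)  # checkpoint after every step
    return step


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.json")
        print("stopped at step", train(10, path, interrupt_at=4))
        print("finished at step", train(10, path))
```

The second call picks up at step 4 instead of starting over, so only the work since the last checkpoint is at risk when capacity is reclaimed.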

Q28. A SageMaker job must reach S3 without traversing the public internet. What do you use?
Answer: VPC endpoint for S3 (a gateway endpoint, or an interface endpoint powered by PrivateLink). It keeps traffic on the AWS network from a private subnet.

Q29. A role should access only one S3 bucket and nothing else. Which principle and tool?
Answer: Least privilege, enforced with an IAM policy scoped to that bucket (SageMaker Role Manager can help build scoped roles). Grant only the actions and resources the role needs.
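A scoped policy of this kind might look like the sketch below, built as a Python dict and printed as JSON. The bucket name is a placeholder and the three S3 actions are an assumed minimum for a read/write workload; adjust both to what the role really needs.

```python
import json


def single_bucket_policy(bucket_name):
    """Build an IAM policy document limited to one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListOneBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object reads and writes apply to keys inside the bucket.
                "Sid": "ReadWriteObjectsInOneBucket",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-ml-training-data" is a placeholder bucket name.
    print(json.dumps(single_bucket_policy("example-ml-training-data"), indent=2))
```

Note the absence of wildcards in both the actions and the bucket name: anything not listed is implicitly denied.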

Q30. You want alerts before monthly spend crosses a threshold. Which tool?
Answer: AWS Budgets. It triggers alerts at defined cost thresholds, while Cost Explorer analyzes and forecasts.

Identifying Weak Spots and Finishing Strong

Tally your misses by domain. Weakness in Domain 1 or Domain 4 is common and worth targeted review, since together they account for more than half the exam. Re-read the relevant domain guide, then redo the questions you missed until the reasoning, not just the answer, feels automatic.

Conclusion: What Lies Beyond the 30 Questions

Thirty questions cannot cover everything, but they reveal where you stand. Pair this set with full-length timed practice exams and hands-on SageMaker work, and you will turn pattern recognition into the confident, fast decisions the real MLA-C01 rewards.
