Tags: concept, Machine Learning, Data, Analytics, Platform

Machine Learning (ML)

Machine learning extracts patterns and makes predictions from data using statistical models and algorithms.

Established
High

Classification

  • High
  • Technical
  • Architectural
  • Intermediate

Technical context

  • Data platform and ETL pipelines
  • Model serving infrastructure (e.g., KFServing)
  • Monitoring and observability tooling

Principles & goals

  • Data quality over model complexity
  • Ensure explainability and traceability
  • Iterative experimentation and validation
Build
Enterprise, Domain, Team

Use cases & scenarios

Compromises

  • Bias and discrimination from unsuitable training data
  • Overfitting to training data
  • Improper use without monitoring leads to wrong decisions

  • Version data, features and models
  • Continuous monitoring for drift and performance degradation
  • Transparent documentation of data sources and decisions
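
Continuous monitoring for drift can be illustrated with a small sketch. The Population Stability Index (PSI) used here is one common drift statistic, chosen for illustration; the source does not name a specific method. The function name, the bin count, and the 0.25 alert threshold (a widely used rule of thumb) are assumptions, not part of the original entry.

```python
import math

def population_stability_index(reference, current, bins=10, eps=1e-6):
    """Compare the distribution of one feature in two samples.

    Bins are equal-width over the range of the reference sample
    (an assumption; quantile bins are another common choice).
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

# Hypothetical data: training-time feature values vs. shifted production values.
reference = [i / 100 for i in range(100)]
shifted = [v + 0.3 for v in reference]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)

DRIFT_THRESHOLD = 0.25  # assumed rule-of-thumb alert level
drift_detected = psi_shifted > DRIFT_THRESHOLD
```

In practice a check like this would run on a schedule for each monitored feature, next to model-quality metrics, so that both data drift and performance degradation are visible.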

I/O & resources

  • Raw data and labels for training sets
  • Feature definitions and domain knowledge
  • Infrastructure for training and deployment

  • Trained models and validation reports
  • Metrics for model quality
  • Production-ready inference endpoints

Description

Machine learning is a subfield of AI that uses statistical models and algorithms to discover patterns in data and make predictions. It enables automated decision support and iterative model improvement through training on labeled or unlabeled datasets. Typical applications include forecasting, personalization, and anomaly detection.

  • Automated pattern recognition reduces manual effort
  • Improved predictive accuracy over heuristic rules
  • Scalability for large datasets

  • Dependence on availability and quality of training data
  • Limited explainability of complex models
  • Maintenance effort for data and model drift

  • Accuracy

    Proportion of correctly predicted cases among all cases.

  • F1 score

    Harmonic mean of precision and recall; more informative than accuracy when classes are imbalanced.

  • Model latency

    Time between input and prediction during production inference.
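
The three metrics above can be computed with a few lines of plain Python. This is a minimal sketch for a binary task; the helper names `classification_metrics` and `timed_predict` and the sample labels are hypothetical and only serve the illustration.

```python
import time

def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def timed_predict(predict, x):
    """Run one prediction and return (result, latency in seconds)."""
    start = time.perf_counter()
    result = predict(x)
    return result, time.perf_counter() - start

# Imbalanced toy labels: two positives among ten cases, one of them missed.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
# accuracy is 0.9, while F1 is 2/3 because half of the positives were missed
```

The toy labels show why F1 is reported alongside accuracy: a model that misses half of the rare positive class still reaches 90% accuracy.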

Predictive models in wind power

Use of ML to predict performance drops and maintenance needs for turbines.

Personalized recommendations in retail

Recommendation systems improve conversion rates using user signals and browsing data.

Anomaly detection in finance

Use of ML algorithms to detect unusual transaction patterns and fraud attempts.
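
As a rough illustration of flagging unusual transaction amounts, the sketch below uses a robust modified z-score based on the median and the median absolute deviation (MAD). This is a simple statistical baseline chosen for brevity, not the method of any particular fraud system; the function name, the sample amounts, and the 3.5 cutoff (a commonly cited default) are assumptions.

```python
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]

# Hypothetical transaction amounts with one obvious outlier at index 6.
amounts = [42.0, 38.5, 45.1, 40.3, 39.9, 41.7, 980.0, 43.2]
flagged = flag_anomalies(amounts)  # [6]
```

Median and MAD are used instead of mean and standard deviation because a single extreme value would otherwise inflate the spread and hide itself.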

  1. Define problem and target metric
  2. Data preparation, exploratory analysis and feature engineering
  3. Model selection, training and cross-validation
  4. Deployment, monitoring and model maintenance
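
The training and cross-validation step can be sketched without external libraries. The example below is a minimal illustration, assuming synthetic two-class data and a nearest-centroid classifier as a stand-in for a real model; all function names are hypothetical, and a production workflow would typically rely on an established ML library instead.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Train: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Predict the class whose centroid is closest (squared Euclidean)."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def cross_validate(X, y, k=5):
    """Return the held-out accuracy of each of the k folds."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return scores

# Synthetic, well-separated two-class data (an assumption for the demo).
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
     + [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(50)])
y = [0] * 50 + [1] * 50

scores = cross_validate(X, y, k=5)
mean_accuracy = sum(scores) / len(scores)
```

Each fold is held out exactly once, so the mean of `scores` estimates performance on unseen data rather than on the data the model was fitted to, which is the main guard against the overfitting risk listed earlier.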

⚠️ Technical debt & bottlenecks

  • Hard-coded features in production pipelines
  • Insufficient tests for models and data changes
  • Monolithic infrastructure lacking reproducibility

  • Data quality and availability
  • Compute resources and cost
  • Domain and ML expertise in team

  • Using historically biased data for credit decisions
  • Automatically blocking users based on unvalidated models
  • Deploying to production without monitoring
  • Underestimating effort for data preparation
  • Ignoring hidden bias in training data
  • Missing governance for model lifecycle

  • Statistics and machine learning
  • Data engineering and feature engineering
  • Model validation, metrics and monitoring

  • Scalable data pipelines for continuous training
  • Robust monitoring for model and data drift
  • Ensuring traceability and governance

  • Legal data protection requirements
  • Limited amount of labeled training data
  • Infrastructure capacity for training and inference