Cross-Industry Standard Process for Data Mining (CRISP‑DM)
CRISP‑DM is an established, phase-based process model for data mining projects that structures work, outputs, and responsibilities.
Classification
- Complexity: Medium
- Impact area: Business
- Decision type: Organizational
- Organizational maturity: Intermediate
Compromises
- Overspecifying phases leads to documentation overhead without value.
- Ignoring data governance leads to compliance and quality issues.
- Lack of stakeholder involvement can result in wrong problem definitions.
Recommendations
- Engage business stakeholders early to validate objectives.
- Automate data quality checks in every iteration (see the sketch below).
- Ensure clear handovers and documentation between phases.
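The data-quality recommendation above can be automated with a small check that runs in every iteration. The following is a minimal sketch using pandas; the column names (claim_amount, claim_date, policy_id) are hypothetical placeholders, not part of CRISP‑DM.

```python
# Minimal sketch of an automated data quality check, assuming a pandas
# DataFrame with hypothetical columns "claim_amount" and "claim_date".
import pandas as pd

def quality_report(df: pd.DataFrame, required_columns: list[str]) -> dict:
    """Return a small dictionary of basic data quality indicators."""
    return {
        "row_count": len(df),
        "missing_columns": [c for c in required_columns if c not in df.columns],
        "null_share": df.isna().mean().to_dict(),   # share of missing values per column
        "duplicate_rows": int(df.duplicated().sum()),  # exact duplicate records
    }

if __name__ == "__main__":
    df = pd.DataFrame({
        "claim_amount": [120.0, None, 80.5],
        "claim_date": ["2023-01-02", "2023-01-05", None],
    })
    print(quality_report(df, required_columns=["claim_amount", "claim_date", "policy_id"]))
```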
I/O & resources
Inputs
- Business goals and acceptance criteria
- Raw data from relevant sources
- Domain knowledge and expert input
Outputs
- Documented project objective and success criteria
- Prepared training and test datasets
- Validated model and rollout plan
Description
CRISP‑DM is a cyclic, industry-neutral process model for data mining projects. It defines six phases—business understanding, data understanding, data preparation, modeling, evaluation and deployment—to organize work, roles and deliverables. Teams use it to align stakeholders, reduce risk and iterate from business goals to operational models.
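CRISP‑DM itself prescribes no code, but the six phases and their cyclic order can be made explicit for discussion, for example as a simple enumeration. The sketch below is purely illustrative; the class name CrispDmPhase is an invented label, not part of the standard.

```python
# Illustrative sketch only: CRISP-DM defines phases, not code. The enum
# merely lists the six phases in their typical forward order.
from enum import Enum

class CrispDmPhase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

# In practice teams iterate back to earlier phases (e.g., from EVALUATION
# to BUSINESS_UNDERSTANDING) as new insights emerge.
for phase in CrispDmPhase:
    print(phase.name)
```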
✔ Benefits
- Standardized structure reduces planning and communication effort.
- Promotes reproducibility and documented handovers between teams.
- Helps identify and mitigate business risks early.
✖ Limitations
- Not prescriptive for technical implementations or tools.
- May be perceived as too sequential in highly agile environments.
- Does not detail modern model operationalization (MLOps).
Metrics
- Model accuracy: measurement of predictive quality using appropriate metrics (e.g., AUC, F1); see the sketch below.
- Time to value: time from project start to measurable business usage.
- Data readiness rate: share of required data sources available at sufficient quality.
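As a rough illustration of how these metrics could be computed in practice, the sketch below uses scikit-learn for AUC and F1 and a plain ratio for the data readiness rate; all values and source names are placeholders.

```python
# Hedged sketch: computing the predictive-quality metrics named above with
# scikit-learn; y_true/y_score are placeholder arrays, not real data.
from sklearn.metrics import roc_auc_score, f1_score

y_true = [0, 1, 1, 0, 1]                # ground-truth labels
y_score = [0.2, 0.8, 0.6, 0.4, 0.9]     # predicted probabilities for the positive class
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # hard predictions at a 0.5 threshold

print("AUC:", roc_auc_score(y_true, y_score))
print("F1 :", f1_score(y_true, y_pred))

# Data readiness rate as described above: share of required sources available
# at sufficient quality (hypothetical source list).
required_sources = {"claims": True, "policies": True, "payments": False}
print("Data readiness rate:", sum(required_sources.values()) / len(required_sources))
```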
Examples & implementations
- Insurance claims classification: use of CRISP‑DM to structure a project for automated classification of insurance claims (see the sketch below).
- Retail demand forecasting: iterative phases for cleaning historical sales data, modeling and rollout into store planning.
- Telecom customer segmentation: segmentation campaign using CRISP‑DM to develop features and run targeted campaign tests.
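For the insurance claims example, the modeling phase might look roughly like the following sketch. The data is synthetic and the choice of a random forest is an assumption made for illustration, not a recommendation from CRISP‑DM.

```python
# Minimal sketch of the modeling step for a claims-classification project,
# assuming hypothetical tabular features; not a production pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))                          # placeholder feature matrix
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # placeholder claim labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```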
Implementation steps
1. Kickoff: clarify business goals, stakeholders and success criteria.
2. Data inventory: identify sources and run initial quality checks.
3. Data preparation: handle missing values, consistency and feature engineering.
4. Modeling: select, train and compare models.
5. Evaluation: test models against business criteria and robustness (steps 3–5 are sketched after this list).
6. Deployment & monitoring: plan rollout, set up monitoring and feedback.
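Steps 3–5 could be combined into a small, reproducible pipeline along these lines. The dataset is synthetic, and the acceptance criterion of 0.75 cross-validated AUC is an invented example of a success criterion agreed at kickoff.

```python
# Sketch of steps 3-5 (data preparation, modeling, evaluation) on a synthetic
# dataset; the 0.75 AUC acceptance threshold is a hypothetical business criterion.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
X[rng.random(X.shape) < 0.1] = np.nan          # inject missing values to impute
y = (rng.random(400) < 0.5).astype(int)        # placeholder target

candidates = {
    "logistic_regression": make_pipeline(SimpleImputer(strategy="median"),
                                         LogisticRegression(max_iter=1000)),
    "gradient_boosting": make_pipeline(SimpleImputer(strategy="median"),
                                       GradientBoostingClassifier()),
}

ACCEPTANCE_AUC = 0.75  # hypothetical success criterion agreed at kickoff
for name, pipeline in candidates.items():
    auc = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc").mean()
    status = "meets" if auc >= ACCEPTANCE_AUC else "below"
    print(f"{name}: mean AUC={auc:.3f} ({status} acceptance criterion)")
```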
⚠️ Technical debt & bottlenecks
Technical debt
- Ad hoc ETL scripts without reusability or tests.
- Insufficient data cataloging and metadata maintenance.
- No monitoring for model performance after deployment (see the sketch below).
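A lightweight way to address the missing-monitoring debt is a periodic check of live performance against the score recorded at deployment. The sketch below assumes labeled outcomes arrive in batches; the baseline F1 and degradation threshold are invented values.

```python
# Minimal sketch of post-deployment performance monitoring; baseline and
# threshold are hypothetical values recorded at deployment time.
from sklearn.metrics import f1_score

BASELINE_F1 = 0.82       # hypothetical score at deployment
MAX_DEGRADATION = 0.05   # alert if live F1 drops more than 5 points below baseline

def check_model_health(y_true, y_pred) -> bool:
    """Return True if the live F1 score is still within the accepted range."""
    live_f1 = f1_score(y_true, y_pred)
    if live_f1 < BASELINE_F1 - MAX_DEGRADATION:
        print(f"ALERT: live F1={live_f1:.2f} fell below baseline {BASELINE_F1:.2f}")
        return False
    print(f"OK: live F1={live_f1:.2f}")
    return True

# Example call on a placeholder batch of recent predictions and ground truth.
check_model_health([1, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1])
```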
Misuse examples
- Phase rigidity: completing each phase strictly before moving on without iteration.
- Focusing only on technical metrics while ignoring business impact.
- Starting data projects without governance and later facing compliance issues.
Typical traps
- Committing to a model too early without sufficient validation.
- Unclear success criteria lead to conflicting objectives.
- Neglecting production requirements during prototyping.
Architectural drivers
Constraints
- Limited access to historical or sensitive data.
- Time constraints imposed by operational stakeholders.
- Limited resources for data preparation and infrastructure.