Catalog
concept · Data · Security · Architecture · Governance

Data Integrity

Principle ensuring accuracy, consistency, and trustworthiness of data across its lifecycle.

Established · Medium

Classification

  • Medium
  • Technical
  • Architectural
  • Intermediate

Technical context

  • Relational and NoSQL databases
  • Backup and archiving systems
  • Monitoring and SIEM platforms

Principles & goals

  • Enforce uniqueness of data states
  • Verification along the data flow chain
  • Separation of responsibilities for correctness and access
Phase: Run
Scope: Enterprise, Domain, Team

Risks & mitigations

Risks:

  • False assumptions about integrity guarantees can lead to data loss
  • Lack of end-to-end verification in distributed systems
  • Excessive complexity from redundant integrity mechanisms

Mitigations:

  • Enforce the principle of least privilege and auditing
  • Use checksums and signatures for critical data
  • Versioning and transaction logs for traceability
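The checksum-and-signature mitigation can be sketched with Python's standard `hmac` module. The key and record layout below are placeholders; a real deployment would obtain the key from a secrets manager rather than hard-coding it.

```python
import hashlib
import hmac

# Placeholder key for illustration only; in practice it lives in a KMS/vault.
SECRET_KEY = b"demo-key"

def sign(record: bytes) -> str:
    """Attach a keyed MAC so any silent modification of the record is detectable."""
    return hmac.new(SECRET_KEY, record, hashlib.sha256).hexdigest()

def verify(record: bytes, signature: str) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(record), signature)

record = b'{"account": 42, "balance": 100}'
sig = sign(record)

print(verify(record, sig))                              # True: record untouched
print(verify(b'{"account": 42, "balance": 999}', sig))  # False: record tampered
```

Unlike a plain checksum, a keyed MAC also protects against an attacker who can rewrite both the data and its hash.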

I/O & resources

Inputs:

  • Data model and schema definition
  • Audit logs and change records
  • Backup strategies and checksums

Outputs:

  • Integrity reports and alerts
  • Corrected and verified data sets
  • Audit trails for compliance

Description

Data integrity denotes the accuracy, consistency, and reliability of data throughout its lifecycle. It includes safeguards against accidental or malicious alteration and mechanisms for detection and correction of errors. Maintaining data integrity is essential for trust, regulatory compliance, and sound decision-making across systems and business processes.

Benefits:

  • Increased trust in decision inputs
  • Reduction of errors through early detection
  • Support for compliance and audit requirements

Drawbacks:

  • Additional storage and compute overhead for verification mechanisms
  • Increased implementation effort in heterogeneous environments
  • Not all error classes can be detected or corrected automatically

Metrics

  • Integrity check rate

    Share of records periodically verified for integrity.

  • Detection time

    Time between occurrence of an integrity violation and its detection.

  • Recovery duration

    Time to fully restore a consistent state after an incident.
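The three metrics above can be computed from an incident log, for example as follows. The log entries and record counts are made up for illustration, and measuring recovery from detection to resolution is an assumption, not a prescription.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: when a violation occurred, was detected, was resolved.
incidents = [
    {"occurred": datetime(2024, 1, 1, 10, 0),
     "detected": datetime(2024, 1, 1, 10, 45),
     "resolved": datetime(2024, 1, 1, 12, 0)},
    {"occurred": datetime(2024, 1, 2, 9, 0),
     "detected": datetime(2024, 1, 2, 9, 5),
     "resolved": datetime(2024, 1, 2, 9, 30)},
]

records_total = 10_000
records_verified = 9_200

# Integrity check rate: share of records periodically verified.
check_rate = records_verified / records_total

# Mean detection time: occurrence of the violation until its detection.
avg_detection = sum((i["detected"] - i["occurred"] for i in incidents),
                    timedelta()) / len(incidents)

# Mean recovery duration: detection until a consistent state is restored.
avg_recovery = sum((i["resolved"] - i["detected"] for i in incidents),
                   timedelta()) / len(incidents)

print(f"check rate: {check_rate:.0%}")            # 92%
print(f"mean detection time: {avg_detection}")    # 0:25:00
print(f"mean recovery duration: {avg_recovery}")  # 0:50:00
```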

Use cases & scenarios

Database constraints to prevent inconsistent states

Use of NOT NULL, FOREIGN KEY and UNIQUE to enforce structural integrity.
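This use case can be sketched with SQLite's built-in constraint checks; the table and column names are illustrative, not taken from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE  -- no missing and no duplicate addresses
    )
""")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)  -- referential integrity
    )
""")
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")

# Each statement below would create an inconsistent state and is rejected
# by the database before it can be stored.
rejected = []
for stmt in (
    "INSERT INTO customers (id, email) VALUES (2, NULL)",             # NOT NULL
    "INSERT INTO customers (id, email) VALUES (3, 'a@example.com')",  # UNIQUE
    "INSERT INTO orders (id, customer_id) VALUES (1, 999)",           # FOREIGN KEY
):
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

print(rejected)  # three violations, none of them persisted
```

Declaring the rules in the schema keeps enforcement in one place instead of scattering it across every application that writes to the tables.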

Checksums in distributed file system

Regular hash comparisons to detect bit-rot and corruption.
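A minimal sketch of hash-based corruption detection, assuming a manifest of expected SHA-256 digests; the file name and manifest format are placeholders.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large objects need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Sample file standing in for a block in a distributed store.
with open("data.bin", "wb") as f:
    f.write(b"payload-v1")

# On ingest: record the expected digest alongside the data.
manifest = {"data.bin": sha256_of("data.bin")}

# Periodic verification run: recompute and compare.
print(sha256_of("data.bin") == manifest["data.bin"])  # True while intact

# Simulate silent corruption (bit-rot) and re-check.
with open("data.bin", "ab") as f:
    f.write(b"\x00")
print(sha256_of("data.bin") == manifest["data.bin"])  # False: corruption detected
```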

Provenance tracking for data pipelines

Tracking source, transformations and authorship for audit purposes.
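Provenance tracking could look like the following sketch; the record fields, step names, and sample data are hypothetical, not a fixed standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """One entry per pipeline step: who transformed which input, and when."""
    step: str
    author: str
    source: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

lineage: list[ProvenanceRecord] = []

def run_step(name, author, source, transform, data):
    """Apply a transformation and append its provenance to the lineage."""
    lineage.append(ProvenanceRecord(step=name, author=author, source=source))
    return transform(data)

raw = [" Alice ", "BOB", "carol"]
cleaned = run_step("trim", "etl-bot", "crm-export",
                   lambda xs: [x.strip() for x in xs], raw)
normalized = run_step("lowercase", "etl-bot", "step:trim",
                      lambda xs: [x.lower() for x in xs], cleaned)

for rec in lineage:
    print(rec.step, rec.author, rec.source)
```

An auditor can replay the lineage to see exactly how `normalized` was derived from the original export.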

Implementation steps

  1. Analyze critical data paths and requirements
  2. Define consistency and verification strategies
  3. Implement verification mechanisms and monitoring
  4. Regularly test recovery procedures

⚠️ Technical debt & bottlenecks

  • Missing checks in legacy data pipelines
  • Incomplete audit logs lacking integrity information
  • Ad-hoc correction scripts instead of stable processes

Bottlenecks:

  • I/O performance
  • Data migrations
  • Distributed consistency

Common pitfalls:

  • Using only local checksums in a distributed system
  • Applying schema changes without a migration plan
  • Performing integrity checks only periodically, never in real time
  • Assuming database ACID guarantees solve all integrity problems
  • Ignoring metadata and provenance
  • Insufficient testing of recovery processes

Required knowledge:

  • Database design and transaction models
  • Security and cryptography fundamentals
  • Operational knowledge of backup and recovery

Related concerns:

  • Traceability and auditability
  • Availability and performance requirements
  • Security and compliance requirements

Constraints:

  • Limited compute and storage resources
  • Regulatory retention periods
  • Heterogeneous system landscape with differing guarantees