concept#Data#Analytics#Machine Learning

Feature Store

Method for centrally storing, versioning and serving ML features for training and inference.

Emerging
High

Classification

  • High
  • Technical
  • Architectural
  • Intermediate

Technical context

  • Feast
  • Apache Kafka / Pub/Sub
  • Spark / Flink for batch and streaming transformations

Principles & goals

  • Single source of truth for feature definitions
  • Ensure consistency between training and serving
  • Versioning and traceability of features
Build
Domain, Team

Use cases & scenarios

Compromises

  • Incorrect feature definitions lead to model degradation
  • Single point of failure if high availability is insufficient
  • Governance and privacy issues with sensitive features

  • Separate feature definitions from implementations
  • Automated validation of training vs. production features
  • Clear ownership and SLAs for feature APIs
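
The automated validation of training vs. production features listed above could look roughly like the following. This is a minimal sketch, not a reference implementation: the function name `compare_feature_rows`, the dictionary-based row format and the tolerance default are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def compare_feature_rows(
    training_row: Dict[str, float],
    serving_row: Dict[str, float],
    rel_tol: float = 1e-6,
) -> List[str]:
    """Return human-readable mismatches between two feature rows.

    Both rows are assumed (hypothetically) to describe the same entity at
    the same point in time, once computed by the training pipeline and
    once by the serving path.
    """
    problems: List[str] = []
    for name in sorted(set(training_row) | set(serving_row)):
        if name not in serving_row:
            problems.append(f"{name}: missing in serving")
        elif name not in training_row:
            problems.append(f"{name}: missing in training")
        elif not math.isclose(
            training_row[name], serving_row[name], rel_tol=rel_tol
        ):
            problems.append(
                f"{name}: training={training_row[name]} "
                f"serving={serving_row[name]}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical feature names, used only to demonstrate the check.
    issues = compare_feature_rows(
        {"avg_order_value": 20.0, "orders_last_7d": 3.0},
        {"avg_order_value": 20.5},
    )
    for issue in issues:
        print(issue)
```

A check like this would typically run in CI or as a scheduled job over a sample of entities, failing the pipeline when the returned list is non-empty.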

I/O & resources

  • Raw data from databases or event streams
  • Feature definitions and transformations
  • Metadata for versioning and governance
  • Versioned feature sets for training
  • Low-latency feature APIs for production
  • Monitoring and quality metrics

Description

A feature store is a centralized system for storing, versioning and serving ML features for training and inference. It unifies batch and real-time features, ensures consistency between training and production data, and improves reuse, governance and traceability in ML pipelines. It reduces engineering overhead and accelerates model delivery.
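
To make the idea of one store serving both training and inference concrete, here is a toy in-memory sketch. It is not how Feast, Tecton or any other product is implemented; the class `InMemoryFeatureStore`, its method names and the example feature are hypothetical. It illustrates two read paths over the same versioned data: a point-in-time lookup (for building training sets without leaking future values) and a latest-value lookup (for online inference).

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int, str]  # (feature name, feature version, entity id)


class InMemoryFeatureStore:
    """Toy store keeping a timestamped history per feature, version, entity."""

    def __init__(self) -> None:
        # Each history is a list of (event_time, value), kept sorted by time.
        self._history: Dict[Key, List[Tuple[float, float]]] = defaultdict(list)

    def write(self, feature: str, version: int, entity_id: str,
              event_time: float, value: float) -> None:
        rows = self._history[(feature, version, entity_id)]
        rows.append((event_time, value))
        rows.sort()

    def get_as_of(self, feature: str, version: int, entity_id: str,
                  as_of: float) -> Optional[float]:
        """Training path: latest value with event_time <= as_of."""
        rows = self._history.get((feature, version, entity_id), [])
        idx = bisect_right(rows, (as_of, float("inf")))
        return rows[idx - 1][1] if idx else None

    def get_online(self, feature: str, version: int,
                   entity_id: str) -> Optional[float]:
        """Serving path: most recent value, regardless of request time."""
        rows = self._history.get((feature, version, entity_id), [])
        return rows[-1][1] if rows else None


if __name__ == "__main__":
    store = InMemoryFeatureStore()
    store.write("avg_order_value", 1, "user_42", event_time=10.0, value=20.0)
    store.write("avg_order_value", 1, "user_42", event_time=20.0, value=35.0)
    print(store.get_as_of("avg_order_value", 1, "user_42", as_of=15.0))
    print(store.get_online("avg_order_value", 1, "user_42"))
```

Because both read paths go through the same keyed, versioned history, a model trained on version 1 of a feature can request exactly that version at serving time. A real system would back the two paths with separate offline and online stores.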

  • Reusability of features across teams
  • Reduction of data inconsistencies between training and production
  • Acceleration of model development and deployment cycles

  • Onboarding effort for infrastructure and governance
  • Increased operational complexity for real-time serving
  • Not all features are suitable for central storage

  • Feature latency

    Measure of time until a feature is available for inference.

  • Reproducibility (train vs. serve)

    Share of models trained and served with identical feature versions.

  • Feature drift rate

    Frequency of deviation between production and training distributions.
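
Two of the metrics above, reproducibility and feature drift rate, can be sketched in a few lines. The record layout (`trained_with`, `served_with`), the function names and the threshold are assumptions for illustration. The drift test used here is a simple standardized mean shift, chosen for brevity; production setups often use other statistics such as a population stability index or a Kolmogorov-Smirnov test.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def reproducibility_share(models: List[Dict[str, str]]) -> float:
    """Share of models whose training and serving feature versions match."""
    if not models:
        return 0.0
    matching = sum(1 for m in models if m["trained_with"] == m["served_with"])
    return matching / len(models)


def is_drifted(training: Sequence[float], production: Sequence[float],
               threshold: float = 0.5) -> bool:
    """Flag drift when the production mean moves more than `threshold`
    training standard deviations away from the training mean."""
    spread = pstdev(training)
    if spread == 0:
        return mean(production) != mean(training)
    return abs(mean(production) - mean(training)) / spread > threshold


def drift_rate(training: Sequence[float],
               production_windows: List[Sequence[float]],
               threshold: float = 0.5) -> float:
    """Share of production windows flagged as drifted."""
    if not production_windows:
        return 0.0
    flagged = sum(
        is_drifted(training, window, threshold)
        for window in production_windows
    )
    return flagged / len(production_windows)


if __name__ == "__main__":
    models = [
        {"trained_with": "v1", "served_with": "v1"},
        {"trained_with": "v1", "served_with": "v2"},
    ]
    print(reproducibility_share(models))

    training_values = [1.0, 2.0, 3.0, 4.0, 5.0]
    windows = [[1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 6.0, 7.0]]
    print(drift_rate(training_values, windows))
```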

Feast (open-source) as reference implementation

Feast serves as an example of a production-ready feature store and demonstrates common architecture and interface patterns.

Tecton for managed feature store

Tecton illustrates a commercial managed feature store offering with governance and service levels.

Hybrid architecture with Kafka and Spark

Example of a hybrid implementation combining streaming and batch pipelines with a central serving layer.

  1. Requirements analysis and definition of feature schemas
  2. Select or build a feature store implementation
  3. Implement batch and streaming pipelines
  4. Introduce versioning, tests and CI/CD for features
  5. Rollout, monitoring and training of user teams
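
The schema-definition and versioning steps above could start from something as small as the sketch below. The names `FeatureDefinition` and `FeatureRegistry` and all fields are hypothetical and do not correspond to the API of Feast or any other product. The point shown is that a published version is immutable: changing a definition requires a new version number.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FeatureDefinition:
    """Declarative description of a feature, separate from its pipeline code."""
    name: str
    version: int
    entity: str
    dtype: str
    description: str = ""


class FeatureRegistry:
    """Keeps every registered (name, version) pair; versions are immutable."""

    def __init__(self) -> None:
        self._definitions: Dict[Tuple[str, int], FeatureDefinition] = {}

    def register(self, definition: FeatureDefinition) -> None:
        key = (definition.name, definition.version)
        existing = self._definitions.get(key)
        if existing is not None and existing != definition:
            raise ValueError(
                f"{definition.name} v{definition.version} is already "
                "registered with a different definition; bump the version"
            )
        self._definitions[key] = definition

    def latest(self, name: str) -> FeatureDefinition:
        candidates = [
            d for (n, _), d in self._definitions.items() if n == name
        ]
        if not candidates:
            raise KeyError(name)
        return max(candidates, key=lambda d: d.version)


if __name__ == "__main__":
    registry = FeatureRegistry()
    registry.register(FeatureDefinition("avg_order_value", 1, "user", "float"))
    registry.register(FeatureDefinition("avg_order_value", 2, "user", "double"))
    print(registry.latest("avg_order_value"))
```

A CI job could load all definition files into such a registry and fail the build on the `ValueError`, which gives a basic guard against silently changing a feature that models already depend on.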

⚠️ Technical debt & bottlenecks

  • Ad-hoc transformation scripts without tests or monitoring
  • Unversioned feature definition files
  • Lack of automation for backfills and migrations

  • Feature computation latency
  • Consistency validation
  • Storage and cost optimization

  • Storing raw transient session data as long-term features
  • Centrally storing sensitive PII features without masking
  • Excessive normalization that slows real-time serving
  • Underestimating latency requirements for real-time serving
  • Unclear ownership leads to stale or duplicate features
  • Missing tests for feature transformations

  • Experience with data pipelines and ETL/ELT
  • Knowledge in ML feature engineering
  • Operational and observability competence

  • Consistency between training and production
  • Scalable latency and throughput requirements
  • Governance, traceability and versioning

  • Privacy and compliance requirements
  • Budget and operational resources for infrastructure
  • Legacy data formats and heterogeneous sources