Catalog
concept#Architecture#Software Engineering#Platform#Reliability

Caching

Strategy for temporarily storing frequently accessed data to reduce latency and load. Includes forms such as in-memory, HTTP, and CDN caches plus rules for consistency, invalidation, and capacity management.

Caching reduces latency and load by keeping frequently accessed data temporarily closer to consumers.
Established
Medium

Classification

  • Medium
  • Technical
  • Architectural
  • Intermediate

Technical context

  • Database (as origin)
  • CDN providers
  • Monitoring and observability stack

Principles & goals

  • Explicit consistency agreements: define the staleness each consumer may tolerate.
  • Data locality: cache where access frequency is highest.
  • Fail-safe and fallback: systems must degrade gracefully on cache failures.
Build
Enterprise, Domain, Team

Use cases & scenarios

Compromises

  • Cache incoherence with insufficient invalidation
  • Hotspots and thundering‑herd problems
  • Increased costs due to redundant storage

Mitigations

  • Explicit invalidation strategies and versioned keys
  • Use cache metrics for capacity decisions
  • Defined fallbacks and retry strategies on cache failure
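The versioned-key mitigation can be sketched in a few lines. The key layout below is an assumption for illustration, not a fixed convention:

```python
def make_cache_key(namespace: str, entity_id: str, schema_version: int) -> str:
    """Build a versioned cache key. Bumping schema_version switches every
    reader to fresh keys, invalidating old entries in bulk without issuing
    deletes -- stale entries simply age out via TTL."""
    return f"{namespace}:v{schema_version}:{entity_id}"
```

After a schema change, `make_cache_key("user-profile", "42", 2)` no longer collides with entries written under version 1.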

I/O & resources

Inputs

  • Load profiles and query metrics
  • Data access patterns (hotspots)
  • Operational requirements (SLAs, consistency)

Outputs

  • Reduced latency and backend load
  • Metrics for hit rate and evictions
  • Documented cache topology and policies

Description

Caching reduces latency and load by keeping frequently accessed data temporarily closer to consumers. It includes placement, consistency, invalidation, and capacity strategies as well as cache types (in-memory, CDN, HTTP, database). Useful for performance optimization but introduces trade-offs in consistency, complexity, and operational overhead.

  • Reduced latency and faster response times
  • Lower load on backend systems
  • Better scalability for read-intensive workloads

  • Staleness: caches may serve outdated data
  • Complexity in invalidation and consistency
  • Additional operational overhead (monitoring, capacity planning)

  • Hit rate

    Ratio of cache hits to total requests; indicator of effectiveness.

  • Latency (P95)

    95th percentile of response times with cache in the path; measures user experience.

  • Evictions per second

    Number of cache evictions due to memory pressure; indicates capacity issues.
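The three metrics above can be tracked with a small counter structure; this is a minimal sketch, and the field names are illustrative:

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    evictions: int = 0

    @property
    def hit_rate(self) -> float:
        """Cache hits divided by total requests; 0.0 before any traffic."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A sustained drop in `hit_rate` alongside rising `evictions` is the usual signal that capacity, not key design, is the problem.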

Redis as L2 cache for user profiles

Use of a central Redis cluster to offload database read operations and improve response times for high-frequency profile queries.

CDN caching for static web assets

Serving static assets via a CDN with controlled Cache-Control headers and targeted cache-busting during deployments.
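The cache-busting half of this scenario is commonly done by fingerprinting asset names at build time; a minimal sketch, with a hypothetical CDN host:

```python
import hashlib

CDN_BASE = "https://cdn.example.com"  # hypothetical CDN host


def fingerprinted_url(path: str, content: bytes) -> str:
    """Embed a short content hash in the asset name so any deployment that
    changes the file also changes its URL, busting CDN and browser caches
    without a manual purge."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem, dot, ext = path.rpartition(".")
    name = f"{stem}.{digest}.{ext}" if dot else f"{path}.{digest}"
    return f"{CDN_BASE}/{name}"
```

Because unchanged content keeps its URL, fingerprinted assets can safely carry a very long max-age.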

HTTP response caching with RFC-compliant headers

Using Expires, Cache-Control, and ETag to effectively utilize browser and proxy caches for public content.

  1. Analyze access patterns and identify suitable cache boundaries
  2. Select cache type and technology, configure TTL and invalidation
  3. Introduce monitoring, load testing, and phased rollout

⚠️ Technical debt & bottlenecks

  • Ad-hoc implemented cache keys without versioning
  • No documented invalidation rules
  • Monolithic cache instances that are hard to scale
Bottlenecks

  • Network latency
  • Memory capacity
  • Invalidation throughput

Anti-patterns

  • Caching highly write‑frequent records without suitable consistency solutions
  • Using very long TTLs for critical, rapidly changing data
  • Missing monitoring of evictions and memory pressure indicators
  • Underestimating cache invalidation complexity
  • Ignoring security and privacy aspects in shared caches
  • Not accounting for network latency between cache and clients

Required skills

  • Networking and infrastructure knowledge
  • Understanding of consistency models
  • Experience with caching systems (e.g., Redis, Varnish, CDN)

Decision drivers

  • Latency requirements
  • Read load and scalability
  • Data consistency requirements

Constraints

  • Limited memory on nodes
  • Latency between cache locations
  • Regulatory constraints for data residency