Caching
Strategy for temporarily storing frequently accessed data to reduce latency and load. Includes forms such as in-memory, HTTP, and CDN caches plus rules for consistency, invalidation, and capacity management.
Classification
- Complexity: Medium
- Impact area: Technical
- Decision type: Architectural
- Organizational maturity: Intermediate
Technical context
Principles & goals
Use cases & scenarios
Compromises
Risks:
- Cache incoherence when invalidation is insufficient
- Hotspots and thundering-herd problems
- Increased costs due to redundant storage
Mitigations:
- Explicit invalidation strategies and versioned keys
- Cache metrics to drive capacity decisions
- Defined fallbacks and retry strategies on cache failure
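The versioned-key mitigation above can be sketched as follows. This is a minimal illustration, not a prescribed implementation; the key format, `SCHEMA_VERSION` constant, and helper names are assumptions chosen for the example.

```python
SCHEMA_VERSION = 3  # hypothetical schema version; bumping it invalidates all old keys at once


def profile_cache_key(user_id: str) -> str:
    """Build a versioned cache key so a schema change never serves stale value shapes."""
    return f"user:profile:v{SCHEMA_VERSION}:{user_id}"


def invalidate_profile(cache: dict, user_id: str) -> None:
    """Explicit invalidation: delete the entry at write time instead of waiting for TTL expiry."""
    cache.pop(profile_cache_key(user_id), None)
```

Versioning the key namespace turns "purge everything" into a one-line constant bump, while per-record invalidation stays an explicit delete on the write path.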
I/O & resources
Inputs:
- Load profiles and query metrics
- Data access patterns (hotspots)
- Operational requirements (SLAs, consistency)
Outputs:
- Reduced latency and backend load
- Metrics for hit rate and evictions
- Documented cache topology and policies
Description
Caching reduces latency and load by keeping frequently accessed data temporarily closer to consumers. It includes placement, consistency, invalidation, and capacity strategies as well as cache types (in-memory, CDN, HTTP, database). Useful for performance optimization but introduces trade-offs in consistency, complexity, and operational overhead.
✔ Benefits
- Reduced latency and faster response times
- Lower load on backend systems
- Better scalability for read-intensive workloads
✖ Limitations
- Staleness: caches may serve outdated data
- Complexity in invalidation and consistency
- Additional operational overhead (monitoring, capacity planning)
Trade-offs
Metrics
- Hit rate
Ratio of cache hits to total requests; indicator of effectiveness.
- Latency (P95)
95th percentile of response times with cache in the path; measures user experience.
- Evictions per second
Number of cache evictions due to memory pressure; indicates capacity issues.
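The first two metrics can be computed directly from request counters and a latency sample. A minimal sketch, assuming raw hit/miss counts and a list of observed latencies; the P95 uses the nearest-rank method as one common approximation.

```python
import math


def hit_rate(hits: int, misses: int) -> float:
    """Ratio of cache hits to total requests; 0.0 when no traffic was observed."""
    total = hits + misses
    return hits / total if total else 0.0


def p95(latencies_ms: list[float]) -> float:
    """95th-percentile latency via the nearest-rank method."""
    ordered = sorted(latencies_ms)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[idx]
```

In practice these values come from the cache's own counters (e.g. a Redis `INFO` snapshot) rather than hand-kept lists, but the arithmetic is the same.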
Examples & implementations
Redis as L2 cache for user profiles
Use of a central Redis cluster to offload database read operations and improve response times for high-frequency profile queries.
CDN caching for static web assets
Serving static assets via a CDN with controlled Cache-Control headers and targeted cache-busting during deployments.
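One common way to implement the cache-busting mentioned above is content-hashed filenames: unchanged assets keep their long-lived CDN copies, while any content change produces a new URL. A sketch under that assumption; the fingerprint length and header value are illustrative choices.

```python
import hashlib


def fingerprinted_name(filename: str, content: bytes) -> str:
    """Append a short content hash so a deploy changes the asset URL,
    busting CDN and browser caches only for files that actually changed."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"


# With immutable URLs, aggressive caching is safe for these assets:
ASSET_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}
```

The HTML that references the asset is then served with a short TTL, so it can point browsers at the new fingerprinted URL immediately after a deploy.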
HTTP response caching with RFC-compliant headers
Using Expires, Cache-Control, and ETag to effectively utilize browser and proxy caches for public content.
Implementation steps
Analyze access patterns and identify suitable cache boundaries
Select cache type and technology, configure TTL and invalidation
Introduce monitoring, load testing, and phased rollout
⚠️ Technical debt & bottlenecks
Technical debt
- Ad-hoc implemented cache keys without versioning
- No documented invalidation rules
- Monolithic cache instances that are hard to scale
Known bottlenecks
Misuse examples
- Caching frequently written records without a suitable consistency strategy
- Using very long TTLs for critical, rapidly changing data
- Missing monitoring of evictions and memory pressure indicators
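The missing-monitoring misuse above is cheap to avoid with simple threshold checks over cache statistics. A sketch assuming a pre-computed stats dict; the field names and thresholds are hypothetical (with redis-py, the raw counters would come from `Redis.info()`, e.g. `evicted_keys` and `used_memory`).

```python
def eviction_alerts(stats: dict, max_evictions_per_s: float = 10.0) -> list[str]:
    """Flag capacity pressure from cache statistics; returns human-readable alerts."""
    alerts = []
    if stats.get("evictions_per_s", 0.0) > max_evictions_per_s:
        alerts.append("evictions above threshold: add memory or shorten TTLs")
    if stats.get("memory_used_fraction", 0.0) > 0.9:
        alerts.append("memory pressure: cache near its configured limit")
    return alerts
```

Sustained evictions under normal load usually mean the working set no longer fits, which silently erodes the hit rate before latency alarms fire.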
Typical traps
- Underestimating cache invalidation complexity
- Ignoring security and privacy aspects in shared caches
- Not accounting for network latency between cache and clients
Required skills
Architectural drivers
Constraints
- Limited memory on cache nodes
- Latency between cache locations
- Regulatory constraints for data residency