concept #Platform #Reliability #Architecture #DevOps

Autoscaling

Automatic adjustment of instances and resources to match load, improving availability and cost efficiency.

Autoscaling automatically adjusts an application's instance count and resource allocation to match real load.
Established
Medium

Classification

  • Medium
  • Technical
  • Architectural
  • Intermediate

Technical context

  • Container orchestrators (e.g., Kubernetes HPA)
  • Cloud auto-scaling services (e.g., AWS Auto Scaling)
  • Monitoring and observability tools (Prometheus, Grafana)

Principles & goals

  • Scale based on measurable metrics, not assumed load.
  • Define min/max bounds to avoid over- or underscaling.
  • Safety mechanisms (cooldown, rate limits) prevent oscillating behaviour.
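
As a rough illustration of metric-driven scaling within bounds, the sketch below uses the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × currentMetric / targetMetric)) and then clamps the result to a min/max range. The function name and parameters are hypothetical and chosen only for this example.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Proportional scaling rule, clamped to [min_replicas, max_replicas].

    All names are illustrative; this is not any provider's API.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the replica count by how far the metric is from its target.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Bounds guard against both runaway scale-up and scaling to nothing.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 replicas at 90% CPU with a 60% target, bounds 2..10
    print(desired_replicas(4, 90.0, 60.0, 2, 10))
```

Without the clamp, a single faulty metric reading could request an arbitrarily large replica count, which is why the bounds are applied last.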
Run
Enterprise, Domain, Team

Use cases & scenarios

Compromises

  • Oscillation due to aggressive scaling rates and missing cooldowns.
  • Cost explosion from faulty rules or inaccurate load prediction.
  • Cascading failures if dependent systems are not scaled in tandem.
  • Conservative default bounds and iterative tuning instead of aggressive rules.
  • Combine multiple metrics (e.g., CPU + latency) instead of a single metric.
  • Use cooldowns and hysteresis to avoid oscillations.
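
The mitigations above can be sketched together in a few lines. The example below is a minimal sketch, not a real autoscaler: it evaluates several metrics, takes the largest proposed replica count (so the most constrained metric wins), ignores deviations inside a tolerance band around the target (a simple form of hysteresis), and blocks scale-down during a cooldown window. The class name, fields, and default values (10% tolerance, 300 s cooldown) are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Scaler:
    """Hypothetical scaler combining multiple metrics, a dead band, and a cooldown."""
    min_replicas: int
    max_replicas: int
    tolerance: float = 0.1                # dead band around the target ratio of 1.0
    scale_down_cooldown_s: float = 300.0  # no scale-down this soon after a change
    last_scale_time: float = float("-inf")

    def decide(self, now, current, metrics):
        """metrics is a list of (current_value, target_value) pairs."""
        proposals = []
        for value, target in metrics:
            ratio = value / target
            if abs(ratio - 1.0) <= self.tolerance:
                proposals.append(current)  # close enough to target: no change
            else:
                proposals.append(math.ceil(current * ratio))
        # The metric demanding the most capacity decides.
        desired = max(self.min_replicas, min(self.max_replicas, max(proposals)))
        # Scale up immediately, but delay scale-down to avoid flapping.
        if desired < current and now - self.last_scale_time < self.scale_down_cooldown_s:
            return current
        if desired != current:
            self.last_scale_time = now
        return desired


if __name__ == "__main__":
    s = Scaler(min_replicas=2, max_replicas=10)
    # CPU at 90% vs 60% target, latency at 200 ms vs 250 ms target
    print(s.decide(now=0, current=4, metrics=[(90, 60), (200, 250)]))
```

Applying the cooldown only to scale-down is a deliberate choice in this sketch: reacting slowly to falling load costs money, while reacting slowly to rising load costs availability.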

I/O & resources

  • Metrics (CPU, memory, custom metrics, queue depth)
  • Scaling rules and policies (min/max, targets, cooldown)
  • Monitoring, alerting and observability data
  • Automatically changed capacity (scale up/down)
  • Measurable effects on latency and throughput
  • Auditable scaling events and logs

Description

Autoscaling automatically adjusts an application's instance count and resource allocation to match real load. It improves availability and cost efficiency by dynamically scaling capacity in cloud platforms, container orchestrators, or hybrid environments. Policies, metrics and thresholds determine scaling behaviour and safety limits.

  • Automatic adjustment reduces manual intervention and operational effort.
  • Cost efficiency through demand-driven resource provisioning.
  • Improved availability and responsiveness during load spikes.

  • Scaling often reacts with delay; very short peaks may remain unhandled.
  • Incorrect metrics or thresholds can lead to misbehaviour.
  • Not all components are easily horizontally scalable (stateful services).
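
To make the first limitation concrete, the toy simulation below shows how instance startup delay leaves a short spike partly unserved. It is a deliberately simplified model under stated assumptions: discrete time ticks, identical instances, scale-up only, and hypothetical names throughout.

```python
def simulate_unmet_load(load, capacity_per_instance, start_instances,
                        startup_delay, max_instances):
    """Return the unserved demand per tick for a reactive, scale-up-only policy.

    New instances requested at tick t become ready at tick t + startup_delay.
    """
    ready = start_instances
    pending = []  # ticks at which requested instances become ready
    unmet = []
    for t, demand in enumerate(load):
        # Promote instances whose startup has finished.
        ready += sum(1 for ready_at in pending if ready_at <= t)
        pending = [ready_at for ready_at in pending if ready_at > t]
        unmet.append(max(0, demand - ready * capacity_per_instance))
        # React to the demand just observed (ceiling division).
        needed = min(max_instances, -(-demand // capacity_per_instance))
        shortfall = needed - (ready + len(pending))
        if shortfall > 0:
            pending.extend([t + startup_delay] * shortfall)
    return unmet


if __name__ == "__main__":
    spike = [100, 100, 400, 400, 100, 100]
    print(simulate_unmet_load(spike, 100, 1, startup_delay=3, max_instances=10))
    print(simulate_unmet_load(spike, 100, 1, startup_delay=1, max_instances=10))
```

With a three-tick startup delay the two-tick spike is over before the new instances are ready, whereas a one-tick delay covers the second half of the spike.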

  • Replicas/Instances

    Number of running instances or pods, as a measure of current capacity.

  • CPU/Memory utilization

    Resource utilization as a percentage, used to trigger scaling actions.

  • Request rate / latency

    Throughput and response time as indicators for needed scaling.

Kubernetes Horizontal Pod Autoscaler

Widely used implementation of pod-level autoscaling driven by CPU and custom metrics.

AWS Auto Scaling Groups

Cloud provider mechanism to autoscale EC2 instances using policies and target metrics.

Serverless concurrency/provisioned scaling

Examples from functions-as-a-service (e.g., provisioned concurrency) for controlling cold starts and throughput.

1

Set up metrics and observability; validate relevant metrics.

2

Define scaling rules with min/max and cooldown parameters.

3

Test autoscaling in stages (staging → canary → prod) and monitor.
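
One way to support steps 2 and 3 is to check scaling rules for obvious mistakes before they reach any environment. The sketch below assumes a hypothetical `ScalingPolicy` record and a handful of illustrative sanity checks; real platforms have their own schemas and may allow values this example rejects (for instance, scaling to zero).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy record; field names are illustrative only."""
    min_replicas: int
    max_replicas: int
    target_utilization: float  # fraction of capacity, e.g. 0.6 for 60%
    cooldown_s: float


def validate(policy):
    """Return a list of human-readable problems; an empty list means the policy passed."""
    problems = []
    if policy.min_replicas < 1:
        # Assumption for this sketch: at least one instance must always run.
        problems.append("min_replicas must be at least 1")
    if policy.max_replicas < policy.min_replicas:
        problems.append("max_replicas must not be below min_replicas")
    if not 0 < policy.target_utilization <= 1:
        problems.append("target_utilization must be in (0, 1]")
    if policy.cooldown_s < 0:
        problems.append("cooldown_s must not be negative")
    return problems


if __name__ == "__main__":
    print(validate(ScalingPolicy(2, 10, 0.6, 300)))
    print(validate(ScalingPolicy(5, 3, 1.5, -1)))
```

Running such checks in a pipeline before the staging rollout catches configuration errors cheaply, leaving the staged tests to focus on behaviour under load.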

⚠️ Technical debt & bottlenecks

  • Manual tuning of thresholds without automation or tests.
  • Missing capacity models for templates, permissions and quotas.
  • Incomplete telemetry that obscures reasons for scaling decisions.
Stateful services; database connections; instance startup (cold start) time
  • Autoscaling stateful databases without proper replication strategy.
  • Aggressive scaling rules that increase costs uncontrollably.
  • Scaling only up, without automatic scale-down when load decreases.
  • Ignoring dependencies that remain bottlenecks (DB, network).
  • Missing tests for combined load scenarios lead to surprises in prod.
  • Blind trust in cloud defaults without adapting to own workloads.
Knowledge of cloud platforms and orchestrators; experience with monitoring, metrics and alerting; understanding of scaling strategies and risks
Scalability under peak load; cost optimization via elastic provisioning; recoverability and fault tolerance
  • Dependencies that cannot be horizontally scaled (e.g., monolithic DB).
  • Provider limits or quotas in the cloud.
  • Network or storage throughput limitations.