Hashing Algorithms
Deterministic functions producing fixed-size digests from arbitrary input, used for integrity checks, indexing and cryptographic primitives.
Classification
- Complexity: Medium
- Impact area: Technical
- Decision type: Architectural
- Organizational maturity: Intermediate
Technical context
Principles & goals
Use cases & scenarios
Compromises
- Using broken algorithms (e.g., MD5, SHA-1) leads to integrity compromises.
- Incorrect implementations can be exposed to timing attacks or leak data through other side channels.
- Ignored compatibility requirements complicate migration to stronger algorithms.
- Use modern, recommended algorithms (e.g., SHA-2, SHA-3 or BLAKE2 for general hashing; Argon2 as a KDF for passwords).
- Use vetted libraries instead of implementing cryptography yourself.
- Store metadata (algorithm, parameters, salt) alongside the hash for future verification.
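A minimal sketch of the last three points, assuming Python's standard library only. PBKDF2 stands in here for Argon2, which is not part of the standard library. The record layout (algorithm$iterations$salt$hash) and the iteration count are illustrative assumptions, not a standard format.

```python
import base64
import hashlib
import hmac
import os

# Hypothetical record layout: algorithm$iterations$salt$hash (base64 fields).
ALGORITHM = "pbkdf2_sha256"
ITERATIONS = 600_000  # assumed work factor; tune to your hardware and policy


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Derive a salted hash and return it with the metadata needed to verify it."""
    salt = os.urandom(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return "$".join([
        ALGORITHM,
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(derived).decode("ascii"),
    ])


def verify_password(password: str, record: str) -> bool:
    """Recompute the hash from the stored parameters and compare in constant time."""
    algorithm, iterations, salt_b64, hash_b64 = record.split("$")
    if algorithm != ALGORITHM:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(hash_b64)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, int(iterations))
    # hmac.compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(derived, expected)
```

Because the algorithm name and iteration count travel with the hash, old records can still be verified after the defaults change.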
I/O & resources
- Input data (bytes) or streams to hash
- Security requirements (e.g., collision resistance)
- Performance and compatibility requirements
- Fixed digest/hash value
- Metadata (algorithm version, salt, KDF parameters)
- Integrity or consistency indicators for downstream processes
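The inputs and outputs listed above can be sketched as a single function: a byte stream goes in, and a digest plus minimal metadata comes out. The function name `hash_stream` and the returned dictionary layout are assumptions made for illustration.

```python
import hashlib
import io
from typing import BinaryIO


def hash_stream(stream: BinaryIO, algorithm: str = "sha256",
                chunk_size: int = 64 * 1024) -> dict:
    """Hash a stream in chunks so large inputs never need to fit in memory."""
    hasher = hashlib.new(algorithm)
    total = 0
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
        total += len(chunk)
    # Return the digest together with the metadata a verifier needs later.
    return {"algorithm": algorithm, "digest": hasher.hexdigest(), "size": total}


result = hash_stream(io.BytesIO(b"abc"))
print(result["algorithm"], result["size"], result["digest"])
```

Feeding the data in chunks yields the same digest as hashing it in one call, so the chunk size is purely a memory and performance setting.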
Description
Hashing algorithms are deterministic functions that map arbitrary input to fixed-size digests. They are used for integrity checks, authentication primitives, and indexing. Depending on the use case, the priority is collision resistance, preimage resistance, or speed. Choosing an algorithm involves trade-offs between security, performance and compatibility; broken algorithms such as MD5 and SHA-1 must be avoided wherever security depends on the hash.
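The basic properties can be observed directly. This is a small illustration using SHA-256 from Python's standard library; the helper name `digest_hex` is chosen only for this example.

```python
import hashlib


def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


first = digest_hex(b"abc")
repeat = digest_hex(b"abc")
changed = digest_hex(b"abd")             # one byte differs from b"abc"
large = digest_hex(b"x" * 1_000_000)     # much larger input

print(first == repeat)         # deterministic: same input, same digest
print(len(first), len(large))  # fixed size: 64 hex characters in both cases
print(first == changed)        # a small input change gives a different digest
```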
✔ Benefits
- Efficient integrity checks and comparison of large data sets.
- Enables content-addressing and simple indexing.
- Fundamental building block for many cryptographic protocols and signature schemes.
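✖ Limitations
- No confidentiality: a digest is hard to invert, but it is not secret, and low-entropy inputs can be recovered by guessing.
- Deprecated algorithms are vulnerable to collision and other attacks.
- A plain hash is insufficient for password storage; a salt and a deliberately slow KDF with tuned parameters are required.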
Trade-offs
Metrics
- Collision probability
Probability that two distinct inputs produce the same digest; important to assess security.
- Throughput (MB/s)
Amount of data processed per second for a given implementation/hardware.
- Latency per hash
Time to compute a single hash; relevant for real-time applications.
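As a rough sketch of how the first two metrics might be estimated: the collision probability below uses the standard birthday-bound approximation for an ideal hash, and the throughput figure is a single-run measurement rather than a rigorous benchmark. Both function names are assumptions made for this example.

```python
import hashlib
import math
import time


def collision_probability(num_items: int, digest_bits: int) -> float:
    """Birthday-bound approximation: p = 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes an ideal hash with uniformly distributed outputs.
    """
    exponent = -num_items * (num_items - 1) / 2.0 ** (digest_bits + 1)
    return -math.expm1(exponent)  # expm1 stays accurate for tiny probabilities


def throughput_mb_per_s(algorithm: str = "sha256", size_mb: int = 8) -> float:
    """Time one pass over size_mb of zero bytes; indicative only."""
    data = bytes(size_mb * 1024 * 1024)
    start = time.perf_counter()
    hashlib.new(algorithm, data).digest()
    elapsed = time.perf_counter() - start
    return size_mb / elapsed


print(collision_probability(2**32, 64))    # roughly 0.39 for a 64-bit digest
print(collision_probability(2**32, 256))   # negligible for a 256-bit digest
print(f"{throughput_mb_per_s():.0f} MB/s")
```

The first two lines of output illustrate why digest length matters: the same number of items that makes a collision likely at 64 bits is harmless at 256 bits, provided the algorithm itself is not broken.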
Examples & implementations
Git object hashes (historically SHA-1)
Git identifies commits, trees and blobs by the hash of their content (content addressing); a transition to SHA-256 object IDs is underway.
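To make content addressing concrete, the sketch below reproduces how a SHA-1 based Git repository derives a blob ID: the hash covers a short header ("blob", the content length, a NUL byte) followed by the file content. The function name `git_blob_id` is chosen for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in SHA-1 repositories.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should match `git hash-object` for the same bytes, since identical content always maps to the same object ID.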
SHA-256 in TLS and certificates
SHA-256 is widely used for signature and integrity checks in TLS certificates and signature chains.
BLAKE2 for high-performance integrity checks
BLAKE2 offers high speed and good cryptographic properties; popular in performance-critical systems.
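A brief sketch of BLAKE2 as exposed by Python's `hashlib`, which supports a configurable digest size and an optional key. The helper names and the 32-byte default are assumptions for illustration.

```python
import hashlib
import hmac


def blake2_checksum(data: bytes, digest_size: int = 32) -> str:
    """Unkeyed BLAKE2b digest, suitable for integrity checks."""
    return hashlib.blake2b(data, digest_size=digest_size).hexdigest()


def blake2_tag(data: bytes, key: bytes, digest_size: int = 32) -> bytes:
    """Keyed BLAKE2b digest, usable as a message authentication tag."""
    return hashlib.blake2b(data, key=key, digest_size=digest_size).digest()


def verify_tag(data: bytes, key: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(blake2_tag(data, key, len(tag)), tag)


tag = blake2_tag(b"payload", key=b"example-key")
print(blake2_checksum(b"payload"))
print(verify_tag(b"payload", b"example-key", tag))
```

The unkeyed form only detects accidental or unauthenticated changes; the keyed form also requires knowledge of the key to produce a valid tag.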
Implementation steps
Assess security and performance requirements and existing dependencies.
Select an appropriate, up-to-date algorithm and vetted libraries.
Implement with correct handling of salt and KDF parameters, test the result, and plan a migration strategy.
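One possible way to keep a migration path open is to tag every stored digest with its algorithm, so that legacy values can still be verified and then flagged for re-hashing. The "algorithm:hex" format, the set names and the function names below are hypothetical and serve only as a sketch.

```python
import hashlib
import hmac

PREFERRED = "sha256"
LEGACY = {"md5", "sha1"}                        # accepted for verification only
SUPPORTED = {"sha256", "sha3_256", "blake2b"} | LEGACY


def make_tagged_digest(data: bytes, algorithm: str = PREFERRED) -> str:
    """Return a digest prefixed with its algorithm, e.g. 'sha256:ab12...'."""
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"


def check_tagged_digest(data: bytes, tagged: str) -> tuple[bool, bool]:
    """Return (matches, needs_upgrade) for a stored tagged digest."""
    algorithm, _, expected_hex = tagged.partition(":")
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    actual_hex = hashlib.new(algorithm, data).hexdigest()
    matches = hmac.compare_digest(actual_hex.encode(), expected_hex.lower().encode())
    return matches, algorithm in LEGACY


stored = make_tagged_digest(b"document contents", algorithm="sha1")
matches, needs_upgrade = check_tagged_digest(b"document contents", stored)
if matches and needs_upgrade:
    stored = make_tagged_digest(b"document contents")  # re-hash with the preferred algorithm
print(stored.split(":")[0])
```

Keeping the algorithm choice in data rather than in code avoids hardcoding it in components that are difficult to change later.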
⚠️ Technical debt & bottlenecks
Technical debt
- Legacy databases with MD5/SHA-1 hashes require migration effort.
- Missing documentation of hash parameters used in systems.
- Monolithic components that hardcode algorithms and block migration.
Known bottlenecks
Misuse examples
- Using MD5 for password storage in a web application.
- Using bare hashes without salt in multi-tenant systems.
- Relying on hashes alone as an access control measure.
Typical traps
- Omitting necessary metadata (algorithm version, salt) prevents later verification.
- Assuming a long hash is automatically secure.
- Insufficient vetting of libraries for side-channel behavior.
Required skills
Architectural drivers
Constraints
- Existing protocols may mandate specific hash algorithms.
- Regulatory rules may prescribe minimum strengths.
- Resource limits on embedded systems restrict choices.