What Is a Hash Function?
A cryptographic hash function takes an input of any length and produces a fixed-length output, called a digest, that is commonly written as a hexadecimal string. The process is deterministic, meaning the same input always yields the same output, and it is one-way, meaning you cannot feasibly recover the original data from the hash. Even a single-character change in the input produces a completely different hash, with roughly half of the output bits flipping on average, a property known as the avalanche effect.
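The determinism and avalanche properties described above can be observed directly. The following is a minimal sketch using Python's standard hashlib module; the helper names sha256_hex and differing_bits and the sample strings are illustrative choices, not part of any library.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


first = sha256_hex("hello world")
second = sha256_hex("hello world")   # identical input
changed = sha256_hex("hello worle")  # only the last character differs

print(first == second)                  # True: hashing is deterministic
print(len(first))                       # 64 hex characters, whatever the input length
print(differing_bits(first, changed))   # how many of the 256 output bits changed
```

For a well-behaved hash, the last number should land somewhere near 128, which is half of the 256 output bits.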
Hash functions are foundational to modern computing and security. They are used to verify file integrity by comparing checksums, to store passwords securely so that plaintext credentials are never saved to disk, and to generate digital signatures that authenticate the author of a document or software release. Content-addressable storage systems and version control tools like Git also rely on hashes to identify objects efficiently.
Common algorithms include MD5, which is fast but cryptographically broken; SHA-1, which is deprecated for security use; and the SHA-2 family (including SHA-256 and SHA-512), which remains widely trusted. For password hashing specifically, dedicated algorithms such as bcrypt, scrypt, and Argon2 add salting and deliberate slowness to resist brute-force attacks.
Frequently Asked Questions
What is the difference between MD5 and SHA-256?
MD5 produces a 128-bit hash (32 hex characters) and is very fast, but it is considered cryptographically broken because researchers have demonstrated practical collision attacks. SHA-256 produces a 256-bit hash (64 hex characters) and is part of the SHA-2 family. It remains secure for cryptographic applications including digital signatures, TLS certificates, and blockchain proof-of-work. For any use case where security matters, prefer SHA-256 or SHA-512 over MD5.
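The size difference is easy to check. This short sketch, assuming a Python build in which MD5 is enabled in hashlib, prints the digest length in bits and in hex characters for each algorithm; the digest_sizes helper and the sample message are illustrative.

```python
import hashlib


def digest_sizes(name: str, data: bytes) -> tuple:
    """Return (bits, hex_characters) for the named hashlib algorithm."""
    digest = hashlib.new(name, data)
    return digest.digest_size * 8, len(digest.hexdigest())


message = b"The quick brown fox jumps over the lazy dog"

for name in ("md5", "sha256", "sha512"):
    bits, hex_chars = digest_sizes(name, message)
    print(name, bits, hex_chars)
# md5 128 32
# sha256 256 64
# sha512 512 128
```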
Can a hash be reversed to get the original text?
No. Hash functions are designed to be one-way. It is computationally infeasible to reconstruct the original input from the output alone. However, attackers can use precomputed rainbow tables or brute-force methods to find inputs that match a known hash, which is why passwords should be hashed with salted, purpose-built algorithms like bcrypt or Argon2 rather than raw SHA-256 or MD5.
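As a rough illustration of salted, deliberately slow password hashing, the sketch below uses scrypt from Python's standard hashlib module in place of bcrypt or Argon2, which need third-party packages. It assumes a Python build linked against an OpenSSL version that provides scrypt. The function names and cost parameters are illustrative assumptions, not tuned recommendations.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumptions, not a tuned recommendation).
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def hash_password(password: str, salt: bytes = None) -> tuple:
    """Derive a 32-byte key from the password; returns (salt, key)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    key = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        maxmem=64 * 1024 * 1024,
        dklen=32,
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))                   # False
```

Because each password gets its own random salt, two users with the same password end up with different stored values, which is what defeats precomputed rainbow tables.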
Which hash algorithm should I use?
For file integrity verification and checksums, SHA-256 is a strong default. For password storage, always use a dedicated password-hashing algorithm such as bcrypt, scrypt, or Argon2 that incorporates salting and key stretching. MD5 and SHA-1 are acceptable for non-security purposes like cache keys, ETags, or deduplication of trusted data, but they should not be relied on for tamper detection or authentication.
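For the file integrity case, one possible approach is to hash the file in chunks and compare the result against a published checksum. In the sketch below, the function names, the chunk size, and the demo file are hypothetical choices made for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex checksum."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo with a throwaway temporary file standing in for a real download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example release contents")
    demo_path = tmp.name

published = hashlib.sha256(b"example release contents").hexdigest()
print(matches_checksum(demo_path, published))  # True
print(matches_checksum(demo_path, "0" * 64))   # False
os.remove(demo_path)
```

A checksum only proves integrity if it comes from a trusted source; a checksum hosted next to a tampered file can be tampered with too.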
Is MD5 still safe to use?
MD5 is no longer considered safe for any security-related purpose. Researchers have demonstrated practical collision attacks that can produce two different inputs with the same MD5 hash. However, MD5 is still acceptable for non-security tasks like generating cache keys, computing checksums that detect accidental data corruption, or creating deduplication identifiers where deliberate tampering is not a concern.
What is a hash collision?
A hash collision occurs when two different inputs produce the same hash output. Because hash functions map an effectively unlimited input space to a fixed-size output, collisions must exist, but a well-designed cryptographic hash function makes them computationally infeasible to find. When an algorithm like MD5 or SHA-1 has known collision attacks, it means attackers can deliberately craft colliding inputs, which undermines its security guarantees.
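To see why collisions must exist and why output size matters, the sketch below builds a deliberately weak toy hash by keeping only the first 16 bits of SHA-256, then searches for two inputs that collide. The truncation, the function names, and the choice of counter strings as inputs are assumptions made for illustration. This is not an attack on SHA-256 itself.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, bits: int = 16) -> int:
    """Toy hash: keep only the first `bits` bits of the SHA-256 digest."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int = 16) -> tuple:
    """Hash successive counter strings until two share the same truncated hash."""
    seen = {}
    for i in count():
        data = str(i).encode("ascii")
        value = truncated_hash(data, bits)
        if value in seen:
            return seen[value], data
        seen[value] = data


a, b = find_collision()
print(a, b)                                   # two different inputs
print(truncated_hash(a) == truncated_hash(b)) # True: they collide on 16 bits
```

With only 65,536 possible outputs, a collision typically appears after a few hundred attempts. Each additional output bit increases the expected work, which is why a full 256-bit digest puts this kind of search out of reach.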