Hashing

Definition:

Hashing is a process that transforms input data (or a message) of any length into a fixed-size value, usually displayed as a hexadecimal string. The output, known as the hash value, hash code, or digest, is generated by a hash function. Hashing is widely used in data integrity verification, password storage, and digital signatures.

Characteristics:

  • Deterministic:

    • The same input will always produce the same hash output. This property allows for consistent verification of data integrity.
  • Fixed Size:

    • Regardless of the size of the input data, the output hash will always be of a fixed length, making it easier to handle and compare.
  • Fast Computation:

    • Hash functions are designed to compute hash values quickly, allowing for efficient data processing.
  • Pre-image Resistance:

    • Given a hash output, it should be computationally infeasible to find any input that produces it, so a stored hash does not directly reveal the original data.
  • Collision Resistance:

    • It should be computationally infeasible to find two different inputs that produce the same hash output, so an attacker cannot prepare two inputs that share a hash and later swap one for the other.
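
The first two properties can be illustrated with a very small hash function. The sketch below uses FNV-1a (64-bit), which is not one of the algorithms covered on this page and is not cryptographic: it is deterministic and has a fixed 64-bit output, but it offers no pre-image or collision resistance. The function name fnv1a64 is chosen here for illustration only.

```cpp
#include <cstdint>
#include <string>

// FNV-1a (64-bit): a simple NON-cryptographic hash, used here only to
// illustrate determinism and fixed output size. Do not use it for security.
std::uint64_t fnv1a64(const std::string &data) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char byte : data) {
        hash ^= byte;               // mix in the next input byte
        hash *= 1099511628211ULL;   // multiply by the FNV prime (wraps mod 2^64)
    }
    return hash;
}
```

Calling fnv1a64("Hello, World!") twice returns the same value each time, and the result always occupies exactly 8 bytes, whether the input is empty or several megabytes long.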

Common Hashing Algorithms:

  1. MD5 (Message Digest 5):

    • Produces a 128-bit hash value. It is still seen in non-security checksums, but practical collision attacks exist, so it should not be used where security matters.
  2. SHA-1 (Secure Hash Algorithm 1):

    • Produces a 160-bit hash value. Like MD5, SHA-1 has been broken by practical collision attacks and is not recommended for security-sensitive applications.
  3. SHA-256:

    • Part of the SHA-2 family, it produces a 256-bit hash value and is widely used in security applications and protocols, including SSL/TLS and Bitcoin.
  4. bcrypt:

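    • A hashing function designed specifically for passwords. It incorporates a per-password salt to protect against rainbow table attacks and an adjustable cost factor that makes each hash deliberately slow to compute, which hinders brute-force guessing.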
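
The digest sizes above translate directly into the length of the familiar hexadecimal form, since each byte prints as two hex characters. A minimal sketch of that arithmetic follows; hexLength is a hypothetical helper, not part of any library. bcrypt is left out because its output is conventionally an encoded string that also carries the salt and cost factor.

```cpp
#include <cstddef>

// Number of hexadecimal characters needed to print a digest of the given
// bit length: 8 bits per byte, 2 hex characters per byte.
constexpr std::size_t hexLength(std::size_t digestBits) {
    return (digestBits / 8) * 2;
}
```

With this helper, hexLength(128) gives 32 for MD5, hexLength(160) gives 40 for SHA-1, and hexLength(256) gives 64 for SHA-256.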

Time Complexity:

  • Hash Computation Time: O(n)
    Computing a hash takes time linear in the input length n, because most hashing algorithms read every input byte once, block by block.
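
To see where the linear cost comes from, it can help to look at a hash that is fed its input in chunks. The sketch below is an assumption-laden illustration, not how SHA-256 works internally: StreamingFnv1a is a hypothetical class built on FNV-1a (64-bit), a simple non-cryptographic hash. Each byte is touched exactly once, and the only state carried between chunks is a single 64-bit integer.

```cpp
#include <cstdint>
#include <string>

// Hypothetical streaming wrapper around FNV-1a (64-bit), a NON-cryptographic
// hash. It exists only to show the single pass over the input.
class StreamingFnv1a {
public:
    // Process one chunk. Every byte is visited exactly once, so the total
    // work across all chunks is linear in the total input length.
    void update(const std::string &chunk) {
        for (unsigned char byte : chunk) {
            state_ ^= byte;
            state_ *= 1099511628211ULL;  // FNV prime (wraps mod 2^64)
        }
    }

    // The running state is the hash of everything seen so far.
    std::uint64_t digest() const { return state_; }

private:
    std::uint64_t state_ = 14695981039346656037ULL;  // FNV offset basis
};
```

Feeding "Hello, " and then "World!" through update produces the same digest as a single call with "Hello, World!", which is why large files can be hashed without loading them into memory at once.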

Space Complexity:

  • Space Complexity: O(1)
    The output size of a hash function is constant (fixed size), independent of the input size, leading to constant space complexity for storing hash values.

C++ Implementation of Hashing (Using SHA-256):

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

#include <openssl/sha.h>

// Returns the SHA-256 digest of `data` as a lowercase hexadecimal string.
// Requires OpenSSL; link against libcrypto (for example with -lcrypto).
std::string sha256(const std::string &data) {
    unsigned char hash[SHA256_DIGEST_LENGTH];
    SHA256(reinterpret_cast<const unsigned char *>(data.c_str()), data.size(), hash);

    // Format each byte as two zero-padded hex digits.
    std::ostringstream oss;
    oss << std::hex << std::setfill('0');
    for (const unsigned char byte : hash) {
        oss << std::setw(2) << static_cast<int>(byte);
    }
    return oss.str();
}

int main() {
    const std::string data = "Hello, World!";
    const std::string hashValue = sha256(data);

    std::cout << "Data: " << data << std::endl;
    std::cout << "SHA-256 Hash: " << hashValue << std::endl;

    return 0;
}

Summary:

Hashing is a fundamental technique used in computer science and cryptography for data integrity verification, password management, and digital signatures. Secure hash functions such as SHA-256 have no known practical pre-image or collision attacks, whereas MD5 and SHA-1 should be avoided where security matters, and passwords are better served by a purpose-built function such as bcrypt. Understanding hashing and its properties is essential for implementing secure systems and applications.