How to Hash Passwords in Python

This post will briefly detail a basic password hashing mechanism written in Python. We will discuss how exactly a password can be hashed, and give some examples of the code that can be used to do this. As always, if you have questions please reach out to me on Twitter (@zaeyx).

What is “Password Hashing”?

I’ll touch on this very briefly for anyone who might need a quick refresher. If you’re looking for more on this subject, I recommend a more in-depth study. Here is a good post that goes deeper on the subject of hashing: https://auth0.com/blog/hashing-passwords-one-way-road-to-security/.

In short, when we ‘hash’ a password we take the plaintext password (and some additional data, usually a salt or pepper) and pass all of it to a hashing function. This function algorithmically produces an output that bears no visible relationship to the input. The output is deterministic, meaning that if you put the same inputs in again you will get the same output. But changing even one bit of the input should produce a completely new and random-looking output. This means that you cannot infer anything about the input by looking at the output. The only practical way to learn what the input was is to guess the exact same input again.
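As a small illustration of these properties, here is a sketch that hashes a few inputs with SHA256 via hashlib. The sample passwords and variable names are made up for the example.

```python
import hashlib

# Same input, same output: hashing is deterministic.
first = hashlib.sha256(b"correct horse").hexdigest()
second = hashlib.sha256(b"correct horse").hexdigest()

# Change a single character and the digest looks unrelated to the first one.
changed = hashlib.sha256(b"correct horsf").hexdigest()

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(1 for a, b in zip(first, changed) if a != b)

print(first == second)   # True
print(first == changed)  # False
print(f"{differing} of {len(first)} hex characters differ")
```

Nothing about the two digests hints that the inputs were one character apart.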

This is why in an offline password attack a hacker will guess many passwords via “brute force”, putting those passwords into the function and checking whether they get the target output. For practical purposes only that one input will produce that particular output, and a guess that is ‘close’ to the right password is indistinguishable from one that is incredibly ‘far’ from it.
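To make the guessing loop concrete, here is a minimal sketch of that offline check. It assumes an unsalted SHA256 hash and a tiny made-up wordlist; the names sha256_hex and crack are hypothetical helpers for this example only.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(text: str) -> str:
    """Hash a string with SHA256 and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def crack(target: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose hash equals target, or None."""
    for guess in candidates:
        if sha256_hex(guess) == target:
            return guess
    return None


# In a real attack only the hash is known; we compute it here for the demo.
target_hash = sha256_hex("sunshine")

# A near miss ("sunshinf") tells the attacker nothing. Only an exact match counts.
candidate_passwords = ["123456", "password", "sunshinf", "sunshine", "qwerty"]

print(crack(target_hash, candidate_passwords))  # sunshine
```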

Hashlib

While you can get libraries to support just about any common hashing algorithm in Python, today we will focus on a particular library which comes built in. It is called hashlib, and it contains the features necessary to generate hashes with a number of modern algorithms. Our demonstration defaults to the SHA256 hashing function. This is probably the most ‘minimally secure’ modern hashing function. It is not yet (to my knowledge) known to be ‘broken’ in the way some other functions, notably SHA1 and MD5, are. It is likely nearing the end of its useful life, though. Just keep that in the back of your mind while working through this example. Do still note that we will be showing how to use hashlib, not how to use SHA256. You are welcome to swap SHA256 out for a different algorithm at any time!
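As a quick sketch of what swapping algorithms looks like, the snippet below hashes the same bytes with four hashlib constructors. The choice of these four and the dictionary name digests are just for illustration.

```python
import hashlib

data = b"test"

# Swapping algorithms is mostly a matter of changing the constructor name.
digests = {
    "sha256": hashlib.sha256(data).hexdigest(),
    "sha512": hashlib.sha512(data).hexdigest(),
    "sha3_256": hashlib.sha3_256(data).hexdigest(),
    "blake2b": hashlib.blake2b(data).hexdigest(),
}

for name, value in digests.items():
    # Each hex character encodes 4 bits of the digest.
    print(f"{name:9} {len(value) * 4:4d} bits  {value[:16]}...")
```

The digest length changes with the algorithm, so whatever stores the hash needs room for the longest one you plan to use.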

Salting

Our example will allow for ‘salting’ to be part of the hash inputs. A salt is simply a value which is stored in plaintext – but serves as part of the input to the hashing function. It is not secret in the way the password value is secret. It doesn’t have to be. The point of the salt is to essentially ‘add randomness’ to the hashing function so that even if the user has a common password – the output will appear different. Remember, a hashing function will always produce the same output whenever given the same input. So if two users have the same password – without a salt their password hashes would be the same! This can be a big security issue. Therefore, having a unique salt for every user is critical. Additionally, salting can defend against something called ‘precomputed hash tables’ where an attacker might have a list of common hashing outputs so that they can almost instantly crack those common passwords. But a salt adds randomness that the attacker would not have precomputed into their hash tables.
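Here is a small sketch of per-user salting. It assumes the salt is generated with the standard library's secrets module and stored as a hex string; new_salt and salted_hash are hypothetical helper names, not part of the mechanism shown later.

```python
import hashlib
import secrets


def new_salt(num_bytes: int = 16) -> str:
    """Return a fresh random salt as a hex string."""
    return secrets.token_hex(num_bytes)


def salted_hash(password: str, salt: str) -> str:
    """Join password and salt with ':::' and hash the result with SHA256."""
    return hashlib.sha256(":::".join([password, salt]).encode("utf-8")).hexdigest()


# Two users who happen to pick the same password...
alice_salt = new_salt()
bob_salt = new_salt()

alice_hash = salted_hash("hunter2", alice_salt)
bob_hash = salted_hash("hunter2", bob_salt)

# ...end up with different stored hashes because their salts differ.
print(alice_hash == bob_hash)  # False

# Re-hashing with the stored salt reproduces the stored hash.
print(salted_hash("hunter2", alice_salt) == alice_hash)  # True
```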

Pepper

A pepper is another bit of information added to a hashing function’s input. The idea is that the pepper is stored far away from the location where the hashed passwords are stored. A common example is to store the hashed passwords in a website’s database, but to store the pepper in a configuration file on disk. This means that if the database is dumped, the hashes still cannot be cracked until the hacker also retrieves the pepper. You could imagine that a hacker who dumps a database via SQL injection might then be very annoyed to find that they still need a pepper value stored on disk. This might require another series of hacks to access! While salting is fairly popular, peppers are less so. They are more associated with mature and complex hashing schemes than with the average website or system.
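One way to keep the pepper out of the database is sketched below. Note the substitution: it reads the pepper from an environment variable rather than the config file described above, purely to keep the example short. The variable name APP_PASSWORD_PEPPER and the helper load_pepper are assumptions for this sketch.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; use whatever your deployment defines.
PEPPER_ENV_VAR = "APP_PASSWORD_PEPPER"


def load_pepper(env: Optional[Mapping[str, str]] = None) -> str:
    """Fetch the pepper from the environment (or a supplied mapping)."""
    source = os.environ if env is None else env
    pepper = source.get(PEPPER_ENV_VAR)
    if not pepper:
        # Fail loudly rather than silently hashing without a pepper.
        raise RuntimeError(f"{PEPPER_ENV_VAR} is not set")
    return pepper


# Passing a mapping explicitly makes the function easy to try out.
print(load_pepper({"APP_PASSWORD_PEPPER": "my_pepper"}))  # my_pepper
```

The point is only that the pepper lives somewhere a database dump alone would not reveal.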

Concatenation Delimiter

In our example we also show the use of a concatenation delimiter. This is a short string we place between all the inputs to the hashing algorithm to break them apart. It’s awkward to explain, but think of it like this: if our input is the plain combination (concatenation) of salt + pepper + password, then different sets of values can run together into exactly the same input string, and therefore produce exactly the same hash.

For example, if the salt = “1234”, pepper = “54321” and the password = “delta” then the input without delimiters would be equal to: 123454321delta.

In a case where the pepper changed to “5432” – but the user decided to start their password with a 1 the input would still be: 123454321delta. This would mean that the output hash would not change at all! This is a weird edge case. But it’s something to be aware of. For this we use delimiters – something like “:::” to split each section of the input.

So with delimiters the above example might look more like: 1234:::54321:::delta. Then after the changes we demonstrate above, the input would be: 1234:::5432:::1delta. This keeps the outputs from the hash function distinct.
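The edge case above can be reproduced in a few lines. This sketch uses the salt, pepper, password order from the example; the helper name digest is made up for the illustration.

```python
import hashlib


def digest(parts, delimiter=""):
    """Join the parts with the delimiter and return the SHA256 hex digest."""
    return hashlib.sha256(delimiter.join(parts).encode("utf-8")).hexdigest()


before = ["1234", "54321", "delta"]  # salt, pepper, password
after = ["1234", "5432", "1delta"]   # pepper lost its "1", password gained one

# Without a delimiter both inputs collapse to "123454321delta".
print(digest(before) == digest(after))                # True

# With ":::" between the parts the inputs stay distinct.
print(digest(before, ":::") == digest(after, ":::"))  # False
```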

Python Hashing Mechanism

from dataclasses import dataclass

import hashlib


AVAILABLE_ALGORITHMS = ['sha1', 'sha224', 'sha256', 'sha384', 'sha512', 'blake2b', 'blake2s', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'shake_128', 'shake_256']


class InvalidAlgorithmError(Exception):
  """You must choose a valid algorithm."""


@dataclass
class HashedPassword:
  """Holds a password with additional info."""
  hash_value: str
  salt: str
  pepper: str
  concat_str: str
  algo: str


def _hash_pass(passwd: str, salt: str = None, pepper: str = None,  concat_str: str = ":::", algo: str = "sha256") -> HashedPassword:
  """Hashes a password with an optional salt and pepper and returns instance of HashedPassword."""

  if algo not in AVAILABLE_ALGORITHMS:
    raise InvalidAlgorithmError(f"Please provide a valid algorithm: {AVAILABLE_ALGORITHMS}")

  input_builder = [item for item in [passwd, salt, pepper] if item is not None]

  return HashedPassword(
    hash_value = getattr(hashlib, algo)(
                   concat_str.join(
                     input_builder
                   ).encode(
                     'ascii'
                   )
                 ).hexdigest(),
    salt = salt,
    pepper = pepper,
    concat_str = concat_str,
    algo = algo
  )

This mechanism can produce a hash output given, at minimum, just a password. If you provide a salt or a pepper it will factor those in as well. It returns all values in a dataclass (struct) form. This gives you the hashed output, the salt, pepper, algorithm used, and the concatenation delimiter in one package. These outputs may then be serialized and stored on disk in a format of your choosing. Or each value in the HashedPassword output may be dropped into columns in a database (though please consider storing the pepper somewhere else, such as on disk in a config file directly).
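As one possible way to serialize the result, the sketch below turns a HashedPassword into JSON while leaving the pepper out. JSON is just an assumed format here, and the field values are copied from the example output shown later in this post.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HashedPassword:
    """Holds a password with additional info."""
    hash_value: str
    salt: str
    pepper: str
    concat_str: str
    algo: str


record = HashedPassword(
    hash_value="15b3ecb573d25f7fc5e84a736fe7a5f4aed3011392bfd519c9f5035aaf6286d5",
    salt="1234",
    pepper="my_pepper",
    concat_str=":::",
    algo="sha256",
)

# Convert to a plain dict, then drop the pepper so it is stored elsewhere.
stored = asdict(record)
stored.pop("pepper")

serialized = json.dumps(stored)
restored = json.loads(serialized)

print(sorted(restored))  # ['algo', 'concat_str', 'hash_value', 'salt']
```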

Discussion

Let’s take a look at each part of the code.

from dataclasses import dataclass

import hashlib

These are the top level imports. You can see this mechanism requires very little in terms of imports to run. We have hashlib, which provides the hashing algorithms, and dataclass from dataclasses, which lets us produce that ‘struct’ like output. For more on either of these modules, please see the Python standard library documentation.

AVAILABLE_ALGORITHMS = ['sha1', 'sha224', 'sha256', 'sha384', 'sha512', 'blake2b', 'blake2s', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'shake_128', 'shake_256']

Next we have a global variable. This is just a list of strings. It is the list of all available algorithms. This list was taken from the documentation for hashlib. You can see we got it right from: https://docs.python.org/3/library/hashlib.html#hash-algorithms. The point of this variable is so that we can test if the algorithm requested is actually one hashlib can provide. You’ll see this happening later in the code.
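One caveat worth checking for yourself: the two SHAKE entries produce variable-length digests, so their hexdigest() needs an explicit length. Since the function in this post calls hexdigest() with no arguments, choosing shake_128 or shake_256 would pass the list check but then fail. The sketch below assumes a standard CPython hashlib; the variable names are made up.

```python
import hashlib

AVAILABLE_ALGORITHMS = ['sha1', 'sha224', 'sha256', 'sha384', 'sha512',
                        'blake2b', 'blake2s', 'sha3_224', 'sha3_256',
                        'sha3_384', 'sha3_512', 'shake_128', 'shake_256']

# Every name in the list should be one hashlib guarantees on all platforms.
missing = [name for name in AVAILABLE_ALGORITHMS
           if name not in hashlib.algorithms_guaranteed]
print(missing)  # []

# SHAKE works when given a length in bytes (16 bytes -> 32 hex characters).
shake_hex = hashlib.shake_128(b"test").hexdigest(16)
print(len(shake_hex))  # 32

# Without a length, hexdigest() raises TypeError.
try:
    hashlib.shake_128(b"test").hexdigest()
    shake_needs_length = False
except TypeError:
    shake_needs_length = True
print(shake_needs_length)  # True
```

If you do not need SHAKE, the simplest option is probably to leave those two names out of your own copy of the list.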

class InvalidAlgorithmError(Exception):
  """You must choose a valid algorithm."""

Next we declare a class that inherits from Exception. This is a custom error class that we will use to throw a more helpful error in the case that the user chooses an invalid algorithm. We want to make it clear why the function fails in that case. Having custom errors like this can be really helpful for improving code usability and readability.
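Here is a rough sketch of how calling code might catch this error. The shortened SUPPORTED list and the helpers check_algorithm and describe are hypothetical, added only to keep the example self-contained.

```python
class InvalidAlgorithmError(Exception):
    """You must choose a valid algorithm."""


SUPPORTED = ["sha256", "sha512"]  # shortened list, just for this example


def check_algorithm(algo: str) -> str:
    """Raise InvalidAlgorithmError unless algo is in SUPPORTED."""
    if algo not in SUPPORTED:
        raise InvalidAlgorithmError(f"Please provide a valid algorithm: {SUPPORTED}")
    return algo


def describe(algo: str) -> str:
    """Show how a caller can catch the custom error specifically."""
    try:
        check_algorithm(algo)
    except InvalidAlgorithmError as err:
        return f"rejected: {err}"
    return f"accepted: {algo}"


print(describe("sha256"))  # accepted: sha256
print(describe("md4"))     # rejected: Please provide a valid algorithm: ['sha256', 'sha512']
```

Catching the specific class means other, unexpected exceptions still surface instead of being swallowed.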

@dataclass
class HashedPassword:
  """Holds a password with additional info."""
  hash_value: str
  salt: str
  pepper: str
  concat_str: str
  algo: str

Now we define a dataclass. This is essentially a ‘struct’: we write no methods of our own (the decorator generates basics such as __init__ and __repr__ for us), and it is only used to store values in a way that makes their relationship to each other clear. It’s a good organizing principle to wrap associated values together like this. Each dataclass instance is essentially a pythonic example of a row in a database / spreadsheet.
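To see what the decorator provides for free, here is a short sketch. The field values are placeholders chosen for the illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class HashedPassword:
    """Holds a password with additional info."""
    hash_value: str
    salt: str
    pepper: str
    concat_str: str
    algo: str


# The generated __init__ accepts positional or keyword arguments.
a = HashedPassword("abc123", "1234", "my_pepper", ":::", "sha256")
b = HashedPassword(hash_value="abc123", salt="1234", pepper="my_pepper",
                   concat_str=":::", algo="sha256")

print(a == b)                        # True (generated __eq__ compares fields)
print(a.algo)                        # sha256
print([f.name for f in fields(a)])   # the five field names, in order
```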

def _hash_pass(passwd: str, salt: str = None, pepper: str = None,  concat_str: str = ":::", algo: str = "sha256") -> HashedPassword:
  """Hashes a password with an optional salt and pepper and returns instance of HashedPassword."""

  if algo not in AVAILABLE_ALGORITHMS:
    raise InvalidAlgorithmError(f"Please provide a valid algorithm: {AVAILABLE_ALGORITHMS}")

  input_builder = [item for item in [passwd, salt, pepper] if item is not None]

  return HashedPassword(
    hash_value = getattr(hashlib, algo)(
                   concat_str.join(
                     input_builder
                   ).encode(
                     'ascii'
                   )
                 ).hexdigest(),
    salt = salt,
    pepper = pepper,
    concat_str = concat_str,
    algo = algo
  )

Now we have the function that actually hashes the password. There are a few things going on here so we’ll break them down bit by bit.

def _hash_pass(passwd: str, salt: str = None, pepper: str = None,  concat_str: str = ":::", algo: str = "sha256") -> HashedPassword:
  """Hashes a password with an optional salt and pepper and returns instance of HashedPassword."""

First we have the function definition. We can see the function accepts a password value (called passwd). It also optionally accepts arguments for a salt, pepper, concat_str (delimiter), and algo (hashing algorithm). It has defaults for all the optional values. For the salt and pepper the default is a blank “None” value. For the concat_str the default is “:::”, and for the algorithm the default is SHA256. We can also see thanks to the type hints that this function will return an instance of HashedPassword (our dataclass from above).

if algo not in AVAILABLE_ALGORITHMS:
  raise InvalidAlgorithmError(f"Please provide a valid algorithm: {AVAILABLE_ALGORITHMS}")

This check is to determine if the requested algorithm is one of the ones actually supported by hashlib. This uses the variable AVAILABLE_ALGORITHMS that we declared at the start of the code. If the requested algo is not known, then an error will be thrown. This would look like:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "python_password_hashing/password_hashing.py", line 27, in _hash_pass
    raise InvalidAlgorithmError(f"Please provide a valid algorithm: {AVAILABLE_ALGORITHMS}")
password_hashing.InvalidAlgorithmError: Please provide a valid algorithm: ['sha1', 'sha224', 'sha256', 'sha384', 'sha512', 'blake2b', 'blake2s', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'shake_128', 'shake_256']

Next we will create a list of all the elements that need to be passed to the hashing function.

input_builder = [item for item in [passwd, salt, pepper] if item is not None]

This uses a list comprehension that keeps only the items that are not None, so a salt or pepper that was not supplied is simply left out of the list.
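The filtering step can be tried on its own. In this sketch, build_input is a hypothetical wrapper around the same list comprehension.

```python
def build_input(passwd, salt=None, pepper=None):
    """Same list comprehension as in the post, wrapped for experimenting."""
    return [item for item in [passwd, salt, pepper] if item is not None]


print(build_input("secret"))                          # ['secret']
print(build_input("secret", salt="1234"))             # ['secret', '1234']
print(build_input("secret", "1234", "my_pepper"))     # ['secret', '1234', 'my_pepper']

# An empty string is not None, so it is kept as its own element.
print(build_input("secret", salt=""))                 # ['secret', '']

# Caveat: once None values are dropped, a salt-only call and a pepper-only
# call with the same value are indistinguishable.
print(build_input("secret", salt="x") == build_input("secret", pepper="x"))  # True
```

That last case means a salt-only hash and a pepper-only hash with the same value would come out identical, which is worth knowing if you ever use one without the other.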

Now comes the messiest part of the code.

return HashedPassword(
    hash_value = getattr(hashlib, algo)(
                   concat_str.join(
                     input_builder
                   ).encode(
                     'ascii'
                   )
                 ).hexdigest(),
    salt = salt,
    pepper = pepper,
    concat_str = concat_str,
    algo = algo
  )

There are a few layers of nesting here. The fundamental thing which is happening is that the function is returning a new instance of HashedPassword:

return HashedPassword(
  ...
  )

Most of the values that fill in the HashedPassword object are known from the inputs to the function. Those are easy.

return HashedPassword(
    hash_value = ... ,
    salt = salt,
    pepper = pepper,
    concat_str = concat_str,
    algo = algo
  )

The tricky one is the hash_value.

hash_value = getattr(hashlib, algo)(
               concat_str.join(
                 input_builder
               ).encode(
                 'ascii'
               )
             ).hexdigest(),

Here we can see that the hash_value is the result of somehow calling into the hashlib module. But it’s a little tricky to see exactly how hashlib is called. A normal call to hashlib with fewer variables might look like:

hashlib.sha256(b'test').hexdigest()

What is happening here is that the ‘algo’ choice is being used to pick which function in hashlib to call. This is done using the “getattr” function. What it’s doing is essentially the same as the following (though this would not actually work as written, because hashlib has no attribute literally named algo):

hashlib.algo(b'test').hexdigest()

Calling getattr(hashlib, algo) allows us to call hashlib.____ where the blank spot is whatever string value the variable algo contains.
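A short sketch of the lookup in isolation is below. It also shows hashlib.new(), a standard-library alternative that takes the algorithm name as a string. Mentioning it here is my addition, not something the mechanism above uses.

```python
import hashlib

algo = "sha256"

# getattr looks up the attribute whose name is stored in the string.
constructor = getattr(hashlib, algo)
print(constructor is hashlib.sha256)  # True

dynamic = constructor(b"test").hexdigest()
direct = hashlib.sha256(b"test").hexdigest()
print(dynamic == direct)              # True

# hashlib.new() accepts the algorithm name directly and gives the same digest.
via_new = hashlib.new(algo, b"test").hexdigest()
print(via_new == direct)              # True

# A name that does not exist can be handled with getattr's default argument.
print(getattr(hashlib, "not_a_real_hash", None))  # None
```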

Then there is some string processing used to prepare the input to the hash function inline.

concat_str.join(input_builder).encode('ascii')

All this does is join all the inputs to the hash function, separated by the concat_str, using the string “join” method. Then it encodes the joined string (something like: “secret_pass:::1234:::my_pepper”, since the code places the password first) as ASCII bytes. Then this is all passed into the hashing function. The output is subsequently returned. An example output might look like:

HashedPassword(hash_value='15b3ecb573d25f7fc5e84a736fe7a5f4aed3011392bfd519c9f5035aaf6286d5', salt='1234', pepper='my_pepper', concat_str=':::', algo='sha256')
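The join-and-encode step described above can also be tried on its own. The sketch below uses made-up values in the order the code uses (password, salt, pepper). It also shows one limitation I would flag: encoding as ASCII rejects passwords with non-ASCII characters, and UTF-8 is a common alternative.

```python
concat_str = ":::"
input_builder = ["secret_pass", "1234", "my_pepper"]  # passwd, salt, pepper

joined = concat_str.join(input_builder)
encoded = joined.encode("ascii")

print(joined)   # secret_pass:::1234:::my_pepper
print(encoded)  # b'secret_pass:::1234:::my_pepper'

# A password containing a non-ASCII character cannot be encoded as ASCII.
try:
    "pässword".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False

# UTF-8 handles it, and gives identical bytes for pure-ASCII input.
utf8_bytes = "pässword".encode("utf-8")
print(joined.encode("utf-8") == encoded)  # True
```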

Remember, if you store this HashedPassword object somewhere, it can later be used to check whether a user has entered the correct password. One storage consideration: you should not store the pepper in the same place as the hash_value and salt. Beyond that, what happens with these values is up to you. If you pass all the same values into the function again, you should get the same hash_value output. This lets you know that the plaintext password entered was the same one!
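A verification step along those lines might look like the sketch below. The helpers compute_hash and verify_password are hypothetical: compute_hash is a condensed stand-in for the hashing function in this post, and the comparison uses hmac.compare_digest from the standard library, which is my suggestion rather than part of the original code.

```python
import hashlib
import hmac


def compute_hash(passwd, salt=None, pepper=None, concat_str=":::", algo="sha256"):
    """Condensed stand-in for the post's hashing step; returns the hex digest."""
    parts = [item for item in [passwd, salt, pepper] if item is not None]
    return getattr(hashlib, algo)(concat_str.join(parts).encode("ascii")).hexdigest()


def verify_password(attempt, stored_hash, salt=None, pepper=None,
                    concat_str=":::", algo="sha256"):
    """Re-hash the attempt with the stored parameters and compare digests."""
    candidate = compute_hash(attempt, salt, pepper, concat_str, algo)
    # compare_digest takes the same time whether or not the strings match early.
    return hmac.compare_digest(candidate, stored_hash)


# Pretend this hash was stored when the user signed up.
stored = compute_hash("secret_pass", salt="1234", pepper="my_pepper")

print(verify_password("secret_pass", stored, salt="1234", pepper="my_pepper"))  # True
print(verify_password("wrong_pass", stored, salt="1234", pepper="my_pepper"))   # False
```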

About the author

Professional hacker & security engineer. Currently at Google, opinions all my own. On Twitter as @zaeyx. Skydiver, snowboarder, writer, weightlifter, runner, energetic to the point of being a bit crazy.
