Friday, October 6, 2017

What The Internet Never Told You About Building A Login System

The Internet is replete with "how to build a login form" and "which password hashing algorithm to choose" articles but strangely silent on how to build a professional (well, maybe, semi-professional) login system.

Any login system has two sides: the client side and the server side.  The client side is easier so we'll start there.

To log in, a user should provide a username or email address and a password.  For your system, which should it be: a username, an email address, or a choice of either?

Email addresses are more consistent and, chances are, you'll want to require a verified email address for each account anyway.  On the other hand, usernames are shorter and have a little more flair, though a user is far more likely to forget their username than their email address.

Many clients send the password in the clear to the server.  But, when I researched the issue, I opted for a more sophisticated approach.  It is:

sha256('yoursite.com'+'user@emailaddress.com'+'password')
sha256('yoursite.com'+'user'+'password')

SHA-256 is a fast, general-purpose hash, which makes it a poor choice for protecting passwords on its own.  The point here is not to secure the password so much as to invalidate an attacker's precomputed database of SHA-256 hashes and force them to generate a new one.  An attacker might already have a huge database of SHA-256 password hashes.  But, if the username or email address is added to the password string before the SHA-256 hash is taken, the attacker needs a much, much larger database of SHA-256 username-password hashes.  Adding an arbitrary string, such as the name of your web site, to the username-password forces the attacker to generate a new database of SHA-256 hashes customized for your login.

This hash is sent from the client to the server as the user's "password".  If an attacker steals your database and recovers the client-hashed password, they will be able to log in to your site but the password won't work at any other site.  The arbitrary string ensures that the hashes are different.  To get into other sites, the attacker will be forced to recover the plain text password.
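Here is a minimal sketch of that client-side hash, written in Python for illustration (a browser client would do the same thing in JavaScript).  The names client_hash and SITE_TAG are made up for this example, and the hex encoding of the result is an assumption; the recipe itself is just the concatenation shown above.

```python
import hashlib

# Arbitrary per-site string (assumed value, taken from the example above).
SITE_TAG = "yoursite.com"


def client_hash(identifier: str, password: str, site: str = SITE_TAG) -> str:
    """Return sha256(site + identifier + password) as a hex string.

    `identifier` is the username or email address, whichever the site uses.
    """
    material = (site + identifier + password).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


if __name__ == "__main__":
    # The same credentials give a different "password" on a different site.
    print(client_hash("user@emailaddress.com", "password"))
    print(client_hash("user@emailaddress.com", "password", site="othersite.example"))
```

The server never needs to know the plain text password; it treats the 64-character hex string as the password from here on.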

Having client-side SHA-256 hashes, even with usernames and arbitrary strings, isn't enough.  You are protecting other sites (a little bit) from being broken into if your password database is stolen.  But you are providing no protection for yourself (or your admin accounts) and SHA-256 is far too weak to be effective all on its own.  Enter server-side hashing.

As I said earlier, the server side of a login system is more complicated than the client side.  What server-side hashing algorithm should you use?

MD5, SHA-256, PBKDF2, bcrypt, scrypt or Argon2?

Well, no.  You shouldn't choose a hashing algorithm at all.

Hashing algorithms and hardware change over time and, since you are building a professional system, meant to adapt over the years, your server should support multiple algorithms.  Your login system, at least the server side, has three requirements:
  1. Hashing algorithms must be assignable per user, not one for the entire system
  2. Each user's hashing algorithm has to be changeable into any other algorithm
  3. Adding a new hashing algorithm should only require one function to be changed

No doubt, your database has a "password" column that contains the user's hashed password.  Having a password in a single string is extraordinarily convenient.  Rather than storing hashing algorithms in a separate column, it is best to store the hashing algorithm in the same column as the hashed password, as part of the same string.  Ideally, you'd use a standard password string format.

As far as I'm aware, there is only one such format in wide use: that's the Linux /etc/shadow password format (a.k.a. the shadow password format).  It looks like this:

$algorithm$arguments$saltthenpasswordhash

The algorithm field indicates the algorithm:

1=MD5
2=Blowfish (the original bcrypt)
5=SHA-256
6=SHA-512
2a,2b,2y=bcrypt
pbkdf2=PBKDF2
scrypt=scrypt
argon2d=Argon2d
argon2i=Argon2i

The arguments field is optional; when present, it holds the algorithm's parameters.  bcrypt requires a work factor.  Argon2 takes three arguments: iterations, memory and threads.

The saltthenpasswordhash field is the salt followed by the hashed password for this user.  Other articles (by other people) describe the reasons why each password should have its own salt and not a global salt.  Nowadays, it is common for libraries to automatically generate a salt when creating new hashes so there is less reason for me to go over those arguments here.  Since salts are generated along with the hash, it is easier to do it the right way and harder to do it the wrong way, with a global salt.
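Pulling the three fields back out of a stored string is a one-line split.  The sketch below assumes every string has exactly the three fields shown above (the arguments field may be empty but its "$" is still present); ShadowRecord and parse_shadow are names invented for this example.

```python
from typing import NamedTuple


class ShadowRecord(NamedTuple):
    algorithm: str      # e.g. "2b" or "argon2i"
    arguments: str      # e.g. a bcrypt work factor; may be empty
    salt_and_hash: str  # salt followed by the password hash


def parse_shadow(stored: str) -> ShadowRecord:
    """Split '$algorithm$arguments$saltthenpasswordhash' into its fields."""
    parts = stored.split("$")
    # The leading "$" produces an empty first element, so expect 4 parts.
    if len(parts) != 4 or parts[0] != "":
        raise ValueError("not in $algorithm$arguments$salt+hash form")
    return ShadowRecord(parts[1], parts[2], parts[3])


if __name__ == "__main__":
    # Placeholder salt and hash text, not a real bcrypt string.
    print(parse_shadow("$2b$12$saltsaltsaltsaltsaltsahashhashhash"))
```

How the last field divides into salt and hash depends on the algorithm.  For bcrypt, for example, it is a 22-character salt followed by a 31-character hash.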

Storing all this information in a single string in the "password" column of your database using the standard /etc/shadow format will satisfy the first requirement that hashing algorithms be assignable per user.

For new users, I use a global variable to indicate which hashing algorithm to use.  When a new user is created, their password hash and the hashing algorithm (taken from the global variable at the time the account was created) are stored together.
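As a rough sketch of account creation, the snippet below uses PBKDF2 from Python's standard library as a stand-in for whatever algorithm you prefer.  The global names, the "pbkdf2-sha256" label, the iteration count, the 16-byte salt and the hex encoding are all assumptions made for this example.

```python
import hashlib
import os

# Hypothetical global settings: the algorithm and arguments for NEW accounts.
CURRENT_ALGORITHM = "pbkdf2-sha256"
CURRENT_ARGUMENTS = "100000"  # PBKDF2 iteration count
SALT_BYTES = 16


def hash_new_password(client_hash: str) -> str:
    """Hash the client-side SHA-256 string with a fresh per-user salt.

    Returns '$algorithm$arguments$<salt hex><hash hex>' ready to be stored
    in the "password" column.
    """
    salt = os.urandom(SALT_BYTES)  # new random salt for every user
    digest = hashlib.pbkdf2_hmac(
        "sha256", client_hash.encode("utf-8"), salt, int(CURRENT_ARGUMENTS)
    )
    return f"${CURRENT_ALGORITHM}${CURRENT_ARGUMENTS}${salt.hex()}{digest.hex()}"


if __name__ == "__main__":
    print(hash_new_password("ab" * 32))  # stand-in for a client-side hash
```

Because the algorithm and arguments are written into the string at creation time, changing the globals later does not invalidate existing accounts.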

When a user logs in, the server takes the hashing algorithm, arguments and salt from that user's string in the "password" column of the database and uses them to hash the client-side SHA-256 hash.  Then, the hash from the login attempt is compared to the hash from the database, preferably using the constant-time comparison function from your crypto library (to blunt timing attacks).  If the hashes are not equal, the login attempt fails.
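A login check along those lines might look like the sketch below.  It assumes the stored string was produced with PBKDF2-SHA256, a 16-byte salt and hex encoding (my choices for illustration, not part of the shadow format); verify_login is a made-up name.  hmac.compare_digest is the constant-time comparison in Python's standard library.

```python
import hashlib
import hmac

SALT_HEX_CHARS = 32  # assumed: 16-byte salt, hex-encoded


def verify_login(stored: str, client_hash: str) -> bool:
    """Re-hash the login attempt with the stored parameters and compare."""
    _, algorithm, arguments, salt_and_hash = stored.split("$")
    if algorithm != "pbkdf2-sha256":
        raise ValueError(f"unsupported algorithm: {algorithm}")
    salt = bytes.fromhex(salt_and_hash[:SALT_HEX_CHARS])
    expected = bytes.fromhex(salt_and_hash[SALT_HEX_CHARS:])
    candidate = hashlib.pbkdf2_hmac(
        "sha256", client_hash.encode("utf-8"), salt, int(arguments)
    )
    # Constant-time comparison, so timing does not leak how many bytes matched.
    return hmac.compare_digest(candidate, expected)


if __name__ == "__main__":
    salt = bytes(range(16))
    digest = hashlib.pbkdf2_hmac("sha256", b"abc", salt, 1000)
    stored = "$pbkdf2-sha256$1000$" + salt.hex() + digest.hex()
    print(verify_login(stored, "abc"), verify_login(stored, "abd"))
```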

If the hashes are equal, the login attempt succeeds but you're not quite done yet.  The user's current hashing algorithm is compared to the hashing algorithm in the global variable.  If they are not the same, the client-side SHA-256 hash (which has already been validated to contain the correct password) is hashed using the global variable's hashing algorithm.  The result, in shadow password format, is stored into the "password" column: the user has been upgraded on-the-fly to the latest algorithm.
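The upgrade step can be sketched as follows.  A plain dict stands in for the database, two PBKDF2 variants stand in for "old" and "new" algorithms, and the names CURRENT, make_record and login are invented for the example.  One liberty taken here: the sketch also re-hashes when only the arguments differ, so a raised work factor is picked up the same way as a new algorithm.

```python
import hashlib
import hmac
import os

# Hypothetical global setting: (algorithm, arguments) for fresh hashes.
CURRENT = ("pbkdf2-sha512", "50000")
SALT_BYTES = 16


def make_record(algorithm: str, arguments: str, secret: str, salt: bytes) -> str:
    """Return '$algorithm$arguments$<salt hex><hash hex>'."""
    if not algorithm.startswith("pbkdf2-"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest_name = algorithm.split("-", 1)[1]  # "sha256" or "sha512"
    digest = hashlib.pbkdf2_hmac(
        digest_name, secret.encode("utf-8"), salt, int(arguments)
    )
    return f"${algorithm}${arguments}${salt.hex()}{digest.hex()}"


def login(db: dict, user: str, client_hash: str) -> bool:
    """Check a login attempt and upgrade the stored hash on success."""
    stored = db[user]
    _, algorithm, arguments, salt_and_hash = stored.split("$")
    salt = bytes.fromhex(salt_and_hash[: SALT_BYTES * 2])
    candidate = make_record(algorithm, arguments, client_hash, salt)
    if not hmac.compare_digest(candidate.encode("ascii"), stored.encode("ascii")):
        return False
    if (algorithm, arguments) != CURRENT:
        # The attempt is known to be correct, so re-hash it with the
        # current settings and a fresh salt, then overwrite the old string.
        db[user] = make_record(
            CURRENT[0], CURRENT[1], client_hash, os.urandom(SALT_BYTES)
        )
    return True
```

A failed login leaves the stored string untouched; only a verified attempt is ever re-hashed.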

The global variable provides the ability to "turn up" and "turn down" security by specifying different algorithms.  When the global variable is changed to a stronger (slower) algorithm, new users and any user logging in will be upgraded to the new algorithm seamlessly.  If you make a mistake and deploy an algorithm that bogs down your server, you can "turn down" security merely by changing that algorithm global variable.  Existing users will undergo a one-time delay when they log in using the too-slow algorithm and are downgraded to a faster algorithm.

If you design your login system so that a single function handles hashing new passwords, hashing a password with a given salt (for login checks) and comparing two hashes, you can deploy new hashing algorithms over the years by changing that one function.
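One way to arrange that, sketched under my own assumptions: password_hash below is the only function that knows about algorithms, and password_matches is a thin wrapper around it.  The function names, the argument encoding ("n,r,p" for scrypt) and the default values are illustrative.  hashlib.scrypt needs a Python built against OpenSSL 1.1 or later.

```python
import hashlib
import hmac
import os
from typing import Optional

SALT_BYTES = 16
CURRENT_ALGORITHM = "scrypt"  # hypothetical global setting
DEFAULT_ARGUMENTS = {"pbkdf2-sha256": "100000", "scrypt": "16384,8,1"}


def password_hash(secret: str, stored: Optional[str] = None) -> str:
    """Hash `secret` into '$algorithm$arguments$<salt hex><hash hex>'.

    With stored=None, use the current algorithm and a fresh salt (new
    passwords).  Otherwise reuse the algorithm, arguments and salt found
    in `stored` (login checks).
    """
    if stored is None:
        algorithm = CURRENT_ALGORITHM
        arguments = DEFAULT_ARGUMENTS[algorithm]
        salt = os.urandom(SALT_BYTES)
    else:
        _, algorithm, arguments, salt_and_hash = stored.split("$")
        salt = bytes.fromhex(salt_and_hash[: SALT_BYTES * 2])
    data = secret.encode("utf-8")
    # Supporting a new algorithm means adding one branch here.
    if algorithm == "pbkdf2-sha256":
        digest = hashlib.pbkdf2_hmac("sha256", data, salt, int(arguments))
    elif algorithm == "scrypt":
        n, r, p = (int(x) for x in arguments.split(","))
        digest = hashlib.scrypt(data, salt=salt, n=n, r=r, p=p, dklen=32)
    else:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return f"${algorithm}${arguments}${salt.hex()}{digest.hex()}"


def password_matches(secret: str, stored: str) -> bool:
    """Constant-time comparison of a login attempt against a stored string."""
    attempt = password_hash(secret, stored)
    return hmac.compare_digest(attempt.encode("ascii"), stored.encode("ascii"))
```

Everything else in the login system only ever passes strings in and out of these two functions, so it never has to change when an algorithm is added or retired.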

Rather than just sending passwords in the clear and hardcoding MD5 (oh, Yahoo!, will you never learn?!), a login system with client-side hashes and upgradable server-side hashing algorithms will give you a basic framework for the future.

I don't claim these practices to be the best or even adequate.  I'm not a security researcher and don't claim to be one.  But SHA-256 client-side hashes, the shadow password format and easy deployment of new server-side hashing algorithms are important issues that are worth debating and considering.  Hopefully, they will give you a place to start the conversation.