Till the Quantum computers are powerful enough that they can break the modern encryption standard, I think the current ones would hold good enough for the work that I want to do with Chamber. As such, I have decided that I would like to create a way to allow an advanced user to select one of the many available encryption mechanisms. However, one of the core “features” of Chamber is “ease of use”. In other words, I do NOT want a user to have to learn the details about how encryption works.
Military Grade
This is a term that was famous till a few years ago. That was because of AES. AES can be used with at least 3 key lengths, the most secure of which is AES-256 which has a key length of 256 bits, or 32 bytes and is considered fairly secure. It was recommended by FIPS (Federal Information Processing Standard) and is used for encrypting sensitive documents. So if I too were in delivering punchlines, I would also like to use “Military Grade” with Chamber as well.
Now, AES is an encryption standard. 256 is a bit size for encryption key. But there are multiple modes of encryption and I am going to use the GCM mode as that is the best mode balancing security, speed and availability as a mode in different programming languages as well as third party tools.
Derivation of the Encryption Key
I said “ease of use” earlier (and multiple times). I also mentioned that AES requires 256 bit key for encryption. But the password can’t be used as an encryption key for two reasons:
- Not all passwords would always be 32 bytes.
- It is exceptionally bad to use a password as a key. If one does so, then the encryption key, the most important component is already known and derivable by anyone.
Of course better minds than me already knew that. Hence, to derive an encryption key out of a text-based password, multiple algorithms exist. These algorithms are usually called KDF (Key Derivation Function). Some of the famous ones are PKBDF2, bcrypt, scrypt and Argon2. Now, Argon2 is a great KDF algorith. So I have decided to use that.
Simple. So what’s next?
It might sound simple - choose an algorithm, convert password into key, use key to encrypt and decrypt data. Nice!
Except, no!
Cryptography is not always so easy. In most cases where we need to derive a cryptographically secure content, we usually require a “salt”. And a salt should usually be random and must be saved separately from the password so that it can be reused to recreate the key. This is hard. Why?
Because when used, a salt is only a little less important than the key and storing it in the plaintext can also be a problem. I already said in the The Data Storage format for Chamber that we shall be using SQLite as the data store for Chamber and that only makes it easier to get that salt, if we store it in plaintext.
So what to do?
Beauty of Argon2id
There is something beautiful about Argon2id algorithm. It takes a total of 6 inputs:
password
: Passwordsalt
: Saltiter
: Iteration countmem
: Amount of memory to be usedthr
: Number of threads to usekeylen
: Length of the key
Out of these, the password
would be supplied by the user. The keylen
is fixed at 32
(32 Bytes or 256 bits are needed). Now, the beauty: If I change anything in that input, the output changes completely.
Revisiting the problem-set
Let’s look at each problem we face for this case and how we can handle them?
1. Password length
A password can be of any length. Someone can use password
as the password. Someone can also use just a
as the password. Chamber is an offline tool. If someone is able to derive a password hash, then they can try to brute-force the hash and it would be easy!
There are 2 defenses that I would be employing for this:
- Password policy: The password must at least be 8 characters long and must contain a mix of capitals, smalls and digits.
- Mix, match, expand: Before any hashing algorithm touches the input, the input password will be (deterministically) rotated and repeated along with some character substitutions thrown in so that the input becomes pretty large.
That should discourage the use of any kind of rainbow tables as they are not made for significantly large inputs (say 20 KB).
Nothing beats a big lock!
It does not matter how great the encryption or hashing is - they all fall flat if the password is easily guessable! Please use a strong password.
2. Hash strength
I remember a movie where the hero needs to get inside a locked room to get what he wants. He takes a look at the large lock protecting the door! So he goes back many steps, runs down hard and kicks the door with full force. The lock stays intact, but the entire door falls down (in the locked state) and the hero exclaims “the door needs to be strong too”!
The strength of the hash is the strength of the door in this case. If you have a strong password but the hashing mechanisms are weak, it won’t work. A strong password needs a very strong hashing mechanism.
There are the following defenses that I plan to use:
- Multi-layer hashing with expansion: The input string (the elongated version of the password described in the previous step) will go through a 8-step hashing with each step increasing the length of the hash used above along with similar mixing and matching on the input.
- Large hashes on each step: I plan on using a mix of SHA256 and SHA512 hashing algorithms with some salts (which will be stored in plaintext) in different layers of input. I am aiming for a 32 KB input for the last hashing step.
- Key Derivation at one of the layers: The Key Derivation will be done using one of the hashes from in the middle of the chain. I have thought of either layer 5 or 6 where the input would be fairly large. That way, even if (by some wonder) the input to the penultimate step is derived, the real key derivation would still not be possible.
3. Use of GPU to break the password
GPUs are known to do a lot of stuff in parallel and decrease the total time taken by orders of magnitude. But they are not a silver bullet. GPUs have 2 glaring limitations:
a) GPUs are SIMD devices
SIMD stands for “Single Instruction Multiple Data”. GPUs can take one instruction and run them all at the same time on different processor cores.
For Example, if you want to write a program which looks like this:
step1 := input * 2
step2 := step1 - 1
output := step1 + 2*(step2^2)
Then the GPU can run those instructions on maybe 8000 numbers all at once and give 8000 outputs in no time. But if you write something like this:
if (input % 2 == 0)
do_fizz_stuff();
else if (input % 3 == 0)
do_buzz_stuff();
else if (input % 11 && input % 13 == 0)
do_fizzbuzz_stuff();
else
do_sad_stuff();
Then the GPU can’t optimize all the conditional flows in parallel. It would most probably execute all the qualifying cases in one go for each function in one go but will wait for the rest 3 in that cycle.
To reduce branching, GPU code is refactored into a Map-Reduce method where all operations are mapped as serially in their operations as possible so that they can be fanned out and then later combined or Reduced to get the final output. But that is not always possible.
In short: GPUs are really bad with conditional code. The whole process will use as many if-elses as possible in a manner that cannot be reduced easily.
b) GPUs are short on memory
Think of the latest and the most powerful consumer Graphics Card available at the moment by NVidia - the RTX5090 . It has 32 GB of memory. That sounds like a lot but you have to understand 2 things about it.
- That is GDDR memory, not DDR. A GDDR memory contains various “types” of memory and not everything can be used by a typical program.
- It has over 21760 (over Twenty One Thousand) processing cores.
Even if you assume that entire 32 GB memory is available to be used, each core ideally has access to 32*1024/21760
MBs of memory. That is about 1.5 MB.
If the hashing process involves utilizing more than that in a fashion that none of that memory can be deallocated, then GPUs cannot be used for breaking the hash.
The Final Process of performing the Key Derivation
The final process will look something like the diagram below. Do understand that:
- The word “hash” will mean a hashing mechanism which involves a lot of if-else statements to cause code divergence in the GPU code to defeat the GPU’s parallelism advantage.
- The term R3E stands for Rotate, Repeat, Replace and Expand. Each such stage will perform the operations of Input rotation, repeating the input with rotation, replacing some characters (based on certain conditions) in each repetition and keep appending each such stage to the previous input to expand the input for the next stage.
- The final step can use values from any byte produced in any of the previous step.
I hope this will be good enough for most needs. In case you find problems, please reach back.