If you look closely, you will find that everything that was talked about in Setting Expectations for Chamber revolves pretty heavily around “creating the vault file”. In this document, we are going to lay the foundations for storing the data.
Chamber will NOT deal with raw storage media!
One of the things that I am clear about with Chamber is that I do not want to use Chamber to encrypt raw storage media like USB drives or Hard Disks (or SSDs). I want to use Chamber only to create encrypted vault files inside which other files can be kept.
A rehearsal of requirements
I looked at what I wanted and I told myself “have you even thought through what you are day-dreaming about?” (with a silent “you idiot” attached to it). So here is the rehearsal of the requirements again:
- I have to create a file system. This in itself is such a humongous task that it takes years for some of the greatest minds to get things right. If you are not familiar with file systems, creating one means that you have to deal with:
- Allocation of space when a new file is created, or an existing one is expanded.
- Deallocation of space when a file is edited to reduce the size, or is deleted.
- Creating an index for the files.
- Ability to create hard links within that index.
- Deal with durability and journalling requirements.
- Ability to stream file data to an application.
- Ability to modify file contents.
- Be able to copy, paste, move files.
- Deal with copy-on-write requirements.
- It must be “splittable”. This alone started breaking my head when I stopped to think about how to implement it. I mean, even if you manage to create a file system, making it span multiple splits means you are re-inventing something that can do RAID or LVM! Not an easy one. Not at all.
- We want it to be “mountable”. Mounting on a Unix-like system (such as macOS or Linux) would be the easier case, as there are libraries that can help. However, mounting requires you to deal with inodes, permissions and other things. Mounting on Windows is territory I don’t even know.
- But we do not want to impose a “file system”. One of the requirements of a vault created by Chamber (called a “Chamberfile”) is that it should not impose a file system, so that cross-compatibility can be guaranteed across various OSes. Since I am thinking of getting it to (eventually) work on mobile devices too, using an existing file system inside the vault would be impossible. The exFAT file system is a good candidate, but I can’t make it work (not easily enough, at least) with the requirements of splittability and durability.
- Durability is one of the promises that I want a vault to make, and something that a typical cross-platform file system won’t be able to give. Durability in a file system is achieved by using a “journal”, which acts more or less like a WAL (Write-Ahead Log). Journaling file systems are much, much harder to build than simple ones.
The rehearsal of my constraints and benefits
But then, I was not supposed to be dealing with raw devices; I had to create individual files. When you deal with raw devices, reading from and writing to them is, from a file system perspective, a very delicate task. I was not going to have to deal with that. I had to think more about “how would the chunks of my vault file be cached in the OS buffer?” rather than “how do you bring a particular cluster/page of data from disk into memory with a reliable flush-to-disk operation?”.
I was supposed to think about how to save the data coming from a plaintext file into another file in a way that the offsets could be recorded. I could make a separate area for the index within that file. And since I wanted it to be splittable, maybe I could even keep the index in a separate file!?
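To make that idea concrete, here is a minimal sketch in Python. The file names (`vault.bin`, `vault.idx`) and `CHUNK_SIZE` are made up for illustration and are not part of Chamber; the point is only that chunks go into one data file while a separate index records where each chunk landed.

```python
import json
import os
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # illustrative chunk size, not a Chamber constant

def store_file(plaintext_path: str, data_path: str = "vault.bin",
               index_path: str = "vault.idx") -> None:
    """Append a file's chunks to one data file and record their offsets in a separate index file."""
    entries = []
    with open(plaintext_path, "rb") as src, open(data_path, "ab") as dst:
        offset = dst.seek(0, os.SEEK_END)  # start appending at the current end of the data file
        while chunk := src.read(CHUNK_SIZE):
            dst.write(chunk)
            entries.append({"offset": offset, "length": len(chunk)})
            offset += len(chunk)
    # The index lives in its own file: the "separate file for the index" idea.
    index_file = Path(index_path)
    index = json.loads(index_file.read_text()) if index_file.exists() else {}
    index[plaintext_path] = entries
    index_file.write_text(json.dumps(index, indent=2))

if __name__ == "__main__":
    store_file("notes.txt")  # hypothetical plaintext file
```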
About encryption
I had thought of using AES-256 as the encryption mechanism. AES is a block cipher, which means it encrypts data in fixed-size 128-bit blocks (the “256” in AES-256 refers to the key length, not the block size). So if I could split a large file at a predefined MAX_CHUNK size which is a multiple of the cipher’s block size, then I could easily create splits, with each split containing a piece of the large file, encrypt the chunks independently, and join them back into one encrypted file (that is roughly what happens internally anyway).
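As an illustration, here is a minimal sketch of per-chunk encryption in Python. It uses the third-party `cryptography` package and the AES-GCM mode, which is my choice for the example (the post only commits to AES-256, not to a particular mode); the key handling and the `MAX_CHUNK` value are placeholders.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

MAX_CHUNK = 1024 * 1024  # 1 MiB, a multiple of the 128-bit AES block size

def encrypt_chunks(data: bytes, key: bytes) -> list[bytes]:
    """Encrypt a byte string chunk by chunk; each chunk could live in a different split."""
    aead = AESGCM(key)  # key is 32 bytes for AES-256
    encrypted = []
    for start in range(0, len(data), MAX_CHUNK):
        nonce = os.urandom(12)  # standard AES-GCM nonce length
        chunk = data[start:start + MAX_CHUNK]
        # Store the nonce next to the ciphertext so each chunk is self-contained.
        encrypted.append(nonce + aead.encrypt(nonce, chunk, None))
    return encrypted

if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)
    parts = encrypt_chunks(b"hello " * 100_000, key)
    print(len(parts), "encrypted chunks")
```

One note on the design: with an AEAD mode like GCM the chunk boundaries do not actually need to align to the 128-bit block size; that alignment constraint matters mainly if you lay out raw CBC/ECB-style blocks yourself.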
Can we use already existing tools?
I was able to see this far because I stood on the shoulders of giants.
Most of software engineering is about creating layers on top of existing stable layers. We have gone from punching holes in cards, to writing binary, to assembly, to C, to Java, and now we have enough technology to stare at beautiful strangers trying to get more likes and views.
“Hmmm… maybe something that already exists can solve this set of requirements.” was the thought.
And sometimes, as it happens, solutions reveal themselves just because your mind was looking for them, even though they have been around since day one. Or at least that’s what happened in this case.
I have been a full-stack (web) developer and have almost always been backend-heavy in my engineering. Once upon a time I had imagined (like many, many others like me) that if I kept user-uploaded assets like images and attachments inside a database, then backup and restore would be so much easier. In one of my earliest projects I actually tried it, even simulating a folder-based structure in the database (using a couple of tables). And this idea was already in my head, sitting in a corner.
So when SQLite got mentioned somewhere
I think it was in a blog post or a YouTube video. My mind suddenly clicked. It was the (almost) literal bulb-glowing-above-your-head moment for me.
SQLite. Hmmm… can it solve my problem? Let’s see:
- It is a database. So I can express the “index” of files pretty easily, and it will be fast to implement, execute and verify. I would also be able to add new kinds of attributes much faster, and search and filter files by any combination of attributes without much extra work.
- It is a file. SQLite databases are saved as single files. So if I can save the data and the file index in it, it will all be saved as a single file!
- I can split it. I can easily create two or more SQLite databases in a folder, and with some clever data-chunk rules and some well-written code, I could split a large vault into smaller parts, read from those multiple database files, and merge them back into one.
- Copy, paste, move, CoW, rename, search and listing were all easy to do with SQLite, at least compared to inventing a new file layout: all of these are just SQL queries! With some extra work, I could handle durability without even enabling SQLite’s WAL mode. Allocation now meant inserting new rows into a file. Deallocation meant deleting a few rows and asking SQLite to compact the database (a minimal schema sketch follows this list).
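Here is that sketch, assuming a schema I made up for illustration (a `files` table for the index and a `chunks` table for the data); it is not Chamber’s actual layout, just a demonstration of how naturally the operations above map onto SQL.

```python
import sqlite3

# A minimal, hypothetical Chamberfile schema: an index table plus a chunk table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id        INTEGER PRIMARY KEY,
    path      TEXT UNIQUE NOT NULL,   -- simulated folder structure, e.g. 'docs/plan.txt'
    size      INTEGER NOT NULL,
    modified  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS chunks (
    file_id   INTEGER NOT NULL REFERENCES files(id),
    seq       INTEGER NOT NULL,       -- chunk order within the file
    data      BLOB NOT NULL,          -- (encrypted) chunk bytes
    PRIMARY KEY (file_id, seq)
);
"""

con = sqlite3.connect("vault.chamber")  # illustrative file name
con.executescript(SCHEMA)

# "Allocation" is just inserting rows...
with con:
    cur = con.execute(
        "INSERT INTO files (path, size, modified) VALUES (?, ?, datetime('now'))",
        ("docs/plan.txt", 11),
    )
    con.execute("INSERT INTO chunks VALUES (?, 0, ?)", (cur.lastrowid, b"hello vault"))

# ...listing and searching are plain SQL queries...
print(con.execute("SELECT path, size FROM files WHERE path LIKE 'docs/%'").fetchall())

# ...and "deallocation" is a DELETE followed by compaction.
with con:
    con.execute("DELETE FROM chunks WHERE file_id = 1")
    con.execute("DELETE FROM files WHERE id = 1")
con.execute("VACUUM")
```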
That’s it. This is what I needed. Mounting would still have to be implemented by me for each OS separately, but this would be the answer to most, if not all, of the problems at the storage layer. One project, open source (open to view, but not to contribute, with strict guidelines and promises), with a blessing as its license. That was my answer.
That is what I would use for the storage layer - SQLite.