Basic principles
Randomess is measured in entropy bits. A perfectly random sequence of letters from a 26-letter alphabet contains 4.7 bits per letter, but regular English contains only 1.6 bits per letter, spaces and punctuation excluded. This means that we need to concentrate that randomness by a at least a factor of three, since a one-time pad must consist of perfectly random symbols in order to ensure unbreakability. Perfectly random decimal digits contain 3.5 bits of entropy per digit, but this does not result in any savings since the text must first be encoded as digits if we go this way, which stretches out its entropy over more digits.
Each of the digits 0 to 9 appears pretty exactly one-tenth of the time in a random decimal string. Each letter appears one twenty-sixth of the time in a random string of letters, but in normal text some letters (E,A,O, etc.) appear much more frequently than others (X,Y,Z,Q, and so forth). Even if a piece of text is encoded as digits it is expected that some digits will be seen more often than others. This effect is called bias, and it must be eliminated from a string if it is to form the basis for a one-time pad of sorts.
Additionally, consecutive random digits or letters bear no relation to one another, so that an "A" is as likely to follow "Q" as the letter "U". But in real text, "U" follows "Q" nearly all the time. This is called dependence, and it must also be eliminated.
In order to eliminate bias and dependence, the methods I show here use several tricks:
Each of the digits 0 to 9 appears pretty exactly one-tenth of the time in a random decimal string. Each letter appears one twenty-sixth of the time in a random string of letters, but in normal text some letters (E,A,O, etc.) appear much more frequently than others (X,Y,Z,Q, and so forth). Even if a piece of text is encoded as digits it is expected that some digits will be seen more often than others. This effect is called bias, and it must be eliminated from a string if it is to form the basis for a one-time pad of sorts.
Additionally, consecutive random digits or letters bear no relation to one another, so that an "A" is as likely to follow "Q" as the letter "U". But in real text, "U" follows "Q" nearly all the time. This is called dependence, and it must also be eliminated.
In order to eliminate bias and dependence, the methods I show here use several tricks:
- Multiple addition or subtraction. If you add together (without carries) two different strings of biased digits, the result is less biased. Dependence is also lessened. Even though bias and dependence are not fully erased but simply diminished, this is useful because it is so easy to do. Ciphers BookPad and TripleText are based on this.
- Lagged Fibonacci. The effect is similar if you add to a decimal string numbers taken from the same string but at a different location. Even better, the process somehow ensures that all digits appear with the same frequency, though dependence is still a problem. I've found that doing this twice to a piece of text erases bias and short-distance dependence, though long-distance dependence is still there. The process is not unlike keeping only the last digit of the numbers in the Fibonacci series, hence the name.
- Von Neumann extractor. About sixty years ago, Hungarian mathematician John Von Neumann showed a simple way to use the result of throwing a biased coin in order to produce a non-biased binary sequence. The trick is to take the results in groups of two and use the relative order of the results, rather than the values, in order to derive the final results. This trick can be extended to sets of 6, 24, 120, etc. (whole factorials), ensuring perfect lack of bias. The only requirement and that the individual results be independent, which can be done by drawing them from spots in the text that are sufficiently far apart.
- Transposition. The von Neumann extractor needs the bits or characters to be statistically independent of one another before it is applied. The characters of regular text are not independent by any stretch if they are contiguous, but distant characters are. Shuffling the text by means of a simple columnar transposition puts originally distant characters next to each other so they can be used by the other algorithms.
To encode, or not to encode
The principles outlined above apply to strings of decimal digits as well as to strings of letters, or even binary strings. After all, letters drawn from the standard Latin alphabet can also be considered to be numbers and handled as such, and in that case we will be doing operations in base 26 instead of base 10. Base 26 operations are a little more difficult to do since we did not learn them in kindergarten, but they can be carried out easily with the help of a Tabula Recta (already invented by the the XVI Century) like the one in the picture.
Therefore, we can do the operations directly on the text without having to encode into digits. The operations themselves are a little more involved, but then we'd be skipping the encoding and decoding steps entirely, which might be worthwhile. |
oHere's the classification of the methods described in these pages, according to the basic principle involved and whether or not encoding into numbers is used (FilePad, described at the botom, is a case apart):
Multiple addition |
Lagged Fibonacci |
Von Neumann extractor |
|
Encoding is used |
BookPad |
Numeracci |
DicePad |
Direct letter operations |
TripleText, Snake |
Letteracci, Subtracci |
LetterPad |
And now, a short description of each method. Full descriptions are linked from the next page (Download), which also contains links to JavaScript versions of each. In all that follows the decrypted message is called "plaintext" (whether or not it is encoded), the encrypted message is called "ciphertext", the piece of text used as source for making the one-time pad is called "key text", and finally the one-time pad material derived from it is called "keystream". Addition and subtraction involve no carries unless specified:
Update: As it turns out, these algorithms still work pretty well (especially the lagged Fibonacci generator) if the key text is short (a few dozen characters, which is typically a lot shorter than the plaintext), so I've made a few other programs that work with relatively short text keys rather than keys at least as long as the plaintext. These ciphers are quite resistant to ciphertext-only attacks, which make them better than most classical ciphers out there, but unfortunately they are as vulnerable to known-plaintext attacks as the classical ciphers. A table follows, and after that a description of each one, with links to JavaScript programs implementing them.
- BookPad. Encode the plaintext and the key text into decimal digits. Then take a piece of encoded key text an integer number of times the length of the encoded plaintext, (at least three), split it into parts of encoded plaintext length and add them together. This results in the keystream, from which the encoded plaintext is subtracted in order to obtain the ciphertext. Decryption is the same, except that the ciphertext is subtracted from the keystream, resulting in the encoded plaintext which is then decoded back to the original. BookPad with checkerboard encoding can handle full text with spaces. The encoding can be based on the default checkerboard or (preferred) on a checkerboard derived from the first sentence of the key text.
- TripleText. Same as BookPad, except that no encoding is used and addition and subtraction are performed directly on letters (no spaces) using a Tabula Recta. The Tabula Recta itself can be based on the straight alphabet or (preferred) on an alphabet derived from the first sentence of the key text.
- Snake. Same as TripleText, except that subtractions are used rather than additions. This actually makes the operations a lot more ergonomic, so that four letters can be combined with a single snake-like movement from letter to letter within the Tabula Recta.
- Numeracci. Encode the plaintext and the key text into decimal digits. Then take a piece of encoded key text one digit longer than the encoded plaintext, then place the extra digit below the first and perform at least two extra rows of a lagged Fibonacci generator. This is done this way: add every two digits found vertically and place the result to the right of the bottom digit until a row is completed, then place the last result below the first digit of the bottom row and go on. This results in the keystream, from which the encoded plaintext is subtracted in order to obtain the ciphertext. Decryption is the same, except that the ciphertext is subtracted from the keystream, resulting in the encoded plaintext which is then decoded back to the original. As in BookPad with checkerboard encoding, full text with spaces can be used. The encoding can be based on the default checkerboard or (preferred) on a checkerboard derived from the first sentence of the key text.
- Letteracci. Same as Numeracci, except that no encoding is used and addition and subtraction are performed directly on letters (no spaces) using a Tabula Recta. The Tabula Recta itself can be based on the straight alphabet or (preferred) on an alphabet derived from the first sentence of the key text.
- Subtracci. Same as Letteracci, except that subtractions using the Tabula Recta are used in the lagged Fibonacci step, rather than additions. This is considerably easier to do, and does not require a purpose-made Tabula Recta even when it derives from the key text. Incidentally, subtraction can also be used when simply combining three pieces of text, as in TripleText, resulting in faster processing without reduced security.
- DicePad. Encode the plaintext into base 6 digits using a 6x6 Polybius square. Then take a piece of key text (not encoded) at least three times the length of the encoded plaintext, and split it into blocks of a previously agreed upon size, making three rows with each block. Then look at the relative order of the letters in vertical order and deduce a 0-6 code from each (triads with a repeated letters won't yield a code) . This results in the base 6 keystream, from which the encoded plaintext is subtracted (base 6) in order to obtain the ciphertext. Decryption is the same, except that the ciphertext is subtracted from the keystream, resulting in the encoded plaintext which is then decoded back to the original. DicePad can handle text and decimal digits, but no spaces. The encoding can be based on the default square or (preferred) on a square derived from the first sentence of the key text.
- LetterPad. Like DicePad, but without previous encoding. The key text blocks are made into four rows rather than three, so that the relative order of each vertical group of four letters gives codes from 0 to 23, which can then be converted to letters (a 24-letter alphabet is used for the plaintext as well) in order to obtain the keystream. The final operation is subtracting the plaintext from the keystream using the Tabula Recta. The Tabula Recta itself can be based on the straight alphabet or (preferred) on an alphabet derived from the first sentence of the key text.
- FilePad. This one uses the operations described above on the bits of a computer file. Typically, the number of bits is first doubled through a special "inflation" operation consisting of taking every other bit and appending it to the end of the file. Then the bits are subject to a transposition, then another inflation, and finally a von Neumann extractor to end up with roughly the same number of bits as at the start. Or one can begin with a lagged Fibonacci, followed by a transposition, and then another lagged Fibonacci and another transposition. Either way, the result is a binary sequence with very good statistical randomness, which can then be used to encrypt a second file with a simple XOR operation. Both the randomized "key file" and the encrypted file can then be downloaded.
- BytePad. An update to FilePad that operates with bytes rather than bits (1 byte = 8 bits). Its Javascript implementation runs super-fast and can handle very large files. It drops the inflation stage and von Neumann extractor of FilePad, and focuses on block transpositions and lagged Fibonacci generators. Since all these operations are reversible, the output is finally split in two halves, which are then xored with each other.
Update: As it turns out, these algorithms still work pretty well (especially the lagged Fibonacci generator) if the key text is short (a few dozen characters, which is typically a lot shorter than the plaintext), so I've made a few other programs that work with relatively short text keys rather than keys at least as long as the plaintext. These ciphers are quite resistant to ciphertext-only attacks, which make them better than most classical ciphers out there, but unfortunately they are as vulnerable to known-plaintext attacks as the classical ciphers. A table follows, and after that a description of each one, with links to JavaScript programs implementing them.
2-letter operations |
3-letter operations |
4-letter operations |
5-letter operations |
Shuffle operations |
|
for messages |
Visionnaire, FibonaRNG |
Zigzag |
Skink, Worm |
Serpentine |
Scrabble |
for files |
Filenaire, FiboFile |
FileZag |
FileSkink, FileWorm |
Fileine |
- |
- Visionnaire: This is essentially Blaise the Vigenere's classic Autokey cipher, where the Tabula Recta used for the operations includes substitutions at input and output. The strength of the cipher actually comes mainly from these substitutions rather than the autokey key (which I will call "seed" in these instructions). This is the process: 1, take the keys (there can be 2 of them, for a considerably larger key space, or just one, in which case the one key is used for both input and output substitutions) and start writing new letters as they appear; if a letter appears twice, take the first available letter in the alphabet instead; when you come to the end of the key, write the rest of the alphabet in reverse order. Thus each key generates a "mixed alphabet," one of which is placed at the top of the Tabula Recta, the other on the left side. Now take the seed, write it above the plaintext, and do the following until you use all the letters in the seed: look up a plaintext letter at the top of the Tabula Recta and go down that column until you find the seed letter above it, then left to read a ciphertext letter from the mixed alphabet on that side, and write it below the corresponding plaintext letter. After you use all the seed letters, take a plaintext letter instead, beginning with the first; mark each plaintext letter used so you don't use it again and keep moving forward until all the spaces below the plaintext have been filled. This is the ciphertext. To decrypt, the process is nearly the same, using ciphertext letters instead of plaintext, except that you draw plaintext letters rather than ciphertext as the second letter of the operation after you run out of seed, and look up the ciphertext letters on the side of the Tabula Recta, rather than the top, and read off the plaintext at the top rather than the left side. In addition to regular text written with 26 different characters, it is possible to involve alphabets with a different character set, so I have made a JavaScript program, which I call Filenaire, which takes a file, loads it into the browser as a base64 blob, and applies the same algorithm as Visionnaire but in base 64. The output is a file with extension .aire, which can be later decrypted by supplying the same keys and seed used to encrypt the original file.
- Zigzag: Visionnaire involves only two pieces of text in each operation, which is simple to do but leads to a ciphertext that is far from random. Zigzag produces a more random-like ciphertext by involving three letters in each operation after the initial phase where the seed letters are used. This way for the first one: look up a plaintext letter at the top, then go down to find the first plaintext letter, then left or right to find the second plaintext letter, then down to read the ciphertext letter at the bottom. Mark the first plaintext letter so it doesn't get used again and keep repeating the process with all the letters involved shifted one space to the right. Because of the odd number of "legs" in the operation, the mixed alphabet at the bottom must be the negative version of the one on the left, the one on the right (used when decrypting), the negative version of the one at the top. To make a negative alphabet, take the leftmost letter of the positive alphabet, and then write on the right of it the rest of the letters in reverse order. Its base64 version for encrypting files is FileZag.
- Skink: This is a lot like Zigzag, but involving four letters in each operation after the initial phase, which is identical. Then it goes this way: look up the plaintext letter to be encrypted at the top of the Tabula Recta, go down to find the plaintext letter located a number of spaces equal to the seed length to the left of that one, then right or left to find the plaintext letter to the right of that one, then up or down to find the ciphertext letter below the second letter used, and finally left or right to read the result. Decryption is similar, except that you start on the side and end at top or bottom, and the letters involved after the first are first drawn from the bottom row (ciphertext), and then the last from the top. No negative alphabets are necessary, which makes it slightly easier to set up than Zigzag. Its base64 version for files is FileSkink.
- Worm: The previous three ciphers are not reciprocal, meaning that the processes for encryption and decryption, though similar, are not exactly the same. Worm uses four letters per operation (again, after the initial set of operations involving the seed, which use only two) and only one mixed alphabet, which is placed at top and left, and the process is identical for encryption and decryption. The four-letter operations are like this the first time: take the plaintext letter and look it up at the top of the Tabula Recta, then go down until you find the first plaintext letter, then left or right to find the first seed letter, then up or down to find the first ciphertext letter, and finally left to read the ciphertext letter on the side. Alternatively, you can start on the left so long as you switch direction with each new letter. In this cipher, the repeated seed is used in every operation like one more text stream. It also has a base64 equivalent for files, called FileWorm.
- Serpentine: This reciprocal cipher involves five-letter operations on the Tabula Recta, which is about the limit of what one can do by hand, but it produces a ciphertext that is nearly indistinguishable from random letters. The first letter involved is the one immediately above the ciphertext letter than you want to fill and then a group of letters forming a square, in this order: first plaintext letter not yet re-used, the one after that, ciphertext letter below the first plaintext letter in this group, and then the one following it. Serpentine also has a base64 equivalent for files, which I have called Fileine.
- FibonaRNG: (read it as "Fibonaring") Like Visionnaire, this one involves two-letter operations, but with a twist: they are applied to the seed only, so that it is extended into a "keystream" that ends up having very good randomness, and which is then combined with the plaintext. This has two advantages: first, the resulting ciphertext is random even though the plaintext might repeat a lot; second, it resists a known plaintext attack much better since the attacker must find the keystream first, and this is protected by the encoding on the Tabula Recta. This is the overall process in the simplest mode: 1. Make two scrambled alphabets plus a seed from the key and place the alphabets at the top-bottom, and left-right sides of the Tabula Recta. 2. Take the first two letters of seed and combine them by looking up the first letter at the top of the Tabula Recta and going down until you find the second letter, then left or right to read the result at the sides; write this at the end of the seed; then take the second and third letters to extend the seed once again and keep doing the same until you have produced a number of new seed letters equal to the length of the plaintext, which is written right above them; we'll call this the keystream. 3. Now combine each plaintext letter with each keystream letter as in step 2, except that you write the result below each pair; this is the ciphertext. To decrypt, repeat the same steps with the ciphertext instead of plaintext, except that the operation in step 3 starts on the side and ends at top or bottom. The base64 version of FibonaRNG is FiboFile. This cipher ended up beating all the others here for inclusion in my crypto suite PassLok, renamed PassLok Human mode. It is the same as FibonaRNG, plus a trick to preserve spaces and punctuation, and a previous step to concentrate entropy from long text-based keys.
- Scrabble: This one is completely different, for it involves shuffling the letters in a couple alphabets made of Scrabble tiles. It goes this way: 1. Make a shuffled alphabet from the key and set it above a ruler using Scrabble tiles or similar; this alphabet will be used for the ciphertext; then make a straight alphabet, which will be used for plaintext, and place it right above the other one. 2 . Look up each plaintext letter on the uper alphabet and write down the letter below it as ciphertext. 3. After each letter lookup, take each pair of letters used (top and bottom) and switch it with the pair before it; if it was the first pair, switch it with the last pair; then take the first letter of the top alphabet and move it to the end, shifting all the letters in the top alphabet one step backward. For resistance against a pknown plaintext attack, prepend the plaintext with twenty random letters.
- Now, many of these short-key ciphers are vulnerable to a known-plaintext attack because the basic operations involved are linear. In order to prevent this attack, an easy solution is to use a keyed transposition of the ciphertext, which will be reversed as a first step when decrypting. I have made Javascript versions of some of the ciphers above with this additional step, and the result is SuperSkink, SuperWorm, and SuperFileine.