In my previous article I looked at how encryption worked for Microsoft Outlook PSTs. In this article I am going to see if the encryption is reversable and if coded passwords can be decrypted.
Lets look again at the encryption of 'adad' (based on the content of my previous article):
It | C | PV | BV | P | SV | M | NV |
1 | 'a' 0x61 (97) | 0x00000000 | 0x00 | 0x61 (97) | 0x00000000 | 0x3AB551CE | 0x3AB551CE |
2 | 'd' 0x64 (100) | 0x3AB551CE | 0xCE | 0xAA (170) | 0x003AB551 | 0x36034AF6 | 0x3639FFA7 |
3 | 'a' 0x61 (97) | 0x3639FFA7 | 0xA7 | 0xC6 (198) | 0x003639FF | 0x72076785 | 0x72315E7A |
4 | 'd' 0x64 (100) | 0x72315E7A | 0x7A | 0x1E (30) | 0x0072315E | 0xFA0F3D63 | 0xFA7D0C3D |
As can be seen, 'adad' is encrypted to 0xFA7D0C3D in 4 steps.
So, how can we get from 0xFA7D0C3D to 'adad'? At first glance I think it isn't possible.
Consider the way the password is encrypted. After iteration 1, we have a NV of 0x3AB551CE, which is then right-shifted in iteration 2 so that the 0xCE component is lost. It simply isn't possible to get from the NV 0x3639FFA7 in iteration 2 back to a PV of 0x3AB551CE. The best we could possibly do is get to 0x3AB551??. In addition, there is no way to calculate BV or C at any iteration, because we need two parts of either P, BV or C.
Unfortunately, it appears the only way to decrypt the password is to break it. The encryption is not reversible.
To start the break, I have the value 0xFA7D0C3D, and a list of potential modifier (M) values. Now remember, as I am reversing and breaking the encryption, I am working backwards. The first value I am using is for the last character of the password.
The first thing I do is work out all the possible modifier values that could have been applied to the value I have. This is easy to work out - because my value (NV) starts with 0xFA, I just need to find any modifiers that start with 0xFA, which yields the following:
- 0xFA0F3D63 - 0x001E
- 0xFAF83322 - 0x0177
Now we already know that modifiers at positions higher than 0xFF cannot be generated with the ASCII character set (from my previous article on encryption), so I can discard modifier number 2, which leaves me with a single possible modifier, 0xFA0F3D63.
Right, so I now have a value, a modifier and a position. From these, I can calculate a shifted value, which is 0x0072315E (V XOR M), which will give me a PV of 0x72315E?? (left-shift the SV by 8 bits). This gives me the following:
It | C | PV | BV | P | SV | M | NV |
? | '?' 0x?? | 0x72315e?? | 0x?? | 0x1e | 0x0072315e | 0xfa0f3d63 | 0xfa7d0c3d |
Now I still have no chance of calculating BV or C from P alone, but I can extrapolate potential values for these. I am going to substitute C for various different characters, and then XOR these with P to get an assumed BV. Technically, I should submit C for any valid character from the ASCII character set, but for the purposes of this article I am only going to work with the characters 'a' and 'd'. Any final decryption method would clearly need to implement a much larger character set.
Lets have a look at what results we get with each character:
- 'a' is character code 0x61, which would give BV = 0x61 XOR 0x1E (C XOR P), which gives a BV of 0x7F. This would result in a PV of 0x72315E7F.
- 'd' is character code 0x64, which would give BV = 0x7A. This would result in a PV of 0x72315E7A.
Because I have no way of knowing which PV is the correct value, I need to check each of them.
So, I use the first PV, 0x72315E7F (which came from character 'a'). I repeat the same process for this value as I did above for the first value (0xFA7D0C3D). This gives only one potential modifier, 0x72076785 from position 0xC6. I now calculate a potential SV and PV as so:
It | C | PV | BV | P | SV | M | NV |
-1 | '?' 0x?? | 0x3639fa?? | 0x?? | 0xc6 | 0x003639fa | 0x72076785 | 0x72315e7f |
? | 'a' 0x61 | 0x72315e7f | 0x7f | 0x1e | 0x0072315e | 0xfa0f3d63 | 0xfa7d0c3d |
And again I repeat the same process of calculating the BV through character substitution, which gives the following:
- C = 'a', BV = 0x61 XOR 0xC6, BV = 0xA7, PV = 0x3639FAA7
- C = 'd', BV = 0x64 XOR 0xC6, BV = 0xA2, PV = 0x3639FAA2
Following this I now take the PV 0x3639FAA7 and run it through the same process. This results in the following modifiers:
- 0x36034AF6 - 0x00AA
- 0x36b8DEAF - 0x01B7
Now I know that modifier positions above 0xFF aren't permitted so I can discard modifier 2, but again modifer 1 is valid so I calculate SV and potential PV with this:
It | C | PV | BV | P | SV | M | NV |
-2 | '?' 0x?? | 0x3ab051?? | 0x?? | 0xaa | 0x003ab051 | 0x36034af6 | 0x3639faa7 |
-1 | 'a' 0x61 | 0x3639faa7 | 0xa7 | 0xc6 | 0x003639fa | 0x72076785 | 0x72315e7f |
? | 'a' 0x61 | 0x72315e7f | 0x7f | 0x1e | 0x0072315e | 0xfa0f3d63 | 0xfa7d0c3d |
And again I repeat the same process of calculating the BV, which gives the following possibilities:
- C = 'a', BV = 0x61 XOR 0xAA, BV = 0xCB, PV = 0x3ab051CB
- C = 'd', BV = 0x64 XOR 0xAA, BV = 0xCE, PV = 0x3ab051CE
So I take the PV ox 0x3AB051CB and run it again through the same process. This gives a single potential modifier of 0x3AB551ce at position 0x61.
If I repeat the SV and PV calculations with this modifier, it gives me the following:
It | C | PV | BV | P | SV | M | NV |
-3 | '?' 0x?? | 0x050005?? | 0x?? | 0x61 | 0x00050005 | 0x3ab551ce | 0x3ab051cb |
-2 | 'a' 0x61 | 0x3ab051cb | 0xcb | 0xaa | 0x003ab051 | 0x36034af6 | 0x3639faa7 |
-1 | 'a' 0x61 | 0x3639faa7 | 0xa7 | 0xc6 | 0x003639fa | 0x72076785 | 0x72315e7f |
? | 'a' 0x61 | 0x72315e7f | 0x7f | 0x1e | 0x0072315e | 0xfa0f3d63 | 0xfa7d0c3d |
I recalcaulte BV, which gives these possibilites:
- C = 'a', BV = 0x61 XOR 0x61, BV = 0x00, PV = 0x05000500
- C = 'd', BV = 0x64 XOR 0x61, BV = 0xCE, PV = 0x05000505
Lets again take the first PV, 0x05000500, and run it through the same process, which yields a single modifier of 0x05005713 at position 0xC7.
This time, if I calculate the SV and PV values, I get:
It | C | PV | BV | P | SV | M | NV |
-4 | '?' 0x?? | 0x005213?? | 0x?? | 0xc7 | 0x00005213 | 0x05005713 | 0x05000500 |
-3 | 'a' 0x61 | 0x05000500 | 0x00 | 0x61 | 0x00050005 | 0x3ab551ce | 0x3ab051cb |
-2 | 'a' 0x61 | 0x3ab051cb | 0xcb | 0xaa | 0x003ab051 | 0x36034af6 | 0x3639faa7 |
-1 | 'a' 0x61 | 0x3639faa7 | 0xa7 | 0xc6 | 0x003639fa | 0x72076785 | 0x72315e7f |
? | 'a' 0x61 | 0x72315e7f | 0x7f | 0x1e | 0x0072315e | 0xfa0f3d63 | 0xfa7d0c3d |
However, this time I know there aren't any possibilites. A PV of less than 0x01000000 will not yield any modifiers, because there are none that start with 0x00. This means that I know there are no valid letter combinations below where I am now, '?aaaa'.
So I can discard this particular path and move back up to the next possibility, which is at '?daaa'. This has an identical modifier list (because it also starts with a NV of 0x05) and I already know that this will not yield a valid PV, so I can also discard this possibility.
This means I now need to move on to the next possibility, which begins at '?daa'.
I repeat the same process over and over, moving through the other potentials, until I manage to dismiss all possibilites that start '?a'. So I know that the password must be '?d', that is it ends with a 'd'.
So now if I go back to where I started, but work with a 'd' instead of an 'a', I get the following:
It | C | PV | BV | P | SV | M | NV |
-1 | '?' 0x?? | 0x3639ff?? | 0x?? | 0xc6 | 0x003639ff | 0x72076785 | 0x72315e7a |
? | 'd' 0x64 | 0x72315e7a | 0x7a | 0x1e | 0x0072315e | 0xfa0f3d63 | 0xfa7d0c3d |
I then reapply the same rules as before, which dismisses the combinations 'aaad' and 'daad', until I reach '?dad':
It | C | PV | BV | P | SV | M | NV |
-3 | '?' 0x?? | 0x000000?? | 0x?? | 0x61 | 0x00000000 | 0x3ab551ce | 0x3ab551ce |
-2 | 'd' 0x64 | 0x3ab551ce | 0xce | 0xaa | 0x003ab551 | 0x36034af6 | 0x3639ffa7 |
-1 | 'a' 0x61 | 0x3639ffa7 | 0xa7 | 0xc6 | 0x003639ff | 0x72076785 | 0x72315e7a |
? | 'd' 0x64 | 0x72315e7a | 0x7a | 0x1e | 0x0072315e | 0xfa0f3d63 | 0xfa7d0c3d |
Now you'll see we have a P of 0x61, and a SV of 0x0. Look what happens if I calculate the possible BV by using character substitution of C:
- C = 'a', BV = 0x61 XOR 0x61, BV = 0x00, PV = 0x00000000
- C = 'd', BV = 0x64 XOR 0x61, BV = 0x05, PV = 0x00000005
We know the 'd' cannot be valid because a PV of 0x5 is below the minimum allowed, which is 0x01000000, but look at the 'a' - it results in a PV of 0x0. This means there are no further calculations to do - we have broken the password - and given a final result of 'adad':
It | C | PV | BV | P | SV | M | NV |
-3 | 'a' 0x61 | 0x0000000 | 0x00 | 0x61 | 0x00000000 | 0x3ab551ce | 0x3ab551ce |
-2 | 'd' 0x64 | 0x3ab551ce | 0xce | 0xaa | 0x003ab551 | 0x36034af6 | 0x3639ffa7 |
-1 | 'a' 0x61 | 0x3639ffa7 | 0xa7 | 0xc6 | 0x003639ff | 0x72076785 | 0x72315e7a |
? | 'd' 0x64 | 0x72315e7a | 0x7a | 0x1e | 0x0072315e | 0xfa0f3d63 | 0xfa7d0c3d |
Now, I have demonstrated that the password encryption can be broken, and we can definitely get the valid password from the final coded value.
However, breaking the password in this way is very expensive. I have listed some important points that can affect the performance of the break:
CHARACTER SET
If you consider that at each level there are 223 (0xDF) ASCII character possibilities (assuming that control characters below 0x20 are ignored), and that there may be many possible modifiers at each level (up to 4), this means that at each level there could be 892 possibilities.
This is ok if the password is very short, but for a password of length 5 there are 564,708,431,199,232 combinations (over 564 trillion). At length 10 there are 318,895,612,267,497,741,289,677,389,824 (over 318 trillion trillion)!
We can reduce the cost of breaking passwords significantly if we make assumptions during the character substitution stage. We could work with specific character sets, which would reduce the cost as so:
Set | Poss (1) | Poss (length 5) | Poss (length 10) |
All 223 | 892 | 564,708,431,199,232 | 318,895,612,267,497,741,289,677,389,824 |
0-9A-Za-z | 248 | 938,120,019,968 | 880,069,171,864,760,718,721,024 |
a-z | 104 | 12,166,529,024 | 148,024,428,491,834,392,576 |
As you can see, reducing the available character set would make a huge difference to the number of possibilities. The scenario listed above, however, assumes the worst case - that there are 4 valid modifiers at each level, which is unlikely. For most PV there are 1 or 2 modifier values.
CHARACTER POSITIONING
In the English alphabet, certain letters are much more likely to appear in words than others.
Using this information, we could order the characters so the most likely characters to be used come first, which means more likely letter combinations are broken earlier in the process.
A good example would be to put the letter 'e' before all others, because it is the most common letter used in English words. Letters such as 'q' could be placed at the end of the list or maybe excluded altogether - just taking 'j' and 'q' out of the character set 'a-z' would reduce the possibilities in a 10 letter password by 55% - 80,000 trillion.
PASSWORD LENGTH
We know that the maximum password length for a PST password is 15 characters.
Using this knowledge, we can ensure that we do not waste time calculating impossible passwords. As soon as we reach a 16th decryption step, we can assume that this is not a possible password.
Additionally, we can save significant time in processing if we make an assumption about the password length.
Consider the following, if I have to calculate all the possible passwords for the character set 'a-z' to a length of 15 characters with the maximum 4 modifiers at each level, there are ((26 * 4) to the 15) possibilities - an amazing 1,800,943,505,506,915,684,277,314,125,824!
However, if I assume the maximum password size is 8, then this becomes ((26 * 4) to the 8) possibilities - a mere 13,685,690,504,052,736. This is over 131 trillion times less possibilities.
Obviously this is several orders of magniture less, and will require significantly less effort to break.
LOWEST LEVEL LETTER SUBSTITUTION
At the bottom level of a decryption break, where there is a NV of 0, we might have a remaining character that is not part of our character set.
A good example is the password 'fred', which encrypts to 0x1B6D2E42.
This password can be broken using the standard method, but by allowing the lowest letter of the password to be substituted by any valid ASCII character, it is actually possible to break this faster. The result is the password '¹mrzab', which can be copied and pasted into Outlook.
If we have assumed our password length is 8 or less characters, this will result in a significantly faster break (over 50%) for passwords such as these, and there is no additional cost involved for us in allowing this.
The only issue is that you may not necessarily obtain the original password.
Well that concludes my work on Outlook PST password decryption. I believe I have successfully demonstrated that Outlook PST passwords are vulnerable.
Although, as with SourceSafe passwords, I wonder whether there is really a problem that the passwords are vulnerable. Anyone who is concerned about keeping the e-mail very secure will no doubt implement Microsoft Exchange with Active Directory integrated security - which is infinitely more secure than a PST on the desktop.
As always, if you want to keep your data secure use a long password with mixed case letters, numbers and symbols (if permitted). Using such a technique with Outlook PSTs would significantly increase the amount of work required to perform a break by several orders of magnitude. You have been warned...