TPM Carte Blanche


I like to tinker with the TPM in my spare time. It’s like a great big box of security legos, or like a drier, punk-er form of Minecraft. It’s pretty fun.

In 2017 I had the privilege of working on mitigations for an issue called ROCA. From then on, I’ve been fascinated by the idea of the Trusted Computing Base.

I believe that Murphy’s Law applies to code just as much as it does to which way buttered toast will fall, or whether two intersecting clues in the New York Times crossword will be unusual names of minor celebrities from the 1970s. The meaning of the TCB isn’t so much “your system is safe because of this smart stuff in this box” as it is “your system is utterly booched if (when) we find any important mistakes in this box”.

Also, the number of mistakes we know about in any given box is a monotonically increasing function of time. Nobody says “good news, we’ve discovered some unexpectedly correct behavior in your kernel.”

I recently came into possession of a Surface Pro 3, which is a machine that I happen to know shipped with TPMs affected by ROCA. I thought “ah, this is my chance to apply my superficial understanding of finite-field arithmetic to learn some more about this bug and how it was discovered.” So, I installed Linux on it, sshed in, and installed some TPM tools. My go-to “hello world” TPM tool is gotpm, and reading the PCRs is a pretty basic TPM activity that lets you know you’re talking to a TPM.

So, when I was greeted with these PCRs, I knew that obviously I had made a mistake, and I was talking to some simulated, shim TPM or something:

0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
52dafc83858586083a5b09d80f4c75e77180691ee717d30bc07194713d5884b3
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
0000000000000000000000000000000000000000000000000000000000000000

Except I wasn’t. This was my real TPM. For a sense of which of these rows of nonsense are supposed to not all be 0’s or f’s, see the SHA1 bank:

3dcaea25dc86554d94b94aa5bc8f735a49212af8
b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236
b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236
b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236
27b2c869333dbe59c520294ffb652964da78b7ce
fe1b29141dcded019fe3423df304b5676c6c58d6
b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236
03e7b21f363721d4a04a550602c0742291f735b4
c7e3ff980c58cd67bc554519847b585de6b9bd33
6167364dca8c424a2f5b1a9b3df5f121ddcfab4b
e13d0ffa292ef1e530005679978a0c5aae8967d3
0000000000000000000000000000000000000000
0000000000000000000000000000000000000000
0000000000000000000000000000000000000000
0000000000000000000000000000000000000000
0000000000000000000000000000000000000000
0000000000000000000000000000000000000000
ffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffff
0000000000000000000000000000000000000000

This is pretty bad, because a bank of empty PCRs means an ~~attacker~~ researcher can just boot up a custom OS that doesn’t make any measurements, and use simple tools to extend whatever they want into those PCRs and attest them honestly (from the TPM’s point of view).
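To make that concrete, here is a minimal sketch of the extend operation in plain Python, with no TPM involved. The one thing it takes from the TPM itself is the extend rule (new PCR value = hash of the old value concatenated with the incoming digest). The helper names `extend` and `replay` and the three example "boot components" are made up for illustration. The point is that a bank still sitting at all zeros will happily accept a replay of somebody else's event log, and ends up at exactly the value an honest boot would have produced.

```python
import hashlib

def extend(pcr: bytes, digest: bytes, alg: str = "sha256") -> bytes:
    """PCR extend rule: new value = H(old value || incoming digest)."""
    return hashlib.new(alg, pcr + digest).digest()

def replay(event_digests, alg: str = "sha256") -> bytes:
    """Start from an all-zero PCR and extend each digest in order."""
    pcr = bytes(hashlib.new(alg).digest_size)
    for d in event_digests:
        pcr = extend(pcr, d, alg)
    return pcr

# Hypothetical boot components; a real event log would carry their digests.
components = [b"firmware image", b"bootloader", b"kernel"]
event_log = [hashlib.sha256(c).digest() for c in components]

# What a well-behaved boot chain would leave in the PCR.
expected = replay(event_log)

# What anyone can produce later if the bank was left empty: they never ran
# those components, they only replayed the digests from the log.
forged = replay(event_log)

print(forged.hex())
print(forged == expected)
```

The hash alone cannot tell the two apart. The only thing that stops the replay on a healthy system is that the early firmware has already extended its own measurements before anyone else gets a turn.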

So, what prevents this happening normally, on all of the computers all of the time? Well, theoretically your BIOS has a well-behaved, immutable initial boot block that always makes proper measurements into the TPM, and whenever it fails to make proper measurements into the TPM it creates a small wormhole and sucks itself and your BitLocker key into the Negative Zone or, like, North Dakota, or something.

It turns out the code in your initial boot block is more or less like code you might find elsewhere in your computer, and you should not assume it’s better just because it’s more important.

I have three takeaways from working on this project:

1. Building a measured and/or verified (aka Secure) boot attestation system without considering the TCB is like building an ice sculpture on top of a dumpster full of matches. Like, chock full, to the brim of matches. And it’s in Oklahoma in July. Your plans are neat and also doomed.
2. People who build security frameworks should prefer simplicity over elegance and flexibility. It’s easier not to notice that one of the PCR banks hasn’t been capped by the firmware when there are N banks of them (one per hash algorithm) and all the software running on the system can interact with whichever bank suits its delicate, not-crypto-agile preferences. If there were only ever one bank of PCRs active at a time, it would be harder to miss a bug like this. (Note I didn’t say it would be impossible.)
3. The best place for the TCB is in the immutable ROM of something that’s very hard to probe, swap, or tamper with. Code in ROM should measure/verify “mutable” code (and by “mutable” I really mean “your threat model shouldn’t assume your adversary can’t afford one of these”). This step should be as simple as possible, with respect for the fact that if it’s broken you may need to throw away hardware. TCG DICE is an example of up-and-coming new hotness in this area.
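For what it's worth, the multi-bank problem is cheap to check for once you know to look. The sketch below is a hypothetical helper (`unmeasured_banks` is my own name, not part of any TPM tool) that takes PCR values per bank as hex strings and flags any bank whose early boot PCRs are all still zero. Treating PCRs 0 through 7 as "the ones firmware should have touched" is an assumption for illustration. The sample data is the first eight rows of each bank shown earlier.

```python
def unmeasured_banks(banks, boot_pcrs=range(0, 8)):
    """Return the names of banks whose boot PCRs are all still zero.

    banks maps a bank name (e.g. "sha1") to a list of hex strings,
    indexed by PCR number.
    """
    flagged = []
    for name, pcrs in banks.items():
        if all(set(pcrs[i]) == {"0"} for i in boot_pcrs):
            flagged.append(name)
    return flagged

banks = {
    "sha1": [
        "3dcaea25dc86554d94b94aa5bc8f735a49212af8",
        "b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236",
        "b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236",
        "b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236",
        "27b2c869333dbe59c520294ffb652964da78b7ce",
        "fe1b29141dcded019fe3423df304b5676c6c58d6",
        "b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236",
        "03e7b21f363721d4a04a550602c0742291f735b4",
    ],
    "sha256": ["0" * 64] * 8,
}

print(unmeasured_banks(banks))  # ['sha256']
```

A check like this only tells you a bank looks untouched at the moment you read it. It says nothing about whether the populated bank was measured honestly.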

Microsoft’s advisory was published on 2021-10-18.

For more information, see the full writeup at google/security-research.

Opinions expressed here are my own and do not represent the official positions of any employer(s) of mine, past or present.