Sandsifter: find undocumented instructions and bugs on x86 CPU

mcculley · on July 28, 2017

This is great. That a program can learn about and exploit the CPU on which it is running from unprivileged userspace reminds me of the notion in Charlie Stross' Accelerando of running a timing attack against the universe to learn about the virtual machine in which we are being simulated.

michaelmior · on July 28, 2017

It would be even better if there was a web service that would collect these logs for different processors so everyone didn't have to invest the time to run the analysis.

tux1968 · on July 28, 2017

Well they do mention in the description:

The results of a scan can sometimes be difficult for the tools to automatically classify, and may require manual analysis. For help analyzing your results, feel free to send the ./data/log file to xoreaxeaxeax@gmail.com. No personal information, other than the processor make, model, and revision (from /proc/cpuinfo) are included in this log.

collinmanderson · on July 28, 2017

It would be nice if there were something like Geekbench where it publicly listed the results from different processors.

michaelmior · on Aug 1, 2017

That's great, but doesn't solve the problem of how long it takes to run a scan.

qubex · on July 28, 2017

I'd never heard of Charlie Stross or his Accelerando book. Thanks for mentioning that, it looks right up my hard-sci-fi alley.

vsviridov · on July 28, 2017

Then definitely also check out the Quantum Thief trilogy, by Hannu Rajaniemi.

lilyball · on July 28, 2017

Amusingly, the cover of The Quantum Thief (at least on iBooks) has the following quote:

> "The best first SF novel I've read in years. Hard to admit, but I think he's better at this stuff than I am." -Charles Stross

moomin · on July 28, 2017

I remain completely unmoved by HR's output. But I'm most definitely in the minority.

okreallywtf · on July 28, 2017

Seconded, in the beginning it almost has a fantasy vibe since what it describes for the most part is so advanced it almost seems magical, but stick with it because it is fantastic.

moomin · on July 28, 2017

He actually posts on here from time to time...

He's fairly prolific and very talented. The only one of his books I wouldn't particularly recommend is his first. The sequel's great, though.

Many of his straight up thrillers e.g. Saturn's Children have well realised universes they inhabit.

pmarreck · on July 28, 2017

I believe there was also a Rick & Morty episode about this (of course...)

winter_blue · on July 28, 2017

You mean the first episode of the third season where Rick breaks out of the virtual reality interrogation room he was in, by simply presenting some data (i.e. equations) that turned out to be code that took control of the system?

anigbrowl · on July 28, 2017

Posting spoilers is not clever.

winter_blue · on July 28, 2017

Sorry, I should've just mentioned the season & episode number. Or nothing at all.

I would like to delete the comment, but it seems I just passed the 2-hour mark wherein editing/deletion is possible, and I can't remove it.

anigbrowl · on July 28, 2017

Oh well, we've all made similar mistakes :)

_wmd · on July 28, 2017

tl'dr of the slides:

    Found on one processor... instruction
    Single malformed instruction in ring 3 locks
    Tested on 2 Windows kernels, 3 Linux kernels
    Kernel debugging, serial I/O, interrupt analysis seem to confirm
    Unfortunately, not finished with responsible disclosure
    No details available [yet] on chip, vendor, or instructions

He's found a new f00f bug, winter 2017 is going to be interesting :)

tempay · on July 28, 2017

For those not aware: https://en.wikipedia.org/wiki/Pentium_F00F_bug

Can these kind of bugs possible to exploit to cause anything more than minor annoyance?

hexadecimated · on July 28, 2017

If it works inside a VM, an attacker could potentially cause a widespread denial of service on cloud computing platforms like Azure and AWS.

viraptor · on July 28, 2017

Use them to exploit the system itself - not likely. (Unless they cause some specific bad behaviour rather than a crash) But you can definitely use a DoS issue for other effects. For example if someone is using an auth revokation system which fails open, you could kill that part to use expired credentials. Or if you're able to sometimes inject data, you can keep killing the caching systems until your response is the saved one. (Like in DNS hijack)

qb45 · on July 28, 2017

Observation: the length of the censored "XXX hardware bug" text on the slides matches neither Intel, AMD nor Transmeta. Unlikely to be VIA too.

Either it's deception or perhaps some obscure low-end embedded vendor.

edit: for the curious, it's "(redacted) hardware bugs" :)

Veedrac · on July 28, 2017

You mean the black bar on the PDF? That just says "(redacted)".

rst · on July 28, 2017

Or they were smart enough to change the size of the box so that it can't be used to easily identify the vendor (from among a very small set of candidates).

ericfrederich · on July 28, 2017

"Obscure" enough to run Windows though

duskwuff · on July 28, 2017

Possibly something weird like Vortex86?

rfth · on July 28, 2017

If I was a betting man I would say ARM.

adrianmonk · on July 28, 2017

Isn't this fuzzing tool x86-only?

hellbanner · on July 28, 2017

Related: https://www.theregister.co.uk/2013/05/20/intel_chip_customiz...

"Everybody hates the golden screwdriver upgrade approach, where a feature is either hidden or activated through software, but the truth of the matter is that chip makers have been doing this sort of thing for decades – and charging extra for it."

""We are moving rapidly in the direction of realizing that people want unique things and they are going to want them in silicon. In some cases, it will be done in software," said Waxman."

Also, Github says "several million" undocumented instructions.. is that right? I don't know much about assembly but that number sounds absurdly high.

bem94 · on July 28, 2017

>> "several million" undocumented instructions.. is that right?

Bear in mind that doesnt really mean that there are several million operations / opcode mnemonics which are undocumented but each distinct instructions.

It is more likely they are "loose" decodings of other instructions, where changing a single bit of the opcode still causes the CPU to decode the same instruction.

Toy example: If I encode my (imaginary ISA) 8bit instruction for "ADD EAX EBX" as 0101_X000 where X is "don't care" then regardless of whether the core gets 0101_0000 or 0101_1000 , it will still execute the ADD instruction.

Now imagine your instructions can be upto 16 bytes long, and you see how loose decoding can lead to a lot of instructions which are undocumented, but that the processor is perfectly happy to execute.

SilasX · on July 28, 2017

Correct me if I'm horribly misunderstanding [1], but isn't there a more general point here?

A CPU is, at root, a massive Boolean circuit wrapped in a flip-flop and some persisted state. The binary [sequence corresponding to an] opcode is just an input that determines which inputs go where.

Thus, for an n-bit opcode width, there are 2^n valid opcodes. Only m of them will correspond to intelligible, "I might want to use that some day" instructions. (Others, as you note, will be functionally equivalent versions of the m and ignorable as well.)

And so you will have 2^n - m "undocumented opcodes".

[1] based on reading NAND to Tetris

thethirdone · on July 28, 2017

> And so you will have 2^n - m "undocumented opcodes".

Not quite, there will be 2^n - m possible opcodes, but not all of them will have functionality attached. Many may end up being illegal.

So you could have a processor with m=400 and n=16, but no valid opcodes besides the 400. All 2^16 - 400, could throw an Illegal instruction exception.

SilasX · on July 28, 2017

So I must have some big misunderstanding then -- in what sense can a different binary input to a boolean circuit throw an illegal exception? How does the concept make sense at that level?

muricula · on July 29, 2017

Processors have "traps" or "exceptions" which work kind of like interrupts. They are utterly unlike exceptions in say C++. They transfer control to another location you specify in an interrupt table. The most well known of these is the "page fault" which occurs when you write/read/fetch from memory you're not allowed to.

Say you have a userland process which dereferences a NULL pointer in C. In the CPUs page table the NULL virtual address is not mapped to a physical address, so a page fault occurs. Control transfers to the OS page fault handler which then delivers the SIGSEGV signal to the process.

There's also a division by zero exception, and several more. You can read more about them here: http://wiki.osdev.org/Exceptions

SilasX · on July 29, 2017

That much makes sense, but the original framing of the opcode as illegal still comes off as a category error to me (though perhaps that phrasing is common and excepted). The opcode itself isn't illegal, as opcodes are a CPU-level concept, where nothing is illegal.

Rather, the opcode may bring the memory state to something that might violate some OS's security model. But the opcode still does something to the CPU (Boolean) circuit state.

sand500 · on July 30, 2017

You could think of it as a switch case statement where the valid opcodes are the cases and there is a default that catches all non defined opcodes and throws an error. Its illegal in the sense you should never expect to run an instruction with those opcodes.

hellbanner · on July 28, 2017

Thanks. The scale is still hard to wrap my head around but I see what you're saying.

Could this tool find hardware backdoors?

duskwuff · on July 28, 2017

> Could this tool find hardware backdoors?

Only very crude ones. A competently implemented hardwre backdoor would probably be data-dependent. For instance, it might trigger when REP CPUID is called with four specific 64-bit values in R8, R9, R10, and R11 -- and if that were the case, there would be absolutely no way to discover it by searching.

teeray · on July 28, 2017

There's also the fascinating variant where a control line charges a capacitor over time to activate backdoor behavior. Triggering it would look like a bunch of nonsense instructions that just so happen to keep that control line energized long enough for the capacitor to cross some activation voltage.

hellbanner · on July 30, 2017

No way to discover because of the immense search space? And/Or that if the backdoor was triggered, no way to immediately detect its effects?

bem94 · on July 28, 2017

I guess that if there was a special "Open backdoor" instruction which was undocumented, then yes I guess it could find it.

Backdoors tend to be separate systems which pry into something larger though (like the intel managment engine being a small, separate core which probes the main system). This means you normally need other means of access to the system other than the standard instruction sequence. Again, the IME needed network access to be abused I think, rather than instructions running on the main processor itself.

Implementing backdoors is stupid. Opening / accessing them with an undocumented instruction is moronic, but distressingly possible.

wolfgke · on July 28, 2017

I would rather assume that the backdoor can be accessed via some at least a little bit documented opcode, such as setting some value in an MSR (model specific register) or let some instruction do interesting side effects on some obscure preconditions, such as if the registers are filled with specific values, the instruction will do something completely different.

bluGill · on July 28, 2017

Depends on how you count. Is the instruction that adds register one to register two and stores it in register three different from the instruction that adds register one to register two and stores it in register four? The only difference is the register the data is stored in after the add. I can argue this either way, and you should be able to as well (though you may find one side is a lot more compelling).

Really an undocumented instruction is just anything that isn't documented, it is good practice to leave some blank space in your instruction set so that you can implement the next feature that needs a new instruction. As such we expect to find millions of potential instructions: there is nothing there but the next CPU might have something.

wolfgke · on July 28, 2017

It is a difference to reserve space in the instruction encodings vs. having an undocumented instruction.

For the former when trying to encode it an "undefined instruction" interrupt should occur.

dtx1 · on July 28, 2017

This is highly interesting. I assume a lot of those are going to be debug and instructions to help the binning process. Some of these might even unlock access to parts of the CPUs we aren't supposed to have access too, opening the doors to custom microcode (unlikely that anyone outside the CPU OEM can do that though) but may allow us to disable "security features" such as the Management Engine. This is a really interesting approach and i would love to see the results ported to other hardware/vendors. The same could potentially be done with GPUs, ARM-CPUs, etc.

abainbridge · on July 28, 2017

I expect Intel burn a fuse bit at the end of the binning process to prevent such features being accessed in the finished product.

duskwuff · on July 28, 2017

Separate research has been done on microcode. The general consensus is that Intel's microcode binaries are encrypted, and are secured with a RSA2048-SHA256 signature.

http://inertiawar.com/microcode/

webreac · on July 29, 2017

I am surprised private keys have never leaked.

duskwuff · on July 30, 2017

I wouldn't be surprised if those keys are in a HSM, so they can't be leaked. That'd be the safe way of handling it, and it's well within Intel's resources.

fovc · on July 28, 2017

Here's a link to the slides [pdf]: https://github.com/xoreaxeaxeax/sandsifter/raw/master/refere...

partycoder · on July 28, 2017

Also from the same author https://sites.google.com/site/xxcantorxdustxx/visual-re

chungy · on July 28, 2017

That looks fun, but the site doesn't seem to have any downloads available?

partycoder · on July 28, 2017

There was a demo around somewhere. Not hard to find.

There is a similar --but less featured-- open source project. https://github.com/wapiflapi/veles

cornchips · on July 28, 2017

https://www.reddit.com/r/ReverseEngineering/comments/1izity/...

SAI_Peregrinus · on July 28, 2017

Christopher Domas does some very cool work. His System Management Mode exploit a few years back was quite nice. It will be interesting to see which processor it is that he found the ring 3 hard lockup instruction in...

rasz · on July 28, 2017

He works for spooks - Battelle Memorial Institute, a long-time NSA/CIA contractor. One of the places that hires officially retired spies.

azinman2 · on July 28, 2017

and therefore.... ?

qb45 · on July 28, 2017

Therefore there is suspicion that spooks may have been using that for years and revealed it once they've had enough for some reason. I'm not familiar with any proven case of such situation, but people speculate.

d33 · on July 28, 2017

...isn't the usability of the tool limited because it's running in userspace, which has fewer privileges in terms of what instructions can be ran?

c12 · on July 28, 2017

I was wondering this myself until I read the pdf:

> For effective results, the injector should be able to identify instructions in more privileged rings, even if it cannot actually execute those instructions.

>This approach allows the injector to detect even privileged instructions: whereas a non-existing instruction will throw a #UD exception, a privileged instruction will throw a #GP exception if the executing process does not have the necessary permissions for the instruction. By observing the type of exception thrown, the injector can differentiate between instructions that don’t exist, versus those that exist but are restricted to more privileged rings. Thus, even from ring 3, the injector can effectively explore the instruction space of ring 0, the hypervisor, and system management mode.

michaelmior · on July 28, 2017

So basically the same as throwing a 403 instead of 404 for authenticated resources in HTTP :)

poizan42 · on July 28, 2017

When it comes to discovering possible bugs then there is really no guarantee that the instructions are acting as they should though.

askvictor · on July 28, 2017

As the slides say, this approach prevents the system from falling over entirely, while still resolving instructions from deeper rings.

d33 · on July 28, 2017

Makes sense. I was thinking if there could be a bootable fuzzer of this kind, but you're right that it would be very difficult for it to be both usable and not crash very quickly.

dtx1 · on July 28, 2017

I think once you found all possible instructions it shouldn't be too hard (for someone much smarter then me) to essentially generate a minimal OS that systematically tries out all found instructions from ring 0. That should dramatically reduce the amount of reboots necessary to actually try them all out compared to bruteforcing them from ring 3

d33 · on July 28, 2017

What about hangs? I think you'd need a watchdog in order to have the process automated.

qubex · on July 28, 2017

Firstly, the paper clearly explains that a userland (ring 3) process can easily discriminate between instructions that do not exist and instructions that the process lacks the credentials to execute simply because the two cases give rise to different exceptions (undefined and general protection, respectively).

Secondly, one could trivially log "I am going to try this" to a file, go ahead and do it, and if the machine crashes you consult the log and read exactly what was about to be done and evidently caused a hang. You don't necessarily need to write a log entry after doing something, in this case you can deterministically log what you are about to do and make deductions therefrom.

dtx1 · on July 28, 2017

I'm sure there are ways to read out a CPU Hang and reset if that happens, if necessary via external hardware^(I'm guessing)^(I don't actually know)

flopunctro · on July 28, 2017

Some enterprise-grade server platforms already have this functionality, it's called a "watchdog". Linux supports this since (at least) 2.4.

I've been using this with HP ProLiant servers and (for me, at least) it has always worked as intended.

For more info, search "linux watchdog timer".

wongarsu · on July 28, 2017

It would also be fairly simple to build for any computer using just an arduino or a raspberry pi. The computer could send a periodic signal over the serial port, if the arduino doesn't receive a signal for some time it shorts the mainboard's reset pin, causing a reboot.

qb45 · on July 28, 2017

I think they are common on Intel systems, my old laptop with ICH7-M chipset has one.

hamiltonkibbe · on July 28, 2017

You'd be hard pressed to find a processor without a WDT

fooker · on July 28, 2017

Sure, just have the 'os' ping the external device in N second intervals. Without a ping, reset through a jumper.

partycoder · on July 28, 2017

Lot of weird stuff done happening nowadays in CPUs.

There's a lot of mystery in microcode (equivalent to the CPU firmware), the "system management mode" aka protection ring -2, and the infamous management engine.

tonyg · on July 31, 2017

I wonder what dbe0, dbe1, and df{c0-c7} do? They are present and undocumented in all of Intel, AMD and VIA's variations (see p4-p5 of the paper).

pbsd · on July 28, 2017

For what it's worth, the size-prefixed jcc/call binutils bug had already been fixed a couple of years ago: https://sourceware.org/bugzilla/show_bug.cgi?id=18386

pwdisswordfish · on July 28, 2017

The slides mention an 'apicall' opcode 0ffff0; searching the web turns up nothing but these same slides. Does anyone know anything about it?

wmu · on July 28, 2017

It seems to be a MS antivirus bug: http://securityaffairs.co/wordpress/60434/hacking/microsoft-...

rurban · on July 29, 2017

Regarding the ring 3 hard lockup he didn't disclose yet: isn't that the recent kaby lake/skylake error, released about a month ago?

ngneer · on July 28, 2017

Chip vendors do the same in the course of validation, and technically even before any silicon has been fabricated, using simulators.

shdon · on July 28, 2017

No instructions there to disable the IME?

pgeorgi · on July 28, 2017

If anything, I'd expect such a flag to hide behind MSRs (http://wiki.osdev.org/Model_Specific_Registers)

That's a mostly unused namespace of 2^32 64bit registers. To hide things even better, it would also be possible to change behavior based on officially unrelated registers (eg. MSR $x only acts as IME-switch if the calling address also ends in $y and esi is $z)

cesarb · on July 28, 2017

They could also be multiplexed (MSR $x is address/command, MSR $y is data). Or require a sequence of operations (write this magic sequence of numbers to MSR $z). Or memory-mapped/IO-mapped (with the mapping enabled/disabled by MSR or PCI registers). Or be locked by the BIOS during the boot sequence.

But IMO, it probably can't be disabled at all. The "disabling" would be to change the program it runs to a program which does nothing. So there wouldn't be a "disable IME" bit; there would be bits to either make its memory visible to the main CPU cores, or to read/write to its memory, and it's possible that these bits are accessible only from the IME side, or from SMM.

Kliment · on July 28, 2017

The easiest ways to access them is to rewrite that section of the BIOS directly, such as https://github.com/corna/me_cleaner/wiki/How-does-it-work%3F which literally overwrites them with nops

__jal · on July 28, 2017

Given Intel's behavior around the IME, I rather doubt there's an instruction for that. The only verifiable way to do so that I know of, on some chips, is here:

https://hardenedlinux.github.io/firmware/2016/11/17/neutrali...

YMMV, not responsible for bricked chips, and note the caveats at the end.

egberts1 · on July 28, 2017

found another that is QEMU-specific.

https://github.com/unicorn-engine/unicorn/issues/364

egberts1 · on July 28, 2017

It is more about modifying executable code space and not making it stick. Good enough for fooling AV.

purpleidea · on July 28, 2017

wow... anyone have a link to the video of his talk?

pmarreck · on July 28, 2017

Is this basically a CPU fuzzer?

deathanatos · on July 29, 2017

The subtitle at the very top of both the page and the README…

> The x86 processor fuzzer

brawny · on July 29, 2017

Out of curiosity, are there any toy compiler projects out there that try and make use of the incedental instructions? Could you possibly expect to see a with while performance boost (I'm thinking it would be unlikely...)

m00dy · on July 28, 2017

Someone built a fuzzer for cpus