Monday, June 23, 2014

Fire in the (root) hole!

This is, I think, the first time I've blogged about something quite so retroactively, but for reasons which should be apparent, I could not blog about this little adventure until now.  This is the story of CVE-2014-0972 (QCIR-2014-00004-1), and (at least part of) how I was able to install Fedora on my Fire TV:

Introduction..

Back in April, I bought myself a Fire TV, with the thought that it would make a nice Fedora XBMC HTPC setup, complete with open source drivers, to replace my aging pandaboard.  But, of course, as delivered, the Fire TV is locked down with no root access.

At the same time, there was a feature of the downstream android kernel gpu driver (kgsl), per-context pagetables, which had been on my TODO list for the upstream drm/msm driver for a while now.  But, I needed to understand better what kgsl was doing and the interactions with the hardware, in particular the behaviour of the CP (command processor), in order to convince myself that such a feature was safe.  People generally frown on introducing root holes in the upstream kernel, and I didn't exactly have documentation about the hardware.  So it was time to roll up my sleeves and get some hands-on experience (translation: try to poke and crash the gpu in lots of different ways and try to make sense of the result).

Into the rabbit hole..

The modern Snapdragon SoCs use IOMMUs everywhere, including for the GPU.  To implement per-context gpu pagetables, basically all the driver needs to do is to bang a few IOMMU registers to change the pagetable base addr and invalidate the TLB.  But this must be done only when you are sure the GPU is no longer trying to access memory mapped in the old page tables.  Since a GPU is a highly asynchronous device, it would be a big performance hit to stall until the GPU ringbuffer drains, then reprogram the IOMMU, then resume the GPU with commands from the new context.  To avoid this performance hit, kgsl maps some of the IOMMU registers into the GPU's virtual address space, and emits commands into the ringbuffer for the CP to write the necessary registers to switch pagetables and invalidate the TLB.
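To make the ordering argument concrete, here is a toy Python model (nothing to do with the real kgsl code; the opcode names SET_PT_BASE, TLB_INVALIDATE and DRAW are made up for illustration).  The pagetable switch is just another command in the ring, so it takes effect exactly between the two contexts' work without the CPU ever waiting for the ring to drain:

```python
from dataclasses import dataclass, field

@dataclass
class ToyIommu:
    pagetable_base: int = 0
    tlb_invalidations: int = 0

@dataclass
class ToyGpu:
    iommu: ToyIommu = field(default_factory=ToyIommu)
    trace: list = field(default_factory=list)

    def run(self, ring):
        # The "CP" consumes commands strictly in order.
        for op, arg in ring:
            if op == "SET_PT_BASE":        # hypothetical opcode
                self.iommu.pagetable_base = arg
            elif op == "TLB_INVALIDATE":   # hypothetical opcode
                self.iommu.tlb_invalidations += 1
            elif op == "DRAW":             # stands in for any user work
                self.trace.append((arg, self.iommu.pagetable_base))

def build_ring(submissions):
    """Queue work from several contexts, emitting a switch only when needed."""
    ring, current = [], None
    for pt_base, draws in submissions:
        if pt_base != current:
            ring += [("SET_PT_BASE", pt_base), ("TLB_INVALIDATE", None)]
            current = pt_base
        ring += [("DRAW", d) for d in draws]
    return ring

gpu = ToyGpu()
gpu.run(build_ring([(0x1000, ["a0", "a1"]), (0x2000, ["b0"]), (0x1000, ["a2"])]))
for name, base in gpu.trace:
    print(f"{name} ran with pagetable base {base:#x}")
```

Each draw sees the pagetables of the context that submitted it, and the whole ring could be queued up front.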

It was this reprogramming of the IOMMU from the GPU itself which I needed to understand better.  Anyone who understands GPUs would have the initial reaction that this is extremely dangerous.  But kgsl was, it seemed, taking some protections.  However, I needed to be sure I properly understood how this worked, to see if there was something that was overlooked.

The GPU, in fact, has two hw contexts which it can switch between.  Essentially it is in some ways similar to supervisor vs user context on a CPU.  The way kgsl uses this is to map the IOMMU registers into the supervisor context, but not user contexts.  The ringbuffer is mapped into all the user contexts, plus supervisor context, at the same device virtual address.  The idea being that if the ringbuffer is mapped in the same position in all contexts, you can safely context switch from commands in the ringbuffer.

To do this, kgsl emits commands for the CP to write a special bit in CP_STATE_DEBUG_INDEX to switch to the "supervisor" context.  Then come the commands to write the IOMMU registers, followed by a write to CP_STATE_DEBUG_INDEX to switch back to the user context.  (I'm over-simplifying slightly, as there are some barriers needed to account for asynchronous writes.)  But userspace constructed commands never execute from the ringbuffer; instead the kernel puts an IB (indirect branch) into the ringbuffer to jump to the userspace constructed cmdstream buffer.  This userspace cmdstream buffer is never mapped into the supervisor context, or into other users' contexts.  So in theory, if userspace tried to write CP_STATE_DEBUG_INDEX to switch to supervisor mode (and gain access to the IOMMU registers), the GPU would immediately page fault, since the cmdstream it was in the middle of executing is no longer mapped.  Ok, so far, so good.
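The "in theory" argument can be written down as a toy model in Python.  This is an idealized sketch, not the real hardware: the buffer names, the opcodes (SWITCH, WRITE_REG, IB) and the rule that every command fetch goes through the current context's mappings with no prefetch at all are my simplifications for illustration.

```python
class PageFault(Exception):
    pass

# Which buffers each toy hardware context has mapped.
MAPPED = {
    "supervisor": {"ringbuffer", "iommu_regs"},
    "user": {"ringbuffer", "user_cmds"},
}

def run(buffers, name, state):
    """Execute buffer `name`, fetching every command via the current context."""
    for op, arg in buffers[name]:
        if name not in MAPPED[state["ctx"]]:
            raise PageFault(f"fetch from {name} in {state['ctx']} context")
        if op == "SWITCH":          # models the CP_STATE_DEBUG_INDEX write
            state["ctx"] = arg
        elif op == "WRITE_REG":
            if arg not in MAPPED[state["ctx"]]:
                raise PageFault(f"write to {arg} in {state['ctx']} context")
            state["reg_writes"].append(arg)
        elif op == "IB":            # jump into another buffer, then return
            run(buffers, arg, state)

def fresh_state():
    return {"ctx": "user", "reg_writes": []}

# What the kernel does: switch from the ringbuffer, which is mapped everywhere.
kernel = {
    "ringbuffer": [("SWITCH", "supervisor"), ("WRITE_REG", "iommu_regs"),
                   ("SWITCH", "user"), ("IB", "user_cmds")],
    "user_cmds": [],
}
ok = fresh_state()
run(kernel, "ringbuffer", ok)
print("kernel path wrote:", ok["reg_writes"])

# What a hostile cmdstream might try: switch from its own buffer.
hostile = {
    "ringbuffer": [("IB", "user_cmds")],
    "user_cmds": [("SWITCH", "supervisor"), ("WRITE_REG", "iommu_regs")],
}
try:
    run(hostile, "ringbuffer", fresh_state())
except PageFault as fault:
    print("hostile path faulted:", fault)
```

In this model the hostile cmdstream faults on the very next fetch after the switch, before it can touch the IOMMU registers, which is exactly the property the scheme relies on.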

Where it breaks down..

From my attempts at switching to supervisor mode from IB1, and deciphering the fault address where the gpu crashed, and iommu register dumps, I could tell that the next few commands after the switch to supervisor mode were executed without problem.. there is some prefetch/pipelining!

But much more conveniently, while poking around, I realized that there were a couple of pages mapped globally (in supervisor and all user contexts), which were mapped writable in user contexts.  I used the so-called "setstate" buffer.  So I simply had to construct a cmdstream buffer to write the commands I wanted to execute into the setstate buffer, and then do an IB to that buffer and do the supervisor switch in IB2.
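Here is why a globally mapped, user-writable page defeats the protection, again as a toy Python model rather than anything that talks to real hardware.  The opcodes (FILL, SWITCH, WRITE_REG, IB) and the fetch rule are invented for illustration; the only thing taken from the description above is that "setstate" is mapped in every context and writable from the user context.

```python
class PageFault(Exception):
    pass

# Toy mappings: "setstate" is visible in both contexts.
MAPPED = {
    "supervisor": {"ringbuffer", "setstate", "iommu_regs"},
    "user": {"ringbuffer", "setstate", "user_cmds"},
}

def run(buffers, name, state):
    """Execute buffer `name`, fetching every command via the current context."""
    for op, arg in list(buffers[name]):
        if name not in MAPPED[state["ctx"]]:
            raise PageFault(f"fetch from {name} in {state['ctx']} context")
        if op == "SWITCH":
            state["ctx"] = arg
        elif op == "WRITE_REG":
            if arg not in MAPPED[state["ctx"]]:
                raise PageFault(f"write to {arg} in {state['ctx']} context")
            state["reg_writes"].append(arg)
        elif op == "FILL":          # GPU-side write of commands into a buffer
            dest, cmds = arg
            if dest not in MAPPED[state["ctx"]]:
                raise PageFault(f"write to {dest} in {state['ctx']} context")
            buffers[dest] = list(cmds)
        elif op == "IB":
            run(buffers, arg, state)

payload = [("SWITCH", "supervisor"), ("WRITE_REG", "iommu_regs"),
           ("SWITCH", "user")]
buffers = {
    "ringbuffer": [("IB", "user_cmds")],                 # emitted by the kernel
    "user_cmds": [("FILL", ("setstate", payload)),       # IB1: stage the payload
                  ("IB", "setstate")],                   # IB2: run it
    "setstate": [],
}
state = {"ctx": "user", "reg_writes": []}
run(buffers, "ringbuffer", state)
print("register writes reached:", state["reg_writes"])
```

Because the commands doing the switch are fetched from a page that exists in both contexts, the fetch after the switch no longer faults, and the payload can switch back to the user context before returning so that the rest of the cmdstream keeps running.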

Ok.. but to do anything useful with this, I'd need a reasonable chunk of physically contiguous pages, at a known physical address.. in particular 16K for first level pagetables and 16K for second level pagetables.  Fortunately ION comes to the rescue here, with its physically contiguous carveouts at known physical addresses.  In this case, allocate from the multimedia pool when there is no video playback, etc, going on.  This way ION allocates from the beginning of the carveout pool, a known address.

Into this buffer, construct a new set of pagetables which map whatever physical address you want to read/write (hint, any of kernel lowmem), plus a replacement page for the setstate buffer.  The replacement is needed since we don't know the original setstate buffer's physical address, which means we actually end up with two copies of the commands: one copied via the gpu into the original setstate page, and one written directly by the cpu into the replacement setstate page.
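As a sanity check on those sizes, here is the arithmetic in Python, assuming an ARM short-descriptor style layout (4-byte entries, 4096 first level entries of 1 MiB each, 256-entry second level tables of 4 KiB pages).  That layout is my assumption for the sketch; the text above only gives the two 16K figures.

```python
# Assumed ARM short-descriptor style layout (not stated in the post).
ENTRY_SIZE = 4        # bytes per descriptor
FL_ENTRIES = 4096     # first level: one entry per 1 MiB of address space
SL_ENTRIES = 256      # second level: one entry per 4 KiB page
PAGE_SIZE = 4096

def table_sizes():
    """Bytes needed for the first level table and for one second level table."""
    return FL_ENTRIES * ENTRY_SIZE, SL_ENTRIES * ENTRY_SIZE

def split_va(va):
    """Split a 32-bit virtual address into (first level index, second level index, page offset)."""
    return va >> 20, (va >> 12) & 0xFF, va & 0xFFF

def sl_coverage(sl_bytes):
    """How many second level tables fit in sl_bytes, and how much address space they map."""
    tables = sl_bytes // (SL_ENTRIES * ENTRY_SIZE)
    return tables, tables * SL_ENTRIES * PAGE_SIZE

fl_bytes, sl_table_bytes = table_sizes()
tables, covered = sl_coverage(16 * 1024)
print(f"first level table: {fl_bytes} bytes, one second level table: {sl_table_bytes} bytes")
print(f"16K of second level tables: {tables} tables mapping {covered >> 20} MiB")
print("0x12345678 ->", [hex(x) for x in split_va(0x12345678)])
```

Under that assumption the first level table is exactly 16K, and 16K of second level tables is enough to map 16 MiB with 4 KiB pages.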


The proof of concept that I made simply copied the string "Kilroy was here" into a kernel buffer.  But quite easily any random app downloaded from an untrusted source could access any memory, become root, etc.  Not the sort of thing you want falling into the wrong hands.

Once I managed to prove to myself that I understood properly how the hw was working, I wrote up a short report, and submitted it (plus proof of concept) to the qualcomm security team.

Now that the vulnerability is no longer embargoed, I've made available the proof of concept and report here.

Originally I planned to (once fixes were pushed out, so as to not put someone who did not intend to root their device at risk) release a jailbreak based on this vulnerability.  But once towelroot was released, there was no longer a need for me to turn this into an actual firetv jailbreak.  Which saves me from having to figure out how to make an apk.

Parting thoughts..

  1. Well, knowledge about physical addresses and contiguous memory in userspace, while it might not be a security problem in and of itself, sure helps turn other theoretical exploits into actual exploits.
  2. As far as downstream vendor drivers go, the kgsl driver is actually pretty decent, in terms of code quality, etc.  I've seen far worse.  Admittedly this was not a trivial hole.  But imagine what issues lurk in other downstream gpu/camera/video/etc drivers.  Security is often not simple, and I really doubt whether the other downstream drivers are getting a critical look (from good-guys who will report the issue responsibly).
  3. I used to think of the whole one-kernel-branch-per-device wild-west ways of Android as a bit of a headache.  Now I realize it is a security nightmare.  An important part of platform security is being able to react quickly when (not if) vulnerabilities are found.  In the desktop/server world, CVEs are usually not embargoed for more than a week.. that is all you need, since fortunately we don't need a different kernel for each different make and model of server, laptop, etc.  In the mobile device world, it is quite a different story!