Search This Blog

Thursday, August 25, 2005

Let's Make a CPU - Dilemma - The Stack

As of right now, I have two major dilemmas in constructing the architecture. The first of these is what to do about the stack and function calls. I can think of 3 different ways to do it. But first, let me explain what I've already decided about the stack.

The x86 architecture (and possibly others that I don't know about) has the concept of pushing onto and popping off of the stack, using special instructions which cause the CPU to advance the stack pointer and store the contents of a register on the stack, in a single instruction. This is not how I'm going to do it. Like other RISC architectures, my CPU will treat the stack pointer the same as any other pointer: the program, prior to making a function call, will reserve stack space for the parameters by modifying the stack pointer, then store the parameters at the memory locations just reserved, and, once the call has returned, modify the stack pointer so as to remove the parameters from the stack.

Now, the three candidate stack models for me to choose between:

Integrated Stack
The first method is used by the x86 architecture. In it, a special (hardware defined) stack pointer register exists, which the program can modify as necessary (either by treating it as a pointer, or using the PUSH/POP instructions), and write to the stack. In addition to function parameters, the CPU uses the stack to hold the return addresses in function calls: the CALL instruction pushing the return address onto the stack, and the RET instruction popping it into the instruction pointer. The model here is a single, integrated stack, shared by both hardware (the CPU) and software (the program).

This method is elegant in it's simplicity: there is only a single mechanism to cover both bases. However, there are a few negatives to this method. On a philosophical level, this method completely dissolves the line between hardware and software; both the CPU and the program access the stack, mostly independently, but this still requires the software to know what the CPU is doing, and work around it. The practical side of this is that, as the software has full access to the stack, it's possible for the software to break the hardware (that is, by altering the stack pointer so as to cause an access violation when the CPU jumps to the return address), or to interfere with it (that is, the return address can be altered by the software, either intentionally - a function modifying the return address - or unintentionally - a buffer overflow exploit). Just the same, it's possible for the hardware to break the software (try forgetting about the 4/8 bytes the return address takes on the stack in an assembly program, and see how far your program gets before it blows up!).

Minimalistic Hardware
Several architectures, such as MIPS, do exactly the opposite. The stack is totally owned by the software, and the hardware has no knowledge of the stack. Function calls are handled by the call instruction copying the return address to a register; once this register receives the return address (inside the called function), this address becomes the property of the software (which must preserve this value, if the function makes function calls of its own), until ultimately the function returns by jumping to the return address in the register. The idea here is to have the hardware do the absolute minimum necessary, which amounts to copying the return address and jumping; everything else is done in software, including the jump to the return address.

This is elegant in its own way, in that almost everything is done by the software. The software, rather than hardware, takes responsibility for program flow, allowing (potentially) greater optimization of resource (particularly register and memory) usage. It has the added benefit of resisting (though not being immune to) code injection by buffer overflow, as only places where the return address is saved on the stack (as opposed to moving it to another register) are vulnerable. The downside is that it still allows functions to modify the return address; as well, there is still some mixing of software and hardware (although much less than with the integrated stack).

Software-Hardware Isolation
The final method is used, I believe, by Itanium. In this method, the CPU is again ignorant of the stack (at least, the stack used by the program), and uses a private, system stack (pointed to by a private register) for return addresses. (which may even be stored in on-chip cache, increasing the performance of returns considerably, as it allows the CPU to know ahead of time exactly where the function will be returning to, and prefetch the initial instructions).

This has the obvious advantage that it's immune to alterations (either intentional or unintentional) of the return addresses, as software does not have access to the system stack. The philosopher would also be pleased by the complete isolation of hardware (the call stack) and software. The downside is that it adds additional complexity to the CPU as well as the system, as now the system has two stacks (even though one is implemented in software, it still exists).

No comments: