And so, after several days of thinking about it, I've decided how I want to do the stack and return address with my CPU. The theme of my CPU is bare bones simplicity while still being practical for use. As such, I've gone with the simplest possible solution.
At the time I thought that the link register method was the simplest solution, but this turned out to be incorrect. The link register method makes the stack a software structure, but leaves the link register up to the hardware (and the register can be accessed by software, no less). The simplest solution would be to leave both entirely up to software; the reason it wasn't obvious to me at the time was that there's no way to do this on x86-32, PPC, or MIPS.
All that's really needed for software to handle return addresses is a way to GET the return address. This can be accomplished by an instruction which loads an address relative to the current instruction pointer. Such an instruction would not be specific to return addresses, as it would be useful for other applications: it would make it possible to address data relative to code; neither x86-32, PPC, or MIPS can do this, and it would be of substantial benefit given how current operating systems work (x86-64 can, as I praised in my summary a month or two back).