Well, I'm actually getting pretty excited about the x86-64 architecture, now that I'm learning about it. The reason I'm only now learning about it is that I've never needed to know, before. For quite some time x86-64 was only available on high-end server systems, which I would never have, and my programs (mostly related to games) weren't expected to run on such systems. But now x86-64 is available on the desktop, and Windows is ready to support it.
So, let me give a brief overview of how x86-64 differs from x86-32 (legacy x86), and why I'm excited about it.
x86-64 is a 64-bit incarnation of the x86 architecture, supporting the same instruction set and features (most of the features, anyway). In practical terms, that means it can do math on 64-bit numbers in 64-bit registers with single instructions. Code that works with 64-bit (or larger) numbers will thus, on average, be about half as large and take about half as many cycles to execute, since a 32-bit processor has to do multiple operations to emulate 64-bit math.
Associated with 64-bit registers is the ability to use 64-bit pointers. 64-bit pointers, of course, mean a vastly greater amount of memory that may be addressed by the processor: 2^64 bytes instead of 2^32. But even more importantly, you don't need more than 4 gigs of memory (the maximum that may be addressed by 32-bit pointers) to benefit from this. Most OSes these days use virtual address spaces, where the memory pointed to by a pointer doesn't need to be at that same address in physical memory. Besides physical memory (and empty space), the address space also holds things like memory-mapped ports and files. A larger address space, even with the same amount of physical memory, allows larger regions of (or even more) files or device memory to be made available at once. Now, instead of continually mapping and unmapping small pieces of a 500 gig database, you can map the whole thing at once (theoretical example only; I'm not a database coder, and I have no idea if you would actually do this in a high-performance, commercial database system).
Yet despite this, the x86-64 architecture remains backward compatible. In addition to supporting the legacy CPU modes (real mode, protected mode, and virtual 8086 mode), which run 16-bit or 32-bit programs and OSes, it also supports two new modes: long compatibility mode and long 64-bit mode. Long 64-bit mode is exactly what you'd expect of a 64-bit processor: it runs the OS and programs natively in 64-bit mode, with 64-bit registers, pointers, stack, etc. But this requires that both the OS and the program be compiled for x86-64.
Long compatibility mode is what's really cool. It's a hybrid of the legacy modes and long mode, allowing a 64-bit OS, which takes full advantage of the 64-bit x86-64 features, to natively run old 16-bit and 32-bit programs. This means that x86-64 supports the best of both worlds: you can have a new, 64-bit OS, but still run old programs on it, natively.
Okay, so that's progress. But there are two other reasons I'm excited about x86-64 that don't have anything to do with the fact that it's 64-bit. In addition to the 8 64-bit integer registers (RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP), the 8 64-bit floating point/MMX registers (MMX0-7/FPR0-7), and the 8 128-bit XMM registers (XMM0-7), x86-64 supports 16 totally new registers. 8 of these are new integer registers, named R8-R15 (kinda reminiscent of the PowerPC registers), and the other 8 are new XMM registers (XMM8-15). To me this is highly significant, as the limited number of registers (especially the integer registers) was one of the key limitations of the x86 architecture (so I believe). Having twice as many registers to work with means that code can do much more complicated math without needing to access memory to fetch and store variables.
Another significant limitation of the x86 architecture, I believe, was the inability to access memory relative to the current instruction. Branches and calls are generally relative to the current instruction, but memory accesses are always absolute (that is, they go to the same memory location regardless of the location of the current instruction).
If I had to make an educated guess as to why the x86 architecture did not initially support relative data access, I'd say it's due to the fact that originally x86 programs were segmented. You had a 64 KB code segment, a 64 KB data segment, a 64 KB stack segment, and a few other optional segments (don't ask what happens when you need more than 64 KB for your code or data, you don't want to know), which could be anywhere in memory, and were pointed to by segment registers. Branches and calls were made relative not only to the current instruction, but also to the base offset of the code segment. Memory accesses were made relative to the base offset of the data segment.
But things aren't done that way, anymore. Segments are rarely used in modern OSes, replaced mainly by virtual address spaces and memory-mapped files. In Windows, an executable (or library) is mapped as a single piece (although there can be holes). All the addresses in that executable are fixed relative to the base address (unlike with segments, where each segment could be anywhere in physical memory). Yet the base address of an executable is not fixed; it can be loaded almost anywhere in the virtual address space. So you end up with lots of pointers to global variables at fixed addresses, when the variables themselves could be anywhere in memory. This requires the OS to correct all these pointers when it loads the executable into memory, which takes time. x86-64's new RIP-relative addressing mode elegantly remedies this: a pointer expressed as an offset from the current instruction never changes, no matter where the executable gets loaded.