I certainly hope that the quip about the recompiler core being written entirely in assembly is misinformation or a joke. The emit handlers no doubt emit x86 machine code at run time, but there is no good reason to write the compiler routines themselves in assembly. Doing so only makes the code much more difficult to understand, especially once complex instruction-selection and code-optimizer logic is added. All of my dynamic recompiler routines are written entirely in C++ with no inline functions or macros, and the compilation time pales in comparison to the time spent executing the native code blocks. In other words, the routines should be efficient, but hand-writing them in assembly will probably pay off less than using smarter compilation methods.
In any case, the only non-trivial modification needed to run in 64-bit mode should be adding support for the REX prefix, along with the encoding logic required when the REX prefix is used to reach the additional registers and 64-bit operand sizes. Other than the slightly annoying need to remove encodings that are not valid in 64-bit mode from the emit handlers (the one-byte INC/DEC forms, whose opcodes 0x40-0x4F are reused as REX prefixes, etc.), the move should not be very painful, provided the compiler framework was designed with flexibility in mind.
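To make the REX point concrete, here is a minimal sketch of what the extra encoding logic might look like in a C++ emitter. The `Emitter` struct, the `Reg` numbering, and the helper names are my own inventions for illustration and are not taken from any particular emulator. It only handles register-to-register forms; memory operands (SIB, displacements) and byte registers are left out.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical register numbering: 0-7 are the legacy registers,
// 8-15 are the registers that are only reachable through a REX prefix.
enum Reg : uint8_t { RAX = 0, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
                     R8, R9, R10, R11, R12, R13, R14, R15 };

struct Emitter {
    std::vector<uint8_t> code;

    void byte(uint8_t b) { code.push_back(b); }

    // REX has the bit layout 0100WRXB:
    //   W = 64-bit operand size, R = high bit of ModRM.reg,
    //   X = high bit of SIB.index (unused here), B = high bit of ModRM.rm.
    // The prefix is skipped when no bit is set. That shortcut is only safe
    // because this sketch never touches byte registers (SPL, BPL, SIL, DIL).
    void rex(bool w, uint8_t reg, uint8_t rm) {
        uint8_t r = static_cast<uint8_t>(
            0x40 | (w ? 0x08 : 0x00) | ((reg >> 3) << 2) | (rm >> 3));
        if (r != 0x40) byte(r);
    }

    // ModRM byte for the register-direct form (mod = 11).
    void modrm_rr(uint8_t reg, uint8_t rm) {
        byte(static_cast<uint8_t>(0xC0 | ((reg & 7) << 3) | (rm & 7)));
    }

    // MOV r/m64, r64 (opcode 89 /r): dst goes in ModRM.rm, src in ModRM.reg.
    void mov_r64_r64(Reg dst, Reg src) {
        rex(true, src, dst);
        byte(0x89);
        modrm_rr(src, dst);
    }

    // INC r/m32 (opcode FF /0). The one-byte 0x40+r form cannot be used in
    // 64-bit mode because those opcodes are decoded as REX prefixes.
    void inc_r32(Reg r) {
        rex(false, 0, r);
        byte(0xFF);
        modrm_rr(0, r);
    }
};
```

With this sketch, `mov_r64_r64(RAX, R8)` produces the bytes 4C 89 C0 and `inc_r32(R10)` produces 41 FF C2, while `inc_r32(RCX)` needs no prefix and comes out as FF C1. The nice property is that the REX logic lives in one helper, so existing 32-bit emit handlers mostly just gain a call to it.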
Assuming a full-blown dynamic compiler is implemented for all stages involving executable code, my educated guess is that the emulator's performance would depend mostly on two things: the graphics rendering (e.g., how much can be offloaded to the video card, particularly the pixel shader unit) and the efficiency of the native code blocks. The latter also covers the cost of the dispatch loop and how often the emulator must leave native code execution to interpret instructions or perform book-keeping.
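As a rough illustration of where the dispatch cost comes from, here is a toy dispatch loop in C++. Everything in it is hypothetical (the `CpuState` fields, the `Dispatcher` struct, the use of `std::function` in place of real pointers to generated code); it is only meant to show the lookup that happens between blocks and the fallback path taken when no compiled block exists.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical guest CPU state; a real emulator would hold far more.
struct CpuState {
    uint32_t pc = 0;       // guest program counter
    uint64_t cycles = 0;   // guest cycles executed so far
};

// Stand-in for a compiled native block: runs guest code starting at
// state.pc, then leaves state.pc at the entry of the next block.
using Block = std::function<void(CpuState&)>;

struct Dispatcher {
    std::unordered_map<uint32_t, Block> cache;  // guest PC -> compiled block
    Block interpret;                            // fallback: interpret one step
    uint64_t lookups = 0;                       // trips through the loop
    uint64_t fallbacks = 0;                     // times native code was unavailable

    // Run until the cycle budget is used up. Every iteration pays for a
    // cache lookup, and every miss pays for a switch to the interpreter.
    void run(CpuState& s, uint64_t budget) {
        while (s.cycles < budget) {
            ++lookups;
            auto it = cache.find(s.pc);
            if (it != cache.end()) {
                it->second(s);      // execute the cached block
            } else {
                ++fallbacks;
                interpret(s);       // no block yet: fall back
            }
        }
    }
};
```

The two counters are the quantities I would watch: the longer each block runs before returning to the loop, and the fewer fallbacks there are, the smaller the overhead relative to useful work. Techniques such as linking blocks directly to one another aim at cutting the number of lookups.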
The main benefit of adding 64-bit support is the availability of the eight extra integer registers (R8-R15). 32-bit x86 is register-poor, and many instruction handlers need all of the registers for an efficient implementation, which leaves register allocation for caching purposes relatively useless. Having an extra eight registers available solely for register allocation is undoubtedly beneficial, since it decreases reliance on the memory subsystem.