Excuse me: there is NO right or wrong. It's the dev's choice. Yes, it's insane to use a dynamic recompiler on a C64 emulator (which, I may add, HAS been done before, and done very well), but there should be no "shoulding" people about what can and can't be done.
alright boss.
what i am having a little trouble with is: what would the difference be between having the line in normal code format compared to asm? like with your example, would it be faster to store that value in the original variable (V01) or to do it the recompilation way? it just seems faster the normal way.
well there's a bunch of different factors that make dynarecs faster (note: the factors listed below depend on the dynarec implementation):
1) a great speed benefit comes from not having the overhead of repetitive function calls for interpreted opcodes.
you basically recompile a bunch of the opcodes into blocks, and you're not doing any function calls while executing (or very few function calls).
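to make point 1 concrete, here's a toy sketch in plain C (the two-opcode mini ISA and all the names here are made up for illustration, they're not from any real emu). the interpreter does one indirect call per opcode every time the block runs; the "recompiled" version is what a dynarec would effectively emit for the block {0, 0, 1}: straight-line code, no dispatch loop, no calls.

Code:
```c
#include <stdint.h>

/* hypothetical mini-CPU state, just for illustration */
typedef struct { uint32_t regs[4]; } Cpu;

typedef void (*OpFn)(Cpu *c);

static void op_inc_r0(Cpu *c)    { c->regs[0] += 1; }            /* opcode 0 */
static void op_add_r1_r0(Cpu *c) { c->regs[1] += c->regs[0]; }   /* opcode 1 */

/* interpreter: table lookup + indirect call per opcode,
   paid again on every single run of the block */
static void interp_run(Cpu *c, const uint8_t *code, int len) {
    static const OpFn table[] = { op_inc_r0, op_add_r1_r0 };
    for (int i = 0; i < len; i++)
        table[code[i]](c);
}

/* what a dynarec would emit (once) for the block {0, 0, 1}:
   straight-line code, no dispatch overhead at all */
static void recompiled_block(Cpu *c) {
    c->regs[0] += 1;
    c->regs[0] += 1;
    c->regs[1] += c->regs[0];
}
```

both paths compute the same result; the recompiled one just skips the per-opcode dispatch work.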
2) great use of cpu cache, which is a big speedup.
3) you can generate optimized code from constant parameters.
for example, pretend you want to recompile something like:
Code:
// Assume the system you're emulating has this opcode:
addimm reg1, reg2, 0; // reg1 = reg2 + 0
the interpreter function in an emulator for this opcode might look like this:
Code:
void addimm(int reg1, int reg2, u32 immediate) {
cpu.regs[reg1] = cpu.regs[reg2] + immediate;
}
if recompiling the code, you can do optimizations like this:
Code:
// Note: the code below uses pcsx2's emitter syntax.
// different emu's use their own emitters to generate
// x86 code more easily.
void rec_addimm(int reg1, int reg2, u32 immediate) {
if (immediate == 0) {
MOV32MtoR(EAX, &cpu.regs[reg2]); // move memory to general purpose register (gpr)
MOV32RtoM(&cpu.regs[reg1], EAX); // move gpr to memory
}
else {
MOV32MtoR(EAX, &cpu.regs[reg2]); // move memory to gpr
ADD32ItoR(EAX, immediate); // eax = eax + immediate
MOV32RtoM(&cpu.regs[reg1], EAX); // move gpr to memory
}
}
notice that when immediate is equal to zero, you can just move reg2 to reg1, since reg2+0 == reg2.
since you're recompiling this code, you spend more cpu time doing this optimization once, but then you can execute the optimized code as many times as you want.
if you were to do the same thing in the interpreter function like this:
Code:
void addimm(int reg1, int reg2, u32 immediate) {
if (immediate == 0) {
cpu.regs[reg1] = cpu.regs[reg2];
}
else {
cpu.regs[reg1] = cpu.regs[reg2] + immediate;
}
}
the conditional if-statement will run every time you call that interpreter function, and therefore you'll actually lose speed compared to the original interpreter function w/o the conditional (the compiler would turn the branch into compares + jumps, and it'd be faster to just always do the addition instead...)
4) there's something called reg-allocation/reg-caching that you can do with dynarecs; the basic idea is to "keep the emulated system's regs in your CPU's registers for as long as possible".
this way you can eliminate a lot of reading/writing from/to memory.
also, a lot of reg-to-reg x86 operations have shorter encodings than reg-to-memory/memory-to-reg ones, and therefore cause less cpu 'cache-clutter' and speed things up.
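here's a rough way to picture point 4 in plain C (names and the cpu struct are made up, like in the examples above). a local variable standing in for a host register: without reg-caching, every emitted op reads/writes the emulated reg array in memory; with it, the value stays "in a register" for the whole block and gets written back once at the end.

Code:
```c
#include <stdint.h>

/* hypothetical emulated CPU state, same spirit as the examples above */
typedef struct { uint32_t regs[32]; } Cpu;
static Cpu cpu;

/* no reg-allocation: every op goes through memory (cpu.regs) */
static void no_regalloc(void) {
    cpu.regs[1] = cpu.regs[2] + 5;   /* load r2, add, store r1 */
    cpu.regs[3] = cpu.regs[1] + 7;   /* ...then load r1 right back from memory */
    cpu.regs[1] = cpu.regs[1] + 1;   /* ...and again */
}

/* with reg-allocation: keep r1 in a host register (here: a local)
   across the block, single write-back to memory at the end */
static void with_regalloc(void) {
    uint32_t r1 = cpu.regs[2] + 5;   /* r1 now lives in a host register */
    cpu.regs[3] = r1 + 7;
    r1 = r1 + 1;
    cpu.regs[1] = r1;                /* write back once when the block ends */
}
```

same results either way; the second version just has fewer memory accesses, which is exactly what reg-caching buys you in generated code.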
5) you can reorder opcodes and even perform stuff such as constant propagation and constant folding to optimize and speed things up.
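and a tiny sketch of what point 5 can look like at recompile time (again, all names here are invented for illustration): if the recompiler knows a source reg currently holds a known constant, it can fold the add right now and not emit any code for that opcode at all.

Code:
```c
#include <stdint.h>

/* toy opcode, same shape as addimm above: reg[dst] = reg[src] + imm */
typedef struct { int dst, src; uint32_t imm; } AddImm;

/* recompile-time knowledge: which emulated regs hold known constants */
typedef struct { int known[8]; uint32_t value[8]; } ConstState;

/* returns 1 if the op was folded away (no code needs to be emitted),
   0 if the recompiler would have to emit a real add */
static int try_fold(ConstState *cs, const AddImm *op) {
    if (cs->known[op->src]) {
        /* constant folding: do the add now, at recompile time */
        cs->value[op->dst] = cs->value[op->src] + op->imm;
        cs->known[op->dst] = 1;
        return 1;
    }
    cs->known[op->dst] = 0;  /* dst is no longer a known constant */
    return 0;
}
```

a whole chain of addimms on known values collapses this way (constant propagation), so the generated block can end up with a single "load final constant" instead of a string of adds.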
the basic idea of dynarecs is "do expensive work once, then run the generated optimized code many times."
hope that helps.
and the best way to understand dynamic recompilation IMO is to play around with the source of some emu that does it.
i didn't understand the difference between interpreters and dynarecs before i started working on pcsx2.
and now, a year later, i've written my own dynarec.
