Okay, I didn't realize you were talking about the VU recompilers. This conversation seems very familiar now; I feel like we've talked about this before. Do the VUs have interlocking? If they don't, then the pipeline emulation is probably important for far more than just timing. Do branches take the same amount of time whether they're taken or not? That wouldn't be surprising if their delays were 100% eaten by delay slots. No interlocks and a lot of delay slots... this is starting to make me think of TI's C6x DSPs.
the pipeline emulation was the most important part indeed.
and branches take a constant time whether they're taken or not.
most vu instructions were interlocked, but some weren't; this complicated things, but worked well with my idea to save pipeline state info with cached blocks.
the only problem was xgkick, which can stall depending on external dma transfers from other parts of pcsx2.
this was the only instruction i couldn't handle accurately, because i couldn't determine the instruction's stalling/latency at recompile time.
our dma system is a complete hack anyways, so even if this instruction were coded exactly how it behaves on the ps2, it wouldn't work correctly with the rest of pcsx2.
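to sketch the cached-block idea concretely (hypothetical structures for illustration, not pcsx2's actual code): each block gets compiled against an assumed incoming pipeline state, so the block cache can be keyed on (start pc, pipeline state), and a lookup picks the variant whose assumed stalls match:

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical sketch: key cached blocks on (PC, incoming pipeline state),
// so the recompiler picks a variant whose assumed stalls match the caller.
struct PipelineState {
    uint8_t writeCycles[32]; // cycles until each VF register's result is ready
    bool operator<(const PipelineState& o) const {
        for (int i = 0; i < 32; i++)
            if (writeCycles[i] != o.writeCycles[i])
                return writeCycles[i] < o.writeCycles[i];
        return false;
    }
};

using BlockKey = std::pair<uint32_t, PipelineState>;

struct CompiledBlock {
    const void* code = nullptr; // pointer into the emitted-code buffer
};

std::map<BlockKey, CompiledBlock> blockCache;

// Look up (or compile) the block variant matching the current pipeline state.
const CompiledBlock& getBlock(uint32_t pc, const PipelineState& st) {
    auto it = blockCache.find({pc, st});
    if (it == blockCache.end())
        it = blockCache.emplace(BlockKey{pc, st}, CompiledBlock{}).first; // compile here
    return it->second;
}
```

the tradeoff being that the same pc can end up compiled into multiple variants, one per distinct incoming state.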
More predictable, but still with some similar components... anyway, this was all just an aside. None of it solves the switch granularity problem I was talking about.
the granularity problem can be solved in multiple ways; obviously without attempting the idea i'm not going to know the best way to handle it.
at this point though, i'm leaning towards just sticking with an interpreter, since it may just be more entertaining to go for a simple and accurate interpreter.
Wastes program space though. Yeah, I know, it doesn't really matter, but still. I'm going to guess that the emitted code takes way more space than its compiled equivalent would...
well, you allocate the memory for the emitted code yourself. usually we just give it some buffer room and fill it with 0xCC (INT3).
if you're really picky, you can size the buffer to exactly the space the emitted code takes up; but it's really dumb to be that picky IMO, unless your target platform has very limited ram.
but since you need to have the 'emitting function' AND the 'emitted code', it does take up a bit more memory than the compiled equivalent.
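the allocation pattern looks something like this (a minimal posix sketch; windows would use VirtualAlloc, and the function name here is made up):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>

// Hypothetical sketch: reserve a generously sized read/write/execute buffer
// for emitted code and fill it with 0xCC (INT3), so a stray jump into
// unwritten space traps immediately under a debugger.
uint8_t* allocCodeBuffer(size_t size) {
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return nullptr;
    memset(p, 0xCC, size); // INT3 padding in the unused tail
    return static_cast<uint8_t*>(p);
}
```

(note: some hardened systems refuse writable+executable mappings, in which case you'd map as read/write and flip to read/execute with mprotect after emitting.)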
It's because you're using MASM and GAS respectively, right? Is there a technical reason why you didn't want to use an unrelated assembler which is compatible with both of them?
well, i didn't code the external asm code.
some older member decided to do it that way; and it's nicer to just switch the code to the emitter, which we use for all our recs, so we're all familiar with it...
Please give me an example, because I'm expecting the emitter code to be harder to read. I'm envisioning that things look like this:
mov eax, [ecx + 10]
VS
emit_x86_mov32_memory_reg_imm32(X86_REG_EAX, X86_REG_ECX, 10);
And I just don't see an argument for the latter being more readable. I suppose you can use function overloading and typed enums (in C++, do instances of enums take their type, instead of just decaying to int like in C? Otherwise you'd need constructors instead), and you can use operator overloading, but you still don't have the freedom of syntax to reproduce the conciseness an assembler provides.
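(To answer my own parenthetical: in C++ each enum is its own distinct type, unlike C where enum constants are plain ints, so overload resolution can pick an enum overload over the int one. A quick illustration with made-up names:)

```cpp
// In C++, an enum argument is an exact match for the enum overload,
// so it wins over the int overload; a plain integer still picks int.
enum X86Reg { X86_REG_EAX, X86_REG_ECX };

int classify(X86Reg) { return 1; } // chosen for register enums
int classify(int)    { return 2; } // chosen for plain integers
```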
the emitter we have in pcsx2 does what you mentioned and takes advantage of operator overloading.
the above example would look like this:
xMOV(eax, ptr[ecx + 10]);
or to show you what i meant earlier, you can essentially do something like:
Code:
for (int i = 0; i < 10; i++) {
xADD(eax, ptr[ecx + i*4]);
}
taking advantage of the high-level loop instead of writing the ADD 10 times...
now i think that's pretty cool
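to show how little magic is involved, here's a toy standalone emitter in that style (illustrative x86 encodings for two instructions only; not pcsx2's actual emitter, which also handles SIB bytes, mod=00 forms, esp/ebp bases, etc.):

```cpp
#include <cstdint>
#include <vector>

// Toy operator-overloading emitter: xMOV(eax, ptr[ecx + 10]) appends the
// same bytes an assembler produces for `mov eax, [ecx+10]`, while the
// surrounding C++ is free to use loops and functions.
std::vector<uint8_t> codeBuf;

struct Reg { uint8_t id; };
struct Mem { uint8_t base; int8_t disp; };

struct PtrMaker {
    Mem operator[](Mem m) const { return m; }
} ptr;

Mem operator+(Reg base, int disp) { return Mem{base.id, (int8_t)disp}; }

const Reg eax{0}, ecx{1};

// Emit opcode + ModRM (mod=01: base register + 8-bit displacement) + disp8.
// (Always uses the disp8 form, even for disp 0, which is valid but longer.)
static void emitModRM(uint8_t opcode, Reg r, Mem m) {
    codeBuf.push_back(opcode);
    codeBuf.push_back(0x40 | (r.id << 3) | m.base);
    codeBuf.push_back((uint8_t)m.disp);
}

void xMOV(Reg dst, Mem src) { emitModRM(0x8B, dst, src); } // mov r32, r/m32
void xADD(Reg dst, Mem src) { emitModRM(0x03, dst, src); } // add r32, r/m32
```

so `xMOV(eax, ptr[ecx + 10]);` drops the bytes `8B 41 0A` into the buffer, and the earlier for-loop of xADDs just repeats the add encoding ten times with the displacement baked in.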
it's also nice writing emitter code in high-level functions, and then calling them in another high-level function in specific orders to generate unique x86 machine code.
with asm, any function calls like that would at least add a few JMPs/CALLs into the mix.
with macros it's possible to do the same thing, i guess; but my experience with vc++ inline asm and macros is pretty bad.
does it even support macro'd inline asm functions? i just remember i couldn't get some stuff to compile, so i switched to using the emitter.
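to make the JMP/CALL point concrete (made-up helper names, illustrative encodings): the helper functions run at *emit time*, so their bytes land inline in the buffer; the generated code itself contains no call glue, unlike sharing a subroutine in hand-written asm:

```cpp
#include <cstdint>
#include <vector>

// Sketch: C++ helpers that emit bytes. Calling a helper twice simply
// duplicates its bytes inline -- no CALL (0xE8) appears in the output.
using CodeBuf = std::vector<uint8_t>;

void emitLoadCounter(CodeBuf& b) {   // mov eax, [ecx]  -> 8B 01
    b.insert(b.end(), {0x8B, 0x01});
}
void emitIncrement(CodeBuf& b) {     // add eax, 1      -> 83 C0 01
    b.insert(b.end(), {0x83, 0xC0, 0x01});
}

// A "unique block" built by calling the helpers in a specific order.
CodeBuf buildBlock() {
    CodeBuf b;
    emitLoadCounter(b);
    emitIncrement(b);
    emitIncrement(b); // helper called twice; its bytes are inlined twice
    return b;
}
```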
Maybe some people find less conciseness to be more legible in cases like these, but it's very subjective. I feel that people working on a collaborative open source project, especially if they're not a dominating developer over everyone else, should be very careful about restructuring code based on subjectivity. Of course it's undeniably/objectively better not to have redundant source files, but is this really a solution with good universal appeal?
well, if you're going to be doing anything low-level in pcsx2 emulation-wise, you're going to have to learn the emitter anyways.
it's essentially saying "you just have to learn the emitter syntax, instead of getting familiar with multiple assemblers' syntax + the emitter syntax."
i've personally worked more with emitters than assembler syntax (i got into low-level coding when i started working on pcsx2, using the emitter), so i just like using it so much more.
I'll concede that it does allow for some nice things using procedurally generated code, but I don't think this really comes up much in such a way that the same thing can't be done with macros in an assembler. I personally use CPP (C preprocessor) with assembler, but if I were doing stuff x86 only then I'd stick with YASM and its macro capability, which is pretty competent.
the benefit of using something we already need to have in pcsx2 (for the dynarec) makes it appealing and, i think, a good idea.
any other method is of course just as subjective.
i'm not saying you have to use emitters in your projects or the like...
i'm just saying that's what we've decided to do with pcsx2; and since Jake, pseudonym, and I are the ones who have been doing the most low-level coding in pcsx2 for the past 1+ years, we're at a point where we can decide how we want the project to handle asm.