Next Generation Emulation banner

·
Read Only
Joined
·
5,614 Posts
Discussion Starter · #1 ·
I was thinking of writing a JIT for my Chip8 emu[overkill, yes, but it'd be a good lesson in how to do it, at least to me..]. Does anyone have any insight more specifically on what a JIT is, as well as how one would be written? Emulation info seems to be scarce. Let's try to make more info public for aspiring authors like me! :)
 

·
Level 9998
Joined
·
10,591 Posts
I was thinking of writing a JIT for my Chip8 emu[overkill, yes, but it'd be a good lesson in how to do it, at least to me..]. Does anyone have any insight more specifically on what a JIT is, as well as how one would be written? Emulation info seems to be scarce. Let's try to make more info public for aspiring authors like me! :)
Just-in-time compilation - Wikipedia, the free encyclopedia

Basically... a high-level JIT takes a block of opcodes (instead of just one opcode), and if it matches an entry in your JIT table, you execute the entire block as if it were a single opcode. Of course, the code for a block is different from the code for a single opcode. It takes a lot of time to reverse-engineer applications and work out which opcode patterns appear most often so you can optimize your JIT compiler further... so it isn't recommended unless performance is actually an issue for your emulator. :(

And you are better off doing this in ASM if you absolutely require the performance boost, because I doubt you can achieve the same thing in a high-level language... (since you have to rely on the compiler to generate the machine code in that case)

And like mudlord said, it's much too overkill for Chip-8 unless you're writing an interpreter for a calculator... but even then, a calculator is still fast enough.

But you may still try to rearrange your opcode table to gain some performance boosts. Basically... bring the opcodes that you think will appear the most to the top... and the ones that appear the least to the bottom. That ought to do.
 

·
Level 9998
Joined
·
10,591 Posts
oO Then what do emus like PCSX2 and Dolphin use? I thought they used JIT..
I can't tell if I don't take a look at the source code, and I'm literally blind if it's something someone else wrote. :p

But yeah... Dolphin! Duh... how could I overlook it. I guess Dolphin does have a JIT engine. However, JIT tends to be... buggy, because if you overlook some kind of condition while executing an entire block of code then... that's asking for trouble. Anyway, if you'd like to implement a JIT engine, I think it's time to do a lot of hex editing.

P.S.: to be honest, I have a JIT engine for Chip-8 as well, but it's buggy as hell...
 

·
Premium Member
Joined
·
8,437 Posts
@Dax, is dynarec what you really wanted to learn? If so, even that's overkill for a small emulation project. I recommend saving dynarec until you attempt a console with a higher CPU clock or where speed matters more (e.g. PSX, Saturn, N64, DC, iPhone, PS2, or worse yet... Xbox!). Zenogias wrote a beginner tutorial on dynarec. It's only on web.archive.org now, but it has already been posted here, so you can search this forum for it.
 

·
You're already dead...
Joined
·
10,293 Posts
@nosound.97:

i talked about dynamic recompilation in one of my blog posts
Trash Can of Dreams: What is microVU?

towards the middle of that post i explain its benefits versus an interpreter, and how dynamic recompilation basically works.


However, i should note that dynarecs/JIT doesn't work well with older consoles.

the reason being that older consoles such as a NES, rely on very cycle-crucial timings.
If say you're off by ~5 cycles when you update the PPU or other components, then you might get completely different results.

Dynamic recompilation involves recompiling 'blocks' of code until you reach a branch or you force stop it (taking a variable amount of cycles).
but if you do that on something like a NES emu, then when you update your PPU, APU, and any cart mapper stuff, the cycles are going to be in greater intervals, and cause inconsistent and inaccurate emulation.

the solution would be to make blocks really small, and then exit out of the CPU recompilation/execution to update the other parts of the system. however this'll eliminate the speed gain from dynamic recompilation (because there's always an overhead in the recompilation and searching phases, and it's only justified if your execution phase runs a lot of code; otherwise, an interpreter would be faster)

on modern systems like the ps2, you can recompile code in big blocks and let them run for 1000's of cycles, and it's still fine for most games because they don't rely on such tight cycle syncing (games will also have code in place to sync the processors, because even on the real hardware it's impossible to know exactly when another processor will be finished without monitoring its status).

so because of this, we're allowed to do optimizations like dynarecs on modern systems, but on older systems you should stick to interpreters.
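
To put rough numbers on the sync problem, here is a minimal sketch. It assumes a simplified scheduler where the CPU core only hands control back after a fixed number of cycles; real blocks take a variable number of cycles, and the names `observedAt` and `lateness` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Cycle at which a device event is actually noticed when the CPU core only
// returns to the scheduler every blockCycles cycles (simplifying assumption:
// every block takes exactly blockCycles cycles).
uint64_t observedAt(uint64_t eventCycle, uint64_t blockCycles) {
    assert(blockCycles > 0);
    // Round eventCycle up to the next block boundary.
    uint64_t blocks = (eventCycle + blockCycles - 1) / blockCycles;
    return blocks * blockCycles;
}

// How many cycles late the device update happens.
uint64_t lateness(uint64_t eventCycle, uint64_t blockCycles) {
    return observedAt(eventCycle, blockCycles) - eventCycle;
}
```

With an event due at cycle 341, syncing after every cycle is never late, syncing every 4 cycles is 3 cycles late, and syncing every 1000 cycles is 659 cycles late. That last case is the kind of drift a cycle-sensitive system can't tolerate.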
 

·
Banned
Joined
·
2,548 Posts
so because of this, we're allowed to do optimizations like dynarecs on modern systems, but on older systems you should stick to interpreters.
Excuse me: there is NO right or wrong. It's the dev's choice. Yes, it's insane using a dynamic recompiler on a C64 emulator (which I may add, HAS been done before, and done very well), but there should be no "shoulding" people about what can and can't be done.

Who made you the boss? Are you PCSX2 people going to turn out like MAMEdev because of how popular you people are? You people make me sick.

You make people hold grudges more that way ;) >_>
 

·
Banned
Joined
·
35,081 Posts
Nipping this in the bud now, keep it off the forums guys.
 

·
Emu Author
Joined
·
616 Posts
on the subject of dynarec, i have never seen any "good" sources of info on it. i've seen that small tutorial on them, but it barely went into any kind of detail, so i am wondering if there are any really good sources online that people know of, so i could look into it just for the knowledge.
 

·
Premium Member
Joined
·
8,437 Posts
The concept is generally easier to understand when you can see an example. I'll assume your host CPU is x86 (as most are), but if you're using a different host (e.g. PPC, MIPS, SPARC, Itanium, etc.) then the "encoding" process will differ. The bottom line is you need to understand the instruction set (on a binary level) of both the target and host CPU. I'll give you an example of how to encode a "code block" and execute it. There are many different ways to do this and you may have better ideas, but I generally had to learn how to write a dynarec core by myself! Here I'll show you how to create a code block, execute it, and call a function from assembly. I normally use 100% pure C, but in this case, I'll use C++ instead. But first, let me give you this scenario to clarify things a bit.

You might want to translate the following oddball instruction on some oddball CPU:
-> move V01, #FFFFFFFF
to something more recognizable (in the form of x86 machine code):
-> mov eax, 0xFFFFFFFF
-> mov ebx, emuRegAddress_V01
-> mov [ebx], eax
What the x86 code does is take the value to be written to the register V01, get the address of the emulated register and write to that address so that the emulated V01 register in your CPU context has that value! That's basically dynarec in a nutshell.
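
As a rough sketch of what "encoding" those three instructions looks like in practice, the snippet below writes their bytes into a buffer without executing anything. The `Emitter` struct and `translateMoveV01` are hypothetical names made up for this example, and the address of V01 is assumed to fit in 32 bits (i.e. a 32-bit host).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: a growable buffer that x86 machine code is appended to.
struct Emitter {
    std::vector<uint8_t> bytes;

    void emit8(uint8_t b) { bytes.push_back(b); }

    // x86 immediates are little-endian: lowest byte first.
    void emit32(uint32_t v) {
        for (int shift = 0; shift < 32; shift += 8)
            emit8(static_cast<uint8_t>(v >> shift));
    }

    // "mov r32, imm32" is encoded as (0xB8 + register number), then imm32.
    void movRegImm32(uint8_t reg, uint32_t imm) {
        assert(reg < 8);
        emit8(static_cast<uint8_t>(0xB8 + reg));
        emit32(imm);
    }

    void movMemEbxEax() { emit8(0x89); emit8(0x03); }  // mov [ebx], eax
    void ret()          { emit8(0xC3); }               // ret
};

enum : uint8_t { EAX = 0, EBX = 3 };  // x86 register numbers

// Translate the made-up guest instruction "move V01, #value".
// v01Address is the (assumed 32-bit) host address of the emulated V01 register.
std::vector<uint8_t> translateMoveV01(uint32_t value, uint32_t v01Address) {
    Emitter e;
    e.movRegImm32(EAX, value);       // mov eax, value
    e.movRegImm32(EBX, v01Address);  // mov ebx, emuRegAddress_V01
    e.movMemEbxEax();                // mov [ebx], eax
    e.ret();                         // hand control back to the emulator
    return e.bytes;
}
```

One caveat if you actually run a block like this from C or C++: the usual 32-bit x86 calling conventions expect ebx to be preserved across a call, so a real block would either save and restore it or use a scratch register such as ecx or edx.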

Code:
#include <stdio.h>
#include <windows.h>	// VirtualAlloc / VirtualFree


// Basic typedefs
typedef unsigned int	uint32;
typedef signed int	sint32;

typedef unsigned char	uint08;
typedef signed char	sint08;


// A function to test function calling in assembly
void __stdcall func( uint32 Eax, uint32 Ecx )
{
	printf( "eax = 0x%8.8X\necx = 0x%8.8X\n\n", Eax, Ecx );
}


// NOTE: This is for 32-bit x86 builds with MSVC only (x64 builds don't support __asm).
int main()
{
	void (__stdcall *f)(uint32,uint32) = &func;		// Function pointer to the actual function
	uint32 i = (uint32) f;	// Get the function's address

	// Allocate 6 bytes for this code block since the code is only 6 bytes for now
	// Feel free to allocate more when needed.  Memory from malloc() is normally not
	// executable (DEP), so ask the OS for executable pages instead.
	uint08* code = (uint08*) VirtualAlloc( NULL, 6, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE );

	if( code == NULL )
		return 1;

	// Write the machine code to the code block directly.  See the intel/amd or sandpile.org
	// documentation for more opcodes and encoding methods.
	code[0] = 0xB8;	// mov eax, 0xFFFFFFFF
	code[1] = 0xFF;
	code[2] = 0xFF;
	code[3] = 0xFF;
	code[4] = 0xFF;
	code[5] = 0xC3;	// ret

	// Get the address of this code block
	unsigned int* c = (unsigned int*) code;

	// Make sure that the address and the function pointer point to the same location.
	// If not, then something is wrong...
	printf( "f = 0x%8.8X\ni = 0x%8.8X\n\n", (uint32) f, i );

	if( (uint32) f != i )
		__asm int 3

	// This is one way to execute your code block
	// Good: Portable
	// Bad: Slightly more overhead
	void (*Execute)(void) = (void(*)()) code;

	Execute();

	// This works just as well, but not every compiler supports it (like this anyway)
	// Good: Less overhead
	// Bad: Not portable (in general)
	__asm call c;

	// Now we'll call the test function from inline assembly
	__asm 
	{
		push ecx	// Parameter 2
		push eax	// Parameter 1
		call f		// Make the call
	}

	// Free the code block.
	VirtualFree( code, 0, MEM_RELEASE );
	return 0;
}
You'll have to manage each code block yourself. Create a class or structure that contains all the attributes of the code block. Save the EIP at which it was created so that when a JUMP or CALL instruction is executed, you can search for the code block at the target address to see if it has already been created. If not, create it first. I'll explain some DOs and DON'Ts about dynarec that I learned from personal experience. Mudlord, feel free to make corrections if necessary.
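
One possible shape for that bookkeeping, as a minimal sketch: a map keyed by the emulated address a block was created from. `CodeBlock`, `BlockCache` and the placeholder `translate` are names invented for this example, and the "translated" code here is just a lone ret, not a real translation.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical record for one translated block.
struct CodeBlock {
    uint32_t guestPC = 0;       // emulated address the block was created from
    std::vector<uint8_t> code;  // host machine code (would live in executable memory)
};

class BlockCache {
public:
    // Return the block starting at guestPC, creating it first if it doesn't exist.
    CodeBlock& lookupOrCreate(uint32_t guestPC) {
        auto it = blocks_.find(guestPC);
        if (it == blocks_.end()) {
            ++translations_;
            it = blocks_.emplace(guestPC, translate(guestPC)).first;
        }
        assert(it->second.guestPC == guestPC);
        return it->second;
    }

    // How many blocks had to be translated so far.
    int translations() const { return translations_; }

private:
    // Placeholder translator: a real one would decode guest opcodes here.
    static CodeBlock translate(uint32_t guestPC) {
        CodeBlock b;
        b.guestPC = guestPC;
        b.code = {0xC3};  // just "ret"
        return b;
    }

    std::unordered_map<uint32_t, CodeBlock> blocks_;
    int translations_ = 0;
};
```

Calling `lookupOrCreate(0x200)` twice only translates once; the second call is a plain hash lookup, which is the "searching phase" overhead you pay every time a block is entered.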

DOs
  • When writing machine code to the code block, you need to include a return instruction at the end, or else the host CPU will keep executing whatever memory follows the block and crash.
  • If you need to call a function within your code block, make sure you use CALL rel32. The others will not properly maintain the stack unless you do it manually, which is a pain and much slower!
  • All functions called from your generated code MUST have the __stdcall calling convention! Refusing to do so will cause major problems!
  • When you reach a JUMP or CALL instruction, end the code block with a return instruction (0xC3) so that normal execution can resume. Then you can do things such as handle interrupts and exceptions.

DON'Ts
  • Never execute privileged instructions within your code block, unless you are running at the right privilege level AND you know what you're doing! Doing so can cause crashes and/or ruin the state of your VM (in other words, you could really mess some things up)!
  • Any guest CALL or JUMP instruction must be emulated in software. Never translate a JUMP, CALL, RET or any other instruction that modifies the EIP register into a direct host equivalent in your code block. Only use CALL when you need to call an emulator function that emulates the needed functionality in software (e.g. host memory access, privileged instructions, emulated devices, etc.)
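
Applied to Chip-8 (the system the thread started with), the "end the block at a JUMP or CALL" rule could be sketched like this. It is a simplification that treats every opcode able to change the program counter, including the conditional skips, as a block terminator; `endsBlock` and `blockLength` are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if this Chip-8 opcode can change the program counter, which in this
// sketch means the block has to end here and control returns to the emulator.
bool endsBlock(uint16_t op) {
    switch (op >> 12) {
        case 0x0: return op == 0x00EE;           // 00EE: return from subroutine
        case 0x1:                                // 1NNN: jump
        case 0x2:                                // 2NNN: call
        case 0xB: return true;                   // BNNN: jump to NNN + V0
        case 0x3: case 0x4: case 0x5: case 0x9:  // conditional skips
            return true;
        case 0xE:                                // EX9E / EXA1: skip on key state
            return (op & 0xFF) == 0x9E || (op & 0xFF) == 0xA1;
        default:  return false;
    }
}

// Size in bytes of the block starting at pc, including the opcode that ends it.
// The block also ends if the scan runs off the end of memory.
std::size_t blockLength(const std::vector<uint8_t>& mem, std::size_t pc) {
    assert(pc + 1 < mem.size());
    const std::size_t start = pc;
    while (pc + 1 < mem.size()) {
        // Chip-8 opcodes are two bytes, big-endian.
        const uint16_t op = static_cast<uint16_t>((mem[pc] << 8) | mem[pc + 1]);
        pc += 2;
        if (endsBlock(op)) break;
    }
    return pc - start;
}
```

For the byte sequence 60 01 70 02 12 00 (set V0, add to V0, jump), the block starting at offset 0 is 6 bytes long: the jump is included as the last instruction, and its effect on the program counter is then handled in software after the block returns.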

When I get time, maybe I can write a complete tutorial on this. The code is also attached below. Hope it helps!
 

·
Emu Author
Joined
·
616 Posts
thank you for that, it did clear up a few things, not sure if zilmar's is the one i saw, if you could post a link i would still read it, more i can learn on the topic the better i suppose, especially if his is not the one i read.

so if i understand right, with dynarec you choose what parts you want recompiled and while executing you check if you are in one of those parts, then you call your recompiled code block that is saved elsewhere?

what i am having a little trouble with, what would the difference be between having the line in normal code format compared to asm? like with your example, would it be faster to store that value in the original variable (V01) or to do it the recompilation way? just seems to be faster the normal way.

i guess since i have never done a dynarec before it just seems odd to me even though i sort of understand it.
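
One way to look at that question, as a sketch: the store itself costs the same either way. What a recompiler saves is the fetch/decode/dispatch work an interpreter repeats every time the same opcode runs. The example below uses pre-decoded C++ closures as a portable stand-in for generated machine code (so it is not a real dynarec, and `std::function` has its own overhead), and it only handles two Chip-8 opcodes. All names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

struct Cpu { uint8_t v[16] = {}; };  // the 16 Chip-8 V registers

using Op = std::function<void(Cpu&)>;

// Interpreter: every opcode is decoded again each time it is executed.
void interpret(Cpu& cpu, const std::vector<uint16_t>& program) {
    for (uint16_t op : program) {
        const uint8_t x  = (op >> 8) & 0xF;
        const uint8_t nn = op & 0xFF;
        switch (op >> 12) {
            case 0x6: cpu.v[x] = nn; break;                                   // 6XNN: VX = NN
            case 0x7: cpu.v[x] = static_cast<uint8_t>(cpu.v[x] + nn); break;  // 7XNN: VX += NN
            default:  assert(false && "opcode not handled in this sketch");
        }
    }
}

// "Recompiler": decode each opcode once and keep a ready-to-run operation.
std::vector<Op> compile(const std::vector<uint16_t>& program) {
    std::vector<Op> block;
    for (uint16_t op : program) {
        const uint8_t x  = (op >> 8) & 0xF;
        const uint8_t nn = op & 0xFF;
        switch (op >> 12) {
            case 0x6: block.push_back([x, nn](Cpu& c) { c.v[x] = nn; }); break;
            case 0x7: block.push_back([x, nn](Cpu& c) { c.v[x] = static_cast<uint8_t>(c.v[x] + nn); }); break;
            default:  assert(false && "opcode not handled in this sketch");
        }
    }
    return block;
}

// Running a compiled block involves no decoding at all.
void run(Cpu& cpu, const std::vector<Op>& block) {
    for (const Op& op : block) op(cpu);
}
```

Both paths leave the registers in the same state. The difference is that `compile` pays the decode cost once, while `interpret` pays it on every pass, which only matters when the same code runs many times.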
 

·
Banned
Joined
·
2,548 Posts