Next Generation Emulation banner

Dynamic Recompilation - An Introduction

34806 Views 85 Replies 21 Participants Last post by  Padremayi
In another thread sniff_381 asked me to explain what dynamic recompilation (dynarec for short) is.
Since most NG emus have a dynarec now, I thought it might be a good idea to cover the topic in a separate thread.

I have to admit that I haven't programmed a dynarec myself yet, but I have decent knowledge of the basics and some details. I'm not totally sure how much I should go into depth anyway. I guess I'll see if there is enough interest to talk about details.

First of all, the term "dynamic recompilation" is a bit odd, because "to recompile" often means to compile the source code of a program again, but the older term "binary translation" is more precise, since the binary code of a game or application is translated - not the source code - and "dynamic" only means that it is done during runtime and on demand.

So what's the difference to "traditional" or "interpretive" emulation?
An interpretive emulator always picks the instruction the program counter (PC) points to, decodes it, and executes it, just like a real processor would do it. So every time the emulator comes across it same instruction it has to do all the steps again.
In his article How To Write a Computer Emulator (http://fms.komkon.org/EMUL8/HOWTO.html) Marat Fayzullin uses this pseudo C-code sequence to describe the process:
Code:
Counter=InterruptPeriod;
PC=InitialPC;

for( ;; )
{
  OpCode=Memory[PC++];
  Counter-=Cycles[OpCode];

  switch(OpCode)
  {
    case OpCode1:
    case OpCode2:
    ...
  }

  if(Counter<=0)
  {
    /* Check for interrupts and do other */
    /* cyclic tasks here                 */
    ...
    Counter+=InterruptPeriod;
    if(ExitRequired) break;
  }
}
Dynamic recompilation deviates from this procedure by working with whole blocks of code instead of single instructions, and that those blocks are translated into the machine language of the processor the emulator is running on. There wouldn't be a speed advantage if the translated blocks weren't cached and simply recalled as soon as the program counter enters that block again.
Here is some sample code from my (unfortunately still unfinished) DRFAQ (http://www.dynarec.com/~mike/drfaq.html):
Code:
/* the following line defines a 'function pointer',               */
/* which can be used to call the code generated by the translator */
/* CTX is the context of the processor, ie. the register values   */

int (*dyncode)(Context *CTX);


/* the following simplyfied loop is often called the "dispatcher" */

for( ;; ) {

  /* try to find the current address of the PC in the translation cache */
  address = block_translated(CTX->PC);

  /* nothing found, ie. first translate the code block starting at the PC address */
  if (address == NULL)
    /* do the translation and add it to the translation cache */                                      
    address = translate_block(CTX->PC);

  /* point the function pointer to the address of the translated code */
  dyncode = (int(*)(Context*)) address;

  /* call the translated code with the current context */
  status = (*dyncode)(CTX);

  /* handle interrupts and other events here */                                 
}
That's basically how a dynarec works, only that I still haven't explained how the translation cache and of course the translation are handled.

I spoke of code blocks several times, and it might be a good idea to define the term, since not all will be into compiler theory...
In compilers the smallest block of cohesive instructions is called a basic block. Such a block has a starting point and ends with the next conditional jump or branch, ie. as soon as there is a possibility that the program counter changes apart from pointing to the next instruction the block ends. It's also important that no other code block can jump into the middle of the basic block, only at the starting address, because only that way the compiler can see it as a separate collection of code that can be optimized in every possible way.
Most dynarecs probably work with basic blocks, but some use end the block with the next unconditional jump or branch, which leads to larger blocks, often called translation units. This leads to faster code, because all conditional branches can jump in the translated code without having to go through the dispatcher loop first, but it can be problematic to handle interrupts since you don't have a guarantee that the code returns to the dispatcher. Of course that could be handled within the generated code, but that makes things more complicated.

I think that's enough for an introduction. If there are any questions feel free to ask, and if there is interest in extending the parts I haven't covered yet, I could write something about the translation cache, some translation problems, register allocation, the difference to threaded interpretation, etc.
See less See more
1 - 1 of 86 Posts
If someone want to test on Linux the code from the first post in this thread, use mmap() instead of malloc(), here the working code (Ubuntu 20.04 64 bit) compiled with -m32 (it's fundamental!!!) flag:

C:
// Include the prototypes of the functions we are using...
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>


/* In the beginning we'll have to define the function pointer.
* I called the function 'dyncode' and gave it an int argument
* as well as an int return value just to show what's possible.
*/

int (*dyncode)(int); // prototype for call of dynamic code

/* The following char array is initialized with some binary code
* which takes the first argument from the stack, increases it,
* and returns to the caller.
* Just very simple code for testing purposes...
*/

unsigned char code[] = {0x8B,0x44,0x24,0x04,  // mov eax, [esp+4]
                        0x40,                 // inc eax
                        0xC3                  // ret
                       };



int main(void)
{
    /* To show you that the code can be dynamically generated
     * although I defined static data above, the code is copied
     * into an allocated memory area and the starting address is
     * assigned to the function pointer 'dyncode'.
     * The strange stuff in front of the malloc is just to cast
     * the address to the same format the function pointer is
     * defined with, otherwise you'd get a compiler warning.
     */

    dyncode = mmap(NULL, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(dyncode, code, sizeof(code));

    /* To show that the code works it is called with the argument 41
     * and the return value sould be 42, obviously.
     */

    printf("Return value = %d\n", (*dyncode)(41) ); // call the code and print the return value

    munmap(dyncode, sizeof(code));

    return 0;
}
See less See more
1 - 1 of 86 Posts
This is an older thread, you may not receive a response, and could be reviving an old thread. Please consider creating a new thread.
Top