
·
Registered
Joined
·
774 Posts
Discussion Starter · #1 ·
I'm currently working on programming an emulator of my own, although I'm going to keep exactly what the emulator is for a secret until I at least finish implementing enough of it to get the emulator to show something. Besides, I'm still not entirely sure I'm going to make it a public release. At any rate, I've been thinking about the issue of address translation, and it's causing me a bit of trouble. It seems that there is no fast way to implement it fully in software, but I hear it's possible to do entirely in hardware using the x86's MMU. So far all the work I've done with assembler is relatively basic, but this is an advanced-level problem. Can anyone give me an example of how I could implement hardware address translation in x86 assembly? Preferably one with no extra function calls involved once the initial translation has occurred.
 

·
Registered
Joined
·
6 Posts
Can you describe your problem in detail? What kind of address do you want to translate? Maybe give an example of the problem, like "I have this and I want that."

I'm currently working on programming an emulator of my own, although I'm going to keep exactly what the emulator is for a secret ...
Please give some kind of hint; I'm very interested.
 

·
Registered
Joined
·
774 Posts
Discussion Starter · #3 ·
Well, basically what I had in mind was for my opcode translations to keep their original address, or at least an offset one that puts them in a memory region my program doesn't use. Then I execute the code with that address and have the x86's MMU translate it to where the memory is really located. I know it's not possible to do without a bit of trickiness, due to the fact that a user mode program can only address 2GB of memory, while my target CPU to emulate can address a full 4GB of memory in privileged mode (and that's the only clue you get :p). I think maybe if I did some address table swapping though, if that's even possible, then it could be done. I'm just not sure what opcodes I need to use to interact with the MMU, and how to avoid having it mess up address translation in the rest of my code, which is written mostly in C (so God only knows where it ends up in memory).
 

·
Emu author
Joined
·
1,488 Posts
How much do you know about how x86's MMU works? You should know that pages are 4KB, whereas chances are whatever you're emulating has pages that are either variable size or larger than that. At any rate, the OS isn't going to let you modify a process's page table directly, so unless you plan on doing this independently of an operating system it's not even worth worrying about how to accomplish it. I do believe the 2GB figure you cited only applies to Windows; Linux should have 3GB for user apps, but you still might not get all of that to do with as you wish. I doubt any of this means that much to you though...

I'm going to assume you're using Windows. What you would want is an OS call that would let you bind a specific part of the process's address space to somewhere in physical memory - exactly where may not be important, as long as you could allow other areas to be bound there too. Unfortunately this isn't possible (as far as I'm aware). What's more, you would also want to be alerted when memory accesses are made to areas outside of the emulated machine's physical RAM but in other important areas of its address space (mem-mapped I/O). This would be impossible without low level access to the exception handler of the operating system, and what's more, the granularity of page accesses might not be good enough for this. So you'd get stuck checking all memory accesses for address region anyway, and what's more, you'd most likely have to do it after translation, which is impossible if you're letting the hardware translate for you. At this point you're going to be doing enough work as it is that, if constructed cleverly, you can throw in the address translation for little or no additional cost.

This is something I've considered before, but the bottom line is that it just isn't going to happen. The bare minimum would be to do this independently of an OS, but even that simply isn't enough unless the platform you're emulating is in some way heavily restricted (and a 4GB address space doesn't indicate this).

- Exo
 

·
Registered
Joined
·
774 Posts
Discussion Starter · #5 ·
Exophase said:
How much do you know about how x86's MMU works? You should know that pages are 4KB, whereas chances are whatever you're emulating has pages that are either variable size or larger than that.
I'm aware of both those things; however, I don't really see how the exact page boundaries being different sizes would cause many issues.

Exophase said:
At any rate, the OS isn't going to let you modify a process's page table directly, so unless you plan on doing this independently of an operating system it's not even worth worrying about how to accomplish it. I do believe the 2GB figure you cited only applies to Windows; Linux should have 3GB for user apps, but you still might not get all of that to do with as you wish. I doubt any of this means that much to you though...
I'm writing my emu for Windows and I haven't made portability a big concern, so how much memory Linux gives me doesn't mean a thing to me. I really wonder why Windows couldn't let you modify your own process's page table though. As long as it can't interfere with other programs, it doesn't seem like much of a problem.

Exophase said:
I'm going to assume you're using Windows. What you would want is an OS call that would let you bind a specific part of the process's address space to somewhere in physical memory - exactly where may not be important, as long as you could allow other areas to be bound there too. Unfortunately this isn't possible (as far as I'm aware). What's more, you would also want to be alerted when memory accesses are made to areas outside of the emulated machine's physical RAM but in other important areas of its address space (mem-mapped I/O). This would be impossible without low level access to the exception handler of the operating system, and what's more, the granularity of page accesses might not be good enough for this. So you'd get stuck checking all memory accesses for address region anyway, and what's more, you'd most likely have to do it after translation, which is impossible if you're letting the hardware translate for you. At this point you're going to be doing enough work as it is that, if constructed cleverly, you can throw in the address translation for little or no additional cost.

This is something I've considered before, but the bottom line is that it just isn't going to happen. The bare minimum would be to do this independently of an OS, but even that simply isn't enough unless the platform you're emulating is in some way heavily restricted (and a 4GB address space doesn't indicate this).

- Exo
Ok, I see what you mean. Well, it was just a thought. You know, I think what they really need to do is design a CPU with emulation in mind. It's really not such a far-fetched idea considering that emulation is the best (cheapest) way to do backwards compatibility. Hardware optimizations for emulation would make the whole thing so much easier.
 

·
Registered
Joined
·
1,577 Posts
__Xzyx987X said:
I'm writing my emu for Windows and I haven't made portability a big concern, so how much memory Linux gives me doesn't mean a thing to me. I really wonder why Windows couldn't let you modify your own process's page table though. As long as it can't interfere with other programs, it doesn't seem like much of a problem.
The point is that 2GB is a limit imposed by the operating system. Also, the operating system would get in the way at any rate, meaning that it is impossible, which does seem like a big problem to me.
 

·
Registered
Joined
·
774 Posts
Discussion Starter · #8 ·
I meant that it wouldn't be a problem for Windows to allow it, not that it isn't a problem that Windows doesn't allow it.

Exophase said:
It's called Transmeta Crusoe. :)

- Exo
Damn, I'm never the first one to think of anything :p.
 

·
Emu author
Joined
·
1,488 Posts
Modern OSes will allow you some memory management options that let you alter the address space of a process, but how much low level control you have is limited for some good reasons. First of all, the kernel space must be in the address space of all processes (which is why you don't get a full 4GB for user apps), and changing any of this address space would be irreversibly detrimental; it could, for instance, cause the OS to crash or even triple fault the computer when an interrupt occurs, because an interrupt will not necessarily change the address space. Furthermore, you can't necessarily control where an app is placed in memory, even if there's a field for it in the executable header. Chances are it'll be placed at 0 onwards, which is a location of prime interest for almost any virtual address space conceivable (in other words, the apps you're emulating will probably want it, in addition to the upper space the OS's kernel is taking up). Beyond that, the OS will allocate at least one stack and a heap, both of which can grow. To be able to then specify your own bindings in this already heavily cut up address region may be possible, but it's dangerous. You would at the very least have to know which pages are already allocated and avoid taking any near the ends of the stack or heap, because if either grows into your pages it could overwrite them or the program could crash. Obviously it's not as if you can control where the emulated app wants to translate to - if you could, you wouldn't need to emulate translation in the first place. What's more, the OS can't give you direct control of physical memory because you don't know what processes are already taking which parts of it, and you may not even know which ranges are legitimate to begin with.

The very best the OS can give you is a mechanism for having part of the address space point to SOMEWHERE; you may never even know where this place is. Unix has a mechanism for this, it's called mmap. It's used for sharing memory across multiple processes. I'm sure Windows has something like this but I have no idea what it is. Beyond this the best you might get is something even weaker like brk which will extend the address space of the heap by some amount.
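To make the mmap point concrete, here is a minimal sketch in C, assuming a POSIX system. It maps one backing store at two different virtual addresses inside a single process, which is the "bind a region, and let other areas be bound there too" behaviour discussed earlier, with the OS still choosing both the virtual addresses and the physical location. The helper name `map_twice` is made up for illustration. On Windows the rough counterpart would be `CreateFileMapping` with `MapViewOfFile`, which is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the same `size` bytes of backing store at two
 * different virtual addresses. `size` should be a multiple of the page size.
 * Returns 0 and fills views[0] and views[1] on success, -1 on failure. */
int map_twice(size_t size, unsigned char *views[2])
{
    assert(size > 0);

    FILE *backing = tmpfile();          /* unnamed temporary file as backing store */
    if (backing == NULL)
        return -1;

    int fd = fileno(backing);
    if (ftruncate(fd, (off_t)size) != 0) {   /* grow the file to `size` zero bytes */
        fclose(backing);
        return -1;
    }

    for (int i = 0; i < 2; i++) {
        /* NULL hint: the OS picks the virtual address, we never learn the physical one. */
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            if (i == 1)
                munmap(views[0], size);
            fclose(backing);
            return -1;
        }
        views[i] = p;
    }

    fclose(backing);                    /* the mappings remain valid after closing */
    return 0;
}
```

A byte written through `views[0]` is visible through `views[1]` because both views share the same pages. This gives mirroring, but it still provides no control over physical placement and no notification on access.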

The most damning limitation is surely the inability to do with physical RAM as you please. Perhaps if you could ask the OS to give you a large contiguous region of physical memory you could work with that, but I doubt any will let you do something like this: with the way these allocations quickly fragment the physical memory space, it's almost impossible to guarantee that much more than a single page of the physical memory you're granted will be contiguous. I suppose you could even work with segmented physical memory if you used a table to convert from the emulated machine's address space to the physical memory you've allocated, but this is still pretty rough, and the OS probably wouldn't let you sit on physical memory no matter how you slice it up (it isn't "fair" to other processes for one to not go through the virtual memory of the OS).
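The "table to convert" idea can be sketched in C as follows. This is only an illustration under loud assumptions: the chunks come from ordinary `calloc` calls, so they are host virtual memory and not pinned physical pages, and the guest RAM size, `guest_ram_init`, and `guest_to_host` are invented names and values, not anything from the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT  12u
#define PAGE_SIZE   (1u << PAGE_SHIFT)
#define GUEST_PAGES 16u                 /* hypothetical: a tiny 64 KB guest RAM */

/* One host pointer per guest page; the chunks need not be adjacent. */
static uint8_t *guest_page[GUEST_PAGES];

/* Allocate every guest page separately. Returns 0 on success, -1 on failure. */
int guest_ram_init(void)
{
    for (uint32_t i = 0; i < GUEST_PAGES; i++) {
        guest_page[i] = calloc(1, PAGE_SIZE);
        if (guest_page[i] == NULL)
            return -1;
    }
    return 0;
}

/* Convert a guest address to a host pointer via the table. */
uint8_t *guest_to_host(uint32_t addr)
{
    uint32_t page = addr >> PAGE_SHIFT;
    assert(page < GUEST_PAGES);         /* the sketch only covers guest RAM */
    return guest_page[page] + (addr & (PAGE_SIZE - 1u));
}
```

Guest addresses 0x1FFF and 0x2000 are neighbours in the guest but may land in unrelated host chunks, which is exactly the fragmentation the table hides.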

A ring 0 kernel app or device driver on Windows might let you use kernel functions to accomplish what you want, I don't really know. Sadly you're still hit with the problem of memmapped I/O. Unless your target platform does memory mapping in the virtual address space and not the physical address space (it's possible, I suppose), letting you automatically tag certain pages to fault, you can't really do this.

Can you tell us anything else about the MMU of this platform? There might be some tricks to help you do this as quickly as possible. For instance, you can double up a page mapping system to indicate where code resides (so you can quickly check for self-modifying code) and where memmapped I/O is. If the page table entries are anything like x86's, you're going to have several bits in the lower 12-bit region of the PTEs that don't correspond to anything. You can use a single bit to mark whether the page is real memory, test this bit within your recompiled code, and call a handler if it isn't - this jump will be very easily predictable, so it shouldn't be terribly expensive. For something like this, let's imagine eax holds the destination address and it's a paging scheme like x86. We'll also say that bit 0 indicates whether that page is memmapped, and bit 1 is set if there's code there OR if it's memmapped. The reason to use two bits here is that for reads you only want to check if it's memmapped, and for writes you have to check if it's memmapped or if there's code. You do want to have the rest of the lower bits "clean" if you can help it (all 0's), otherwise you'll need another instruction to clear them. Note that the translated code block will NOT be marked as a code page in the x86 page table. Now, for, say, a write... let's say eax has the target address and ebx has what you want to write:

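mov ecx, eax                       ; copy the target address
shr ecx, 12                        ; page number = address >> 12
mov edx, [address_table + ecx*4]   ; 4-byte entry: page base plus flag bits
test edx, 10b                      ; bit 1 set: memmapped I/O or code on this page
jz okay                            ; bit clear: plain memory, take the fast path
call write_exception               ; assumed here to carry out the write itself
jmp done

okay:
and eax, 0FFFh                     ; keep only the offset within the page
mov [eax + edx], ebx               ; store to the translated location
done: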

It looks pretty awful (yay x86 ASM), but on the common path it's "only" 6 instructions over what you would have done otherwise AND it accounts for memmapped I/O and self-modifying code. In those there's also only one additional memory read, hopefully to a well cached location. There's a jump, but like I said, branch prediction should almost always catch it because it's almost never going to change within the same block. Also, you can save a bit more if your location has been const cached. Note that you do have to maintain a 4MB single layer page table for this (2^20 entries of 4 bytes each for a 4GB space with 4KB pages), but that should be fine, right?
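For readers more comfortable in C, the same write path can be sketched roughly as below. The table name `address_table` and the handler name `write_exception` follow the assembly, and everything else (`map_page`, `guest_write32`, the flag macro names, the `slow_writes` counter) is invented for illustration. The sketch assumes every host page passed to `map_page` is 4KB-aligned so the low 12 bits are free for flags, and that pages with no backing are marked with the slow-path flag so the fast path never dereferences an empty entry. On a 64-bit host the entries are 8 bytes, so the table takes 8MB instead of 4MB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT   12u
#define PAGE_MASK    0xFFFu
#define NUM_PAGES    (1u << 20)         /* 4GB guest space / 4KB pages */
#define FLAG_MMIO    0x1u               /* bit 0: page is memmapped I/O */
#define FLAG_SLOW_WR 0x2u               /* bit 1: page is memmapped OR holds code */

/* One entry per guest page: host page base with flags in the low bits. */
static uintptr_t address_table[NUM_PAGES];
static unsigned slow_writes;            /* counts handler calls, for the demo only */

/* Stand-in for the real handler (I/O dispatch, code invalidation, ...). */
static void write_exception(uint32_t addr, uint32_t value)
{
    (void)addr;
    (void)value;
    slow_writes++;
}

/* Install a mapping; `host` must be 4KB-aligned so the flag bits stay clean. */
void map_page(uint32_t page, void *host, uintptr_t flags)
{
    assert(page < NUM_PAGES);
    assert(((uintptr_t)host & PAGE_MASK) == 0);
    address_table[page] = (uintptr_t)host | flags;
}

/* One table read, one predictable branch, then the store. */
void guest_write32(uint32_t addr, uint32_t value)
{
    uintptr_t entry = address_table[addr >> PAGE_SHIFT];
    if (entry & FLAG_SLOW_WR) {         /* rare: memmapped I/O or code page */
        write_exception(addr, value);
        return;
    }
    /* Bit 1 clear implies bit 0 clear too, so `entry` is a clean page base. */
    memcpy((void *)(entry + (addr & PAGE_MASK)), &value, sizeof value);
}
```

A read path would look the same but test `FLAG_MMIO` instead, since reads from pages that merely contain code need no special handling.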

- Exo

P.S. Crusoe is that in theory, but in practice they only actually emulate x86 for some reason. But the fact that they can emulate x86 with a machine language that doesn't seem to resemble it very much is impressive. I do think there ARE some "helper instructions" though, especially more likely in the later iterations. Crusoe isn't that fast but it's certainly a lot faster than a software emulator would be... then again, it tends to be much better to emulate x86 on a platform that's in any way RISC-like than the other way around...
 