Modern OSes give you some memory management options for altering the address space of a process, but how much low-level control you get is limited, for some good reasons. First of all, the kernel must be mapped into the address space of every process (which is why you don't get a full 4GB for user apps), and changing any of that region would be irreversibly detrimental; it could, for instance, crash the OS or even triple fault the machine when an interrupt occurs, because an interrupt will not necessarily switch address spaces. Furthermore, you can't necessarily control where an app is placed in memory, even if there's a field for it in the executable header. Chances are it'll be placed low in the address space, which is a region of prime interest for almost any address space layout conceivable (in other words, the apps you're emulating will probably want it too, in addition to the upper region the OS's kernel is taking up). Beyond that, the OS will allocate at least one stack and a heap, both of which can grow. Specifying your own mappings in this already heavily carved-up address space may be possible, but it's dangerous. You would at the very least have to know which pages are already allocated and avoid taking any near the ends of the stack or heap, because if either grows that far it could overwrite your mapping or crash the program. Obviously it's not as if you can control where the emulated app wants its addresses to translate to - if you could, you wouldn't need to emulate translation in the first place. What's more, the OS can't give you direct control of physical memory, because you don't know which parts of it other processes are already using, and you may not even know which ranges are legitimate to begin with.
The very best the OS can give you is a mechanism for having part of your address space point SOMEWHERE; you may never even know where that somewhere actually is. Unix has a mechanism for this: mmap. It's used for mapping files and shared memory into a process's address space, among other things. I'm sure Windows has something equivalent, but I have no idea what it is. Beyond that, the best you might get is something even weaker like brk, which just extends the heap end of the address space by some amount.
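If it helps, here's a minimal sketch of the mmap route in C (Unix only; the size constant and the names are just placeholders I made up). The point is that you get back a usable chunk of your own virtual address space and nothing more - no say over which physical pages end up behind it:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical amount of RAM the emulated machine gets. */
#define GUEST_RAM_SIZE (64u * 1024u * 1024u)

int main(void)
{
    /* Anonymous, private mapping: we don't pick where it lands in our
       address space (first argument NULL), and we never learn which
       physical pages the kernel backs it with. */
    void *guest_ram = mmap(NULL, GUEST_RAM_SIZE, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guest_ram == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }
    printf("guest RAM lives at host address %p\n", guest_ram);
    munmap(guest_ram, GUEST_RAM_SIZE);
    return 0;
}

MAP_FIXED does exist if you absolutely must pin the mapping to a specific address, but as I said above, that's asking for a collision with something the OS has already put there.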
The most damning limitation is surely the inability to do as you please with physical RAM. Perhaps if you could ask the OS for a large contiguous region of physical memory you could work with that, but I doubt any OS will let you, because allocations fragment the physical memory space so quickly that it's almost impossible to guarantee much more than a single page of physically consecutive memory. I suppose you could even work with fragmented physical memory if you used a table to convert from the emulated machine's address space to the physical memory you've been handed, but this is still pretty rough, and the OS probably wouldn't let you sit on physical memory no matter how you slice it up (it isn't "fair" to other processes for one of them to bypass the OS's virtual memory).
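If you did go the table route, it doesn't need to be anything elaborate. A rough sketch of the idea in C (the 4KB page size and every name in here are my own assumptions, nothing specific to your platform):

#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE      4096u
#define GUEST_RAM_SIZE (16u * 1024u * 1024u)
#define GUEST_PAGES    (GUEST_RAM_SIZE / PAGE_SIZE)

/* One host pointer per guest page; the host blocks don't have to be
   contiguous, the table hides the fragmentation from the emulator core. */
static uint8_t *guest_pages[GUEST_PAGES];

static void init_guest_ram(void)
{
    for (uint32_t i = 0; i < GUEST_PAGES; i++)
        guest_pages[i] = calloc(1, PAGE_SIZE);  /* error handling omitted */
}

/* Translate a guest physical address to a host pointer. */
static uint8_t *guest_to_host(uint32_t guest_addr)
{
    return guest_pages[guest_addr / PAGE_SIZE] + (guest_addr % PAGE_SIZE);
}

/* Example: byte-sized accesses routed through the table. */
static void guest_write8(uint32_t guest_addr, uint8_t value)
{
    *guest_to_host(guest_addr) = value;
}

static uint8_t guest_read8(uint32_t guest_addr)
{
    return *guest_to_host(guest_addr);
}

Every guest access pays for one table lookup this way; the trick further down is basically the same idea with a couple of flag bits folded into the entries.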
A ring 0 kernel app or device driver on Windows might let you use kernel functions to accomplish what you want; I don't really know. Sadly, you're still hit with the problem of memmapped I/O. Unless your target platform does its I/O mapping in the virtual address space rather than the physical address space (it's possible, I suppose), which would let you automatically tag certain pages to fault, you can't really handle it this way.
Can you tell us anything else about the MMU of this platform? There might be some tricks to help you do this as quickly as possible. For instance, you can double up a page-mapping structure to indicate both where code resides (so you can quickly check for self-modifying code) and where the memmapped I/O is. If the page table entries are anything like x86's, you're going to have several bits in the lower 12-bit region of the PTEs that don't correspond to anything, and in a table you maintain yourself you're free to use them however you like. You can use a single bit to say whether or not a page is real memory, and within your recompiled code test that bit and call a handler if it's set - this jump will be very easily predictable, so it shouldn't be terribly expensive.

For something like this, let's imagine eax holds the destination address and it's a paging scheme like x86's, with a flat table of one 32-bit entry per 4KB page. We'll say that bit 0 is set if the page is memmapped, and bit 1 is set if the page is memmapped OR if there's code there that you've translated. The reason to use two bits is that for reads you only need to check whether it's memmapped (bit 0), while for writes you have to check whether it's memmapped or contains code (bit 1). You do want to keep the rest of the lower bits "clean" (all 0s) if you can help it, otherwise you'll need another instruction to clear them before using the entry as a base address. Note that the blocks of recompiled output themselves are NOT marked as code pages in this table; only the guest pages they were translated from are. Now, for say, a write... let's say eax has the target address and ebx has what you want to write:
mov ecx, eax
shr ecx, 12                       ; page number of the target address
mov edx, [address_table + ecx*4]  ; one 32-bit entry per 4KB page
test edx, 10b                     ; bit 1: memmapped or holds translated code?
jz okay                           ; clear -> plain RAM, just do the write
call write_exception              ; slow path: emulate the I/O write or flush the stale code block
jmp done
okay:
and eax, 0FFFh                    ; offset within the page
mov [eax + edx], ebx              ; entry's upper bits are the host page base
done:
It looks pretty awful (yay x86 ASM), but on the common path it's "only" 6 instructions over what you would have done otherwise, AND it accounts for memmapped I/O and self-modifying code. There's also only one additional memory read in there, hopefully to a well-cached location. There's a conditional jump, but like I said, branch prediction should almost always catch it because it almost never changes within the same block. Also, you can save a bit more whenever the target address is a constant you've already resolved at recompile time. Note that you do have to maintain a 4MB single-level page table for this (2^20 entries of 4 bytes each), but that should be fine, right?
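For what it's worth, the bookkeeping side of that table might look something like this in C (the flag names and helpers are mine, and it assumes a 32-bit host; bit 0 = memmapped, bit 1 = memmapped or contains code you've translated, exactly as above):

#include <stdint.h>

#define FLAG_MMIO       0x1u  /* bit 0: page is memory-mapped I/O               */
#define FLAG_WRITE_SLOW 0x2u  /* bit 1: MMIO, or the page holds translated code */

/* 2^20 entries of 4 bytes each = the 4MB single-level table. Each entry is
   the 4KB-aligned host base address of the page OR'd with the flag bits,
   so the recompiled write stub only has to test/mask the low bits.        */
static uint32_t address_table[1u << 20];

static void map_ram_page(uint32_t guest_page, uint32_t host_base)
{
    address_table[guest_page] = host_base;          /* low 12 bits stay clean */
}

static void map_mmio_page(uint32_t guest_page)
{
    address_table[guest_page] = FLAG_MMIO | FLAG_WRITE_SLOW;
}

/* Called whenever the recompiler translates code it found in a guest page,
   so that later writes to that page hit write_exception and the stale
   translation can be thrown away before the write is redone.              */
static void mark_code_page(uint32_t guest_page)
{
    address_table[guest_page] |= FLAG_WRITE_SLOW;
}

/* What write_exception would use to recover the host base of a RAM page. */
static uint32_t entry_host_base(uint32_t entry)
{
    return entry & ~0xFFFu;
}

write_exception would then check bit 0 to see whether it's dealing with an I/O register or just a page of guest code whose translation needs throwing out before the write goes through.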
- Exo
P.S. Crusoe is that idea in theory, but in practice it only ever emulates x86, for some reason. Still, the fact that it can emulate x86 with a native instruction set that doesn't seem to resemble it very much is impressive. I do think there ARE some "helper instructions" in there, though, more likely in the later iterations. Crusoe isn't that fast, but it's certainly a lot faster than a software emulator would be... then again, it tends to be much easier to emulate x86 on a platform that's in any way RISC-like than the other way around...