Next Generation Emulation

1 - 8 of 8 Posts

·
Registered
Joined
·
423 Posts
Discussion Starter #1
Hi there, for all you lurkers out there, here's another update on the Dxbx status.

Again, I'd like to talk about API detection (although I should probably start calling it symbol-detection, but more on that later).
You might think : didn't part 1 and part 2 cover this already? Well... indeed, but it's a complex subject.
But before I start on that, you might remember we've been struggling for months with what we've been calling "the FS[$20]"-problem. A few weeks ago (as a matter of fact, two days before our Christmas release) I finally found what the problem was :

To emulate an Xbox1 environment, a structure called the Thread Information Block (TIB) needs to exist in two forms : an Xbox1 form, and a Windows form. The Xbox1 form is used when running Xbox1 code (obviously) and I'll let you figure out when the Windows version is used ;)
The TIB can be accessed by indexing the 'FS' processor register, so to swap between Xbox1 and windows TIBs, we have to swap the contents of the FS register.
Now, normally we won't call into any Windows XP kernel code when running Xbox1 code. But I finally discovered we did do some logging while running with the Xbox1-FS active. Tracing our logging function, I found out that this logging code ultimately calls a few Win32 I/O kernel APIs (WriteFile for example) - and lo and behold : these APIs alter the FS register!
Once this damage is done, any following Xbox1 code that accesses the TIB will eventually fail, bringing Dxbx to an early halt.
Once I discovered that, the fix was easy and Dxbx finally continued further with its logging of what patches have been hit.
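
The post does not say what the fix actually was. One plausible shape, sketched here as a toy Python model rather than real Dxbx code, is to switch to the Windows TIB before any host call and put the Xbox1 TIB back afterwards, no matter what the host call did to FS. The `FakeCpu` class, the `"xbox-tib"` / `"windows-tib"` labels and all function names are assumptions made up for illustration.

```python
from contextlib import contextmanager


class FakeCpu:
    """Stand-in for CPU state; it only models which TIB the FS register selects."""

    def __init__(self, fs):
        self.fs = fs


@contextmanager
def windows_fs(cpu):
    """Run host (Windows) code with the Windows TIB, then restore the caller's FS."""
    saved = cpu.fs
    cpu.fs = "windows-tib"
    try:
        yield
    finally:
        cpu.fs = saved  # restore even if the host call changed FS or raised


def host_write_file(cpu, text, sink):
    """Models a host I/O call that leaves FS altered as a side effect."""
    sink.append(text)
    cpu.fs = "windows-tib"


def log(cpu, text, sink):
    """Logging that is safe to call while the Xbox1 TIB is active."""
    with windows_fs(cpu):
        host_write_file(cpu, text, sink)
```

With this guard in place, a `log(...)` call made while `cpu.fs` is `"xbox-tib"` returns with `cpu.fs` still set to `"xbox-tib"`, so later Xbox1 code sees the TIB it expects.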

After that we've been working on translating more-and-more of the Cxbx code to Delphi (and I had to do some fixups on bad translations), which brings us now to a point that we're very close to seeing our very first screen-updates. (We can tell, because we compare our logs with those from Cxbx).
By the way : To keep tabs on our translation-progress, we use a special comment-line for every (emulation-related) symbol in our code. I've built a tool to extract this information into an XML file, and Shadow_tj is currently working on a method to upload this progress-report to his web-site. As soon as there's something to see, we'll inform you.

So, that being said, I've recently started with another refactoring of the API detection code.
The thing is, with our current code we don't scan further than the first hit on a function - which is kind of a problem when that hit is not correct! Also, we didn't use all the pattern data available to us (I only used the first node of our radix trie) and we checked only one cross-reference (while in some cases there are dozens), which all resulted in a less-than-complete set of API detections.
So I decided to refactor this code (again), and do it better:
- add all available cross-references to our radix trie, and use them during scanning
- scan the whole trie, not just the first child.
- detect not only functions, but also variables!
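
To make the "scan the whole trie" point concrete, here is a minimal Python sketch of a pattern trie walk that collects every candidate instead of stopping at the first child. It is not the Dxbx implementation; the `Node` layout, the use of `None` as a wildcard byte and the names `FuncA` / `FuncB` are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    pattern: list                                  # byte values, or None for a wildcard byte
    children: list = field(default_factory=list)   # deeper pattern nodes
    leaves: list = field(default_factory=list)     # symbol names ending at this node


def matches(pattern, code, pos):
    """True if every non-wildcard byte of pattern equals code at pos."""
    if pos + len(pattern) > len(code):
        return False
    return all(p is None or p == code[pos + i] for i, p in enumerate(pattern))


def candidates(node, code, pos):
    """Yield every symbol whose full chain of patterns matches at pos."""
    if not matches(node.pattern, code, pos):
        return
    yield from node.leaves
    next_pos = pos + len(node.pattern)
    for child in node.children:   # visit all children, not only the first one
        yield from candidates(child, code, next_pos)
```

A caller would then rank or filter the resulting candidates, for example by checking their cross-references, rather than trusting whichever one happened to be found first.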

This last part needs some explanation :
The cross-references I've been talking about are actually symbol-references, which means it's not just about functions, but also about variables! This means that we can detect global variables and other types of named symbols automatically too! Cxbx contains hard-coded logic to determine a few of those, but I'm convinced that I can come up with a detection algorithm for Dxbx that can not only tell us where all the library-functions are located, but also show us where the global variables are! This would make it one more step easier for us to improve upon the emulation (after we've translated all other Cxbx code correctly of course).
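
As a rough illustration of how a located function can give away a variable's address: if the pattern data says "at offset N inside this function there is a reference to symbol S", the bytes at that offset in the matched code hold the reference itself. The Python sketch below assumes the reference is stored as an absolute 32-bit little-endian address (as in an x86 `mov eax, [addr]`); relative references such as `call` targets would need an extra step. The symbol name `g_Device` is hypothetical.

```python
import struct


def resolve_xrefs(image, func_offset, xrefs):
    """Map each referenced symbol name to the 32-bit address stored at its offset.

    image       -- raw bytes of the code being scanned
    func_offset -- where the matched function starts inside image
    xrefs       -- {offset within function: referenced symbol name}
    """
    found = {}
    for rel_offset, name in xrefs.items():
        # Assumption: the operand is an absolute little-endian 32-bit address.
        (address,) = struct.unpack_from("<I", image, func_offset + rel_offset)
        found[name] = address
    return found
```

When several detected functions reference the same symbol, the addresses they yield should agree, which also gives a cheap sanity check on the function hits themselves.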

This new detection algorithm is not an easy thing though. I've been pondering it for more than a week already, with no end in sight yet. So you won't see a commit for this in our SVN for a while, but rest assured : I'm hard at work on it!


Well, it's become another long read again. I hope you liked it.
Until the next installment - PatrickvL.


PS: I also posted a FAQ yesterday, in the hope of answering the most common questions. If you have a legitimate question that's not already on this list, please send it to me and I'll see what I can do.
 

·
Registered
Joined
·
10 Posts
TODO comments said:
1. What does this '@' after the FunctionOffset mean?

2. There's a '@' after the cross-reference offset sometimes
As far as the first one goes, the pattern documentation explains for the "LIST OF PUBLIC NAMES" fields that:
If the offset is negative, it is represented like this:
:-XXXX name
If a name is local, it is represented as
:XXXX@ name
i.e. there is '@' after the offset.
The other TODO you have which I believe is for the "LIST OF REFERENCED NAMES" fields may use this same idea for local symbols. However that is just an educated guess.
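
The three forms quoted above (`:XXXX name`, `:-XXXX name`, `:XXXX@ name`) can be parsed along these lines. This is only a Python sketch of the public-names field as described in this post; the function name, the accepted hex width and the sample names are assumptions.

```python
import re

# ":" + optional "-" + hex offset + optional "@" (local) + whitespace + name
PUBLIC_NAME = re.compile(r":(-?)([0-9A-Fa-f]{4,8})(@?)\s+(\S+)")


def parse_public_names(text):
    """Return (offset, is_local, name) tuples for each public-name entry."""
    entries = []
    for sign, hex_offset, local_mark, name in PUBLIC_NAME.findall(text):
        offset = int(hex_offset, 16)
        if sign:
            offset = -offset
        entries.append((offset, bool(local_mark), name))
    return entries
```

For example, `parse_public_names(":0010@ _bar")` gives one entry with offset 16 and the local flag set.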

I've been researching the IDASGN file structures (using FLAIR 5.2) and I have a basic understanding of its layout. If you've ever looked at the plb.exe output .ERR file, the binary .SIG file follows a similar layout (just with some extra information in the radix tree nodes). I'll be happy to present you the format (for version 7 of .SIG at least) once I'm finished if you're still curious about the .SIG file (as I saw you gave up that path in Part 1). I've been interested in creating a .PAT exporter for IDA (using an existing database you've marked up) since I've surprisingly never heard of one (and I don't think I'm the only one who would benefit from it).


Also, might I suggest a custom intermediate pattern file? One that would allow specifying some more specific things like which XBE segment a symbol would reside in (and possibly more fine tuning)? While LTCG (looking at you DirectX!) would inline various functions into the .TEXT segment (or possibly another segment the xbox developer may have thought up; possibly like RAD Tool's BINK segments) it would allow for faster matching and minimize potential false negatives (especially when it comes to smaller functions). Then if it can't find it in that segment it can then pass thru all the other segments marked as executable of course.
Another thing you can go even FURTHER on is symbol ordering. When I say that, I'm referring to how the symbols are ordered in the original .OBJ file (archived in a .LIB file when it comes to directx and such; ex. dgcreate.obj in d3d8.lib). You can specify the symbol order (using just an implicit linear layout or some sort of number system, etc) then you can narrow down crap you can't identify that resides in between two symbols (from the same .OBJ) you can identify. The list of things you can deduce and go from goes on. Three cheers for code segments! Anyway, just my two cents :thumb:
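
A small Python sketch of the ordering idea: given the order in which symbols appear in one .OBJ and the addresses of the ones already located, each unlocated symbol can be bracketed by its nearest located neighbours. This only holds if the linker preserved the order, which is an assumption; the function and symbol names are made up for illustration.

```python
def bracket_unknowns(order, found):
    """Return {symbol: (low, high)} address windows for symbols not yet located.

    order -- symbol names in their original .OBJ order
    found -- {symbol name: address} for symbols already detected
    A bound is None when there is no located neighbour on that side.
    """
    windows = {}
    for i, name in enumerate(order):
        if name in found:
            continue
        low = next((found[n] for n in reversed(order[:i]) if n in found), None)
        high = next((found[n] for n in order[i + 1:] if n in found), None)
        windows[name] = (low, high)
    return windows
```

The scanner could then search for an unidentified symbol only inside its window instead of across the whole segment.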

Another idea (which you may already have; I'm more C++\C# tuned and I've only skimmed Dxbx's source) is to have some sort of preprocessor for XBEs that statically analyzes them for the required symbols which need intercepting. This preprocessor could then output a cache file for the XBE with the analysis results, along with an unresolved-symbols file in which the (advanced) end user could supply addresses for the symbols they found themselves (as the static logic one puts in place for this analysis can't trump a reverse engineer who knows their way around the block). Like I said, I've only skimmed the source you have, so I'm not sure if you've already been working on such a system, but IMO, to cut down on time taken during startup and to allow user intervention for unresolved symbols, an emulation cache may be something to look into.
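
A minimal Python sketch of such a cache, assuming a JSON layout with a `resolved` map and an `unresolved` list (both names, and the whole format, are invented here for illustration): detection results are written once, and on load any addresses supplied by the user take precedence.

```python
import json


def build_cache(detected, wanted):
    """Serialize detected symbol addresses plus the list of symbols still missing."""
    unresolved = sorted(set(wanted) - set(detected))
    return json.dumps({"resolved": detected, "unresolved": unresolved}, sort_keys=True)


def load_cache(text, user_supplied=None):
    """Return (symbols, still_missing); user-supplied addresses win over cached ones."""
    data = json.loads(text)
    symbols = dict(data["resolved"])
    symbols.update(user_supplied or {})
    still_missing = [n for n in data["unresolved"] if n not in symbols]
    return symbols, still_missing
```

A real cache would also need to be keyed to the specific XBE (for instance by a hash of the file), so that results are never reused for a different title.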
 

·
Registered
Joined
·
423 Posts
Discussion Starter #3
@kornman00 : Thanks for your (lengthy) reply - I like the various suggestions you make.

I haven't coded out the preferred segment scanning yet, although it's definitely on my to-do list. As for using a cache for symbol-locations : Xeon was actually the first to use it (actually, it didn't have an automated function-finder, like Cxbx and Dxbx have). Cxbx uses a cache too, but I haven't implemented it for Dxbx yet.

About Symbol ordering... I don't know - can I really depend on each and every linker ever used, maintaining symbol-order?

As for the .sig format : If you do have some more information on it, please post it!

I ditched them because they don't contain the cross-reference information that's present in the .pat files. Instead, they use unique tuples of (offset + byte value) to differentiate a function from its siblings.
 

·
Registered
Joined
·
10 Posts
Well, at least MS's linker is pretty good about maintaining the symbol order (that is, for the symbols that are used from the .OBJ file). I can't exactly vouch for other linkers, but I wouldn't think too many developers ventured outside VS for their IDE and compiler tools. It would be a nice analysis pass to have implemented though.
The problem is that the linker doesn't really seem to maintain the order that the .OBJ files (which the symbols reside in) take while stored in their respective .LIB archive. The order they follow in the executable is dependent on the game's library and code usage.
 