More C0D4G3

Monday, July 31, 2006

Part 1 done

The entire IllexBoyAdvance core has been rewritten, so now it outputs code while it executes it. Actually, this is pretty old news, and the guys at GP32X have known it for a while. I'm just lazy when it comes to updating this blog. :P

Part 2, while requiring much less code than Part 1, will be much harder, as it's neither repetitive nor simple:
Picture a bank of memory. It contains all the memory that can contain executable code. This includes the BIOS, RAM, ROM, and whatever else.

A normal static-recompiler will output one code-block for each address it reads from at run-time.
Now, we have a very small, but fast RAM, and a very large but slow ROM. How much do you want to bet that people are swapping their code into RAM as needed to speed things up? This creates a little problem: Each address can contain many different instructions in the course of execution.
Self-modifying code is the Achilles' heel of static recompilers. So, does that mean I've run into a dead end?
No. But that is why Part 2 is so much harder than Part 1.
I see three paths to overcoming this obstacle:
1) Ignore it.
Maybe I'm wrong and nobody executes stuff from RAM. Maybe only a few games do. I have to do some tests and find out if I actually have a problem on my hands or not.

2) Binary Tree style.
Remember that bank of memory I mentioned earlier? For each executable instruction, I'd have a pointer to a Bintree, where previously executed instructions are the keys. Whenever there's a Jump/Call, the emulator looks for the appropriate tree in a big array, uses the instruction to find out if it has been decompiled before or not, and then:
If it has NOT been decompiled previously:
Compile it, add the code offset to the tree.

If it has been decompiled: Compare each instruction of the code currently being decompiled with the version that was done previously. If a jump is reached and all opcodes are the same, then there is nothing to do. If not all the opcodes are the same, store the offset of the first different opcode and output a new disassembly, without overwriting the previous one.

When executing code that has already been recompiled, each time there's a jump, look in the array for the corresponding tree, get an offset, and compare it with the opcode it's supposed to execute. If it's good, jump to the pre-compiled code. If not, resort to the interpreter.
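To make option 2 a bit more concrete, here is a rough sketch of the lookup, not the actual IllexBoyAdvance code. It assumes one tree per executable word, keyed by the first opcode of a block, with each entry remembering the opcodes it was generated from. The names BlockIndex, Variant and blockId are made up for illustration, and std::multimap (a balanced binary tree in the usual implementations) stands in for the Bintree.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One previously translated version of the code at some entry point.
struct Variant {
    std::vector<std::uint32_t> opcodes; // opcodes the block was generated from
    int blockId;                        // hypothetical handle to the emitted code
};

class BlockIndex {
public:
    // One tree per executable word of emulated memory.
    explicit BlockIndex(std::size_t wordCount) : trees_(wordCount) {}

    // Called on a jump/call: returns the block whose opcodes still match what
    // is in memory at `entry`, or -1 if nothing matches (fall back to the
    // interpreter and/or generate a new variant).
    int find(std::size_t entry, const std::vector<std::uint32_t>& mem) const {
        if (entry >= trees_.size() || entry >= mem.size()) return -1;
        auto range = trees_[entry].equal_range(mem[entry]);
        for (auto it = range.first; it != range.second; ++it) {
            const std::vector<std::uint32_t>& ops = it->second.opcodes;
            if (entry + ops.size() <= mem.size() &&
                std::equal(ops.begin(), ops.end(),
                           mem.begin() + static_cast<std::ptrdiff_t>(entry))) {
                return it->second.blockId;
            }
        }
        return -1;
    }

    // Stores a new variant without overwriting any earlier one.
    void add(std::size_t entry, std::vector<std::uint32_t> opcodes, int blockId) {
        if (entry >= trees_.size() || opcodes.empty()) return;
        std::uint32_t key = opcodes.front();
        trees_[entry].emplace(key, Variant{std::move(opcodes), blockId});
    }

private:
    std::vector<std::multimap<std::uint32_t, Variant>> trees_;
};
```

The comparison on every jump is where the performance penalty comes from: the longer the block, the more opcodes have to be checked before the pre-compiled code can be trusted.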

3) Dynarec Style.
For each code block, there's a translated code cache. When an instruction overwrites memory that has been recompiled, the cache block is invalidated (thrown away). Of course, there's no chance of doing that here. Instead, a new cache would have to be made, and some sort of hash would differentiate the caches.
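A minimal sketch of what "some sort of hash" could look like, assuming each cached block is identified by its start address plus a hash of the opcodes it was built from. FNV-1a is just one convenient choice, and TranslationCache, hashBlock and blockId are hypothetical names, not anything from VBA or IllexBoyAdvance.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// FNV-1a over the bytes of each opcode; any reasonable hash would do here.
std::uint32_t hashBlock(const std::vector<std::uint32_t>& opcodes) {
    std::uint32_t h = 2166136261u;
    for (std::uint32_t op : opcodes) {
        for (int shift = 0; shift < 32; shift += 8) {
            h ^= (op >> shift) & 0xFFu;
            h *= 16777619u;
        }
    }
    return h;
}

class TranslationCache {
public:
    // Returns the cached block for this address and code contents, or -1.
    int find(std::uint32_t address,
             const std::vector<std::uint32_t>& opcodes) const {
        auto it = blocks_.find(makeKey(address, hashBlock(opcodes)));
        return it == blocks_.end() ? -1 : it->second;
    }

    // Adds a cache entry; other versions at the same address are kept.
    void add(std::uint32_t address,
             const std::vector<std::uint32_t>& opcodes, int blockId) {
        blocks_[makeKey(address, hashBlock(opcodes))] = blockId;
    }

private:
    // Address in the high 32 bits, content hash in the low 32 bits.
    static std::uint64_t makeKey(std::uint32_t address, std::uint32_t hash) {
        return (static_cast<std::uint64_t>(address) << 32) | hash;
    }

    std::unordered_map<std::uint64_t, int> blocks_;
};
```

Two different code sequences can in principle hash to the same value, so a real version would either verify the opcodes on a hit or accept that risk. Hashing the block on every jump is also part of the performance penalty.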

Both methods 2 and 3 have a heavy performance penalty on the emulator, but it remains faster than a Dynarec. As can be seen, neither is trivial to implement, so I have to give this a whole lot of thought before I start coding. If anybody has any suggestions (even if it's to say, "Your blog entry made no sense. Please refrain from blogging at 1AM") I'd really like to hear them.
That's what I created the A7Board for. http://tkf15h.phpnet.us

Thursday, July 13, 2006

More work on IBA

IllexBoyAdvance has seen a good amount of development lately.
I'm currently around line 3589 of 7190, which puts me at around 50% done.
Yesterday I spent time doing a revision, making sure I wasn't leaving too many mistakes about. And I caught quite a few bugs, mostly writing the output code to a string and then forgetting to dump it afterwards. :-P
BTW, I made a temporary webpage for this project: illexboyadvance.sourceforge.net

I am feeling rather confident that this rewrite I'm doing is capable of getting fullspeed GBA emulation on the GP2X. These are a few reasons why:
1) Each instruction states which registers it needs, and for the emulator to access them, a series of ANDs/Shifts must be done. While Shifts and ANDs are not costly operations on an ARM processor, doing them often still hurts performance. The VBA C core does this several times per instruction when it could calculate it once and store the value in a variable. Illex does all these calculations when outputting code, so once compiled they will not have to be done again.
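As a rough illustration of point 1 (a sketch, not the real Illex output), here is how the register fields of a simple ARM "ADD Rd, Rn, Rm" could be decoded once, at code-generation time. The function name emitAdd and the plain reg[] array in the generated source are assumptions for the example. Shifted operands, flag updates and the R15 special cases are ignored.

```cpp
#include <cstdint>
#include <string>

// Emits C source for "ADD Rd, Rn, Rm" (register operand, no shift, no flags).
// The shifts and ANDs run here, once, instead of on every emulated execution.
std::string emitAdd(std::uint32_t opcode) {
    unsigned rn = (opcode >> 16) & 0xF; // first operand register, bits 19-16
    unsigned rd = (opcode >> 12) & 0xF; // destination register, bits 15-12
    unsigned rm = opcode & 0xF;         // second operand register, bits 3-0
    return "reg[" + std::to_string(rd) + "] = reg[" + std::to_string(rn) +
           "] + reg[" + std::to_string(rm) + "];";
}
```

For the opcode 0xE0810002 (ADD r0, r1, r2) this produces the line "reg[0] = reg[1] + reg[2];", which the C compiler can then optimize with the register numbers already known as constants.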

2) GCC optimizing the output code is sure to be much faster than the current core.

3) There are various instructions where certain registers are treated as special cases. These checks are being eliminated from output code, unless they really need to be done.

4) VBA has routine tasks, things it does for each opcode, that don't need to be done all that often. Keeping R15 updated, for example. Going from one opcode to the next. The usual slow-downs associated with interpreters. All that's going away.

Then there's a bunch of little things that could be changed:
5) Useless ternary Op in VBA core:
Z_FLAG = (res == 0) ? true : false;
Assuming GCC doesn't optimize this fully, the ternary operator uses a branch, just like an "if()" would. Therefore, it's a better idea to simply do:
Z_FLAG = (res == 0);
A single branch isn't a big deal, I'm just giving an example of the things you'd find in VBA.

6) Whatever unnecessary operations (N/Z/C flag calculation, for example) do end up in the output source are going to be optimized away by GCC.

7) Various other things can be done, such as using the hardware blitter, eliminating clock-cycle calculation if we're desperate, bribing the processor into thinking it's an AMD64, using the GP2X MMU to emulate the GBA's, outputting inline ASM so as to make use of the native N/Z/C/V flags rather than calculating them manually, and finally beating it with a bat until it can't take any more and does my bidding!

Tuesday, July 11, 2006

34

Well, to be more exact, 34.682241691. That's the percentage of the ARM CPU core I've re-written so far.
In the process, I've seen some atrocities that make me wonder how come this thing even runs on the PC. Due to this, I'm starting to think the GP2X will be perfectly capable of running GBA games, and maybe even have space for a turbo option. I'd have to see how much emulating the rest of the hardware costs, but I think it's safe to say the 200MHz ARM will be able to run GBA games faster than a real GBA (because Pokemon is unbearable at the native speed). Let's wait and see.

As for my computer, I bought the video card but I couldn't get my hands on the motherboard. The store had sold out. Bah. Hopefully another one will show up later this week.

Saturday, July 08, 2006

>_<

Yay, quite a bit of good stuff to write here...

Since my last post, I've been working a whole lot on FishMotor.
And it is beautiful!
It currently loads 3DS files (non-animated), textures them (almost any image format you can think of) and displays them with lighting. Most of it through a plugin system that is very easy to work with.
A Linux version is being made side-by-side, and currently is about 100fps slower than the Windows version. This is probably due to horrid open-source drivers for my old video card.

My old video card mentioned above won't be a problem for me much longer though:
On Monday I'll be buying an ATI x1300, which, while not top-of-the-line, is a big upgrade from my ole 7500.
Actually, saying it's a big upgrade is an understatement.
To go with the fancy new card, I got myself an Athlon64 3500+. Yup, I can finally get back to work on that Dreamcast emulator project, Minerva, except...

Speaking of emulators, I'm modding VisualBoyAdvance by making it spit out C++ code from whatever it executes. This code will then be compiled and linked to the emulator. Result? Hopefully full-speed GBA emulation on a GP2X by means of static-recompilation. It's a great educational experience and whatever I learn here will benefit Minerva. The VBA port takes priority, as it is much less ambitious than Minerva. Being simpler, a GBA emu will pave the road for a DC one.