I couldn’t stop thinking about optimisation

In my last post I complained about how slow progress was because of the cost of the IO emulation routines. I decided to stop developing it.

But the next morning, I could not stop thinking about the speed problem, and I found a method: replacing the PPUADDR and PPUWR call system with an array of routines indexed by the address, Array[PPUADDR >> 2] = routine@. This array is big but it fits in WRAM.
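To make the idea concrete, here is a minimal sketch of such a routine table in C. It is not the upernes code (which is 65816 assembly): the handler names, the plain arrays standing in for video memory, and the region boundaries (taken from the standard NES PPU memory map) are my own assumptions for illustration. With a 14-bit PPU address space, the table has 0x4000 >> 2 = 4096 entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PPU_ADDR_SPACE 0x4000u                 /* 14-bit NES PPU address space */
#define TABLE_ENTRIES  (PPU_ADDR_SPACE >> 2)   /* one entry per 4 addresses */

/* Hypothetical handler type: one routine per kind of PPU memory. */
typedef void (*ppu_write_fn)(uint16_t addr, uint8_t value);

/* Plain arrays standing in for the real targets (assumption: a real
   converter would write to SNES VRAM/CGRAM here instead). */
static uint8_t pattern_mem[0x2000];
static uint8_t nametable_mem[0x1000];
static uint8_t palette_mem[0x20];

static void write_pattern(uint16_t addr, uint8_t value)
{
    pattern_mem[addr & 0x1FFF] = value;
}

static void write_nametable(uint16_t addr, uint8_t value)
{
    /* 0x3000-0x3EFF mirrors 0x2000-0x2EFF, hence the 12-bit mask. */
    nametable_mem[addr & 0x0FFF] = value;
}

static void write_palette(uint16_t addr, uint8_t value)
{
    /* The palette area repeats every 0x20 bytes. */
    palette_mem[addr & 0x1F] = value;
}

/* The routine table: write_table[PPUADDR >> 2] = routine. */
static ppu_write_fn write_table[TABLE_ENTRIES];

/* Fill the table once; every region boundary is a multiple of 4,
   so a 4-address granularity never straddles two regions. */
void ppu_table_init(void)
{
    for (uint16_t addr = 0; addr < PPU_ADDR_SPACE; addr += 4) {
        ppu_write_fn fn;
        if (addr < 0x2000) {
            fn = write_pattern;
        } else if (addr < 0x3F00) {
            fn = write_nametable;
        } else {
            fn = write_palette;
        }
        write_table[addr >> 2] = fn;
    }
}

/* A data write becomes a single table lookup and an indirect call,
   with no chain of address comparisons. */
void ppu_data_write(uint16_t ppuaddr, uint8_t value)
{
    ppuaddr &= 0x3FFF;
    ppu_write_fn handler = write_table[ppuaddr >> 2];
    assert(handler != NULL);   /* ppu_table_init() must run first */
    handler(ppuaddr, value);
}
```

The point of the table is that the cost of picking the right routine no longer depends on the address: after calling ppu_table_init() once, every ppu_data_write() is one shift, one lookup and one jump.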

This made it possible to remove all the PPUADDR incrementing code that switched the IO routine depending on the address. It gained 20 rendering lines by moving the PPUADDR write IO emulation out of bank zero, where the emulation code lives, and replacing it with a short routine set in RAM.

I believed that the gain in cycles would not be enough because of sound emulation, but it looks like the sound emulation on the SPC700 only needs to be updated once per frame. In Super Mario Bros, that can be done between line 80 and line 240, where the game does nothing. In fact, plenty of cycles are available for the sound emulation update.

All in all, it IS possible to run Super Mario Brothers on the SNES with automatic conversion.

Upernes, conclusion

I have been developing this software for a total of 1.5 years and it is finished.

I had many problems with the scrolling, and I fixed them thanks to the NES community. But I still had glitches. They came from missing the end of vblank because the IO emulation took too much time: it ran past the end of vblank while the SMB code was waiting for it. On average the emulation does not take that long (it only calls 10 routines), but it calls up to 60 IO routines when updating the background. Even with optimising, that caused one last glitch. Maybe there is room for more optimising to remove this last glitch. With a little more optimising (I spent a whole day on it to remove 2 of the 3 missed frames), it could work with SMB1, but it would leave no room for improvement or stabilisation.

Therefore, I reverted it to a working Balloon Fight and an SMB1 with more glitches. Donkey Kong also works.

It looks like a NES game, and it feels like a NES game much more than “smb all stars”, but it is not 100% perfect. The SNES PPU is too different and the CPU is not fast enough to absorb the cycle cost of the IO emulation. Upernes uses a ton of tricks to be able to play games like in the picture below, and the console has some design compatibility (the first being hardware CPU emulation), but it does not fit 100%. It falls short by only a few details, just a few CPU cycles, but it does fall short.

Anyway, it works with non-scrolling games, and Super Mario Bros can be played directly from the conversion.

The project was interesting and very exotic, and it went further than I expected, but it is not an aesthetic conversion where everything fits (that was my goal). However, it is fast, and despite the few missing frames it really feels like the NES. I am not looking forward to squeezing the cycle count per IO access, so I am leaving it like this.

I will just take a look at how to integrate Memblers’ work, but I will probably not add it, given the problems with graphics.