Yeah, you can optimize out those eor's. I was just showing one possible approach.
Yeah, the PCE architecture is pretty simple and clean. Being able to read and write to vram during active display is pretty nice IMO. It might not have a fast local-to-vram DMA like the SNES and Genesis, but in a good number of cases open vram access can balance that out (games like Sapphire with large area animation updates show this off).
On a related note.. (source code layout optimization?)
I used to think the lack of a bigger linear PC address range (local to the cpu) was a design hindrance, but then I realized that all my optimizations were local anyway, and macros for 'far jsr' make the code structure lend itself to a more linear-looking layout (kinda; in the source it looks that way). I typically have a layout of 8k I/O, 16k of ram, 16k of code, 16k of data, and 8k of fixed library.
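To make that layout concrete, here's a rough Python model of how the eight MPRs turn a 16-bit logical address into a physical one. The order of the regions across MPR0-MPR7 is my assumption (I/O lowest, fixed library at the top where the vectors live), and apart from $FF (I/O) and $F8 (base ram) every bank number is a made-up placeholder. It's only a sketch of the bookkeeping, not anything you'd run on the hardware.

```python
PAGE_SIZE = 0x2000  # each MPR maps one 8k page of the 64k logical space

# Assumed layout, one bank number per MPR. $FF and $F8 are the usual
# hardware banks for I/O and base ram; the others are placeholders.
MPR = [
    0xFF,        # MPR0 $0000-$1FFF  I/O
    0xF8, 0x80,  # MPR1-2 $2000-$5FFF  16k ram
    0x10, 0x11,  # MPR3-4 $6000-$9FFF  16k code
    0x20, 0x21,  # MPR5-6 $A000-$DFFF  16k data
    0x00,        # MPR7 $E000-$FFFF  fixed library + vectors
]
ROLES = ["I/O", "ram", "ram", "code", "code", "data", "data", "fixed lib"]


def to_physical(logical, mpr=MPR):
    """Translate a 16-bit logical address to a 21-bit physical address."""
    page = (logical >> 13) & 0x07       # top 3 bits select the MPR
    offset = logical & (PAGE_SIZE - 1)  # low 13 bits pass straight through
    return (mpr[page] << 13) | offset


def role_of(logical):
    """Name of the region a logical address falls in under this layout."""
    return ROLES[(logical >> 13) & 0x07]


for page, (bank, role) in enumerate(zip(MPR, ROLES)):
    start = page * PAGE_SIZE
    end = start + PAGE_SIZE - 1
    print(f"MPR{page} ${start:04X}-${end:04X} bank ${bank:02X} {role}")
```

Swapping the 16k code window then just means rewriting two entries of the table, which is all a 'far jsr' macro really has to do.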
I have multiple vector banks: the top 4k of each holds the same repeated code/data, while the lower 4k differs from bank to bank - usually tables for speeding up code, etc., relative to the subroutine being called. The upper 4k always has the code (along with the macro) to do the far calls and far returns, plus the fixed lib funcs and video/timer interrupt routines, etc. So you get a 16k code + 4k fast table mapping, and still have 16k for other 'data'. Or call it 8k code + 24k data, etc. Or 8k code + another 8k code, etc. It works out pretty well. I'm usually not concerned with wasting a little bit of fat on code, since code generally takes up a small percentage compared to data.
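As a rough illustration of what the far call/far return pair does, here's a small Python model of just the bank bookkeeping. On the real chip this would be something along the lines of TMA + push, TAM, jsr, then pull + TAM on the way back, run from the always-mapped upper 4k. The names (far_call, window) and all bank numbers except $FF and $F8 are hypothetical, and the 16k code window at MPR3-4 is my assumption.

```python
# Current mapping, one bank per MPR. Only $FF (I/O) and $F8 (ram) are
# real hardware banks; the rest are placeholders.
mpr = [0xFF, 0xF8, 0x80, 0x10, 0x11, 0x20, 0x21, 0x00]


def far_call(window, first_bank, routine, *args):
    """Map consecutive banks into the given MPRs, call, then restore.

    window     -- tuple of MPR indexes to remap, e.g. (3, 4) for 16k of code
    first_bank -- bank that goes into the first MPR of the window
    """
    saved = [mpr[p] for p in window]       # "far call": remember old banks
    for i, p in enumerate(window):
        mpr[p] = first_bank + i
    try:
        return routine(*args)
    finally:
        for p, bank in zip(window, saved):  # "far return": put them back
            mpr[p] = bank


def snapshot():
    """What the called routine sees mapped in."""
    return tuple(mpr)


def outer():
    # A far routine making its own 8k far call from inside the 16k window.
    inner_view = far_call((3,), 0x40, snapshot)
    return snapshot(), inner_view


outer_view, inner_view = far_call((3, 4), 0x30, outer)
print([hex(b) for b in outer_view[3:5]], [hex(b) for b in inner_view[3:5]])
```

Because the window is a parameter, the same helper covers the 16k code case, the 8k code + 24k data case, and so on; nesting works because each call restores exactly what it replaced.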
Do you guys ever map anything in the typical I/O bank area? After working on nes2pce stuff, I've found myself mapping other banks to this area (MPR0). Interrupt routines that need access to the I/O bank can map that bank in for that interval. I mean, if a specific subroutine isn't reading/writing vram or writing to the sound hardware, why not map something else there? Matter of fact, having done nes2pce stuff - I don't find it odd to map the I/O bank to something like the 4000-5fff or 6000-7fff range either ($6002, $6003, $7403, $6404, etc). It gives you another 8k of address range to work with otherwise (ram, data, code, etc).
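For anyone wondering where addresses like $6002 or $7403 come from: the hardware blocks sit at fixed offsets inside the I/O bank, so the logical address just depends on which MPR the bank is mapped through. A quick Python sketch (io_address is a hypothetical helper name, nothing official):

```python
PAGE_SIZE = 0x2000  # one MPR page

# Offsets of the hardware blocks inside the I/O bank ($FF).
IO_OFFSETS = {
    "VDC": 0x0000,
    "VCE": 0x0400,
    "PSG": 0x0800,
    "TIMER": 0x0C00,
    "JOYPAD": 0x1000,
    "IRQ": 0x1400,
}


def io_address(mpr_index, block, reg):
    """Logical address of a register when bank $FF is mapped at mpr_index."""
    return mpr_index * PAGE_SIZE + IO_OFFSETS[block] + reg


# The usual MPR0 mapping next to the $4000 and $6000 alternatives.
for mpr_index in (0, 2, 3):
    vdc_data = io_address(mpr_index, "VDC", 2)
    vce_data = io_address(mpr_index, "VCE", 4)
    irq_ack = io_address(mpr_index, "IRQ", 3)
    print(f"MPR{mpr_index}: VDC data ${vdc_data:04X}, "
          f"VCE data ${vce_data:04X}, IRQ ack ${irq_ack:04X}")
```

With the bank at MPR3 that gives $6002/$6003 for the VDC data port, $6404 for the VCE data port and $7403 for the IRQ acknowledge register, which lines up with the addresses above.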