Hi all, I've had this in mind for a while ..
How can the 6280 compete with the MD's 68k in performance?
Of course in the case of video game consoles, not in general use (like a computer).
It seems that in block transfers, the 68k is more capable (with more code of course) ..
The 68k (the original 68000) processor is pretty slow. That is to say, the instruction cycle times are fairly lengthy. But it was the original; the very first model (1980 IIRC). From a hardware engineering perspective, it's a 16bit cpu (16bit ALU and hardware macro instructions, also microcoded). But from a software engineering perspective, it's definitely a 32bit cpu. You have 32bit wide registers, you have opcodes that read/write/modify 32bit wide data at a time, etc. The instruction cycle times are slow, but the instructions are powerful at the same time, so it balances out in general. Of all the old processors, the 68k reminds me a lot of the Z80: slower instruction times but a better ISA to compensate, hardware macro instructions (handling larger data elements than the ALU size), doesn't hog the bus, etc.
Anyway, the 68k is soooo easy to code for. I'd probably recommend it as a beginner cpu for anyone looking to learn assembly language. It's almost foolproof. The performance difference between unoptimized assembly code and optimized code isn't that great - IMO. Not compared to processors based on the 65x, or 8bit processors in general. That's a bonus. Of course the larger registers, number of registers, and flat memory model make it perfect for compilers. Though I think you mean how they compare in just optimized performance, so I'm going to assume this question is about assembly and not C or other such languages.
The 6280 is obviously based on, and a branch of, the 65xx line. It's two revisions ahead and one branch off (6502 -> 65c02 -> R65c02s -> HuC6280). It's got some decent refinements. The 6280 (and the 65x in general) shares some common design philosophies with RISC: simple but fast instructions, and hardwired VS microcoded. To do anything, you need to write quite a bit of code - compared to the 68k. Although that has more to do with the data element sizes between the two. There's an upside and a downside to this. The upside is that you have a number of different approaches to optimizing your code because of the simpler instruction set. The downside is, for the very same reason, optimization isn't so clear cut. Sure, you can do some basic rookie optimization, but I'm referring to the really crazy stuff (up to and including self-modifying code). And therein lies another problem: convoluted code. Sure, it's fast, but it's not very readable and definitely inflexible (complete re-write/re-design for small changes).
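To give a rough idea of the code size difference I'm talking about, here's a 16bit add of two in-memory variables on each chip. Just a sketch - the labels are made up, and line count isn't cycle count (the 6280's individual instructions are much faster):

```
; 68000: the ALU handles the full word in single instructions
        move.w  var_a,d0
        add.w   var_b,d0
        move.w  d0,result

; HuC6280: byte at a time, carry chained by hand
        clc
        lda     <var_a
        adc     <var_b
        sta     <result
        lda     <var_a+1
        adc     <var_b+1
        sta     <result+1
```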
Anyway, for all the advantages and luxuries the 68k has over the 6280 (flat memory model, larger register size, more data registers) - that means little when your optimization comes down to small localized segments of code (usually less than 1k). Things that are looped hundreds or thousands of times per frame. That is where optimization matters, and the 6280 can keep up with the 68k and even surpass it, if you're willing to write some crazy-ass code. That said, I think in general, everyday, normal coding - the 68k is going to have an edge almost every time (i.e. Japanese developers really didn't optimize their code for speed. At least not from what I've seen personally, stepping through debuggers, and not from what I've heard of other hackers doing NES stuff). Great game design does not equate to efficient or optimized code. Konami made some great games, but wrote some piss-poor code.
I've written quite a bit of code for both the 68k and the 6280. I'm not sure I have much of it anymore, because it was really for my own curiosity. But here's a small segment that seems to have survived:
http://pastebin.com/f76e312e0 (top is 68k, middle is 6280, bottom is 65816)
I had a discussion with Steve Snake, and a few others, about the 68k and 65816. The talk was about which processor could do what, better. The 6280 got an example or two from a side discussion. In the end, the discussion only proved that really small examples out of context do nothing to prove or disprove the abilities of these processors over each other (the architectures are just so different). But in one example, I did show that if you had put a 68k in the PCE, raster effects would take more cpu resources. The 68k has a very large cycle latency for interrupts (the base is 44 cycles + time to finish the instruction it's in the middle of) and RTE (hell, even RTS is slow). The 6280 takes 7 cycles for the interrupt call (and must finish out the current instruction, but 6280 instruction times are much smaller than the 68k's), and its RTI is faster too. And given the way the VDC in the PCE works, you don't need to write a full 16bit word to the VDC registers; they are buffered without a latch mechanism, so you can update either the LSB or MSB at any time. I used this on the 6280 side to write some pretty fast code for a full screen/scanline hsync routine for both processors. The 6280 came out ahead with a pretty healthy lead. On the Genesis/MD, you don't use the hsync interrupt to do raster effects; there are tables in vram that you update during vblank. I can only assume it's for this reason (VDP x and y must have a small timing window, since you can do column scrolling).
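A rough sketch of what I mean about the VDC ports (register number and port addresses are from memory, and it assumes the VDC bank is mapped in at $0000 - double-check before using; this is the idea, not a drop-in routine):

```
; HuC6280 hsync idea: the VDC data port is two independent byte
; latches, so a raster routine only touches the byte that changes.
        lda     #$07        ; select BXR (x-scroll) in the VDC
        sta     $0000       ; VDC address/status port
        lda     <scroll_lo
        sta     $0002       ; write the data LSB only; the MSB
                            ; keeps whatever was last written
```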
A few pet peeves with the 68k: some instruction times are long. JSR and RTS are both much longer than they need to be. Indexing on the 68k isn't free like on the 6280/65x/'816. That means small fast LUTs aren't really worth it (for non-linear table access). If I'm doing sequential memory access on the 68k, I just manually add the base address to the offset and use post-increment or such; it's faster than indexing. Memory alignment is required for word and long word elements (most emulators don't care, but the real machine will generate an address error exception). No speed benefit for using 8bit/byte size elements. I'd mention slow instruction cycle times, but the instructions themselves are usually pretty powerful, so that mostly balances out. For a processor that is register based, 7 address vectors/registers is not enough (one is reserved for the stack). I've even found myself running out of data registers too, in some routines. Pushing and popping from the stack isn't a big deal, but for some reason I always hated having to do that (even way back when I started with x86 asm). A set of 16 and 16 would have been better.
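What I mean about sequential access on the 68k - skip the indexed addressing mode and let postincrement do the work. A sketch with made-up labels:

```
; Indexed every pass: pays the effective-address calc each iteration
;       move.b  0(a0,d1.w),d0
;
; Pointer walk instead:
        lea     table,a0      ; resolve the address once, up front
loop:   move.b  (a0)+,d0      ; fetch and bump the pointer in one go
        ; ... do something with d0 ...
        dbra    d7,loop       ; d7 = element count - 1
```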
A few pet peeves with the 6280: wasted instructions. CLX, CLY, CLA save a byte, but are the exact same cycle length as simply ldx #$00, ldy #$00, lda #$00. That instruction logic space could have been put to better use. No 16bit memory increment (would be nice for pointers, since ALL the address vectors live in ZP instead of internal register memory). No quick add; an inc-by-2 and inc-by-4 would be nice. And finally, at least one indirect long access port (self incrementing/decrementing). Not that you couldn't fix this by having it on cart or such (the Arcade Card has four of these, at least for AC memory), but it would have been nice to have one in hardware. A few other small ones, like inc A:X or such. SAX and SXY are almost useless because of their 3 cycle length; 2 cycles would have made them more useful.
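For reference, this is the dance the missing 16bit memory increment forces on you for every ZP pointer bump (standard 65x idiom, made-up label):

```
; HuC6280: bump a 16bit pointer held in ZP
        inc     <ptr        ; low byte
        bne     .done       ; no wrap - done, the common case
        inc     <ptr+1      ; carry into the high byte
.done:
```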
Other than that, I really have no problems with the 6280. When I need clean-looking code, I use my custom macros for non-speed-critical segments. Makes the code easier to read and scroll through. I love the fact that all of ZP can be used for address vectors. Having up to 128 address vectors is quite nice, although I only tend to reserve about 32 pairs of bytes for addressing. What does it matter if it's in ram or not? It's an instruction like any other address vector instruction. If you don't like numbers, give them register names with equates (I use R0-R7 and A0-A7 most of the time). Hell, I even use D0-D7 for ZP equates.
On a related note, I think the 6280 keeps up with the 68k, outside of optimization, because there isn't a real need for data elements larger than 8bit in game code. That is to say, a whole lot of bit testing, comparing, and branching makes up a single frame/time slice (1/60th). The 68k receives no benefit from processing 8bit/byte wide data, but obviously the 6280 does. And so overall I think this is where the two processors even out. Other things, like adding/subbing an 8bit var to a 16bit data set instead of doing 16bit<>16bit->16bit, also save the 6280 some cycle time. The end result might need to be 16bit/word based, but you save cycles by working with 8bit data against it.
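For example, adding an 8bit delta into a 16bit total only pays for the high byte when the carry actually happens (sketch, made-up labels):

```
; HuC6280: 8bit var added to a 16bit accumulator
        clc
        lda     <total
        adc     <delta      ; 8bit operand, no high byte to fetch
        sta     <total
        bcc     .done       ; most of the time: no carry, skip the msb
        inc     <total+1
.done:
```

The full 16bit<>16bit version would always cost the second lda/adc/sta, carry or not.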