Sometimes, you might spend time optimizing useless crap, so it's best to optimize the most often-called things first.
I agree. Throwing ASM at everything isn't necessarily going to net you much.
Making your code faster isn't just about speed. If you have faster methods, you might be able to use more advanced concepts: better design, etc.
Uhm... let me see if I can think of an example. Animation is one area. Not the animation cells themselves, but how you interface with them. You could just use fixed code branches that handle most objects, but as processing resources increased and game and arcade systems advanced, using multiple layers of indirection in data types allowed for more flexibility in design. Even some NES games do this to some degree. It lets you make changes to the game design without modifying much code, or any at all. It might also allow for more advanced or precise definitions of the timing and response of objects.
The data structure might look something like this:
Frame -> has attributes: meta-cell patterns, X/Y offsets, duration of the frame until the next one, etc., and the length of this set (because the data elements can vary in number/size).
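A minimal sketch of what that frame data and its playback could look like, written in plain standard C for readability and not checked against HuC 3.21 (struct support there is not assumed). Every name and field width here (`Frame`, `FrameSet`, `AnimState`, `anim_start`, `anim_tick`, the walk values) is hypothetical; the point is just that the length travels with the data and the per-frame duration drives the timing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: field names and widths are illustrative only. */
typedef struct {
    uint8_t pattern;  /* index of the meta-cell pattern to draw */
    int8_t  x_off;    /* X offset from the object's position */
    int8_t  y_off;    /* Y offset from the object's position */
    uint8_t duration; /* ticks to hold this frame; must be >= 1 */
} Frame;

/* A variable-length set of frames: the length travels with the data. */
typedef struct {
    uint8_t      length;
    const Frame *frames;
} FrameSet;

/* Per-object playback state. */
typedef struct {
    const FrameSet *set;
    uint8_t         index; /* current frame within the set */
    uint8_t         timer; /* ticks left on the current frame */
} AnimState;

static void anim_start(AnimState *a, const FrameSet *set) {
    assert(set->length > 0 && set->frames[0].duration > 0);
    a->set   = set;
    a->index = 0;
    a->timer = set->frames[0].duration;
}

/* Advance one tick, wrapping back to the first frame at the end of the set. */
static void anim_tick(AnimState *a) {
    if (--a->timer == 0) {
        a->index = (uint8_t)((a->index + 1) % a->set->length);
        a->timer = a->set->frames[a->index].duration;
    }
}

/* Example data: a three-frame walk cycle (values made up). */
static const Frame walk_frames[] = {
    { 0, 0,  0, 2 },
    { 1, 0, -1, 3 },
    { 2, 0,  0, 2 },
};
static const FrameSet walk = { 3, walk_frames };
```

Usage would be calling `anim_start(&state, &walk)` when the object enters its walk action, then `anim_tick(&state)` once per game tick and drawing `state.set->frames[state.index]`. Changing the walk cycle is then a data edit, not a code edit.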
The frame would be part of a larger definition, such as walk, run, jump, attack, etc. So through a series of pointer tables and indirection, data/definitions are accessed and objects are built. Simply pointing to a different data structure builds a different object. If I were to do this entirely in HuC, it would be pretty slow. I could get around that by writing code to directly handle specific objects, but then you lose flexibility in design and changes, etc. Object behavior/reaction/AI/logic can be handled in the very same way, with layered data structures that eventually point to code as well.
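A rough sketch of that layering, again in plain C with every name made up for illustration (`ObjectDef`, `goblin`, `knight`, `walk_right`, and so on): an object instance points at a definition, the definition holds one frame set per action plus a function pointer for its behavior, and swapping the definition pointer gives you a different object while the update code stays the same. Whether HuC 3.21 handles structs and function pointers like this is not assumed here.

```c
#include <stdint.h>

/* Hypothetical action indices for the per-object animation table. */
enum { ANIM_WALK, ANIM_RUN, ANIM_JUMP, ANIM_ATTACK, ANIM_COUNT };

/* Trimmed-down frame data; a real set would also carry offsets, etc. */
typedef struct { uint8_t pattern; uint8_t duration; } Frame;
typedef struct { uint8_t length; const Frame *frames; } FrameSet;

struct Object;
typedef void (*BehaviorFn)(struct Object *self);

/* Definition layer: pure data, shared by every instance of a type. */
typedef struct {
    const FrameSet *anims[ANIM_COUNT]; /* one frame set per action */
    BehaviorFn      think;             /* where the data finally points at code */
    uint8_t         hp;
} ObjectDef;

/* Instance layer: per-object state plus a pointer to its definition. */
typedef struct Object {
    const ObjectDef *def;
    uint8_t          state; /* index into def->anims */
    int16_t          x;
    uint8_t          hp;
} Object;

static void walk_right(Object *self)  { self->x += 1; self->state = ANIM_WALK; }
static void charge_left(Object *self) { self->x -= 2; self->state = ANIM_RUN; }

static const Frame    f_step[] = { { 0, 4 }, { 1, 4 } };
static const Frame    f_dash[] = { { 2, 2 }, { 3, 2 } };
static const FrameSet step = { 2, f_step };
static const FrameSet dash = { 2, f_dash };

/* Two "enemies" built purely from data. */
static const ObjectDef goblin = { { &step, &step, &step, &step }, walk_right, 3 };
static const ObjectDef knight = { { &step, &dash, &step, &step }, charge_left, 8 };

static void object_init(Object *o, const ObjectDef *def, int16_t x) {
    o->def   = def;
    o->state = ANIM_WALK;
    o->x     = x;
    o->hp    = def->hp;
}

/* The same update and lookup code serve every object type. */
static void object_update(Object *o) { o->def->think(o); }

static const FrameSet *object_anim(const Object *o) {
    return o->def->anims[o->state];
}
```

Each access like `o->def->anims[o->state]` is a chain of pointer dereferences, which is exactly the kind of work that gets expensive when the compiler handles pointers slowly.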
This is how I've done my stuff. Even in ASM, I trade a little bit of performance for flexibility. I'll cheat where some objects are simple in design and use simpler code paths for them, but ultimately I want to use external tools to create complex objects (enemies, events, whatever).
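A small sketch of that kind of cheat, assuming a hypothetical per-object `simple` flag picks between a hard-coded update and the generic, pointer-driven one. The names (`update_simple`, `patrol`, `update_all`) and the 0..255 screen bounds are made up for illustration.

```c
#include <stdint.h>

typedef struct Object Object;
typedef void (*BehaviorFn)(Object *self);

struct Object {
    uint8_t    simple; /* 1 = hard-coded fast path, 0 = data-driven path */
    int16_t    x;
    int16_t    dx;
    BehaviorFn think;  /* only used on the data-driven path */
};

/* Fast path for a trivial object, e.g. a bullet moving in a straight line. */
static void update_simple(Object *o) { o->x += o->dx; }

/* Example data-driven behavior (hypothetical): turn around at made-up screen edges. */
static void patrol(Object *o) {
    o->x += o->dx;
    if (o->x <= 0 || o->x >= 255)
        o->dx = (int16_t)-o->dx;
}

static void update_all(Object *objs, uint8_t n) {
    for (uint8_t i = 0; i < n; i++) {
        if (objs[i].simple)
            update_simple(&objs[i]);  /* direct call, no table lookup */
        else
            objs[i].think(&objs[i]);  /* indirect call through the object's data */
    }
}
```

The simple path skips the indirection entirely, while anything built with external tools still goes through `think`, so only the objects that need the flexibility pay for it.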
The problem inherent in HuC, or at least 3.21, is that pointer use is fairly slow. The other problem is that the language's structure itself expects some forms of recursion (whether you write a recursive function or not), so there is a lot of internal (software) stack manipulation. Together, the two really bog down the processor in the current implementation. Learning a little bit of ASM, and how pointers work, can save a lot of performance by itself, but it can also allow for more complex design like I (tried to) demonstrate above.
tl;dr: Speed isn't just speed. It also translates into other things. You might say, "my game runs fast enough." And that would be fine. So maybe that speed translates into something else: more complex, capable, and flexible game and code design?