I can help with some parallax code for HuC (I've written a few libraries for it). HuC does offer hsync scrolling support, which might be enough for what you need.
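To give a rough idea of the hsync approach, here's a minimal sketch of a three-band parallax in plain C. Everything in it is made up for illustration: the band names, the speeds, the scanline ranges, and set_band(), which is only a stand-in that records what it was given. On HuC you would swap set_band() for the real scroll call (scroll(), which as far as I recall takes a region number, x/y, top and bottom scanlines and a display flag; check the docs for your HuC version). The >> 8 just takes the whole-pixel part of an 8.8 fixed-point position.

```c
/* Portable sketch of a three-band parallax. All names are hypothetical. */

/* 8.8 fixed-point scroll positions, kept as globals rather than locals. */
unsigned int sky_pos;
unsigned int hills_pos;
unsigned int ground_pos;

/* Stand-in for the hardware side: records the x scroll handed to each band.
   On HuC this is where the real scroll call would go. */
unsigned int band_x[3];
unsigned char band_top[3];
unsigned char band_bottom[3];

void set_band(unsigned char num, unsigned int x,
              unsigned char top, unsigned char bottom)
{
    band_x[num] = x;
    band_top[num] = top;
    band_bottom[num] = bottom;
}

/* Advance each band once per frame; farther bands move slower. */
void parallax_step(void)
{
    sky_pos    += 0x0040;  /* 0.25 pixel per frame */
    hills_pos  += 0x0080;  /* 0.5 pixel per frame */
    ground_pos += 0x0100;  /* 1 pixel per frame */
}

/* Hand the whole-pixel part of each position to its band of scanlines. */
void parallax_apply(void)
{
    set_band(0, sky_pos >> 8, 0, 63);
    set_band(1, hills_pos >> 8, 64, 127);
    set_band(2, ground_pos >> 8, 128, 223);
}
```

You'd call parallax_step() and then parallax_apply() once per frame. After eight frames the three bands sit at x = 2, 4 and 8, so the ground has moved four times as far as the sky.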
If you find that things start slowing down, it's most likely array or pointer access, which is painfully slow on HuC. Local variables are slow as well, so use globals when possible, and bit shifting is also pretty slow. Surprisingly, though, it's not that difficult to overcome with a few asm-backed functions. I've done things like write C functions declared with the fastcall pragma, with asm on the target side, that can easily be used to access far data or pointer/array data. So instead of var=array[index], you'd do something like var=array_access(array,index). HuC treats all near data as far data when a pointer is used, so you get no speed increase from keeping arrays in local RAM (which you would normally expect).
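Just to show the call shape, here's a plain-C stand-in for that kind of accessor. This is not the real thing: on HuC the body would be asm behind a fastcall pragma, which I'm leaving out here because the exact pragma syntax depends on your HuC version. The table name and its contents are made up for the example.

```c
/* Hypothetical lookup table, kept as a global. */
const unsigned char speed_table[4] = { 1, 2, 4, 8 };

/* Plain-C stand-in for an asm-backed accessor.
   Returns the byte at array[index]; no bounds checking is done. */
unsigned char array_access(const unsigned char *array, unsigned int index)
{
    return array[index];
}

/* Example caller: reads one entry through the accessor
   instead of indexing the array directly. */
unsigned char speed_for_band(unsigned int band)
{
    return array_access(speed_table, band);
}
```

The point is only that the calling code stays readable: the array indexing is hidden behind one function, so that function is the only place that needs to be hand-tuned.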
Anyway, dynamic tiles, hsync scrolling, and sprites: that's how you're going to get parallax out of the system.