This is one of my areas of specialty. I've traced through hucard and CD games, and have documented (and even written compressors/decompressors for) such compression schemes. I've also done my own compression stuff for hucard and CD projects (homebrew, I guess you could call it). So I can answer a few questions on it from that experience.
First, almost all hucard compression schemes are simple. You are right in that system ram (8k) plays a big part in this. The reason for this is that decompression in real time, even for the simplest schemes, adds overhead. Thus you run into cpu resource problems (slowdown, etc). The Genesis (64k) and SNES (128k) system ram is very nice for solving this issue. It works by decompressing the needed frames into a large cache/buffer for instant access throughout a 'level' or area, etc. On the hucard you don't have this luxury per se. Almost all hucards employ the "decompress the tiles and sprites into vram at the start of the level" sort of tactic. But they don't update much of anything (besides the tilemap and possibly palette ram) during the stage/area/etc. This has the drawback of less animation or unique BG detail for that area/stage/etc.
Popular schemes for hucard are RLE, RLZ, and bit plane compression. RLE varies in small ways (usually WORD RLE for sprites and BYTE RLE for tiles). RLE isn't terribly effective, but it can be better than nothing most of the time. RLZ is like RLE in a way, but instead of repeating the last byte or supplying a byte+repeat length, you have a bit mask and a global var for that chunk of data (however the string of data is set up). The bit mask is usually checked 1 bit at a time: one value means fetch a new byte and the opposite value means write the global value. It's common for a hucard to use both RLE and RLZ, with a header in the string of data to decide the decoding type. And it's also common for the global value to be fixed (usually 0x00 or 0x0000).
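To make the two ideas concrete, here's a minimal sketch in Python (not 6280 code, and not any specific game's format). The stream layouts are assumptions for illustration only: `rle_decode` takes plain (count, value) byte pairs, and `mask_fill_decode` takes one mask byte per 8 output bytes, read MSB first, where a 1 bit means "fetch a new byte" and a 0 bit means "write the fixed global value". Real games vary on all of these details.

```python
def rle_decode(data: bytes) -> bytes:
    """BYTE RLE: the stream is assumed to be (count, value) pairs."""
    out = bytearray()
    for i in range(0, len(data) - 1, 2):
        count, value = data[i], data[i + 1]
        out += bytes([value]) * count
    return bytes(out)


def mask_fill_decode(data: bytes, out_len: int, fill: int = 0x00) -> bytes:
    """Bit mask scheme: each mask byte controls the next 8 output bytes.

    Assumed convention: bit set = copy a literal byte from the stream,
    bit clear = write the global fill value. Bits are read MSB first.
    """
    out = bytearray()
    pos = 0
    while len(out) < out_len:
        mask = data[pos]
        pos += 1
        for bit in range(7, -1, -1):
            if len(out) == out_len:
                break
            if mask & (1 << bit):
                out.append(data[pos])  # literal byte follows in the stream
                pos += 1
            else:
                out.append(fill)       # no stream byte consumed
    return bytes(out)
```

With the mask scheme, 8 bytes that are mostly 0x00 cost 1 mask byte plus only the non-zero bytes, which is why it does well on sparse tile and sprite data.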
The other popular compression scheme (and used quite a bit by early hucard titles) is bit plane compression. All graphics on the PCE take up 4 bit planes to make the 0-15 indirect colored pixel. If you only use colors 0-7 for that cell or block (tile or sprite cell), then you only need to store 3 bit planes. That's an instant 25% savings. And even better if you only need colors 0-3 (2 bit planes for 50%) or 0-1 (1 bit plane for 75% savings). The upside is that there really is no decompression needed. If you double buffer in vram (say for sprites), you don't even need system ram to buffer to; you just write to vram and the decompression algo selected (you choose: 3bit write, 2bit write, 1bit write) will automatically write the 0x00's or 0x0000's you need for padding. Hell, if you know the destination in vram is always going to contain padded cells of that same format, then you can skip padding for some bytes (because of how the planes are stored).
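Here's a rough Python sketch of the "3bit/2bit/1bit write" idea for an 8x8 tile. It assumes the usual PCE tile layout (32 bytes: planes 0 and 1 interleaved row by row in the first 16 bytes, planes 2 and 3 interleaved in the last 16). The stored format, with the used planes simply laid end to end at 8 bytes each, is a hypothetical one picked for clarity; `expand_tile` is not a name from any real library.

```python
def expand_tile(stored: bytes, planes: int) -> bytes:
    """Expand a tile stored with 1-4 bit planes to the full 32-byte form.

    'stored' is assumed to hold the used planes one after another,
    8 bytes (one per row) each. Missing planes are padded with 0x00.
    """
    if not 1 <= planes <= 4 or len(stored) != 8 * planes:
        raise ValueError("expected 8 bytes per stored plane, 1-4 planes")
    # One 8-byte row list per plane; unused planes are all zero.
    plane_rows = [
        stored[p * 8:(p + 1) * 8] if p < planes else bytes(8)
        for p in range(4)
    ]
    out = bytearray()
    for low, high in ((0, 1), (2, 3)):   # planes 0/1 first, then planes 2/3
        for row in range(8):
            out.append(plane_rows[low][row])
            out.append(plane_rows[high][row])
    return bytes(out)
```

A 1 plane tile is 8 bytes stored instead of 32, which is the 75% savings mentioned above. Note that with 2 planes the whole second half of the tile is padding, so if vram already holds zeros there you could skip writing it.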
RLE is great for tilemap compression (along with pointer/reference compression). It's not so great for tiles and sprites if you have a good amount of color usage and detail. This is where LZ compression schemes come in. They give you much better savings but they often require much more cpu resource to decompress. They also require a build out buffer, which can chew through that 8k of system ram on hucards pretty easily. Later CD games tend to use LZSS compression (even for text, but all kinds of other data too). Many Genesis carts employed a variant of LZ compression early on while hucards didn't. This meant that for a game on both systems and of the same size, the Genesis equivalent had more decompressed data than the turbo equiv. LZ is a variable size/output compression scheme, so you have to compress and see what kind of savings you get.
Like I said, LZSS is pretty popular for later SCD games. Very few earlier SCD games use it. The earliest game (IIRC) that uses it is Gate of Thunder. The programming team for GoT were pretty damn skilled. The game amazingly decompresses sprites of enemies in real time as the game plays (over a series of frames) without any hint of slowdown. I traced through GoT and wrote a decompressor for the format (it's standard fare LZSS). Lots of stuff in the game uses it: cinema graphics, title screen, all enemy sprites, tiles, etc. The only thing that's not compressed is the main ship and its weapons.
But there's a problem with planar graphics (and the composite planar format of the tiles) and LZSS compression: they don't compress as well as packed pixel format. If you leave your graphics in packed pixel format and compress with LZxx, you get much better compression savings. You just need some more buffer room and cpu resource to convert the packed pixel format back into pce planar format. Gate of Thunder *also* does this (which also makes it impressive).
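For anyone wondering what that conversion step looks like, here's a small Python sketch for one 8x8 tile. The packed layout is an assumption (4 bits per pixel, two pixels per byte, left pixel in the high nibble, 4 bytes per row), and the output assumes the PCE tile layout of planes 0/1 interleaved per row in the first 16 bytes and planes 2/3 in the last 16. `packed_to_planar_tile` is just an illustrative name, and this is not how Gate of Thunder does it internally.

```python
def packed_to_planar_tile(packed: bytes) -> bytes:
    """Convert one 8x8 packed-pixel tile (32 bytes) to 4-plane tile format."""
    if len(packed) != 32:
        raise ValueError("an 8x8 4bpp packed tile is 32 bytes")
    out = bytearray(32)
    for row in range(8):
        planes = [0, 0, 0, 0]
        for x in range(8):
            byte = packed[row * 4 + x // 2]
            # Assumed nibble order: even x in the high nibble, odd x in the low.
            pixel = (byte >> 4) if x % 2 == 0 else (byte & 0x0F)
            for p in range(4):
                if pixel & (1 << p):
                    planes[p] |= 0x80 >> x   # leftmost pixel is bit 7
        out[row * 2] = planes[0]
        out[row * 2 + 1] = planes[1]
        out[16 + row * 2] = planes[2]
        out[16 + row * 2 + 1] = planes[3]
    return bytes(out)
```

The reason packed pixels compress better with LZ is that a repeated pixel pattern shows up as a repeated byte string, whereas in planar form the same pattern is smeared across four separate bit planes.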
A number of years back, I found a really nice compression scheme called Pucrunch. Its original target was the C64, and thus it was written in 6502 assembly. I converted the decompression source code into 6280 for a PCE setup. Pucrunch is really nice; it employs more than just LZSS. With LZSS and other LZ variants, you get free RLE compression. That is to say, you don't need a special control code to tell the decompressor to decompress an RLE stream. It's inherent in the LZ design and circular buffer (reference) system (often called a 'sliding window': 'window' refers to the size of the reference, and 'sliding' because the relative point of the window increases as the data is decompressed). So as long as the LZ compressor is worth its salt, it should detect and compress RLE strings nicely. But Pucrunch goes sooo much further. Pucrunch adds control codes to the compression scheme and manages the size of the elements via Elias Gamma encoding. It makes for a very nice compression package (even rivaling ZIP and other PC compression packages of today).
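Since Elias Gamma might be unfamiliar, here's a quick Python sketch of the textbook version: a positive integer is written as N zero bits followed by its (N+1)-bit binary form, so small values (which are the common case for lengths) take very few bits. This uses strings of '0'/'1' for readability, and it is the standard definition only; I'm assuming nothing here about Pucrunch's exact bit conventions, which may differ.

```python
def elias_gamma_encode(n: int) -> str:
    """Textbook Elias Gamma: (len-1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias Gamma only encodes positive integers")
    binary = bin(n)[2:]                  # always starts with a '1'
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits: str, pos: int = 0):
    """Decode one value starting at 'pos'; returns (value, next_pos)."""
    zeros = 0
    while bits[pos] == "0":              # count the leading zeros
        zeros += 1
        pos += 1
    end = pos + zeros + 1                # the value is zeros+1 bits long
    return int(bits[pos:end], 2), end
```

So 1 costs a single bit, 2-3 cost three bits, 4-7 cost five bits, and so on. No fixed-width length field is needed, which is where the savings over plain LZSS come from.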
The downside is that it's slow. The upside is that you can manually define the circular buffer size (sliding window). The upside to that is, you can use a circular buffer to decompress to a port instead of to linear memory. Normally, LZ is decompressed into (linear) memory because of how the reference window works. But if you *only* store the reference window/buffer in memory, and keep the data destination as a port, then you can make it useful for systems that only have a small amount of ram to work with. The key is that the reference window is really nothing more than a circular buffer. That is to say, it wraps around (it's usually forward reference only) at the end of the buffer. So it doesn't need to 'slide' at all when decompressing to a port, say like the VDC. Matter of fact, since the source of the compressed file is sequential access only, you can decompress from port to port (source being a port, destination being a port).
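Here's a minimal Python sketch of the circular buffer idea. The token format is made up for illustration (already-parsed `("lit", byte)` and `("ref", distance, length)` tuples rather than a real bitstream), and `write_port` is a stand-in callback for something like a VDC data port write. The only memory the decoder keeps is the window itself.

```python
def lz_decode_to_port(tokens, write_port, window_size=256):
    """Decode LZ tokens, keeping only a circular window in memory.

    Every output byte goes to write_port(byte); nothing is ever read
    back from the destination. References are assumed to point only at
    bytes already written, with 1 <= distance <= window_size.
    """
    window = bytearray(window_size)
    head = 0                             # next write position in the window

    def emit(value):
        nonlocal head
        window[head] = value
        head = (head + 1) % window_size  # wrap instead of sliding
        write_port(value)

    for token in tokens:
        if token[0] == "lit":
            emit(token[1])
        else:
            _, distance, length = token
            if not 1 <= distance <= window_size:
                raise ValueError("reference outside the window")
            for _ in range(length):
                # head moves on each emit, so the read position follows it
                emit(window[(head - distance) % window_size])
```

A quick usage example: `lz_decode_to_port([("lit", 0x41), ("ref", 1, 5)], out.append)` with `out = []` writes six 0x41 bytes. A reference with distance 1 and a longer length is the "free RLE" you get from LZ, since the copy reads bytes it has only just written.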
I also wrote a packed pixel to planar conversion addition to the Pucrunch 6280 decomp lib. It monitors the decompressed bytes written into the circular buffer, and when it hits its target length (128 bytes for a sprite cell) it temporarily jumps out of the decompression stage to convert the packed pixel data in the circular buffer back into pce planar format and write it to vram, then jumps back to the decomp lib. Completely invisible to the user. The only catch is that you need to separate sprite data from tile data. Otherwise it wouldn't know which planar format of the PCE to convert to (because sprite planar and tile planar are *not* the same format).
There are some other compression apps out there for the 65x code base: Aplib and Packfire. Mic converted the 65x source to 6280 WLX source and I converted his source to PCEAS. I used both for a small demo that I wrote, but ended up using Packfire because I was limited to 8k of rom space and its depack lib was the smallest, so the total lib+packed file was smallest using Packfire.
So, that's my 2 cents.