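I noticed that my original version used ZP (zero page) for the 8:8 scalar. That speeds it up, but I'd rather save ZP for something that benefits from it more. This version uses absolute,X addressing instead. To make up for the extra cycle that costs on every access, I now only touch the high byte of the whole number when the low byte carries out (a bcc around an inc). Worst case, it's the same cycle count as before (98 cycles, including the ldx/jsr setup); best case, it's 88 cycles. In the listing, the left column is instruction size in bytes and the comments are cycle counts.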
;6280 object
2 ldx #$xx ;2
3 jsr AddVelocity ;7
AddVelocity:
3 lda x_float,x ;5
1 clc ;2
3 adc x_float_inc,x ;5
3 sta x_float,x ;5
3 lda x_whole.l,x ;5
3 adc x_whole_inc,x ;5
3 sta x_whole.l,x ;5
2 bcc .skip1 ;4/2
3 inc x_whole.h,x ; /7 = 36/41
.skip1
3 lda y_float,x ;5
1 clc ;2 (C is still set if the inc above ran)
3 adc y_float_inc,x ;5
3 sta y_float,x ;5
3 lda y_whole.l,x ;5
3 adc y_whole_inc,x ;5
3 sta y_whole.l,x ;5
2 bcc .skip2 ;4/2
3 inc y_whole.h,x ; /7 = 36/41
.skip2
1 rts ;7
PS: I think the troll will go away if we stop feeding him.