You might want to try VocalShift at the Low Latency setting and mirror the more basic settings in VocalTune. There’ll still be some latency (nominally about the same), given everything that’s going on in the background processing.
I don’t find it objectionable with a good balance between dry & wet. You can use that to your advantage for some separation. At 100% wet hard-tune, you might need to fold in some masking tricks.
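To picture why the dry/wet balance matters here, this is a rough Python sketch of a linear blend where the wet (tuned) path arrives a few samples late. It is not how either plugin works internally; the function name `blend_dry_wet`, the linear crossfade, and the `latency_samples` parameter are all assumptions made up for illustration.

```python
def blend_dry_wet(dry, wet, wet_amount, latency_samples=0):
    """Blend a dry signal with a wet signal that lags by latency_samples.

    Hypothetical illustration only: a plain linear crossfade, with the
    wet path delayed by zero-padding at the front.
    """
    if not 0.0 <= wet_amount <= 1.0:
        raise ValueError("wet_amount must be between 0.0 and 1.0")
    if latency_samples < 0:
        raise ValueError("latency_samples must be non-negative")

    n = len(dry)
    # Delay the wet path, then trim it back to the dry signal's length.
    delayed_wet = ([0.0] * latency_samples + list(wet))[:n]

    # Linear crossfade: 0.0 is fully dry, 1.0 is fully wet.
    return [(1.0 - wet_amount) * d + wet_amount * w
            for d, w in zip(dry, delayed_wet)]


# A single click, processed with a 2-sample lag on the wet path.
click = [1.0, 0.0, 0.0, 0.0]
half_mix = blend_dry_wet(click, click, 0.5, latency_samples=2)
full_wet = blend_dry_wet(click, click, 1.0, latency_samples=2)
```

In the half mix the click shows up twice (once dry, once late), which is one way to read the "separation" idea. At fully wet only the late copy remains, so there is nothing on-time left to anchor it.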
The tradeoff is that you lose the full-featured compressor, but you do gain many more unique options.