Because you know whether a sprite is behind a tile from its Y value, you don't need to sort the tile masks. The sprites, however, must be drawn in the correct order (back to front) so that a farther sprite isn't drawn on top of a closer one. You can keep the tile masks for each tile row in their own mask buffer, so if the screen is 32 tiles high, you'd have 32 tile mask buffers (or two more, i.e. 34, if you support full-screen scrolling).
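To make the idea concrete, here is a minimal sketch in Python. It assumes 8-pixel tiles, a tiny 4-row screen instead of 32, one value per pixel, and sprites drawn as solid rectangles. It also assumes a specific depth rule: a tile row is "in front of" a sprite when the row's bottom edge lies below the sprite's foot (its lowest pixel row). All names (`Sprite`, `occluded`, `draw_sprites`, `masks`) are made up for illustration. Each row's mask covers the whole screen here for simplicity, which wastes memory but lets a row's foreground graphics reach above the row.

```python
from dataclasses import dataclass

TILE = 8              # tile size in pixels (assumed)
ROWS = 4              # tile rows on screen; the post uses 32 (34 with scrolling)
SCREEN_W = 32
SCREEN_H = ROWS * TILE


@dataclass
class Sprite:
    x: int
    y: int
    w: int
    h: int
    color: int        # solid colour for simplicity

    @property
    def foot(self) -> int:
        """Y of the sprite's bottom pixel row; used as its depth."""
        return self.y + self.h - 1


def new_buffer(fill=0):
    return [[fill] * SCREEN_W for _ in range(SCREEN_H)]


def row_bottom(r: int) -> int:
    return (r + 1) * TILE - 1


def occluded(masks, s, px, py) -> bool:
    # Only rows whose bottom edge is below the sprite's foot are in front.
    # The masks never need sorting: the sprite's Y alone selects them.
    return any(masks[r][py][px] for r in range(ROWS) if row_bottom(r) > s.foot)


def draw_sprites(frame, masks, sprites):
    # Sprites DO need sorting: smaller foot Y = farther away, drawn first.
    for s in sorted(sprites, key=lambda sp: sp.foot):
        for py in range(max(s.y, 0), min(s.y + s.h, SCREEN_H)):
            for px in range(max(s.x, 0), min(s.x + s.w, SCREEN_W)):
                if not occluded(masks, s, px, py):
                    frame[py][px] = s.color


# Usage: the frame would already hold the drawn tile layer.
frame = new_buffer()
masks = [new_buffer() for _ in range(ROWS)]   # one mask buffer per tile row

# A foreground tile (say, a tree trunk) in row 2: x 8..15, y 16..23.
for y in range(16, 24):
    for x in range(8, 16):
        frame[y][x] = 9
        masks[2][y][x] = 1

behind = Sprite(x=8, y=12, w=8, h=8, color=1)     # foot 19: row 2 is in front
in_front = Sprite(x=10, y=18, w=4, h=8, color=2)  # foot 25: below row 2
draw_sprites(frame, masks, [in_front, behind])
```

The `behind` sprite's lower half is hidden by the trunk, while `in_front` overwrites the trunk and also covers `behind` where they overlap, because it is drawn later.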

Yup... pretty complex stuff, and I hope I haven't written garbage. I've often thought about how this could be done, but I haven't written my own RPG engine; I don't have the time to code games.