It's going to take several passes.
First I'd remove all CR/LFs. They don't render as line breaks in HTML anyway, so there's no sense in storing them.
Then I'd replace all <br> with <br />.
Next I'd loop, replacing double breaks <br /><br /> with single breaks <br /> until there are no more double breaks.
Next I'd look for <br /></p> and replace it with </p>.
Next would be replacing <p>&nbsp;</p> with an empty string.
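And lastly, replacing <p><br /></p> with an empty string as well. (Note that the <br /></p> step above will already have turned these into <p></p>, so either run this step before that one or look for <p></p> here instead.)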
Unfortunately, that won't prevent them from doing this: <p><br /><i>&nbsp;</i><br /></p> or some other junk.
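
Here's a rough sketch of those passes in Python, just to show the order of operations. The function name clean_html is made up for illustration, and it uses plain string replaces rather than a real HTML parser, so treat it as a starting point only. It keeps the steps in the order listed and adds one extra replace for <p></p>, since stripping <br /></p> first leaves empty paragraphs behind. That last replace is my assumption about what you'd want, not something the steps above spell out.

```python
def clean_html(html: str) -> str:
    """Hypothetical helper: apply the cleanup passes in order."""
    # 1. Drop CR and LF characters.
    html = html.replace("\r", "").replace("\n", "")

    # 2. Normalize <br> to <br />.
    html = html.replace("<br>", "<br />")

    # 3. Collapse double breaks until none remain. A single replace
    #    turns three breaks into two, hence the loop.
    while "<br /><br />" in html:
        html = html.replace("<br /><br />", "<br />")

    # 4. Drop a break that sits right before a closing paragraph tag.
    html = html.replace("<br /></p>", "</p>")

    # 5. Remove paragraphs that hold only a non-breaking space.
    html = html.replace("<p>&nbsp;</p>", "")

    # 6. Remove paragraphs that hold only a break.
    html = html.replace("<p><br /></p>", "")

    # Assumption: step 4 has already turned <p><br /></p> into <p></p>,
    # so strip the empty paragraphs that are left over.
    html = html.replace("<p></p>", "")

    return html
```

Running it on the junk example, clean_html("<p><br /><i>&nbsp;</i><br /></p>"), still leaves <p><br /><i>&nbsp;</i></p> behind, which is exactly the kind of thing fixed string replaces can't catch.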

-tg