Thanks CornedBee, it was indeed elsewhere in the listing... Because the build was set up for speed, the compiler had inlined the functions and optimised the C++ version.

Woss, you'll be pleased to know that turning off optimisations (forcing the compiler to make proper function calls) got me these results:
Code:
C++ code time: 782 ms
ASM code time: 765 ms
The test was rather hasty and used GetTickCount, but you can see the times are about the same, as you'd expect.
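
For anyone wanting to repeat the comparison with a finer timer: GetTickCount typically only resolves to somewhere around 10 to 16 ms, so a 17 ms gap is close to the noise floor. Below is a minimal sketch of a timing helper using std::chrono::steady_clock instead. It is not the code used for the numbers above; time_ms and add_cpp are hypothetical names made up for illustration, and add_cpp just stands in for whichever routine (C++ or ASM) is being measured.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the original test): calls fn() `iterations`
// times and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn fn, std::uint64_t iterations) {
    using clock = std::chrono::steady_clock;  // monotonic, never jumps backwards
    const auto start = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical stand-in for the function under test.
int add_cpp(int a, int b) {
    return a + b;
}
```

A call might look like `volatile int sink = 0; double ms = time_ms([&] { sink = add_cpp(sink, 1); }, 100000000);`. Writing the result to a volatile keeps the compiler from discarding the work, but with optimisations on the call itself can still be inlined, which is the same effect described above.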