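Alright, I've got a function that acts like memset, except that instead of filling the variable one byte at a time, it fills it in QUADS: 4 bytes at a time (strictly speaking these are x86 DWORDs, which is what rep stosd stores; an x86 quadword is 8 bytes).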

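// dest  = &variable
// data  = 32-bit fill value, e.g. 0
// count = number of 4-byte units, e.g. sizeof(variable) / 4
inline void MemSet_QUAD(void* dest, unsigned int data, int count)
{
    _asm
    {
        mov edi, dest    ; destination address
        mov ecx, count   ; number of DWORDs to store
        mov eax, data    ; 32-bit value to store
        rep stosd        ; store EAX at [EDI], advance EDI by 4, repeat ECX times
    }
}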
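For comparison, here is a minimal portable C sketch of the same fill, assuming a C99 compiler with <stdint.h>. MemSet_QUAD_portable is a hypothetical name, not an existing function. It avoids inline assembly, so it also builds on compilers and targets where MSVC-style _asm blocks aren't available (MSVC itself only accepts them when targeting 32-bit x86).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable counterpart of MemSet_QUAD (name is illustrative).
   Writes the 32-bit pattern `data` into `count` consecutive 4-byte slots
   starting at `dest`. Each slot is written with a 4-byte memcpy, which
   avoids alignment and strict-aliasing problems; optimizing compilers
   usually turn it into a single 32-bit store. */
static void MemSet_QUAD_portable(void *dest, uint32_t data, size_t count)
{
    unsigned char *p = (unsigned char *)dest;
    for (size_t i = 0; i < count; i++) {
        memcpy(p + i * 4, &data, sizeof data);
    }
}
```

It is called the same way as the original, e.g. MemSet_QUAD_portable(&variable, 0, sizeof(variable) / 4);. Like the asm version, it leaves any trailing bytes untouched when the size is not a multiple of 4.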


I was wondering if anyone could help me write a matching MemCpy_QUAD function that copies 4 bytes at a time, on the assumption that the regular memcpy only copies 1 byte at a time.
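A minimal sketch of MemCpy_QUAD, following the same pattern as MemSet_QUAD: on 32-bit MSVC it swaps rep stosd for rep movsd, which copies one DWORD from [ESI] to [EDI] per iteration; on other compilers or targets it falls back to a plain C loop. The signature (count in 4-byte units, as in MemSet_QUAD) and the _MSC_VER/_M_IX86 check are my assumptions, not anything taken from an existing library.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of MemCpy_QUAD (hypothetical): copies `count` 4-byte units
   (count * 4 bytes) from src to dest. As with memcpy, the regions must
   not overlap, and count is assumed to be non-negative. */
static void MemCpy_QUAD(void *dest, const void *src, int count)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    /* 32-bit MSVC: rep movsd copies the DWORD at [ESI] to [EDI],
       advances both by 4, and repeats ECX times. */
    __asm
    {
        mov esi, src
        mov edi, dest
        mov ecx, count
        rep movsd
    }
#else
    /* Other compilers/targets: copy one 32-bit word per iteration
       through a temporary, using 4-byte memcpy calls for safe access. */
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dest;
    for (int i = 0; i < count; i++) {
        uint32_t word;
        memcpy(&word, s + (size_t)i * 4, sizeof word);
        memcpy(d + (size_t)i * 4, &word, sizeof word);
    }
#endif
}
```

Usage mirrors the memset version, e.g. MemCpy_QUAD(&dst, &src, sizeof(dst) / 4);. Before adopting it, it's worth timing it against the plain memcpy, since the one-byte-at-a-time assumption may not hold for your C library.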