-
Jul 14th, 2024, 03:53 PM
#1
Thread Starter
Frenzied Member
Which is the faster way to ZeroExtend 2 bytes to 4?
<<Note that the code in this post uses NASM syntax (the first operand is the destination).>>
I can either zero the destination 4-byte register and then copy in the two bytes like this:
Code:
xor eax,eax          ; clear all 32 bits of EAX
mov ax,[mem_ptr]     ; load 2 bytes into the low 16 bits (AX); upper 16 stay zero
or I can do it in one operation like this:
Code:
movzx eax,word [mem_ptr]   ; load 2 bytes and zero-fill the upper 16 bits (NASM needs the "word" size here)
While the MOVZX instruction takes only one line of assembly code to do 2 things, my question is whether it's actually faster for the CPU to execute. Oftentimes in programming, code that takes more lines to write turns out to be faster to execute. It all depends on the number of clock ticks needed to execute the MOVZX instruction compared to the number of clock ticks needed to execute the 2 instructions XOR and then MOV (the clock ticks for each of these 2 added together).
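To make the "do 2 things" part concrete, here is a small C sketch that models what each sequence leaves in EAX. It says nothing about speed; it only shows that XOR followed by a 16-bit MOV and a single MOVZX produce the same 32-bit value. All function names are hypothetical, and the loads assemble bytes in little-endian order on purpose, since that is how x86 reads memory. A plain 4-byte `mov eax,[mem_ptr]` is modeled too for contrast, because it loads all four bytes instead of zero-extending two.

```c
#include <stdint.h>

/* Read a 16-bit word the way x86 does: little-endian, low byte first. */
static uint16_t load16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit dword, little-endian. */
static uint32_t load32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical model of "xor eax,eax" followed by "mov ax,[p]". */
static uint32_t xor_then_mov_ax(const uint8_t *p) {
    uint32_t eax = 0;                        /* xor eax,eax clears all 32 bits */
    eax = (eax & 0xFFFF0000u) | load16(p);   /* mov ax,...: writes only the low 16 bits */
    return eax;
}

/* Hypothetical model of "movzx eax,word [p]": load 16 bits, zero-fill the rest. */
static uint32_t movzx_eax_word(const uint8_t *p) {
    return (uint32_t)load16(p);
}

/* Hypothetical model of "mov eax,[p]": a full 4-byte load, NOT a zero extension. */
static uint32_t mov_eax_dword(const uint8_t *p) {
    return load32(p);
}
```

For example, with the bytes `34 12 CD AB` at `mem_ptr`, both `xor_then_mov_ax` and `movzx_eax_word` return `0x00001234`, while `mov_eax_dword` returns `0xABCD1234`. In real C code the same zero extension is just `uint32_t x = *(const uint16_t *)p;`, and x86 compilers typically emit a single MOVZX for it.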
-
Jul 15th, 2024, 03:39 AM
#2
Re: Which is the faster way to ZeroExtend 2 bytes to 4?
Intel provides detailed information about the various instructions, including the clock cycles they use. See:
https://www.intel.com/content/www/us...intel-sdm.html (vol 2)
All advice is offered in good faith only. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/
C++23 Compiler: Microsoft VS2022 (17.6.5)
-
Jul 15th, 2024, 03:54 AM
#3
Re: Which is the faster way to ZeroExtend 2 bytes to 4?
Originally Posted by Ben321
It all depends on the number of clock ticks that are needed to execute the MOVZX instruction compared to the number of clock ticks that are needed to execute the 2 instructions XOR and then MOV (the sum of the clock ticks for each of these 2 added together).
In your sample, XOR and MOV are probably pipelined, i.e. executed at the same time, so effectively the second instruction takes 0 ticks to execute.
There is a reason you'll often see several (for instance 4) XOR + MOV combos stacked together: they get executed together, on multiple pipelines simultaneously.
The funny thing is that the question in the OP is hard to even test properly. You don't execute these instructions on their own, and they usually get pipelined, i.e. you get a lot of XOR + MOVs for free around other, more expensive instructions.
cheers,
</wqw>