-
Jun 17th, 2003, 12:40 PM
#1
Thread Starter
Lively Member
Fast Direct Memory Access
Hi,
The following should replace the word "Hello" in memory with "HHHHH".
I am just testing the following code so that I can understand how asm works, but I keep getting an unhandled memory exception when I get to the 'mov byte ptr [eax], 048H' line.
I have tried to change it to 'move byte [eax], 048H' but it won't even compile.
PHP Code:
// amtest.cpp : Defines the entry point for the application.
//
#include "stdafx.h"
void test(LPSTR,int);
int APIENTRY WinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR lpCmdLine,
                     int nCmdShow)
{
    // TODO: Place code here.
    CHAR * szWord = "Hello";
    test ((LPSTR)szWord, sizeof(szWord));
    return 0;
}
void test(LPSTR ptrAddress, int len)
{
    _asm
    {
        pusha                       //push all registers onto stack
        mov eax, [edi + 8]          // load eax with ptrAddress
        mov ecx, [edi + 12]         // load ecx with length of string (loop counter)
    loop_start:
        mov byte ptr [eax], 048H    //replace memory position with Ascii('H')
        inc eax                     //move to next memory location
        dec ecx                     //decrease counter
        jnz loop_start              //loop
        popa                        //get registers back
    }
    return;
}
If anyone could help me I would appreciate it. Thanks
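For readers hitting the same crash, here is a hedged sketch of the likely causes, written as portable C++ rather than inline asm. Three things in the code above look suspect. `szWord` points at a string literal, which compilers typically place in read-only memory, so writing through it is undefined behaviour and commonly raises an access violation. `sizeof(szWord)` is the size of the pointer (4 on 32-bit Windows), not the length of the string. And the arguments are read relative to `edi`, whereas a conventional stack frame addresses them through `ebp` (MSVC inline asm also lets you name `ptrAddress` and `len` directly). The helper name `fill_with_h` is made up for illustration and is not from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: overwrite the first `len` bytes at `p` with 'H' (0x48).
// The caller must pass a pointer to writable memory, not a string literal.
void fill_with_h(char* p, std::size_t len)
{
    assert(p != nullptr);
    for (std::size_t i = 0; i < len; ++i) {
        p[i] = 'H';
    }
}

// Usage sketch:
//   char word[] = "Hello";                  // writable copy, not a literal
//   fill_with_h(word, std::strlen(word));   // word is now "HHHHH"
```

Passing `std::strlen(word)` rather than `sizeof` of a pointer leaves the terminating NUL in place and covers exactly the five characters.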
-
Jun 18th, 2003, 06:24 AM
#2
All the buzzt
 CornedBee
"Writing specifications is like writing a novel. Writing code is like writing poetry."
- Anonymous, published by Raymond Chen
Don't PM me with your problems, I scan most of the forums daily. If you do PM me, I will not answer your question.
-
Jun 18th, 2003, 06:09 PM
#3
Fanatic Member
Could you use a LOOPNZ instruction to eliminate the DEC ECX and the JNZ loop? (Just for a little more efficiency...)
-
Jun 19th, 2003, 05:03 AM
#4
I think so, but isn't the instruction simply called LOOP?
All the buzzt
 CornedBee
-
Jun 19th, 2003, 05:21 AM
#5
Thread Starter
Lively Member
I had been led to believe that using LOOP takes more machine instructions than using DEC and JNZ together, so it is more time consuming.
-
Jun 19th, 2003, 05:27 AM
#6
I can hardly believe that. If it were so then LOOP would have been implemented by combining DEC and JNZ
All the buzzt
 CornedBee
-
Jun 19th, 2003, 05:33 AM
#7
Thread Starter
Lively Member
Maybe, but can you know for sure that there are not some other instructions called for additional error control? A single line of machine code usually means several machine instructions, some more than others.
Actually, is there any source which tells you exactly what each machine code call does inside the processor and how many machine instructions each one calls?
-
Jun 19th, 2003, 09:59 AM
#8
Fanatic Member
Oops. My bad. LOOP.
According to the NASM readme, on a Pentium:
LOOP: 5 to 6 clocks
DEC: 1 clock
JNZ: 1 clock

This is odd. Why would one instruction that does the same thing as two others take more cycles?
If anyone could check the Intel Architecture Software Developer's Manual Volume 2 and get us the numbers from that, we would be most appreciative.
-
Jun 19th, 2003, 01:01 PM
#9
Must be an error in the readme...
Anyway, here's what my reference page says for the 386:
LOOP: 11 + pipe flush
DEC: 2 on (e)ax, 6 on others
JNZ: 7 + pipe flush
So LOOP is 2 clocks faster (since you'll probably use ecx for the counter anyway, DEC + JNZ comes to 6 + 7 = 13 against 11).
But maybe they optimized DEC and JNZ on later CPUs, but not LOOP (still sounds strange to me).
http://webster.cs.ucr.edu/Page_TechD...386/0_toc.html
All the buzzt
 CornedBee
-
Jun 19th, 2003, 07:51 PM
#10
Fanatic Member
Maybe I am reading this wrong, but it looks like the DEC instruction takes 2 clock cycles on registers and 6 on memory operands. That would put DEC + JNZ at 2 + 7 = 9 clocks against 11 for LOOP.
-
Jun 20th, 2003, 01:09 AM
#11
Do you think?
Still sounds very improbable to me.
Why not do this:
Code:
MOV ecx, 0ffffffffh
mark:
LOOP mark
Code:
MOV ecx, 0ffffffffh
mark:
DEC ecx
JNZ mark
Then benchmark them.
All the buzzt
 CornedBee
-
Jun 20th, 2003, 05:17 AM
#12
Thread Starter
Lively Member
A-ha, definitive results:
I tested the following benchmark 3 times each to ensure consistency:
PHP Code:
// benchtest.cpp : Defines the entry point for the application.
// S. Caulfield 20/6/03
#include "stdafx.h"
#include <stdlib.h>
int APIENTRY WinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR lpCmdLine,
                     int nCmdShow)
{
    // TODO: Place code here.
    _SYSTEMTIME st; // SYSTEMTIME structure
    int s1;         // milliseconds before
    int s2;         // milliseconds after
    int dif;        // difference in milliseconds
    CHAR op[12];    // CHAR array for results in MessageBox (sign + 10 digits + NUL)
    GetSystemTime(&st);
    // Note: the hour is ignored, so a run that crosses an hour boundary gives a wrong result
    s1 = ((int)st.wMilliseconds) + ((int)st.wSecond * 1000) + ((int)st.wMinute * 60000);
    // Comment out the loop which is not being used
    // looped 5 times to give best results
    _asm
    {
        pusha
        mov eax, 05h            // x loop
    jump_x:
        mov ecx, 0ffffffffh     // y loop
    jump_y:
        dec ecx
        jnz jump_y
        //loop jump_y
        dec eax
        jnz jump_x
        popa
    }
    GetSystemTime(&st);
    s2 = ((int)st.wMilliseconds) + ((int)st.wSecond * 1000) + ((int)st.wMinute * 60000);
    dif = s2 - s1;
    _itoa (dif, op, 10);
    MessageBox (NULL, op, "Benchmark", MB_OK);
    return 0;
}
The results I got are:
LOOP = 32.375 secs, 32.375 secs and 32.390 secs
DEC & JNZ = 21.593 secs, 21.594 secs and 21.594 secs
So it looks like DEC & JNZ takes about one third less time than LOOP. I rest my case. Would the defence like to cross-examine?
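One caveat on the timing method: the timestamp above is built from minutes, seconds and milliseconds only, so a run that straddles an hour boundary would report a negative or wrong difference. A minimal sketch of an alternative, assuming a compiler with `std::chrono` (C++11, so newer than the toolchain used in this thread), is a small harness around a monotonic clock. The names `elapsed_ms` and `countdown` are hypothetical, and the plain C++ loop is only a stand-in: it does not execute the same instructions as the asm version.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: run `f` once and return the elapsed wall time in
// milliseconds, measured with a monotonic clock so it cannot run backwards.
template <typename F>
std::int64_t elapsed_ms(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // guaranteed by steady_clock
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Plain C++ countdown standing in for the asm loop. `volatile` keeps the
// compiler from deleting the loop, but it also forces memory accesses, so
// the timings are not comparable with the register-only asm version.
void countdown(std::uint32_t n)
{
    volatile std::uint32_t counter = n;
    while (counter != 0) {
        counter = counter - 1;
    }
}
```

A call such as `elapsed_ms([] { countdown(0xFFFFFFFFu); })` would time one pass. To compare LOOP against DEC/JNZ, the lambda would wrap the corresponding asm block instead.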
-
Jun 20th, 2003, 06:09 AM
#13
The defence admits defeat and files a case against Intel for idiocy.
BTW, what CPU do you have?
All the buzzt
 CornedBee
-
Jun 20th, 2003, 06:17 AM
#14
Thread Starter
Lively Member
I have to say that I am surprised by these results.
The dec & jnz pair is executed about 21 billion times (5 × 0xFFFFFFFF), which takes just over 21.5 seconds, or just under 1 billion iterations per second. I have a 2 GHz processor, which must mean that the dec and the jnz each take only about one clock cycle.
What surprises me is that the Windows management and multitasking does not seem to eat up that many clock cycles; this process seemed to be using nearly all of the processor time. I thought that the constant switching between tasks and all the other stuff going on in the background would be using a lot of the processor time and make the apps go quite slowly in relation to the clock speed.
I am now convinced of the benefit of being able to have intimate control of program execution using asm.
Where did you get your source on how many clock cycles each instruction takes? If I could get an up-to-date, accurate list of how processor intensive each of the instructions is, I could make lightning fast routines by being selective about which instructions I use.
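The cycle estimate above can be checked with a few lines of arithmetic. This is a rough sketch that assumes the core runs at exactly 2.0 GHz (the thread only says "2 GHz") and that the whole measured time was spent in the inner loop. The names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Total inner-loop iterations in the benchmark: 5 outer passes,
// each counting ecx down from 0xFFFFFFFF to 0.
constexpr std::uint64_t kIterations = 5ULL * 0xFFFFFFFFULL;  // 21,474,836,475

// ASSUMPTION: the core clock is exactly 2.0 GHz.
constexpr double kClockHz = 2.0e9;

// Hypothetical helper: average clock cycles per loop iteration for a run
// that took `seconds` of wall time.
constexpr double cycles_per_iteration(double seconds)
{
    return seconds * kClockHz / static_cast<double>(kIterations);
}
```

Under those assumptions, the 21.594 s DEC/JNZ run works out to roughly 2 cycles per iteration (about one per instruction on average), and the 32.375 s LOOP run to roughly 3. These are averages over the whole run, not per-instruction latencies.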
-
Jun 20th, 2003, 06:18 AM
#15
Thread Starter
Lively Member
I have an Athlon XP 2400+ which runs at about 2GHz
-
Jun 20th, 2003, 09:00 AM
#16
Given your calculations (I didn't think about it much at first) it seems to me that the result is simply impossible.
a) While windowing probably takes no time at all if you don't do other things, thread management should take a little. Very little indeed, as multitasking is very efficient, but a little.
b) No jump takes 1 clock cycle. The jump instruction takes more than one cycle even if it doesn't jump.
c) Jumping flushes the queue! An Athlon has a ~8 stage pipeline, this means that every jump instruction makes the next instruction take AT LEAST 8 cycles.
d) You have code before and after the test itself. The time calculations are small, yet they do take a little time.
The problem is that we don't know what really happens inside modern chips. They have jump prediction mechanisms and similar things and we'll never know what they do exactly.
All the buzzt
 CornedBee
-
Jun 20th, 2003, 10:08 AM
#17
Fanatic Member
My source was the NASM opcode help. (It's worth getting for the manual itself. Heck, it's free.)
It makes sense that it's so fast, because you are using registers and the instructions are in different pipelines (at least according to the help file).
Now, I believe that the AMD processors can process simple instructions very quickly so that could be a factor.
-
Jun 20th, 2003, 10:10 AM
#18
Thread Starter
Lively Member
If you are using an Intel chip, I would suggest trying the above code to see if you get a similar outcome.
-
Jun 22nd, 2003, 05:39 AM
#19
All the buzzt
 CornedBee
-
Jul 30th, 2003, 07:20 PM
#20
Fanatic Member
I have access to a Pentium. I'll try it.
Sorry about being so late with this.
"Can't" and "shouldn't" are two totally separate things.
All questions should be answered. All answers should be true. That is why I post.
-
Aug 1st, 2003, 04:34 PM
#21
Fanatic Member
Here are the numbers that were shown in the message box using a Pentium:
60928 for DEC / JNZ
246645 for LOOP
DEC / JNZ beats LOOP!
-
Aug 4th, 2003, 07:05 AM
#22
Must be the RISC architecture of the newer CPUs.
All the buzzt
 CornedBee