Thread: Fast Direct Memory Access

  1. #1

    Thread Starter
    Lively Member
    Join Date
    May 2002
    Location
    London, England
    Posts
    88

    Fast Direct Memory Access

    Hi,

    The following should replace the word "Hello" in memory with "HHHHH".

    I am just testing the following code so that I can understand how asm works, but I keep getting an unhandled memory exception when I reach the 'mov byte ptr [eax], 048H' line.

    I have tried to change it to 'move byte [eax], 048H' but it won't even compile.



    Code:
    // amtest.cpp : Defines the entry point for the application.
    //

    #include "stdafx.h"

    void test(LPSTR, int);

    int APIENTRY WinMain(HINSTANCE hInstance,
                         HINSTANCE hPrevInstance,
                         LPSTR     lpCmdLine,
                         int       nCmdShow)
    {
        // TODO: Place code here.

        CHAR szWord[] = "Hello";

        test((LPSTR)szWord, sizeof(szWord));

        return 0;
    }


    void test(LPSTR ptrAddress, int len)
    {
        _asm
        {
            pusha                    // push all registers onto stack
            mov eax, [edi + 8]       // load eax with ptrAddress
            mov ecx, [edi + 12]      // load ecx with length of string (loop counter)
    loop_start:
            mov byte ptr [eax], 048H // replace memory position with Ascii('H')
            inc eax                  // move to next memory location
            dec ecx                  // decrease counter
            jnz loop_start           // loop
            popa                     // get registers back
        }

        return;
    }


    If anyone could help me I would appreciate it. Thanks
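
    A minimal portable C++ sketch of what the asm loop above is meant to do, for comparison; it is not the poster's code, and `fill_bytes` and `fill_string_with_h` are hypothetical names. Two assumptions are worth stating. First, in a conventional 32-bit stack frame the arguments sit at offsets from `ebp` (`[ebp + 8]`, `[ebp + 12]`), and MSVC inline assembly also accepts the parameter names directly (`mov eax, ptrAddress`), so addressing them through `edi` may be what loads a bad pointer. Second, if `szWord` is an array, `sizeof(szWord)` also counts the terminating NUL, so the sketch uses `strlen` to leave the terminator alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the original post): overwrite `len` bytes
// starting at `p` with `fill`, one byte at a time, mirroring the
// mov / inc / dec / jnz loop in the asm listing.
void fill_bytes(char* p, std::size_t len, char fill)
{
    assert(p != nullptr || len == 0);
    while (len != 0) {   // test first, so len == 0 writes nothing
        *p = fill;       // mov byte ptr [eax], 048H
        ++p;             // inc eax
        --len;           // dec ecx / jnz loop_start
    }
}

// Replace the visible characters of a NUL-terminated string with 'H'
// (ASCII 0x48) while leaving the terminator in place.
void fill_string_with_h(char* s)
{
    fill_bytes(s, std::strlen(s), 'H');
}
```

    Called on a writable buffer such as `char word[] = "Hello";`, `fill_string_with_h(word)` leaves "HHHHH" in place. The length check comes before the first store, so a zero length writes nothing, whereas the `dec`/`jnz` form would wrap `ecx` around and keep going.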

  2. #2
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    See C++ forum.
    All the buzzt
    CornedBee

    "Writing specifications is like writing a novel. Writing code is like writing poetry."
    - Anonymous, published by Raymond Chen

    Don't PM me with your problems, I scan most of the forums daily. If you do PM me, I will not answer your question.

  3. #3
    Fanatic Member
    Join Date
    Jan 2003
    Posts
    1,004
    Could you use a LOOPNZ instruction to eliminate the DEC ECX and the JNZ loop? (Just for a little more efficiency... )

  4. #4
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    I think so, but isn't the instruction simply called LOOP?

  5. #5

    Thread Starter
    Lively Member
    Join Date
    May 2002
    Location
    London, England
    Posts
    88
    I had been led to believe that using LOOP takes more machine instructions than using DEC and JNZ together, and so is more time-consuming.

  6. #6
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    I can hardly believe that. If it were so then LOOP would have been implemented by combining DEC and JNZ

  7. #7

    Thread Starter
    Lively Member
    Join Date
    May 2002
    Location
    London, England
    Posts
    88
    Maybe, but can you know for sure that there are not some other instructions called for additional error control? A single line of machine code usually means several machine instructions, some more than others.

    Actually, is there any source which tells you exactly what each machine code instruction does inside the processor and how many machine instructions each one calls?

  8. #8
    Fanatic Member
    Join Date
    Jan 2003
    Posts
    1,004
    Oops. My bad. LOOP.

    According to the NASM readme, on a Pentium:
    LOOP: 5 to 6 clocks
    DEC: 1 clock
    JNZ: 1 clock


    This is odd. Why would an instruction that does the same thing as two equivalent instructions take more cycles?

    If anyone could check the Intel Architecture Software Developer's Manual Volume 2 and get us the numbers from that, we would be most appreciative.

  9. #9
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    Must be an error in the readme...

    Anyway, here's what my reference page says for the 386:
    LOOP: 11 + pipe flush
    DEC: 2 on (e)ax, 6 on others
    JNZ: 7 + pipe flush

    So LOOP is 2 clocks faster (since you'll probably use ecx for the counter anyway).

    But maybe they optimized DEC and JNZ on later CPUs, but not LOOP (still sounds strange to me).

    http://webster.cs.ucr.edu/Page_TechD...386/0_toc.html

  10. #10
    Fanatic Member
    Join Date
    Jan 2003
    Posts
    1,004
    Maybe I am reading this wrong, but it looks like the DEC instruction takes 2 clock cycles on registers and 6 on memory addresses.

  11. #11
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    Do you think?
    Still sounds very improbable to me.


    Why not do this:
    Code:
    MOV ecx, 0ffffffffh
    mark:
    LOOP mark
    Code:
    MOV ecx, 0ffffffffh
    mark:
    DEC ecx
    JNZ mark
    Then benchmark them.
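
    A rough sketch of a timing harness for such a benchmark, assuming a C++11 compiler; `time_ms` and `countdown` are hypothetical names. It uses `std::chrono::steady_clock`, which is monotonic. Note that the compiler, not you, picks the instructions for `countdown`, so this shows only how to time a loop; to compare LOOP with DEC/JNZ the two asm loops would have to go inside the timed callable.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: run `body` once and return the elapsed time in
// milliseconds, measured with a monotonic clock.
template <typename F>
std::int64_t time_ms(F&& body)
{
    const auto start = std::chrono::steady_clock::now();
    body();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Count down from `n` to zero and report how many steps were taken.
// `volatile` stops the compiler from removing the otherwise empty loop;
// it also forces memory accesses, so this is NOT the register-only
// dec/jnz loop from the thread.
std::uint32_t countdown(std::uint32_t n)
{
    volatile std::uint32_t counter = n;
    std::uint32_t iterations = 0;
    while (counter != 0) {
        counter = counter - 1;
        ++iterations;
    }
    return iterations;
}
```

    For example, `time_ms([] { countdown(100000000u); })` returns the elapsed milliseconds for one run; repeating the measurement a few times helps to spot outliers.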

  12. #12

    Thread Starter
    Lively Member
    Join Date
    May 2002
    Location
    London, England
    Posts
    88
    A-ha, definitive results:

    I tested the following benchmark 3 times each to ensure consistency:

    Code:
    // benchtest.cpp : Defines the entry point for the application.
    // S. Caulfield 20/6/03

    #include "stdafx.h"
    #include <stdlib.h>

    int APIENTRY WinMain(HINSTANCE hInstance,
                         HINSTANCE hPrevInstance,
                         LPSTR     lpCmdLine,
                         int       nCmdShow)
    {
        // TODO: Place code here.

        _SYSTEMTIME st; // Systemtime structure
        int s1;         // Milliseconds before
        int s2;         // Milliseconds after
        int dif;        // Difference in milliseconds
        CHAR op[11];    // CHAR array for results in MessageBox

        GetSystemTime(&st);

        s1 = ((int)st.wMilliseconds) + ((int)st.wSecond * 1000) + ((int)st.wMinute * 60000);

        // Comment out loop which is not being used
        // looped 5 times to give best results

        _asm
        {
            pusha
            mov eax, 05h          // x loop
    jump_x:
            mov ecx, 0ffffffffh   // y loop
    jump_y:

            dec ecx
            jnz jump_y

            //loop jump_y

            dec eax
            jnz jump_x
            popa
        }

        GetSystemTime(&st);

        s2 = ((int)st.wMilliseconds) + ((int)st.wSecond * 1000) + ((int)st.wMinute * 60000);

        dif = s2 - s1;

        _itoa(dif, op, 10);

        MessageBox(NULL, op, "Benchmark", MB_OK);

        return 0;
    }

    The results I got are:

    LOOP = 32.375 secs, 32.375 secs and 32.390 secs

    DEC & JNZ = 21.593 secs, 21.594 secs and 21.594 secs

    So it looks like DEC & JNZ takes about one third less time than LOOP. I rest my case. Would the defence like to cross-examine?
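
    The comparison can be checked with a few lines of arithmetic. This sketch, with hypothetical helper names `mean`, `time_saved` and `speedup`, averages the three runs of each variant and expresses the gap both ways.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical helper: arithmetic mean of a non-empty list of timings.
double mean(std::initializer_list<double> xs)
{
    assert(xs.size() > 0);
    double sum = 0.0;
    for (double x : xs) {
        sum += x;
    }
    return sum / static_cast<double>(xs.size());
}

// Fraction of time saved by the faster variant; 0.25 would mean it
// needs 25% less time than the slower one.
double time_saved(double slow_secs, double fast_secs)
{
    assert(slow_secs > 0.0 && fast_secs >= 0.0);
    return (slow_secs - fast_secs) / slow_secs;
}

// How many times more work the faster variant does per unit of time.
double speedup(double slow_secs, double fast_secs)
{
    assert(fast_secs > 0.0);
    return slow_secs / fast_secs;
}
```

    With the timings above, `time_saved(mean({32.375, 32.375, 32.390}), mean({21.593, 21.594, 21.594}))` comes out at roughly 0.33 and `speedup` at roughly 1.5, so DEC & JNZ needs about a third less time, which is the same as LOOP needing about half as much time again.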

  13. #13
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    The defence admits defeat and files a case against Intel for idiocy.


    BTW, what CPU do you have?

  14. #14

    Thread Starter
    Lively Member
    Join Date
    May 2002
    Location
    London, England
    Posts
    88
    I have to say that I am surprised by these results.

    The dec & jnz pair is executed about 21.5 billion times (5 x 0FFFFFFFFh), which takes just over 21.5 seconds, i.e. just under 1 billion iterations per second. I have a 2 GHz processor, which must mean that the dec and the jnz each take only one clock cycle.
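
    The back-of-the-envelope estimate can be written out as a small sketch. It assumes a clock of exactly 2.0 GHz and ignores the outer loop and the timing code; `total_iterations` and `cycles_per_iteration` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Total inner-loop iterations: the outer loop runs `outer` times and the
// inner counter starts at `inner_start` and counts down to zero.
constexpr std::uint64_t total_iterations(std::uint64_t outer, std::uint64_t inner_start)
{
    return outer * inner_start;
}

// Average clock cycles spent per iteration, given the clock rate in Hz
// (assumed constant) and the measured elapsed time in seconds.
double cycles_per_iteration(double clock_hz, double elapsed_secs, std::uint64_t iterations)
{
    assert(iterations > 0);
    return clock_hz * elapsed_secs / static_cast<double>(iterations);
}
```

    `total_iterations(5, 0xFFFFFFFFu)` is 21,474,836,475, and `cycles_per_iteration(2.0e9, 21.594, ...)` with that count gives about 2.0 cycles per dec/jnz pair, which matches the one-cycle-per-instruction reading.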

    What surprises me is that Windows management and multitasking do not seem to eat up that many clock cycles; this process seemed to be using nearly all of the processor time. I thought that the constant switching between tasks and all the other stuff going on in the background would use a lot of the processor time and make apps run quite slowly in relation to the clock speed.

    I am now convinced of the benefit of being able to have intimate control of program execution using asm.

    Where did you get your source on how many clock cycles each instruction takes? If I could get an up-to-date, accurate list of how processor-intensive each instruction is, I could make lightning-fast routines by being selective about which instructions I use.

  15. #15

    Thread Starter
    Lively Member
    Join Date
    May 2002
    Location
    London, England
    Posts
    88
    I have an Athlon XP 2400+ which runs at about 2GHz

  16. #16
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    Given your calculations (I didn't think about it much at first) it seems to me that the result is simply impossible.

    a) While windowing probably takes no time at all if you don't do other things, thread management should take a little. Very little indeed, as multitasking is very efficient, but a little.
    b) No jump takes 1 clock cycle. The jump instruction takes more than one cycle even if it doesn't jump.
    c) Jumping flushes the queue! An Athlon has a ~8 stage pipeline, this means that every jump instruction makes the next instruction take AT LEAST 8 cycles.
    d) You have code before and after the test itself. The time calculations are little, yet they do take a little time.

    The problem is that we don't know what really happens inside modern chips. They have jump prediction mechanisms and similar things and we'll never know what they do exactly.

  17. #17
    Fanatic Member
    Join Date
    Jan 2003
    Posts
    1,004
    My source was the NASM opcode help. (It's worth it for the manual itself. Heck, it's free.)

    It makes sense that it's so fast, because you are using registers and the instructions are in different pipelines (at least according to the help file).

    Now, I believe that the AMD processors can process simple instructions very quickly so that could be a factor.

  18. #18

    Thread Starter
    Lively Member
    Join Date
    May 2002
    Location
    London, England
    Posts
    88
    If you are using an Intel chip, I would suggest trying the above code to see if you get a similar outcome.

  19. #19
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    Mine is AMD too.

  20. #20
    Fanatic Member
    Join Date
    Jan 2003
    Posts
    1,004
    I have access to a Pentium. I'll try it.

    Sorry about being so late with this.
    "Can't" and "shouldn't" are two totally separate things.

    All questions should be answered. All answers should be true. That is why I post.

  21. #21
    Fanatic Member
    Join Date
    Jan 2003
    Posts
    1,004
    Here are the numbers (in milliseconds) that were in the message box using a Pentium:

    60928 for DEC / JNZ
    246645 for LOOP

    DEC / JNZ beats LOOP!

  22. #22
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    Must be the RISC architecture of the newer CPUs.
