-
Jul 1st, 2022, 03:39 PM
#1
Thunk for a CopyMemory replacement?
Calling a function, any function, means some overhead, some performance penalty.
I am wondering whether a CopyMemory replacement based on an ASM thunk could avoid that.
What do you think?
-
Jul 1st, 2022, 03:43 PM
#2
Re: Thunk for a CopyMemory replacement?
Originally Posted by Eduardo-
Calling a function, any function, means some overhead, some performance penalty.
I am wondering whether a CopyMemory replacement based on an ASM thunk could avoid that.
What do you think?
If you do, just please document the machine code with ASM op-codes as well as descriptions of what you're doing.
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
-
Jul 1st, 2022, 03:45 PM
#3
Re: Thunk for a CopyMemory replacement?
Also, don't forget that we can "bend" an LSET to do memory copies. I'm not sure if that's faster than RtlMoveMemory or not though. It wouldn't have to load any libraries though.
-
Jul 1st, 2022, 04:15 PM
#4
Re: Thunk for a CopyMemory replacement?
CopyMemory (RtlMoveMemory) is quite a fast function. __vbaCopyBytes is a little bit faster. Maybe you don't need to copy the data at all?
-
Jul 1st, 2022, 04:18 PM
#5
Re: Thunk for a CopyMemory replacement?
Code:
Declare Function vbaCopyBytes Lib "msvbvm60.dll" Alias "__vbaCopyBytes" (ByVal length As Long, dst As Any, src As Any) As Long
Declare Function vbaCopyBytesZero Lib "msvbvm60.dll" Alias "__vbaCopyBytesZero" (ByVal length As Long, dst As Any, src As Any) As Long
as seen here.
Not sure why we haven't been using that all along.
-
Jul 1st, 2022, 04:27 PM
#6
Re: Thunk for a CopyMemory replacement?
The stub VB uses to transfer you to CopyMemory is already a small chunk of asm; you can see it in the second half of this blog post:
http://sandsprite.com/blogs/index.ph...=471&year=2019
It won't get smaller than that, which leaves you competing against the API version by writing CopyMemory in asm. I haven't looked at its disassembly, but it's probably small. Then there is the question of how to get execution to it: the CallWindowProc trick is out (too much overhead), so you'd have to replace a class function pointer or overwrite a module function in memory. I don't imagine the gains will be great, and you could trigger more AV detections.
-
Jul 1st, 2022, 04:44 PM
#7
Re: Thunk for a CopyMemory replacement?
Originally Posted by The trick
CopyMemory (RtlMoveMemory) is quite a fast function. __vbaCopyBytes is a little bit faster. Maybe you don't need to copy the data at all?
It is for the memory mapping, and I believe there is no way without copying the data from the map to a local variable.
If we could change the variable pointers, maybe it could be possible. Something that I proposed here (for tB).
Even though VB6 is quite fast, sometimes we want all the speed possible. And right now we are a bit limited.
BTW: thanks to all that are participating, I've read all the messages.
-
Jul 1st, 2022, 04:51 PM
#8
Re: Thunk for a CopyMemory replacement?
Originally Posted by Eduardo-
It is for the memory mapping, and I believe there is no way without copying the data from the map to a local variable
You could map an array onto arbitrary memory, which avoids copying.
https://github.com/thetrik/VbVst/blo...es/modMain.bas
This module uses the g_tSharedData array (of a UDT), which is mapped onto a file mapping and shared between processes.
-
Jul 1st, 2022, 05:18 PM
#9
Re: Thunk for a CopyMemory replacement?
I still don't understand why people find it whizzy to call code injections thunks. Thunks are an entirely different thing.
-
Jul 1st, 2022, 05:28 PM
#10
Re: Thunk for a CopyMemory replacement?
Originally Posted by The trick
Do you think I could set up, let's say, a 1 GB array in this way, and VB would still be able to work with it?
Or maybe even one larger than 4 GB?
-
Jul 1st, 2022, 05:30 PM
#11
Re: Thunk for a CopyMemory replacement?
Originally Posted by dilettante
I still don't understand why people find it whizzy to call code injections thunks. Thunks are an entirely different thing.
I have no idea what the academic term is; I'm just calling them the same way the people who work with the technique call them (so they know what I'm talking about).
-
Jul 1st, 2022, 09:04 PM
#12
Re: Thunk for a CopyMemory replacement?
I actually wrote my own version of CopyMemory when I was first introduced to Trick's VB6 assembler add-in:-
Code:
use32
push ebp
mov ebp, esp
push edi ; Destination address goes in EDI
push esi ; Source address goes in ESI
;-----------------------------
mov edi, [ebp + 8] ;Copy pointer from 1st argument to EDI.
;EDI is the destination
mov esi, [ebp + 12] ;Copy pointer from 2nd argument to ESI.
;ESI is the source
mov ecx, [ebp + 16] ;Get the number of bytes being copied from the
;3rd argument and put it into ECX
rep movsb ;Copy the byte at address ESI to the address in EDI
;and increment both pointers in ESI and EDI.
;The REP prefix would repeatedly
;execute the MOVSB instruction.
;The value of ECX tells REP how many times to execute
;the MOVSB instruction.
;-----------------------------
pop esi
pop edi
mov esp, ebp
pop ebp
ret 12
Nothing exciting here. It's just as fast as RtlMoveMemory. This one uses the x86 string instruction movsb and as far as I know, this is the fastest way possible to copy a block of memory from one address to another. I suspect RtlMoveMemory also uses movsb since my version is just as fast.
You could probably make this assembly version slightly faster by trimming down the stack frame epilogue/prologue. You could avoid using the base pointer and just use the stack pointer directly. However I believe the performance gains would be marginal, not enough to be worth the hassle of not having a base pointer to do offsetting within the stack frame.
Last edited by Niya; Jul 1st, 2022 at 09:07 PM.
-
Jul 1st, 2022, 09:17 PM
#13
Re: Thunk for a CopyMemory replacement?
OK. Since we are already on a closely related subject, I'll take the opportunity to ask another question: is it possible to use 64-bit processor instructions from a Win32 process?
I'm asking because I think there may be newer instructions that can copy data faster on an x64 machine, even from an x86 process.
(Or maybe, quite possibly, this question makes no sense, since I don't know much about processor architecture.)
-
Jul 1st, 2022, 09:42 PM
#14
Re: Thunk for a CopyMemory replacement?
Originally Posted by Eduardo-
is it possible to use 64-bit processor instructions from a Win32 process?
Yes. The processor has to be switched to 64 bit mode. Coincidentally the trick showed us how to do just that in your population count thread months ago. However, when I tested it, I discovered that if you are constantly switching between 64 bit and 32 bit mode it saps performance and I mean it really saps it. I can't think of any situation where you could gain anything by switching the CPU between 32 and 64 bits within a process. I'm sure there are some edge cases but in general I'd expect it's better for the process to just stay in one mode.
Also, I do not believe there is anything in the x86 64 bit instruction set that allows you to copy memory faster. If I'm not mistaken, MOVSB would also be the fastest way to copy memory in a 64 bit program just as it is in a 32 bit program.
Last edited by Niya; Jul 1st, 2022 at 09:46 PM.
-
Jul 1st, 2022, 10:19 PM
#15
Re: Thunk for a CopyMemory replacement?
-
Jul 2nd, 2022, 03:18 AM
#16
Re: Thunk for a CopyMemory replacement?
Originally Posted by Niya
. . . MOVSB would also be the fastest way to copy memory in a 64 bit program just as it is in a 32 bit program.
Besides MOVSB there are MOSVW (16-bit word transfers) and MOSVD (32-bit dword transfers) but yet faster would be to use MOVDQA i.e. SSE instruction to load 128-bit registers which can be done in parallel as modern processors pipeline execution etc.
If you disassemble RtlMoveMemory (or memcpy) I would bet it uses *aligned* MOSVD and resorts to MOVSB only for head/tail of the buffer to fill up to multiple of 4. At least that's what I found in Turbo C 2.0 libc when I disassembled memcpy 30+ years ago and thought it was a clever optimization. . .
cheers,
</wqw>
-
Jul 2nd, 2022, 01:00 PM
#17
Re: Thunk for a CopyMemory replacement?
Originally Posted by Eduardo-
Do you think I could set up, let's say, a 1 GB array in this way, and VB would still be able to work with it?
Or maybe even one larger than 4 GB?
You can map your big dataset to a "window" without copying.
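As a rough illustration of the arithmetic behind such a window, here is a minimal C sketch. It relies on one documented constraint: the file offset passed to MapViewOfFile must be a multiple of the system allocation granularity (reported by GetSystemInfo, commonly 64 KB), and that offset is 64 bits wide, which is what lets a 32-bit process walk a file larger than 4 GB one window at a time. The names view_window and compute_window are made up for this example, and the 64 KB value used by callers is an assumption that real code should query at run time.

```c
#include <stdint.h>

/* Hypothetical result type: not part of any Windows API. */
typedef struct {
    uint64_t view_offset; /* aligned file offset to hand to the mapping call */
    uint32_t delta;       /* position of the wanted byte inside the view */
    uint32_t view_size;   /* bytes to map: delta + requested length */
} view_window;

/* Round want_offset down to a multiple of granularity.
   granularity must be a power of two (for example 65536). */
view_window compute_window(uint64_t want_offset, uint32_t want_len,
                           uint32_t granularity)
{
    view_window w;
    w.view_offset = want_offset & ~((uint64_t)granularity - 1);
    w.delta = (uint32_t)(want_offset - w.view_offset);
    w.view_size = w.delta + want_len;
    return w;
}
```

The data then lives at the mapped base address plus delta, so the caller never needs the whole file in its address space at once.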
-
Jul 2nd, 2022, 02:34 PM
#18
Re: Thunk for a CopyMemory replacement?
Originally Posted by The trick
You can map your big dataset to a "window" without copying.
But can you access it as an ordinary variable? (or like an element in an array)
-
Jul 2nd, 2022, 03:31 PM
#19
Re: Thunk for a CopyMemory replacement?
Originally Posted by Eduardo-
But can you access it as an ordinary variable? (or like an element in an array)
Yes, of course. Please show me your data and I'll show you how to achieve it.
-
Jul 2nd, 2022, 05:15 PM
#20
Re: Thunk for a CopyMemory replacement?
Originally Posted by The trick
Yes, of course. Please show me your data and I'll show you how to achieve it.
I haven't finished the code yet, and I'll have to pause it for several days now, but it will be something like this:
Code:
Public Property Get Item(ByVal Index As Long) As String
    ' code...
    ' code...
    ' in the AddressWhereDataIsMapped variable there will be the address returned by MapViewOfFile API + any necessary offset
    ' in LenOfString is the length of the string that is stored
    Item = Space$(LenOfString)
    CopyMemory ByVal StrPtr(Item), ByVal AddressWhereDataIsMapped, LenOfString * 2
End Property
It would be in a class module. The idea is to make this Property 'Item' the default, so the values can be accessed with just the object variable name and the Index, much as with an array.
The things I have in mind are (some are already coded):
To start with a relatively small FileMap of maybe 10 MB and make a new FileMap every time the storage capacity of the current one is reached (2x the current size at first, and 1.1x when 2x is no longer possible), copying all the data to the new, bigger file. That would happen automatically every time it runs out of space. The slots of items whose values changed are not reused; new values are always stored at the end.
It will have one MapView for writing, a window of maybe 2 MB pointing to the end of the map and moving automatically every time it needs to write outside the current window.
Likewise, there will be 10 MapViews for reading, with smaller windows, maybe just 1 MB each, kept for the 10 most recently used 1 MB windows.
It will speed things up when values are retrieved in sequence (or more or less in sequence). For entirely random access it won't gain speed, since most of the time the items will be outside the current 10 windows.
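The bookkeeping for the 10 read windows could look something like the following C sketch of a least-recently-used cache. It is only an illustration under stated assumptions: window_cache, slot and cache_lookup are hypothetical names, the 1 MB window size is taken from the description above, and the actual map/unmap calls are left out (a comment marks where they would go).

```c
#include <stdint.h>

#define SLOT_COUNT  10
#define WINDOW_SIZE (1u << 20)   /* 1 MB read windows, as described above */

typedef struct {
    int      used;       /* 0 until the slot holds a window */
    uint64_t window;     /* window number = file offset / WINDOW_SIZE */
    uint64_t last_used;  /* tick of the most recent access, 0 if never used */
} slot;

typedef struct {
    slot     slots[SLOT_COUNT];
    uint64_t tick;
    unsigned hits, misses;
} window_cache;          /* start from an all-zero struct */

/* Returns the index of the slot covering file_offset.
   On a miss the least recently used slot is recycled; real code
   would unmap the old view and map the new one at that point. */
int cache_lookup(window_cache *c, uint64_t file_offset)
{
    uint64_t window = file_offset / WINDOW_SIZE;
    int victim = 0;

    c->tick++;
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (c->slots[i].used && c->slots[i].window == window) {
            c->slots[i].last_used = c->tick;
            c->hits++;
            return i;
        }
        /* Empty slots have last_used == 0, so they are picked first. */
        if (c->slots[i].last_used < c->slots[victim].last_used)
            victim = i;
    }
    c->slots[victim].used = 1;
    c->slots[victim].window = window;
    c->slots[victim].last_used = c->tick;
    c->misses++;
    return victim;
}
```

With sequential reads almost every lookup is a hit, because consecutive items fall in the same window; with random access over a file much larger than 10 MB the miss counter grows on nearly every call.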
All that must work quite fast for my needs.
And if I can get rid of the overhead of CopyMemory, it would be faster.
That's the idea.
Last edited by Eduardo-; Jul 2nd, 2022 at 05:29 PM.
-
Jul 2nd, 2022, 08:16 PM
#21
Re: Thunk for a CopyMemory replacement?
Originally Posted by wqweto
If you disassemble RtlMoveMemory (or memcpy) I would bet it uses *aligned* MOSVD and resorts to MOVSB only for head/tail of the buffer to fill up to multiple of 4. At least that's what I found in Turbo C 2.0 libc when I disassembled memcpy 30+ years ago and thought it was a clever optimization. . .
Hmmm...I find this surprising. MOSVD has no variant that could copy from memory to memory. One of the operands has to be a register of some kind which means to copy memory using this instruction, you'd have to explicitly copy from memory to a register and then copy the register to the new memory location. I have doubts as to whether this would be faster than REP MOVSB.
-
Jul 3rd, 2022, 01:45 AM
#22
Re: Thunk for a CopyMemory replacement?
REP MOVSB/W/D/Q has been avoided in many system libraries since the Pentium 1 or 2. A few of the reasons are the prefetch queue in CPUs, the memory cache (at its different levels), and multi-CPU/core/thread systems.
Simple instructions (mov eax, [esi] - mov [edi], eax - sub ecx, 4) are more optimized and faster than movsd.
Edit: It seems our beloved Intel tried to fix that performance problem 10 years ago but still some REP MOVSx are slow.
Last edited by peterst; Jul 3rd, 2022 at 01:54 AM.
-
Jul 3rd, 2022, 02:24 AM
#23
Re: Thunk for a CopyMemory replacement?
Originally Posted by peterst
REP MOVSB/W/D/Q has been avoided in many system libraries since the Pentium 1 or 2. A few of the reasons are the prefetch queue in CPUs, the memory cache (at its different levels), and multi-CPU/core/thread systems.
Simple instructions (mov eax, [esi] - mov [edi], eax - sub ecx, 4) are more optimized and faster than movsd.
Edit: It seems our beloved Intel tried to fix that performance problem 10 years ago but still some REP MOVSx are slow.
Interesting. I think at some point I will test all of this to see what's really up.
-
Jul 3rd, 2022, 08:23 AM
#24
Re: Thunk for a CopyMemory replacement?
Originally Posted by Niya
Hmmm...I find this surprising. MOSVD has no variant that could copy from memory to memory. One of the operands has to be a register of some kind which means to copy memory using this instruction, you'd have to explicitly copy from memory to a register and then copy the register to the new memory location. I have doubts as to whether this would be faster than REP MOVSB.
MOVSD is like MOVSB but works on dwords. Both "move" data from the address in ESI to the address in EDI; no other registers are involved except ECX, which is decremented by one on each step, i.e. for MOVSD the initial ECX is the number of *dwords* to copy, while for MOVSB it's in bytes.
Here is the RtlMoveMemory from Win11
Code:
_RtlMoveMemory@12:
77B88870 56 push esi
77B88871 57 push edi
77B88872 8B 74 24 10 mov esi,dword ptr [esp+10h]
77B88876 8B 7C 24 0C mov edi,dword ptr [esp+0Ch]
77B8887A 8B 4C 24 14 mov ecx,dword ptr [esp+14h]
77B8887E FC cld
77B8887F 3B F7 cmp esi,edi
77B88881 76 1A jbe _RtlMoveMemory@12+2Dh (77B8889Dh)
77B88883 8B D1 mov edx,ecx
77B88885 83 E2 03 and edx,3
77B88888 C1 E9 02 shr ecx,2
77B8888B F3 A5 rep movs dword ptr es:[edi],dword ptr [esi]
77B8888D 0B CA or ecx,edx
77B8888F 75 05 jne _RtlMoveMemory@12+26h (77B88896h)
77B88891 5F pop edi
77B88892 5E pop esi
77B88893 C2 0C 00 ret 0Ch
77B88896 F3 A4 rep movs byte ptr es:[edi],byte ptr [esi]
77B88898 5F pop edi
77B88899 5E pop esi
77B8889A C2 0C 00 ret 0Ch
77B8889D 74 F9 je _RtlMoveMemory@12+28h (77B88898h)
77B8889F 8B C7 mov eax,edi
77B888A1 2B C6 sub eax,esi
77B888A3 3B C8 cmp ecx,eax
77B888A5 76 DC jbe _RtlMoveMemory@12+13h (77B88883h)
77B888A7 FD std
77B888A8 03 F1 add esi,ecx
77B888AA 03 F9 add edi,ecx
77B888AC 4E dec esi
77B888AD 4F dec edi
77B888AE F3 A4 rep movs byte ptr es:[edi],byte ptr [esi]
77B888B0 FC cld
77B888B1 EB E5 jmp _RtlMoveMemory@12+28h (77B88898h)
77B888B3 90 nop
Notice the thematic double MOVSx
Code:
mov edx,ecx
and edx,3
shr ecx,2
rep movs dword ptr es:[edi],dword ptr [esi]
or ecx,edx
jne _RtlMoveMemory@12+26h (77B88896h)
pop edi
pop esi
ret 0Ch
rep movs byte ptr es:[edi],byte ptr [esi]
First EDX = ECX Mod 4, ECX = ECX \ 4, then MOVSD for Size \ 4 number of dwords and finally for the remaining 0-3 bytes just use MOVSB.
No alignment check; the only check is whether it needs to copy backwards on overlapping buffers, which is exactly the difference between memcpy and memmove in libc: memcpy does no check and has side effects for overlapping buffers, but is slightly faster.
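For readers who don't follow x86 disassembly, the same control flow can be written out as a C sketch. This is a reading of the listing above, not Microsoft's source; move_memory_sketch is a made-up name, and the 4-byte steps go through a temporary so that each dword is fully loaded before it is stored, as MOVSD does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the logic in the listing: forward dword copy plus a byte
   tail, or a backward byte copy when dst overlaps the end of src. */
void *move_memory_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)dst, sa = (uintptr_t)src;

    if (da == sa || n == 0)
        return dst;                    /* nothing to move */

    if (sa < da && da - sa < n) {      /* dst starts inside the source */
        while (n > 0) {                /* STD + REP MOVSB: last byte first */
            n--;
            d[n] = s[n];
        }
        return dst;
    }

    size_t dwords = n / 4;             /* SHR ECX,2 */
    size_t tail = n % 4;               /* AND EDX,3 */
    while (dwords--) {                 /* REP MOVSD */
        uint32_t tmp;
        memcpy(&tmp, s, 4);            /* alias-safe 4-byte load */
        memcpy(d, &tmp, 4);            /* 4-byte store */
        s += 4;
        d += 4;
    }
    while (tail--)                     /* REP MOVSB for the last 0-3 bytes */
        *d++ = *s++;
    return dst;
}
```

Calling move_memory_sketch(buf + 2, buf, 5) on a buffer holding "abcdefghij" takes the backward path and leaves "ababcdehij", which is the result a naive forward byte copy would get wrong.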
cheers,
</wqw>
-
Jul 3rd, 2022, 11:21 AM
#25
Re: Thunk for a CopyMemory replacement?
Oh you made a typo....you meant MOVSD but you typed MOSVD which I typed into Google and got this which is actually MOVD. So I thought you meant that
Last edited by Niya; Jul 3rd, 2022 at 11:26 AM.
-
Jul 3rd, 2022, 12:05 PM
#26
Re: Thunk for a CopyMemory replacement?
Originally Posted by wqweto
First EDX = ECX Mod 4, ECX = ECX \ 4, then MOVSD for Size \ 4 number of dwords and finally for the remaining 0-3 bytes just use MOVSB.
No alignment check; the only check is whether it needs to copy backwards on overlapping buffers, which is exactly the difference between memcpy and memmove in libc: memcpy does no check and has side effects for overlapping buffers, but is slightly faster.
cheers,
</wqw>
When I compared my straight MOVSB version to this one, the performance was about even, if I remember right. Using MOVSD here might be an example of complicating something simple for marginal gains. Then again, the person who wrote this (assuming it was handwritten) is most likely a way better low-level programmer than I will ever be, so they probably know things I don't, which informed their decision to do it this way.
-
Jul 3rd, 2022, 12:26 PM
#27
Re: Thunk for a CopyMemory replacement?
There are numerous "optimized" memcpy versions floating around, but Linus deliberately left their memcpy based on MOVSx; rumor says it was to force Intel/AMD to make MOVSx the fastest option in their new CPUs and not neglect it as in the past.
cheers,
</wqw>
-
Jul 3rd, 2022, 01:19 PM
#28
Re: Thunk for a CopyMemory replacement?
Originally Posted by wqweto
There are numerous "optimized" memcpy versions floating around, but Linus deliberately left their memcpy based on MOVSx; rumor says it was to force Intel/AMD to make MOVSx the fastest option in their new CPUs and not neglect it as in the past.
cheers,
</wqw>
There is a meme here somewhere:-
Normal programmers: Optimizes code for the CPU
God tier programmers: CPU gets optimized for the code.