NOPs cost practically nothing to execute nowadays, but I use them here to align the following offsets in the thunk, which makes the "codegen" for the variable num_of_args easier, like this
Code:
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As Long)
Private Declare Function VirtualProtect Lib "kernel32" (ByVal lpAddress As Long, ByVal dwSize As Long, ByVal flNewProtect As Long, lpflOldProtect As Long) As Long
Public Function call_ultow(ByVal pfn As Long, ByVal value As Long, ByVal str As Long, ByVal radix As Long, Optional ByVal spacer As Long) As Long
pvPatchTrampoline AddressOf Module1.call_ultow, 3
call_ultow = call_ultow(pfn, value, str, radix)
End Function
Private Sub Form_Click()
Dim hLib As Long
Dim pfn As Long
Dim sBuffer As String
hLib = LoadLibrary("msvcrt")
pfn = GetProcAddress(hLib, "_ultow")
sBuffer = String(50, 0)
call_ultow pfn, &H80212123, StrPtr(sBuffer), 10
MsgBox "[" & Replace(sBuffer, vbNullChar, vbNullString) & "]"
End Sub
The extra spacer parameter was the key invention in this thread. Kudos!
ASM code? You can use WriteProcessMemory to put the ASM code at AddressOf CallCdecl2,
so there is no need for an add-in for ASM.
Code:
Public Function CallCdecl2( _
ByVal pfn As Long, _
ByVal lArg1 As Long, _
ByVal lArg2 As Long, _
Optional ByVal lRetSpace As Long) As Long
End Function
'=========== asm:
; CallCdecl2:
BITS 32
NUM_OF_ARGS equ 2
pop eax ; get retaddr
pop ecx ; get pfn
mov [esp + NUM_OF_ARGS * 4], eax
call ecx
add esp, NUM_OF_ARGS * 4
ret
I already put more effort in this thread than I intended to, so I'll leave stack shuffling implementation to you.
Now I'll go read some extra info about the great Visual Freebasic of China instead -- we already have enough links on topics about it here, don't we?
cheers,
</wqw>
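To see why the extra spacer slot makes this shuffle work, here is a toy model of the stack written in Python (not VB6, so it can be run anywhere). It is only a sketch: the addresses, `run_cdecl_thunk` and the sample callee are made up for illustration, and only the instruction sequence is taken from the listing above.

```python
def run_cdecl_thunk(args, callee, spacer=0):
    """Toy model of: pop eax / pop ecx / mov [esp+N*4],eax / call ecx / add esp,N*4 / ret."""
    RET_TO_VB = 0xCAFE0000     # hypothetical return address inside the VB caller
    RET_TO_THUNK = 0xBEEF0000  # hypothetical address of the instruction after CALL ECX
    PFN = 0x10001000           # hypothetical address of the cdecl function
    n = len(args)
    mem = {}
    esp = start_esp = 0x1000   # arbitrary ESP before the VB caller pushes anything

    def push(value):
        nonlocal esp
        esp -= 4
        mem[esp] = value

    def pop():
        nonlocal esp
        value = mem[esp]
        esp += 4
        return value

    # VB caller (stdcall): arguments go right to left, so the spacer is pushed first
    push(spacer)
    for arg in reversed(args):
        push(arg)
    push(PFN)
    push(RET_TO_VB)            # pushed by the CALL into the patched VB function

    eax = pop()                # pop eax -> return address
    ecx = pop()                # pop ecx -> pfn
    mem[esp + n * 4] = eax     # mov [esp + N*4], eax -> overwrites the spacer slot
    push(RET_TO_THUNK)         # call ecx
    assert ecx == PFN
    eax = callee(*(mem[esp + 4 + 4 * i] for i in range(n)))
    pop()                      # the callee's plain RET: cdecl leaves the args in place
    esp += n * 4               # add esp, N*4
    target = pop()             # ret -> takes the address stored in the spacer slot
    return eax, target == RET_TO_VB, esp == start_esp
```

The function returns the callee's result, whether RET went back to the VB caller, and whether ESP ended where a stdcall caller expects it.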
Unfortunately, I can't write assembly code. If you can bind the API function address of each cdecl DLL to the function address of VB module, it will be most convenient.
The second method maps a cdecl DLL function with four parameters to a VB function: the first parameter is the API address, followed by the four parameters.
VB is simple and easy to use, but its capabilities are sometimes limited. VC and Delphi can embed assembly code directly in the program; annoyingly, VB can't. I have seen methods for embedding assembly in VB on the Internet, but some are too complex and come with no explanation. I provide a method here; maybe it will be useful to you in the future!
Basic idea: the assembly (machine) code can be stored in a Byte array. Then, by some means, control is transferred to this code and our assembly code segment is executed.
But how does this assembly code get control? Check the Win32 API reference and you will find the CallWindowProc function. It is originally meant for calling user-defined window procedures. Its prototype is as follows:
Function CallWindowProc Lib "user32" Alias "CallWindowProcA" (ByVal lpPrevWndFunc As Long, ByVal hWnd As Long, ByVal Msg As Long, ByVal wParam As Long, ByVal lParam As Long) As Long
It has five parameters. lpPrevWndFunc is a Long that holds the address of the user's own window procedure. The other four parameters are the ones the window procedure requires; see MSDN for details. We only need lpPrevWndFunc. What if we pass in the address of our assembly code? CallWindowProc treats this address as the window procedure address and calls it, so our assembly code is executed.
Of course, we still have to supply the other four parameters, so we pass four zeros. We don't need them, but CallWindowProc requires them. Don't forget that the lpPrevWndFunc we pass in is not a real window procedure address but the address of our own assembly code.
Specifically, suppose we want to embed a piece of assembly code that does nothing. First:
Dim AsmCode() As Byte
ReDim AsmCode(8)
'Generate the machine code
AsmCode(0) = &H58 'POP EAX
AsmCode(1) = &H59 'POP ECX
AsmCode(2) = &H59 'POP ECX
AsmCode(3) = &H59 'POP ECX
AsmCode(4) = &H59 'POP ECX
AsmCode(5) = &H50 'PUSH EAX
'You can add the ASM code you want to execute here
'... if you do, the following array offsets need to be changed accordingly
'The code you add ends here
'Return control to the main program
AsmCode(6) = &HC3 'RET
'.....
Then:
CallDllFunction = CallWindowProc(VarPtr(AsmCode(0)), 0, 0, 0, 0)
VarPtr is a function that returns the address of a variable as a Long value.
Why do we need to execute several POPs and one PUSH? Because we disguise the start address of our assembly code as a window procedure. When we call CallWindowProc we actually pass in four more parameters besides lpPrevWndFunc, namely the four zeros above. When it calls lpPrevWndFunc, CallWindowProc pushes these four parameters onto the stack. This is equivalent to executing the following code:
xxxx00A4H: push 0
xxxx00A6H: push 0
xxxx00A8H: push 0
xxxx00AAH: push 0
xxxx00ACH: call VarPtr(AsmCode(0)) (we can't see this code, it is executed internally by CallWindowProc)
xxxx00AFH: ......
Because we don't use these four parameters at all, we just need to pop them. Therefore we execute four POP ECX, i.e. pop the four unused parameters to keep the stack pointer correct. But why do we need POP EAX? Because CallWindowProc treats lpPrevWndFunc as a normal window procedure, and a CALL instruction pushes the address of the instruction following the CALL onto the stack for the subroutine's RET. In the code above this means executing:
push xxxx00AFH
So, implicitly, the following code is executed inside CallWindowProc:
push 0 ; the parameters are put on the stack
push 0
push 0
push 0
push xxxx00AFH ; (done automatically when CALL is executed)
To keep the stack pointer balanced after the window procedure has run, the corresponding POP instructions must be executed. The first POP EAX temporarily saves the subroutine's return address in register EAX, then the four unused parameters are popped. After that, the return address saved in EAX is pushed back onto the stack. When RET is executed, it returns to CallWindowProc correctly.
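The pop/push bookkeeping described above can be checked with a tiny interpreter for just the four opcodes the stub uses. This is a sketch in Python rather than VB6; `run_stub` and the value standing in for xxxx00AFH are made up for illustration, and nothing here executes real machine code.

```python
STUB = bytes.fromhex("58 59 59 59 59 50 C3")  # pop eax, 4 x pop ecx, push eax, ret

def run_stub(stub=STUB):
    """Interpret the stub against a model stack; return (ret target, stack balanced)."""
    RETURN_ADDR = 0x000000AF             # stands in for xxxx00AFH
    stack = [0, 0, 0, 0, RETURN_ADDR]    # four pushed zeros, then CALL's return address on top
    depth_before_call = 0                # stack depth before CallWindowProc pushed anything
    regs = {"eax": None, "ecx": None}
    for opcode in stub:
        if opcode == 0x58:               # pop eax
            regs["eax"] = stack.pop()
        elif opcode == 0x59:             # pop ecx
            regs["ecx"] = stack.pop()
        elif opcode == 0x50:             # push eax
            stack.append(regs["eax"])
        elif opcode == 0xC3:             # ret
            return stack.pop(), len(stack) == depth_before_call
    raise ValueError("stub ran off the end without RET")
```

In this model the stub returns to the saved address with the four arguments removed, which is the same net effect as a single RET 16 (C2 10 00).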
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As Long)
Private Declare Function VirtualProtect Lib "kernel32" (ByVal lpAddress As Long, ByVal dwSize As Long, ByVal flNewProtect As Long, lpflOldProtect As Long) As Long
Public Function CallCdecl(ByVal Pfn As Long, Optional ByVal Spacer As Long) As Long
pvPatchTrampoline AddressOf Module1.CallCdecl, 0
CallCdecl = CallCdecl(Pfn)
End Function
Public Function CallCdecl1(ByVal Pfn As Long, ByVal Arg1 As Long, Optional ByVal Spacer As Long) As Long
pvPatchTrampoline AddressOf Module1.CallCdecl1, 1
CallCdecl1 = CallCdecl1(Pfn, Arg1)
End Function
Public Function CallCdecl2(ByVal Pfn As Long, ByVal Arg1 As Long, ByVal Arg2 As Long, Optional ByVal Spacer As Long) As Long
pvPatchTrampoline AddressOf Module1.CallCdecl2, 2
CallCdecl2 = CallCdecl2(Pfn, Arg1, Arg2)
End Function
Public Function CallCdecl3(ByVal Pfn As Long, ByVal Arg1 As Long, ByVal Arg2 As Long, ByVal Arg3 As Long, Optional ByVal Spacer As Long) As Long
pvPatchTrampoline AddressOf Module1.CallCdecl3, 3
CallCdecl3 = CallCdecl3(Pfn, Arg1, Arg2, Arg3)
End Function
Public Function CallCdecl4(ByVal Pfn As Long, ByVal Arg1 As Long, ByVal Arg2 As Long, ByVal Arg3 As Long, ByVal Arg4 As Long, Optional ByVal Spacer As Long) As Long
pvPatchTrampoline AddressOf Module1.CallCdecl4, 4
CallCdecl4 = CallCdecl4(Pfn, Arg1, Arg2, Arg3, Arg4)
End Function
Private Function pvPatchTrampoline(ByVal Pfn As Long, ByVal lNumParams As Long) As Boolean
Const PAGE_EXECUTE_READWRITE As Long = &H40
Const THUNK_SIZE As Long = 21
Dim bInIDE As Boolean
Dim aThunk(0 To 5) As Long
Debug.Assert pvSetTrue(bInIDE)
If bInIDE Then
Call CopyMemory(Pfn, ByVal Pfn + &H16, 4)
Else
Call VirtualProtect(Pfn, THUNK_SIZE, PAGE_EXECUTE_READWRITE, 0)
End If
' 0: 58 pop eax
' 1: 89 84 24 XX XX XX XX mov dword ptr [esp+Xh],eax
' 8: 58 pop eax
' 9: FF D0 call eax
' 11: 90 nop
' 12: 90 nop
' 13: 90 nop
' 14: 81 C4 XX XX XX XX add esp,Xh
' 20: C3 ret
aThunk(0) = &H24848958
aThunk(1) = lNumParams * 4 + 4
aThunk(2) = &H90D0FF58
aThunk(3) = &HC4819090
aThunk(4) = lNumParams * 4
aThunk(5) = &HC3
Call CopyMemory(ByVal Pfn, aThunk(0), THUNK_SIZE)
'--- success
pvPatchTrampoline = True
End Function
Private Function pvSetTrue(bValue As Boolean) As Boolean
bValue = True
pvSetTrue = True
End Function
. . . so you can call _ultow with something like this
CallCdecl3 Pfn, &H80212123, StrPtr(sBuffer), 10
This is the complete source code for Module1.bas above, nothing extra is needed.
The trampolines are implemented as self-modifying code and are self-patched with ASM code "in-place" (no separate memory for thunks allocated).
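As a cross-check of the aThunk values against the disassembly comments in pvPatchTrampoline, the six Longs can be packed little-endian and cut to THUNK_SIZE bytes. The sketch below does that in Python; `build_thunk` is a hypothetical helper name, and the constants are copied from the VB6 listing.

```python
import struct

THUNK_SIZE = 21

def build_thunk(num_params):
    """Pack the six aThunk Longs little-endian and keep the first THUNK_SIZE bytes."""
    longs = (
        0x24848958,            # 58 | 89 84 24   pop eax / mov [esp+disp32], eax
        num_params * 4 + 4,    # disp32: skips pfn and the args, lands on the spacer slot
        0x90D0FF58,            # 58 | FF D0 | 90 pop eax / call eax / nop
        0xC4819090,            # 90 90 | 81 C4   nop / nop / add esp, imm32
        num_params * 4,        # imm32: size of the cdecl arguments
        0xC3,                  # C3              ret (only the low byte is copied)
    )
    return struct.pack("<6I", *longs)[:THUNK_SIZE]
```

For three parameters this gives 58 89 84 24 10 00 00 00 58 FF D0 90 90 90 81 C4 0C 00 00 00 C3, which lines up with offsets 0, 1, 8, 9, 11 to 13, 14 and 20 in the comments. The three NOPs are what put the ADD ESP immediate on a Long boundary.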
In the IDE when you edit VB6 code while debugging, some modules might get recompiled, including the module with the trampolines. So the trampoline patches will be deallocated when the VB6 functions get recompiled at a new memory location by the IDE. Fortunately because these are implemented as self-modifying functions on first call the trampolines will self-patch with ASM code again and everything will work as expected.
Does it get called every time?
Does every call to CallCdecl2 run these 2 lines of code?
pvPatchTrampoline AddressOf Module1.CallCdecl2, 2
CallCdecl2 = CallCdecl2(Pfn, Arg1, Arg2)
I want to find a way to write the ASM code into the VB function with WriteProcessMemory only once.
This Declare gives an error because the function is cdecl:
Private Declare Function sqlite3_exec Lib "sqlite3" (ByVal sqlite3 As Long, ByVal zSql As Long) As Long
Function Mysqlite3_exec(ByVal sqlite3 As Long, ByVal zSql As Long)
' WriteProcessMemory ASMCODE here in Sub Main() once
' asm like:
' push arg2 from zSql
' push arg1 from sqlite3
' call sqlite3_exec
End Function
Here are the trampolines for up to 4 parameters
Calling the cdecl DLL API works in the VB6 IDE, but not in the compiled EXE.
Now it's OK, I fixed it. Maybe the CallCdecl function needs to contain more code so there is enough room for the patch.
Code:
Public Function CallCdecl2(ByVal Pfn As Long, ByVal Arg1 As Long, ByVal Arg2 As Long, Optional ByVal Spacer As Long) As Long
pvPatchTrampoline AddressOf Module3.CallCdecl2, 2
CallCdecl2 = CallCdecl2(Pfn, Arg1, Arg2)
Exit Function
MsgBox 0
MsgBox 1
MsgBox 2
End Function
---------------------------
Project1
---------------------------
CALL V8JS.DLL
First Run Used Time: 5.0935 MS
Now Used Time: 0.031 MS
Com vTable Used Time: 0.037 MS
DispCallFunc Time: 0.0125 MS
---------------------------
Project1
---------------------------
Call Cdecl dll api ,UsedTime:
VbAddRef--10.422 MS
VbAdd--9.8749 MS
CallCdecl--10.307 MS
Com vTable--317.5384 MS
DispCallFunc--424.8981 MS
---------------------------
Code:
Private Declare Function QueryPerformanceFrequency Lib "kernel32" (lpFrequency As Currency) As Long
Public Declare Function QueryPerformanceCounter Lib "kernel32" (lpPerformanceCount As Currency) As Long
Public MsCpu As Double, Counter_S As Currency
Public CPUv1 As Currency, CPUv2 As Currency
Public CPUv3 As Currency, CPUv4 As Currency
Public CusedTime As Currency
Sub IntCpuTimer()
If QueryPerformanceFrequency(Counter_S) Then
MsCpu = Counter_S / 1000
Else
MsgBox "Un SUPPORT!"
End If
End Sub
Function VbAdd(ByVal a As Long, ByVal b As Long) As Long
VbAdd = a + b
End Function
Function VbAddRef(a As Long, b As Long) As Long
VbAddRef = a + b
End Function
Function MyAdd(ByVal a As Long, ByVal b As Long) As Long
MsgBox "put asm"
MsgBox "put asm"
MsgBox "put asm"
MsgBox "put asm"
MyAdd= 0
End Function
TEST:
Code:
IntCpuTimer
Dim MethodName(4) As String
Dim TimeSz(4) As Currency
MethodName(0) = "VbAddRef"
MethodName(1) = "VbAdd"
MethodName(2) = "CallCdecl"
MethodName(3) = "Com vTable"
MethodName(4) = "DispCallFunc"
Dim Count As Long
Dim KMAX As Long
KMAX = 5
Count = 100000
Dim a As Long, b As Long, K As Long, i As Long
For K = 1 To KMAX
For i = 1 To Count
a = i
b = i * 2
QueryPerformanceCounter CPUv1
ret = VbAddRef(a, b)
QueryPerformanceCounter CPUv2
CusedTime = (CPUv2 - CPUv1) / MsCpu
TimeSz(0) = TimeSz(0) + CusedTime
Next
Next
For K = 1 To KMAX
For i = 1 To Count
a = i
b = i * 2
QueryPerformanceCounter CPUv1
ret = VbAdd(a, b)
QueryPerformanceCounter CPUv2
CusedTime = (CPUv2 - CPUv1) / MsCpu
TimeSz(1) = TimeSz(1) + CusedTime
Next
Next
For K = 1 To KMAX
For i = 1 To Count + 1
a = i
b = i * 2
QueryPerformanceCounter CPUv1
ret = CallCdecl2(ExcuteJavaScript_Ptr, a, b)
QueryPerformanceCounter CPUv2
CusedTime = (CPUv2 - CPUv1) / MsCpu
If 第一次用时 = 0 Then
第一次用时 = CPUv2 - CPUv1
Else
TimeSz(2) = TimeSz(2) + CusedTime
End If
Next
Next
For K = 1 To KMAX
For i = 1 To Count
a = i
b = i * 2
QueryPerformanceCounter CPUv1
ret = Com_ExcuteJavaScript.Invoke(a, b)
QueryPerformanceCounter CPUv2
CusedTime = (CPUv2 - CPUv1) / MsCpu
TimeSz(3) = TimeSz(3) + CusedTime
Next
Next
For K = 1 To KMAX
For i = 1 To Count
a = i
b = i * 2
QueryPerformanceCounter CPUv1
ret = DispCallFunc_CDECL(ExcuteJavaScript_Ptr, vbLong, a, b)
QueryPerformanceCounter CPUv2
CusedTime = (CPUv2 - CPUv1) / MsCpu
TimeSz(4) = TimeSz(4) + CusedTime
Next
Next
For i = 0 To UBound(MethodName)
MethodName(i) = MethodName(i) & "--" & TimeSz(i) & " MS"
Next
MsgBox "Call Cdecl dll api ,UsedTime:" & vbCrLf & Join(MethodName, vbCrLf)
Last edited by xiaoyao; Dec 2nd, 2020 at 07:01 AM.
I already warned you about IDE recompilations. These will "erase" the ASM code from the patched VB6 function so the trampolines will stop functioning (unless they are made self-modifying functions as the impl above).
If you still don't want to impl them as self-modifying code you'll have to re-patch the function (in the IDE only) before each call just to be sure these are functioning trampolines like this
This will allow using the trampolines in the IDE reliably, even if their code module gets recompiled. (The Debug.Assert line will not get compiled in the final executable.)
Originally Posted by xiaoyao
First Run Used Time: 5.6877 MS
Now Used Time: 0.0302 MS
It runs 【188.33】 times faster
You are the king of pointless micro-benchmarking :-))
Still good to know you had the cdecl trampolines going, hope you find them useful for your purposes.
seems like it would be way easier to download a copy of visual studio community for free
and learn to compile sqllite as a stdcall dll or write a stdcall shim to access it?
VbAddRef--10.4712 MS
VbAdd--10.2099 MS
CallCdecl--10.6393 MS
Com vTable--316.0363 MS
DispCallFunc--425.0076 MS
ClassAdd--23.5341 MS
ClassAddRef--24.1091 MS
ClassFriendAdd--10.5864 MS
class1.cls:
Code:
Public Function Add(ByVal a As Long, ByVal b As Long) As Long
Add = a + b
End Function
Public Function AddRef(a As Long, b As Long) As Long
AddRef = a + b
End Function
Friend Function FriendAdd(ByVal a As Long, ByVal b As Long) As Long
FriendAdd = a + b
End Function
Last edited by xiaoyao; Dec 2nd, 2020 at 07:31 AM.
seems like it would be way easier to download a copy of visual studio community for free
and learn to compile sqllite as a stdcall dll or write a stdcall shim to access it?
It is certainly most convenient to compile the sqlite3.dll source code directly with stdcall exports.
But we are studying the many cases where there is no source code, e.g. how to make the call the fastest and most convenient.
it's very good,thank you,
With your method, the running speed is almost the same as VB's function running speed. Thank you very much. You are awarded an Olympic cup
I don't understand what's the sense to use __stdcall if you anyway use the LoadLibrary/GetProcAddress pair to get a function pointer and use this pointer to call the function.
If you use a static dll you can use both __stdcall and __cdecl "out of box".
If you use a dynamic dll you could use any variant from this thread.
so ,no need add-in for asm
The Add-in does NO dynamic code at all. The code is placed into the compiled executable.
sqlite3_exec
The main problem with this function is still reading and writing a large number of cells.
It took me 5 seconds to read and write fifty thousand rows and fifteen columns of an Excel spreadsheet in one component when it's stdcall.
If it's cdecl, it could take 30 seconds.
He's obsessed with speed (on which I agree).
But if you do a single query to retrieve the data you need, how would this affect the speed of an actual call?
If a query returns 50K rows with 15 columns then main time is spent in the query, not in the call.
50,000*15 cells need to be read, so the cdecl API needs to be called 50,000*15 times.
The old sqlite3 has a get-table API method;
it would save the result in a VB array variable.
How can each API be changed to a one-to-one corresponding VB function that is called without Pfn?
function CallCdecl2(ByVal Pfn As Long, ByVal Arg1 As Long, ByVal Arg2 As Long, Optional ByVal Spacer As Long) As Long
pvPatchTrampoline AddressOf Module1.CallCdecl2, 2
CallCdecl2 = CallCdecl2(Pfn, Arg1, Arg2)
End Function
like:
function Calladd( ByVal Arg1 As Long, ByVal Arg2 As Long, Optional ByVal spacer As Long) As Long
Calladd = Calladd(Arg1, Arg2)
End Function
sub main
pfn = GetProcAddress(hmode, "add")
pvPatchTrampoline pfn,AddressOf Module1.Calladd,2
end sub
My test results above prove that the speed is indeed increased by 40 times. It's just that the way of calling is more troublesome, and two more parameters need to be added. A function address, and one last untaken argument. But this method is more general, in general, two to four parameters can call most of the DLL API.
seems like it would be way easier to download a copy of visual studio community for free
and learn to compile sqllite as a stdcall dll or write a stdcall shim to access it?
Why bother? winsqlite3.dll already ships as part of Windows 10:
cdecl declare sub api1 lib "abc.dll" (destination as any, source as any, byval length as long)
cdecl declare function api2 lib "abc.dll" (destination as any, source as any, byval length as long) as long
how to write asm to vb function for call cdecl
sub api1(a,b,c)
msgbox "put asm"
end sub
function api2(a,b,c) as long
'
end function
sqlite3.dll is only an example.
function Mysqlite3_exec(ByVal sqlite3 As Long, ByVal zSql As long)
'ASMCODE:
'CALL address1(sqlite3 ,zSql )
'clear something?
end function
PatchTrampoline AddressOf Module1.CallCdecl1, 1
PatchTrampoline AddressOf Module1.CallCdecl2, 2
=========
Can CallCdecl2 be changed to work without (ByVal Pfn As Long, Optional ByVal Spacer ...)?
With the general method for a few parameters, five functions are enough to call almost all API functions.
This approach is very fast.
A new IDE (Visual FreeBasic) supports cdecl, but the most intuitive way is to write one function per API.
How can this be realized?
The main principle: the first time the program starts, all these function addresses and parameter counts are written in as assembly; the actual call then only has to read a few parameters.
function Mysqlite3_exec(ByVal sqlite3 As Long, ByVal zSql As long)
'
end function
Some people may misunderstand and think I am advertising; the software itself is free. FreeBASIC is a language that was first created as a QBasic-compatible compiler. Technology has no borders. VB6 has been dead for 23 years, and this thread is mainly about enhancing some of its functions, but that is limited after all.
visual freebasic are easy to use, more than VB6
But working with COM objects in it is very troublesome; I hope capable people can help solve this together.
like sok=invoke(objptr,vblong,vblong)
Last edited by xiaoyao; Dec 4th, 2020 at 05:06 PM.
... visual freebasic are easy to use, more than VB6 ...
No, it isn't "easier to use" (for complex scenarios), as long as it does not:
- come with proper Class-definition support
- and proper Event-support on these Classes (WithEvents-support, RaiseEvent-support)
- and proper "generalized Error-Handling" when interacting with the Methods of these Classes
Originally Posted by xiaoyao
... the operation com object is very troublesome, ...
like sok=invoke(objptr,vblong,vblong)
The COM-signature you described above:
HResult hr = SomeVoidReturningFunc(ClassType * ThisPtr, int Arg1 , int Arg2)
or
HResult hr = SomeIntReturningFunc(ClassType * ThisPtr, int Arg1 , int Arg2, [out, retval] int * Result)
is a necessity when interacting with "Object-Methods, that shall transport Error-state".
You cannot avoid or circumvent this call-scheme.
Besides, it was already mentioned, that the slight call-overhead of such Object-Methods -
(due to "one more arg" in case of a void-method, and "two more args" in case of a value-returning method) -
is timing-wise not really significant, when the method-body performs more than a simple: return a + b internally.
BTW, you still have not provided some zipped CodeArchive, where you demonstrate what you need this "fast sqlite-calling" for.
You mentioned, that you "gave proof" that "some thing" was faster than "some other thing" -
but such statements are just "gobbledigook", as long as you don't provide real proof via some Test-Code which supports your claims also on the machines of other developers.
You never do that (posting your own code for something) - the only thing you do so far is "making wild claims" (in very badly translated english).
Since you are still unfamiliar with VB6-code apparently -
I'd personally have no problem when you'd post your "proofs" in a zipped freebasic *.bas module instead -
but please post at least "something" when you want us to take you seriously.
The main code is in reply #48.
Sorry, I can't upload cdecl.dll (api=Add)!
Add(11,22)=33
Download the test: CDECL_speed Test.zip
Some people are not interested in this at all. Those who are can write an example to test: use another programming language to build a DLL that exports a cdecl addition API, and that is all you need. This was hard work, because my tests used a cdecl V8 JS calculation that has nothing to do with SQLite. None of these DLL files are allowed to be uploaded.
In fact, my focus is not on how to handle SQLite, but on how to call a cdecl API with minimal overhead.
You can try for yourself how fast calling cdecl is.
Sorry, I have too much code, so I posted the test code; you can try it yourself.
I have posted several methods for calling cdecl; now I am adding one more.
'Rmlistview.rar cCDECL.cls
'http://read.pudn.com/downloads80/sourcecode/others/306578/listview/32bpp%20DIB/cCDECL.cls__.htm
test: times=50, For...Next=100000
Call Cdecl dll api ,UsedTime:
cCDECL.CLS--2389.5272 MS
Call Cdecl dll api ,UsedTime:
VbAddRef--103.6018 MS
VbAdd--99.4034 MS
CallCdecl--153.6503 MS
Com vTable--3129.5649 MS
DispCallFunc--4215.9675 MS
ClassAdd--237.5009 MS
ClassAddRef--246.6976 MS
ClassFriendAdd--109.4997 MS
Code:
Dim Lib_cCDECL As New cCDECL
'C = Lib_cCDECL.CallFuncAddress(ExcuteJavaScript_Ptr, a, b)
With Lib_cCDECL
For K = 1 To KMAX
For i = 1 To Count
a = i
b = i * 2
QueryPerformanceCounter CPUv1
ret = .CallFuncAddress(ExcuteJavaScript_Ptr, a, b)
QueryPerformanceCounter CPUv2
CusedTime = (CPUv2 - CPUv1) / MsCpu
TimeSz(8) = TimeSz(8) + CusedTime
Next
Next
End With
down test CDECL_speed Test.zip
Last edited by xiaoyao; Dec 5th, 2020 at 06:08 AM.
how to fix this for call cdecl?
MakeFunctionCdecl ExcuteJavaScript_Ptr, AddressOf MyAdd, 2
Public Sub MakeFunctionCdecl(DllFunAddr As Long, BackFunAddr As Long, Args As Long)
Dim code() As Byte, JmpBackAddr As Long
Dim OldProtect As Long
Dim ByteLen As Long
Args = 2
ByteLen = 16
ReDim code(ByteLen)
Vblegend.VirtualProtect ByVal DllFunAddr, ByteLen, 64, OldProtect 'change the protection of the page containing the function address
JmpBackAddr = DllFunAddr - BackFunAddr - 5
code(0) = &HE9
CopyMemory code(1), JmpBackAddr, 4
Dim i As Long
' 11: 90 nop
' 12: 90 nop
' 13: 90 nop
' 14: 81 C4 XX XX XX XX add esp,Xh
' 20: C3 ret
Vblegend.WriteProcessMemory -1, ByVal BackFunAddr, code(0), ByteLen, 0
End Sub
When you JMP to DllFunAddr the function there will RET to the original address of the caller so your ADD ESP, Xh will not be reached at all.
I already told you what needs to be done: shuffle the args so that a *new* return address is inserted on the stack, namely the address of your ADD ESP, Xh + RET thunk.
Shuffling args with REP MOVSB will be slower than PUSH-ing one more additional param as in the original solution proposed by The Trick in the thread.
I know you'd love to micro-benchmark it with yet another flawed benchmark of yours. This
Code:
QueryPerformanceCounter CPUv1
ret = .CallFuncAddress(ExcuteJavaScript_Ptr, a, b)
QueryPerformanceCounter CPUv2
. . . is wrong on so many levels. You cannot benchmark like this as the performance counters are not precise enough. It's impossible to measure a couple of instructions with QueryPerformanceCounter. This API takes a lot more than a couple of instructions to execute and the multimedia timer it's using is not precise enough.
Suppose you have functions f and g. You have to benchmark a loop of a million executions for f (A), then for g (B) and finally a million executions of an empty loop (C), then compare A - C vs B - C.
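The A - C vs B - C scheme can be sketched like this. It is written in Python only so that it runs as-is; `time_loop` and `compare` are hypothetical names, and the same shape would carry over to VB6 with two QueryPerformanceCounter reads placed outside each loop.

```python
import time

def time_loop(n, body=None):
    """Time n iterations as a whole; body=None measures the empty loop."""
    start = time.perf_counter()
    if body is None:
        for i in range(n):
            pass
    else:
        for i in range(n):
            body(i, i * 2)
    return time.perf_counter() - start

def compare(f, g, n=1_000_000):
    """Return the net cost of f and g after subtracting the empty-loop baseline."""
    a = time_loop(n, f)    # A: n calls of f
    b = time_loop(n, g)    # B: n calls of g
    c = time_loop(n)       # C: loop overhead only
    return {"f_net": a - c, "g_net": b - c, "loop_only": c}
```

The timer is read twice per loop instead of twice per call, so its own cost and resolution are spread over all n iterations. The net figures are still noisy, so it is safer to repeat the comparison a few times than to trust a single run.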
Maybe don't use JMP and use CALL EAX instead?
A friend of mine wrote ASM code that ran successfully, but I lost the code and he can't be reached now.
' 9: FF D0 call eax
Code:
push 1
push 2
call function
add esp, 8
like this?
function myAdd(byval a as long ,byval b as long) as long
push b
push a
call function (cdecl api ptr)
add esp, 8
end function
Last edited by xiaoyao; Dec 5th, 2020 at 07:16 AM.
Call Cdecl dll api ,UsedTime:
VbAddRef--103.6018 MS
VbAdd--99.4034 MS
CallCdecl--153.6503 MS
Com vTable--3129.5649 MS
DispCallFunc--4215.9675 MS
ClassAdd--237.5009 MS
ClassAddRef--246.6976 MS
ClassFriendAdd--109.4997 MS
cCDECL.CLS--2389.5272 MS
Yep, that's just another of your "Micro-Benchmarks" (which measure only call-overhead - and leave out the work within a function-body).
And BTW - your results above are, what comes out when you run the test in the IDE.
For a more real-world-test regarding call-overheads, you should compile natively (with all extended compiler-options checked).
I just did that with your code (using the cdecl.dll which implements the little Add-function) -
compiling the whole thing natively (with all extended options checked) -
and also including (just for fun) the RC5/RC6 built-in cdecl-call-helper (instead of the cCDecl-Class of Paul Caton).
Here's the code I've used to replace Caton's cCDecl class in your Module1:
Code:
'With Lib_cCDECL
'For K = 1 To TimesVal
'For i = 1 To Imax
' a = i
' b = i * 2
' QueryPerformanceCounter CPUv1
' Ret = .CallFuncAddress(ExcuteJavaScript_Ptr, a, b)
' QueryPerformanceCounter CPUv2
' CusedTime = (CPUv2 - CPUv1) / MsCpu
' TimeSz(7) = TimeSz(7) + CusedTime
'Next
'Next
'End With
With New_c 'RC5/RC6 based cdecl-Helper-call
Dim pArgs As Long, Args(0 To 1) As Long
pArgs = VarPtr(Args(0))
For K = 1 To TimesVal
For i = 1 To Imax
Args(0) = i
Args(1) = i * 2
QueryPerformanceCounter CPUv1
Ret = .cdeclCallDirect(retLong, ExcuteJavaScript_Ptr, pArgs, 8)
QueryPerformanceCounter CPUv2
CusedTime = (CPUv2 - CPUv1) / MsCpu
TimeSz(7) = TimeSz(7) + CusedTime
Next
Next
MethodName(7) = "RC5/6 cdeclcall"
End With
And the result I got (running the native-compiled *.exe) is this one:
And as you can see from the native compiled results, the call-overheads differ from yours - and are all (unsurprisingly) -
roughly at the same level of about 13-26msec (averaging at about 20msec to perform the 0.5Mio calls).
The only significant outlier regarding the Overhead is (with 372msec) the DispCallFunc-based cdecl-call.
The COM-based calls (via Class-Methods) have to transport "2 additional Params" - and have therefore a timing "above 20msec"
(this includes the call via the RC5/6 cdecl helper, which also has to go through that COM-call-overhead - and is therefore a bit slower than the "trampoline").
That testing done - what I would like to see from you, is a real-world test-example (not a micro-benchmark) -
where you call into a cdecl-library (as e.g. the original sqlite3.dll binary from sqlite.org),
and then do "some real work" with that dll (e.g. the Excel-import/export scenario you mentioned).
test :50*100000 times
VbAddRef--8.5441 MS
VbAdd--7.3409 MS
CallCdecl2--46.6595 MS ( more than ClassFriendAdd)
Com vTable--2998.326 MS
DispCallFunc--4087.915 MS
ClassAdd--119.4827 MS
ClassAddRef--110.8526 MS
ClassFriendAdd--12.643 MS
cCDECL.CLS--2100.1245 MS
Code:
QueryPerformanceCounter CPUv1
For K = 1 To KMAX
For i = 1 To Count
a = i
b = i * 2
ret = VbAdd(a, b)
Next
Next
QueryPerformanceCounter CPUv2
TimeSz(1) = (CPUv2 - CPUv1) / MsCpu
Last edited by xiaoyao; Dec 5th, 2020 at 07:37 AM.
RE:That testing done - what I would like to see from you, is a real-world test-example (not a micro-benchmark) -
where you call into a cdecl-library (as e.g. the original sqlite3.dll binary from sqlite.org),
and then do "some real work" with that dll (e.g. the Excel-import/export scenario you mentioned).
Olaf
==============
At present I am mainly testing which way of calling a CDECL API is the most convenient and the fastest. How exactly to use SQLite is left for those who are interested to test. Perhaps the fastest way is to query once and then receive the data from a callback n times. Another way is to get the table once and save all the data directly to a VB6 string array.
I want to study it. Can you give me the VB6 sources of "RC5/6 cdecl"? Thank you.
Last edited by xiaoyao; Dec 6th, 2020 at 04:18 AM.
Call Cdecl by VB Function
Why was the stack trashed by 4 bytes?
Code:
Function VB_CdeclAPI_Sum(ByVal a As Long, ByVal b As Long) As Long
MsgBox 1
MsgBox 2
MsgBox 2
MsgBox 2
MsgBox 2
End Function
Sub FixCdecl(VbFunction As Long, CdeclApi As Long, args As Long)
'ESP stack imbalance: Stack was trashed by 4 bytes
Dim asm() As String, stub() As Byte
Dim i As Long, argSize As Long
argSize = args * 4
' 0: 58 pop eax
' 1: 89 84 24 XX XX XX XX mov dword ptr [esp+Xh],eax
push asm(), "58 89 84 24 " & lng2Hex(argSize + 0) '&H24848958
push asm(), "B8 " & lng2Hex(CdeclApi) 'B8 90807000 MOV EAX,708090
push asm(), "FF D0" 'FFD0 CALL EAX
push asm(), "83 C4 " & Hex(argSize + 0) '83 C4 XX add esp, XX 'cleanup args
'push asm(), "C2 10 00"
push asm(), "C3"
stub() = toBytes(Join(asm, " "))
Dim THUNK_SIZE As Long
THUNK_SIZE = UBound(stub) + 1
VirtualProtect2 VbFunction, THUNK_SIZE, PAGE_EXECUTE_READWRITE, 0 'change the protection of the page containing the function address
WriteProcessMemory2 -1, VbFunction, VarPtr(stub(0)), THUNK_SIZE, 0
'Vblegend.VirtualProtect VbFunction, THUNK_SIZE, PAGE_EXECUTE_READWRITE, 0 'change the protection of the page containing the function address
'Vblegend.WriteProcessMemory -1, VbFunction, stub(0), THUNK_SIZE, 0
End Sub
form1 code:
Code:
Dim startESP As Long, endEsp As Long
startESP = getESP
Dim h As Long, ret As Long
Dim CdeclApi As Long, lpfnAdd As Long, lpfnVoid As Long, lpfnSub As Long
h = LoadLibrary("cdecl.dll")
CdeclApi = GetProcAddress(h, "Add")
Dim a As Long, b As Long, c As Long
a = 44
b = 55
FixCdecl AddressOf VB_CdeclAPI_Sum, CdeclApi, 2
' FixCdecl AddressOf VB_CdeclAPI_Sum, CdeclApi, 8
startESP = getESP
c = VB_CdeclAPI_Sum(a, b)
endEsp = getESP
MsgBox "c=" & c
'ESP stack imbalance
MsgBox "Stack was trashed by " & (endEsp - startESP) & " bytes"
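One possible reading of the 4 bytes, sketched as plain ESP arithmetic in Python: VB_CdeclAPI_Sum is declared without the Optional Spacer parameter that the CallCdecl trampolines use, so the thunk's MOV and RET consume one stack slot that the caller never pushed. `esp_drift` is a hypothetical helper; it assumes the caller treats the patched function as stdcall and that the cdecl callee leaves ESP unchanged.

```python
def esp_drift(num_args, has_spacer):
    """Bytes by which ESP ends above the position a stdcall caller expects."""
    arg_size = 4 * num_args
    pushed_by_caller = arg_size + (4 if has_spacer else 0)
    esp = 0                       # ESP at thunk entry, pointing at the return address
    esp += 4                      # pop eax           (return address)
    ret_slot = esp + arg_size     # mov [esp+argSize], eax
    # mov eax, CdeclApi / call eax: the cdecl callee returns with ESP unchanged
    esp += arg_size               # add esp, argSize  (now pointing at ret_slot)
    assert esp == ret_slot
    esp += 4                      # ret               (pops the address from ret_slot)
    expected = 4 + pushed_by_caller   # stdcall: return address plus every pushed argument
    return esp - expected
```

Under these assumptions `esp_drift(2, False)` is 4 and `esp_drift(2, True)` is 0, which would match the message box if the only difference from the working trampolines is the missing spacer. Without the spacer, the MOV also writes the return address over a slot that belongs to the caller.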