Why not using "insert or ignore into" sql execute on each? Would be worth a comparison.
Printable View
Why not using "insert or ignore into" sql execute on each? Would be worth a comparison.
Part I:
Krool, it works like this:
There is a list of items (they are files, but it does not matter much for the explanation).
For each item, it needs to calculate some data about the item. It takes some time to do it. Not a lot, but as there are many items, the whole operation can take some time.
The first time that the program runs, it needs to calculate the data for every item, there is no other way.
The second time that the programs runs, and further, the speed could be greatly increased if it doesn't have to calculate the data for each item again, but use data from a previous run.
Then here this cache database comes into play.
Also, between one run an another, there could have appear new items (or some could have been deleted but it does not matter either here, we will keep the cached data of deleted items for now).
Then, this is what the program needs to do:
For each item, first look into the database and if it is found, then retrieve the data from the database, if it is not found, then calculate the data and save it to the database for being ready to use the next time.
That all was working very well and fast with DAO's Seek and AddNew.
The problem was that I want this program to be ready to handle as much data (as much items, OK, files) as a computer can have, and that might be a lot.
And DAO/MDB stopped working at about 3,000,000 items because it hits the 2 GB database file size limit.
It is now already working with SQLite, tested already up to 10,000,000 items, 6.5 GB database file size.
Now I'm about to test and convert the code to use the cursor technology (still I didn't start with that - thanks Olaf).
Part II:
Aside from that.
I think with this cursor technology, the Seek method could be implemented. I'm starting to think that a DAO direct replacement, I mean keeping the original DAO interface and behavior (and performance), could be possible.
The main obstacle seems to be the transactions, that need to be started/committed, but I think that they could be automated too, based on on the actions that the object is receiving through its interface (and might be through some timing too). I think that probably DAO does that under the hood.
This program that I'm working at this time is quite simple regarding database stuff, its code is not hard to be converted, but having such a component would allow to convert larger programs that now are using DAO.
I talk about DAO because it is what I've used (mainly) in my code, but other people would be more interested in replacing ADO I guess.
In these database operations often (not always) speed is very important. The component needs to be as fast as it can. It is different from GUI components where speed is unimportant.
Maybe I'm the only one interested in this backward compatibility.
Revision 6 now. Hopefully now finished. :S
So all possible errors in SQLiteCursor or at creation are now properly raised.
The new RC6-version (6.0.10) is now "online"...
Here's the final Test-Result (also incorporating Krools latest version-update, from one posting above)...
Attachment 185411
Here the Test-Code I've used:
OlafCode:Option Explicit
Private Sub Form_Load()
Dim iDbDAO As Database, iRecDAO As Recordset
Dim iCon As cConnection, iCommand As cCommand
Dim iT1
If New_c.FSO.FileExists(App.Path & "\SQLite.db3") Then
New_c.FSO.DeleteFile App.Path & "\SQLite.db3"
End If
CreateSQLiteDB
Set iDbDAO = DBEngine.OpenDatabase(App.Path & "\OrigDB.mdb")
Set iRecDAO = iDbDAO.OpenRecordset("Files")
Set iCon = New_c.Connection(App.Path & "\SQLite.db3")
Set iCommand = iCon.CreateCommand("INSERT INTO Files (Path_Hash, FileLen, Date, Image_Data) VALUES (?,?,?,?)")
iT1 = Timer
iCon.BeginTrans
Do Until iRecDAO.EOF
iCommand.SetText 1, iRecDAO(1).Value
iCommand.SetInt32 2, iRecDAO(2).Value
iCommand.SetDouble 3, iRecDAO(3).Value
iCommand.SetBlob 4, iRecDAO(4).Value
iCommand.Execute
iRecDAO.MoveNext
Loop
iCon.CommitTrans
LogText "Time to fill DB: " & Round(Timer - iT1, 2) & vbCrLf
End Sub
Private Sub CreateSQLiteDB()
Dim iCon As cConnection
Set iCon = New_c.Connection(App.Path & "\SQLite.db3", DBCreateNewFileDB)
' iCon.Execute "CREATE TABLE Files ( Path_Hash Blob PRIMARY KEY NOT NULL, FileLen INTEGER DEFAULT 0 NOT NULL, Date REAL DEFAULT 0 NOT NULL, Image_Data BLOB NOT NULL) Without RowId"
iCon.Execute "CREATE TABLE Files (ID_File INTEGER PRIMARY KEY, Path_Hash TEXT NOT NULL, FileLen INTEGER DEFAULT 0 NOT NULL, Date REAL DEFAULT 0 NOT NULL, Image_Data BLOB NOT NULL)"
iCon.Execute "CREATE INDEX Path_Hash ON Files (Path_Hash)"
End Sub
Private Sub LogText(nText As String)
txtLog.SelText = nText & vbCrLf
txtLog.Refresh
End Sub
Private Sub Command2_Click()
Dim iKeys(100000) As String, bKeys(100000) As String, iIndexes(100000) As Long
Dim oCon As cConnection, oCur As cCursor
Dim iDbDAO As Database, iRecDAO As Recordset
Dim iSQLite As New SQLiteConnection, iCur As SQLiteCursor
Dim C As Long, ID As Long, PH As String, FL As Long, DT As Date, BD() As Byte
Dim iT1
Set oCon = New_c.Connection(App.Path & "\SQLite.db3")
Set iDbDAO = DBEngine.OpenDatabase(App.Path & "\OrigDB.mdb")
iSQLite.OpenDB App.Path & "\SQLite.db3", SQLiteReadWrite
' load keys
C = 0
Set oCur = oCon.CreateCursor("SELECT Path_Hash FROM Files")
Do While oCur.Step
iKeys(C) = oCur.ColVal(0): C = C + 1
Loop
' set random indexes
For C = 0 To UBound(iIndexes)
iIndexes(C) = Int(Rnd * 100000)
Next
iT1 = Timer
oCon.BeginTrans
Set oCur = oCon.CreateCursor("Select * From Files Where Path_Hash=?")
For C = 0 To UBound(iIndexes)
oCur.SetText 1, iKeys(iIndexes(C))
If oCur.Step Then
' ID = oCur.ColVal(0)
' PH = oCur.ColVal(1)
FL = oCur.ColVal(2)
DT = oCur.ColVal(3)
BD = oCur.ColVal(4)
Else
Stop
End If
Next
oCon.CommitTrans
LogText "RC6-Cursor: time searching 100,000 randoms records that are found: " & Round(Timer - iT1, 2) & vbCrLf
iT1 = Timer
Set iRecDAO = iDbDAO.OpenRecordset("Files")
iRecDAO.Index = "Path_Hash"
For C = 0 To UBound(iIndexes)
iRecDAO.Seek "=", iKeys(iIndexes(C))
If Not iRecDAO.NoMatch Then
' ID = iRecDAO(0).Value
' PH = iRecDAO(1).Value
FL = iRecDAO(2).Value
DT = iRecDAO(3).Value
BD = iRecDAO(4).Value
Else
Stop
End If
Next
LogText "DAO: time searching 100,000 randoms records that are found: " & Round(Timer - iT1, 2) & vbCrLf
iT1 = Timer
iSQLite.Execute "Begin"
Set iCur = iSQLite.CreateCursor("Select * From Files Where Path_Hash=?")
Dim cFL As SQLiteColumn: Set cFL = iCur(3)
Dim cDT As SQLiteColumn: Set cDT = iCur(4)
Dim cBD As SQLiteColumn: Set cBD = iCur(5)
For C = 0 To UBound(iIndexes)
iCur.SetParameterValue 1, iKeys(iIndexes(C))
If iCur.RecordCount Then
' ID = iCur(1).Value
' PH = iCur(2).Value
FL = cFL.Value
DT = cDT.Value
BD = cBD.Value
Else
Stop
End If
Next
iSQLite.Execute "Commit"
LogText "SQLiteCursor: time searching 100,000 randoms records that are found: " & Round(Timer - iT1, 2) & vbCrLf
End Sub
The SQLiteCursor class has also a direct "Value" property. So it wouldn't go through the Column item class.
So that would be more performant as it saves one overhead. And since you also use the direct ColValue approach in your cCursor class it would be more comparable..
Ah - didn't know about the new Value(Idx)-Prop...
Although (if you look at my latest Test-code), I've already tried really hard,
to not "go through your Columns->Item-sequece" (using explicit Column-refs, set before the loop).
Anyways - have changed the relevant part now to the new direct Value-Prop this way:
But it's only 5% better, compared to before:Code:iT1 = Timer 'an "integrated CursorObject" (that allows Param-Settings) would be faster
iSQLite.Execute "Begin"
Set iCur = iSQLite.CreateCursor("Select * From Files Where Path_Hash=?")
For C = 0 To UBound(iIndexes)
iCur.SetParameterValue 1, iKeys(iIndexes(C))
If iCur.RecordCount Then
' ID = iCur.Value(0)
' PH = iCur.Value(1)
FL = iCur.Value(2)
DT = iCur.Value(3)
BD = iCur.Value(4)
Else
Stop
End If
Next
iSQLite.Execute "Commit"
LogText "SQLiteCursor: time searching 100,000 randoms records that are found: " & Round(Timer - iT1, 2) & vbCrLf
Attachment 185413
Olaf
Congratulation, great achievement!
Are you going to tell us what's the witchcraft? Not even calling the SQLite DLL directly can be that fast.
Last night I spent a good time trying to speed up VBSQLite, and could reduce the time but just marginally (10% - 15%).
I see that the main time is taken by sqlite3_reset and sqlite3_bind_text when setting the parameter to the cursor. But the time is taken by the SQLite dll, so you must be doing something entirely different.
PS: I was using sqlite3_bind_text16 instead of converting to UTF8 and sqlite3_bind_text, which works a bit faster (also got rid of all the stub's).
There is a small overhead in the COM calling of Columns/Column/Value/Cursor/Value but it is still marginal. Even returning an empty Value from the Value Column object (returning immediately) the time does not lower much.
The only thing I didn't try was to use a typelib for the API declarations, but I guess that speed increase would be only marginal too. Now I'm curious (I know that wizards like to keep their secrets).
PS2: it must be something in the SQLite configuration I guess.
I've tested this already - but it didn't make a large difference (so I left DAO with the shorter code).
It's only about 3-4% you gain, when you set the DAO-Field-Objects outside the loop:
@Eduardo (regarding "Witchcraft")... I don't know why it is so much faster... possible reasons could be:Code:iT1 = Timer
Set iRecDAO = iDbDAO.OpenRecordset("Files")
iRecDAO.Index = "Path_Hash"
Dim fFL As DAO.Field: Set fFL = iRecDAO(2)
Dim fDT As DAO.Field: Set fDT = iRecDAO(3)
Dim fBD As DAO.Field: Set fBD = iRecDAO(4)
For C = 0 To UBound(iIndexes)
iRecDAO.Seek "=", iKeys(iIndexes(C))
If Not iRecDAO.NoMatch Then
' ID = iRecDAO(0).Value
' PH = iRecDAO(1).Value
FL = fFL.Value
DT = fDT.Value
BD = fBD.Value
Else
Stop
End If
Next
LogText "DAO: time searching 100,000 randoms records that are found: " & Round(Timer - iT1, 2) & vbCrLf
- I've compiled SQLite with quite a few optimizations (using the newest MS-VC++ compiler)
- and I really only iterate between Reset, BindParameter, and Step (nothing much "in-between")
- forgot to mention: the sqlite-exports in cairo_sqlite.dll are __stdcall, and I call them in the VB-Code via typelib (sans Err-Object-overhead)
Olaf
Yes. I renamed sqlite3win32.dll to something else, copied cairo_sqlite.dll there and renamed it as sqlite3win32.dll, then commented the lines (because those function are not found in your dll):
Got 0.66 for VBSQlite, opposed to 1.94 with original sqlite3win32.dll (running uncompiled).Code:' stub_sqlite3_regexp_init PropHandle, 0, 0
' stub_sqlite3_concat_init PropHandle, 0, 0
' Call SQLiteOverloadBuiltinFunctions(PropHandle)
So it is the sqlite3win32.dll the (main) culprit.
Yes, I still use the VC98\Bin\cl.exe.
Perhaps adding /O2 compiler switch makes an impact?
In worst case I could use the vs studio 2017 msvc compiler.
The VB6 IDE crashes on CreateCursor in the call to sqlite3_prepare_v3
Ok, I changed to /Ot (favor speed code), which doesn't crash.
See 4Eduardo_Ot_sqlite3win32.dll
I found out the problem. /O2 is equivalent of /Og /Oi /Ot /Oy /Ob2 /GF /Gy
However, /Og causes the crash. It's deprecated anyhow and probably a VC98 issue.
Here 4Eduardo_O2_exceptOg_sqlite3win32.dll used nowCode:/O2 /Og-
1.9 sec, both. So, the same as the first one.
I just tried with the SQLite DLL version that comes with Windows, winsqlite3.dll, that is in C:\Windows\SysWOW64, put it in the proper folder and renamed it to sqlite3win32.dll, also commented the lines stub_sqlite3_regexp_init... and it worked, but the surprise was that the time was 1.74 seconds. Closer to yours and far from Olaf's one.
Eduardo, beside the topic of compiler optimization it may be worth to increase performance by certain pragmas.
So, after you open a SQLiteConnection you may execute:
If exclusive mode is not wanted you may comment out last line.Code:.Execute "PRAGMA temp_store = 0"
.Execute "PRAGMA journal_mode = OFF"
.Execute "PRAGMA synchronous = 0"
.Execute "PRAGMA locking_mode = EXCLUSIVE"
EDIT: treat this as a test. Not recommend for release configuration. As for example a power outage results then in a corrupt db.
I just successfully recompiled VBSQLite12 with VS2015 but the performance of the latest test project as posted in the current thread was not affected at all.
The only sqlite compile option included in RC6 which made the difference for VBSQLite12 performance turned out to be #define SQLITE_DEFAULT_CACHE_SIZE 8192 and here are the updated results of the test on my machine
Here is how to dump RC6 compile options:Code:Time to fill DB: 0.48
RC6-Cursor: time searching 100,000 randoms records that are found: 0.29
DAO: time searching 100,000 randoms records that are found: 0.75
SQLiteCursor: time searching 100,000 randoms records that are found: 0.27
Here is the commit with VS2015 support for sqlite3win32 repo and fixed SQLITE_DEFAULT_CACHE_SIZE option for performance.Code:For Each vElem In oCon.GetRs("PRAGMA compile_options").GetRows()
Debug.Print vElem
Next
To compile VBSQLite12 with VS2015 produced .cobj files you'll need LinkSwitches=KERNEL32.LIB ucrt.lib vcruntime.lib /OPT:NOREF /OPT:NOWIN98 and then will need to start VB6.exe from console so that you can run vcvars32.bat for the LIB (and INCLUDE) environment variables to be correctly set like this
Or you can make a .bat file with these two lines.Code:c:> "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\vcvars32.bat"
c:> "C:\Program Files (x86)\Microsoft Visual Studio\VB98\VB6.EXE"
cheers,
</wqw>
Btw, there is sqlite3_bind_text16 function which works on UTF-16 strings which might explain the 0.02 difference. Or it could be Timer measurement produced fluke.
Thanks wqweto. I remember SQLITE_DEFAULT_CACHE_SIZE and stumbled on that before. Though I am wondering why I didn't change the default..
Now I did.
EDIT: used SQLITE_DEFAULT_CACHE_SIZE -8000 (-8000 = -8000*1024) instead of +8196
the default is -2000. So it's an 4x increase.
0,80 sec now.
Tested again, 0,66 sec the one from the RC.
Could you share that compilation to test speed here and compare?
PS: testing uncompiled because I could not compile.
Also, using sqlite3_bind_text16 instead of WideCharToMultiByte/sqlite3_bind_text accounts for an important difference: 0,10 sec less (with both DLLs).
Here is the VS2015 compiled version of VBSQLite12. Not sure if it's kept binary compatible with the original though.
cheers,
</wqw>
You do realize VBSQLite12.DLL has sqlite embedded already and does not need any extra DLL to work?
I didn't use nor tested this intermediate DLL at all because it's needed only for debugging VBSQLite12 itself i.e. if you load VBSQLite12 in the IDE in a project groups for instance then you will need it present but once VBSQLite12 is compiled the sqlite is embedded from the .cobj files.
cheers,
</wqw>
Also for sqlite3_prepare (in OpenDataSet) I think that sqlite3_prepare_v2 could be sqlite3_prepare16_v2 and at CreateCommand sqlite3_prepare_v3 could be sqlite3_prepare16_v3. And the same at CreateCursor.
OK Krool, with wqweto's sqlite3win32.dll I get 0,64 sec (with the same conditions: uncompiled and still not having changed sqlite3_bind_text to sqlite3_bind_text16).
It is even slightly faster than RC (0.02).
Yes, this is somewhat tedious work that has to be done in the wrapper, combing through all the cases.
I already tweaked SQLiteCursor.SetParameterValue to use stub_sqlite3_bind_text16 for Case vbString in my compilation of the wrapper above which gained couple of hundreds of a second in the test although probably RC6 does use the same API too.
Btw, in SQLiteCursor.SetParameterValue there is Case vbDouble but vbSingle is missing.
cheers,
</wqw>
Glad you found the issue...
Though since everybody is currently in "C-compiling-mood" anyways,
why not make the jump two versions into the future at this occasion as well:
https://www.sqlite.org/releaselog/3_39_2.html
Olaf
1.2.8 has a critical bug. Fixed now in 1.2.9
The change to sqlite3_bind_text16 uses directly StrPtr(). However, i forgot to add a & vbNullChar.
BSTRs are both length prefixed and null terminated so these can be cast to LPWSTR without additional processing i.e. without null terminating.
Of course BSTRs *can* contain null characters (a.k.a. Chr(0) or vbNullChar) so this breaks LPWSTR cast a bit (become shorter) but not fatally.
Yes I mixed that up. I was used to add always a nullchar as to receiving strings from API's so I am always sure to have a cut-off in the string (Instr vbNullChar always hits) that I somewhat forgot that to sending a string to an API is safe even if the text visible has no nullchar. As Eduardo pointed out in memory there is a terminating null.
So, later on I will change SetParameterValue again to use directly StrPtr() as this can make use of the ByRef Value directly. (most efficient way)