Convert binary data to a valid Unicode string without wasting (much) space ❓
OK, everybody wastes space, so let's use Base64, or better still: hexadecimal.
But what if you need to save memory?
Copying byte arrays directly into strings works for some things but not for others, for example using that string as a key in a Collection.
I think there must be a clever way to handle that, for example just escaping the special code points that cause the string "corruption".
Or converting to a much higher base; here is someone's JavaScript code that does that: base32768 (unfortunately JavaScript; here is the discussion).
What about the idea of escaping/modifying the special characters?
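A minimal sketch of the "much higher base" idea, assuming the trouble comes from particular code points rather than from string length: pack 14 bits into each character and offset the value into U+4E00..U+8DFF, a run of long-established CJK Unified Ideographs with no surrogates, no control characters and no upper/lowercase pairs (VB6 Collection keys are compared case-insensitively, so case pairs would matter). A 16-byte hash becomes 10 characters (20 bytes) instead of 32 hex characters (64 bytes). The names BytesToKey14 and KeyToBytes14 are hypothetical, and it has not been verified here that the Collection accepts every character in this range.

```vb
' Hypothetical helpers: pack bytes into a String, 14 bits per character,
' mapped into U+4E00..U+8DFF (CJK Unified Ideographs). Sketch only.
Private Const KEY_OFFSET As Long = &H4E00&
Private Const BITS_PER_CHAR As Long = 14

Private Function Pow2(ByVal n As Long) As Long
    Pow2 = 2 ^ n                              ' exact for the small n used here
End Function

Public Function BytesToKey14(bytes() As Byte) As String
    Dim acc As Long, nBits As Long, i As Long, s As String
    For i = LBound(bytes) To UBound(bytes)
        acc = acc * 256 + bytes(i)            ' append 8 bits (acc stays below 2^22)
        nBits = nBits + 8
        If nBits >= BITS_PER_CHAR Then
            nBits = nBits - BITS_PER_CHAR
            s = s & ChrW$(KEY_OFFSET + acc \ Pow2(nBits))   ' emit the top 14 bits
            acc = acc And (Pow2(nBits) - 1)                  ' keep the remaining bits
        End If
    Next
    If nBits > 0 Then                         ' flush leftover bits, zero-padded on the right
        s = s & ChrW$(KEY_OFFSET + acc * Pow2(BITS_PER_CHAR - nBits))
    End If
    BytesToKey14 = s
End Function

' key must have been produced by BytesToKey14 from byteCount bytes
Public Function KeyToBytes14(ByVal key As String, ByVal byteCount As Long) As Byte()
    Dim res() As Byte, acc As Long, nBits As Long, i As Long, n As Long
    ReDim res(byteCount - 1)
    For i = 1 To Len(key)
        ' AscW returns a signed Integer, so mask it back to 0..65535 first
        acc = acc * Pow2(BITS_PER_CHAR) + ((AscW(Mid$(key, i, 1)) And &HFFFF&) - KEY_OFFSET)
        nBits = nBits + BITS_PER_CHAR
        Do While nBits >= 8 And n < byteCount
            nBits = nBits - 8
            res(n) = acc \ Pow2(nBits)        ' next 8 bits become one byte
            acc = acc And (Pow2(nBits) - 1)
            n = n + 1
        Loop
    Next
    KeyToBytes14 = res
End Function
```

The original byte count has to be known when decoding because the last character carries padding bits; for fixed 16-byte hashes that is simply 16.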
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
I have to ask: what is the ultimate goal here? As I understand it, encodings like Base64 exist explicitly for transporting binary data through channels that are designed to support text only, like a URL or a text file. In these cases we don't typically care about how efficient the format is in terms of storage space. This seems like an odd request for a binary encoding format.
C++ programmers will dismiss you as a cretinous simpleton for your inability to keep track of pointers chained 6 levels deep and Java programmers will pillory you for buying into the evils of Microsoft. Meanwhile C# programmers will get paid just a little bit more than you for writing exactly the same code and VB6 programmers will continue to whitter on about "footprints". - FunkyDexter
There's just no reason to use garbage like InputBox. - jmcilhinney
The threads I start are Niya and Olaf free zones. No arguing about the benefits of VB6 over .NET here please. Happiness must reign. - yereverluvinuncleber
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
On the contrary, the channels that are that limited are teletypes, maybe old modems, old email formats, and... URLs (old URLs).
In today's world it is a waste of space in most places.
VB6 works with UTF-16; it does not need to be so limited.
In my particular case, I need to save memory. They are keys in a collection, and my program now exhausts the 2 GB of process memory because of that collection, which could be much smaller.
I was using 32-character hex keys, a total waste.
Now I'm trying to use the 16-byte hashes they come from, using as close to 16 bytes as possible. The lengths don't need to be fixed; maybe it would even perform more efficiently if they vary.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by Eduardo-
OK, everybody wastes space, so let's use Base64, or better still: hexadecimal.
But what if you need to save memory?
Copying byte arrays directly into strings works for some things but not for others, for example using that string as a key in a Collection.
I think there must be a clever way to handle that, for example just escaping the special code points that cause the string "corruption".
Or converting to a much higher base; here is someone's JavaScript code that does that: base32768 (unfortunately JavaScript; here is the discussion).
What about the idea of escaping/modifying the special characters?
Define "special characters" in this context: do you mean anything outside of ASCII or ANSI characters? If so, why not something like HTML encoding or possibly URL encoding?
This is the first time I have come across base32768; nice to see a .NET library available as well: https://github.com/kzrnm/Base32768. I might have a proper look at that later, as I am curious to see whether I could find a use for it.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by Eduardo-
On the contrary, the channels that are that limited are teletypes, maybe old modems, old email formats, and... URLs (old URLs).
In today's world it is a waste of space in most places.
VB6 works with UTF-16; it does not need to be so limited.
In my particular case, I need to save memory. They are keys in a collection, and my program now exhausts the 2 GB of process memory because of that collection, which could be much smaller.
I was using 32-character hex keys, a total waste.
Now I'm trying to use the 16-byte hashes they come from, using as close to 16 bytes as possible. The lengths don't need to be fixed; maybe it would even perform more efficiently if they vary.
That sounds like a HashSet or similar: if you never need to retrieve a list of the keys, then you only store the hash values.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by PlausiblyDamp
That sounds like a HashSet or similar: if you never need to retrieve a list of the keys, then you only store the hash values.
The keys are already hash values; what I store and need to retrieve are Long values that are indexes.
What takes up most of the space is the keys, not the values.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by PlausiblyDamp
Define "special characters" in this context: do you mean anything outside of ASCII or ANSI characters?
I tried to explain that already, and it is the key point of this thread: any byte value, or more precisely any byte sequence, that cannot be a valid UTF-16 string, or that is changed by functions that process text (as text rather than as byte arrays). I guess there are not many of these invalid or control "characters".
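One concrete class of invalid UTF-16 is easy to test for: a high surrogate (U+D800..U+DBFF) that is not followed by a low surrogate (U+DC00..U+DFFF), or a low surrogate on its own. A String built by copying raw hash bytes can contain such unpaired halves. The sketch below (hypothetical name HasLoneSurrogate) only detects that case; it is not a claim that this is what makes the Collection fail.

```vb
' Hypothetical helper: True if s contains an unpaired UTF-16 surrogate.
Public Function HasLoneSurrogate(ByVal s As String) As Boolean
    Dim i As Long, n As Long, c As Long
    n = Len(s)
    i = 1
    Do While i <= n
        c = AscW(Mid$(s, i, 1)) And &HFFFF&          ' AscW is signed; mask to 0..65535
        If c >= &HD800& And c <= &HDBFF& Then        ' high surrogate...
            If i = n Then HasLoneSurrogate = True: Exit Function
            c = AscW(Mid$(s, i + 1, 1)) And &HFFFF&
            If c < &HDC00& Or c > &HDFFF& Then HasLoneSurrogate = True: Exit Function
            i = i + 2                                ' ...followed by a low surrogate: valid pair
        ElseIf c >= &HDC00& And c <= &HDFFF& Then
            HasLoneSurrogate = True                  ' low surrogate without a high one
            Exit Function
        Else
            i = i + 1
        End If
    Loop
End Function
```

For example, it could be run on a key built from the 16 hash bytes before adding it, to see whether the rejected keys correlate with unpaired surrogates.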
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
OK, I'm still having trouble understanding. So what you're saying is that you have 32-bit hash values that you're using as keys in a collection. These hash values are stored as Strings, presumably in hexadecimal format. So what you're asking is how to store the hash values as a String but in a format that uses less space. Is that about right?
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
I'm doing some testing, and it seems that it is the Collection object that has this problem, but it is not something general, so resorting to an alternative collection might work.
The Collection has an issue with, for example, the character ChrW(-24627): if I always set the collection to Nothing before the addition, it takes it with no problem, but if there are many keys already in the collection, adding that character causes an error (and from then on it errors with any character). Strangely, if I set it to Nothing and start a new one after that, I also get the issue. Weird.
Originally Posted by Niya
So what you're asking is how to store the hash values as a String but in a format that uses less space. Is that about right?
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by PlausiblyDamp
Why use a string for the hash value? Could you not just use the underlying integer or long value of the hash?
The Collection object in VB6 is not like a Dictionary<TKey,TValue> in .NET, where you can use anything as a key. You have to use a String key.
Originally Posted by Eduardo-
I'm doing some testing, and it seems that it is the Collection object that has this problem, but it is not something general, so resorting to an alternative collection might work.
The Collection has an issue with, for example, the character ChrW(-24627): if I always set the collection to Nothing before the addition, it takes it with no problem, but if there are many keys already in the collection, adding that character causes an error (and from then on it errors with any character). Strangely, if I set it to Nothing and start a new one after that, I also get the issue. Weird.
Originally Posted by Niya
So what you're asking is how to store the hash values as a String but in a format that uses less space. Is that about right?
Yes.
If I ran into this problem, I'd most likely just write my own hash table implementation so I can use the binary keys as-is. A 32-bit hash value can fit in a Long.
There is plenty of information online that tells you how they work and how to implement them. They are surprisingly unsophisticated. To be honest, I've never had much love for VB6's Collection class, and if VB6 were still my main language today, I'd avoid using it in favor of my own hand-rolled implementations or something from a third party. I find the Collection class too problematic, but that's just me.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
As for your idea of using some kind of String compression, the truth is you're never going to find a way of encoding the hash value as a String that is more space-efficient than its binary representation.
I mean, think of a number like, say, 15. Even if you use a 1-byte-per-character encoding, it will still take 2 bytes to store that number. However, it could be encoded using only 4 bits, which is 4 times smaller than its String representation. You will always run into issues like this when encoding binary data as a String.
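The same point in numbers for one 16-byte (128-bit) hash: a VB6 String stores every character as 2 bytes (UTF-16), so a text encoding can only approach the raw 16 bytes, never beat it. The counts below exclude the BSTR overhead (4-byte length prefix plus 2-byte terminator) that every String key carries on top; the 14- and 15-bits-per-character rows assume a suitable set of "safe" code points can be chosen.

```latex
\begin{aligned}
\text{hex:}\quad & 2 \times 16 = 32 \text{ chars} && \Rightarrow 64 \text{ bytes}\\
\text{Base64:}\quad & 4\lceil 16/3 \rceil = 24 \text{ chars (incl. padding)} && \Rightarrow 48 \text{ bytes}\\
\text{14 bits/char:}\quad & \lceil 128/14 \rceil = 10 \text{ chars} && \Rightarrow 20 \text{ bytes}\\
\text{15 bits/char (base32768):}\quad & \lceil 128/15 \rceil = 9 \text{ chars} && \Rightarrow 18 \text{ bytes}\\
\text{raw copy:}\quad & 128/16 = 8 \text{ chars} && \Rightarrow 16 \text{ bytes}
\end{aligned}
```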
Yeah, that could work, but it would have to be modified, as it uses String keys as-is. It might be as simple as changing the internals from String to Long where keys are concerned.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by PlausiblyDamp
Why use a string for the hash value? Could you not just use the underlying integer or long value of the hash?
I'm using 32 hexadecimal characters for each hash; they occupy 64 bytes in memory per hash, and they come from a 16-byte binary hash.
I don't know how to use a numeric value in VB6 for such a huge number.
I would like to know how; I guess it should be possible to have some collection with good performance and 16-byte binary keys.
I ran some tests, but I'm still not sure whether I could use smaller hashes without running into collisions. I'm still testing, but these tests take a lot of time.
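One way to use the 16 bytes "as numbers" without a String: copy them into four Longs and key a hand-rolled hash table on those. Below is a minimal sketch of such a table (open addressing with linear probing), assuming the keys are already well-distributed hashes so their low bits can serve directly as the slot index. The module and the names HTInit, HTSet and HTTryGet are hypothetical; there is no removal and no automatic growing.

```vb
' Hypothetical standard module: open-addressing hash table with 16-byte binary
' keys and Long values. Sketch only; no removal, no automatic growing.
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, Source As Any, ByVal Length As Long)

Private mKeys() As Long      ' 4 Longs (16 bytes) per slot
Private mValues() As Long
Private mUsed() As Boolean
Private mMask As Long
Private mCount As Long

' capacity must be a power of two, comfortably larger than the number of keys
Public Sub HTInit(ByVal capacity As Long)
    ReDim mKeys(capacity * 4 - 1)
    ReDim mValues(capacity - 1)
    ReDim mUsed(capacity - 1)
    mMask = capacity - 1
    mCount = 0
End Sub

' Returns the slot holding k, or the empty slot where k would be stored.
Private Function FindSlot(k() As Long) As Long
    Dim s As Long, p As Long
    s = k(0) And mMask             ' the key is already a hash: use its low bits
    Do While mUsed(s)
        p = s * 4
        If mKeys(p) = k(0) And mKeys(p + 1) = k(1) _
           And mKeys(p + 2) = k(2) And mKeys(p + 3) = k(3) Then Exit Do
        s = (s + 1) And mMask      ' linear probing
    Loop
    FindSlot = s
End Function

' key16 must be a 0-based array of (at least) 16 bytes
Public Sub HTSet(key16() As Byte, ByVal value As Long)
    Dim k(3) As Long, s As Long
    CopyMemory k(0), key16(0), 16
    s = FindSlot(k)
    If Not mUsed(s) Then
        If mCount >= mMask Then Err.Raise 7   ' table full: keep one slot empty so probing ends
        mUsed(s) = True
        CopyMemory mKeys(s * 4), k(0), 16
        mCount = mCount + 1
    End If
    mValues(s) = value
End Sub

Public Function HTTryGet(key16() As Byte, value As Long) As Boolean
    Dim k(3) As Long, s As Long
    CopyMemory k(0), key16(0), 16
    s = FindSlot(k)
    If mUsed(s) Then
        value = mValues(s)
        HTTryGet = True
    End If
End Function
```

Each slot costs roughly 22 bytes (16 for the key, 4 for the Long value, 2 for the Boolean flag), with no per-key String allocation; keeping the table well under full (for example below about 70% occupancy) keeps probe chains short.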
Originally Posted by Niya
A 32-bit hash value can fit in a Long.
It was 32 ASCII chars (64 bytes), not 32 "bits".
There must be a science to how long the hash should be to be reasonably sure there won't be collisions (unless you win the lottery).
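The "science" here is the birthday bound. A worked estimate, assuming the hash behaves like uniformly random b-bit values and taking n = 12,000,000 keys (the largest file count mentioned later in this thread):

```latex
\begin{aligned}
E[\text{colliding pairs}] &= \binom{n}{2}\,2^{-b} \approx \frac{n^2}{2^{\,b+1}},
\qquad P(\text{any collision}) \approx 1 - e^{-n^2/2^{\,b+1}}\\[4pt]
n = 1.2\times10^{7} \;\Rightarrow\; n^2 &= 1.44\times10^{14}\\
b = 32:\quad & \frac{1.44\times10^{14}}{2^{33}} \approx 1.7\times10^{4} \text{ colliding pairs}\\
b = 64:\quad & \frac{1.44\times10^{14}}{2^{65}} \approx 3.9\times10^{-6}\\
b = 128:\quad & \frac{1.44\times10^{14}}{2^{129}} \approx 2.1\times10^{-25}
\end{aligned}
```

So at this scale a 32-bit hash is practically guaranteed to collide, while 64 bits already makes a collision very unlikely and 128 bits makes it negligible.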
Originally Posted by Niya
If I ran into this problem, I'd most likely just write my own hash table implementation
The only problem with doing everything yourself is that everything takes time.
This implementation can be further optimized (regarding mem-usage),
when you restrict the two Array-Variables defined at Class-Level:
- mKeys() and mValues() (which are currently Variant)
- to specific types, e.g. mKeys() As String, mValues() As Long
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by Niya
from String to Long
You must be joking. Better then to use a counter instead of a hash.
The hashes must match the path of files in the file system.
Then, from the hashed path, I can get the data of that file that was previously saved with the hash as the key.
If I worked with the full path, there would be much worse space and performance problems.
How do you think you could do that with 32-bit hashes without having collisions? (unless you are TheImp)
This implementation can be further optimized (regarding mem-usage),
when you restrict the two Array-Variables defined at Class-Level:
- mKeys() and mValues() (which are currently Variant)
- to specific types, e.g. mKeys() As String, mValues() As Long
Olaf
OK, yes, thank you for the info. I had already found that the problem was in the VB Collection (I wasn't sure whether there were other issues somewhere else too; single quotes in the SQL query must be escaped, but I have already done that).
I'll have to move to that class then.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
I installed the cHash.
It seems that the SQLite database doesn't like those keys in the Text field.
About a quarter of the keys are not found (after they are stored).
I tried setting the field to the BLOB type, but that does not work either; the problem is different: the keys seem to come back corrupted.
I think I'll have to convert the keys that go to the database to hex ASCII or Base64, and then back to 16 bytes after retrieval.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
OK, I re-read all the posts in the thread more carefully and put some thought into this. You need to encode 16 bytes as a String, but the problem is that a straight copy into a String buffer would end up producing characters that won't work as a Collection key. This means you will have to encode the binary data as a String using something like Base64. However, this has the effect of bloating the data: it would take more than 16 bytes to store 16 bytes of data. I think that about sums up your problem.
There are three solutions, as I see it:
One would be to find or invent an encoding format that produces a String that is usable as a Collection key without bloating the number of bytes required to store it. This might be possible, but I don't think it's worth the effort.
Second, you could use a different collection class like Olaf's cHashList that may be able to handle weird Unicode code points as keys. This is my favored solution.
Third, you could use a different hashing algorithm like CRC32 to produce small 32-bit hashes. Given that CRC32 was invented more for error checking, it's probably not suitable if you need strong collision resistance. I would prefer this one if the collision resistance of CRC32 were acceptable.
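For reference, the third option is small to implement. A minimal table-driven CRC-32 sketch (the common reflected polynomial &HEDB88320), with the hypothetical name Crc32; VB6 has no unsigned shift, so each right shift masks off the sign-extended bits. Keep in mind the birthday bound: with n keys a 32-bit hash gives roughly n^2 / 2^33 expected colliding pairs, which is in the thousands for the file counts discussed in this thread.

```vb
' Hypothetical module: table-driven CRC-32 (reflected polynomial &HEDB88320).
Private mCrcTable(255) As Long
Private mCrcReady As Boolean

Private Sub InitCrcTable()
    Dim i As Long, j As Long, c As Long
    For i = 0 To 255
        c = i
        For j = 1 To 8
            ' logical right shift by 1: clear bit 0, divide, clear the sign bit
            If c And 1 Then
                c = (((c And &HFFFFFFFE) \ 2) And &H7FFFFFFF) Xor &HEDB88320
            Else
                c = ((c And &HFFFFFFFE) \ 2) And &H7FFFFFFF
            End If
        Next
        mCrcTable(i) = c
    Next
    mCrcReady = True
End Sub

Public Function Crc32(data() As Byte) As Long
    Dim i As Long, crc As Long
    If Not mCrcReady Then InitCrcTable
    crc = &HFFFFFFFF
    For i = LBound(data) To UBound(data)
        ' logical right shift by 8, then mix in the table entry for the low byte
        crc = (((crc And &HFFFFFF00) \ &H100) And &HFFFFFF) Xor mCrcTable((crc And &HFF) Xor data(i))
    Next
    Crc32 = Not crc
End Function
```

The standard check value (CRC-32 of the ASCII bytes "123456789" is &HCBF43926) is a quick way to confirm such a port.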
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
I already have the function to make the hashes: it is HashData.
OK, it is already working fine with cHashD. The hashes are binary data stored in Strings in the VB6 part and converted to hexadecimal ASCII for the database part.
BTW, I coded this pair of functions to make those conversions.
Do you have a recommendation, or know of an API for this task, that might perform better?
Code:
Public Function StringToHex(nData As String) As String
    ' Treats the String as raw bytes (2 bytes per UTF-16 char)
    ' and returns two uppercase hex digits per byte.
    Dim iBytes() As Byte
    Dim c As Long
    Dim iStr As String
    iBytes = nData
    For c = 0 To UBound(iBytes)
        iStr = Hex$(iBytes(c))
        If Len(iStr) = 1 Then
            iStr = "0" & iStr              ' pad to two digits
        End If
        StringToHex = StringToHex & iStr
    Next c
End Function

Public Function HexToString(nHex As String) As String
    ' Inverse of StringToHex: every two hex digits become one byte of the result.
    Dim iBytes() As Byte
    Dim c As Long
    Dim c2 As Long
    Dim iLen As Long
    iLen = Len(nHex)
    If iLen Mod 2 = 1 Then Err.Raise 5     ' must be even
    If iLen = 0 Then Exit Function         ' avoid ReDim iBytes(-1)
    ReDim iBytes(iLen \ 2 - 1)
    c2 = -1
    For c = 1 To iLen Step 2
        c2 = c2 + 1
        iBytes(c2) = Val("&H" & Mid$(nHex, c, 2))
    Next
    HexToString = iBytes
End Function
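On the performance question: most of the cost in the functions above is likely the repeated string concatenation and the Hex$/Val calls per byte. A pure-VB6 sketch that avoids both, using lookup tables and a result buffer allocated once and filled with the Mid$ statement. The names StringToHexFast and HexToStringFast are hypothetical, and there is no input validation: non-hex characters silently decode as 0.

```vb
' Hypothetical table-driven variants of StringToHex / HexToString.
Private mHexPair(255) As String     ' "00".."FF"
Private mNibble(255) As Byte        ' ASCII code -> 0..15 (0 for non-hex characters)
Private mTablesReady As Boolean

Private Sub InitHexTables()
    Dim i As Long
    For i = 0 To 255
        mHexPair(i) = Right$("0" & Hex$(i), 2)
    Next
    For i = 0 To 9
        mNibble(48 + i) = i              ' "0".."9"
    Next
    For i = 0 To 5
        mNibble(65 + i) = 10 + i         ' "A".."F"
        mNibble(97 + i) = 10 + i         ' "a".."f"
    Next
    mTablesReady = True
End Sub

Public Function StringToHexFast(nData As String) As String
    Dim b() As Byte, s As String, i As Long, n As Long
    If Not mTablesReady Then InitHexTables
    n = LenB(nData)
    If n = 0 Then Exit Function
    b = nData
    s = Space$(n * 2)                    ' allocate the result once
    For i = 0 To n - 1
        Mid$(s, i * 2 + 1, 2) = mHexPair(b(i))
    Next
    StringToHexFast = s
End Function

Public Function HexToStringFast(nHex As String) As String
    Dim h() As Byte, b() As Byte, i As Long, n As Long
    If Not mTablesReady Then InitHexTables
    If Len(nHex) Mod 2 = 1 Then Err.Raise 5     ' must be even
    n = Len(nHex) \ 2
    If n = 0 Then Exit Function
    h = nHex                             ' UTF-16LE: the low byte of char k is at index 2*k
    ReDim b(n - 1)
    For i = 0 To n - 1
        b(i) = mNibble(h(i * 4)) * 16 + mNibble(h(i * 4 + 2))
    Next
    HexToStringFast = b
End Function
```

Output matches the original functions (uppercase digits); the decoder also accepts lowercase.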
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Still, I'm not very satisfied with this approach; I'm taking a performance hit from converting to hex and back all the time.
I'll investigate a bit which characters cause problems with the database Text type. Then I'll come back to the OP.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by Eduardo-
Still, I'm not very satisfied with this approach; I'm taking a performance hit from converting to hex and back all the time.
I'll investigate a bit which characters cause problems with the database Text type. Then I'll come back to the OP.
Just define your SQLite HashField properly as Blob (instead of Text).
And then ensure, that in your SQLite Command or Cursor-Objects,
you pass a ByteArray as the Parameter (to the Cursor-Class) -
this will entirely avoid any Hex-conversions.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
I moved the code to the new version, unchunked, now using a Cursor, quite similar to the original.
That part seems to be working fine (I still haven't been able to test much), but I get an error. I'll describe the situation:
I open one database connection in a bas module, and there I handle one table, a Configuration table where I store settings (as a SaveSetting replacement).
That is done through a cRecordset named mRecSettings, which is opened at program start and remains open until the program closes.
In a class module I load all the heavy data from the Files table (on demand, when needed). There I open another cConnection, a cCursor and a cCommand.
When I start loading data I begin a transaction, and when I finish I close it.
I had run into this same error before, so I added properties to that class (which has global scope), and in the SaveSetting replacement procedure, when I need to update or add a value in the Configuration table, I check whether a transaction is going on in the class module and, if so, end it so that I can use this cRecordset.
But this time there is no transaction going on in the class (the .TransactionStackCounter property also reports 0), and I get the error anyway ("Cannot start a transaction within a transaction"):
The code of the procedure where I get the error is (in the bas module):
Code:
Public gDataBase As cConnection
Private mRecSettings As cRecordset
Code:
Public Sub SaveSettingDB(AppName As String, Section As String, Key As String, Setting As Variant, Optional Default)
    Dim iStr As String
    Dim iInTrans As Boolean
    If Not IsMissing(Default) Then
        If CStr(Default) = Setting Then
            DeleteSettingDB AppName, Section, Key
            Exit Sub
        End If
    End If
    If Not gImageDataLoader Is Nothing Then
        iInTrans = gImageDataLoader.DatabaseIsInATransaction
        If iInTrans Then gImageDataLoader.DatabasePauseTransaction
    End If
    iStr = AppName & "_" & Section & "_" & Key
    If Not mRecSettings.FindFirst("Key = '" & iStr & "'") Then
        mRecSettings.AddNew
        mRecSettings!Key = iStr
    End If
    mRecSettings!value = Setting
    mRecSettings.UpdateBatch
    If iInTrans Then
        gImageDataLoader.DatabaseResumeTransaction
    End If
End Sub
The code for the connection that is executed at the program start is:
Code:
Public Sub OpenDatabaseGeneral()
    If gDataBase Is Nothing Then
        On Error Resume Next
        Set gDataBase = New_c.Connection(DatabasePath)
        Set mRecSettings = gDataBase.OpenRecordset("SELECT * FROM Settings")
        If Err.Number Then
            Set mRecSettings = Nothing
            Set gDataBase = Nothing
            Kill DatabasePath
            On Error GoTo 0
            If Not FileExists(DatabasePath) Then
                CreateDatabase
            End If
            On Error Resume Next
            Set gDataBase = New_c.Connection(DatabasePath)
            Set mRecSettings = gDataBase.OpenRecordset("SELECT * FROM Settings")
            If Err.Number Then
                MsgBox "Error opening cache database: " & Err.Number & " " & Err.Description, vbCritical
            End If
        End If
    End If
End Sub
The error occurs at the line mRecSettings.UpdateBatch
PS: this error already happened with the previous version.
Last edited by Eduardo-; Aug 1st, 2022 at 11:22 PM.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
My guess is, that you use DoEvents somewhere in your Apps "larger loops" -
is that the case?
Also, when gDataBase is a true global Variable (in a *.bas-Module),
you should use it "everywhere in your App" -
(and not open another cConnection on the same DB, in another Project-Private Class).
Do you have a small (complete) example, which reproduces that error-message?
Also, the code for:
- gImageDataLoader.DatabaseIsInATransaction
- gImageDataLoader.DatabasePauseTransaction
- gImageDataLoader.DatabaseResumeTransaction
would be nice to see
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Thanks.
Originally Posted by Schmidt
My guess is, that you use DoEvents somewhere in your Apps "larger loops" -
is that the case?
There wasn't any DoEvents involved when the error happened.
Anyway, I don't think it has anything to do with it.
Originally Posted by Schmidt
Also, when gDataBase is a true global Variable (in a *.bas-Module),
you should use it "everywhere in your App" -
This way I can handle the connections as needed. I mean, I know when to open and when to close each connection. One is global, but it may be closed too soon for the other needs.
That's what classes are useful for: enclosing everything needed there, with its own timing, isolated from other things.
Originally Posted by Schmidt
(and not open another cConnection on the same DB, in another Project-Private Class).
So I can open just one connection at a time?
Originally Posted by Schmidt
Do you have a small (complete) example, which reproduces that error-message?
Also, the code for:
- gImageDataLoader.DatabaseIsInATransaction
- gImageDataLoader.DatabasePauseTransaction
- gImageDataLoader.DatabaseResumeTransaction
would be nice to see
Olaf
If I understood correctly, you already said that only one connection is possible.
Is that a restriction of RC6 or SQLite?
Code:
Public Property Get DatabaseIsInATransaction() As Boolean
    DatabaseIsInATransaction = mDataBase.TransactionStackCounter > 0
End Property
Public Sub DatabasePauseTransaction()
    mDataBase.CommitTrans
End Sub
Public Sub DatabaseResumeTransaction()
    mDataBase.BeginTrans
End Sub
mDataBase is the cConnection
It is opened in the Class_Initialize and closed in the Class_Terminate
Last edited by Eduardo-; Aug 2nd, 2022 at 08:00 AM.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by Eduardo-
Code:
Public Property Get DatabaseIsInATransaction() As Boolean
    DatabaseIsInATransaction = mDataBase.TransactionStackCounter > 0
End Property
Public Sub DatabasePauseTransaction()
    mDataBase.CommitTrans
End Sub
Public Sub DatabaseResumeTransaction()
    mDataBase.BeginTrans
End Sub
mDataBase is the cConnection
It is opened in the Class_Initialize and closed in the Class_Terminate
Note that this is just a safeguard I added to get rid of that error; if multiple connections are allowed, it should not be necessary.
Another thing to consider is that I'm working with a completely different table from this connection and this cRecordset.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by Schmidt
Do you have a small (complete) example, which reproduces that error-message?
I'm trying to figure out the situation in which it happens. I could make it happen again, but now the error changed (I think I got this error at least once before, instead of the other one):
"Too busy to execute SQL statement"
I could not get a screenshot, because when I resumed execution the second time I got the other one:
"Cannot start a transaction within a transaction". (The same one posted above in the capture)
And in this case there is a DoEvents involved, because I deliberately closed the program while it was loading images:
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by wqweto
This is a huge problem. It's a very bad design for (system) components to spin thread messages, which usually wreaks havoc on client applications.
For instance, WinHttpRequest and ServerXMLHTTP (both WinHTTP-based) sometimes spin messages, which leads to catastrophic reentrancy in client apps.
In server apps this is rarely a problem, as these are not event-driven per se. Some of these server apps don't have (or need) a message pump at all.
cheers,
</wqw>
What exactly do you mean by "spin messages"?
But come on, wqweto, you must know that there is no other way in long loops to keep the program responsive and allow user actions, since VB6 is single-threaded.
It is not a problem once reentrancy is handled (as I of course do).
Yes, there is another way: using timers. But that would make it a lot more complex and a lot slower, and it would have the very same problem (regarding the database).
I don't know why you guys are so afraid of DoEvents, really. I make programs that seem to be multithreaded just by using DoEvents; it can't be done any other way, or it would be a lot more cumbersome and slow using timers (no way).
I've been programming this way for years already (when needed, of course).
Yes, you need to set some flags so as not to access controls or form properties when the form is already unloaded; I don't think I need to explain that to you, because you must understand it very well already.
Let other programmers be "DoEvents-phobic", but not you!
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by Eduardo-
I'm trying to figure out the situation in which it happens. I could make it happen again, but now the error changed (I think I got this error at least once before, instead of the other one):
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by Eduardo-
I don't know why you guys are so afraid of DoEvents, really.
Because when calling MyConn.Execute, for instance, I'm not prepared to handle Form_Unload *while* inside the method call, i.e. I didn't set special flags before each and every method call so as to be able to cancel my form being unloaded, or to "swallow" actions started by clicks on any command button.
It is so much simpler to assume linear execution of the code with no reentrancy, because everything else leads to madness.
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by Eduardo-
I don't know why you guys are so afraid of DoEvents, really.
Because you have to really know:
- when to call it
- but especially "from where" to call it
I'd certainly not call it from within a Cnn.BeginTrans... Cnn.CommitTrans-CodeBlock.
IMO you are scanning for ImageFiles (across "entire local disks") -
a task which is time-consuming "on its own" already (even without any fast DB-Transaction-Handling).
So, why not using a simple "re-shelling" of your current Executable with a Command-Parameter -
which spawns an entire new (Worker-)Process for your larger ImageFileScans...?
These secondary "Worker-Processes" could be easily separated from the GUI-Process
(which is detected via a simple Command$=="", in that case show the GUI...).
You could even shell multiple Scanner-Processes this way...
(one for each local Drive-letter, because this makes sense FileScan-wise).
And in this case (multiple local Drives, multiple Scan-Processes), I'd even ease the burden
of "parallel writes" to the SQLite-Cache-DB, by creating a separate DB for each DriveLetter:
- either as ImgCache_C.db3, ImgCache_D.db3, a.s.o.
- or in a separate DBCache-SubFolder below your App.Path .\C\ImgCache.db3, .\D\ImgCache.db3
And your AppSettings.db3 could sit in its own File, directly (as usual) in your App.Path.
If you'd do so, you'd get:
- an always responsive GUI-Process (without any DoEvents needed, not even in the WorkerProcesses which run "headless")
- much better FileScan- and analysis-performance (make use of these multiple Cores, man...)
- no transaction-conflicts in this case (with parallel running "WorkerProcesses per Drive", each addressing its own DB-File)
- much better write-speed without concurrency-issues as well...
... (keep in mind, that SQLite allows only a single "WriteTransaction" at any time on a given DB)
How to communicate between the GUI- and Worker-Processes I leave to you,
(e.g. your recent MemMapped-Arrays could be a good candidate for that task).
Re: Convert binary data to a valid Unicode string without wasting (much) space ❓
Originally Posted by Schmidt
Because you have to really know:
- when to call it
- but especially "from where" to call it
I'd certainly not call it from within a Cnn.BeginTrans... Cnn.CommitTrans-CodeBlock.
Why not?
Of course I need to call it from there since it is the time consuming task.
I don't see the point really, since the database system allows multiple connections.
And I also commit that transaction before I attempt any update on the other table (anyway, it made no difference).
Originally Posted by Schmidt
IMO you are scanning for ImageFiles (across "entire local disks") -
a task which is time-consuming "on its own" already (even without any fast DB-Transaction-Handling).
No, at that point the files have all been scanned already (before the data-loading process and the transaction begin), and they are just local, not on a network.
Originally Posted by Schmidt
So, why not using a simple "re-shelling" of your current Executable with a Command-Parameter -
which spawns an entire new (Worker-)Process for your larger ImageFileScans...?
I'm already doing exactly that, not for loading the data from the database but when the data needs to be calculated (when it is not yet cached in the database), because that takes more time; then I spawn several workers to take advantage of the multiple cores the CPU might have.
Loading a normal amount of file data from the database (when the data is already cached) is not very time-consuming.
In the case where I got the error I was testing with about 400,000 files, and the data loads in a few seconds (during that time I click to unload the form and get the error, which, BTW, didn't happen with DAO, working in exactly the same way).
Originally Posted by Schmidt
These secondary "Worker-Processes" could be easily separated from the GUI-Process
(which is detected via a simple Command$=="", in that case show the GUI...).
You could even shell multiple Scanner-Processes this way...
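The "re-shelling" idea can be sketched in plain VB6 as below. This is a separate sketch, not a drop-in for the demo's Sub Main. The "/worker <n>" argument convention and the names WorkerIndexFromArgs, StartWorkers and RunWorker are hypothetical. Shell also needs the compiled .exe to exist in App.Path.

```vb
Option Explicit

' Hypothetical argument convention: no parameter = GUI, "/worker <n>" = headless worker n
Public Function WorkerIndexFromArgs(ByVal Args As String) As Long
    Args = Trim$(Args)
    If LCase$(Left$(Args, 8)) = "/worker " Then
        WorkerIndexFromArgs = CLng(Val(Mid$(Args, 9)))
    End If                                   ' otherwise 0, meaning "GUI process"
End Function

Sub Main()
    Dim WorkerIndex As Long
    WorkerIndex = WorkerIndexFromArgs(Command$)
    If WorkerIndex = 0 Then
        Form1.Show                           ' GUI process
    Else
        RunWorker WorkerIndex                ' headless worker: no Form is loaded
    End If
End Sub

' Called from the GUI process: re-shells the compiled executable once per worker
Public Sub StartWorkers(ByVal WorkerCount As Long)
    Dim i As Long
    For i = 1 To WorkerCount
        Shell """" & App.Path & "\" & App.EXEName & ".exe"" /worker " & i, vbHide
    Next
End Sub

Private Sub RunWorker(ByVal WorkerIndex As Long)
    ' hypothetical placeholder: process this worker's share of the files,
    ' using its own connection (or its own DB file)
End Sub
```

Worker indexes start at 1, so index 0 can stand for "no worker parameter, show the GUI".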
Scanning is quite fast, well... relatively fast.
In normal cases, for example those 400,000 files, it takes only about 20 seconds (OK, it is an M.2 SSD). I'm giving these numbers off the top of my head, so they may not be exact, but they give an idea of the order of magnitude.
And 400,000 files wouldn't be a typical case anyway; a typical case would be 1,000 to 50,000 files, I guess.
But I considered using multiple processes when there are millions of files (a less typical case). Maybe in another version, if someone asks.
I tested with 12,000,000 files and the scan took about 30 minutes.
Loading from the database (already cached) also took a long time (with the chunked approach), but I haven't been able to test it with the new version yet. I first need to run and cache all the data, and when the data is not cached, the 12,000,000 files take about 3 hours (yes, with multiprocessing). That is because I'm testing mainly with icons; with larger images it would take longer.
And the old database won't work, because it stored the hashes in hexadecimal, and now it works with 16-byte binary hashes (and the field is a BLOB, as you suggested).
But the main reason it wasn't very fast with so many files is that, instead of VB6 arrays, I need to work with Large arrays.
Windows itself is not very fast either. If I right-click the directory where I copied all these files and choose Properties, it also takes quite a while to fill in the information (not 30 minutes, but several minutes; I didn't time it).
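Here is the space difference between the two key formats, in numbers. The same 16-byte hash takes 32 characters as hexadecimal text. That is 32 bytes as UTF-8 TEXT in SQLite and 64 bytes as a VB6 (UTF-16) String, against 16 bytes as a BLOB.

The sketch below is a minimal illustration. It assumes the same shlwapi HashData call that the demo class uses, here declared as a Function so the HRESULT can be checked. PathHash16 and ToHex are hypothetical helper names.

```vb
Option Explicit

' shlwapi!HashData returns an HRESULT (0 = S_OK)
Private Declare Function HashData Lib "shlwapi" (ByVal pData As Long, ByVal cbData As Long, _
                                                 ByVal pHash As Long, ByVal cbHash As Long) As Long

' 16-byte hash of the lower-cased path: the value stored in a PathHash BLOB column
Public Function PathHash16(ByVal FilePath As String) As Byte()
    Dim H() As Byte
    ReDim H(0 To 15)
    FilePath = LCase$(FilePath)
    If HashData(StrPtr(FilePath), LenB(FilePath), VarPtr(H(0)), 16) <> 0 Then Err.Raise 5
    PathHash16 = H
End Function

' The old text representation: two hex characters per byte
Public Function ToHex(B() As Byte) As String
    Dim i As Long, S As String
    For i = LBound(B) To UBound(B)
        S = S & Right$("0" & Hex$(B(i)), 2)
    Next
    ToHex = S
End Function
```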
Originally Posted by Schmidt
(one for each local Drive-letter, because this makes sense FileScan-wise).
That would be the problem: how to split the search, because we don't know upfront where the files are.
But this issue is entirely separate and unrelated to the error I'm having (it is not about searching for files).
Originally Posted by Schmidt
And in this case (multiple local Drives, multiple Scan-Processes), I'd even ease the burden
of "parallel writes" to the SQLite-Cache-DB, by creating a separate DB for each DriveLetter:
- either as ImgCache_C.db3, ImgCache_D.db3, a.s.o.
- or in a separate DBCache-SubFolder below your App.Path .\C\ImgCache.db3, .\D\ImgCache.db3
And your AppSettings.db3 could sit in its own File, directly (as usual) in your App.Path.
If you'd do so, you'd get:
- an always responsive GUI-Process (without any DoEvents needed, not even in the WorkerProcesses which run "headless")
- much better FileScan- and analysis-performance (make use of these multiple Cores, man...)
- no transaction-conflicts in this case (with parallel running "WorkerProcesses per Drive", each addressing its own DB-File)
- much better write-speed without concurrency-issues as well...
... (keep in mind, that SQLite allows only a single "WriteTransaction" at any time on a given DB)
How to communicate between the GUI- and Worker-Processes I leave to you,
(e.g. your recent MemMapped-Arrays could be a good candidate for that task).
Olaf
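A per-drive cache file name, as suggested in the quote, could be derived from the scan root as follows. CacheDBFileForPath is a hypothetical helper. Roots without a drive letter (for example UNC paths) are assumed to go to a shared fallback file.

With one file per worker, no two processes compete for SQLite's single write transaction on the same DB.

```vb
Option Explicit

' Hypothetical helper: maps a scan root to its own cache DB, e.g. "D:\Photos" -> <App.Path>\ImgCache_D.db3
Public Function CacheDBFileForPath(ByVal ScanRoot As String) As String
    Dim Drive As String
    Drive = UCase$(Left$(ScanRoot, 1))
    If Drive Like "[A-Z]" And Mid$(ScanRoot, 2, 1) = ":" Then
        CacheDBFileForPath = App.Path & "\ImgCache_" & Drive & ".db3"
    Else
        CacheDBFileForPath = App.Path & "\ImgCache_Other.db3"   ' e.g. UNC paths like \\server\share
    End If
End Function
```

A worker process would then open its own connection on that file, for example with the demo's helper: `Set Cnn = OpenCnn(CacheDBFileForPath(ScanRoot))`.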
I don't think separating by drive is a good idea. People tend to put all their media in one place.
I'm trying to build a sample project that reproduces the problem.
Last edited by Eduardo-; Aug 2nd, 2022 at 12:10 PM.
Re: Convert binary data to a valid Unicode string whitout wasting (much) space ❓
Here is the sample project.
The first time: run it, click Start, wait for it to finish, and close the form. At that point the database is filled with the data, which is now cached.
The second time is when the problem shows up.
Run it again. Click Start. Close the form with the X button.
It does not need to happen within a DoEvents: you can close the form after the process has already finished, and the same error happens.
So you can test closing the form in the middle of the process or after it has finished.
In the first case the unload comes from a DoEvents; in the second case it doesn't. In both cases the error is the same.
The connection to the database is still open in the class at that time.
If I set both references to the class instance to Nothing, the instance terminates and closes the connection, and the error doesn't happen.
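The workaround just described can be sketched with hypothetical names. gLoaderA and gLoaderB stand for the two variables that hold the same class instance. The class is assumed to hold its connection only in a private mCnn variable.

Releasing both references in Form_Unload makes Class_Terminate run at a well-defined point, before the rest of the shutdown.

```vb
' In the Form (gLoaderA / gLoaderB are hypothetical Public variables in a .bas module)
Private Sub Form_Unload(Cancel As Integer)
    Set gLoaderA = Nothing
    Set gLoaderB = Nothing   ' last reference released: Class_Terminate runs right here
End Sub

' In the loader class (hypothetical; mCnn is the class's private connection object)
Private Sub Class_Terminate()
    Set mCnn = Nothing       ' releases this class's connection to the DB
End Sub
```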
Re: Convert binary data to a valid Unicode string whitout wasting (much) space ❓
Originally Posted by Eduardo-
Here is the sample project.
Thanks.
Whilst I take a look, you could have a look at my "minimal demo" with regard to FileScans as well...
(even if your approach is different, maybe you can steal some stuff from it).
It follows your "separate connection" approach in the cImageDataLoader class -
and even uses "DoEvents" within the "transacted routine" itself
(with proper re-entrancy protection in the Form, of course).
The Project needs a Module, a Form and a Class, and starts from Sub Main:
modMain.bas
Code:
Option Explicit
Public gDBFile As String, gCnn As cConnection, gImageDataLoader As cImageDataLoader
Private mRecSettings As cRecordset
Sub Main()
    gDBFile = App.Path & "\ImageCache.db3"
    Set gCnn = OpenCnn(gDBFile)
    Set mRecSettings = gCnn.OpenRecordset("Select * From Settings")
    SaveSettingDB App.EXEName, "MySection", "MyKey1", "SomeSetting1"
    SaveSettingDB App.EXEName, "MySection", "MyKey2", "SomeSetting2"
    Set gImageDataLoader = New cImageDataLoader
    Form1.Show
End Sub
Public Function OpenCnn(DBFile As String) As cConnection
    If New_c.FSO.FileExists(DBFile) Then
        Set OpenCnn = New_c.Connection(DBFile)
    Else
        Set OpenCnn = New_c.Connection(DBFile, DBCreateNewFileDB)
        OpenCnn.Execute "Create Table Settings(Key Text Primary Key Collate NoCase, Value Text) Without RowID"
        OpenCnn.Execute "CREATE TABLE Files (PathHash Blob Primary Key, FileLen Int Default 0 NOT NULL, FileDate Real Default 0 NOT NULL, ImageData Blob NOT NULL) Without RowId"
    End If
End Function
Public Sub SaveSettingDB(AppName As String, Section As String, Key As String, Setting As Variant)
    Dim iStr As String
    iStr = AppName & "_" & Section & "_" & Key
    If Not mRecSettings.FindFirst("Key = '" & iStr & "'") Then
        mRecSettings.AddNew
        mRecSettings!Key = iStr
    End If
    mRecSettings!Value = Setting
    mRecSettings.UpdateBatch
End Sub
cImageDataLoader.cls
Code:
Option Explicit
Private Declare Sub HashData Lib "shlwapi" (ByVal pSrc&, ByVal SrcBytes&, ByVal pDst&, ByVal DstBytes&)
Private mCnn As cConnection
Public TotalFilesScanned As Long
Private Sub Class_Initialize()
    Set mCnn = OpenCnn(gDBFile) 'though it would be better to use gCnn here (when it's the same DB)
End Sub
Public Sub DoFileComparisonsForDirectory(D As cDirList)
    Dim oCur As cCursor, oUpd As cCommand, oIns As cCommand 'a Cursor, an Update- and an Insert-Object
    Dim i&, FP$, FH(0 To 15) As Byte, pH&, FL&, DT As Date, BD() As Byte 'helper-variables for looping and buffering
    mCnn.BeginTrans
    Set oCur = mCnn.CreateCursor("Select FileLen, FileDate, ImageData From Files Where PathHash=?")
    Set oUpd = mCnn.CreateCommand("Update Files Set FileLen=?, FileDate=?, ImageData=? Where PathHash=?")
    Set oIns = mCnn.CreateCommand("Insert Into Files(PathHash, FileLen, FileDate, ImageData) Values(?,?,?,?)")
    pH = VarPtr(FH(0))
    For i = 0 To D.FilesCount - 1
        TotalFilesScanned = TotalFilesScanned + 1
        FP = LCase$(D.Path & D.FileName(i)) 'get the full, current filepath from the DirList-Object
        HashData StrPtr(FP), LenB(FP), pH, 16 'and calculate the current FilePathHash in a ByteArray
        oCur.SetBlobPtr 1, pH, 16 'set the Cursor-Search-Param
        If TotalFilesScanned Mod 256 = 0 Then
            mCnn.CommitTrans
            Form1.Caption = TotalFilesScanned & " " & FP
            SaveSettingDB App.EXEName, "MySection", "LastScannedFolder", FP 'just to show that this works
            DoEvents 'allow the GUI, to reflect Progress-Infos and stuff
            mCnn.BeginTrans
        End If
        If oCur.Step Then '<- if True, then a File-Entry with this Hash-Value exists in the DB-Table
            FL = oCur.ColVal(0): DT = oCur.ColVal(1) 'so, let's read two values out for the following comparison
            If FL = D.FileSize(i) And DT = D.FileLastWriteTime(i) Then 'the Cache-entry is valid...
                BD = oCur.ColVal(2) '... so we can use the Cached Image_Data of the found record
            Else 'we need to update the Cache-Entry in the DB-Table (the PathHash is the same, but FileLen and FileDate differ)
                BD = "re-calculate the image-data-comparison-blob somewhere (accessing the FileSystem directly via D.Path & D.FileName(i))"
                oUpd.SetInt32 1, D.FileSize(i)
                oUpd.SetDouble 2, D.FileLastWriteTime(i)
                oUpd.SetBlob 3, BD 'store the new calculated Image_Data-Blob
                oUpd.SetBlobPtr 4, pH, 16 'this last-Param is for the Where-Clause of our Update-Statement
                oUpd.Execute
            End If
        Else 'File-Info is not yet in the DB-Table, so we perform an insert on all the 4 FieldValues
            BD = "calculate the image-data-comparison-blob somewhere (accessing the FileSystem directly via D.Path & D.FileName(i))"
            oIns.SetBlobPtr 1, pH, 16
            oIns.SetInt32 2, D.FileSize(i)
            oIns.SetDouble 3, D.FileLastWriteTime(i)
            oIns.SetBlob 4, BD 'store the new calculated Image_Data-Blob
            oIns.Execute
        End If
        DoSomethingWith BD '<-BD can be used now (it either came from the DB-Cache, or was freshly recalculated and cached)
    Next
    mCnn.CommitTrans 'commit the changes in chunks (one Directory-cDirList-Object at a time)
End Sub
Private Sub DoSomethingWith(BD() As Byte)
    'some actions, using the ImageCache-BlobData
End Sub
Form1.frm
Code:
Option Explicit
Private StillSearching As Boolean
Private Sub Form_Load()
    Caption = "Click Me, to start scanning"
End Sub
Private Sub Form_Click()
    FileSearchOn "c:\", "*.jpg;*.png;*.gif;*.bmp"
End Sub
Private Sub FileSearchOn(ByVal Path As String, Filter As String)
    If StillSearching Then MsgBox "I'm still searching": Exit Sub
    StillSearching = True
    gImageDataLoader.TotalFilesScanned = 0
    SearchRecursivelyOn Path, Filter
    StillSearching = False
    Caption = "Search finished"
End Sub
Private Sub SearchRecursivelyOn(Path As String, Filter As String)
    Dim D As cDirList, i As Long
    On Error Resume Next 'to avoid potential rights-problems on certain folders
    Set D = New_c.FSO.GetDirList(Path, dlSortNone, Filter)
    On Error GoTo 0
    If D Is Nothing Then Exit Sub
    If D.FilesCount Then gImageDataLoader.DoFileComparisonsForDirectory D
    For i = 0 To D.SubDirsCount - 1
        SearchRecursivelyOn D.Path & D.SubDirName(i), Filter
    Next
End Sub
Private Sub Form_Unload(Cancel As Integer)
    If StillSearching Then MsgBox "I'm still searching": Cancel = 1
End Sub
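The demo only writes settings. A matching reader could look like the sketch below.

GetSettingDB is a hypothetical helper, not part of RC6. It has to live in modMain.bas, because mRecSettings is private there. It only uses the FindFirst call and the `!` field access that SaveSettingDB already relies on.

```vb
' Hypothetical counterpart to SaveSettingDB; add it to modMain.bas,
' since it needs the module-private mRecSettings recordset.
Public Function GetSettingDB(AppName As String, Section As String, Key As String, _
                             Optional Default As Variant = "") As Variant
    Dim iStr As String
    iStr = AppName & "_" & Section & "_" & Key   ' same key scheme as SaveSettingDB
    If mRecSettings.FindFirst("Key = '" & iStr & "'") Then
        GetSettingDB = mRecSettings!Value        ' found: return the stored value
    Else
        GetSettingDB = Default                   ' not stored yet: return the caller's default
    End If
End Function
```

For example, `GetSettingDB(App.EXEName, "MySection", "LastScannedFolder")` would return the last path written during a scan, or an empty string before the first scan.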