OK, done what I wanted to do
I know I said "next week when I get some spare time", but
there were no good movies on TV so I had a little tinker
with the Memory Mapped Files.
Where I think they will be useful (in general) is:
Speed: Less than 100ms to map the file as memory.
Sharing: Different applications can share the memory
(which happens to be stored in a file).
Where I think it could help applications that deal with large
files is that you would not need to load the entire file into
memory. Loading the whole file is a bad idea in general unless
you know that your file is always below a certain limit. But
even then, what if your file is 1GB in size? Will you ensure
you have enough memory (real and virtual) to load the file? I
think not. I also seriously doubt that your user would enjoy
waiting for a 1GB file to be read if all he wanted to do was
search for a sequence of characters.
So, if the only reason balip (or whoever else is interested)
loads the file into memory (in a string) is to use InStr, then
I suspect that a far better idea is to use a memory mapped
file and to write or find a DLL that lets you find a byte
sequence in a given memory range (in fact I am 100% certain
this will be built in to the Win32 API - will take a look
soon). Another function needed is one to extract a string from
between two memory addresses (like Mid does already).
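To make the idea concrete, here is a minimal sketch of the two operations described above (find a byte sequence, extract a range), written in Python rather than VB because its standard mmap module wraps the same OS facility. The function names are made up for illustration, and this is not the sample project mentioned in this post.

```python
import mmap

def find_in_mapped_file(path, needle, start=0):
    """Return the byte offset of needle in the file at path, or -1 if absent."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file without reading it all up front;
        # the OS typically brings pages in only as they are touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle, start)

def extract_from_mapped_file(path, start, end):
    """Return the bytes in [start, end), much as Mid would for a string."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[start:end]
```

Note that the offsets are byte positions counted from zero, not the 1-based character positions that InStr and Mid use, and that mapping an empty file raises an error.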
Even if you don't do all this, you need to consider using a
byte array for your project instead of building a string.
This is simply because a string of, say, 1000 characters is
stored internally as Unicode (two bytes per character), which
makes it 2000 bytes. If your data is single-byte text, you
immediately save 50% of your storage space by using a byte
array instead of a string. The only sacrifice is that you
lose access to some of VB's pre-defined string handling code.
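The size difference is easy to check. The sketch below uses Python only as a calculator: "utf-16-le" stands in for the two-bytes-per-character layout of a VB string, and "ascii" for a byte array holding single-byte text.

```python
text = "A" * 1000

# UTF-16 little-endian uses two bytes per character for this text,
# the same width a VB string uses internally.
as_utf16 = text.encode("utf-16-le")

# One byte per character, as in a byte array holding single-byte text.
as_bytes = text.encode("ascii")

print(len(as_utf16))  # 2000
print(len(as_bytes))  # 1000
```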
If anyone is interested in the sample project I wrote then
email me at [email protected]. This sample is very
simple and uses the API calls to tell VB that a disk file
is REALLY some memory belonging to my application.
The lessons I followed to learn this technique are from Dan
Appleman's Win32 API book which I think every VB developer
must have (or have access to).
Regards
No - Only reads as it needs
I am fairly certain of this. The memory mapped file is only
read when someone (that would be me) tries to read some of the
contents of the memory. Using CopyMemory is the way to access
the memory (and that is the point at which the file is read).
I might continue to look for a better approach to the
question. If the file needs to be parsed several times from
start to finish, then this approach, while it will not need
much memory, might end up slower than another approach.
Cheers
Some results from my testing
OK. I made some big claims about speed and about loading large files into memory versus the alternatives.
Now I have made a DLL (only a VB one, mind you) and I have
tested it against InStr.
InStr blows it away on speed when finding text at the end of
the 8MB file I was using as my sample. I built the sample
from a VB-World HTML page which I appended to itself multiple
times. Then I added a text string to the end of the 8MB
file.
The purpose was to determine if the overhead in time to
load the file in as a string (not to mention memory limits
or anything else) was worth it.
In my tests, I found that with the 8MB string, InStr
returned the position within 600ms to 1300ms (it varied - I
guess the heap conditions affect it quite a bit). My DLL,
which could possibly be tweaked quite a bit, managed a
consistent 1500ms. The tweaking might reduce it to under
1000ms if I am lucky.
So, for each search, my DLL loses to InStr by up to 900ms, or
a factor of 2.5. I suspect that the factor is the better way
to measure the relative difference.
The time taken for me to load the 8MB file in as a string
was around 12s (12000ms). So, to save time overall using
InStr, I would need somewhere between about 13 and 60 InStr
calls in the code (12000ms divided by a saving of 900ms down
to 200ms per call).
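The break-even point can be checked from the timings quoted above. This is only a back-of-envelope sketch: it assumes the one-off cost of mapping the file is negligible next to the 12s load, and the function name is made up for illustration.

```python
load_ms = 12000        # one-off time to load the 8MB file into a string
dll_ms = 1500          # per-search time for the DLL on the mapped file
instr_fast_ms = 600    # best InStr time observed
instr_slow_ms = 1300   # worst InStr time observed

def break_even_calls(load_ms, dll_ms, instr_ms):
    """Searches needed before InStr's per-call saving repays the load time."""
    return load_ms / (dll_ms - instr_ms)

print(break_even_calls(load_ms, dll_ms, instr_fast_ms))  # about 13.3
print(break_even_calls(load_ms, dll_ms, instr_slow_ms))  # 60.0
```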
Now, if I can find an easy way (and I think the FoldString
API does this) to convert a byte array into a VB String
(not a simple translation, I fear), then this payoff point
will be a great deal lower.
The time to extract a portion of the string using my DLL or
Mid was negligible until I started trying for huge return
strings (2MB or so). When trying for 4MB, the built-in Mid
took under 80ms whereas my DLL took 5000ms. However, the
reason here is again the conversion to a VB String, which can
be sped up greatly once I find the DLL.
More tests should be performed using the same techniques, but
instead of using the Memory Mapped File, using normal VB code
to binary load a chunk of the file at a time. I would assume
that this would not be as fast, but that is yet to be seen.
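For comparison, the chunk-at-a-time alternative might look something like the sketch below, again in Python rather than VB. The function name and the 1MB default chunk size are arbitrary choices for illustration. The one subtlety is keeping the last few bytes of each chunk so that a match straddling two chunks is not missed.

```python
def find_in_chunks(path, needle, chunk_size=1 << 20):
    """Return the byte offset of needle in the file, or -1, reading a chunk at a time."""
    if not needle:
        return 0
    overlap = len(needle) - 1
    with open(path, "rb") as f:
        tail = b""    # last few bytes carried over from the previous window
        offset = 0    # file offset at which the current window starts
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1
            window = tail + chunk
            pos = window.find(needle)
            if pos != -1:
                return offset + pos
            # Keep enough bytes to catch a match straddling the boundary.
            tail = window[-overlap:] if overlap else b""
            offset += len(window) - len(tail)
```

Memory use stays at roughly one chunk regardless of file size, at the cost of re-reading the file from disk for every search.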
Conclusion so far for me is that it is worth investigating
further because of the possibility of a programmer wanting
to deal with a string larger than the available RAM. As
soon as this limit is reached, any programmer would need to
look for another way of dealing with "Strings".
Anyhow - I'm off to do more research.
Cheers
P.S. If this load of dribble is too boring for the thread, let me know and I'll stop posting ;)