Thread: Importing a very large text file into a String variable

  1. #1

    Thread Starter
    Member
    Join Date
    Aug 2000
    Posts
    38

    Hello friends...

    I'm having an urgent problem here: I'm trying to import a very large text file of 54+ MB(!!) into a String variable, but each time the machine crashes...

    Does anyone know what the problem is? As far as I can see, a String variable can hold up to about 2 billion characters (according to the VB Help)...

    PS. It works fine with large files of 10 to 20 MB.
    PPS. The machines that need to process those files are Pentium IIIs with 128 MB of RAM...

    Here's the code I use in my program:

    ====================================

    Public Function ImportFile(iMessage As Integer, Optional ByVal fLatestFile As Boolean = False, _
        Optional sFileName As String) As String

        Open sFileName For Input As #1
        ImportFile = Input(lLength, #1)   ' <- crash!
        Close #1
        ...
    End Function

    ====================================

    lLength contains a value like 55000000.

    Is it crashing because of a memory overflow?

    Please help me if you know a solution for this...

    Thank you

    Bart

  2. #2
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Try Binary
    Code:
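    Open sFileName For Binary As #1
    Get #1, , ImportFile
    Close #1
    Also, if that doesn't work, a byte array takes half the space:
    Code:
    Dim buffer() As Byte
    Open sFileName For Binary As #1
    ReDim buffer(0 To LOF(1) - 1)   ' size the array first; Get fills exactly this many bytes
    Get #1, , buffer
    Close #1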
    Use
    writing software in C++ is like driving rivets into steel beam with a toothpick.
    writing haskell makes your life easier:
    reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
    To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.

  3. #3

    Thread Starter
    Member
    Join Date
    Aug 2000
    Posts
    38

    Well, it doesn't import anything... the string (ImportFile) is empty. Possibly because the Get statement expects the data to be formatted as some sort of record structure. The files I need to import contain just one long string...

    So I need to use this code again...
    ImportFile = Input(lLength, #1)
    -> Crash

    pfff

    I really don't know how to solve this one...

  4. #4
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    let's try again
    Code:
    Open sFileName For Binary As #1
    ImportFile = Space(LOF(1))   ' Get reads Len(ImportFile) bytes, so pre-size the string
    Get #1, , ImportFile
    Close #1

  5. #5

    Thread Starter
    Member
    Join Date
    Aug 2000
    Posts
    38

    Oh my...
    Nope, doesn't work either.

    The system crashes here:
    -> ImportFile = Space(LOF(1))

    Do you have another suggestion?

    By the way, thanks for the help so far.

  6. #6
    Frenzied Member Jop's Avatar
    Join Date
    Mar 2000
    Location
    Amsterdam, the Netherlands
    Posts
    1,986
    Hey KEDAMAN! Nice to see you again, man!!!
    Balip, I'm just guessing, but try this API thingy... you only need the ReadFile API to read the file into a byte array.

    Not sure if it works for you. You don't need the rest of the code, but it may come in handy sometime.

    Code:
    Const MOVEFILE_REPLACE_EXISTING = &H1
    Const FILE_ATTRIBUTE_TEMPORARY = &H100
    Const FILE_BEGIN = 0
    Const FILE_SHARE_READ = &H1
    Const FILE_SHARE_WRITE = &H2
    Const CREATE_NEW = 1
    Const OPEN_EXISTING = 3
    Const GENERIC_READ = &H80000000
    Const GENERIC_WRITE = &H40000000
    Private Declare Function SetVolumeLabel Lib "kernel32" Alias "SetVolumeLabelA" (ByVal lpRootPathName As String, ByVal lpVolumeName As String) As Long
    Private Declare Function WriteFile Lib "kernel32" (ByVal hFile As Long, lpBuffer As Any, ByVal nNumberOfBytesToWrite As Long, lpNumberOfBytesWritten As Long, ByVal lpOverlapped As Any) As Long
    Private Declare Function ReadFile Lib "kernel32" (ByVal hFile As Long, lpBuffer As Any, ByVal nNumberOfBytesToRead As Long, lpNumberOfBytesRead As Long, ByVal lpOverlapped As Any) As Long
    Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" (ByVal lpFileName As String, ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Any, ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, ByVal hTemplateFile As Long) As Long
    Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
    Private Declare Function SetFilePointer Lib "kernel32" (ByVal hFile As Long, ByVal lDistanceToMove As Long, lpDistanceToMoveHigh As Long, ByVal dwMoveMethod As Long) As Long
    Private Declare Function SetFileAttributes Lib "kernel32" Alias "SetFileAttributesA" (ByVal lpFileName As String, ByVal dwFileAttributes As Long) As Long
    Private Declare Function GetFileSize Lib "kernel32" (ByVal hFile As Long, lpFileSizeHigh As Long) As Long
    Private Declare Function GetTempFileName Lib "kernel32" Alias "GetTempFileNameA" (ByVal lpszPath As String, ByVal lpPrefixString As String, ByVal wUnique As Long, ByVal lpTempFileName As String) As Long
    Private Declare Function MoveFileEx Lib "kernel32" Alias "MoveFileExA" (ByVal lpExistingFileName As String, ByVal lpNewFileName As String, ByVal dwFlags As Long) As Long
    Private Declare Function DeleteFile Lib "kernel32" Alias "DeleteFileA" (ByVal lpFileName As String) As Long
    Private Sub Form_Load()
        'KPD-Team 1998
        'URL: http://www.allapi.net/
        'E-Mail: [email protected]
        Dim sSave As String, hOrgFile As Long, hNewFile As Long, bBytes() As Byte
        Dim sTemp As String, nSize As Long, Ret As Long
        'Ask for a new volume label
        sSave = InputBox("Please enter a new volume label for drive C:\" + vbCrLf + " (if you don't want to change it, leave the textbox blank)")
        If sSave <> "" Then
            SetVolumeLabel "C:\", sSave
        End If
        
        'Create a buffer
        sTemp = String(260, 0)
        'Get a temporary filename
        GetTempFileName "C:\", "KPD", 0, sTemp
        'Remove all the unnecessary chr$(0)'s
        sTemp = Left$(sTemp, InStr(1, sTemp, Chr$(0)) - 1)
        'Set the file attributes
        SetFileAttributes sTemp, FILE_ATTRIBUTE_TEMPORARY
        'Open the files
        hNewFile = CreateFile(sTemp, GENERIC_WRITE, FILE_SHARE_READ Or FILE_SHARE_WRITE, ByVal 0&, OPEN_EXISTING, 0, 0)
        hOrgFile = CreateFile("c:\config.sys", GENERIC_READ, FILE_SHARE_READ Or FILE_SHARE_WRITE, ByVal 0&, OPEN_EXISTING, 0, 0)
        
        'Get the file size
        nSize = GetFileSize(hOrgFile, 0)
        'Set the file pointer
        SetFilePointer hOrgFile, Int(nSize / 2), 0, FILE_BEGIN
        'Create an array of bytes
        ReDim bBytes(1 To nSize - Int(nSize / 2)) As Byte
        'Read from the file
        ReadFile hOrgFile, bBytes(1), UBound(bBytes), Ret, ByVal 0&
        'Check for errors
        If Ret <> UBound(bBytes) Then MsgBox "Error reading file ..."
        
        'Write to the file
        WriteFile hNewFile, bBytes(1), UBound(bBytes), Ret, ByVal 0&
        'Check for errors
        If Ret <> UBound(bBytes) Then MsgBox "Error writing file ..."
        
        'Close the files
        CloseHandle hOrgFile
        CloseHandle hNewFile
        
        'Move the file
        MoveFileEx sTemp, "C:\KPDTEST.TST", MOVEFILE_REPLACE_EXISTING
        'Delete the file
        DeleteFile "C:\KPDTEST.TST"
        Unload Me
    End Sub
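    A trimmed-down sketch of just the ReadFile part, reading a whole file into a Byte array, might look like this in VB6. ReadAllBytes is a hypothetical name; the Declares are adapted from the ones above, with the As Any pointer parameters changed to ByVal Long, so it belongs in its own module. It assumes the file is under 2 GB (the high half of the size is ignored), and an empty file comes back as an unallocated array.

    ```vb
    Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" ( _
        ByVal lpFileName As String, ByVal dwDesiredAccess As Long, _
        ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, _
        ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, _
        ByVal hTemplateFile As Long) As Long
    Private Declare Function ReadFile Lib "kernel32" (ByVal hFile As Long, _
        lpBuffer As Any, ByVal nNumberOfBytesToRead As Long, _
        lpNumberOfBytesRead As Long, ByVal lpOverlapped As Long) As Long
    Private Declare Function GetFileSize Lib "kernel32" (ByVal hFile As Long, _
        lpFileSizeHigh As Long) As Long
    Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long

    Private Const GENERIC_READ As Long = &H80000000
    Private Const FILE_SHARE_READ As Long = &H1
    Private Const OPEN_EXISTING As Long = 3
    Private Const INVALID_HANDLE_VALUE As Long = -1

    ' Hypothetical helper: returns the whole file as a Byte array.
    Public Function ReadAllBytes(ByVal sFile As String) As Byte()
        Dim hFile As Long, lSize As Long, lHigh As Long
        Dim lRead As Long, lOk As Long
        Dim b() As Byte

        hFile = CreateFile(sFile, GENERIC_READ, FILE_SHARE_READ, 0&, OPEN_EXISTING, 0&, 0&)
        If hFile = INVALID_HANDLE_VALUE Then Err.Raise vbObjectError + 513, , "Cannot open " & sFile

        lSize = GetFileSize(hFile, lHigh)      ' low 32 bits only
        If lSize > 0 Then
            ReDim b(0 To lSize - 1)
            ' Pass the first element ByRef so ReadFile writes straight into the array
            lOk = ReadFile(hFile, b(0), lSize, lRead, 0&)
        End If
        CloseHandle hFile

        If lSize > 0 And (lOk = 0 Or lRead <> lSize) Then
            Err.Raise vbObjectError + 514, , "Error reading " & sFile
        End If
        ReadAllBytes = b
    End Function
    ```

    The resulting array can then be turned into a String with StrConv(bytes, vbUnicode).
    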
    Jop - validweb.nl

    Alcohol doesn't solve any problems, but then again, neither does milk.

  7. #7
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Hi there Jop!

    Balip, maybe you actually ran out of memory, and there wasn't enough hard disk space to swap to either?!? Did you get any error messages, or did it simply crash VB? Did Jop's API code work?

  8. #8

    Thread Starter
    Member
    Join Date
    Aug 2000
    Posts
    38

    Goooooooooood Mor-ning Jop and Kedaman!!! ;-)

    Well guys... I tried Kedaman's code this evil morning and it worked (didn't crash, OLE!), but what do I have to do with the result in the byte array?

    I need to parse the content of the file as a whole, so I need one string I can search in... can I convert the byte array one way or another so I get what I want? (Pleaszzzzze say yes?!)

    Jop, it could be that I ran out of memory...
    The system crashes when VB is putting the imported data from the file into the string. This takes a very long time and the hard disk lights flash heavily... then, all of a sudden, the system hangs (that's in debug mode; if I make an EXE of it, the application just crashes but the system doesn't).

    I really think the systems I need to work with here are up to date: 128 MB, 1+ GB of free hard disk space, Pentium III etc.

    Do you guys have other ideas???? The people who hired me are going to kill me if they find out that my backup program can't parse their extremely large text files...

  9. #9
    Frenzied Member Jop's Avatar
    Join Date
    Mar 2000
    Location
    Amsterdam, the Netherlands
    Posts
    1,986
    Hmm... I tried importing a 25 MB file into a string (Open file For Binary...).
    It didn't actually crash my system, but it took so long (5+ minutes) that I decided to CTRL+ALT+DELETE and kill my app.
    But I have an idea: why don't you sell your prog as a Winamp plugin? It sounded great, hehe. Just ship a 100 MB file, read it, and Winamp is acting cool.

    No, but seriously now.
    You need it for a backup program? Why would it need a 50 MB TEXT file? Can't you split it into smaller parts?

    If you really need that big file, read it in small chunks...
    Or use C++/Assembler.

    < Jop's wondering how progs like Media Player open their very big MPG files without even slowing down >

    < Jop suddenly remembers: buffering! >

    < Jop still doesn't know why the hard disk isn't that busy then >

    Anyway, how the hell do they do that?


    Have fun anyway

  10. #10

    Thread Starter
    Member
    Join Date
    Aug 2000
    Posts
    38

    Thx Jop

    The solution of reading it in small chunks just takes too damn long, and my system crashes even sooner, because it has to open the string, append a little part to it, read from the file again, and so on...

    Too bad I can't program in C++ or Assembler, so I need another solution... I can't tell why the system just hangs when it's transferring the data into the string. Microsoft says a string can contain up to 2 billion characters... damn Microsoft!

    Do you have an idea how I can convert the byte array to a string? I need the whole string because the structure looks like XML, you know, with "open" and "close" tags...

    Maybe you have an idea how I can read from a specified location in a file (the result needs to be a string)?

    Thanks for your help so far!

  11. #11
    Frenzied Member Jop's Avatar
    Join Date
    Mar 2000
    Location
    Amsterdam, the Netherlands
    Posts
    1,986
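    So you *can* read it into a byte array? This code works for me:

    Code:
    'After loading into Byte array B()
    ' Fine for small arrays; for tens of MB, StrConv is far faster
    Dim x As Long, s As String
    For x = LBound(B) To UBound(B)
        s = s & Chr$(B(x))   ' Chr$ turns each byte value into a character
    Next x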
    Hope that helps!!!

  12. #12
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Hey guys, btw Jop, I finally got my connection!!!
    Code:
    ImportFile = StrConv(buffer, vbUnicode)   ' ANSI bytes -> VB String
    Balip, yep, it's that easy and fast to convert a byte array to a string. Hmm, it may crash again if you don't have the memory it needs, but I hope it works.
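    Putting the byte-array read from post #2 and StrConv together, a minimal VB6 sketch might look like this. LoadFileText is a hypothetical name, and it assumes the file is ANSI (single-byte) text. While StrConv runs, the Byte array and the new String (two bytes per character) exist at the same time, so a 55 MB file briefly needs roughly 165 MB, which may be part of why a 128 MB machine struggles.

    ```vb
    ' Hypothetical helper: loads an ANSI text file into a VB String by
    ' reading the raw bytes in Binary mode and converting them with StrConv.
    Public Function LoadFileText(ByVal sFileName As String) As String
        Dim iFile As Integer, lLen As Long
        Dim bData() As Byte

        iFile = FreeFile
        Open sFileName For Binary Access Read As #iFile
        lLen = LOF(iFile)
        If lLen > 0 Then
            ReDim bData(0 To lLen - 1)   ' Get fills exactly lLen bytes
            Get #iFile, , bData
        End If
        Close #iFile

        If lLen > 0 Then
            ' ANSI bytes -> Unicode String (vbUnicode, not vbFromUnicode)
            LoadFileText = StrConv(bData, vbUnicode)
        End If
    End Function
    ```

    Usage, with a hypothetical path: sData = LoadFileText("C:\data\big.txt").
    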

  13. #13

    Thread Starter
    Member
    Join Date
    Aug 2000
    Posts
    38

    Well Guys,

    I made it work

    Apparently VB can't read one string of 40,000,000+ characters in one go (pff).
    So I opened the file in MultiEdit (very good text editor!) and saved it under the same name... When I opened the file again, MultiEdit had added some line feeds/line breaks to it, and guess what... VB can import the file!

    In VB I used this code to import it, because the system keeps crashing when you try to import the data all at once like I did before (see the code I posted earlier)...

    ==========
    Public Function ImportFile(...) As String
    ...
    Open sFileName For Input As #1
    On Error GoTo ErrorHandler

    While Not EOF(1)
        Line Input #1, strTmp              ' note: Line Input drops the CR/LF
        strTmpData = strTmpData & strTmp

        ' Move the ~2 MB staging string into the ~20 MB one, and the
        ' ~20 MB one into the result, so the big strings are copied less often
        If Len(strTmpData) >= 2000000 Then
            If Len(strFileData) < 20000000 Then
                strFileData = strFileData & strTmpData
            Else
                ImportFile = ImportFile & strFileData
                strFileData = strTmpData   ' keep this chunk instead of dropping it
            End If

            strTmpData = ""
        End If
    Wend

    If strTmpData <> "" Then
        strFileData = strFileData & strTmpData
    End If

    ImportFile = ImportFile & strFileData
    strFileData = ""
    strTmpData = ""
    strTmp = ""
    Close #1
    ...
    End Function
    ============

    Maybe you guys know something to speed it up? Right now it takes 10 to 15 minutes to import the data...

    OK, this worked, but now I'm suffering from the fact that processing the string (searching with InStr()) has slowed down so much that you could write an MP3 song by hand and still be done before it is!

    But, lucky me, their normal system (PDMAIN = IBM/Oracle) also crashed due to the heavy load of data from those files (they grew again, to about 65 MB!), and maybe they are going to cut those files into pieces or remove a lot of data from them...

    So, thanks for all your help; someday my program will run like it was meant to.

    Bye,
    Bart



  14. #14
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Balip, don't use Line Input. If you want my advice, use Binary mode and get the data in chunks instead.

    Did you try out StrConv?

  15. #15

    Thread Starter
    Member
    Join Date
    Aug 2000
    Posts
    38

    Hi Kedaman

    To get the binary input in chunks, I'll have to use this code, right?

    Code:
    Open sFileName For Binary As #1 
    Get#1,,ImportFile
    Close
    And then I convert it to ASCII using this code?

    Code:
    ImportFile = StrConv(buffer, vbUnicode)
    What do I need to replace the "buffer" variable with?
    And what's the real difference between using Line Input and the Get statement?

    Geezzz, I've never worked with binary input before...

    I have a strong feeling you can help me with this code, can't you???

    PS. Do you have another idea how I can search in a --LARGE-- string? InStr() just takes sooooooooooooooo long when you put it in a loop...

    Bye for now, and thx for the reply
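    One likely cause of a slow InStr loop is cutting the big string with Mid$ or Right$ after each match, which copies it on every pass. A minimal VB6 sketch that instead passes InStr a start position (CountOccurrences is a hypothetical helper; it counts non-overlapping matches with a binary, case-sensitive comparison):

    ```vb
    ' Hypothetical helper: counts non-overlapping occurrences of sFind in sText.
    ' sText is passed ByRef (the VB default), so the big string is not copied.
    Public Function CountOccurrences(sText As String, sFind As String) As Long
        Dim lPos As Long, lCount As Long

        If Len(sFind) = 0 Then Exit Function
        lPos = InStr(1, sText, sFind, vbBinaryCompare)
        Do While lPos > 0
            lCount = lCount + 1
            ' Resume just after this match instead of cutting the string with Mid$
            lPos = InStr(lPos + Len(sFind), sText, sFind, vbBinaryCompare)
        Loop
        CountOccurrences = lCount
    End Function
    ```

    The same pattern (keep a position, never shorten the string) applies when walking "open" and "close" tags one by one.
    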

  16. #16
    Frenzied Member Jop's Avatar
    Join Date
    Mar 2000
    Location
    Amsterdam, the Netherlands
    Posts
    1,986
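    The "buffer" is the Byte array you filled with Get, so:

    Code:
    ImportFile = StrConv(buffer, vbUnicode)
    Does that help? (If you Get straight into a String, VB already converts the bytes for you and you don't need StrConv at all.)

    Hehe, and Line Input does just what it says: it reads the file line by line, so it's a bit slow.

    And the Binary thing gets X bytes at a time.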
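    Since Get also accepts an explicit byte position, reading from a specific place in the file (as asked in post #10) can be sketched like this in VB6. ReadChunkAt and the path are hypothetical; positions are 1-based byte offsets, and it assumes single-byte (ANSI) text so one byte maps to one character.

    ```vb
    ' Hypothetical helper: returns up to lCount characters starting at the
    ' 1-based byte position lPos of a file already opened For Binary as #iFile.
    Public Function ReadChunkAt(ByVal iFile As Integer, ByVal lPos As Long, _
                                ByVal lCount As Long) As String
        Dim lLeft As Long, sBuf As String

        lLeft = LOF(iFile) - lPos + 1            ' bytes available from lPos on
        If lPos < 1 Or lLeft <= 0 Or lCount <= 0 Then Exit Function
        If lCount > lLeft Then lCount = lLeft

        sBuf = Space$(lCount)                    ' Get reads Len(sBuf) bytes
        Get #iFile, lPos, sBuf                   ' jump straight to lPos
        ReadChunkAt = sBuf
    End Function

    ' Example use (the path is hypothetical)
    Private Sub DemoReadChunkAt()
        Dim iFile As Integer, s As String
        iFile = FreeFile
        Open "C:\data\big.txt" For Binary Access Read As #iFile
        s = ReadChunkAt(iFile, 1000001, 65536)   ' 64 KB starting at byte 1,000,001
        Close #iFile
        Debug.Print Len(s)
    End Sub
    ```
    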

  17. #17
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Code:
    Dim buffer As String, ImportFile As String, chunksize As Long
    chunksize = 65536
    buffer = Space(chunksize)
    Open strFile For Binary As #1
        Do While LOF(1) - Loc(1) > chunksize
            Get #1, , buffer              ' reads Len(buffer) bytes
            ImportFile = ImportFile & buffer
        Loop
        buffer = Space(LOF(1) - Loc(1))   ' whatever is left
        If Len(buffer) Then
            Get #1, , buffer
            ImportFile = ImportFile & buffer
        End If
    Close #1
    Binary chunk reading is a bit more involved than that: you read small parts, in this case 65,536 bytes per chunk, and append them to ImportFile. That avoids one huge read buffer, which may be what caused your problem. As Jop explained, Line Input is much slower, because it has to check each byte for a line feed to know where to stop reading, and it has to format the data for the variable you read into. Binary reading just gets the raw data from the file directly into the variable's memory.
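    Each ImportFile = ImportFile & buffer builds a new string and copies everything read so far, so the total copying grows roughly with the square of the file size. A variation, sketched below in VB6 under the same single-byte (ANSI) assumption, allocates the result once with Space$ and fills it with the Mid$ statement. LoadFileChunked is a hypothetical name; it is a Sub with a ByRef argument so the large result is not copied again on return.

    ```vb
    ' Hypothetical variant of the loop above: the result is allocated once
    ' and each 64 KB chunk is copied into place with the Mid$ statement.
    ' Assumes single-byte (ANSI) text, so one byte = one character.
    Public Sub LoadFileChunked(ByVal sFileName As String, sResult As String)
        Const CHUNK As Long = 65536
        Dim iFile As Integer, lTotal As Long, lDone As Long, lSize As Long
        Dim sBuf As String

        iFile = FreeFile
        Open sFileName For Binary Access Read As #iFile
        lTotal = LOF(iFile)
        sResult = Space$(lTotal)                     ' one allocation for the whole file

        Do While lDone < lTotal
            lSize = lTotal - lDone
            If lSize > CHUNK Then lSize = CHUNK
            If Len(sBuf) <> lSize Then sBuf = Space$(lSize)
            Get #iFile, , sBuf                       ' reads Len(sBuf) bytes
            Mid$(sResult, lDone + 1, lSize) = sBuf   ' overwrite in place, no new string
            lDone = lDone + lSize
        Loop
        Close #iFile
    End Sub
    ```

    Usage, with a hypothetical path: Dim sData As String, then LoadFileChunked "C:\data\big.txt", sData.
    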

  18. #18
    New Member
    Join Date
    Mar 2000
    Posts
    13
    Hi,
    Interesting reading...

    I've run into a problem with a little app and am now searching for quicker ways to open a text file, split every line, search and replace some Swedish characters, and finally put it in a text box.

    Right now it takes minutes to open an 89 KB file....

    Any suggestions?
    /Tote
    VB6 SP4

  19. #19
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Open the file in Binary mode, get the data into one string, and use Replace to replace the words, then put it in the text box. Why do you need to split it anyway?

  20. #20
    New Member
    Join Date
    Mar 2000
    Posts
    13
    I'm trying to open it in Binary mode, but I only get the first 65,536 characters, so it seems the buffer size can't be bigger.

    I'm using this code:
    ////
    ff = FreeFile
    Open strFile For Binary As ff
    buffer = Space(LOF(ff))
    Text2.Text = LOF(ff) 'just for control use
    Get #ff, , buffer
    Close ff
    Text1.Text = buffer
    ////

    In Text2 I get a value > 65536.

    Why do I want to split every line?
    Well, it's an app that converts text files...

    Original file example:
    Name: Doe John
    Phone: 555-1234
    --

    Name: Doe Jane
    Phone: 555-7890
    --


    And need it to look like this:
    John;Doe;555-1234
    Jane;Doe;555-7890
    /Tote

  21. #21
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Well, replace the line feed with ; instead, then.
    No, the TextBox can't hold more than 65,536 bytes, and no null chars are allowed either. Do the split after replacing.
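    A minimal VB6 sketch of the conversion Tote describes, working on the whole file loaded as one string, is below. ConvertRecords is hypothetical, and it assumes CRLF line endings, a "Name:" line holding the last name, a space, then the first name, and a "--" line closing each record. It splits on line breaks rather than replacing them, as a line-by-line alternative to the Replace approach.

    ```vb
    ' Hypothetical helper: turns blocks like
    '   Name: Doe John / Phone: 555-1234 / --
    ' into lines like "John;Doe;555-1234".
    Public Function ConvertRecords(ByVal sText As String) As String
        Dim sLines() As String, sParts() As String
        Dim sName As String, sPhone As String, sOut As String
        Dim i As Long

        sLines = Split(sText, vbCrLf)
        For i = 0 To UBound(sLines)
            If Left$(sLines(i), 5) = "Name:" Then
                sName = Trim$(Mid$(sLines(i), 6))      ' "Doe John"
            ElseIf Left$(sLines(i), 6) = "Phone:" Then
                sPhone = Trim$(Mid$(sLines(i), 7))     ' "555-1234"
            ElseIf Trim$(sLines(i)) = "--" Then
                sParts = Split(sName, " ", 2)          ' last name, first name
                If UBound(sParts) = 1 Then
                    sOut = sOut & sParts(1) & ";" & sParts(0) & ";" & sPhone & vbCrLf
                End If
                sName = "": sPhone = ""
            End If
        Next i
        ConvertRecords = sOut
    End Function
    ```

    Blank lines between records are simply skipped, since they match none of the three cases.
    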

  22. #22
    New Member
    Join Date
    Mar 2000
    Posts
    13
    OK, then I'll have to use a RichTextBox.
    I now show the file after the replace but before the split, and it opens in seconds.
    My only excuse is that in Sweden it's 2:30 AM right now...

    Thanks for the help, kedaman

    /Tote

  23. #23
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Ah, well, it's almost 4 AM here in Finland. How do you actually stay awake?

  24. #24
    New Member
    Join Date
    Mar 2000
    Posts
    13
    Coffee...
    /Tote

  25. #25
    Hyperactive Member
    Join Date
    Jun 2000
    Location
    Auckland, NZ
    Posts
    411

    Better late than never?

    I know this thread is approaching the end, but it is
    interesting to me so I thought I'd look into it next week
    during any spare time. It may amount to nothing, but
    in case I find a solution along the path I am going to
    investigate, I will let you know.

    I am interested in researching "Memory Mapped Files". To
    quote Dan Appleman from his fine book, "There is really no
    difference between a file and memory. Ah, I know what
    you’re thinking: Surely such a statement is the product of
    a hallucination. We all know that these are two different
    things.

    All of the material presented here is copyrighted by either
    Desaware or Macmillan. No part of this material may be used
    or reproduced in any fashion (except in brief quotations
    used in critical articles and reviews) without prior
    consent."

    So you see, there may in fact be no need to open the file
    and read the string at all. Since the file already exists
    on disk, and you may wish to open a file larger than your
    available RAM (including swap), it makes sense to try this
    set of API calls.

    If anyone can beat me to it, then post here so I don't
    run around re-inventing the wheel.

    Thanks
    Paul Lewis

  26. #26
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Paul, you're right: variables are just connected to RAM by the virtual machine and to the hard disk by virtual memory, but I've never heard of Memory Mapped Files.
    Hmm, also I think it actually reads the files into memory every time you read the swap file, and that's why it's called "swap", isn't it?

    [Edited by kedaman on 10-15-2000 at 06:09 AM]

  27. #27
    Hyperactive Member
    Join Date
    Jun 2000
    Location
    Auckland, NZ
    Posts
    411

    OK, done what I wanted to do

    I know I said "next week when I get some spare time", but
    there were no good movies on TV so I had a little tinker
    with the Memory Mapped Files.

    Where I think they will be useful (in general) is:
    Speed: less than 100ms to map the file as memory.
    Sharing: Different applications can share the memory
    (which happens to be stored in a file).

    Where I think it could help those applications where large
    files are involved is that you would not need to load the
    entire file into memory. This is a bad idea in general
    unless you know that your file is always less than a
    certain limit. But even then, what if your file is 1GB in
    size? Will you ensure you have enough memory (real and
    virtual) to load the file? I think not. I also seriously
    doubt that your user would enjoy waiting for a 1GB file to
    be read if all he wanted to do was to search for a sequence
    of characters.

    So, if the only reason balip (or whoever else is
    interested) loads the file into memory (in a string) is to
    use InStr, then I suspect that a far far better idea is to
    use the memory mapped file idea, write or find a dll that
    allows you to find a byte sequence in a given memory range
    (in fact I am 100% certain this will be built into the
    Win32 API - will take a look soon). Another function needed
    is to extract a string from between two memory addresses
    (like Mid does already).

    Even if you don't do all this, you need to consider using a
    byte array for your project instead of building a string.
    This is simply because a string of say 1000 characters is
    stored internally as Unicode, which makes it 2000 bytes.

    You immediately save 50% of your storage space by using a
    byte array instead of a string. The only sacrifice is that
    you have less access to some pre-defined VB string handling
    code.
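    The two-bytes-per-character point is easy to check in VB6 with Len and LenB. This tiny sketch (ShowStringSize is a hypothetical name) prints the character count and byte count of a 1000-character string, then the size of the same text as an ANSI Byte array.

    ```vb
    Private Sub ShowStringSize()
        Dim s As String, b() As Byte

        s = String$(1000, "a")
        Debug.Print Len(s), LenB(s)              ' 1000 characters, 2000 bytes
        b = StrConv(s, vbFromUnicode)            ' ANSI: one byte per character
        Debug.Print UBound(b) - LBound(b) + 1    ' 1000 bytes
    End Sub
    ```
    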

    If anyone is interested in the sample project I wrote then
    email me at [email protected]. This sample is very
    simple and uses the api calls to tell VB that a disk file
    is REALLY some memory belonging to my application.

    The lessons I followed to learn this technique are from Dan
    Appleman's Win32 API book which I think every VB developer
    must have (or have access to).

    Regards
    Paul Lewis

  28. #28
    Hyperactive Member
    Join Date
    Jun 2000
    Location
    Auckland, NZ
    Posts
    411

    No - Only reads as it needs

    I am fairly certain of this fact. The memory mapped file
    is only read if someone (that would be me) tries to read
    some of the contents of the memory. Using CopyMemory is
    the way to go about accessing the memory (and hence the
    file is read).
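    For readers who want to try the idea, here is a minimal VB6 sketch of a read-only memory-mapped file using CreateFileMapping, MapViewOfFile, and CopyMemory (the usual alias for RtlMoveMemory). It is not Paul's sample; ReadMapped is a hypothetical helper. It maps the whole file, which reserves address space rather than RAM, and copies one slice out. The caller must keep lCount above zero and lOffset + lCount inside the file, and an empty file cannot be mapped.

    ```vb
    Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" ( _
        ByVal lpFileName As String, ByVal dwDesiredAccess As Long, _
        ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, _
        ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, _
        ByVal hTemplateFile As Long) As Long
    Private Declare Function CreateFileMapping Lib "kernel32" Alias "CreateFileMappingA" ( _
        ByVal hFile As Long, ByVal lpFileMappingAttributes As Long, _
        ByVal flProtect As Long, ByVal dwMaximumSizeHigh As Long, _
        ByVal dwMaximumSizeLow As Long, ByVal lpName As Long) As Long
    Private Declare Function MapViewOfFile Lib "kernel32" ( _
        ByVal hFileMappingObject As Long, ByVal dwDesiredAccess As Long, _
        ByVal dwFileOffsetHigh As Long, ByVal dwFileOffsetLow As Long, _
        ByVal dwNumberOfBytesToMap As Long) As Long
    Private Declare Function UnmapViewOfFile Lib "kernel32" (ByVal lpBaseAddress As Long) As Long
    Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
    Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" ( _
        Destination As Any, ByVal Source As Long, ByVal Length As Long)

    Private Const GENERIC_READ As Long = &H80000000
    Private Const FILE_SHARE_READ As Long = &H1
    Private Const OPEN_EXISTING As Long = 3
    Private Const INVALID_HANDLE_VALUE As Long = -1
    Private Const PAGE_READONLY As Long = &H2
    Private Const FILE_MAP_READ As Long = &H4

    ' Hypothetical helper: copies lCount bytes starting at zero-based lOffset.
    Public Function ReadMapped(ByVal sFile As String, ByVal lOffset As Long, _
                               ByVal lCount As Long) As Byte()
        Dim hFile As Long, hMap As Long, pView As Long
        Dim b() As Byte

        hFile = CreateFile(sFile, GENERIC_READ, FILE_SHARE_READ, 0&, OPEN_EXISTING, 0&, 0&)
        If hFile = INVALID_HANDLE_VALUE Then Err.Raise vbObjectError + 513, , "Cannot open " & sFile

        ' Maximum size 0, 0 means "the whole file"
        hMap = CreateFileMapping(hFile, 0&, PAGE_READONLY, 0&, 0&, 0&)
        If hMap <> 0 Then
            ' Map the whole file; pages are read from disk only when touched
            pView = MapViewOfFile(hMap, FILE_MAP_READ, 0&, 0&, 0&)
            If pView <> 0 Then
                ReDim b(0 To lCount - 1)
                CopyMemory b(0), pView + lOffset, lCount   ' touches only these pages
                UnmapViewOfFile pView
            End If
            CloseHandle hMap
        End If
        CloseHandle hFile

        If pView = 0 Then Err.Raise vbObjectError + 514, , "Cannot map " & sFile
        ReadMapped = b
    End Function
    ```

    The slice can be turned into a String with StrConv(bytes, vbUnicode); looping over consecutive slices gives the chunk-at-a-time search described in the later posts.
    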

    I might continue to look if there is a better approach than
    this to the question. If the file needs to be parsed
    several times from start to finish then this approach while
    it will not need much memory, might end up slower than
    another approach.

    Cheers
    Paul Lewis

  29. #29
    Hyperactive Member
    Join Date
    Jun 2000
    Location
    Auckland, NZ
    Posts
    411

    Some results from my testing

    OK. I made some big claims about speed and how to load large files into memory or alternatives.

    Now I have made a dll (only a VB one mind you) and I have
    tested it against InStr.

    InStr blows it away as far as speed to find text at the end
    of the 8MB file I was using as my sample. I used a VB-
    World html page which I then appended to itself multiple
    times. Then I added a text string to the end of the 8MB
    file.

    The purpose was to determine if the overhead in time to
    load the file in as a string (not to mention memory limits
    or anything else) was worth it.

    In my tests, I found that with the 8MB string, InStr
    returned the position within 600ms to 1300ms (it varied - I
    guess the heap conditions affect it quite a bit). My DLL
    which could possibly be tweaked quite a bit, managed a
    consistent 1500ms . The tweaking might reduce it to under
    1000ms if I am lucky.

    So, for each InStr, I would lose by up to 900ms or a factor
    of 2.5. I suspect that the factor is the way to measure
    the relative difference.

    The time taken for me to load the 8MB file in as a string
    was around 12s (12000ms). So to save time overall, using
    instr, I would have to have about 8000 InStr calls in the
    code.
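    Written out under a simple model (the string is loaded once, then each InStr call saves the difference between the DLL time and the InStr time), the break-even call count from the timings above would be:

    ```latex
    N^{*} = \frac{T_{\text{load}}}{t_{\text{DLL}} - t_{\text{InStr}}}
          = \frac{12000\ \text{ms}}{1500\ \text{ms} - t_{\text{InStr}}},
    \qquad t_{\text{InStr}} \in [600,\ 1300]\ \text{ms}
    \;\Rightarrow\; N^{*} \in [13.3,\ 60]
    ```

    Under that model, InStr on the loaded string starts to pay off after roughly 14 to 60 calls.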

    Now, if I can find an easy way (and I think the FoldString
    API does this) to convert a byte array into a VB String
    (not a simple translation I fear), then this payoff will be
    a great deal lower.

    Time to extract a portion of the string using my DLL or Mid
    was negligible until I started trying for huge return
    strings (like 2MB or so). When trying for 4MB, the built
    in Mid performed in < 80ms whereas my DLL did it in
    5000ms. However the reason here is again in the conversion
    to a VB String which can be sped up greatly once I find the
    DLL.

    More tests should be performed using the same techniques I
    use except instead of using the Memory Mapped File, use
    normal VB code to binary load a chunk of the file at a
    time. I would assume that this would not be as fast but it
    is yet to be seen.

    Conclusion so far for me is that it is worth investigating
    further because of the possibility of a programmer wanting
    to deal with a string larger than the available RAM. As
    soon as this limit is reached, any programmer would need to
    look for another way of dealing with "Strings".

    Anyhow - I'm off to do more research.
    Cheers

    P.S. If this load of drivel is too boring for the thread, let me know and I'll stop posting
    Paul Lewis

  30. #30
    Hyperactive Member
    Join Date
    Jun 2000
    Location
    Auckland, NZ
    Posts
    411

    Ha Ha

    Well, I should have read the previous posts properly, because
    the guys (kedaman and others) pointed out the StrConv
    utility, which I glossed over. I figured it couldn't be
    that easy to convert a byte array to a Unicode String... hehe

    So now the modified code runs like this:
    Where as it was taking 12000ms to load the 8MB string, it
    now takes 2200ms.

    So this means now I'd have to perform about 1500 InStr
    commands on the loaded string instead of using my DLL in
    order to start winning in the time stakes. Much more
    acceptable but still quite a lot of calls.

    Also, the modified DLL version of Mid takes about 250ms
    instead of the 5000ms it used to... StrConv really rocks.

    By implementing a different method in my DLL (using Instr
    but on a chunk of the source string at a time), I get
    speeds of between 1000ms to 1300ms (instead of the
    consistent 1500ms). The variation is due to my using InStr
    again since it has proven its worth to me (hehe). So now,
    the number of InStr operations needed to beat the Memory
    Mapped File and byte array is about 2200 operations (at
    worst) and break even at best.


    Conclusion:
    By not loading the whole file into memory, and only loading
    chunks of it at a time, there is potential to cut the load
    times of applications that read huge strings from a file by
    a very large fraction. Instead of minutes of waiting just
    to load the string, you can load only the parts you need
    for the operation you are performing. For example, suppose
    the string you want happens to be inside the first 1KB of
    an 8MB string: the memory-mapped method only needs to
    access the disk for the first chunk of data (in my case I
    use 64000 bytes per chunk).

    I also found that increasing the chunk size (to 640KB, for
    example) drastically reduced the time for the DLL to find
    the string as well. This is to be expected, of course.

    I'll put together a new version of my sample for those that
    showed an interest.

    Regards
    Paul Lewis

  31. #31

    Thread Starter
    Member
    Join Date
    Aug 2000
    Posts
    38

    Talking

    WOW!
    HeAvY CoDeD mAn!

    That's a lot of information you just typed there, Paul!

    I'm going to analyse your sample today and I'll let you know what the result is, okay?

    P.S. I really appreciate your work!

    Bart

  32. #32
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658
    We have been over the whole chunk reading thing already. Check it out here
    Iain, that's with an i by the way!

  33. #33
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Code:
    Private Function FindStr(String1 As String) As Long
      ' simple method to find the position of one sequence of bytes
      ' (representing a string) in myBytes; returns a 0-based offset, or -1
      ' need to discover an existing dll in win32api that does this already
      ' just pass the dll the two byte arrays and the two array lengths,
      ' and it should return the position.  THIS MUST ALREADY EXIST I AM SURE
      Dim b() As Byte
      Dim c As Long, d As Long
      Dim found As Boolean
      
      If Len(String1) = 0 Then FindStr = -1: Exit Function
      ' ANSI bytes of the search string, same encoding as the file data
      b = StrConv(String1, vbFromUnicode)
      
      ' stop early enough that myBytes(c + d) never runs past the end
      For c = 0 To UBound(myBytes) - UBound(b)
        found = True
        For d = 0 To UBound(b)
          If myBytes(c + d) <> b(d) Then
            found = False
            Exit For
          End If
        Next
        If found Then Exit For
      Next
      
      If found Then FindStr = c Else FindStr = -1
    End Function
    Paul, I had a look at your project and it's amazing! I'm not sure, though: did you use InStr or not? I once tried to write a faster version of InStr myself using byte arrays; ALAS! It's slow. So in this case, did you replace the byte-array searching with InStr again? By converting back to Unicode with StrConv (this function's the best thing I know next to CopyMemory) and then doing the search with InStr, you'll save a lot of time. But I asked around, and I'm sure there is a faster function than InStr; take a look at the Like operator, it compares really fast!
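    A sketch of kedaman's suggestion (convert with StrConv, then let InStr do the scan). It returns the same 0-based offset / -1 convention as FindStr above for non-empty search strings. FindStrInStr is a hypothetical name, and a single-byte ANSI code page is assumed so character and byte offsets agree:

```vb
' Sketch: search a byte array by converting it once and calling InStr.
' Assumption: single-byte ANSI data, so string index - 1 = byte offset.
Private Function FindStrInStr(buf() As Byte, ByVal sFind As String) As Long
    Dim lHit As Long
    ' InStr is 1-based and returns 0 when nothing is found,
    ' so subtracting 1 gives a 0-based offset, or -1 for "not found"
    lHit = InStr(1, StrConv(buf, vbUnicode), sFind, vbBinaryCompare)
    FindStrInStr = lHit - 1
End Function
```

    For repeated searches it is cheaper to convert the byte array once and keep the resulting string than to call StrConv on every search.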


  34. #34
    Hyperactive Member
    Join Date
    Jun 2000
    Location
    Auckland, NZ
    Posts
    411

    Thanks

    Thanks kedaman and balip, I hope my experimentation with
    the API and whatnot was of use. I certainly enjoyed
    researching the question because it has taught me several
    things I didn't know before.


    To Iain,
    It is irrelevant if "We have been over the whole chunk
    reading thing already", because, as you will no doubt
    agree, learning by doing is far more permanent than
    learning by watching someone else do it. So I have learned
    some things by trying out some ideas I had, which, I might
    add, is the whole point of answering questions... One of
    the things I learned about was memory-mapped files, so for
    that one thing alone I feel my time was worthwhile (for
    me)...


    I did ask in an earlier post if anyone already knew about
    this stuff to save me re-inventing the wheel too...

    Anyhow... off to work I go...

    Cheers


    [Edited by PaulLewis on 10-16-2000 at 03:54 PM]
    Paul Lewis

  35. #35

    Thread Starter
    Member
    Join Date
    Aug 2000
    Posts
    38

    Question

    Yepididoo, here I am again...

    Say Paul, I used your/kedaman's fast way of importing data using Binary reading...

    It's indeed a lot faster (about 40%), but there's one big problem:
    it just doesn't copy all the characters that are in the data file (??)

    I used this code:
    Code:
    hFile = CreateFile(strMapFile, GENERIC_READ Or GENERIC_WRITE, 0, ByVal 0, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0)
    
    'Importing data into String
            lMaxSize = FileLen(strMapFile) - 1
    
            ReDim myBytes(lMaxSize - 1) As Byte
            CopyMemory myBytes(0), ByVal hAddress, lMaxSize
    ImportFile = StrConv(myBytes, vbUnicode)
    I don't get any error messages, and I already tried setting lMaxSize to 1000 characters more than FileLen() says the file is.
    Result: the string (after conversion) is the same length as it was before I changed lMaxSize...
    When I look at the characters at the end of the string using the Right() function, it is still a cut-off piece of data...
    I don't know how to make it read until the end of the file.

    Do you know a solution, or do I need to keep using Line Input? That works great right now... a little bit slower, but hey, it works...
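    One likely culprit in the code above is the sizing. lMaxSize = FileLen(...) - 1 followed by ReDim myBytes(lMaxSize - 1) gives an array of FileLen - 1 bytes, so at least the final byte is dropped. Copying more bytes than the file contains cannot produce more file data. Whether that explains all of the truncation depends on code not shown, such as how hAddress was obtained. Below is a self-contained sketch with consistent sizing, using the standard kernel32 file-mapping calls with a read-only mapping. ImportMappedFile is a hypothetical name:

```vb
' Win32 declarations (ANSI versions) for a read-only memory-mapped file
Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" _
    (ByVal lpFileName As String, ByVal dwDesiredAccess As Long, _
     ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, _
     ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, _
     ByVal hTemplateFile As Long) As Long
Private Declare Function CreateFileMapping Lib "kernel32" Alias "CreateFileMappingA" _
    (ByVal hFile As Long, ByVal lpAttributes As Long, ByVal flProtect As Long, _
     ByVal dwMaximumSizeHigh As Long, ByVal dwMaximumSizeLow As Long, _
     ByVal lpName As String) As Long
Private Declare Function MapViewOfFile Lib "kernel32" _
    (ByVal hFileMappingObject As Long, ByVal dwDesiredAccess As Long, _
     ByVal dwFileOffsetHigh As Long, ByVal dwFileOffsetLow As Long, _
     ByVal dwNumberOfBytesToMap As Long) As Long
Private Declare Function UnmapViewOfFile Lib "kernel32" (ByVal lpBaseAddress As Long) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, Source As Any, ByVal Length As Long)

Private Const GENERIC_READ As Long = &H80000000
Private Const FILE_SHARE_READ As Long = &H1
Private Const OPEN_EXISTING As Long = 3
Private Const FILE_FLAG_SEQUENTIAL_SCAN As Long = &H8000000
Private Const INVALID_HANDLE_VALUE As Long = -1
Private Const PAGE_READONLY As Long = &H2
Private Const FILE_MAP_READ As Long = &H4

Private Function ImportMappedFile(ByVal strMapFile As String) As String
    Dim hFile As Long, hMap As Long, hAddress As Long
    Dim lSize As Long
    Dim myBytes() As Byte

    lSize = FileLen(strMapFile)          ' the number of bytes, NOT minus 1
    If lSize = 0 Then Exit Function      ' an empty file cannot be mapped

    hFile = CreateFile(strMapFile, GENERIC_READ, FILE_SHARE_READ, 0, _
                       OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0)
    If hFile = INVALID_HANDLE_VALUE Then Exit Function

    hMap = CreateFileMapping(hFile, 0, PAGE_READONLY, 0, 0, vbNullString)
    If hMap <> 0 Then
        hAddress = MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0)   ' 0 = map the whole file
        If hAddress <> 0 Then
            ReDim myBytes(lSize - 1)                      ' indices 0..lSize-1 = lSize bytes
            CopyMemory myBytes(0), ByVal hAddress, lSize  ' copy exactly lSize bytes
            ImportMappedFile = StrConv(myBytes, vbUnicode)
            UnmapViewOfFile hAddress
        End If
        CloseHandle hMap
    End If
    CloseHandle hFile
End Function
```

    The key point is that the ReDim upper bound and the CopyMemory length both come from the same byte count, so the array, the copy and the file agree.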

    ------
    I've also got another question...

    I'm using this code:
    Code:
            ImportFile = Replace(ImportFile, Chr(13), "", , , vbBinaryCompare)
            ImportFile = Replace(ImportFile, Chr(10), "", , , vbBinaryCompare)
    to strip out the carriage returns and line feeds, but these calls are sooooo
    sllooooooowwwwwww, you just can't imagine how slow they are!
    Do you know a way of removing these characters in a byte array?

    Thank You!
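    A sketch of removing CR and LF directly in the byte array before conversion. It assumes the bytes are single-byte ANSI, so bytes 13 and 10 are always CR and LF. It makes one pass and one StrConv call, instead of two Replace calls that each build a new copy of the whole string. StripCrLf is a hypothetical name, and it compacts the caller's array in place:

```vb
' Sketch: drop every CR (13) and LF (10) byte in one pass, then convert once.
' NOTE: buf is passed ByRef and is compacted/resized in place.
' Assumption: single-byte ANSI data.
Private Function StripCrLf(buf() As Byte) As String
    Dim i As Long, n As Long
    For i = 0 To UBound(buf)
        Select Case buf(i)
            Case 10, 13                 ' LF, CR: skip this byte
            Case Else
                buf(n) = buf(i)         ' n <= i, so unread bytes are never overwritten
                n = n + 1
        End Select
    Next
    If n = 0 Then Exit Function         ' nothing but CR/LF: returns ""
    ReDim Preserve buf(n - 1)           ' trim to the bytes that were kept
    StripCrLf = StrConv(buf, vbUnicode)
End Function
```

    Because the kept bytes are shifted down within the same array, no second 54 MB buffer is needed before the final StrConv.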
