[RESOLVED] Fastest way to read in a large text file (4GB+)
Hi
I seem to be hitting some limits on the Get statement when reading in a text file > 2GB in binary.
eg. Get #ff, byte position/number, buffer
When the byte position is > 2,147,000,000 I get the error "Bad record number". I presume this is because Get holds the byte number in something like a Long, which has a maximum value of 2,147,483,647.
Unfortunately, I need to be able to read in large text files fast. Line Input is just not up to the job due to speed issues, and byte arrays seem to max out somewhere between 400MB and 500MB. Are there any other options?
Re: Fastest way to read in a large text file (4GB+)
I've been Googling around and can't find any hits for exceeding a limit using Get, although the documentation does state that the record number is a Long.
You could try the ADODB Stream Object
Code:
'
' Assumes a reference to Microsoft ActiveX Data Objects 2.8 Library
'
Dim st As ADODB.Stream
Dim strData As String
Set st = New ADODB.Stream
st.Type = adTypeText 'Text Data
st.LineSeparator = adCRLF 'Line Terminator is CRLF
st.Charset = "ascii" 'Character set is ASCII
st.Open
st.LoadFromFile ("c:\MyApp\MyData.txt") ' Whatever your File Name is
'
' Read each record
'
Do Until st.EOS
    strData = st.ReadText(adReadLine)
    '
    '
    'etc
    '
Loop
st.Close
Set st = Nothing
Re: Fastest way to read in a large text file (4GB+)
Thanks for the reply Doogle, I just started googling and found this:
http://support.microsoft.com/kb/189981/en
I'll have a play around with it first, as it uses pure API calls without any dependencies and might be faster.
There are probably some functions in the FileSystemObject that might be able to get around the 2GB limit too,
e.g. http://support.microsoft.com/kb/186118
Re: Fastest way to read in a large text file (4GB+)
Have you seen this over in codebank?
Re: Fastest way to read in a large text file (4GB+)
Thanks for the link milk, that is how I ended up doing it (via API calls rather than FSO). :)
I used the CreateFile, WriteFile, ReadFile and CloseHandle APIs to get past the 2GB limit, tested up to 8GB files no worries:
http://msdn.microsoft.com/en-us/libr...(v=vs.85).aspx
http://msdn.microsoft.com/en-us/libr...(v=vs.85).aspx
http://msdn.microsoft.com/en-us/libr...(v=vs.85).aspx
http://msdn.microsoft.com/en-us/libr...(v=vs.85).aspx
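For anyone landing here later, this is a minimal sketch of the read side of that approach, not my exact code. The helper name CountBytes, the 1 MB chunk size and the use of FILE_ATTRIBUTE_NORMAL are assumptions chosen for illustration. The running total is kept in a Currency because a Long overflows past 2,147,483,647.

```vb
Option Explicit

' Standard kernel32 declarations for VB6 (ANSI version of CreateFile).
Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" ( _
    ByVal lpFileName As String, ByVal dwDesiredAccess As Long, _
    ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, _
    ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, _
    ByVal hTemplateFile As Long) As Long
Private Declare Function ReadFile Lib "kernel32" ( _
    ByVal hFile As Long, lpBuffer As Any, ByVal nNumberOfBytesToRead As Long, _
    lpNumberOfBytesRead As Long, ByVal lpOverlapped As Long) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long

Private Const GENERIC_READ As Long = &H80000000
Private Const FILE_SHARE_READ As Long = &H1
Private Const OPEN_EXISTING As Long = 3
Private Const FILE_ATTRIBUTE_NORMAL As Long = &H80
Private Const INVALID_HANDLE_VALUE As Long = -1
Private Const CHUNK_SIZE As Long = 1048576   ' 1 MB per read (arbitrary choice)

' Hypothetical helper: reads the whole file chunk by chunk and
' returns the number of bytes read as a Currency.
Public Function CountBytes(ByVal sPath As String) As Currency
    Dim hFile As Long
    Dim buf() As Byte
    Dim nRead As Long
    Dim total As Currency

    hFile = CreateFile(sPath, GENERIC_READ, FILE_SHARE_READ, 0&, _
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0&)
    If hFile = INVALID_HANDLE_VALUE Then
        Err.Raise vbObjectError + 1, , "CreateFile failed: " & Err.LastDllError
    End If

    ReDim buf(0 To CHUNK_SIZE - 1)
    Do
        ' ReadFile advances the file position itself, so no SetFilePointer call.
        If ReadFile(hFile, buf(0), CHUNK_SIZE, nRead, 0&) = 0 Then Exit Do
        If nRead = 0 Then Exit Do          ' end of file
        ' Process buf(0) To buf(nRead - 1) here.
        total = total + nRead
    Loop

    CloseHandle hFile
    CountBytes = total
End Function
```

Calling Debug.Print CountBytes("c:\MyApp\MyData.txt") from the Immediate window should print the file size in bytes, which makes it easy to check that nothing is lost past the 2GB mark.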
I found that I didn't need SetFilePointer: once I had set the byte array buffer size, the WriteFile and ReadFile functions automatically took care of the file position for me. Also, when writing large files I specified FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH (combined with Or) in the dwFlagsAndAttributes argument of CreateFile in preference to using FlushFileBuffers, because of the performance hit MS notes in the details for the FlushFileBuffers function:
"Due to disk caching interactions within the system, the FlushFileBuffers function can be inefficient when used after every write to a disk drive device when many writes are being performed separately. If an application is performing multiple writes to disk and also needs to ensure critical data is written to persistent media, the application should use unbuffered I/O instead of frequently calling FlushFileBuffers. To open a file for unbuffered I/O, call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. This prevents the file contents from being cached and flushes the metadata to disk with each write."