-
Sep 8th, 2005, 07:18 PM
#1
Thread Starter
New Member
Binary Get# causes overflow on 10gb files [resolved]
Hi all,
I'm trying to use Binary access to read a large 10 GB file by moving to an exact point in the file and reading from there with the Get# statement.
Get# takes a Long as its position parameter to specify where to move to in the file. This naturally causes problems accessing large files past the ~2.1 GB point, and raises an Overflow error.
Is there an alternative approach I can take to read X characters from a specific point in a file and return a string? An API call perhaps?
Whatever the alternative, it needs to be lightning fast, to directly move to that point in the file instantaneously.
I've been searching the net/msdn/etc for a few hours now, with no luck.
I've seen approaches that split the file and then read it, but this won't work, as the code below runs thousands of times and needs to happen fairly quickly (a few ms per read). The approach below is ultra fast and does the trick for files under 2.1 GB, but we need to handle up to 10 GB.
Any help really appreciated!
current approach:
Open strFileName For Binary Access Read As #lngFNum
strData = Space$(4000)
Get #lngFNum, dblPos, strData
Last edited by AaronColvin; Sep 9th, 2005 at 12:04 AM.
Reason: resolved
-
Sep 8th, 2005, 07:52 PM
#2
Re: Binary Get# causes overflow on 10gb files
Are you reading the whole file in chunks? In that case do you really need to set the read position? The next Get# statement will read from the current position in the file.
-
Sep 8th, 2005, 09:39 PM
#3
Thread Starter
New Member
Re: Binary Get# causes overflow on 10gb files
Firstly, thanks for your reply!
I'm not sure I understand your question. Reading in chunks?
We aren't reading through the whole file and processing it as we go, if that's what you're asking. It's more a case of jumping into the file at a certain point, reading 4000 chars, and jumping back out. Then, depending on all sorts of business rules, it may need to jump back into the file at a completely different point, read 4000 chars, then jump back out again.
(I won't bother going into "why")
The Get# statement is normally perfect for this, as it is almost instant. But stupidly, it expects a Long to be passed for the byte position to start reading from. Obviously this then limits you to a max "start read position" of 2147483647, or ~2.147 GB.
We have identified that we can split the reference files in question into smaller 1-2 GB files while building them, and then simply continue to use the Get# method and just open the relevant reference file we are after, as opposed to the current "1 large reference file" approach. However, the re-design work surrounding this, especially at the 11th hour, is rather daunting. We have only just detected this issue during final Load Testing. Yes, a bit of a bungle on our part.
Ideally we are after an equivalent function that accepts a Double (or Single) data type instead of a Long data type. Surely there must be something?! I'd be stunned if it's not possible to seek to an exact point beyond 2 GB in a file and begin reading. In fact, I can think of many applications that must do this, e.g. very large DV/AVI video camera files, which are often well over 2 GB in raw DV format.
We also had some similar trouble performing a "get file size" call on files over 2.147 GB using the FileLen function (it returns a Long).
But we got past that with the following code (although there may be better ways):
Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" (ByVal lpFileName As String, ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, lpSecurityAttributes As Any, ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, ByVal hTemplateFile As Long) As Long
Private Declare Function GetFileSizeEx Lib "kernel32" (ByVal hFile As Long, lpFileSize As Currency) As Long 'the API returns a 32-bit BOOL, which maps to a VB Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Private Function GetLargeFileSize(ByVal strFileName As String) As Double
Const GENERIC_READ = &H80000000
Const FILE_SHARE_READ = &H1
Const OPEN_EXISTING = 3
Dim hFile As Long, nSize As Currency
'open the file
hFile = CreateFile(strFileName, GENERIC_READ, FILE_SHARE_READ, ByVal 0&, OPEN_EXISTING, ByVal 0&, ByVal 0&)
'
'get the filesize in currency
GetFileSizeEx hFile, nSize
'
'close the file
CloseHandle hFile
'
'Size in bytes return as double
GetLargeFileSize = nSize * 10000
End Function
-
Sep 8th, 2005, 11:01 PM
#4
Re: Binary Get# causes overflow on 10gb files
OK, I understand now what you need. Unfortunately VB doesn't offer this natively, since even the Seek statement uses a Long to position the file pointer. So you have to turn to the API. You need to open the file using CreateFile in the same manner as you're already doing to get the file size. Next you need to call the SetFilePointer function. This function takes two Long values to simulate a 64-bit integer to position the file pointer. However, the low-order Long is passed ByVal, and for this to work you would need to pass an unsigned long to this argument, and as you know VB doesn't support unsigned long integers. So the solution would be to make several calls to SetFilePointer and move it (at the most) 2 GB each time from the current position. You can then use the ReadFile function to read the number of bytes you need. ReadFile works on raw bytes, so you should read into a byte array instead of a string. You can then convert the byte array into a string using the StrConv function.
VB Code:
Private Declare Function SetFilePointer Lib "kernel32.dll" ( _
ByVal hFile As Long, _
ByVal lDistanceToMove As Long, _
ByRef lpDistanceToMoveHigh As Long, _
ByVal dwMoveMethod As Long) As Long
Private Declare Function ReadFile Lib "kernel32.dll" ( _
ByVal hFile As Long, _
ByRef lpBuffer As Any, _
ByVal nNumberOfBytesToRead As Long, _
ByRef lpNumberOfBytesRead As Long, _
ByRef lpOverlapped As Any) As Long
'constant values that can be used in the dwMoveMethod argument of SetFilePointer
Private Const FILE_BEGIN As Long = 0
Private Const FILE_CURRENT As Long = 1
Private Const FILE_END As Long = 2
'I have skipped the declaration of CreateFile and CloseHandle since you already have them.
'GENERIC_READ, FILE_SHARE_READ and OPEN_EXISTING must be declared at module level to be visible here.
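Public Function GetDbl(strFileName As String, ByVal dblPos As Double) As String
    'dblPos is passed ByVal so the caller's variable is not changed by the loop below.
    'It is a zero-based byte offset (Get# positions are one-based).
    Dim hFile As Long, bArr() As Byte
    Dim nBytesRead As Long, lngHigh As Long
    hFile = CreateFile(strFileName, GENERIC_READ, FILE_SHARE_READ, _
        ByVal 0&, OPEN_EXISTING, ByVal 0&, ByVal 0&)
    If hFile = -1 Then Exit Function 'INVALID_HANDLE_VALUE: the file could not be opened
    'A real variable is passed for the high-order Long instead of NULL, because with NULL
    'the call fails once the new position no longer fits in 32 bits. The API overwrites it
    'with the high part of the new position, so it is reset to 0 before every call.
    Do While dblPos > &H7FFFFFFF '= 2147483647
        lngHigh = 0
        Call SetFilePointer(hFile, &H7FFFFFFF, lngHigh, FILE_CURRENT)
        dblPos = dblPos - &H7FFFFFFF
    Loop
    If CLng(dblPos) Then
        lngHigh = 0
        Call SetFilePointer(hFile, CLng(dblPos), lngHigh, FILE_CURRENT)
    End If
    ReDim bArr(3999) As Byte 'arrays are zero based so this is 4000 bytes
    Call ReadFile(hFile, bArr(0), 4000, nBytesRead, ByVal 0&)
    Call CloseHandle(hFile)
    If nBytesRead > 0 Then
        ReDim Preserve bArr(nBytesRead - 1) 'keep only the bytes actually read
        GetDbl = StrConv(bArr, vbUnicode)
    End If
End Function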
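Since the original question asks for thousands of reads at a few ms each, it may be worth opening the file once and reusing the handle instead of calling CreateFile and CloseHandle on every read. The following is only a sketch of that idea, not tested against a 10 GB file. OpenRefFile, ReadChunk, CloseRefFile and m_hFile are hypothetical names, dblPos is assumed to be a zero-based byte offset, and the block is meant to sit in a module of its own, because it repeats the declarations so that it is self-contained.
VB Code:
Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" ( _
    ByVal lpFileName As String, ByVal dwDesiredAccess As Long, _
    ByVal dwShareMode As Long, lpSecurityAttributes As Any, _
    ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, _
    ByVal hTemplateFile As Long) As Long
Private Declare Function SetFilePointer Lib "kernel32" ( _
    ByVal hFile As Long, ByVal lDistanceToMove As Long, _
    ByRef lpDistanceToMoveHigh As Long, ByVal dwMoveMethod As Long) As Long
Private Declare Function ReadFile Lib "kernel32" ( _
    ByVal hFile As Long, ByRef lpBuffer As Any, _
    ByVal nNumberOfBytesToRead As Long, ByRef lpNumberOfBytesRead As Long, _
    ByRef lpOverlapped As Any) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Private Const GENERIC_READ As Long = &H80000000
Private Const FILE_SHARE_READ As Long = &H1
Private Const OPEN_EXISTING As Long = 3
Private Const FILE_BEGIN As Long = 0
Private Const FILE_CURRENT As Long = 1
Private Const INVALID_HANDLE_VALUE As Long = -1
'handle kept open between reads (hypothetical module-level variable)
Private m_hFile As Long
'
'open the reference file once, before the first read
Public Function OpenRefFile(ByVal strFileName As String) As Boolean
    m_hFile = CreateFile(strFileName, GENERIC_READ, FILE_SHARE_READ, _
        ByVal 0&, OPEN_EXISTING, 0, 0)
    OpenRefFile = (m_hFile <> INVALID_HANDLE_VALUE)
End Function
'
'read lngCount bytes starting at the zero-based offset dblPos
Public Function ReadChunk(ByVal dblPos As Double, ByVal lngCount As Long) As String
    Dim bArr() As Byte, nBytesRead As Long, lngHigh As Long
    If lngCount <= 0 Then Exit Function
    'rewind first: an open handle remembers where the previous read ended
    lngHigh = 0
    SetFilePointer m_hFile, 0, lngHigh, FILE_BEGIN
    'then step forward in moves of at most 2147483647 bytes
    Do While dblPos > &H7FFFFFFF
        lngHigh = 0 'the API overwrites this with the new high part, so reset it
        SetFilePointer m_hFile, &H7FFFFFFF, lngHigh, FILE_CURRENT
        dblPos = dblPos - &H7FFFFFFF
    Loop
    If dblPos > 0 Then
        lngHigh = 0
        SetFilePointer m_hFile, CLng(dblPos), lngHigh, FILE_CURRENT
    End If
    ReDim bArr(lngCount - 1)
    If ReadFile(m_hFile, bArr(0), lngCount, nBytesRead, ByVal 0&) <> 0 Then
        If nBytesRead > 0 Then
            ReDim Preserve bArr(nBytesRead - 1) 'keep only the bytes actually read
            ReadChunk = StrConv(bArr, vbUnicode)
        End If
    End If
End Function
'
'close the handle once, after the last read
Public Sub CloseRefFile()
    CloseHandle m_hFile
End Sub
Usage would be one OpenRefFile call at start-up, then strData = ReadChunk(dblPos - 1, 4000) wherever the old code did Get #lngFNum, dblPos, strData (the - 1 converts the one-based Get# position to a zero-based offset), and one CloseRefFile call at shutdown.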
Last edited by Joacim Andersson; Sep 8th, 2005 at 11:04 PM.
-
Sep 8th, 2005, 11:04 PM
#5
Re: Binary Get# causes overflow on 10gb files
AaronColvin,
You need to use Windows APIs to accomplish what you want to do. SetFilePointer has a limit of 2^64-2 or 18446744073709551614. That should do it for you.
VB Code:
Declare Function OpenFile Lib "kernel32" Alias "OpenFile" (ByVal lpFileName As String, lpReOpenBuff As OFSTRUCT, ByVal wStyle As Long) As Long
Declare Function ReadFile Lib "kernel32" Alias "ReadFile" (ByVal hFile As Long, lpBuffer As Any, ByVal nNumberOfBytesToRead As Long, lpNumberOfBytesRead As Long, lpOverlapped As OVERLAPPED) As Long
Declare Function SetFilePointer Lib "kernel32" Alias "SetFilePointer" (ByVal hFile As Long, ByVal lDistanceToMove As Long, lpDistanceToMoveHigh As Long, ByVal dwMoveMethod As Long) As Long
Declare Function lClose Lib "kernel32" Alias "_lclose" (ByVal hFile As Long) As Long
-
Sep 9th, 2005, 12:03 AM
#6
Thread Starter
New Member
Re: Binary Get# causes overflow on 10gb files
Thankyou both for your answers!
What you've listed above looks very promising. I havent had a chance to test it yet - i will try this on Monday.
Thanks again, very appreciated.
-
Sep 9th, 2005, 04:07 AM
#7
Re: Binary Get# causes overflow on 10gb files [resolved]
As an alternative to SetFilePointer you can use SetFilePointerEx. That way you don't have to split the distance to move into two parameters.
-
Sep 9th, 2005, 08:04 AM
#8
Re: Binary Get# causes overflow on 10gb files [resolved]
Well, SetFilePointerEx is only available in Win2000/XP/2003. However, the problem is that VB doesn't support unsigned long integers. SetFilePointer can also handle large files, since it takes a 64-bit distance. SetFilePointerEx uses two LARGE_INTEGER structures for the new position, but VB still can't handle unsigned values. This wouldn't have been a problem if the functions had used pointers for both the low-order and high-order arguments for the new file position; however, only the high-order DWORD (or LARGE_INTEGER in SetFilePointerEx) uses a pointer. That means you have to pass the low order by value, and how would you then pass the value 3 billion, for example?
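On the 3 billion question, one thing that may be worth testing is passing the same 32 bits as a negative Long, on the assumption that when a high-order variable is supplied the API only looks at the bit pattern of the low-order argument. The sketch below splits a Double offset that way and seeks in a single call. SplitOffset and SeekAbsolute are hypothetical helper names, and the Err.LastDllError check assumes VB records the last API error for Declare calls; treat it as a starting point rather than a verified solution.
VB Code:
Private Declare Function SetFilePointer Lib "kernel32" ( _
    ByVal hFile As Long, ByVal lDistanceToMove As Long, _
    ByRef lpDistanceToMoveHigh As Long, ByVal dwMoveMethod As Long) As Long
Private Const FILE_BEGIN As Long = 0
Private Const INVALID_SET_FILE_POINTER As Long = -1
'
'Hypothetical helper: splits a non-negative whole-number offset into two signed Longs
'that carry the same bits as the unsigned low and high DWORDs.
Private Sub SplitOffset(ByVal dblPos As Double, ByRef lngLow As Long, ByRef lngHigh As Long)
    Const TWO_32 As Double = 4294967296# '2 ^ 32
    Dim dblLow As Double
    lngHigh = CLng(Int(dblPos / TWO_32))
    dblLow = dblPos - lngHigh * TWO_32 'now in the range 0 to 4294967295
    'values of 2 ^ 31 and above wrap to the negative Long with the same bit pattern
    If dblLow >= 2147483648# Then dblLow = dblLow - TWO_32
    lngLow = CLng(dblLow)
End Sub
'
'Hypothetical helper: moves to an absolute zero-based offset in one call.
Public Function SeekAbsolute(ByVal hFile As Long, ByVal dblPos As Double) As Boolean
    Dim lngLow As Long, lngHigh As Long
    SplitOffset dblPos, lngLow, lngHigh
    If SetFilePointer(hFile, lngLow, lngHigh, FILE_BEGIN) = INVALID_SET_FILE_POINTER Then
        '-1 can also be a valid low DWORD, so the last API error decides
        SeekAbsolute = (Err.LastDllError = 0)
    Else
        SeekAbsolute = True
    End If
End Function
With this split, an offset of 3000000000 becomes lngHigh = 0 and lngLow = -1294967296, which has the same 32 bits as the unsigned value 3000000000.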
-
Sep 9th, 2005, 10:58 AM
#9
Re: Binary Get# causes overflow on 10gb files [resolved]
Judging from the file sizes, and from the GetFileSizeEx call Aaron is using (post #3), the operating system wouldn't be a problem. On other operating systems files can't be bigger than 4 GB, and GetFileSizeEx wouldn't function either.
Aaron is using a Currency datatype to handle the LARGE_INTEGER in GetFileSizeEx, so I would guess this would work for SetFilePointerEx as well. I haven't tried it though. Because a Currency is actually stored as a 64-bit integer value, I guess this will work, as long as you keep in mind to divide/multiply by 10000.
I wouldn't bother too much about the unsigned values, because the file would have to become really really large before VB would return a negative value instead of a positive value. If I haven't miscalculated, the file would have to be 8589934592 GB or larger for VB to return a negative value.
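A minimal sketch of the Currency idea, untested here in the same way as the suggestion itself. It assumes SetFilePointerEx can be declared with a ByVal Currency for the distance, so that VB passes the 64 bits by value. SeekEx is a hypothetical helper name, and dblPos is a zero-based byte offset.
VB Code:
Private Declare Function SetFilePointerEx Lib "kernel32" ( _
    ByVal hFile As Long, ByVal liDistanceToMove As Currency, _
    ByRef lpNewFilePointer As Currency, ByVal dwMoveMethod As Long) As Long
Private Const FILE_BEGIN As Long = 0
'
'Hypothetical helper: seeks to a zero-based byte offset given as a Double.
Public Function SeekEx(ByVal hFile As Long, ByVal dblPos As Double) As Boolean
    Dim curNewPos As Currency
    'Currency is a 64-bit integer scaled by 10000, so the byte offset is divided
    'by 10000 before it is passed; curNewPos * 10000 gives the new position in bytes.
    SeekEx = (SetFilePointerEx(hFile, CCur(dblPos / 10000#), curNewPos, FILE_BEGIN) <> 0)
End Function
For example, a byte offset of 3000000000 is passed as the Currency value 300000, whose underlying 64-bit integer is 3000000000.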
-
Sep 9th, 2005, 11:13 AM
#10
Re: Binary Get# causes overflow on 10gb files [resolved]
Joacim, I just noticed that the LARGE_INTEGER structure represents a SIGNED 64-bit integer, and not an unsigned one. This makes sense, because the liDistanceToMove parameter can accept negative values.
-
Sep 9th, 2005, 11:15 AM
#11
Re: Binary Get# causes overflow on 10gb files [resolved]
 Originally Posted by Frans C
I wouldn't bother too much about the unsigned values, because the file would have to become really really large before VB would return a negative value instead of a positive value. If I haven't miscalculated, the file would have to be 8589934592 GB or larger for VB to return a negative value.
??? This was the problem to start with... A VB Long is a signed 32-bit integer, so the highest value it can contain is 2147483647 (or 2 GB). Using the Currency datatype will not work since you need to pass the value ByVal. It works fine with GetFileSizeEx since it's then passed ByRef and you don't have to pass a number in it, since it's used for the return value (the Currency datatype is 64 bits in size but it's not an integer). So you still cannot pass a value greater than 2147483647 (or &H7FFFFFFF) to the function from VB, and there is the problem.
-
Sep 9th, 2005, 11:27 AM
#12
Re: Binary Get# causes overflow on 10gb files [resolved]
 Originally Posted by Frans C
Joacim, I just noticed that the LARGE_INTEGER structure represents a SIGNED 64-bit integer, and not an unsigned one. This makes sense, because the liDistanceToMove parameter can accept negative values.
Yes. The LARGE_INTEGER represents a signed 64-bit integer. But to be able to do so the low-order DWORD must be unsigned, and it is treated as such whenever the high-order DWORD pointer isn't NULL (meaning the pointer itself is not NULL, as opposed to pointing to a value of 0). So to represent the number 8589934591, for example (a 33-bit number where all bits are set to 1), the high-order DWORD must be set to 1 and the low-order DWORD must hold 4294967295 (or &HFFFFFFFF), which is an unsigned long value and twice as large as VB can handle (in VB the value &HFFFFFFFF equals -1 in a Long data type and passing -1 byval will not work).