[RESOLVED] How to read some specific columns from a text file ?
Hello VB friends,
I have the following usual lines of code written to open a database in the text format. The database has 5 columns but for me are useful only two (col1 and col 3, which contain cities). My goal is to join these two columns in a single string (bigCities). I used the Input method because I have considered it is fast enough for my needs and a two dimensional array (myArr).
The issue is that I don't now how to avoid to load other 3 unuseful columns as long as I need only 2...I wonder if there are other more suitable aproach.
Code:
Dim fnum As Integer, i As Long
Dim myArr() As String, bigArray As String
Dim bigCities As String
ReDim myArr(1 To 5, 10000)
fnum = FreeFile
Open "myFile" For Input As #fnum
i = 0
Do While Not EOF(fnum)
Input #fnum, myArr(1, i), myArr(2, i), myArr(3, i), myArr(4, i), myArr(5, i)
i = i + 1
Loop
Close #fnum
After that, I had to extract the useful columns (1 and 3) in this way:
Code:
ReDim bigArray(1 To UBound(myArr, 2))
For i = 1 To UBound(myArr, 2)
bigArray(i) = myArr(1, i) & " " & myArr(3, i) & " "
Next i
And, finally, we will have:
Code:
bigCities = Join(bigArray)
Even the result is ok I feel the right approach should be other, a simpler one...
For example, I think the last next-for looping could be, in a way, avoided. But how? Thank you in advance.
Daniel
Last edited by Siddharth Rout; Jul 21st, 2012 at 05:29 AM.
Reason: Added Code Tags
Re: How to read some specific columns from a text file ?
You don't need to use any Arrays and you only need one loop
Code:
Dim strA As String
Dim strB As String
Dim strC As String
Dim strD As String
Dim strE As String
Dim bigCities As String
fnum = FreeFile
Open "myFile" For Input As #fnum
Do While Not EOF(fnum)
Input #fnum, strA, strB, strC, strD, strE
bigCities = bigCities & strA & " " & strC & " "
Loop
Close #fnum
'
' Remove trailing space
'
bigCities = Left$(bigCities, Len(bigCities) - 1)
Re: How to read some specific columns from a text file ?
Hello Doogle,
Thank you for your quick reply.
Honestly, I had hoped to find a better solution not a worse one...
My post has been just a reduced sample, the real text file has almost 4.5 milion rows... Your example, based on concatenation, is just a theoretical solution. If my aproach takes almost 18 seconds, yours should take hours...I have just tested it and after 15 minutes I gave up...
If you have doubts I can proove these sending you both files (vbp and txt).
Using arrays - approach which is not random at all - I tried to avoid a cummulative usage of memory that makes entire looping process very very slow.
Anyway, thank you for your help.
Re: How to read some specific columns from a text file ?
You could output to a temporary file all of the information you are interested in, then read that file afterward to access the data.
Code:
Dim fInput As Integer, fOutput As Integer
Dim sData(4) As String
Dim BigCities As String
fInput = FreeFile
Open "myfile" For Input As #fInput
fOutput = FreeFile
Open "tempfile" For Output As #fOutput
Do Until EOF(fInput)
Input #fInput, sData(0), sData(1), sData(2), sData(3), sData(4)
Print #fOutput, sData(0); " "; sData(2); " ";
Loop
Close #fInput
Close #fOutput
fInput = FreeFile
Open "tempfile" For Input As #fInput
BigCities = Input(LOF(fInput), fInput)
Close #fInput
You could also use the Mid statement to create the result string in memory. This would require more code, but would be faster than output to hard drive.
If you give a sample of your text file (the first several lines should do), we may be able to provide a better solution.
Re: How to read some specific columns from a text file ?
Hello guys,
I have just posted both the entire subroutine involved in the extraction and the first 100 rows of the text file. Please keep in mind that this file has 5 columns and almost 4230000 rows (112Mb) ... The size is a very important detail as long as we should avoid some aproaches which take time.
As you can see, I have kept an usual timer to make different trials worked around comparable. In my PC the below process took almost 12 seconds and I would consider this time as a reference.
Code:
'Load using the Input # statement (Fastest Method)
Private Sub btnInput_Click()
Dim fnum As Integer, i As Long
Dim myArray() As String, bigArray() As String
Dim start_time As Single, stop_time As Single
Dim pathFile As String, bigJoin As String
pathFile = App.Path & "\hugeFile.txt"
Screen.MousePointer = vbHourglass
DoEvents
'ReDim's should be done in run time but this takes time....
ReDim myArray(1 To 5, 6000000)
ReDim bigArray(6000000)
fnum = FreeFile
start_time = Timer
Open pathFile For Input As #fnum
i = 0
Do While Not EOF(fnum) 'Check for end of file
Input #fnum, myArray(1, i), myArray(2, i), myArray(3, i), myArray(4, i), myArray(5, i)
i = i + 1
Loop
Close #fnum
ReDim Preserve myArray(1 To 5, i)
ReDim bigArray(1 To UBound(myArray, 2))
'We keep only the first and the fourth column
For i = 1 To UBound(myArray, 2)
bigArray(i) = myArray(1, i) & " " & myArray(4, i) & " "
Next i
bigJoin = Join(bigArray)
stop_time = Timer
Label1.Caption = Format$(stop_time - start_time, "0.00") & " sec."
Screen.MousePointer = vbDefault
End Sub
Again, is something redundant in my aproach (in terms of time or coding) ? Maybe I could avoid, in a way, to load those three unuseful columns in myArray...Is there any other feasible aproach to work with a huge database like this? I have some doubts that using an ADO I could improve my process but I will try tommorow. At least if we take into account that my goal is to find some substrings in that huge string...
Thank you all for your support.
Re: How to read some specific columns from a text file ?
This is just about the fastest way to get the end result (space-delimited string containing all data from columns 1 and 4)
Code:
Dim Col1 As String, Col2 As String, Col3 As String, Col4 As String, Col5 As String
Dim fnum As Integer
Dim BigString As String
Dim lPos As Long
BigString = Space(1048576)
lPos = 1
fnum = FreeFile
Open "hugeFile.txt" For Input As #1
' Input column headers (this data is ignored)
Input #fnum, Col1, Col2, Col3, Col4, Col5
Do Until EOF(fnum)
Input #fnum, Col1, Col2, Col3, Col4, Col5
' Ensure enough space in buffer
If lPos + Len(Col1) + Len(Col4) + 1 > Len(BigString) Then
BigString = BigString & Space(Len(BigString))
End If
Mid(BigString, lPos) = Col1
lPos = lPos + Len(Col1) + 1
Mid(BigString, lPos) = Col4
lPos = lPos + Len(Col4) + 1
Loop
Close #fnum
BigString = Left$(BigString, lPos - 1)
Re: How to read some specific columns from a text file ?
it seems like its as fast as it can get already. unless like said before you make a temp file with the first and 4th column's info and go from that file if you need access.... of course opening a 100mb text file wont take 1 sec to load! There is no way around. its like if i play a game and it takes 45sec to load but i want it to load in 5sec wont be possible maybe 40sec if you tweak pc a bit, the pc has to process all the information(files) no matter how you want it to do it.
Re: How to read some specific columns from a text file ?
Originally Posted by Daniel Duta
Is there any other feasible aproach to work with a huge database like this? I have some doubts that using an ADO I could improve my process but I will try tommorow. At least if we take into account that my goal is to find some substrings in that huge string...
The obvious feasible approach would be to perform a one-off upload of the columns required to a proper Database engine such as SQL Server, especially if you're going to want to search for sub-strings.
Re: How to read some specific columns from a text file ?
Hello everybody,
Hats off to you Logophobic! Your method is 5 seconds faster...Is amazing indeed to be able to handle over 4 milions rows in 7 seconds without any other special class, API functions or library. I think the trick is you have increased the size of the BigString (1Mb) from beginning.
Nice approach and very didactic. You have 5 globes for this solution.
Thank you too Doogle for your interest. Of course, I will use specialized database engines but for more serious and complex tasks. In this case I was just looking for a simply and economic way to do some checkings. Besides, I have learnt a new solution to a problem and I discovered a bit of ingenuity.
Thank you all.