-
Apr 8th, 2021, 07:12 AM
#1
Thread Starter
Fanatic Member
[RESOLVED] .txt files
Without using Excel or Access, how can I sort a .txt file on one of the fields?
Thanks
-
Apr 8th, 2021, 07:31 AM
#2
Re: .txt files
You can read the text file in an array and sort the the array.
You can use ADO:
http://www.vb-helper.com/howto_ado_load_csv.html
What is your goal?
-
Apr 8th, 2021, 07:43 AM
#3
Re: .txt files
What would be the FORMAT of the text file? Arnoutdv provides your answer, but maybe if you'd provide the format of the file, someone could help you more quickly with an example.
Sam I am (as well as Confused at times).
-
Apr 8th, 2021, 07:52 AM
#4
Re: .txt files
Originally Posted by Arnoutdv
Huh? Didn't know that one.
Now, the interesting question would be: Does this work with a "SELECT....INTO....." Statement?
Aircode-Algorithm
Source-File: source.txt
Target-File: target.txt
Do "SELECT * FROM source.txt INTO target.txt ORDER BY SomeField"
Check if "source.txt" is open <---- Is this necessary when accessing via ADO? Is ADO opening the file/keeping it open while the recordset exists?
If not, kill "source.txt", rename "target.txt" to "source.txt"
Voila: Sorted textfile
Thinking about it: That should only work if the first row (?) are unique column-names
Last edited by Zvoni; Tomorrow at 31:69 PM.
----------------------------------------------------------------------------------------
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------------------
People call me crazy because i'm jumping out of perfectly fine airplanes.
---------------------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad
-
Apr 8th, 2021, 08:53 AM
#5
Thread Starter
Fanatic Member
Re: .txt files
Ok, let me consolidate everything. I have a .txt file that grows form users running another program. Each record has about 35 characters, 4 fields. I imagine the a .txt file has a maximum size of 32K, just like a listbox. I have no idea how many records that translates to, with regards to my record length. However, I don't think the number of recs are important. It's just the file size. I believe, I haven't check that the VB code "Len(file.txt)" will give me file size. I can easily check to see if I am approaching that size and force the user to delete the file and start again. I have no idea how long it would take to get to 32K, but I guess it could be at least one year.
I still have the constraint that I cannot use Excel or Access. But I do need to sort the file by the second field. I know I can create a second file reformatted so the sorting field is first. Then I would sort descending. If I sort Ascending, I don't know how to read a .txt from the bottom up. I don't think that Arnoudtv's suggestion is using Access. I never used it before. I usually use DAO for coding because I never understood the ADO connection string.
-
Apr 8th, 2021, 09:10 AM
#6
Re: .txt files
What kind of delimiter is used for your text file?
Does it need to be written back to disk?
Can you show the first 10 records/lines of a sample file?
-
Apr 8th, 2021, 09:23 AM
#7
Thread Starter
Fanatic Member
Re: .txt files
Comma delimited. Rewriting back to disk is not required as long as the original text file remains intact
Example:
3/10/2021 9:56:54,125,91,"L"3/10/2021 12:11:17,145,109,"L"
3/10/2021 13:20:20,140,137,"L"
3/10/2021 16:32:01,130,114,"L"
3/10/2021 16:50:02,180,101,"L"
3/10/2021 18:04:43,60,95,"L"
3/10/2021 23:12:47,60,96,"L"
3/11/2021 8:40:38,1260,122,"W"
3/11/2021 8:41:23,260,123,"L"
3/11/2021 9:12:16,135,95,"L"
The second field is what needs to be sorted. The entire record needs to be sorted with it.
-
Apr 8th, 2021, 10:09 AM
#8
Re: .txt files
I assume there is a typo in the first line?
-
Apr 8th, 2021, 10:10 AM
#9
Thread Starter
Fanatic Member
Re: .txt files
Yes, it seems 2 lines got concatenated
-
Apr 8th, 2021, 10:15 AM
#10
Re: .txt files
You should think about this problem in a language agnostic manner. The steps that you need to take are:
- Read the CSV file into a collection
- Convert the columns into their appropriate data types
- Sort the converted collection on the desired column
- Write the results back to the CSV file
With that being said, are you stuck on a particular step in the process? In other words, in VB6, do you know how to read a CSV file? If so, then do you know how to convert the columns to their appropriate data types? etc.
-
Apr 8th, 2021, 10:23 AM
#11
Re: .txt files
With a basic sort on text data the numeric field will not be sorted correct.
It will be a text sort instead of a numeric sort.
So for this sample I created an UDT with a numeric variable which will be used to sort the data.
Code:
Option Explicit
Private Type tpData
aData() As String
lSortValue As Long
End Type
Private Sub Form_Load()
Dim sData As String
Dim aData() As String
Dim tData() As tpData
Dim i As Long
sData = "3/10/2021 9:56:54,125,91,""L"""
sData = sData & vbCrLf & "3/10/2021 12:11:17,145,109,""L"""
sData = sData & vbCrLf & "3/10/2021 13:20:20,140,137,""L"""
sData = sData & vbCrLf & "3/10/2021 16:32:01,130,114,""L"""
sData = sData & vbCrLf & "3/10/2021 16:50:02,180,101,""L"""
sData = sData & vbCrLf & "3/10/2021 18:04:43,60,95,""L"""
sData = sData & vbCrLf & "3/10/2021 23:12:47,60,96,""L"""
sData = sData & vbCrLf & "3/11/2021 8:40:38,1260,122,""W"""
sData = sData & vbCrLf & "3/11/2021 8:41:23,260,123,""L"""
sData = sData & vbCrLf & "3/11/2021 9:12:16,135,95,""L"""
aData = Split(sData, vbCrLf)
ReDim tData(UBound(aData))
For i = 0 To UBound(aData)
tData(i).aData = Split(aData(i), ",")
tData(i).lSortValue = CLng(tData(i).aData(1))
Next i
ShellSortData tData
For i = 0 To UBound(tData)
Debug.Print Join(tData(i).aData, ", ")
Next i
End Sub
Private Sub ShellSortData(tData() As tpData)
Dim lLBound As Long, lUBound As Long
Dim lLoop As Long, lHold As Long, lHValue As Long
Dim tTemp As tpData
lLBound = LBound(tData): lUBound = UBound(tData)
lHValue = lLBound
Do
lHValue = 3 * lHValue + 1
Loop Until lHValue > lUBound
Do
lHValue = lHValue / 3
For lLoop = lHValue + lLBound To lUBound
tTemp = tData(lLoop)
lHold = lLoop
Do While tData(lHold - lHValue).lSortValue > tTemp.lSortValue
tData(lHold) = tData(lHold - lHValue)
lHold = lHold - lHValue
If lHold < lHValue Then Exit Do
Loop
tData(lHold) = tTemp
Next lLoop
Loop Until lHValue = lLBound
End Sub
-
Apr 8th, 2021, 10:33 AM
#12
Re: .txt files
It sounds like writing the sorted data isn't required. Your intended use of the sorted data might help you choose what tools to use. There are numerous options but picking a good one should probably take "output impedance matching" into account.
But it doesn't sound like efficiency is too important here, either in terms of memory footprint or CPU cycles burnt. So the best match here might be in terms of "least amount of code to write and maintain."
This isn't just a sorting problem because you also have a parsing problem. Some approaches may be more tolerant of garbage like dropped line separators in the input, but I'm not sure any will recover from trash fully. Some may ignore "junk" at the ends of lines better, but even then it probably means lost data.
Then there is the issue of localization. CSV can mean different things and timestamp format can sometimes be an issue. I saw no mention of character encoding either.
So I expect this to turn into another painfully long socratic thread where people throw out solutions and then issue by issue we get more crumbs of the actual problem.
-
Apr 8th, 2021, 10:55 AM
#13
Thread Starter
Fanatic Member
Re: .txt files
OK. Time to restate the problem in more detail.
I have a .txt file, as described in a previous post. This file can be anywhere from one line to several thousand lines. I do not want to store the entire file into the program, in either a sorted listbox control or an array. I want to access data directly from the file. In some instances I can read the file sequentially and extract the data, or summaries as needed. In other cases, I need to have a sorted file, by one of the fields. I can easily create a new file with the copy of the sorted field as the first field. Somehow sort the records in desc order into another file. Then just read the the sorted file as you would any other .txt file.
-
Apr 8th, 2021, 11:16 AM
#14
Re: .txt files
Here is something that can read .csv files into an ADODB.Recordset (needs project reference to Microsoft ActiveX Data Object 2.8 Library)
Code:
' Module1
Option Explicit
Public Function ReadFromExcel( _
ByVal sFileName As String, _
Optional Workbook As String, _
Optional ByVal CsvHeader As Boolean) As Recordset
Dim BOM_UTF As String: BOM_UTF = Chr$(&HEF) & Chr$(&HBB) & Chr$(&HBF)
Dim BOM_UNICODE As String: BOM_UNICODE = Chr$(&HFF) & Chr$(&HFE)
Dim cn As ADODB.Connection
Dim rsDest As Recordset
Dim sTable As String
Dim sCharset As String
On Error GoTo EH
'--- open connection
Set cn = New ADODB.Connection
On Error GoTo 0
If LCase$(Right$(sFileName, 5)) = ".xlsb" Then
cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & sFileName & ";Extended Properties=Excel 12.0"
ElseIf LCase$(Right$(sFileName, 5)) = ".xlsx" Then
cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & sFileName & ";Extended Properties=Excel 12.0 Xml"
ElseIf LCase$(Right$(sFileName, 4)) = ".csv" Then
sCharset = pvReadFromExcelPrefix(sFileName, 3)
If Left$(sCharset, 2) = BOM_UNICODE Then
sCharset = "CharacterSet=Unicode" & vbCrLf
ElseIf Left$(sCharset, 3) = BOM_UTF Then
sCharset = "CharacterSet=65001" & vbCrLf
Else
sCharset = vbNullString
End If
Workbook = Mid$(sFileName, InStrRev(sFileName, "\") + 1)
sFileName = Left$(sFileName, InStrRev(sFileName, "\"))
With New ADODB.Stream
.Open
.WriteText "[" & Workbook & "]" & vbCrLf & _
"Format=Delimited(,)" & vbCrLf & _
"DecimalSymbol=." & vbCrLf & _
"CurrencyDecimalSymbol=." & vbCrLf & _
"CurrencyThousandSymbol=" & vbCrLf & _
"ColNameHeader=" & CsvHeader & vbCrLf & _
"MaxScanRows=0" & vbCrLf & _
sCharset
.SaveToFile sFileName & "schema.ini", adSaveCreateOverWrite
End With
cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & sFileName & ";Extended Properties=Text"
Else
cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & sFileName & ";Extended Properties=Excel 8.0"
End If
On Error GoTo EH
If cn.State <> adStateOpen Then
Exit Function
End If
'--- figure out table name
If LenB(Workbook) <> 0 Then
sTable = Workbook
Else
With cn.OpenSchema(adSchemaTables)
Do While LCase$(!TABLE_NAME.Value) = "database"
.MoveNext
Loop
sTable = Replace(!TABLE_NAME.Value, "''", "'")
End With
End If
'--- open table
Set rsDest = New ADODB.Recordset
rsDest.CursorLocation = adUseClient
rsDest.Open sTable, cn, , adLockOptimistic, adCmdTableDirect
If rsDest.State <> adStateOpen Then
Exit Function
End If
Set rsDest.ActiveConnection = Nothing
'--- success
Set ReadFromExcel = rsDest
Exit Function
EH:
Debug.Print "Critical Error: " & Err.Description
End Function
Private Function pvReadFromExcelPrefix(sFileName As String, ByVal lMaxSize As Long) As String
Dim lSize As Long
Dim nFile As Integer
On Error GoTo EH
lSize = FileLen(sFileName)
If lSize > 0 Then
nFile = FreeFile()
Open sFileName For Binary Access Read Shared As nFile
pvReadFromExcelPrefix = String$(IIf(lSize < lMaxSize, lSize, lMaxSize), 0)
Get nFile, , pvReadFromExcelPrefix
Close nFile
End If
EH:
End Function
ReadFromExcel function can be used like this and the rows can be easily sorted in-memory.
Code:
' Form1
Option Explicit
Private Sub Form_Load()
Dim rs As Recordset
Set rs = ReadFromExcel("d:\temp\aaa.csv", CsvHeader:=True)
rs.Sort = rs.Fields(0).Name
Debug.Print rs.RecordCount
End Sub
cheers,
</wqw>
Last edited by wqweto; Apr 8th, 2021 at 01:46 PM.
-
Apr 8th, 2021, 11:54 AM
#15
Thread Starter
Fanatic Member
Re: .txt files
I previously said that I cannot use Access (any database) or excel in this program.
-
Apr 8th, 2021, 11:57 AM
#16
Re: .txt files
Did you check my post?
Did you study the sample posted by wqweto?
His sample can also read CSV files.
And a reference to Microsoft ActiveX Data Object 2.8 Library, has nothing to do with Access nor with Excel.
-
Apr 8th, 2021, 12:05 PM
#17
Thread Starter
Fanatic Member
Re: .txt files
I saw it. It looked like there a database in it, so I did not study it in detail. I thought that ADODB is a database.
-
Apr 8th, 2021, 12:23 PM
#18
-
Apr 8th, 2021, 12:48 PM
#19
Re: .txt files
Since your Text-Data seems to adhere to "normal CSV-files" (comma-delimited, Text in DoubleQuotes -
according to the standard, defined here: https://www.ietf.org/rfc/rfc4180.txt)...
The least code is probably required by RC5 or RC6 - using a CSV-VirtualTable:
Code:
Option Explicit
Private Sub Form_Load()
Dim sCSV As String, Rs As cRecordset
sCSV = New_c.FSO.ReadTextContent("c:\temp\Simple.csv")
Set Rs = GetRsFromCSVString(sCSV, "Select * From T Order By cast(C1 As Int)")
Do Until Rs.EOF
Debug.Print Rs!C0, Rs!C1, Rs!C2, Rs!C3 'for CSV-content without headers, the Col-Names are C0, C1, a.s.o
Rs.MoveNext
Loop
End Sub
Function GetRsFromCSVString(sCSV$, SQL$, Optional ByVal HasHeader As Boolean) As cRecordset
With New_c.MemDB
.Exec "CREATE VIRTUAL TABLE temp.T USING csv(header=" & IIf(HasHeader, 1, 0) & ", data='" & Replace(sCSV, "'", "''") & "')"
Set GetRsFromCSVString = .GetRs(SQL)
End With
End Function
The above then prints out (using the given 4-Column, headerless test-data, placed in c:\temp\Simple.csv)
Code:
3/10/2021 18:04:43 60 95 L
3/10/2021 23:12:47 60 96 L
3/10/2021 9:56:54 125 91 L
3/10/2021 16:32:01 130 114 L
3/11/2021 9:12:16 135 95 L
3/10/2021 13:20:20 140 137 L
3/10/2021 12:11:17 145 109 L
3/10/2021 16:50:02 180 101 L
3/11/2021 8:41:23 260 123 L
3/11/2021 8:40:38 1260 122 W
HTH
Olaf
-
Apr 8th, 2021, 01:56 PM
#20
Thread Starter
Fanatic Member
Re: .txt files
Is this VB6 code? What Reference do I need for this?
-
Apr 8th, 2021, 02:44 PM
#21
Re: .txt files
Originally Posted by AccessShell
Is this VB6 code? What Reference do I need for this?
Do you need VB6 code with no extra references? What container do you plan on keeping your data in? Arrays or Collections?
So you need a basic CSV parser in pure VB6 that can process file content to an array of UDTs and then some basic quick-sort implementation in pure VB6 which can sort this array of UDTs on a custom field?
This will easily approach 500 LOC, probably Olaf can do it in 200 LOC but why would anyone put so much effort for something so remotely useful provided that there are a 20 LOC solution using sqlite from RC5/RC6 and a 100 LOC (can be pruned to 20 LOC) using OS provided ADODB.Recordsets with all the extra benefits?
I would certainly choose my battles more carefully. . .
cheers,
</wqw>
-
Apr 8th, 2021, 03:03 PM
#22
Thread Starter
Fanatic Member
Re: .txt files
I imagine cRecordset is the problem. I don't know how to resolve this.
Are you suggesting I place the entire .csv file into an array? I want to keep the entire file out of the program. I can deal with one record at a time. If the file gets too big, I will force a deletion and start again.
Otherwise, I don't know what you are suggesting.
-
Apr 8th, 2021, 03:11 PM
#23
Re: .txt files
How do you plan on sorting the .csv while keeping the entire file out of the program?
Do you plan on using some implementation of mergesort that can sort a file without reading it into memory in its entirety?
Another option is to try using sort.exe so that the file is kept out of the program i.e. using an external utility.
cheers,
</wqw>
-
Apr 8th, 2021, 03:18 PM
#24
Thread Starter
Fanatic Member
Re: .txt files
That's the point. I can't bring the entire file into the program. That's why I have a problem and asking for help. If the is no solution with my constraints, then just tell me and be off.
-
Apr 8th, 2021, 03:22 PM
#25
Re: .txt files
How big is the file?
You mentioned that the file didn’t need to be written back.
Rewriting back to disk is not required as long as the original text file remains intact
..
This file can be anywhere from one line to several thousand lines. I do not want to store the entire file into the program, in either a sorted listbox control or an array. I want to access data directly from the file. In some instances I can read the file sequentially and extract the data, or summaries as needed. In other cases, I need to have a sorted file, by one of the fields. I can easily create a new file with the copy of the sorted field as the first field. Somehow sort the records in desc order into another file. Then just read the the sorted file as you would any other .txt file.
How do you deal with the data in your application?
Because you don’t want anything it seems..
-
Apr 8th, 2021, 03:37 PM
#26
Re: .txt files
Another standard tool for this is Microsoft's Log Parser 2.2 which is quite stable and widely used.
This provides a command line utility wrapping the library, a full ActiveX API, SQL queries, and good documentation. It also has features for grabbing new data from a log file being gradually written to by another process ("Parsing Input Incrementally").
See Log Parser (Microsoft) for some 3rd party information about the toolkit.
-
Apr 8th, 2021, 03:38 PM
#27
Thread Starter
Fanatic Member
Re: .txt files
You copied one of my previous post regarding the size of the file. I need to get summary info from the file. I can read it once, completely thru to get all the info I need. The remaining problem I have requires the file to be sorted in Desc order read. Desc on the second field. Or sort Asc and read from the last record up.
-
Apr 8th, 2021, 03:55 PM
#28
Re: .txt files
Originally Posted by AccessShell
That's the point. I can't bring the entire file into the program.
You should have started with maximum file size, not speaking about "thousands" of lines which is puny.
Say file is 10GB so that no one would ever think of loading this into memory but still loading this into sqlite is a viable option though.
cheers,
</wqw>
-
Apr 8th, 2021, 04:39 PM
#29
Re: .txt files
@AccessShell
if you want to do this with pure VB6 you have to isolate the Column you want first and Format that number
Code:
Private Function FormatColumnNumber(sCsv As String) As String
Dim s() As String
Dim i As Long
'your Csv = 3/10/2021 9:56:54,125,91,"L"
'get the Number from Column 1 and Format
s() = Split(sCsv, ",")
FormatColumnNumber = Format(s(1), "00000")
Debug.Print FormatColumnNumber
'00125
'00145
'00140
'00130
'00180
'00060
'00060
'01260
'00260
'00135
End Function
then add a Workarray to that Number
it would look like this then 00125 turns into 00125000000 and so on
Code:
00125
00125000000
00145
00145000001
00140
00140000002
00130
00130000003
00180
00180000004
00060
00060000005
00060
00060000006
01260
01260000007
00260
00260000008
00135
00135000009
now you can sort the csv a write to a new file
but you already have good solutions, I wounder why you don't like them
and by the way, a Access .mdb can run without Access installed
to hunt a species to extinction is not logical !
since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.
-
Apr 8th, 2021, 04:55 PM
#30
Re: .txt files
Originally Posted by AccessShell
Is this VB6 code? What Reference do I need for this?
The code above (in post #19) needs a project-reference to "RC6".
Before you see this ActiveX-Dll-reference, you will have to download the RC6BaseDlls.zip from vbRichClient.com
(unzipping into an install-folder on your dev-machine - and then using the Install-Scripts which are contained in this package).
HTH
Olaf
-
Apr 8th, 2021, 05:05 PM
#31
Thread Starter
Fanatic Member
Re: .txt files
I didn't know you could use an Access .mdb without it being installed. Since I have access installed, how would I test it?
-
Apr 8th, 2021, 05:21 PM
#32
Re: .txt files
I didn't know you could use an Access .mdb without it being installed. Since I have access installed, how would I test it?
OMG...glad I never got involved in this Thread.
But if I HAD, I would have simply loaded that text file into the .mdb file, and queried it with an ORDER BY the second field (asc OR desc, depending)...simple stuff.
Sam I am (as well as Confused at times).
-
Apr 8th, 2021, 05:43 PM
#33
Re: .txt files
As for the "expected input-size" (since I've so far posted an InMemory-approach):
I've just played that through with 1Mio Lines of CSV-Text (in the 4-Column-Format of the OP)...
Code:
Private Sub Form_Load()
Dim sCSV As String, Rs As cRecordset
sCSV = New_c.FSO.ReadTextContent("c:\temp\Simple.csv") 'this file contains the original 10 lines
With New_c.ArrayList(vbString)
Do Until .Count = 100000: .Add sCSV: Loop
sCSV = .Join(vbCrLf) 'let's join the Array-entries to 100000 * 10 lines = 1Mio lines
End With
Debug.Print "sCSV-Size ="; CLng(Len(sCSV) / 1024 / 1024); "MB"
Set Rs = GetRsFromCSVString(sCSV, "Select * From T Order By cast(C1 As Int)")
Debug.Print "We got a sorted Rs with:"; Rs.RecordCount; "records"
End Sub
Function GetRsFromCSVString(sCSV$, SQL$, Optional ByVal HasHeader As Boolean) As cRecordset
With New_c.MemDB
.Exec "CREATE VIRTUAL TABLE temp.T USING csv(header=" & IIf(HasHeader, 1, 0) & ", data='" & Replace(sCSV, "'", "''") & "')"
Set GetRsFromCSVString = .GetRs(SQL)
End With
End Function
The above prints out:
Code:
sCSV-Size = 30 MB
We got a sorted Rs with: 1000000 records
Olaf
-
Apr 9th, 2021, 08:26 AM
#34
Re: .txt files
As I mentioned earlier...my approach is a database (others prefer other options). Here's mine:
AccessShellCSVProblem.zip
looks like:
Sam I am (as well as Confused at times).
-
Apr 9th, 2021, 08:56 AM
#35
Thread Starter
Fanatic Member
Re: .txt files
I copied and modified some code from "http://vbcity.com/forums/t/54158.aspx". Even though it said for vb.NET, it worked for VB6.
I modified the code to
1. Reformat the Score to "00000", for sorting, and modified back for displaying.
2. I changed the input and output files from .csv to .txt. It works just as good.
3. I couldn't make the code sort descending (I know I didn't mention that in my post), so I counted the records that I added to the DB. Then I read the records from the DB by starting with MoveLast then the loop used movePrevious.
Thank you for all your help.
-
Apr 9th, 2021, 09:04 AM
#36
Re: .txt files
Originally Posted by AccessShell
3. I couldn't make the code sort descending (I know I didn't mention that in my post)
Try using DESC like this
oRS.Sort = "Field7 DESC,Field8 DESC"
cheers,
</wqw>
-
Apr 9th, 2021, 09:14 AM
#37
Thread Starter
Fanatic Member
Re: [RESOLVED] .txt files
Thanks, that worked just fine.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|