Hi,
I try to handle a particular type of csv file using ADO facilities but for some reasons I am not able to return from a csv file a column with its decimal numbers but the integer part of it only. The main idea is to restructurate the csv file in other csv file considering some specific rules as you can deduce from code below. I have attached a sample of the original input csv file hoping that someone could explain to me why I cannot reproduce the original format of the latest csv field. Thank you in advance.
Code:
Private Sub ExtractCSV()
Dim i As Long, R As Long, k As Long, p As String, cale As String
Dim myVar() As Variant, vCal As String, newData As String
Dim rec As ADODB.Recordset
Set rec = New ADODB.Recordset
cale = App.Path & "\ZIP Files\"
vCal = "08/02/2017" '... a value from a Calendar control
newData = Format$(vCal, "yyyy-mm-dd") & " 00:00"
p = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" & cale & "';Extended Properties='Text;HDR=No;FMT=Delimited'"
With rec
.CursorLocation = adUseClient
.Open "SELECT [F1],[F2],[F4],[F10],[F9] FROM [sample.txt]", p, adOpenStatic, adLockReadOnly, adCmdText
R = .RecordCount
If R > 0 Then
myVar = .GetRows
.Close
Set rec = Nothing
Else
MsgBox "Start Time field is missing."
.Close
Set rec = Nothing
Exit Sub
End If
End With
For i = 0 To R
If myVar(0, i) = "Start Time" Then
k = i + 1
Exit For
End If
Next i
If k = 0 Then
MsgBox "Start Time field is missing."
Exit Sub
End If
Open cale & "\newCSV.txt" For Output As #1 'Open the destination csv file
Do While k < R
Print #1, newData & "," & myVar(1, k) & "," & Chr$(34) & myVar(2, k) & Chr$(34) & ",Reliable," & myVar(3, k) & "," & myVar(4, k)
k = k + 1
Loop
Close #1
MsgBox "Done"
End Sub
"VB code is practically pseudocode" - Tanner Helland "When you do things right, people won't be sure you've done anything at all" - Matt Groening "If you wait until you are ready, it is almost certainly too late" - Seth Godin "Believe nothing you hear, and only one half that you see" - Edgar Allan Poe
you can try and set the maxscanrows to 0, but that's never worked for me.
you'll probably have to use a schema.ini to override the auto-detection of the columns.
Code:
p = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" & cale & "';Extended Properties='Text;HDR=No;FMT=Delimited;MaxScanRows=0'"
Last edited by DEXWERX; Aug 14th, 2017 at 08:04 AM.
you can try and set the maxscanrows to 0, but that's never worked for me.
you'll probably have to use a schema.ini to override the auto-detection of the columns.
I tried schema.ini but with no success. The csv has a header but as you have noticed it has an offset and for this reason I had to set the HDR=No. Besides my real csv files have complex names that requires to be detected dynamically via Dir function. It would mean to update their names inside schema.ini each time before ADO query. Not very convenient.
"VB code is practically pseudocode" - Tanner Helland "When you do things right, people won't be sure you've done anything at all" - Matt Groening "If you wait until you are ready, it is almost certainly too late" - Seth Godin "Believe nothing you hear, and only one half that you see" - Edgar Allan Poe
I tried schema.ini but with no success. The csv has a header but as you have noticed it has an offset and for this reason I had to set the HDR=No. Besides my real csv files have complex names that requires to be detected dynamically via Dir function. It would mean to update their names inside schema.ini each time before ADO query. Not very convenient.
So the crux of your problem is that the file is not a CSV. I see no other way of overriding the field detection if IISAM, except with a schema.ini.
Since these are custom file formats, you could use a custom parser.
The funny thing is I use a parser, to detect the column info and dynamically generate a schema.ini, just so I can use the ISAM Text Driver.
It would take almost no code to just finish the parser.
Dan
But I'm confused as to what you are trying to accomplish.
Spoo, my atypical csv file comes from a server and if we ignore the extra data above the header we can consider it a common csv file with 10 columns. All I want to do is to keep all data below this header and only for columns 1, 2, 4, 10 and 9. I have some doubts that I am on the right way using ADO. Maybe I should go with a simple parser but all this time I remained stuck in this approach
"VB code is practically pseudocode" - Tanner Helland "When you do things right, people won't be sure you've done anything at all" - Matt Groening "If you wait until you are ready, it is almost certainly too late" - Seth Godin "Believe nothing you hear, and only one half that you see" - Edgar Allan Poe
So just read in the file as a text file... skip the first 7 rows... then start reading at line 8 (the headers) ... split on the comma, and keep elements (0,1,3,9 & 8) and write them out to a new file.
So just read in the file as a text file... skip the first 7 rows... then start reading at line 8 (the headers) ... split on the comma, and keep elements (0,1,3,9 & 8) and write them out to a new file.
I think you're over thinking it.
-tg
can't split on a commas, when a csv is embedded in a column 3
unless he's sure column 3 will always have 2 commas. then he can just adjust the next column numbers?
I would never leave a booby trap like that though.
Well you seem to have a file that was exported from an Excel worksheet by a slow child or something. The result is full of garbage "rows" that most code designed to accept CSV input is likely to choke on.
Since it appears to be ANSI text, you could probably make use of VB6's intrinsic I/O statements that handle its own "CSV" format. This has special considerations but since you only seem to have and care about some String, Integer, and Long columns it may be good enough here. Example:
Code:
Option Explicit
Private Sub Reformat(ByVal FileNameIn As String, ByVal FileNameOut As String)
Const HEADER_PREFIX As String = "Start Time,"
Dim FileIn As Integer
Dim RawLineIn As String
Dim FoundHeaderLine As Boolean
Dim FoundDataRows As Boolean
Dim FileOut As Integer
Dim RowCount As Long
Dim Fields(1 To 10) As String
Dim F As Long
FileIn = FreeFile(0)
Open FileNameIn For Input As #FileIn
Do Until EOF(FileIn)
Line Input #FileIn, RawLineIn
FoundHeaderLine = Left$(RawLineIn, Len(HEADER_PREFIX)) = HEADER_PREFIX
If FoundHeaderLine Then
FoundDataRows = Not EOF(FileIn)
If FoundDataRows Then
FileOut = FreeFile(0)
Open FileNameOut For Output As #FileOut
Do Until EOF(FileIn)
For F = 1 To 10
Input #FileIn, Fields(F)
Next
Write #FileOut, CInt(Fields(2)), _
Fields(4), _
"Reliable", _
CLng(Fields(10)), _
CInt(Fields(9))
RowCount = RowCount + 1
Loop
Close #FileOut
End If
Exit Do
End If
Loop
Close #FileIn
If FoundHeaderLine Then
If FoundDataRows Then
MsgBox CStr(RowCount) & " rows"
Else
MsgBox "No data rows"
End If
Else
MsgBox "Could not find the header row"
End If
End Sub
Private Sub Main()
Reformat "sample.txt", "newsample.txt"
End Sub
The only niggle might be your added phony column that claims "reliability" which you didn't want to write with text delimiters (quotation marks). However your sample code produces them explicitly for another output String field so this shouldn't be an issue for you.
Even if it is, you could forego Input#/Write# and do your own row input parsing and output formatting if you need finer control.
can't split on a commas, when a csv is embedded in a column 3
unless he's sure column 3 will always have 2 commas. then he can just adjust the next column numbers?
I would never leave a booby trap like that though.
It appears that the first 38 characters are always the same "format".
He could split the lines
Code:
txt = xxx ' the full line
txt1 = Left(txt,38)
txt2 = Mid(txt,39)
Spoo, my atypical csv file comes from a server and if we ignore the extra data above the header we can consider it a common csv file with 10 columns. All I want to do is to keep all data below this header and only for columns 1, 2, 4, 10 and 9. I have some doubts that I am on the right way using ADO. Maybe I should go with a simple parser but all this time I remained stuck in this approach
If so, then the "hiccup" would be in col-04 as it contains 2 commas
However, it is all contained within double-quotes, so it could be isolated as a single "column"
Assuming the above is correct, then maybe something like this snippet
Code:
Private Sub Command1_Click()
'
' open file
fpath = "D:\VBForums\DanD.txt"
Open fpath For Input As #1
' skip over "header"
For ii = 1 To 15
Line Input #1, xtr
If Left(xtr, 10) = "Start Time" Then
Exit For
End If
Next ii
' extract "true" data
Dim txDAN(10)
Do While Not EOF(1)
Line Input #1, xtr
tx1 = Left(xtr, 38)
tx2 = Mid(xtr, 39)
qq = InStr(2, tx2, """")
txq = Left(tx2, qq)
tx3 = Mid(tx2, qq + 1)
'
' col-1
cc = InStr(tx1, ",")
txDAN(1) = Left(tx1, cc - 1)
tmp = Mid(tx1, cc + 1)
' col-2
cc = InStr(tmp, ",")
txDAN(2) = Left(tmp, cc - 1)
tmp = Mid(tmp, cc + 1)
' col-3
txDAN(3) = tmp
' col-4
txDAN(4) = txq
' col-5..9
tmp = Mid(tx3, 2) ' get rid of lead comma
For ii = 5 To 9
cc = InStr(tmp, ",")
txDAN(ii) = Left(tmp, cc - 1)
tmp = Mid(tmp, cc + 1)
Next ii
' col-10
txDAN(10) = tmp
'
Exit Do ' << for demo purposes only
Loop
Close #1
' display 1st line .. parsed
With FG1
' 1. re-size
.Cols = 2
.Rows = 11
For ii = 0 To 1
.Col = ii
If ii = 0 Then
.ColWidth(ii) = 300
Else
.ColWidth(ii) = 8000
End If
Next ii
' captions
.Row = 0
.Col = 0
.Text = "nn"
.Col = 1
.Text = "Data"
' populate
For ii = 1 To 10
.Row = ii
' 1. col 0
.Col = 0
.Text = ii
' 2. other col
.Col = 1
.Text = txDAN(ii)
.CellAlignment = 1
Next ii
End With
'
End Sub
Here is the parsing of the first line of "data"
HTH
Forgive me if you already tried something like this .. or if I missed what you are trying to do
If you need it fast and reliable (able to import even directly from a ByteArray, in case it comes in this way from a Web-Request):
Into a Form, which contains a Command1-Button and an MSHFlexGrid1 (Project needs a reference to vbRichClient5):
Code:
Option Explicit
Private CSVtoADO As New cCSVtoADO
Private Sub Command1_Click()
New_c.Timing True
CSVtoADO.ParseCSVBytes New_c.FSO.ReadByteContent("c:\temp\sample.txt")
Dim Rs As cRecordset 'Select Columns 1,2,4,9 and 10 by name
Set Rs = CSVtoADO.MemDB.GetRs("Select [Start Time]," & _
"[Period]," & _
"[GCELL]," & _
"[TR373:Cell Availability (%)]," & _
"[CR373:Cell In-Service Duration (s)] " & _
"From T")
Caption = "Import and Select of " & Rs.RecordCount & " Records took:" & New_c.Timing
Set MSHFlexGrid1.DataSource = Rs.GetADORsFromContent 'convert SQLite-Rs to ADO-Rs and hang it into the Grid
End Sub
And this here into a generically working (for this type of file) CSV-Import-Class, named cCSVtoADO
Code:
Option Explicit
Implements ICSVCallback
Private CSV As cCSV, Cmd As cCommand
Public Hdrs As cArrayList, MemDB As cMemDB
Public Sub ParseCSVBytes(CSVContent() As Byte)
Set Hdrs = New_c.ArrayList(vbString)
Set CSV = New_c.CSV
CSV.ParseBytes CSVContent, Me
Set Cmd = Nothing
End Sub
Private Function ICSVCallback_NewValue(ByVal RowNr As Long, ByVal ColNr As Long, B() As Byte, ByVal BValStartPos As Long, ByVal BValLen As Long) As Long
Dim S As String
If Cmd Is Nothing Then 'we are still in the header
If Not (ColNr = 0 And Hdrs.Count > 1) Then 'we have not yet reached or read all the HeaderCols
S = CSV.GetStringValue(B, BValStartPos, BValLen)
If Len(S) Then If ColNr < Hdrs.Count Then Hdrs(ColNr) = S Else Hdrs.Add S
Else 'now we have the Header
Set MemDB = New_c.MemDB
MemDB.Exec "Create Table T([" & Hdrs.Join("] Text, [") & "] Text)"
Set Cmd = MemDB.CreateCommand("Insert Into T Values(" & Mid(Replace(Space(Hdrs.Count), " ", ",?"), 2) & ")")
End If
End If
If Not Cmd Is Nothing Then 'we are in "Record-Insert-Mode" now
Cmd.SetTextUTF8Ptr ColNr + 1, VarPtr(B(BValStartPos)), BValLen
If ColNr + 1 = Hdrs.Count Then Cmd.Execute
End If
End Function
The above two Code-Snippets will produce this after only about 9msec total (for import of the *whole* Table and the Column-specific Select:
HTH, at least as a reference, to compare with other solutions...
post a sample of what you actually want in newcsv.txt
i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case. Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next
dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part
come back and mark your original post as resolved if your problem is fixed
pete
With all of the rewrites and changes at MSDN it is hard to find a link that is still good. There is a recent incomplete brief overview here for those not up to speed:
With all of the rewrites and changes at MSDN it is hard to find a link that is still good. There is a recent incomplete brief overview here for those not up to speed:
Hello guys,
I know that this particular csv file is a bit tricky but this is the format generated by a linux server that is not under my control and originally it comes archived with a tar.gz extension. Of course, opening each csv file in a notepad and matching it as a standard csv file is not a very convenient stuff as long as I need to use this input in an automatic process. I have to recognize the dille's procedure does respond to all my needs. Beyond readability, it is versatile enough to restore the format of some fields or to insert a new one that are depending of other (in my case the 'Readibility' string). Taking all in all, it is something right on the target without any redundant code line. Thank you very much Dilletante.
I have always appreciated the people that are capable of solving tasks with limited resources. And some solutions to my thread could be a good example at this point. Why should we use pointers, APIs or even ASM lines when for the same purpose we can simply use native vb functions? Maybe the Merri's parser is faster but honestly it is a bit verbose and sometimes the beauty of programming style is more important than speed.
Originally Posted by Spooman
Dan
Assuming the above is correct, then maybe something like this snippet
Code:
Private Sub Command1_Click()
' open file
fpath = "D:\VBForums\DanD.txt"
Open fpath For Input As #1
' skip over "header"
For ii = 1 To 15
Line Input #1, xtr
If Left(xtr, 10) = "Start Time" Then
Exit For
End If
Next ii
' extract "true" data
Dim txDAN(10)
Do While Not EOF(1)
Line Input #1, xtr
tx1 = Left(xtr, 38)
tx2 = Mid(xtr, 39)
qq = InStr(2, tx2, """")
txq = Left(tx2, qq)
tx3 = Mid(tx2, qq + 1)
'
' col-1
cc = InStr(tx1, ",")
txDAN(1) = Left(tx1, cc - 1)
tmp = Mid(tx1, cc + 1)
' col-2
cc = InStr(tmp, ",")
txDAN(2) = Left(tmp, cc - 1)
tmp = Mid(tmp, cc + 1)
' col-3
txDAN(3) = tmp
' col-4
txDAN(4) = txq
' col-5..9
tmp = Mid(tx3, 2) ' get rid of lead comma
For ii = 5 To 9
cc = InStr(tmp, ",")
txDAN(ii) = Left(tmp, cc - 1)
tmp = Mid(tmp, cc + 1)
Next ii
' col-10
txDAN(10) = tmp
'
Exit Do ' << for demo purposes only
Loop
Close #1
' display 1st line .. parsed
With FG1
' 1. re-size
.Cols = 2
.Rows = 11
For ii = 0 To 1
.Col = ii
If ii = 0 Then
.ColWidth(ii) = 300
Else
.ColWidth(ii) = 8000
End If
Next ii
' captions
.Row = 0
.Col = 0
.Text = "nn"
.Col = 1
.Text = "Data"
' populate
For ii = 1 To 10
.Row = ii
' 1. col 0
.Col = 0
.Text = ii
' 2. other col
.Col = 1
.Text = txDAN(ii)
.CellAlignment = 1
Next ii
End With
End Sub
Spoo
As a detail, it is not very robust to finding the header with For ii = 1 To 15...Maybe the header is missing or it is placed after line 15. I suppose it would be preffered to have here a Do Until EOF.
Anyway, I really appreciate your involving and your willingness to help me.
Thank you Dexwerx, techgnome, LaVolpe, IkkeEnGij and Olaf for all your support and guidance.
Last edited by Daniel Duta; Aug 15th, 2017 at 05:09 AM.
"VB code is practically pseudocode" - Tanner Helland "When you do things right, people won't be sure you've done anything at all" - Matt Groening "If you wait until you are ready, it is almost certainly too late" - Seth Godin "Believe nothing you hear, and only one half that you see" - Edgar Allan Poe
I removed the Header of the File to the first actuall Data
you could create the Headernames for the Flexgrid
place a Flexgrid on the form and a commandbutton.
Code:
Private Sub Command3_Click()
Dim i As Long, j As Long
Dim s As String, s1() As String, s2() As String
Dim Filename As String
Dim FNr As Integer
Filename = "c:\sample.txt"
FNr = FreeFile
Open Filename For Binary As #FNr
'Länge feststellen und laden
s = Space(LOF(FNr))
Get #FNr, , s
Close #FNr
'FolgeCrLf entfernen
Do While Right(s, 2) = vbCrLf
s = Left(s, Len(s) - 2)
Loop
With MSFlexGrid1
.Clear
'aufteilen in Zeilen
s1 = Split(s, vbCrLf)
.Rows = UBound(s1) + 1
'aufteilen in Spalten
s2() = Split(s1(0), ",")
.Cols = UBound(s2) + 1
.FixedRows = 1
.FixedCols = 0
'Daten übernehmen
For i = 0 To UBound(s1)
s1(i) = Replace(s1(i), Chr(34), vbNullString)
s2() = Split(s1(i), ",")
For j = 0 To UBound(s2)
.TextMatrix(i, j) = s2(j)
Next
Next
End With
End Sub
As a detail, it is not very robust to finding the header with For ii = 1 To 15...Maybe the header is missing or it is placed after line 15. I suppose it would be preffered to have here a Do Until EOF.
No argument .. except ..
.. No argument .. For ii = 1 To 15 is definitely quick-and dirty, Do Until EOF is more sensible
.. Except .. Missing header is definitely an issue. How to determine?
More info would be needed as to possible file structures
1. Couldn't rely on finding 1st comma, since the header you presented contains many.
2. Perhaps a "common key" is that in all file structure variations, the "real data" always has a "mm/dd/yyyy" as lead.
BTW, was I correct in noting the sequence of your desired column "grabbing"? .. ie, 1, 2, 4, 10 and 9
EDIT 1:
Chris
I'm not sure a simple Split function is adequate.
If xtr is a full line of data, then
Code:
s2 = Split(xtr, ",")
produces the following:
Note than s2(3), s2(4) and s2(5) has split what "should be" a single field into 3 pieces (see post #12).
Spoo
Last edited by Spooman; Aug 15th, 2017 at 08:47 AM.
Seems odd that a file sourced from Linux would have CRLF newlines instead of LF. With LFs this would require another read method because VB6 native I/O treats only CR or CRLF as newline. The FSO TextStream will treat LF or CRLF as newline, and the ADO.Stream object allows you to specify its LineSeparator as CR, LF, or CRLF. Or you could read in text BLOBs and deblock lines from them yourself.
Setting that aside, here is another approach for dealing with quoted CSV data. Note that here the array elements are base-0 so "column 1" is index 0, etc:
Code:
Option Explicit
Private Sub Reformat(ByVal FileNameIn As String, ByVal FileNameOut As String)
Const HEADER_PREFIX As String = "Start Time,"
Const QUOTE As String = """"
Const COMMA As String = ","
Const NUL As String = vbNullChar
Dim FileIn As Integer
Dim RawLineIn As String
Dim FoundHeaderLine As Boolean
Dim FoundDataRows As Boolean
Dim FileOut As Integer
Dim RowCount As Long
Dim Fields() As String
Dim F As Long
FileIn = FreeFile(0)
Open FileNameIn For Input As #FileIn
Do Until EOF(FileIn)
Line Input #FileIn, RawLineIn
FoundHeaderLine = Left$(RawLineIn, Len(HEADER_PREFIX)) = HEADER_PREFIX
If FoundHeaderLine Then Exit Do
Loop
If FoundHeaderLine Then
FoundDataRows = Not EOF(FileIn)
If FoundDataRows Then
FileOut = FreeFile(0)
Open FileNameOut For Output As #FileOut
Do Until EOF(FileIn)
Line Input #FileIn, RawLineIn
'Dequote:
Fields = Split(RawLineIn, QUOTE)
For F = 0 To UBound(Fields) Step 2
Fields(F) = Replace$(Fields(F), COMMA, NUL)
Next
'Split into actual fields:
Fields = Split(Join$(Fields, ""), NUL)
Print #FileOut, Fields(0); COMMA; _
QUOTE; Fields(3); QUOTE; COMMA; _
"Reliable"; COMMA; _
Fields(9); COMMA; _
Fields(8)
RowCount = RowCount + 1
Loop
Close #FileOut
End If
End If
Close #FileIn
If FoundHeaderLine Then
If FoundDataRows Then
MsgBox CStr(RowCount) & " rows"
Else
MsgBox "No data rows"
End If
Else
MsgBox "Could not find the header row"
End If
End Sub
Private Sub Main()
Reformat "sample.txt", "newsample.txt"
End Sub
Seems odd that a file sourced from Linux would have CRLF newlines instead of LF. With LFs this would require another read method because VB6 native I/O treats only CR or CRLF as newline. The FSO TextStream will treat LF or CRLF as newline, and the ADO.Stream object allows you to specify its LineSeparator as CR, LF, or CRLF. Or you could read in text BLOBs and deblock lines from them yourself.
I have found there is an explanation...Actually, my original csv file does contain line feeds but once it was converted as txt file (the rules of the forum regarding attachment do not allow csv files) they were replaced with their combination CrLf automatically. In these new conditions, as you pointed out, VB6 failed to handle the file lines as before. I had to reconsider your first approach so that to cover both situations and I have attached a new sample that reproduces the original extension (archived via tar and gz).
Code:
Private Sub ReformatCSV(ByVal FileNameIn As String, ByVal FileNameOut As String)
Const HEADER_PREFIX As String = "Start Time"
Dim FileIn As Integer, FileOut As Integer
Dim i As Long, k As Long
Dim inArr() As String, outArr() As String
Dim strFile As String
FileIn = FreeFile(0)
Open FileNameIn For Input As #FileIn
strFile = Input$(LOF(FileIn), FileIn) 'load an entire file
Close #FileIn
If InStr(1, strFile, vbCrLf) = 0 Then
strFile = Replace(strFile, vbLf, vbCrLf) 'replace any linefeed
strFile = Replace(strFile, Chr(34), "") 'replace any double quota
End If
inArr = Split(strFile, vbCrLf) 'load all file lines in array
For i = 0 To UBound(inArr)
If InStr(1, Mid$(inArr(i), 1, InStr(1, inArr(i), ",")), HEADER_PREFIX) > 0 Then
k = i + 1
Exit For
End If
Next i
If k = 0 Then
MsgBox "Could not find the header row."
Exit Sub
End If
FileOut = FreeFile(0)
Open FileNameOut For Output As #FileOut
For i = k To UBound(inArr) - 1
outArr = Split(inArr(i), ",") 'split each line by comma
Write #FileOut, outArr(0), CInt(outArr(1)), outArr(2), "Reliable"
Next i
Close #FileOut
MsgBox CStr(UBound(inArr) - k) & " rows."
End Sub
"VB code is practically pseudocode" - Tanner Helland "When you do things right, people won't be sure you've done anything at all" - Matt Groening "If you wait until you are ready, it is almost certainly too late" - Seth Godin "Believe nothing you hear, and only one half that you see" - Edgar Allan Poe
Just for completeness (with regards to Daniels latest example-data) - here the adaption of the Form-Code
(the little cCSVtoADO class, as posted in #15 was already generically working and can remain unchanged)...
Code:
Option Explicit
Private CSVtoADO As New cCSVtoADO
Private Sub Command1_Click()
New_c.Timing True
Dim BSrc() As Byte, BDst() As Byte
BSrc = New_c.FSO.ReadByteContent("c:\temp\sample.tar.gz")
New_c.Crypt.GzDecompress BSrc, BDst 'GZ-Decompression from Src-to-Dst-ByteArrays
CSVtoADO.ParseCSVBytes BDst
With CSVtoADO
Dim Rs As cRecordset, FldList As String 'Select Columns 1, 2, 4, 10 and 9 using zerobased Hdr-Indexes
With New_c.ArrayList(vbString, .Hdrs(0), .Hdrs(1), .Hdrs(3), .Hdrs(9), .Hdrs(8))
Set Rs = CSVtoADO.MemDB.GetRs("Select [" & .Join("],[") & "] From T")
End With
' 'or alternatively (as in the other snippet), do the Select directly by explicitly given Field-Names
' Set Rs = .MemDB.GetRs("Select [Start Time], Period, Identifier, [Duration (s)], [(%)] From T")
End With
Caption = "Import and Select of " & Rs.RecordCount & " Records took:" & New_c.Timing
Set MSHFlexGrid1.DataSource = Rs.GetADORsFromContent 'convert SQLite-Rs to ADO-Rs and hang it into the Grid
End Sub
Note, that the Import happens directly from the Byte-Array-Representation of the tar.gz...
so the whole thing can be imported without the need to write any temporary Files.
Here again the timing for the whole thing (including the GZ-Decompression from Src-to-Dst-ByteArrays):
Just for completeness (with regards to Daniels latest example-data) - here the adaption of the Form-Code
(the little cCSVtoADO class, as posted in #15 was already generically working and can remain unchanged)...
Processing the csv without desarchiving is an advantage and I think it is worth a try. But, could you be more specific what/where is cCSVtoADO ? Also, is New_c class a timer created by you? I just want to add that I need to save the final result not in a grid but in a simple text file because this restructured file will be used as input for an application. Thank you.
"VB code is practically pseudocode" - Tanner Helland "When you do things right, people won't be sure you've done anything at all" - Matt Groening "If you wait until you are ready, it is almost certainly too late" - Seth Godin "Believe nothing you hear, and only one half that you see" - Edgar Allan Poe
Processing the csv without desarchiving is an advantage and I think it is worth a try. But, could you be more specific what/where is cCSVtoADO ?
It's a (Project-Private) Class you could create and name appropriately as cCSVtoADO, then paste the little code-snippet into it, which I described in post #15...
Originally Posted by Daniel Duta
Also, is New_c class a timer created by you?
No, it's the (globally available without declaring it) "New Object Constructor" (after you set a reference to vbRichClient5 into your Project).
Originally Posted by Daniel Duta
I just want to add that I need to save the final result not in a grid but in a simple text file because this restructured file will be used as input for an application. Thank you.
Then I misunderstood somehow ...
Though it's easy to define a second Class (let's name it cCSVtoSB) and adapt the Code (from the other Class) for StringBuilder-Output instead of MemDB-import.
Code:
Option Explicit
Implements ICSVCallback
Public CSV As cCSV, Hdrs As cArrayList, SB As cStringBuilder
Public Sub ParseCSVBytes(CSVContent() As Byte)
Set Hdrs = New_c.ArrayList(vbString)
Set SB = Nothing
Set CSV = New_c.CSV
CSV.ParseBytes CSVContent, Me
End Sub
Private Function ICSVCallback_NewValue(ByVal RowNr As Long, ByVal ColNr As Long, B() As Byte, ByVal BValStartPos As Long, ByVal BValLen As Long) As Long
Static S As String
If SB Is Nothing Then 'we are still in the header
If Not (ColNr = 0 And Hdrs.Count > 1) Then 'we have not yet reached or read all the HeaderCols
S = CSV.GetStringValue(B, BValStartPos, BValLen)
If Len(S) Then If ColNr < Hdrs.Count Then Hdrs(ColNr) = S Else Hdrs.Add S
Else 'now we have the Header
Set SB = New_c.StringBuilder: S = """Reliable""" & vbCrLf 'buffer the last Col (including CrLf)
End If
End If
If Not SB Is Nothing Then 'now we are in "Row-Insert-Mode"
Select Case ColNr
Case 0: SB.Append """" & CSV.GetStringValue(B, BValStartPos, BValLen) & ""","
Case 1: SB.Append CSV.GetStringValue(B, BValStartPos, BValLen) & ","
Case 2: SB.Append """" & CSV.GetStringValue(B, BValStartPos, BValLen) & ""","
Case 3: SB.Append S 'this is the last ColNr we write (including CrLf)
End Select
End If
End Function
The above would then produce the same FileOutput as you did in your routine, only more than twice as fast (including GZ-decompression).
For convenience sake, I've zipped the whole thing up now (including the MemDB-approach): CSVtoMemDBorStringBuilder.zip
As said, the above project needs a registered vbRichClient5 on your dev-machine (a download of a recent version is recommended).
The above would then produce the same FileOutput as you did in your routine, only more than twice as fast (including GZ-decompression).
Yes, you are right, I have got similar results. I have noticed that your class can detect first valid csv row automatically and that is good. But let's say that I would like to restructure the rows in this way :
Nothing more than adding a new column at the end with two fields merged together. How could I integrate this concatenation of fields in your function as long as I cannot see the previous field values from the same row ? In first case I have a field index but in the second I do not.
Code:
If Not SB Is Nothing Then 'now we are in "Row-Insert-Mode"
Select Case ColNr
Case 0: SB.Append """" & CSV.GetStringValue(B, BValStartPos, BValLen) & ""","
Case 1: SB.Append CSV.GetStringValue(B, BValStartPos, BValLen) & ","
Case 2: SB.Append """" & CSV.GetStringValue(B, BValStartPos, BValLen) & ""","
Case 3: SB.Append S 'this is the last ColNr we write (including CrLf)
End Select
End If
"VB code is practically pseudocode" - Tanner Helland "When you do things right, people won't be sure you've done anything at all" - Matt Groening "If you wait until you are ready, it is almost certainly too late" - Seth Godin "Believe nothing you hear, and only one half that you see" - Edgar Allan Poe
Yes, you are right, I have got similar results. I have noticed that your class can detect first valid csv row automatically and that is good. But let's say that I would like to restructure the rows in this way :
Nothing more than adding a new column at the end with two fields merged together.
How could I integrate this concatenation of fields in your function as long as I cannot see the previous field values from the same row ?
In first case I have a field index but in the second I do not.
Just define a Col-specific BufferContainer at the Class-Level to store intermediate Col-Values...
Code:
Private C(0 To 63) As String 'define a static ColValue-Buffer at Class-Level
...
...
If Not SB Is Nothing Then 'now we are in "Row-Insert-Mode"
Select Case ColNr 'generic ColArr-Buffering (reading only selected Columns from the incoming stream)
Case 0, 1, 2: C(ColNr) = CSV.GetStringValue(B, BValStartPos, BValLen)
End Select
If ColNr = Hdrs.Count - 1 Then 'write the new line, after the parser reached the last ColNr in the Row
SB.AppendNL """" & C(0) & """," & C(1) & ",""" & C(2) & """,""Reliable"",""" & C(0) & " anyString " & C(2) & """"
End If
End If
Just define a Col-specific BufferContainer at the Class-Level to store intermediate Col-Values...
Code:
Private C(0 To 63) As String 'define a static ColValue-Buffer at Class-Level
...
...
If Not SB Is Nothing Then 'now we are in "Row-Insert-Mode"
Select Case ColNr 'generic ColArr-Buffering (reading only selected Columns from the incoming stream)
Case 0, 1, 2: C(ColNr) = CSV.GetStringValue(B, BValStartPos, BValLen)
End Select
If ColNr = Hdrs.Count - 1 Then 'write the new line, after the parser reached the last ColNr in the Row
SB.AppendNL """" & C(0) & """," & C(1) & ",""" & C(2) & """,""Reliable"",""" & C(0) & " anyString " & C(2) & """"
End If
End If
HTH
Olaf
I should have known that.. Now I see that you changed the first Append method with AppendNL. Could I ask you which is the difference from one an other ? Also, I have noticed that for all your global objects you call a main class as New_c and I have not idea how it is referenced or where it placed in the main library vbRichClient5. Regarding your csv parser I have to recognize that it is reliable, fast and advantageous especially if we consider here the GZ-Decompression process ...However, I would like to launch a new challenge to this library just to test its versatility. Mainly, my task requires to move some csv files from a certain format (with LF delimitation and some particular fields) to other format that involves some changes of the fields. Besides that I have to merge my input files (they are six) in one output file only. For this requirement I should keep the output file open until the latest input file is parsed and appended line by line. At the moment the function ICSVCallback_NewValue together with New_c.FSO.WriteTextContent App.Path & "\sample_out2.csv", CSVtoSB.SB.ToString do not seem to consider such a detail. I imagine that ReadByteContent, GzDecompress and ParseCSVBytes should be included in a loop but how to keep the destination file (output) open on appending so that to avoid an overwriting process ? Thank you in advance.
"VB code is practically pseudocode" - Tanner Helland "When you do things right, people won't be sure you've done anything at all" - Matt Groening "If you wait until you are ready, it is almost certainly too late" - Seth Godin "Believe nothing you hear, and only one half that you see" - Edgar Allan Poe
you changed the first Append method with AppendNL. Could I ask you which is the difference from one an other ?
One of them appends the passed String as a New Line...
Originally Posted by Daniel Duta
Also, I have noticed that for all your global objects you call a main class as New_c and I have not idea how it is referenced or where it placed in the main library vbRichClient5.
If you do a search for the term "New_c" in the VB6-ObjectExplorer, you will find
it listed as a Member of the Class cGlobal.
And cGlobal in turn, is a Class which was compiled with its Instancing-Property set to: 6 - GlobalMultiUse.
VB6-Dll-Classes which were compiled with Instancing = 6, will expose their methods directly in the Code-Editor (without prefixing them with an ObjectName).
Originally Posted by Daniel Duta
Besides that I have to merge my input files (they are six) in one output file only.
For this requirement I should keep the output file open until the latest input file is parsed and appended line by line.
You can do that completely outside the CSVtoSB-ObjectInstance (within your Form, or Module, or other Class).
Here is a few extension-lines, for what's currently in Command2_Click of the last example I've posted further above (as a Zip).
I've marked the additions to the old code in magenta...
Code:
New_c.Timing True
Dim Strm As cStream, DL As cDirList, i As Long, BSrc() As Byte, BDst() As Byte
'open the output-file as a Stream
Set Strm = New_c.FSO.CreateFileStream(App.Path & "\outfile.csv", STRM_WRITE)
'scan a directory for *.gz Files (choose any Pattern, the VB6-Like statement allows)
Set DL = New_c.FSO.GetDirList(App.Path, dlSortNone, "sample*.gz")
'loop over the found files in DL
For i = 0 To DL.FilesCount - 1
BSrc = New_c.FSO.ReadByteContent(DL.Path & DL.FileName(i)) 'read the source-file
New_c.Crypt.GzDecompress BSrc, BDst 'GZ-Decompression from Src-to-Dst-ByteArrays
CSVtoSB.ParseCSVBytes BDst 'perform the CSV-parsing (buffering the result in SB)
Strm.WriteFromByteArr CSVtoSB.SB.ToUTF8 'append the current content of SB as a ByteArray to the out-stream
Next
Caption = Caption & ", cCSVtoSB" & New_c.Timing
The above code is fully compatible with the former Demo-Code (it produces the same results).
It is scanning the App.Path for all files with the ending *.gz - and since the Demo-AppPath
contains only one of those Files, the loop covers only that single one of course.
As for the Strm-Object (the cStream-instance), it will hold the file(handle) open internally, until you set
the Object to Nothing explicitely - or until the Procedure which contains the Strm-Variable-Declaration gets exited
(then the Strm-Object is automatically set to Nothing and the internal FileHandle released as well).
If you do a search for the term "New_c" in the VB6-ObjectExplorer, you will find
it listed as a Member of the Class global.
And cGlobal in turn, is a Class which was compiled with its Instancing-Property set to: 6 - GlobalMultiUse.
VB6-Dll-Classes which were compiled with Instancing = 6, will expose their methods directly in the Code-Editor (without prefixing them with an ObjectName).
I searched for the New_c and I didn't find it in the VB6-ObjectExplorer, anyway I have never heard such details before. I just wonder if New_c is member of cGlobal class how could we expose (see in a list) other members of the same class ? Perhaps you could paste me a link regarding this topic because it is useful. Other thing that remained a bit unclear to me is although I use objects from vbRichClient library I don't have any declaration or set to it at the first level (such as Private vbRichLib As vbRichClient5). Is it ok ?
Regarding CSV parser, after I implemented your new lines it worked from first attempt as I wanted. It is pretty amazing that all processes involved (getting the compatible files, reading all inputs, CSV-parsing, writing the output file) take less than 100 ms. I don't know how to thank you enough because usually such improvements/libraries are not free.
"VB code is practically pseudocode" - Tanner Helland "When you do things right, people won't be sure you've done anything at all" - Matt Groening "If you wait until you are ready, it is almost certainly too late" - Seth Godin "Believe nothing you hear, and only one half that you see" - Edgar Allan Poe
I searched for the New_c and I didn't find it in the VB6-ObjectExplorer, anyway I have never heard such details before. I just wonder if New_c is member of cGlobal class how could we expose (see in a list) other members of the same class ? Perhaps you could paste me a link regarding this topic because it is useful.
Other thing that remained a bit unclear to me is although I use objects from vbRichClient library I don't have any declaration or set to it at the first level (such as Private vbRichLib As vbRichClient5). Is it ok ?[/QUOTE]
new_C is method of a global multiuse class (cGlobal), which is why you don't need to create an instance of it in order to use it (and yes, it is OK to reference it without having an instance).
PS: You can also create your own global multiuse classes by changing te instancing property of a class in an ActiveX DLL to 6-GlobalMultiuse.
Regarding CSV parser, after I implemented your new lines it worked from first attempt as I wanted. It is pretty amazing that all processes involved (getting the compatible files, reading all inputs, CSV-parsing, writing the output file) take less than 100 ms. I don't know how to thank you enough because usually such improvements/libraries are not free.
Well you are in for a treat as you explore the power and features of vbRichClient5! We're very fortunate to have it available to us as it really extends the usefulness of our favourite language and development environment.