Results 1 to 6 of 6

Thread: Fast & correctly reading CSV parser (Comma-separated values)

  1. #1

    Thread Starter
    VB6, XHTML & CSS hobbyist Merri's Avatar
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Fast & correctly reading CSV parser (Comma-separated values)

    I'm rather surprised that I didn't really find any CSV parsers from our CodeBank. You can find CSV saving code, but reading a CSV file seems to be ignored.

    My function here attempts to read a CSV file properly. This means:
    • Rows are separated by line change.
    • Columns are separated by comma.
    • Commas and line changes are allowed within quotes.
    • Dual quotes within quotes = quote.
    • Space characters are preserved as is required by RFC4180
    • You can customize row, column and quote separators (this allows reading tab separated CSV files generated by Excel)


    The procedure returns one dimensional string array, but it holds 2D data. This is why it also outputs the number of columns and rows.

    Usage is as follows:
    Code:
    Dim strCSV() As String, lngRows As Long, lngCols As Long
        Dim strFile As String, lngA As Long
    
        ' read sample data to strFile
        Open "test.csv" For Input As #1
            strFile = Input(LOF(1), #1)
        Close #1
    
        ' parse CSV
        SplitCSV strFile, strCSV, lngCols, lngRows
    
        ' sample output
        For lngA = 0 To UBound(strCSV)
            Debug.Print "Row " & (lngA \ lngCols), "Column " & (lngA Mod lngCols), "Data: " & strCSV(lngA)
        Next lngA
    You can also input the number of columns you want and the rest of the columns are ignored. This does not work for rows, all rows are always returned. The column feature is useful if you want to add new columns, you can simply directly work with the string array and have the new columns available.

    Finally, the procedure below is optimized for speed. It could be better, but it does a very good job as it is. Using other solutions such as a VBScript regular expression will be a lot slower.

    CODE IN NEXT POST
    Last edited by Merri; Feb 22nd, 2010 at 10:59 AM.

  2. #2

    Thread Starter
    VB6, XHTML & CSS hobbyist Merri's Avatar
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Fast CSV (Comma-separated values) parser

    Code:
    ' modCSV.bas
    Option Explicit
    
    Private Declare Function ArrayPtr Lib "msvbvm60" Alias "VarPtr" (Arr() As Any) As Long
    Private Declare Sub PutMem4 Lib "msvbvm60" (ByVal Ptr As Long, ByVal Value As Long)
    Private Declare Function SafeArrayRedim Lib "oleaut32" (ByVal saPtr As Long, saBound As Long) As Long
    
    ' returns one dimensional zero based string array in ResultSplit containing parsed CSV cells
    ' - ResultCols (in/out) number of columns; if positive on input the CSV data is fixed to given number of columns
    ' - ResultRows (out) number of rows
    Public Sub SplitCSV(Expression As String, ResultSplit() As String, ResultCols As Long, ResultRows As Long, Optional ColumnDelimiter As String = ",", Optional RowDelimiter As String = vbNewLine, Optional Quote As String = """")
        Dim CSV() As Integer, HeaderCSV(5) As Long, lngCSV As Long
        ' general variables that we need
        Dim intColumn As Integer, intQuote As Integer, lngRow As Long, strRow As String
        Dim lngExpLen As Long, lngRowLen As Long
        Dim blnQuote As Boolean, lngA As Long, lngB As Long, lngC As Long, lngCount As Long, lngResults() As Long
        ' some dummy variables that we happen to need
        Dim Compare As VbCompareMethod, SafeArrayBound(1) As Long
        ' length information
        lngExpLen = LenB(Expression)
        lngRowLen = LenB(RowDelimiter)
        ' validate lengths
        If lngExpLen > 0 And lngRowLen > 0 Then
            ' column delimiter
            If LenB(ColumnDelimiter) Then intColumn = AscW(ColumnDelimiter): ColumnDelimiter = Left$(ColumnDelimiter, 1) Else intColumn = 44: ColumnDelimiter = ","
            ' quote character
            If LenB(Quote) Then intQuote = AscW(Quote): Quote = Left$(Quote, 1) Else intQuote = 34: Quote = """"
            ' maximum number of results
            ReDim lngResults(0 To (lngExpLen \ lngRowLen))
            ' prepare CSV array
            HeaderCSV(0) = 1
            HeaderCSV(1) = 2
            HeaderCSV(3) = StrPtr(Expression)
            HeaderCSV(4) = Len(Expression)
            ' assign Expression data to the Integer array
            lngCSV = ArrayPtr(CSV)
            PutMem4 lngCSV, VarPtr(HeaderCSV(0))
            ' find first row delimiter, see if within quote or not
            lngA = InStrB(1, Expression, RowDelimiter, Compare)
            Do Until (lngA And 1) Or (lngA = 0)
                lngA = InStrB(lngA + 1, Expression, RowDelimiter, Compare)
            Loop
            lngB = InStrB(1, Expression, Quote, Compare)
            Do Until (lngB And 1) Or (lngB = 0)
                lngB = InStrB(lngB + 1, Expression, Quote, Compare)
            Loop
            Do While lngA > 0
                If lngA + lngRowLen <= lngB Or lngB = 0 Then
                    lngResults(lngCount) = lngA
                    lngA = InStrB(lngA + lngRowLen, Expression, RowDelimiter, Compare)
                    Do Until (lngA And 1) Or (lngA = 0)
                        lngA = InStrB(lngA + 1, Expression, RowDelimiter, Compare)
                    Loop
                    If lngCount Then
                        lngCount = lngCount + 1
                    Else
                        ' calculate number of resulting columns if invalid number of columns
                        If ResultCols < 1 Then
                            ResultCols = 1
                            intColumn = AscW(ColumnDelimiter)
                            For lngC = 0 To (lngResults(0) - 1) \ 2
                                If blnQuote Then
                                    If CSV(lngC) <> intQuote Then Else blnQuote = False
                                Else
                                    Select Case CSV(lngC)
                                        Case intQuote
                                            blnQuote = True
                                        Case intColumn
                                            ResultCols = ResultCols + 1
                                    End Select
                                End If
                            Next lngC
                        End If
                        lngCount = 1
                    End If
                Else
                    lngB = InStrB(lngB + 2, Expression, Quote, Compare)
                    Do Until (lngB And 1) Or (lngB = 0)
                        lngB = InStrB(lngB + 1, Expression, Quote, Compare)
                    Loop
                    If lngB Then
                        lngA = InStrB(lngB + 2, Expression, RowDelimiter, Compare)
                        Do Until (lngA And 1) Or (lngA = 0)
                            lngA = InStrB(lngA + 1, Expression, RowDelimiter, Compare)
                        Loop
                        If lngA Then
                            lngB = InStrB(lngB + 2, Expression, Quote, Compare)
                            Do Until (lngB And 1) Or (lngB = 0)
                                lngB = InStrB(lngB + 1, Expression, Quote, Compare)
                            Loop
                        End If
                    End If
                End If
            Loop
            lngResults(lngCount) = lngExpLen + 1
            ' number of rows
            ResultRows = lngCount + 1
            ' string array items to return
            ReDim Preserve ResultSplit(0 To ResultRows * ResultCols - 1)
            ' first row
            lngCount = 0
            strRow = LeftB$(Expression, lngResults(0) - 1)
            HeaderCSV(3) = StrPtr(strRow)
            lngC = 0
            blnQuote = False
            For lngB = 0 To (lngResults(0) - 1) \ 2
                If blnQuote Then
                    Select Case CSV(lngB)
                        Case intQuote
                            If CSV(lngB + 1) = intQuote Then
                                ' skip next char (quote)
                                lngB = lngB + 1
                                ' add quote char
                                CSV(lngC) = intQuote
                                lngC = lngC + 1
                            Else
                                blnQuote = False
                            End If
                        Case Else
                            ' add this char
                            If lngB > lngC Then CSV(lngC) = CSV(lngB)
                            lngC = lngC + 1
                    End Select
                Else
                    Select Case CSV(lngB)
                        Case intQuote
                            blnQuote = True
                        Case intColumn
                            ' add this column
                            ResultSplit(lngCount) = Left$(strRow, lngC)
                            ' max column reached?
                            lngCount = lngCount + 1
                            If lngCount >= ResultCols Then Exit For
                            ' start filling column string buffer from start (strRow)
                            lngC = 0
                        Case Else
                            ' add this char
                            If lngB > lngC Then CSV(lngC) = CSV(lngB)
                            lngC = lngC + 1
                    End Select
                End If
            Next lngB
            ' add last column item?
            If lngCount < ResultCols Then ResultSplit(lngCount) = Left$(strRow, lngC - 1)
            ' rows after first
            For lngA = 1 To ResultRows - 1
                ' start index for columns
                lngRow = lngA * ResultCols
                lngCount = 0
                strRow = MidB$(Expression, lngResults(lngA - 1) + lngRowLen, lngResults(lngA) - lngResults(lngA - 1) - lngRowLen)
                HeaderCSV(3) = StrPtr(strRow)
                lngC = 0
                blnQuote = False
                For lngB = 0 To (lngResults(lngA) - lngResults(lngA - 1) - lngRowLen) \ 2
                    If blnQuote Then
                        Select Case CSV(lngB)
                            Case intQuote
                                If CSV(lngB + 1) = intQuote Then
                                    ' skip next char (quote)
                                    lngB = lngB + 1
                                    ' add quote char
                                    CSV(lngC) = intQuote
                                    lngC = lngC + 1
                                Else
                                    blnQuote = False
                                End If
                            Case Else
                                ' add this char
                                CSV(lngC) = CSV(lngB)
                                lngC = lngC + 1
                        End Select
                    Else
                        Select Case CSV(lngB)
                            Case intQuote
                                blnQuote = True
                            Case intColumn
                                ' add this column
                                ResultSplit(lngRow + lngCount) = Left$(strRow, lngC)
                                ' max column reached?
                                lngCount = lngCount + 1
                                If lngCount >= ResultCols Then Exit For
                                ' start filling column string buffer from start (strRow)
                                lngC = 0
                            Case Else
                                ' add this char
                                If lngB > lngC Then CSV(lngC) = CSV(lngB)
                                lngC = lngC + 1
                        End Select
                    End If
                Next lngB
                ' add last column item?
                If lngCount < ResultCols Then ResultSplit(lngRow + lngCount) = Left$(strRow, lngC - 1)
            Next lngA
            ' clean up CSV array
            PutMem4 lngCSV, 0
        Else
            ResultCols = 0
            ResultRows = 0
            ' clean any possible data that exists in the passed string array (like if it is multidimensional)
            If Not Not ResultSplit Then Erase ResultSplit
            ' mysterious IDE error fix
            Debug.Assert App.hInstance
            ' reset to one element, one dimension
            ReDim ResultSplit(0 To 0)
            ' custom redimension: remove the items (this duplicates the VB6 Split behavior)
            SafeArrayRedim Not Not ResultSplit, SafeArrayBound(0)
        End If
    End Sub
    Let me know if you have a troublesome file that does not open as expected and I'll know whether the problem is in the file or in the parser.
    Last edited by Merri; May 2nd, 2010 at 02:34 AM. Reason: Minor bug fix

  3. #3
    Member
    Join Date
    Jul 2017
    Posts
    43

    Re: Fast & correctly reading CSV parser (Comma-separated values)

    Here's a slightly modified version that puts the results into a custom type so you can say xResult.Rows(i).Columns(j)

    Usage:
    Code:
    Dim xResult as tCSVResult
    xResult = SplitCSV(strCSVText)
    
    'Loop through rows and column
    for i = lbound(xresult.Rows) to ubound(xresult.Rows)
    for j = lbound(xresult.Rows(i).Columns) to ubound(xresult.Rows(i).Columns)
    next i
    There's probably better ways of doing this, I just hacked it together really quickly and have barely done any testing.



    Code:
    Private Type tCSVRow
        Columns()           As String
        Initialized         As Boolean
    End Type
    
    Public Type tCSVResult
        Rows()              As tCSVRow
        Initialized         As Boolean
        RowCount            As Long
        ColumnCount         As Long
    End Type
    
    
    ' returns one dimensional zero based string array in ResultSplit containing parsed CSV cells
    ' - ResultCols (in/out) number of columns; if positive on input the CSV data is fixed to given number of columns
    ' - ResultRows (out) number of rows
    Public Function SplitCSV(Expression As String, Optional ColumnDelimiter As String = ",", Optional RowDelimiter As String = vbNewLine, Optional Quote As String = """") As tCSVResult
        Dim CSV() As Integer, HeaderCSV(5) As Long, lngCSV As Long
        ' general variables that we need
        Dim intColumn As Integer, intQuote As Integer, lngRow As Long, strRow As String
        Dim lngExpLen As Long, lngRowLen As Long
        Dim blnQuote As Boolean, lngA As Long, lngB As Long, lngC As Long, lngCount As Long, lngResults() As Long
        ' some dummy variables that we happen to need
        Dim Compare As VbCompareMethod, SafeArrayBound(1) As Long
        ' length information
        lngExpLen = LenB(Expression)
        lngRowLen = LenB(RowDelimiter)
        ' validate lengths
        If lngExpLen > 0 And lngRowLen > 0 Then
            ' column delimiter
            If LenB(ColumnDelimiter) Then intColumn = AscW(ColumnDelimiter): ColumnDelimiter = Left$(ColumnDelimiter, 1) Else intColumn = 44: ColumnDelimiter = ","
            ' quote character
            If LenB(Quote) Then intQuote = AscW(Quote): Quote = Left$(Quote, 1) Else intQuote = 34: Quote = """"
            ' maximum number of results
            ReDim lngResults(0 To (lngExpLen \ lngRowLen))
            ' prepare CSV array
            HeaderCSV(0) = 1
            HeaderCSV(1) = 2
            HeaderCSV(3) = StrPtr(Expression)
            HeaderCSV(4) = Len(Expression)
            ' assign Expression data to the Integer array
            lngCSV = ArrayPtr(CSV)
            PutMem4 lngCSV, VarPtr(HeaderCSV(0))
            ' find first row delimiter, see if within quote or not
            lngA = InStrB(1, Expression, RowDelimiter, Compare)
            Do Until (lngA And 1) Or (lngA = 0)
                lngA = InStrB(lngA + 1, Expression, RowDelimiter, Compare)
            Loop
            lngB = InStrB(1, Expression, Quote, Compare)
            Do Until (lngB And 1) Or (lngB = 0)
                lngB = InStrB(lngB + 1, Expression, Quote, Compare)
            Loop
            Do While lngA > 0
                If lngA + lngRowLen <= lngB Or lngB = 0 Then
                    lngResults(lngCount) = lngA
                    lngA = InStrB(lngA + lngRowLen, Expression, RowDelimiter, Compare)
                    Do Until (lngA And 1) Or (lngA = 0)
                        lngA = InStrB(lngA + 1, Expression, RowDelimiter, Compare)
                    Loop
                    If lngCount Then
                        lngCount = lngCount + 1
                    Else
                        ' calculate number of resulting columns if invalid number of columns
                        If SplitCSV.ColumnCount < 1 Then
                            SplitCSV.ColumnCount = 1
                            intColumn = AscW(ColumnDelimiter)
                            For lngC = 0 To (lngResults(0) - 1) \ 2
                                If blnQuote Then
                                    If CSV(lngC) <> intQuote Then Else blnQuote = False
                                Else
                                    Select Case CSV(lngC)
                                        Case intQuote
                                            blnQuote = True
                                        Case intColumn
                                            SplitCSV.ColumnCount = SplitCSV.ColumnCount + 1
                                    End Select
                                End If
                            Next lngC
                        End If
                        lngCount = 1
                    End If
                Else
                    lngB = InStrB(lngB + 2, Expression, Quote, Compare)
                    Do Until (lngB And 1) Or (lngB = 0)
                        lngB = InStrB(lngB + 1, Expression, Quote, Compare)
                    Loop
                    If lngB Then
                        lngA = InStrB(lngB + 2, Expression, RowDelimiter, Compare)
                        Do Until (lngA And 1) Or (lngA = 0)
                            lngA = InStrB(lngA + 1, Expression, RowDelimiter, Compare)
                        Loop
                        If lngA Then
                            lngB = InStrB(lngB + 2, Expression, Quote, Compare)
                            Do Until (lngB And 1) Or (lngB = 0)
                                lngB = InStrB(lngB + 1, Expression, Quote, Compare)
                            Loop
                        End If
                    End If
                End If
            Loop
            lngResults(lngCount) = lngExpLen + 1
            ' number of rows
            SplitCSV.RowCount = lngCount '+ 1
            ' string array items to return
            ReDim Preserve SplitCSV.Rows(0 To SplitCSV.RowCount - 1)
            ' first row
            lngCount = 0
            strRow = LeftB$(Expression, lngResults(0) - 1)
            HeaderCSV(3) = StrPtr(strRow)
            lngC = 0
            blnQuote = False
            For lngB = 0 To (lngResults(0) - 1) \ 2
                If blnQuote Then
                    Select Case CSV(lngB)
                        Case intQuote
                            If CSV(lngB + 1) = intQuote Then
                                ' skip next char (quote)
                                lngB = lngB + 1
                                ' add quote char
                                CSV(lngC) = intQuote
                                lngC = lngC + 1
                            Else
                                blnQuote = False
                            End If
                        Case Else
                            ' add this char
                            If lngB > lngC Then CSV(lngC) = CSV(lngB)
                            lngC = lngC + 1
                    End Select
                Else
                    Select Case CSV(lngB)
                        Case intQuote
                            blnQuote = True
                        Case intColumn
                            ' add this column
                            If SplitCSV.Rows(0).Initialized = True Then
                            
                            Else
                                ReDim SplitCSV.Rows(0).Columns(0 To SplitCSV.ColumnCount - 1)
                                SplitCSV.Rows(0).Initialized = True
                            End If
                            SplitCSV.Rows(0).Columns(lngCount) = Left$(strRow, lngC)
                            ' max column reached?
                            lngCount = lngCount + 1
                            If lngCount >= SplitCSV.ColumnCount Then Exit For
                            ' start filling column string buffer from start (strRow)
                            lngC = 0
                        Case Else
                            ' add this char
                            If lngB > lngC Then CSV(lngC) = CSV(lngB)
                            lngC = lngC + 1
                    End Select
                End If
            Next lngB
            ' add last column item?
            If lngCount < SplitCSV.ColumnCount Then
                If SplitCSV.Rows(0).Initialized = True Then
                
                Else
                    ReDim SplitCSV.Rows(0).Columns(0 To SplitCSV.ColumnCount - 1)
                    SplitCSV.Rows(0).Initialized = True
                End If
                SplitCSV.Rows(0).Columns(lngCount) = Left$(strRow, lngC - 1)
            End If
            ' rows after first
            For lngA = 1 To SplitCSV.RowCount - 1
                ' start index for columns
                lngRow = lngA * SplitCSV.ColumnCount
                lngCount = 0
                strRow = MidB$(Expression, lngResults(lngA - 1) + lngRowLen, lngResults(lngA) - lngResults(lngA - 1) - lngRowLen)
                HeaderCSV(3) = StrPtr(strRow)
                lngC = 0
                blnQuote = False
                For lngB = 0 To (lngResults(lngA) - lngResults(lngA - 1) - lngRowLen) \ 2
                    If blnQuote Then
                        Select Case CSV(lngB)
                            Case intQuote
                                If CSV(lngB + 1) = intQuote Then
                                    ' skip next char (quote)
                                    lngB = lngB + 1
                                    ' add quote char
                                    CSV(lngC) = intQuote
                                    lngC = lngC + 1
                                Else
                                    blnQuote = False
                                End If
                            Case Else
                                ' add this char
                                CSV(lngC) = CSV(lngB)
                                lngC = lngC + 1
                        End Select
                    Else
                        Select Case CSV(lngB)
                            Case intQuote
                                blnQuote = True
                            Case intColumn
                                ' add this column
                                If SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Initialized = True Then
                                
                                Else
                                    ReDim SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Columns(0 To SplitCSV.ColumnCount - 1)
                                    SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Initialized = True
                                End If
                                SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Columns(lngCount) = Left$(strRow, lngC)
                                ' max column reached?
                                lngCount = lngCount + 1
                                If lngCount >= SplitCSV.ColumnCount Then Exit For
                                ' start filling column string buffer from start (strRow)
                                lngC = 0
                            Case Else
                                ' add this char
                                If lngB > lngC Then CSV(lngC) = CSV(lngB)
                                lngC = lngC + 1
                        End Select
                    End If
                Next lngB
                ' add last column item?
                If lngCount < SplitCSV.ColumnCount Then
                    If SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Initialized = True Then
                    
                    Else
                        ReDim SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Columns(0 To SplitCSV.ColumnCount - 1)
                        SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Initialized = True
                    End If
                    SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Columns(lngCount) = Left$(strRow, lngC - 1)
                End If
            Next lngA
            ' clean up CSV array
            PutMem4 lngCSV, 0
        Else
            SplitCSV.ColumnCount = 0
            SplitCSV.RowCount = 0
            ' clean any possible data that exists in the passed string array (like if it is multidimensional)
            If Not Not SplitCSV.Rows Then Erase SplitCSV.Rows
            ' mysterious IDE error fix
            Debug.Assert App.hInstance
            ' reset to one element, one dimension
            ReDim SplitCSV.Rows(0 To 0)
            ' custom redimension: remove the items (this duplicates the VB6 Split behavior)
            SafeArrayRedim Not Not SplitCSV.Rows, SafeArrayBound(0)
        End If
    End Function

  4. #4
    Frenzied Member
    Join Date
    Aug 2020
    Posts
    1,421

    Re: Fast & correctly reading CSV parser (Comma-separated values)

    Quote Originally Posted by bahbahbah View Post
    Here's a slightly modified version that puts the results into a custom type so you can say xResult.Rows(i).Columns(j)

    Usage:
    Code:
    Dim xResult as tCSVResult
    xResult = SplitCSV(strCSVText)
    
    'Loop through rows and column
    for i = lbound(xresult.Rows) to ubound(xresult.Rows)
    for j = lbound(xresult.Rows(i).Columns) to ubound(xresult.Rows(i).Columns)
    next i
    There's probably better ways of doing this, I just hacked it together really quickly and have barely done any testing.



    Code:
    Private Type tCSVRow
        Columns()           As String
        Initialized         As Boolean
    End Type
    
    Public Type tCSVResult
        Rows()              As tCSVRow
        Initialized         As Boolean
        RowCount            As Long
        ColumnCount         As Long
    End Type
    
    
    ' returns one dimensional zero based string array in ResultSplit containing parsed CSV cells
    ' - ResultCols (in/out) number of columns; if positive on input the CSV data is fixed to given number of columns
    ' - ResultRows (out) number of rows
    Public Function SplitCSV(Expression As String, Optional ColumnDelimiter As String = ",", Optional RowDelimiter As String = vbNewLine, Optional Quote As String = """") As tCSVResult
        Dim CSV() As Integer, HeaderCSV(5) As Long, lngCSV As Long
        ' general variables that we need
        Dim intColumn As Integer, intQuote As Integer, lngRow As Long, strRow As String
        Dim lngExpLen As Long, lngRowLen As Long
        Dim blnQuote As Boolean, lngA As Long, lngB As Long, lngC As Long, lngCount As Long, lngResults() As Long
        ' some dummy variables that we happen to need
        Dim Compare As VbCompareMethod, SafeArrayBound(1) As Long
        ' length information
        lngExpLen = LenB(Expression)
        lngRowLen = LenB(RowDelimiter)
        ' validate lengths
        If lngExpLen > 0 And lngRowLen > 0 Then
            ' column delimiter
            If LenB(ColumnDelimiter) Then intColumn = AscW(ColumnDelimiter): ColumnDelimiter = Left$(ColumnDelimiter, 1) Else intColumn = 44: ColumnDelimiter = ","
            ' quote character
            If LenB(Quote) Then intQuote = AscW(Quote): Quote = Left$(Quote, 1) Else intQuote = 34: Quote = """"
            ' maximum number of results
            ReDim lngResults(0 To (lngExpLen \ lngRowLen))
            ' prepare CSV array
            HeaderCSV(0) = 1
            HeaderCSV(1) = 2
            HeaderCSV(3) = StrPtr(Expression)
            HeaderCSV(4) = Len(Expression)
            ' assign Expression data to the Integer array
            lngCSV = ArrayPtr(CSV)
            PutMem4 lngCSV, VarPtr(HeaderCSV(0))
            ' find first row delimiter, see if within quote or not
            lngA = InStrB(1, Expression, RowDelimiter, Compare)
            Do Until (lngA And 1) Or (lngA = 0)
                lngA = InStrB(lngA + 1, Expression, RowDelimiter, Compare)
            Loop
            lngB = InStrB(1, Expression, Quote, Compare)
            Do Until (lngB And 1) Or (lngB = 0)
                lngB = InStrB(lngB + 1, Expression, Quote, Compare)
            Loop
            Do While lngA > 0
                If lngA + lngRowLen <= lngB Or lngB = 0 Then
                    lngResults(lngCount) = lngA
                    lngA = InStrB(lngA + lngRowLen, Expression, RowDelimiter, Compare)
                    Do Until (lngA And 1) Or (lngA = 0)
                        lngA = InStrB(lngA + 1, Expression, RowDelimiter, Compare)
                    Loop
                    If lngCount Then
                        lngCount = lngCount + 1
                    Else
                        ' calculate number of resulting columns if invalid number of columns
                        If SplitCSV.ColumnCount < 1 Then
                            SplitCSV.ColumnCount = 1
                            intColumn = AscW(ColumnDelimiter)
                            For lngC = 0 To (lngResults(0) - 1) \ 2
                                If blnQuote Then
                                    If CSV(lngC) <> intQuote Then Else blnQuote = False
                                Else
                                    Select Case CSV(lngC)
                                        Case intQuote
                                            blnQuote = True
                                        Case intColumn
                                            SplitCSV.ColumnCount = SplitCSV.ColumnCount + 1
                                    End Select
                                End If
                            Next lngC
                        End If
                        lngCount = 1
                    End If
                Else
                    lngB = InStrB(lngB + 2, Expression, Quote, Compare)
                    Do Until (lngB And 1) Or (lngB = 0)
                        lngB = InStrB(lngB + 1, Expression, Quote, Compare)
                    Loop
                    If lngB Then
                        lngA = InStrB(lngB + 2, Expression, RowDelimiter, Compare)
                        Do Until (lngA And 1) Or (lngA = 0)
                            lngA = InStrB(lngA + 1, Expression, RowDelimiter, Compare)
                        Loop
                        If lngA Then
                            lngB = InStrB(lngB + 2, Expression, Quote, Compare)
                            Do Until (lngB And 1) Or (lngB = 0)
                                lngB = InStrB(lngB + 1, Expression, Quote, Compare)
                            Loop
                        End If
                    End If
                End If
            Loop
            lngResults(lngCount) = lngExpLen + 1
            ' number of rows
            SplitCSV.RowCount = lngCount '+ 1
            ' string array items to return
            ReDim Preserve SplitCSV.Rows(0 To SplitCSV.RowCount - 1)
            ' first row
            lngCount = 0
            strRow = LeftB$(Expression, lngResults(0) - 1)
            HeaderCSV(3) = StrPtr(strRow)
            lngC = 0
            blnQuote = False
            For lngB = 0 To (lngResults(0) - 1) \ 2
                If blnQuote Then
                    Select Case CSV(lngB)
                        Case intQuote
                            If CSV(lngB + 1) = intQuote Then
                                ' skip next char (quote)
                                lngB = lngB + 1
                                ' add quote char
                                CSV(lngC) = intQuote
                                lngC = lngC + 1
                            Else
                                blnQuote = False
                            End If
                        Case Else
                            ' add this char
                            If lngB > lngC Then CSV(lngC) = CSV(lngB)
                            lngC = lngC + 1
                    End Select
                Else
                    Select Case CSV(lngB)
                        Case intQuote
                            blnQuote = True
                        Case intColumn
                            ' add this column
                            If SplitCSV.Rows(0).Initialized = True Then
                            
                            Else
                                ReDim SplitCSV.Rows(0).Columns(0 To SplitCSV.ColumnCount - 1)
                                SplitCSV.Rows(0).Initialized = True
                            End If
                            SplitCSV.Rows(0).Columns(lngCount) = Left$(strRow, lngC)
                            ' max column reached?
                            lngCount = lngCount + 1
                            If lngCount >= SplitCSV.ColumnCount Then Exit For
                            ' start filling column string buffer from start (strRow)
                            lngC = 0
                        Case Else
                            ' add this char
                            If lngB > lngC Then CSV(lngC) = CSV(lngB)
                            lngC = lngC + 1
                    End Select
                End If
            Next lngB
            ' add last column item?
            If lngCount < SplitCSV.ColumnCount Then
                If SplitCSV.Rows(0).Initialized = True Then
                
                Else
                    ReDim SplitCSV.Rows(0).Columns(0 To SplitCSV.ColumnCount - 1)
                    SplitCSV.Rows(0).Initialized = True
                End If
                SplitCSV.Rows(0).Columns(lngCount) = Left$(strRow, lngC - 1)
            End If
            ' rows after first
            For lngA = 1 To SplitCSV.RowCount - 1
                ' start index for columns
                lngRow = lngA * SplitCSV.ColumnCount
                lngCount = 0
                strRow = MidB$(Expression, lngResults(lngA - 1) + lngRowLen, lngResults(lngA) - lngResults(lngA - 1) - lngRowLen)
                HeaderCSV(3) = StrPtr(strRow)
                lngC = 0
                blnQuote = False
                For lngB = 0 To (lngResults(lngA) - lngResults(lngA - 1) - lngRowLen) \ 2
                    If blnQuote Then
                        Select Case CSV(lngB)
                            Case intQuote
                                If CSV(lngB + 1) = intQuote Then
                                    ' skip next char (quote)
                                    lngB = lngB + 1
                                    ' add quote char
                                    CSV(lngC) = intQuote
                                    lngC = lngC + 1
                                Else
                                    blnQuote = False
                                End If
                            Case Else
                                ' add this char
                                CSV(lngC) = CSV(lngB)
                                lngC = lngC + 1
                        End Select
                    Else
                        Select Case CSV(lngB)
                            Case intQuote
                                blnQuote = True
                            Case intColumn
                                ' add this column
                                If SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Initialized = True Then
                                
                                Else
                                    ReDim SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Columns(0 To SplitCSV.ColumnCount - 1)
                                    SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Initialized = True
                                End If
                                SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Columns(lngCount) = Left$(strRow, lngC)
                                ' max column reached?
                                lngCount = lngCount + 1
                                If lngCount >= SplitCSV.ColumnCount Then Exit For
                                ' start filling column string buffer from start (strRow)
                                lngC = 0
                            Case Else
                                ' add this char
                                If lngB > lngC Then CSV(lngC) = CSV(lngB)
                                lngC = lngC + 1
                        End Select
                    End If
                Next lngB
                ' add last column item?
                If lngCount < SplitCSV.ColumnCount Then
                    If SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Initialized = True Then
                    
                    Else
                        ReDim SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Columns(0 To SplitCSV.ColumnCount - 1)
                        SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Initialized = True
                    End If
                    SplitCSV.Rows(lngRow / SplitCSV.ColumnCount).Columns(lngCount) = Left$(strRow, lngC - 1)
                End If
            Next lngA
            ' clean up CSV array
            PutMem4 lngCSV, 0
        Else
            SplitCSV.ColumnCount = 0
            SplitCSV.RowCount = 0
            ' clean any possible data that exists in the passed string array (like if it is multidimensional)
            If Not Not SplitCSV.Rows Then Erase SplitCSV.Rows
            ' mysterious IDE error fix
            Debug.Assert App.hInstance
            ' reset to one element, one dimension
            ReDim SplitCSV.Rows(0 To 0)
            ' custom redimension: remove the items (this duplicates the VB6 Split behavior)
            SafeArrayRedim Not Not SplitCSV.Rows, SafeArrayBound(0)
        End If
    End Function
    Hi bahbahbah, if you have time and interest, maybe you can compare the performance of your code and RC6.CSV when reading huge text files. These results will be very meaningful.

  5. #5
    Member
    Join Date
    Jul 2017
    Posts
    43

    Re: Fast & correctly reading CSV parser (Comma-separated values)

    Not sure how to use the RC6 CSV module - no documentation. If you can provide a sample I can check it.

    Here's some basic timings using GetTickCount

    Original: 703ms
    Original using types: 860ms
    Original using class modules: 1100ms
    RC6: 125ms

    Not sure that the RC6 one is correct as I've no idea how to use it. I've implemented it like so:

    In declarations:

    Code:
    Implements RC6.ICSVCallback
    
    Private Function ICSVCallback_NewValue(ByVal RowNr As Long, ByVal ColNr As Long, B() As Byte, ByVal BValStartPos As Long, ByVal BValLen As Long) As Long
    
    End Function

    And the code to run it:

    Code:
        Dim xTemp As RC6.cCSV
        Set xTemp = New RC6.cCSV
        xTemp.ParseFile XXXFILENAMEXXX, Me

    Besides the callback, I'm not sure there's a way to get the data? So presumably you'd have to store that yourself? I'd guess that would up the time a wee bit.

  6. #6
    Frenzied Member
    Join Date
    Aug 2020
    Posts
    1,421
    Last edited by SearchingDataOnly; Apr 28th, 2021 at 01:20 PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width