[RESOLVED] CSV Full Outer Join - alternatives?-VBForums
Results 1 to 25 of 25

Thread: [RESOLVED] CSV Full Outer Join - alternatives?

  1. #1

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    17,865

    Resolved [RESOLVED] CSV Full Outer Join - alternatives?

    Say we have two CSV files: a.txt and b.txt that each have 4 fields:

    Year, Number, Number, Number

    We want to combine the two files on Year keeping all of the Number values and filling in with zeros where no matching row exists:

    a.txt:
    Code:
    1901,1,1,1
    1902,2,2,2
    1904,4,4,4
    1906,6,6,6
    1908,8,8,8
    b.txt:
    Code:
    1901,11,11,11
    1902,12,12,12
    1903,13,13,13
    1904,14,14,14
    1905,15,15,15
    1906,16,16,16
    Desired c.txt:
    Code:
    1901,1,1,1,11,11,11
    1902,2,2,2,12,12,12
    1903,0,0,0,13,13,13
    1904,4,4,4,14,14,14
    1905,0,0,0,15,15,15
    1906,6,6,6,16,16,16
    1908,8,8,8,0,0,0

    This code tackles it twice, once the "long way" for ilustration:

    Code:
    Option Explicit
    
    'Implementing the Equivalent of a FULL OUTER JOIN in Microsoft Jet SQL.
    
    Private Sub Main()
        'We'll do our work in App.Path:
        ChDir App.Path
        ChDrive App.Path
        'Clean up from any prior test run:
        On Error Resume Next
        Kill "inner.txt"
        Kill "left.txt"
        Kill "right.txt"
        Kill "c4steps.txt"
        Kill "c.txt"
        Kill "schema.ini"
        On Error GoTo 0
        With New ADODB.Connection
            .Open "Provider=Microsoft.Jet.OLEDB.4.0;" _
                & "Data Source='.';" _
                & "Extended Properties='Text;Hdr=No'"
            'Do it in 4 steps for illustration:
            .Execute "SELECT [A].*, [B].[F2], [B].[F3], [B].[F4] " _
                   & "INTO [inner.txt] FROM " _
                   & "[a.txt] [A] INNER JOIN [b.txt] [B] ON " _
                   & "[A].[F1] = [B].[F1]", _
                     , _
                     adCmdText Or adExecuteNoRecords
            .Execute "SELECT [A].*, 0 AS [B_F2], 0 AS [B_F3], 0 AS [B_F4] " _
                   & "INTO [left.txt] FROM " _
                   & "[a.txt] [A] LEFT JOIN [b.txt] [B] ON " _
                   & "[A].[F1] = [B].[F1] " _
                   & "WHERE [B].[F1] IS NULL", _
                     , _
                     adCmdText Or adExecuteNoRecords
            .Execute "SELECT [B].[F1], 0 AS [A_F2], 0 AS [A_F3], 0 AS [A_F4], " _
                   & "[B].[F2], [B].[F3], [B].[F4] " _
                   & "INTO [right.txt] FROM " _
                   & "[a.txt] [A] RIGHT JOIN [b.txt] [B] ON " _
                   & "[A].[F1] = [B].[F1] " _
                   & "WHERE [A].[F1] IS NULL", _
                     , _
                     adCmdText Or adExecuteNoRecords
            .Execute "SELECT * " _
                   & "INTO [c4steps.txt] FROM (" _
                   & "SELECT * FROM [inner.txt] UNION ALL " _
                   & "SELECT * FROM [left.txt] UNION ALL " _
                   & "SELECT * FROM [right.txt]) " _
                   & "ORDER BY [F1]", _
                     , _
                     adCmdText Or adExecuteNoRecords
            'Do it all in one go:
            .Execute "SELECT * " _
                   & "INTO [c.txt] FROM (" _
                   & "SELECT [A].*, [B].[F2], [B].[F3], [B].[F4] " _
                   & "FROM [a.txt] [A] INNER JOIN [b.txt] [B] ON " _
                   & "[A].[F1] = [B].[F1] UNION ALL " _
                   & "SELECT [A].*, 0 AS [B_F2], 0 AS [B_F3], 0 AS [B_F4] " _
                   & "FROM [a.txt] [A] LEFT JOIN [b.txt] [B] ON " _
                   & "[A].[F1] = [B].[F1] " _
                   & "WHERE [B].[F1] IS NULL UNION ALL " _
                   & "SELECT [B].[F1], 0 AS [A_F2], 0 AS [A_F3], 0 AS [A_F4], " _
                   & "[B].[F2], [B].[F3], [B].[F4] " _
                   & "FROM [a.txt] [A] RIGHT JOIN [b.txt] [B] ON " _
                   & "[A].[F1] = [B].[F1] " _
                   & "WHERE [A].[F1] IS NULL) " _
                   & "ORDER BY [F1]", _
                     , _
                     adCmdText Or adExecuteNoRecords
            .Close
        End With
        MsgBox "Done"
    End Sub

    My question is: Do we have another way to do this? Perhaps using some other preinstalled library, manually via brute force VB6 code, or with the aid of some 3rd party library for acting on tabular text data?

    I was thinking of the free Log Parser 2.2 from Microsoft but it doesn't do joins of any kind.
    Attached Files Attached Files

  2. #2
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    12,179

    Re: CSV Full Outer Join - alternatives?

    My first thought would be execute a left join first then execute a right join using a where clause that filters it to get only the items where there is no match in the other table/file then combine the results.

    Edit: I have not tried it of course but seems like it would do the trick?

  3. #3
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    16,720

    Re: CSV Full Outer Join - alternatives?

    Here is a solution (at least returns same results as your project). It is based off this wikipedia post.

    Using sSQL to organize my thoughts
    Code:
    sSQL = "Select [A].*, [B].[F2],[B].[F3],[B].[F4] FROM [a.txt] AS [A] " & _
        "INNER JOIN [b.txt] As [B] ON [B].[F1]=[A].[F1] "
    
    sSQL = sSQL & "UNION ALL Select [A1].*, [B1].[F2],[B1].[F3],[B1].[F4] " & _
            "FROM [a.txt] [A1] LEFT OUTER JOIN [b.txt] AS [B1] ON [B1].[F1]=[A1].[F1] " & _
            "WHERE [A1].[F1]=[B1].[F1] AND [B1].[F1] IS NULL "
    
    sSQL = sSQL & "UNION ALL Select [A2].*, [B2].[F2],[B2].[F3],[B2].[F4] " & _
            "FROM [a.txt] [A2] LEFT OUTER JOIN [b.txt] AS [B2] ON [B2].[F1]=[A2].[F1] " & _
            "WHERE [A2].[F1]=[B2].[F1] AND [A2].[F1] IS NULL"
    
    sSQL = "SELECT foj.* INTO [inner.txt] FROM (" & sSQL & ") As foj "
    Now execute that SQL statement.

    Edited: Ignore above. I misread your code and tried to match inner text. Oops. I think a similar solution can be made to match your c.txt. If you don't find it in a bit, I'll look again.
    Last edited by LaVolpe; Sep 20th, 2017 at 05:07 PM.
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  4. #4
    Fanatic Member
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    827

    Re: CSV Full Outer Join - alternatives?

    I know you're no fan of SQLite and vbRichClient5, but I did manage to meet your data requirements using a "simulated" full outer join in SQLite as follows:

    Code:
    Sub TestJoin()
       Dim lo_Cnn As vbRichClient5.cConnection
       Dim lo_Rs As vbRichClient5.cRecordset
       Dim ii As Long
       
       ' Populate sample tables and data....you could slurp this information from a file into the DB of course.
       ' RC5 has a CSV parsing class, StringBuilder class, and FSO class that can make this process easier.
    
       Set lo_Cnn = New_c.Connection(":memory:", DBCreateInMemory)
       lo_Cnn.Execute "CREATE TABLE table_a (year INTEGER, a INTEGER DEFAULT 0, b INTEGER DEFAULT 0, c INTEGER DEFAULT 0)"
       lo_Cnn.Execute "CREATE TABLE table_b (year INTEGER, a INTEGER DEFAULT 0, b INTEGER DEFAULT 0, c INTEGER DEFAULT 0)"
       
       lo_Cnn.Execute "INSERT INTO table_a (year, a, b, c) VALUES (1901, 1,2,3)"
       lo_Cnn.Execute "INSERT INTO table_a (year, a, b, c) VALUES (1902, 4,5,6)"
       lo_Cnn.Execute "INSERT INTO table_a (year, a, b, c) VALUES (1904, 7,8,9)"
       lo_Cnn.Execute "INSERT INTO table_a (year, a, b, c) VALUES (1906, 10,11,12)"
       lo_Cnn.Execute "INSERT INTO table_a (year, a, b, c) VALUES (1908, 13,14,15)"
    
       lo_Cnn.Execute "INSERT INTO table_b (year, a, b, c) VALUES (1901, 101,102,103)"
       lo_Cnn.Execute "INSERT INTO table_b (year, a, b, c) VALUES (1902, 104,105,106)"
       lo_Cnn.Execute "INSERT INTO table_b (year, a, b, c) VALUES (1903, 107,108,109)"
       lo_Cnn.Execute "INSERT INTO table_b (year, a, b, c) VALUES (1904, 110,111,112)"
       lo_Cnn.Execute "INSERT INTO table_b (year, a, b, c) VALUES (1905, 113,114,115)"
       lo_Cnn.Execute "INSERT INTO table_b (year, a, b, c) VALUES (1906, 116,117,118)"
       lo_Cnn.Execute "INSERT INTO table_b (year, a, b, c) VALUES (1908, 119,120,121)"
    
       ' This is the SQL statement that joins the data   
       Set lo_Rs = lo_Cnn.OpenRecordset("SELECT table_a.year, ifnull(table_a.a, 0), ifnull(table_a.b, 0), ifnull(table_a.c, 0), ifnull(table_b.a, 0), ifnull(table_b.b, 0), ifnull(table_b.c, 0) FROM table_a " & _
                       "LEFT JOIN table_b USING(year) UNION ALL " & _
                       "SELECT table_b.year, ifnull(table_a.a, 0), ifnull(table_a.b, 0), ifnull(table_a.c, 0), ifnull(table_b.a, 0), ifnull(table_b.b, 0), ifnull(table_b.c, 0) FROM table_b " & _
                       "LEFT JOIN table_a USING(year) WHERE table_a.year IS NULL ORDER BY year ASC")
       
       ' Output the joined data
       Do While Not lo_Rs.EOF
          For ii = 0 To lo_Rs.Fields.Count - 1
             Debug.Print lo_Rs.Fields(ii).Value & vbTab;
          Next ii
          Debug.Print vbNewLine
          
          lo_Rs.MoveNext
       Loop
    End Sub
    Results:

    Code:
    1901  1  2  3  101   102   103   
    
    1902  4  5  6  104   105   106   
    
    1903  0  0  0  107   108   109   
    
    1904  7  8  9  110   111   112   
    
    1905  0  0  0  113   114   115   
    
    1906  10 11 12 116   117   118   
    
    1908  13 14 15 119   120   121
    Haven't tested if the performance is any better than your solution though.

    Got the idea for the simulated full outer join from here: http://www.sqlitetutorial.net/sqlite-full-outer-join/ (in case it will work with Jet and you like the approach better than what you already have).

  5. #5
    PowerPoster
    Join Date
    Jun 2013
    Posts
    3,200

    Re: CSV Full Outer Join - alternatives?

    In case the Data-Fields (besides the Year-Column) are all Numbers, you could do a unified Group By Query (with a Sum-Aggregate on all Data-Columns) like this:
    Code:
    Select Year, Sum(F1) As F1, Sum(F2) As F2, Sum(F3) As F3, Sum(F4) As F4, Sum(F5) As F5, Sum(F6) As F6 From (
        Select Year, F1, F2, F3, 0 As F4, 0 As F5, 0 As F6 From A
        Union All
        Select Year, 0 As F1, 0 As F2, 0 As F3, F1 As F4, F2 As F5, F3 As F6 From B
    ) Group By Year
    If the CSV-Tables A and B are huge ones, SQLite can act as a fast Import-Target, which then allows also
    faster Queries on its (temporarily created) InMemory or (if the import-data is >500MB) also FileBased-DB-Tables.

    Here's a Demo (needs a Project-Reference to vbRIchClient5 and also an MSHFlexGrid1 on the Form):
    Code:
    Option Explicit
    
    Implements ICSVCallback
    Private Cnn As cConnection, CSV As cCSV, Cmd As cCommand
    
    Private Sub Form_Load()
      Set Cnn = New_c.Connection(, DBCreateInMemory)
          Cnn.Execute "Create Table A(Year Integer Primary Key, F1 Integer, F2 Integer, F3 Integer)"
          Cnn.Execute "Create Table B(Year Integer Primary Key, F1 Integer, F2 Integer, F3 Integer)"
          
      With New_c.StringBuilder
        .Clear
          .AppendNL "1901,1,1,1"
          .AppendNL "1902,2,2,2"
          .AppendNL "1904,4,4,4"
          .AppendNL "1906,6,6,6"
          .AppendNL "1908,8,8,8"
        ImportToTable "A", .ToUTF8
        
        .Clear
          .AppendNL "1901,11,11,11"
          .AppendNL "1902,12,12,12"
          .AppendNL "1903,13,13,13"
          .AppendNL "1904,14,14,14"
          .AppendNL "1905,15,15,15"
          .AppendNL "1906,16,16,16"
        ImportToTable "B", .ToUTF8
    
        .Clear
          .AppendNL "Select A.Year, A.F1,A.F2,A.F3, B.F1 F4,B.F2 F5,B.F3 F6 From A Left Outer Join B Using(Year)"
          .AppendNL "Union"
          .AppendNL "Select B.Year, A.F1,A.F2,A.F3, B.F1 F4,B.F2 F5,B.F3 F6 From B Left Outer Join A Using(Year)"
        Set MSHFlexGrid1.DataSource = Cnn.OpenRecordset(.ToString).DataSource
      End With
    End Sub
    
    Sub ImportToTable(TableName As String, CSVContent() As Byte)
      Cnn.BeginTrans
      Set Cmd = Cnn.CreateCommand("Insert Into " & TableName & " Values(?,?,?,?)")
      Set CSV = New_c.CSV
          CSV.ParseBytes CSVContent, Me
      Cnn.CommitTrans
    End Sub
    Private Function ICSVCallback_NewValue(ByVal RowNr As Long, ByVal ColNr As Long, B() As Byte, ByVal BValStartPos As Long, ByVal BValLen As Long) As Long
      Cmd.SetInt32 ColNr + 1, CSV.parseNumber(B, BValStartPos, BValLen)
      If ColNr = 3 Then Cmd.Execute
    End Function
    Here's the above Demos Output-Result (using the example-data you have given):


    HTH

    Olaf
    Last edited by Schmidt; Sep 20th, 2017 at 05:22 PM.

  6. #6
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    16,720

    Re: CSV Full Outer Join - alternatives?

    Just to follow up...

    This is revised from post #2. It produces same results as your c.txt file and runs significantly faster than the multi-file solution in your zip. If useful, great. If not, no biggie.

    Code:
    sSQL = "Select [A].*, [B].[F2],[B].[F3],[B].[F4] " & _
        "FROM [a.txt] AS [A] INNER JOIN [b.txt] As [B] ON [B].[F1]=[A].[F1] "
        
    sSQL = sSQL & "UNION ALL Select [A1].*, 0 As [F2],0 As [F3],0 As [F4] " & _
            "FROM [a.txt] [A1] LEFT OUTER JOIN [b.txt] AS [B1] ON [B1].[F1]=[A1].[F1] " & _
            "WHERE [B1].[F1] IS NULL "
            
    sSQL = sSQL & "UNION ALL Select [B2].[F1] ,0 As [F2],0 As [F3],0 As [F4], [B2].[F2],[B2].[F3],[B2].[F4] " & _
            "FROM [a.txt] [A2] RIGHT OUTER JOIN [b.txt] AS [B2] ON [B2].[F1]=[A2].[F1] " & _
            "WHERE [A2].[F1] IS NULL"
            
    sSQL = "SELECT foj.* INTO [C.txt] FROM (" & sSQL & ") As foj ORDER BY F1"
    Edited: The downside to this approach is that you have to pre-fill the null columns which means you need to know in advance or build a routine to discover how many there will be in both source files.
    Last edited by LaVolpe; Sep 20th, 2017 at 05:31 PM.
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  7. #7

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    17,865

    Re: CSV Full Outer Join - alternatives?

    The multiple query approach was there for illustration. It is followed by a single compound query.

    I guess what I was hoping to find is something more specifically designed to work with tabular data text files that does this kind of "combination."

    I suppose you could build something around the Collection class as well... ok, yes that seems to work:

    Code:
    Option Explicit
    
    Private Sub Main()
        Dim Collection As Collection
        Dim F As Integer
        Dim Y As Integer, N1 As Integer, N2 As Integer, N3 As Integer
        Dim Item As Variant
    
        'We'll do our work in App.Path:
        ChDir App.Path
        ChDrive App.Path
        Set Collection = New Collection
        F = FreeFile(0)
        Open "a.txt" For Input As #F
        Do Until EOF(F)
            Input #F, Y, N1, N2, N3
            InsertInOrder Collection, Array(Y, N1, N2, N3, 0, 0, 0)
        Loop
        Close #F
        F = FreeFile(0)
        Open "b.txt" For Input As #F
        Do Until EOF(F)
            Input #F, Y, N1, N2, N3
            JoinInOrder Collection, Array(Y, 0, 0, 0, N1, N2, N3)
        Loop
        Close #F
        F = FreeFile(0)
        Open "c.txt" For Output As #F
        For Each Item In Collection
            Write #F, Item(0), Item(1), Item(2), Item(3), Item(4), Item(5), Item(6)
        Next
        Close #F
    
        MsgBox "Done"
    End Sub
    Code:
    Option Explicit
    
    Private Const KEY_INDEX = 0
    
    Public Sub InsertInOrder( _
        ByVal Collection As Collection, _
        ByRef NewItem As Variant)
        'NewItem(0) is the key field.
    
        Dim Min As Long, Mid As Long, Max As Long
        Dim NewKey As Variant, TestKey As Variant
        Dim CompareStrings As Boolean, Compare As Integer
    
        With Collection
            If .Count = 0 Then
                .Add NewItem
                Exit Sub
            End If
            Min = 0
            Max = .Count - 1
            NewKey = NewItem(KEY_INDEX)
            CompareStrings = VarType(NewKey) = vbString
            Do Until Min > Max
                Mid = (Min + Max) \ 2
                TestKey = .Item(Mid + 1)(KEY_INDEX)
                If CompareStrings Then
                    Compare = StrComp(TestKey, NewKey, vbBinaryCompare)
                Else
                    Compare = Sgn(TestKey - NewKey)
                End If
                Select Case Compare
                    Case Is < 0
                        Min = Mid + 1
                    Case Is > 0
                        Max = Mid - 1
                    Case 0
                        .Add NewItem, , , Mid + 1
                        Exit Sub
                End Select
            Loop
            If Min = Mid Then
                .Add NewItem, , Mid + 1
            Else
                .Add NewItem, , , Mid + 1
            End If
        End With
    End Sub
    
    Public Sub JoinInOrder( _
        ByVal Collection As Collection, _
        ByRef NewItem As Variant)
        'NewItem(0) is the key field.
    
        Dim Min As Long, Mid As Long, Max As Long
        Dim NewKey As Variant, TestItem As Variant, TestKey As Variant
        Dim CompareStrings As Boolean, Compare As Integer
    
        With Collection
            If .Count = 0 Then
                .Add NewItem
                Exit Sub
            End If
            Min = 0
            Max = .Count - 1
            NewKey = NewItem(KEY_INDEX)
            CompareStrings = VarType(NewKey) = vbString
            Do Until Min > Max
                Mid = (Min + Max) \ 2
                TestItem = .Item(Mid + 1)
                TestKey = TestItem(KEY_INDEX)
                If CompareStrings Then
                    Compare = StrComp(TestKey, NewKey, vbBinaryCompare)
                Else
                    Compare = Sgn(TestKey - NewKey)
                End If
                Select Case Compare
                    Case Is < 0
                        Min = Mid + 1
                    Case Is > 0
                        Max = Mid - 1
                    Case 0
                        TestItem(4) = NewItem(4)
                        TestItem(5) = NewItem(5)
                        TestItem(6) = NewItem(6)
                        .Remove Mid + 1
                        If Mid + 1 = .Count Then
                            .Add TestItem
                        Else
                            .Add TestItem, , Mid + 1
                        End If
                        Exit Sub
                End Select
            Loop
            If Min = Mid Then
                .Add NewItem, , Mid + 1
            Else
                .Add NewItem, , , Mid + 1
            End If
        End With
    End Sub
    Found and fixed a bug. Shows that this approach is perhaps even trickier to get right than a complex SQL query.

    Added slightly larger test inputs.
    Attached Files Attached Files
    Last edited by dilettante; Sep 20th, 2017 at 07:42 PM.

  8. #8

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    17,865

    Re: CSV Full Outer Join - alternatives?

    If you don't like Collections and Variants containing arrays I suppose a fabricated ADO Recordset might also be used.

  9. #9
    PowerPoster
    Join Date
    Jun 2013
    Posts
    3,200

    Re: CSV Full Outer Join - alternatives?

    Not sure, for what scenario the whole thing should be optimized.

    Is it "more speed, when the Files get larger" or is it "sparse, easy to write code"?

    FWIW, I've made a Comparison-Zip with the current approaches which came in so far (including a new, RC5-SortedDictionary one):
    OuterJoinComparisons.zip

    There's 5 variants (each of them reading from the same a- and b.txt ... and then writing c.txt as the output-result,
    c.txt is Killed before each of the test-runs - and a CheckSum of the new generated c.txt of each approach is calculated)

    Here's the results (measured on a native compiled binary, but the IDE-results don't differ much):


    HTH

    Olaf

  10. #10
    Fanatic Member Spooman's Avatar
    Join Date
    Mar 2017
    Posts
    869

    Re: CSV Full Outer Join - alternatives?

    Olaf

    This is somewhat OT, but ...

    .. given that the minimum Timer interval is 1 ms, how do you get values to 2 decimal places, eg, 985.29 ??

    Spoo

  11. #11
    PowerPoster
    Join Date
    Oct 2013
    Posts
    2,913

    Re: CSV Full Outer Join - alternatives?

    You can use the GetTickCount API or the QueryPerformanceCounter API:
    http://www.vbforums.com/showthread.p...mancefrequency

  12. #12
    Fanatic Member ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    720

    Re: CSV Full Outer Join - alternatives?

    Hi,

    my 5 cent's.

    I changed the Files to ; instead of ,
    in Post#1 you said CSV

    Edit revised Code

    Code:
    Dim Cn As ADODB.Connection
    
    Private Sub cmdExecute_Click()
    Command1.Value = True 'create Table
    Command2.Value = True 'insert File a.txt
    Command3.Value = True 'insert File b.txt
    Command6.Value = True 'create Helper Table
    Command4.Value = True 'Update Value part1
    Command5.Value = True 'update Value part2
    Command7.Value = True 'insert Sorted data to new Table
    End Sub
    
    Private Sub Command1_Click()
      Dim sSql As String
       Dim s() As String
       Dim TableName As String
          TableName = "tbl_Test"
          On Error Resume Next
          sSql = "Drop Table " & TableName
          Cn.Execute sSql
          Err.Clear
          On Error GoTo 0
       
          'with DEFAULT 0
          ReDim s(11)
          s(0) = "Create Table " & TableName
          s(1) = "(m_ID Int Identity"
          s(1) = "(m_ID Int Identity Primary Key" 'PrimärKey
          s(2) = ",m_Year Int"
          s(3) = ",m_atxt1 Int DEFAULT 0 "
          s(4) = ",m_atxt2 Int DEFAULT 0"
          s(5) = ",m_atxt3 Int DEFAULT 0"
          s(6) = ",m_btxt1 Int DEFAULT 0 "
          s(7) = ",m_btxt2 Int DEFAULT 0"
          s(8) = ",m_btxt3 Int DEFAULT 0)"
          
          'without DEFAULT
    '    ReDim s(11)
    '      s(0) = "Create Table " & TableName
    '      s(1) = "(m_ID Int Identity"
    '  '    s(1) = "(m_ID Int Identity Primary Key" 'zusätzlich PrimärKey
    '      s(2) = ",m_Year Int"
    '      s(3) = ",m_atxt1 Int"
    '      s(4) = ",m_atxt2 Int"
    '      s(5) = ",m_atxt3 Int"
    '      s(6) = ",m_btxt1 Int"
    '      s(7) = ",m_btxt2 Int"
    '      s(8) = ",m_btxt3 Int)"
          sSql = Join(s, " ")
          Cn.Execute sSql
    End Sub
    
    Private Sub Command2_Click()
    Dim sSql As String
    'first a.txt:
    sSql = "INSERT INTO [tbl_Test] (m_Year,m_atxt1,m_atxt2,m_atxt3)"
    sSql = sSql & "SELECT  F1 ,F2,F3,F4 FROM [Text;DATABASE=C:\Text_Test;HDR=No].[a.txt]"
    Cn.Execute sSql
    End Sub
    
    Private Sub Command3_Click()
    Dim sSql As String
    'now b.txt
    sSql = "INSERT INTO [tbl_Test] (m_Year,m_btxt1,m_btxt2,m_btxt3)"
    sSql = sSql & " SELECT  F1 ,F2,F3,F4 FROM [Text;DATABASE=C:\Text_Test;HDR=No].[b.txt]"
    
    Cn.Execute sSql
    End Sub
    
    
    Private Sub Command6_Click()
     Dim sSql As String
     Dim TableName As String
          TableName = "tbl_Dup"
          On Error Resume Next
          sSql = "Drop Table " & TableName
          Cn.Execute sSql
          Err.Clear
          On Error GoTo 0
    sSql = "SELECT tbl_Test.m_Year, tbl_Test.m_atxt1, tbl_Test.m_atxt2, "
    sSql = sSql & "tbl_Test.m_atxt3, tbl_Test.m_btxt1, tbl_Test.m_btxt2, "
    sSql = sSql & "tbl_Test.m_btxt3 INTO tbl_Dup FROM tbl_Test;"
    Cn.Execute sSql
    End Sub
    
    
    Private Sub Command4_Click()
    'Update the Values
    Dim sSql As String
    sSql = "UPDATE tbl_Dup LEFT JOIN tbl_Test ON tbl_Dup.m_Year "
    sSql = sSql & "= tbl_Test.m_Year SET tbl_Test.m_btxt1 = "
    sSql = sSql & "[tbl_Dup].[m_btxt1], tbl_Test.m_btxt2 = [tbl_Dup].[m_btxt2], "
    sSql = sSql & "tbl_Test.m_btxt3 = [tbl_Dup].[m_btxt3]"
    sSql = sSql & " WHERE (((tbl_Test.m_btxt1) Like 0) AND "
    sSql = sSql & "((tbl_Test.m_btxt2) Like 0) AND ((tbl_Test.m_btxt3) Like 0) "
    sSql = sSql & " AND ((tbl_Test.m_Year) Like [tbl_Dup].[m_Year]));"
    Cn.Execute sSql
    End Sub
    
    Private Sub Command5_Click()
    'Update
    Dim sSql As String
    sSql = "UPDATE tbl_Dup RIGHT JOIN tbl_Test ON tbl_Dup.m_Year "
    sSql = sSql & "= tbl_Test.m_Year SET tbl_Test.m_atxt1 = "
    sSql = sSql & "[tbl_Dup].[m_atxt1], tbl_Test.m_atxt2 = [tbl_Dup].[m_atxt2],"
    sSql = sSql & "tbl_Test.m_atxt3 = [tbl_Dup].[m_atxt3]"
    sSql = sSql & "WHERE (((tbl_Test.m_atxt1) Like 0) AND "
    sSql = sSql & "((tbl_Test.m_atxt2) Like 0) AND ((tbl_Test.m_atxt3) Like 0) "
    sSql = sSql & "AND ((tbl_Test.m_Year) Like [tbl_Dup].[m_Year]));"
    Cn.Execute sSql
    End Sub
    
    Private Sub Command7_Click()
    'Last step sort the Data into new Table
    Dim sSql As String
    
     Dim TableName As String
          TableName = "tbl_Sorted"
          On Error Resume Next
          sSql = "Drop Table " & TableName
          Cn.Execute sSql
          Err.Clear
          On Error GoTo 0
    
    sSql = "SELECT DISTINCT tbl_Test.m_Year, tbl_Test.m_atxt1, tbl_Test.m_atxt2, "
    sSql = sSql & "tbl_Test.m_atxt3, tbl_Test.m_btxt1, tbl_Test.m_btxt2, tbl_Test.m_btxt3"
    sSql = sSql & " INTO tbl_Sorted FROM tbl_Test;"
    Cn.Execute sSql
    End Sub
    
    
    Public Function openTheDatabase() As Boolean
    '-- DB öffnen
    Dim sConnectionString As String
    On Error GoTo dbError
    '-- New connection --
    Set Cn = New ADODB.Connection
    sConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" _
                     & "Data Source=" & App.Path & "\db4.mdb"
    Cn.Open sConnectionString
    openTheDatabase = True
    Exit Function
    dbError:
    MsgBox (Err.Description)
    openTheDatabase = False
    End Function
    
    
    
    
    
    Private Sub Form_Load()
      If (Not openTheDatabase()) Then
           MsgBox "Datenbank fehler..."
           Else
           MsgBox "OK"
        Exit Sub
    End If
    End Sub
    regards
    Chris
    Last edited by ChrisE; Sep 21st, 2017 at 09:06 AM.
    to hunt a species to extinction is not logical !
    since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.

  13. #13
    Fanatic Member Spooman's Avatar
    Join Date
    Mar 2017
    Posts
    869

    Re: CSV Full Outer Join - alternatives?

    Arno

    Thanks ..

    Spoo

  14. #14

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    17,865

    Re: CSV Full Outer Join - alternatives?

    I'll mark this resolved. It wasn't for any one particular case:

    The files might be large but more often fairly small like 1000 lines of data or less. There might be just the two files A and B or Left and Right, or there might be 3 files (adding a Middle). Or there might be 4 or even 5 files.

    The data might always be comma delimited, either following the locale or always using the invariant locale like most modern text file formats do. Or it might sometimes be Tab delimited.

    The files might even have different numbers and types of columns outside of the matching key column.


    It sounds like we have a few ways to go about this.

    If the inputs all have the same format a Full Outer Join works - as long as we have one. When we don't we can simulate one by compounding other Joins.

    Or we can just code something up to do the same thing.

    To handle more exotic cases where inputs have varying formats we'd have to code something up. If there might be several inputs instead of just two files this may be a better option than using a SQL database engine too.


    I was hoping somebody knew of something like Log Parser that can do this. Log Parser (2.2 or so is the latest I think) is a Microsoft tool for handling text "log" formats. It has both a command line utility program and an ActiveX DLL.

    Log Parser

    Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®.

    You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart.

    The world is your database with Log Parser.
    But it doesn't do any kinds of Joins.

  15. #15

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    17,865

    Re: CSV Full Outer Join - alternatives?

    Quote Originally Posted by Schmidt View Post
    Here's the results (measured on a native compiled binary, but the IDE-results don't differ much)...
    That seems very odd.

    When I run the Collection-based code I posted above it takes very little time. Certainly nothing as long as the other implementations do. I wonder where the bottleneck might be on your machine? Some sort of slowdown of VB6 native text I/O when the language setting uses non-standard delimiters?

    More likely it has to do with whether it is compiled or run in the IDE, how much data there is, etc.

    No big deal either way. An approach based on SQL Joins is probably easier to get working correctly for all inputs. Correct is always more important than fast.
    Last edited by dilettante; Sep 21st, 2017 at 10:57 AM.

  16. #16
    Fanatic Member Spooman's Avatar
    Join Date
    Mar 2017
    Posts
    869

    Re: [RESOLVED] CSV Full Outer Join - alternatives?

    dil

    As Chris noted, you did say CSV files.
    So I tried this (no SQL stuff) ..

    Code:
            ' join
            Dim dilC() As String
            ReDim dilC(10)
            dilC(0) = "0000"                    ' populate element 0
            fpath = "D:\VBForums\dilA.txt"
            Open fpath For Input As #1
            '
            nn = 1
            For aa = 1 To 10
                If EOF(1) Then
                    Exit For
                End If
                Line Input #1, xtra
                yra = CInt(Left(xtra, 4))
                fpath = "D:\VBForums\dilB.txt"
                Open fpath For Input As #2
                For bb = 1 To 10
                    Line Input #2, xtrb
                    yrb = CInt(Left(xtrb, 4))
                    yrc = CInt(Left(dilC(nn - 1), 4))
                    ' 1. did it
                    If yrb <= yrc Then
                        b = nada
                    ' 2. insert new yrB before curr yrA
                    ElseIf yrb < yra Then
                        dilC(nn) = Left(xtrb, 4) & ",0,0,0" & Mid(xtrb, 5)
                        nn = nn + 1
                        ' do curr yrA
                        Line Input #2, xtrb
                        yrb = CInt(Left(xtrb, 4))
                        If yrb = yra Then
                            dilC(nn) = xtra & Mid(xtrb, 5)
                            nn = nn + 1
                        Else
                            b = b
                        End If
                        Exit For
                    ' 3. same yr
                    ElseIf yrb = yra Then
                        dilC(nn) = xtra & Mid(xtrb, 5)
                        nn = nn + 1
                        Exit For
                    ' 4. new yr after
                    ElseIf yrb > yra Then
                        b = b
                    End If
                    ' 5. no yrB
                    If EOF(2) Then
                        dilC(nn) = xtra & ",0,0,0"
                        nn = nn + 1
                        Exit For
                    End If
                Next bb
                Close #2
            Next aa
            Close #1
            ' create dilC.txt
            fpath = "D:\VBForums\dilC.txt"
            Open fpath For Output As #1
            For ii = 1 To nn - 1
                Write #1, dilC(ii)
            Next ii
            Close #1
    .. and got this ..

    Name:  dilPIC1.png
Views: 61
Size:  7.3 KB

    Here is the file

    Name:  dilCpic.png
Views: 60
Size:  5.7 KB

    There are a few loose ends
    1. assumes Year in both files are already "sorted" .. ie, chronological
    2. there are one or 2 branches that need attention
    3. I don't know why the created dilC.txt has leading and trailing " (quotation marks)


    But, it's pretty quick.

    PS. You marked this as RESOLVED but I did begin work on it before so .. reason for post #10 ..

    Spoo

  17. #17
    PowerPoster
    Join Date
    Jun 2013
    Posts
    3,200

    Re: CSV Full Outer Join - alternatives?

    Quote Originally Posted by dilettante View Post
    That seems very odd.

    When I run the Collection-based code I posted above it takes very little time. Certainly nothing as long as the other implementations do.
    You've included very small a.txt and b.txt files (only 70 lines and 45 lines) in your Collection-example-Zip...

    To exclude non-related overheads, the timings I've posted were done with 5000 lines in each file (leaving a few gaps in each - per Mod 10 and Mod 8 respectively).

    Already with that relative small LineCount the Collection-approach shows tendencies to become exponentially worse
    (above 10000 lines in each file it becomes unusable).

    The other approaches do not show such an expontial worsening (they scale linearily).

    64000 lines:


    640000 lines:


    Quote Originally Posted by dilettante View Post
    No big deal either way. An approach based on SQL Joins is probably easier to get working correctly for all inputs. Correct is always more important than fast.
    In case the input-files get larger, one might consider using SQLite and the RC5-FileParsing-Class cCSV (instead of an unknown MS-Tool),
    which is consistently about factor 3 as fast as the JET-ISAM-driver...
    That even whilst currently producing the output-text-file "by hand" (after the Outer-Join-Select, in an Rs.ValueMatrix(i,j) -Loop).
    If one decides to leave the imported CSV-data within the SQLite-DB (and use that File further instead of c.txt), then only
    about half the time again would be needed for the plain a,b imports...

    Olaf

  18. #18
    Fanatic Member
    Join Date
    Apr 2015
    Location
    Finland
    Posts
    569

    Re: [RESOLVED] CSV Full Outer Join - alternatives?

    Google how to pivot unknown numbers.

  19. #19

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    17,865

    Re: [RESOLVED] CSV Full Outer Join - alternatives?

    It is all good feedback, and we have some good alternatives for anyone finding this thread by searching.

    I think one big weakness of my Collection-based approach is relying on binary searching. The cost is minimal for small sets of data but grows with larger sets of data. Databases normally do far better, and you can add an index to the "inputs" that will outstrip my "search and insert" approach.

    This originally came up in the context of fairly small files. If the files get large or you need to do a lot of this then performance does become an issue.


    I had been hoping for some utility or library that might do this sort of thing generically. No need to write careful code or craft any SQL: just supply the list of files and which field is the key for matching.

    The more I think about this though, hardly anyone ever needs this. If they did we'd have such a tool, or at least more databases would support Full Outer Joins. SQL Server has them, MySQL may I don't know. But embedded/desktop DMBSs don't seem to. Probably because they aren't needed very much. As seen above we can produce the same result with a little effort.

  20. #20

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    17,865

    Re: [RESOLVED] CSV Full Outer Join - alternatives?

    Quote Originally Posted by Spooman View Post
    There are a few loose ends
    1. assumes Year in both files are already "sorted" .. ie, chronological
    2. there are one or 2 branches that need attention
    3. I don't know why the created dilC.txt has leading and trailing " (quotation marks)


    But, it's pretty quick.
    If they are already in sequence, great. I wasn't really assuming that though. Still, in some cases the assumption may be valid.

    You used Write # where you probably wanted Print #.

  21. #21
    Fanatic Member ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    720

    Re: [RESOLVED] CSV Full Outer Join - alternatives?

    Hi,
    I know this is marked as resolved, but I had another go at this with a shema.ini

    this output isn't quit what you want, I think this method could be improved on.

    here the sample..
    Code:
    'create a schema.ini and place it with the
    'textfiles in Folder C:\Text_Test
    '[a.txt]
    'ColNameHeader = True
    'Format=Delimited(;)
    '
    '[b.txt]
    'ColNameHeader = True
    'Format=Delimited(;)
    
    'textfile a.txt looks like this
    'nYear;a1;a2;a3
    '1901;1;1;1
    '1902;2;2;2
    '1904;4;4;4
    '1906;6;6;6
    '1908;8;8;8
    
    'textfile b.txt looks like this
    'nYear;a4;a5;a6
    '1901;11;11;11
    '1902;12;12;12
    '1903;13;13;13
    '1904;14;14;14
    '1905;15;15;15
    '1906;16;16;16
    
    
    Const myFolder = "C:\Text_Test\"
    Const a_File = "a.txt"
    Const b_File = "b.txt"
    Const adOpenStatic = 3
    Const adLockOptimistic = 3
    Const adCmdText = &H1
    
    Private Sub Command1_Click()
    Dim rst
    Dim conn
    Dim fso
    Dim outPath
    Dim outStream
    Dim colCount, i
    Dim hdr()
    Set fso = CreateObject("Scripting.FileSystemObject")
    'Outpufile
    outPath = myFolder & "out.txt"
    If fso.FileExists(outPath) Then fso.DeleteFile outPath, True
    'Connection
    Set conn = CreateObject("ADODB.Connection")
    conn.provider = "Microsoft.Jet.OLEDB.4.0"
    conn.ConnectionString = "Data Source='" & myFolder & "';Extended Properties='text'" 'HDR=Yes;FMT=Delimited('';'')';"
    conn.Open
    Set rst = CreateObject("ADODB.recordset")
    rst.Open "SELECT t1.*, t2.* FROM " & a_File & " AS t1 INNER JOIN " & b_File & " AS t2 ON t1.nYear = t2.nYear", _
        conn, adOpenStatic, adLockOptimistic, adCmdText
    'Header
    colCount = rst.fields.Count
    ReDim hdr(colCount - 1)
    For i = 0 To colCount - 1
        hdr(i) = rst.fields(i).Name
    Next
    'Out-Stream
    Set outStream = fso.OpenTextFile(outPath, 2, True) ' 2 = ForWriting
    'write Header
    outStream.writeLine Join(hdr, ";")
    'write new file
    outStream.writeLine rst.GetString(2, , ";", vbCrLf)
    outStream.Close
    End Sub
    schema.ini..
    Name:  schema.JPG
Views: 46
Size:  13.3 KB

    output..
    Name:  test_Output.JPG
Views: 47
Size:  17.2 KB


    regards
    Chris
    Last edited by ChrisE; Sep 22nd, 2017 at 05:23 AM.
    to hunt a species to extinction is not logical !
    since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.

  22. #22
    Fanatic Member Spooman's Avatar
    Join Date
    Mar 2017
    Posts
    869

    Re: [RESOLVED] CSV Full Outer Join - alternatives?

    Quote Originally Posted by dilettante View Post
    If they are already in sequence, great. I wasn't really assuming that though. Still, in some cases the assumption may be valid.

    You used Write # where you probably wanted Print #.
    First off, Print did the trick ..

    Secondly, I've now dealt with "NOT in sequence" by doing a bubble sort.
    I started the timer before the sorting

    Code:
            QueryPerformanceFrequency msFreq    'get frequency
            QueryPerformanceCounter msTime1     'get first time
            Dim aaA(), aaB()
            ReDim aaA(1, nna)
            ReDim aaB(1, nnb)
            ' 1. unsorted a, populate array aaA
            fpath = "D:\VBForums\dilau.txt"
            Open fpath For Input As #1
            nn = 0
            Do While Not EOF(1)
                Line Input #1, xtr
                aaA(0, nn + 1) = xtr
                aaA(1, nn + 1) = xtr
                nn = nn + 1
            Loop
            Close #1
            ' 2. bubble sort a
            ns = 0
            For ii = nn To 1 Step -1
                didswap = 0
                For jj = 1 To ii - 1
                    yr0 = CInt(Left(aaA(1, jj), 4))
                    yr1 = CInt(Left(aaA(1, jj + 1), 4))
                    If yr0 > yr1 Then
                        swap = aaA(1, jj)
                        aaA(1, jj) = aaA(1, jj + 1)
                        aaA(1, jj + 1) = swap
                        didswap = 1
                        ns = ns + 1
                    End If
                Next jj
                If didswap = 0 Then
                    Exit For
                End If
            Next ii
            ' 3. display a sorted, write to file
            fpath = "D:\VBForums\dilAs.txt"
            Open fpath For Output As #1
            For ii = 1 To nna
                Label2(2).Caption = Label2(2).Caption & aaA(1, ii) & vbCrLf
                Print #1, aaA(1, ii)
            Next ii
            Close #1
            ' 4. unsorted b, populate array aaB
            fpath = "D:\VBForums\dilbu.txt"
            Open fpath For Input As #1
            nn = 0
            Do While Not EOF(1)
                Line Input #1, xtr
                aaB(0, nn + 1) = xtr
                aaB(1, nn + 1) = xtr
                nn = nn + 1
            Loop
            Close #1
            ' 5. bubble sort b
            ns = 0
            For ii = nn To 1 Step -1
                didswap = 0
                For jj = 1 To ii - 1
                    yr0 = CInt(Left(aaB(1, jj), 4))
                    yr1 = CInt(Left(aaB(1, jj + 1), 4))
                    If yr0 > yr1 Then
                        swap = aaB(1, jj)
                        aaB(1, jj) = aaB(1, jj + 1)
                        aaB(1, jj + 1) = swap
                        didswap = 1
                        ns = ns + 1
                    End If
                Next jj
                If didswap = 0 Then
                    Exit For
                End If
            Next ii
            ' 6. display b sorted, write to file
            fpath = "D:\VBForums\dilBs.txt"
            Open fpath For Output As #1
            For ii = 1 To nnb
                Label2(3).Caption = Label2(3).Caption & aaB(1, ii) & vbCrLf
                Print #1, aaB(1, ii)
            Next ii
            Close #1
            ' 7. join
            ReDim dilC(10)
            dilC(0) = "0000"                    ' populate element 0
            fpath = "D:\VBForums\dilAs.txt"
            Open fpath For Input As #1
            '
            Dim yrAdid, yrBdid
            nn = 1
            For aa = 1 To 10
                If EOF(1) Then
                    Exit For
                End If
                Line Input #1, xtra
                yra = CInt(Left(xtra, 4))
                fpath = "D:\VBForums\dilBs.txt"
                Open fpath For Input As #2
                For bb = 1 To 10
                    Line Input #2, xtrb
                    yrb = CInt(Left(xtrb, 4))
                    yrc = CInt(Left(dilC(nn - 1), 4))
                    ' 1. did it
                    If yrb <= yrc Then
                        b = nada
                    ' 2. insert new yrB before curr yrA
                    ElseIf yrb < yra Then
                        dilC(nn) = Left(xtrb, 4) & ",0,0,0" & Mid(xtrb, 5)
                        nn = nn + 1
                        yrBdid = yrb
                        ' do curr yrA
                        Line Input #2, xtrb
                        yrb = CInt(Left(xtrb, 4))
                        If yrb = yra Then
                            dilC(nn) = xtra & Mid(xtrb, 5)
                            nn = nn + 1
                            yrAdid = yra
                        Else
                            b = b
                        End If
                        Exit For
                    ' 3. same yr
                    ElseIf yrb = yra Then
                        dilC(nn) = xtra & Mid(xtrb, 5)
                        nn = nn + 1
                        yrAdid = yra
                        yrBdid = yrn
                        Exit For
                    ' 4. new yr after
                    ElseIf yrb > yra Then
                        ' do curr yrA
                        If yra > yrAdid Then
                            dilC(nn) = xtra & ",0,0,0"
                            nn = nn + 1
                            yrAdid = yra
                        End If
                        ' do new yrB
                        dilC(nn) = Left(xtrb, 4) & ",0,0,0" & Mid(xtrb, 5)
                        nn = nn + 1
                        yrBdid = yrb
                    End If
                    ' 5. no yrB
                    If EOF(2) Then
                        If yra > yrAdid Then
                            dilC(nn) = xtra & ",0,0,0"
                            nn = nn + 1
                            End If
                        Exit For
                    End If
                Next bb
                Close #2
            Next aa
            Close #1
            ' create dilC.txt
            fpath = "D:\VBForums\dilC.txt"
            Open fpath For Output As #1
            For ii = 1 To nn - 1
                Print #1, dilC(ii)
            Next ii
            Close #1
            '
            QueryPerformanceCounter msTime2         'get second time
            zz = (msTime2 - msTime1) / msFreq
    Here is the image

    Name:  dilPIC2.png
Views: 43
Size:  11.5 KB

    Here is the file

    Name:  dilCpic2.png
Views: 41
Size:  6.5 KB

    Comments
    1. 6 steps were added before "join" to do the sorting
      • arrays are introduced for sorting
      • I still used text files as well .. if only arrays were used, could be a bit quicker
    2. I added 2 more dates from your zip file, such that b.txt now goes beyond a.txt
    3. only one "loose end" .. may not even be an issue
    4. the 2nd former loose end (' 4. new yr after) is now resolved.
    5. It took a little longer, but still is pretty quick


    Any more challenges? ..

    Spoo

  23. #23

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    17,865

    Re: [RESOLVED] CSV Full Outer Join - alternatives?

    Nice work.

  24. #24
    Fanatic Member Spooman's Avatar
    Join Date
    Mar 2017
    Posts
    869

    Re: [RESOLVED] CSV Full Outer Join - alternatives?

    Merci

    Truth be told, there are at least 2 other loose ends ..

    1. the values of nna and nnb (lines 4 and 5 of the snippet)
      • they are set earlier in "set-up code" (which I didn't post)
      • I can post that bit if you'd like
    2. In 7. join, the loops For aa and For bb are from 1 to 10
      • 10 was quick and dirty for trial purposes
      • I'm sure you can make it more intelligent as needed


    EDIT-1

    Oh yeah, there are also the declarations used for timing
    • I owe those to Arno's link
    • I can post them as well if they would be helpful


    Spoo
    Last edited by Spooman; Sep 22nd, 2017 at 10:02 PM.

  25. #25
    Fanatic Member ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    720

    Re: [RESOLVED] CSV Full Outer Join - alternatives?

    Quote Originally Posted by Spooman View Post
    First off, Print did the trick ..

    Secondly, I've now dealt with "NOT in sequence" by doing a bubble sort.
    I started the timer before the sorting

    Code:
            QueryPerformanceFrequency msFreq    'get frequency
            QueryPerformanceCounter msTime1     'get first time
            Dim aaA(), aaB()
            ReDim aaA(1, nna)
            ReDim aaB(1, nnb)
            ' 1. unsorted a, populate array aaA
            fpath = "D:\VBForums\dilau.txt"
            Open fpath For Input As #1
            nn = 0
            Do While Not EOF(1)
                Line Input #1, xtr
                aaA(0, nn + 1) = xtr
                aaA(1, nn + 1) = xtr
                nn = nn + 1
            Loop
            Close #1
            ' 2. bubble sort a
            ns = 0
            For ii = nn To 1 Step -1
                didswap = 0
                For jj = 1 To ii - 1
                    yr0 = CInt(Left(aaA(1, jj), 4))
                    yr1 = CInt(Left(aaA(1, jj + 1), 4))
                    If yr0 > yr1 Then
                        swap = aaA(1, jj)
                        aaA(1, jj) = aaA(1, jj + 1)
                        aaA(1, jj + 1) = swap
                        didswap = 1
                        ns = ns + 1
                    End If
                Next jj
                If didswap = 0 Then
                    Exit For
                End If
            Next ii
            ' 3. display a sorted, write to file
            fpath = "D:\VBForums\dilAs.txt"
            Open fpath For Output As #1
            For ii = 1 To nna
                Label2(2).Caption = Label2(2).Caption & aaA(1, ii) & vbCrLf
                Print #1, aaA(1, ii)
            Next ii
            Close #1
            ' 4. unsorted b, populate array aaB
            fpath = "D:\VBForums\dilbu.txt"
            Open fpath For Input As #1
            nn = 0
            Do While Not EOF(1)
                Line Input #1, xtr
                aaB(0, nn + 1) = xtr
                aaB(1, nn + 1) = xtr
                nn = nn + 1
            Loop
            Close #1
            ' 5. bubble sort b
            ns = 0
            For ii = nn To 1 Step -1
                didswap = 0
                For jj = 1 To ii - 1
                    yr0 = CInt(Left(aaB(1, jj), 4))
                    yr1 = CInt(Left(aaB(1, jj + 1), 4))
                    If yr0 > yr1 Then
                        swap = aaB(1, jj)
                        aaB(1, jj) = aaB(1, jj + 1)
                        aaB(1, jj + 1) = swap
                        didswap = 1
                        ns = ns + 1
                    End If
                Next jj
                If didswap = 0 Then
                    Exit For
                End If
            Next ii
            ' 6. display b sorted, write to file
            fpath = "D:\VBForums\dilBs.txt"
            Open fpath For Output As #1
            For ii = 1 To nnb
                Label2(3).Caption = Label2(3).Caption & aaB(1, ii) & vbCrLf
                Print #1, aaB(1, ii)
            Next ii
            Close #1
            ' 7. join
            ReDim dilC(10)
            dilC(0) = "0000"                    ' populate element 0
            fpath = "D:\VBForums\dilAs.txt"
            Open fpath For Input As #1
            '
            Dim yrAdid, yrBdid
            nn = 1
            For aa = 1 To 10
                If EOF(1) Then
                    Exit For
                End If
                Line Input #1, xtra
                yra = CInt(Left(xtra, 4))
                fpath = "D:\VBForums\dilBs.txt"
                Open fpath For Input As #2
                For bb = 1 To 10
                    Line Input #2, xtrb
                    yrb = CInt(Left(xtrb, 4))
                    yrc = CInt(Left(dilC(nn - 1), 4))
                    ' 1. did it
                    If yrb <= yrc Then
                        b = nada
                    ' 2. insert new yrB before curr yrA
                    ElseIf yrb < yra Then
                        dilC(nn) = Left(xtrb, 4) & ",0,0,0" & Mid(xtrb, 5)
                        nn = nn + 1
                        yrBdid = yrb
                        ' do curr yrA
                        Line Input #2, xtrb
                        yrb = CInt(Left(xtrb, 4))
                        If yrb = yra Then
                            dilC(nn) = xtra & Mid(xtrb, 5)
                            nn = nn + 1
                            yrAdid = yra
                        Else
                            b = b
                        End If
                        Exit For
                    ' 3. same yr
                    ElseIf yrb = yra Then
                        dilC(nn) = xtra & Mid(xtrb, 5)
                        nn = nn + 1
                        yrAdid = yra
                        yrBdid = yrn
                        Exit For
                    ' 4. new yr after
                    ElseIf yrb > yra Then
                        ' do curr yrA
                        If yra > yrAdid Then
                            dilC(nn) = xtra & ",0,0,0"
                            nn = nn + 1
                            yrAdid = yra
                        End If
                        ' do new yrB
                        dilC(nn) = Left(xtrb, 4) & ",0,0,0" & Mid(xtrb, 5)
                        nn = nn + 1
                        yrBdid = yrb
                    End If
                    ' 5. no yrB
                    If EOF(2) Then
                        If yra > yrAdid Then
                            dilC(nn) = xtra & ",0,0,0"
                            nn = nn + 1
                            End If
                        Exit For
                    End If
                Next bb
                Close #2
            Next aa
            Close #1
            ' create dilC.txt
            fpath = "D:\VBForums\dilC.txt"
            Open fpath For Output As #1
            For ii = 1 To nn - 1
                Print #1, dilC(ii)
            Next ii
            Close #1
            '
            QueryPerformanceCounter msTime2         'get second time
            zz = (msTime2 - msTime1) / msFreq
    Here is the image

    Name:  dilPIC2.png
Views: 43
Size:  11.5 KB

    Here is the file

    Name:  dilCpic2.png
Views: 41
Size:  6.5 KB

    Comments
    1. 6 steps were added before "join" to do the sorting
      • arrays are introduced for sorting
      • I still used text files as well .. if only arrays were used, could be a bit quicker
    2. I added 2 more dates from your zip file, such that b.txt now goes beyond a.txt
    3. only one "loose end" .. may not even be an issue
    4. the 2nd former loose end (' 4. new yr after) is now resolved.
    5. It took a little longer, but still is pretty quick


    Any more challenges? ..

    Spoo
    nice one Spoo !!


    regards
    Chris
    to hunt a species to extinction is not logical !
    since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Featured


Click Here to Expand Forum to Full Width

Survey posted by VBForums.