dcsimg
Page 1 of 2 12 LastLast
Results 1 to 40 of 42

Thread: [RESOLVED] CSV files and a big problem with notes field!

  1. #1

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Resolved [RESOLVED] CSV files and a big problem with notes field!

    The only way to separate records in csv files is finding CR+LF in it.

    At the end of each record there is CR+LF.

    In CSV files there is a field with name of (Notes).

    In this terrible field there are more CR+LF (13+10) characters!

    With this terrible field i can't separate every record in CSV files

    I'm using Open binary command

    Excel can easily open csv files even with Notes field (with more CR+LF)!!!! i don't know why i can't find a logical map in csv files to open them normally like excel !!!?

    Please help me :'(

    tnx
    Last edited by Mahdi Jazini; Oct 23rd, 2014 at 11:51 PM.

  2. #2
    Fanatic Member
    Join Date
    Jan 2006
    Posts
    557

    Re: CSV files and a big problem with notes field!

    Very often, VB6 (and .net) programmers use line input and the split method, both of which are easy and quick to write and appropriate in many cases but not always.

    Programs such as Excel probably use stream input, one character at a time and follow parsing rules that are different from the quick and easy VB6 methods. In order to be able to differentiate those memo files, I bet that the memo field at least, and probably the other ones too, are enclosed in double quotes ("). Excel parses from a double quote pairs, anything else is a comma or a CrLf.

    That's the most likely explanation. Open your file in Notepad to confirm that is the case. If so.... you will have to write a one character at a time parsing routine.

  3. #3

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Navion View Post
    Very often, VB6 (and .net) programmers use line input and the split method, both of which are easy and quick to write and appropriate in many cases but not always.

    Programs such as Excel probably use stream input, one character at a time and follow parsing rules that are different from the quick and easy VB6 methods. In order to be able to differentiate those memo files, I bet that the memo field at least, and probably the other ones too, are enclosed in double quotes ("). Excel parses from a double quote pairs, anything else is a comma or a CrLf.

    That's the most likely explanation. Open your file in Notepad to confirm that is the case. If so.... you will have to write a one character at a time parsing routine.
    i know i can use (line input) or (input) or....

    but it's impossible because CSV structure is not so simple!

    let me show you a very simple and short map of CSV files:

    ===============================
    CSV FILE (START)
    ===============================
    "Title","First Name","Middle Name","Last Name","Mobile Phone","E-mail Address","Notes" {CRLF}
    "","Mehdi","","Jazini","091111111","email@mehdi.com","I'm Mehdi {CRLF} I'm a programmer" {CRLF}
    "","Navion","","Foster","092222222" {CRLF}
    ===============================
    CSV FILE (END)
    ===============================

    - in the above simple example if you count the titles, they are 7 titles.

    - the first item is (mehdi jazini) which has exactly 7 items. (equal with the titles and it's standard) and completely matched with the master template.

    - but if for example, you add a new contact with name of (Navion Foster) and you decide to save it without email and notes, the microsoft outlook or google decide to export it like the above example!!!! they don't add something like this at the end of the record: ("","")

    - at this time you can easily split all the elements by (comma) or by (double quotes) but it's not enough!!!!

    - you must split every record too not just commas or double quotes!!! every child in programming can easily split them by (input) or (line input) or even (binary) in vb6! but the real art is: you be able to split records...

    - in the above example we have 3 records. but no one can split them... because there is no a static format in the above example! there is no way!!! i think more... but i couldn't find excel plan to open them in seperated rows!!!

    - {CRLF} in the (Notes) field and also none standard format of every row made me crazy!

    - The question is: why excel can open this conflicted format but we can't !!!!
    Last edited by Mahdi Jazini; Oct 24th, 2014 at 04:14 AM. Reason: QUOTE

  4. #4
    Fanatic Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    948

    Re: CSV files and a big problem with notes field!

    The way to read is not so complicated.
    Numbers starting with no CRLF or " and spaces are skipped.
    Numbers ending to , or crlf
    Srtings started with " and ending with "
    double "" at the start is an empty string, double "" after is a chr$(34) only.
    When we found crlf or cr or lf or crcr or any <32 char we can change "record" or "line"
    So for each turn, we read numbers or strings maybe with some crlf inside, and maybe less or more from previus record
    Excel has no two dimensional array but a database where the key interpreted as a position in displayed table, of rows and columns. So we can do without excel. Just make a class as a holder for a variant type and use "a1" for the first cell an index in a collection of those objects. So you can read the variable item csv, maybe with multi line strings, in this collection. You can recreate the collection in the screen by giving the position "e12" end if not found (skip this error with on error) just display an empty cell.

    From the above learn the 3 stages...parsing data, storing. displaying. These are 3 different stages. You miss the two last?..then you have no idea how to parse...
    Last edited by georgekar; Oct 24th, 2014 at 06:32 AM.

  5. #5
    PowerPoster
    Join Date
    Jun 2001
    Location
    Trafalgar, IN
    Posts
    4,137

    Re: CSV files and a big problem with notes field!

    Have you tried using ADO to open the file and access the data?

  6. #6
    Frenzied Member
    Join Date
    Mar 2008
    Posts
    1,175

    Re: CSV files and a big problem with notes field!

    Something like this perhaps;
    Code:
    Private Sub Command1_Click()
    
        Dim a As String, b As String
        Dim FinalString As String
        Dim InQuotes As Boolean
    
        f = FreeFile
        Open "test.csv" For Input As f
            a = Input$(LOF(f), f)
        Close f
        
        For i = 1 To Len(a)
            b = Mid$(a, i, 1)
                Select Case b
                    Case """"
                        InQuotes = Not InQuotes
                    Case vbCr, vbLf
                        If Not InQuotes Then
                            FinalString = FinalString & b
                        End If
                    Case Else
                        FinalString = FinalString & b
                 End Select
            Next
        Close f
        
        Text1 = FinalString
    
    End Sub
    Note you should set the Multiline property of the TextBox to True at design time
    Last edited by Magic Ink; Oct 24th, 2014 at 07:03 AM. Reason: Note...

  7. #7
    PowerPoster
    Join Date
    Jun 2013
    Posts
    4,885

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Mehdi Jazini View Post
    - in the above example we have 3 records. but no one can split them... because there is no a static format in the above example! there is no way!!! i think more... but i couldn't find excel plan to open them in seperated rows!!!

    - {CRLF} in the (Notes) field and also none standard format of every row made me crazy!

    - The question is: why excel can open this conflicted format but we can't !!!!
    Your given example is valid and well-parsable CSV - as the Excel-Importer demonstrates -
    so the statement that "no one can split them" is obviously not true in case of Excel
    (and for other CSV-import-routines).

    Linefeeds within quoted strings are possible (and identifyable).
    Also commas (the Field-Separators) are possible within such double-quoted-Fields.
    And qouted parts within Fields (double-quoted or not) are possible as well.
    Empty Fields at the right-end of a line can be left out, since the LineFeed is enough as a "New-Record" identifier.

    CSV-Parsers which can handle these Inputs, work like a state-machine,
    as basically shown in the code posted by 'Magic Ink'.

    Using a File with the following text-content:
    Code:
    "A Title, with a "double-quoted" part","First Name","Middle Name","Last Name","Mobile Phone","E-mail Address","Notes" 
    "","Mehdi","","Jazini","091111111","email@mehdi.com","I'm Mehdi 
    I'm a programmer" 
    "","Navion","","Foster","092222222" 
    A Title with a "double-quoted" part, First Name, Middle Name, Last Name, Mobile Phone, E-mail Address, Notes
    My parser here produces the following output:
    Row_Index Col_Index Field_Value
    Code:
     0             0            A Title, with a "double-quoted" part
     0             1            First Name
     0             2            Middle Name
     0             3            Last Name
     0             4            Mobile Phone
     0             5            E-mail Address
     0             6            Notes
    
     1             0            
     1             1            Mehdi
     1             2            
     1             3            Jazini
     1             4            091111111
     1             5            email@mehdi.com
     1             6            I'm Mehdi 
    I'm a programmer
    
     2             0            
     2             1            Navion
     2             2            
     2             3            Foster
     2             4            092222222
    
     3             0            A Title with a "double-quoted" part
     3             1            First Name
     3             2            Middle Name
     3             3            Last Name
     3             4            Mobile Phone
     3             5            E-mail Address
     3             6            Notes
    Olaf
    Last edited by Schmidt; Oct 24th, 2014 at 08:14 AM.

  8. #8

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Code:
    Private Sub Command1_Click()
    
        Dim a As String, b As String
        Dim FinalString As String
        Dim InQuotes As Boolean
    
        f = FreeFile
        Open "test.csv" For Input As f
            a = Input$(LOF(f), f)
        Close f
        
        For i = 1 To Len(a)
            b = Mid$(a, i, 1)
                Select Case b
                    Case """"
                        InQuotes = Not InQuotes
                    Case vbCr, vbLf
                        If Not InQuotes Then
                            FinalString = FinalString & b
                        End If
                    Case Else
                        FinalString = FinalString & b
                 End Select
            Next
        Close f
        
        Text1 = FinalString
    
    End Sub
    Magic Ink can you describe about the red part of your code?

    is my guest true: is it processing that the CRLF is in Quotes or out of Quotes?!!! is it really possible?

  9. #9
    Frenzied Member
    Join Date
    Mar 2008
    Posts
    1,175

    Re: CSV files and a big problem with notes field!

    Code tested with the file source supplied to us by you in post #3.

  10. #10

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Magic Ink View Post
    Code tested with the file source supplied to us by you in post #3.
    Magic Ink this is a very good idea. thank you very much. i didn't know that we can process characters (in) or (out) of Quotes.

    But sometimes in a CSV file format there is no any Quote in the file... just with (data and comma and CRLF)

    i want to write a code for general use...

    the main problem is (Notes) field. in (Notes) field sometimes we have CRLF. and the only way to seperate ROWs is just CRLF

    is there a way to know: is CRLF in the (Notes) field or out of it?

    By this way we can separate each ROW line by line

    Microsoft Excel resolved it... i don't know how :'(

  11. #11
    Fanatic Member
    Join Date
    Jan 2006
    Posts
    557

    Re: CSV files and a big problem with notes field!

    Mehdi :

    I see a couple of pitfalls in the example you just posted about which I prefer not to get into (they may be just typos or something)

    but here is a simple line of code to change the embedded coded CrLf (in one line ONLY), replacing it with a real one

    Code:
    oneline$ = Replace(oneline$, "{CRLF}", vbCrLf)
    Last edited by Navion; Oct 24th, 2014 at 10:00 AM.

  12. #12

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Navion View Post
    Mehdi :

    I see a couple of pitfalls in the example you just posted about which I prefer not to get into (they may be just typos or something)

    but here is a simple line of code to change the embedded coded CrLf (in one line ONLY), replacing it with a real one

    Code:
    oneline$ = Replace(oneline$, "{CRLF}", vbCrLf)
    Navion!!!!!!!!!!!!!!!!!!!!

    {CRLF} = chr$(13) + chr$(10) = vbCrLf

    in the above example {CRLF} is just a symbol to tell programmers it's a chr$(13) + chr$(10)

  13. #13
    Fanatic Member
    Join Date
    Jan 2006
    Posts
    557

    Re: CSV files and a big problem with notes field!

    Well then post an actual .CSV file to get the guesswork out of the picture.!!!!!

  14. #14
    Frenzied Member
    Join Date
    Mar 2008
    Posts
    1,175

    Re: CSV files and a big problem with notes field!

    >But sometimes in a CSV file format there is no any Quote in the file
    In which case string values must not contain any characters which upset things like vbCr, vbLf, and commas (the latter being affected by the Decimal symbol and List separator character in Windows regional settings). So it can be that a character other than the comma is used to delimit columns (often semi-colon) and that decimalised numbers can contain a comma instead of a period to indicate the decimal point.

    >i want to write a code for general use...
    It is generally accepted that string values are always surrounded in double quotes in csv files. Try saving an Excel Sheet as a csv file and see how the result looks in NotePad.

  15. #15

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Navion View Post
    Well then post an actual .CSV file to get the guesswork out of the picture.!!!!!
    OK

    Quote Originally Posted by Magic Ink View Post
    >But sometimes in a CSV file format there is no any Quote in the file
    In which case string values must not contain any characters which upset things like vbCr, vbLf, and commas (the latter being affected by the Decimal symbol and List separator character in Windows regional settings). So it can be that a character other than the comma is used to delimit columns (often semi-colon) and that decimalised numbers can contain a comma instead of a period to indicate the decimal point.

    >i want to write a code for general use...
    It is generally accepted that string values are always surrounded in double quotes in csv files. Try saving an Excel Sheet as a csv file and see how the result looks in NotePad.
    Login to Gmail then click on the (Gmail) caption and change it to (contacts) then try to export your contact database to csv and see the result...

    i think CSV format of Google is better than Outlook. i can easily work on it. even with CRLF problem because in every row of it, number of commas are equal to the title.

    if in the title there are 96 commas in other rows there are 96 commas too. and it's better for me
    Last edited by Mahdi Jazini; Oct 24th, 2014 at 11:22 AM.

  16. #16
    PowerPoster
    Join Date
    Jun 2001
    Location
    Trafalgar, IN
    Posts
    4,137

    Re: CSV files and a big problem with notes field!

    I guess I'll ask again, have you tried ADO yet?

  17. #17

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by MarkT View Post
    I guess I'll ask again, have you tried ADO yet?
    Sorry i forgot to reply you.

    no i didn't

    have the ADO a separated special engine to just open CSV files? if yes, ADO can be a good choice for this situation, otherwise it can't open terrible map of (Old Outlook CSV) files

    can you make an example?

    tnx

  18. #18
    PowerPoster
    Join Date
    Jun 2001
    Location
    Trafalgar, IN
    Posts
    4,137

    Re: CSV files and a big problem with notes field!

    If you post a sample file I'd give it a try.

  19. #19

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by MarkT View Post
    If you post a sample file I'd give it a try.
    Copy the following texts to notepad and save it as ansi

    Code:
    First Name,Middle Name,Last Name,Title,Suffix,Initials,Web Page,Gender,Birthday,Anniversary,Location,Language,Internet Free Busy,Notes,E-mail Address,E-mail 2 Address,E-mail 3 Address,Primary Phone,Home Phone,Home Phone 2,Mobile Phone,Pager,Home Fax,Home Address,Home Street,Home Street 2,Home Street 3,Home Address PO Box,Home City,Home State,Home Postal Code,Home Country,Spouse,Children,Manager's Name,Assistant's Name,Referred By,Company Main Phone,Business Phone,Business Phone 2,Business Fax,Assistant's Phone,Company,Job Title,Department,Office Location,Organizational ID Number,Profession,Account,Business Address,Business Street,Business Street 2,Business Street 3,Business Address PO Box,Business City,Business State,Business Postal Code,Business Country,Other Phone,Other Fax,Other Address,Other Street,Other Street 2,Other Street 3,Other Address PO Box,Other City,Other State,Other Postal Code,Other Country,Callback,Car Phone,ISDN,Radio Phone,TTY/TDD Phone,Telex,User 1,User 2,User 3,User 4,Keywords,Mileage,Hobby,Billing Information,Directory Server,Sensitivity,Priority,Private,Categories
    Bill,,Gates,,,,,,,,,,,,,,,,,,0922449384,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Normal,,My Contacts,
    John,,Foster,,,,,,,,,,,"this is a test1
    this is a test2
    ",info@john.com,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Normal,,My Contacts,
    Mehdi,,Jazini,,,,,,,,,,,"this is a test for note1
    this is a test for note2
    test
    
    ",mr.jazini@gmail.com,,,,,,09354170163,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Normal,,My Contacts,
    Norton,,Commander,,,,,,,,,,,,,,,,,,9454857455,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Normal,,My Contacts,

  20. #20
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    18,993

    Re: CSV files and a big problem with notes field!

    I prefer to use ADO vs manually parsing a csv file. Way easier. If ADO can't do it then: 1) pretty shocked or 2) malformated csv format

    Also, if interested, I believe in the codebank section, you will find a csv parsing class/control submitted by Merri
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  21. #21
    PowerPoster
    Join Date
    Jun 2001
    Location
    Trafalgar, IN
    Posts
    4,137

    Re: CSV files and a big problem with notes field!

    Attached is a quick sample project for you.
    Attached Files Attached Files

  22. #22

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by MarkT View Post
    Attached is a quick sample project for you.
    OMGGGGGGGGGGGG

    Finally i have found the easiest and fastest wayyyyy to open any CSV file

    Thank you MarkT very much for your info...

    Now i can open any CSV with any format even none standard formats with ADO

    MarkT did you know i have wrote a comprehensive code for opening csv files with (Open Binary) !!! :'((((((((

    It can work good but it can't open none standard and old csv formats. but the ADO can do anything

    you can also change:

    Code:
    strPath = IIf(Right(App.Path, 1) = "\", App.Path, App.Path & "\")
    to this short code:

    Code:
    strPath = replace(App.Path & "\","\\","\")
    and thank you for your screen shot but no need because a saved project have saved the reference check mark in itself

    Thank you again
    Last edited by Mahdi Jazini; Oct 25th, 2014 at 12:27 PM. Reason: to hyperlink markt id

  23. #23
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,961

    Re: CSV files and a big problem with notes field!

    That isn't "using ADO" as such since ADO is merely used as the connector technology. The work is being done by the Text Driver, a creaky old ODBC Desktop Driver that ADO can only use by thunking through the OLEDB Provider for ODBC. This could be used other ways as well such as direct ODBC calls or RDO.

    A cleaner, faster, and more powerful approach would be to use Jet 4.0's Text IISAM for this, which offers a superset of the Text Driver's capabilities with less overhead. That can be used via either ADO or even DAO 3.6 if one insists.

    In the posted example the connection string bizarrely specifies HDR=No and yet processes headers. I assume that's fallout from a poorly-formed connection string having no effect (by failing silently), along with a registry default of HDR=Yes for this desktop driver. If that's why the example here (seems to) work it may well fail on other machines that have the default flipped to No.

    Flawed example.

  24. #24

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by dilettante View Post
    That isn't "using ADO" as such since ADO is merely used as the connector technology. The work is being done by the Text Driver, a creaky old ODBC Desktop Driver that ADO can only use by thunking through the OLEDB Provider for ODBC. This could be used other ways as well such as direct ODBC calls or RDO.

    A cleaner, faster, and more powerful approach would be to use Jet 4.0's Text IISAM for this, which offers a superset of the Text Driver's capabilities with less overhead. That can be used via either ADO or even DAO 3.6 if one insists.

    In the posted example the connection string bizarrely specifies HDR=No and yet processes headers. I assume that's fallout from a poorly-formed connection string having no effect (by failing silently), along with a registry default of HDR=Yes for this desktop driver. If that's why the example here (seems to) work it may well fail on other machines that have the default flipped to No.

    Flawed example.
    Did I understand your reply correctly?

    - With HDR=Yes i can use MarkT sample in any windows operating system with any personal computer

    - With HDR=No i can't use MarkT sample in all computers

  25. #25
    PowerPoster
    Join Date
    Jun 2013
    Posts
    4,885

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Mehdi Jazini View Post
    Did I understand your reply correctly?
    No, what dilettante meant was, that the Header-Setting in the used connection-string of the example
    had no effect. The driver was just treating the SourceFile in "contains a headerline"-mode due to a
    system-default-setting (which might not be the same on all systems your solution later on has to run).

    So, it's better to use a connection-string which makes the driver react to the HDR=YES/NO string-setting.

    Code:
    "Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=""text;FMT=Delimited;CharacterSet=ANSI;HDR=YES"";Data Source=C:\temp"
    Also important...
    The above Cnn-string contains a CharacterSet= ... setting for ANSI explicitely (which is normally the
    default, but you cannot always be sure about these defaults).

    And since you mentioned, that you want to treat (import) files exported from WebApps...
    These usually export CSV-content as UTF-8.

    The JET-Text-Driver currently cannot handle UTF-8 input directly - though what it
    does understand is Unicode-content (files) which exist in 16bit-little-endian-format.

    So after downloading an UTF-8 CSV-file from e.g. a google-mail-export, you will
    have to transcode it from UTF8 to UTF16LE and save it back to disk (you can do
    that operation on the whole file-content in one go).
    After that you can force the JET-CSV-parser to treat it appropriately with:

    Code:
    "Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=""text;FMT=Delimited;CharacterSet=UNICODE;HDR=YES"";Data Source=C:\temp"

    Olaf

  26. #26

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Schmidt View Post
    No, what dilettante meant was, that the Header-Setting in the used connection-string of the example
    had no effect. The driver was just treating the SourceFile in "contains a headerline"-mode due to a
    system-default-setting (which might not be the same on all systems your solution later on has to run).

    So, it's better to use a connection-string which makes the driver react to the HDR=YES/NO string-setting.

    Code:
    "Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=""text;FMT=Delimited;CharacterSet=ANSI;HDR=YES"";Data Source=C:\temp"
    Also important...
    The above Cnn-string contains a CharacterSet= ... setting for ANSI explicitely (which is normally the
    default, but you cannot always be sure about these defaults).

    And since you mentioned, that you want to treat (import) files exported from WebApps...
    These usually export CSV-content as UTF-8.

    The JET-Text-Driver currently cannot handle UTF-8 input directly - though what it
    does understand is Unicode-content (files) which exist in 16bit-little-endian-format.

    So after downloading an UTF-8 CSV-file from e.g. a google-mail-export, you will
    have to transcode it from UTF8 to UTF16LE and save it back to disk (you can do
    that operation on the whole file-content in one go).
    After that you can force the JET-CSV-parser to treat it appropriately with:

    Code:
    "Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=""text;FMT=Delimited;CharacterSet=UNICODE;HDR=YES"";Data Source=C:\temp"

    Olaf
    Actually in gmail contacts in the export option there are 2 options to export.

    1) Google CSV format
    2) Outlook CSV format

    i always used (Outlook CSV format) because i didn't have unicode characters in my contact info in gmail

    now i have tested it

    the above ADO sample can just open (Outlook CSV format) in just (ANSI) mode. if all your contact data is english and then you click on (Outlook CSV format) the output in the text file will be normal with an ansi format. but if all your contact data is english + unicode characters and then you click on (Outlook CSV format) the output in the text file will be something like this in the file: ???????????????

    the only export option that can export unicode characters very well is just (Google CSV format)

    but it seems the ADO is unable to open (Google CSV format) even with CharacterSet=UNICODE;HDR=YES !!!

    now, how to open (Google CSV format) ?! :'(

  27. #27
    PowerPoster
    Join Date
    Jun 2013
    Posts
    4,885

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Mehdi Jazini View Post
    ...now, how to open (Google CSV format) ?! :'(
    Already tried to explain that - when the Google-native exportformat is used,
    then it is (with very high probabilty) UTF8-format.

    And the ADO-setting CharSet=UNICODE expects *not* UTF8-format, but
    UTF16LE-format instead.

    So you need (after downloading the Google-CSV-File) to transcode it from UTF8
    into UTF16.

    You can do that per VB6-code - but also (for a fast test) with Notepad.exe
    (per File-Open-Dialog, since that one contains the option to specify 'UTF8' there).

    After successful loading of your UTF8-file - just save it (per Notepad.exe SaveAs-Dialog)
    with the 'Unicode'-option (which on Win-Systems is little endian by default).

    Then make sure, you pass the correct path/filename of your new created UTF16-file
    into the ADO-related CSV-routines.

    Olaf

  28. #28

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Schmidt View Post
    Already tried to explain that - when the Google-native exportformat is used,
    then it is (with very high probabilty) UTF8-format.

    And the ADO-setting CharSet=UNICODE expects *not* UTF8-format, but
    UTF16LE-format instead.

    So you need (after downloading the Google-CSV-File) to transcode it from UTF8
    into UTF16.

    You can do that per VB6-code - but also (for a fast test) with Notepad.exe
    (per File-Open-Dialog, since that one contains the option to specify 'UTF8' there).

    After successful loading of your UTF8-file - just save it (per Notepad.exe SaveAs-Dialog)
    with the 'Unicode'-option (which on Win-Systems is little endian by default).

    Then make sure, you pass the correct path/filename of your new created UTF16-file
    into the ADO-related CSV-routines.

    Olaf
    i know but now the problem is not encoding of the file. because after downloading the (unicode) csv file. it's encoding is [unicode] by default. it's not utf-8 or utf-16 or ansi or something like this...

    now the problem is: ADO is unable to open (Google CSV Format) and it's an unknown format for it. it can just open (Outlook CSV Format) in gmail export options!

    you can copy it to a notepad and save it as ansi then try to open by ADO. it's impossible:

    Code:
    Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory Server,Mileage,Occupation,Hobby,Sensitivity,Priority,Subject,Notes,Group Membership,E-mail 1 - Type,E-mail 1 - Value,Phone 1 - Type,Phone 1 - Value
    Bill Gates,Bill,,Gates,,,,,,,,,,,,,,,,,,,,,,,* My Contacts,,,Mobile,0922449384
    John Foster,John,,Foster,,,,,,,,,,,,,,,,,,,,,,"this is a test1
    this is a test2
    ",* My Contacts,* Home,info@john.com,,
    Mehdi Jazini,Mehdi,,Jazini,,,,,,,,,,,,,,,,,,,,,,"this is a test for note1
    this is a test for note2
    test
    
    ",* My Contacts,* Home,mr.jazini@gmail.com,Mobile,09354170163
    Norton Commander,Norton,,Commander,,,,,,,,,,,,,,,,,,,,,,,* My Contacts,,,Mobile,9454857455

  29. #29
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    6,222

    Re: CSV files and a big problem with notes field!

    Mehdi, just to add a bit more info (or maybe noise, not sure). But Unicode is not just one thing. There are several "Unicode Transformation Formats" or UTF's available within Unicode. UTF-16 is probably the most popular but UTF-8 is also quite popular. In Notepad, the Save As/Unicode option is actually UTF-16. There is also a UTF-8 option listed there. (There's also an issue of Endianness, but that's probably not something you need to worry about.)

    VB6 uses the UTF-16 version of Unicode. This is a nice version in that all of the characters are exactly 2 bytes (or at least in most cases, including ALL of VB6). In the UTF-8 version, the original ASCII characters are actually only one byte, and it's quite easy to have characters that are 3 or 4 bytes.

    For a text file, you can check the first two (or three) bytes to see how it's encoded.

    If the first two bytes are &hFF &hFE then it's a UTF-16 Unicode encoded file.
    If the first three bytes are &hEF &hBB &hBF then it's a UTF-8 Unicode encoded file.
    If the high bit of the first byte is off (i.e., the byte < &h80), then it's ASCII.
    All other conditions either other encodings or undefined.

    Reading an ASCII file into VB6 is trivial, and I won't go into it.

    Regarding reading a UTF-8 file into VB6, that'd be a bit tough, and I don't have anything to do it.

    If you wish to read a UTF-16 file into VB6, the following is the code to do it.

    None of this really addresses your parsing issues, but maybe it'll help you to read a file, re-write it a bit differently, and then parse it. However, if it were me, I'd bite the bullet and just write a parser in VB6. From the above, I'm only partially clear on what you're trying to do but I'd do it in two phases: 1) write a parser to parse it into lines, creating a string array with the lines. That way, you could do whatever tricks you need to to correctly handle your vbLF and vbCR characters. Phase 2, I would parses the individual lines into my fields. Again, you'd have to monitor for your Chr$(34) characters.

    Best of Luck.

    Code:
    Public Sub SaveStringToUnicodeFile(sData As String, sFileSpec As String)
        ' These are typically .TXT files.  They can be read with notepad.
        Dim iFle As Long
        '
        iFle = FreeFile
        Open sFileSpec For Binary As iFle
        Put iFle, , &HFEFF ' This is the Unicode header to a text file.  First byte = FF, second byte = FE.
        Put iFle, , UnicodeByteArrayFromString(sData)
        Close iFle
    End Sub
    
    Public Function LoadStringFromUnicodeFile(sFileSpec As String) As String
        ' These are typically .TXT files.  They can be read with notepad.
        Dim iFle As Long
        Dim bb() As Byte
        Dim i As Integer
        '
        iFle = FreeFile
        Open sFileSpec For Binary As iFle
        Get iFle, , i
        If i <> &HFEFF Then ' Unicode file header.  First byte = FF, second byte = FE.
            Close iFle
            Exit Function ' It's not a valid Unicode file.
        End If
        ReDim bb(1 To LOF(iFle) - 2&)
        Get iFle, , bb
        Close iFle
        LoadStringFromUnicodeFile = bb ' This directly copies the byte array to the Unicode string (no conversion).
        ' Note: If you try to directly read the file as a string, VB6 will attempt to convert the string from ASCII to Unicode.
    End Function
    
    Public Function UnicodeByteArrayFromString(s As String) As Byte()
        ' This directly copies the Unicode string into the byte array, using two bytes per character (i.e., Unicode).
        UnicodeByteArrayFromString = s
    End Function
    Last edited by Elroy; Oct 26th, 2014 at 12:57 PM.

  30. #30
    PowerPoster
    Join Date
    Jun 2013
    Posts
    4,885

    Re: CSV files and a big problem with notes field!

    I don't have a google-mail-account and cannot test this myself.

    Would be nice when you could send the original (as downloaded from google, not-manipulated)
    two types of files to me per mail (I have a normal E-Mail reply-address defined for my account) -
    or in case the content is no problem, then an zipped upload to the "forum-space" would be even better,
    so that others can take a look too.

    Olaf

  31. #31

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Re: CSV files and a big problem with notes field!

    Thank you Elroy for your info i'll read your post tomorrow carefully

    Quote Originally Posted by Schmidt View Post
    I don't have a google-mail-account and cannot test this myself.

    Would be nice when you could send the original (as downloaded from google, not-manipulated)
    two types of files to me per mail (I have a normal E-Mail reply-address defined for my account) -
    or in case the content is no problem, then an zipped upload to the "forum-space" would be even better,
    so that others can take a look too.

    Olaf
    Olaf I have attached its original file from gmail export
    Attached Files Attached Files

  32. #32
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    6,222

    Re: CSV files and a big problem with notes field!

    Mehdi,

    That file you attached to post #31 isn't Unicode. It's pure ASCII. Here's some code to do it for you. I just hammered it out so be sure to run it through some tests. Also, be sure to patch up the sFileSpec (or maybe do some common dialog and prompt for it). Also, with my above "read Unicode file" routine, this could be fairly easily adapted to a Unicode file, but I'm done with free coding on this thread. I hope you get it going.

    Take Care,
    Elroy

    Code:
    Private Sub Form_Load()
        Dim iFle As Long
        Dim sFileSpec As String
        Dim bInQuote As Boolean
        Dim sLine As String
        Dim bb As Byte
        Dim sLines() As String
        Dim iLineCount As Long
        Dim iFieldCount As Long
        Dim iLin As Long
        Dim iChr As Long
        Dim iFld As Long
        Dim sLinesAndFields() As String
        Dim sField As String
        Dim c As String
        '
        sFileSpec = "c:\users\elroy\desktop\google.csv"
        iFle = FreeFile
        Open sFileSpec For Binary As iFle
        '
        ' Read file and parse lines.
        Do
            If EOF(iFle) Then
                If Len(sLine) > 0 Then
                    iLineCount = iLineCount + 1
                    ReDim Preserve sLines(1 To iLineCount)
                    sLines(iLineCount) = sLine
                    sLine = ""
                End If
                Exit Do ' The only way out.
            End If
            Get iFle, , bb
            Select Case bb
            Case 34  ' Quote.
                bInQuote = Not bInQuote
                sLine = sLine & Chr$(bb)
            Case 10 ' LF.
                If bInQuote Then
                    sLine = sLine & Chr$(bb)
                Else
                    iLineCount = iLineCount + 1
                    ReDim Preserve sLines(1 To iLineCount)
                    sLines(iLineCount) = sLine
                    sLine = ""
                End If
            Case 13 ' CR.
                If bInQuote Then
                    sLine = sLine & Chr$(bb)
                End If ' Else ignore.
            Case Else
                sLine = sLine & Chr$(bb)
            End Select
        Loop
        Close iFle
        '
        ' Now parse fields.
        ReDim sLinesAndFields(1 To iLineCount, 1 To 1)
        For iLin = 1 To iLineCount
            iFieldCount = 0
            bInQuote = False
            For iChr = 1 To Len(sLines(iLin))
                bb = Asc(Mid$(sLines(iLin), iChr, 1))
                Select Case bb
                Case 34 ' Quote.
                    bInQuote = Not bInQuote ' But toss the quote mark.
                Case 44 ' Comma.
                    If bInQuote Then
                        sField = sField & Chr$(bb)
                    Else
                        iFieldCount = iFieldCount + 1
                        If iFieldCount > UBound(sLinesAndFields, 2) Then ReDim Preserve sLinesAndFields(1 To iLineCount, 1 To iFieldCount)
                        sLinesAndFields(iLin, iFieldCount) = sField
                        sField = ""
                    End If
                Case Else
                    sField = sField & Chr$(bb)
                End Select
            Next iChr
            ' Create one more field with whatever's in sField.
            iFieldCount = iFieldCount + 1
            If iFieldCount > UBound(sLinesAndFields, 2) Then ReDim Preserve sLinesAndFields(1 To iLineCount, 1 To iFieldCount)
            sLinesAndFields(iLin, iFieldCount) = sField
            sField = ""
        Next iLin
        '
        ' Dump it to see if it worked.
        For iLin = 1 To iLineCount
            Debug.Print "Line " & Format$(iLin)
            For iFld = 1 To UBound(sLinesAndFields, 2)
                Debug.Print "  (" & Format$(iFld) & ") ";
                For iChr = 1 To Len(sLinesAndFields(iLin, iFld))
                    bb = Asc(Mid$(sLinesAndFields(iLin, iFld), iChr, 1))
                    Select Case bb
                    Case 10 ' LF
                        Debug.Print "{LF}";
                    Case 13
                        Debug.Print "{CR}";
                    Case Else
                        Debug.Print Chr$(bb);
                    End Select
                Next iChr
            Next iFld
            Debug.Print
        Next iLin
    End Sub
    p.s. If you wind up using this code, I'd like to think that you'd think through it and learn something from it.
    Last edited by Elroy; Oct 26th, 2014 at 03:03 PM.

  33. #33

    Thread Starter
    Lively Member Mahdi Jazini's Avatar
    Join Date
    Feb 2014
    Location
    Iran / Tehran
    Posts
    89

    Red face Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Elroy View Post
    Mehdi, just to add a bit more info (or maybe noise, not sure). But Unicode is not just one thing. There are several "Unicode Transformation Formats" or UTF's available within Unicode. UTF-16 is probably the most popular but UTF-8 is also quite popular. In Notepad, the Save As/Unicode option is actually UTF-16. There is also a UTF-8 option listed there. (There's also an issue of Endianness, but that's probably not something you need to worry about.)

    VB6 uses the UTF-16 version of Unicode. This is a nice version in that all of the characters are exactly 2 bytes (or at least in most cases, including ALL of VB6). In the UTF-8 version, the original ASCII characters are actually only one byte, and it's quite easy to have characters that are 3 or 4 bytes.

    For a text file, you can check the first two (or three) bytes to see how it's encoded.

    If the first two bytes are &hFF &hFE then it's a UTF-16 Unicode encoded file.
    If the first three bytes are &hEF &hBB &hBF then it's a UTF-8 Unicode encoded file.
    If the high bit of the first byte is off (i.e., the byte < &h80), then it's ASCII.
    All other conditions either other encodings or undefined.

    Reading an ASCII file into VB6 is trivial, and I won't go into it.

    Regarding reading a UTF-8 file into VB6, that'd be a bit tough, and I don't have anything to do it.

    If you wish to read a UTF-16 file into VB6, the following is the code to do it.

    None of this really addresses your parsing issues, but maybe it'll help you to read a file, re-write it a bit differently, and then parse it. However, if it were me, I'd bite the bullet and just write a parser in VB6. From the above, I'm only partially clear on what you're trying to do but I'd do it in two phases: 1) write a parser to parse it into lines, creating a string array with the lines. That way, you could do whatever tricks you need to to correctly handle your vbLF and vbCR characters. Phase 2, I would parses the individual lines into my fields. Again, you'd have to monitor for your Chr$(34) characters.

    Best of Luck.

    Code:
    Public Sub SaveStringToUnicodeFile(sData As String, sFileSpec As String)
        ' These are typically .TXT files.  They can be read with notepad.
        Dim iFle As Long
        '
        iFle = FreeFile
        Open sFileSpec For Binary As iFle
        Put iFle, , &HFEFF ' This is the Unicode header to a text file.  First byte = FF, second byte = FE.
        Put iFle, , UnicodeByteArrayFromString(sData)
        Close iFle
    End Sub
    
    Public Function LoadStringFromUnicodeFile(sFileSpec As String) As String
        ' These are typically .TXT files.  They can be read with notepad.
        Dim iFle As Long
        Dim bb() As Byte
        Dim i As Integer
        '
        iFle = FreeFile
        Open sFileSpec For Binary As iFle
        Get iFle, , i
        If i <> &HFEFF Then ' Unicode file header.  First byte = FF, second byte = FE.
            Close iFle
            Exit Function ' It's not a valid Unicode file.
        End If
        ReDim bb(1 To LOF(iFle) - 2&)
        Get iFle, , bb
        Close iFle
        LoadStringFromUnicodeFile = bb ' This directly copies the byte array to the Unicode string (no conversion).
        ' Note: If you try to directly read the file as a string, VB6 will attempt to convert the string from ASCII to Unicode.
    End Function
    
    Public Function UnicodeByteArrayFromString(s As String) As Byte()
        ' This directly copies the Unicode string into the byte array, using two bytes per character (i.e., Unicode).
        UnicodeByteArrayFromString = s
    End Function
    Thank you Elroy for your useful info.

    I have no knowledge to read and import UTF-8 or Unicode characters to VB6 objects like textbox and listbox and etc. so it's better to make the following conditions to avoid importing none ansi files:

    Asc=255 and Asc=254 is UTF-16 or Unicode {i guess it's &hFF &hFE}
    Asc=239 and Asc=187 and Asc=191 is UTF-8 {i guess it's &hEF &hBB &hBF}
    Asc=254 and Asc=255 and Asc=0 is Unicode big endian {and i don't guess anything just tested}

    if the codes detected the above ascii codes at the first of the binary file, it returns this error: "this program is unable to import none ansi files. please convert them to ansi then try again"

    Quote Originally Posted by Elroy View Post
    Mehdi,

    That file you attached to post #31 isn't Unicode. It's pure ASCII. Here's some code to do it for you. I just hammered it out so be sure to run it through some tests. Also, be sure to patch up the sFileSpec (or maybe do some common dialog and prompt for it). Also, with my above "read Unicode file" routine, this could be fairly easily adapted to a Unicode file, but I'm done with free coding on this thread. I hope you get it going.

    Take Care,
    Elroy

    Code:
    Private Sub Form_Load()
        Dim iFle As Long
        Dim sFileSpec As String
        Dim bInQuote As Boolean
        Dim sLine As String
        Dim bb As Byte
        Dim sLines() As String
        Dim iLineCount As Long
        Dim iFieldCount As Long
        Dim iLin As Long
        Dim iChr As Long
        Dim iFld As Long
        Dim sLinesAndFields() As String
        Dim sField As String
        Dim c As String
        '
        sFileSpec = "c:\users\elroy\desktop\google.csv"
        iFle = FreeFile
        Open sFileSpec For Binary As iFle
        '
        ' Read file and parse lines.
        Do
            If EOF(iFle) Then
                If Len(sLine) > 0 Then
                    iLineCount = iLineCount + 1
                    ReDim Preserve sLines(1 To iLineCount)
                    sLines(iLineCount) = sLine
                    sLine = ""
                End If
                Exit Do ' The only way out.
            End If
            Get iFle, , bb
            Select Case bb
            Case 34  ' Quote.
                bInQuote = Not bInQuote
                sLine = sLine & Chr$(bb)
            Case 10 ' LF.
                If bInQuote Then
                    sLine = sLine & Chr$(bb)
                Else
                    iLineCount = iLineCount + 1
                    ReDim Preserve sLines(1 To iLineCount)
                    sLines(iLineCount) = sLine
                    sLine = ""
                End If
            Case 13 ' CR.
                If bInQuote Then
                    sLine = sLine & Chr$(bb)
                End If ' Else ignore.
            Case Else
                sLine = sLine & Chr$(bb)
            End Select
        Loop
        Close iFle
        '
        ' Now parse fields.
        ReDim sLinesAndFields(1 To iLineCount, 1 To 1)
        For iLin = 1 To iLineCount
            iFieldCount = 0
            bInQuote = False
            For iChr = 1 To Len(sLines(iLin))
                bb = Asc(Mid$(sLines(iLin), iChr, 1))
                Select Case bb
                Case 34 ' Quote.
                    bInQuote = Not bInQuote ' But toss the quote mark.
                Case 44 ' Comma.
                    If bInQuote Then
                        sField = sField & Chr$(bb)
                    Else
                        iFieldCount = iFieldCount + 1
                        If iFieldCount > UBound(sLinesAndFields, 2) Then ReDim Preserve sLinesAndFields(1 To iLineCount, 1 To iFieldCount)
                        sLinesAndFields(iLin, iFieldCount) = sField
                        sField = ""
                    End If
                Case Else
                    sField = sField & Chr$(bb)
                End Select
            Next iChr
            ' Create one more field with whatever's in sField.
            iFieldCount = iFieldCount + 1
            If iFieldCount > UBound(sLinesAndFields, 2) Then ReDim Preserve sLinesAndFields(1 To iLineCount, 1 To iFieldCount)
            sLinesAndFields(iLin, iFieldCount) = sField
            sField = ""
        Next iLin
        '
        ' Dump it to see if it worked.
        For iLin = 1 To iLineCount
            Debug.Print "Line " & Format$(iLin)
            For iFld = 1 To UBound(sLinesAndFields, 2)
                Debug.Print "  (" & Format$(iFld) & ") ";
                For iChr = 1 To Len(sLinesAndFields(iLin, iFld))
                    bb = Asc(Mid$(sLinesAndFields(iLin, iFld), iChr, 1))
                    Select Case bb
                    Case 10 ' LF
                        Debug.Print "{LF}";
                    Case 13
                        Debug.Print "{CR}";
                    Case Else
                        Debug.Print Chr$(bb);
                    End Select
                Next iChr
            Next iFld
            Debug.Print
        Next iLin
    End Sub
    p.s. If you wind up using this code, I'd like to think that you'd think through it and learn something from it.
    Elroyyyyy did you really write this code yourself?

    I have read it line by line... it's a very smart and professional code...

    i didn't know this trick in vb6:

    Code:
    bInQuote = Not bInQuote
    so by the above trick we can switch between true and false in every process... wow good idea

    Code:
    ReDim Preserve sLines(1 To iLineCount)
    sLines(iLineCount) = sLine
    but sometimes we can't use arrays in the code because some CSV files have at least 100,000 or more records in a file and in my opinion ram memory of personal computers can't handle this big data

    i'll change your code to a code without arrays... i think random files can be best choice instead of arrays

    Code:
                Case 44 ' Comma.
                    If bInQuote Then
                        sField = sField & Chr$(bb)
                    Else
    omg.......
    you also think about comma problem....
    by using your professional code we can also use comma between quote in note filed...
    amazingggggg thank you for your time

    actually we can use your professional code instead of ADO... goodbye ADO

    i have decided to send you 2 rates for your posts but i have got error in 2nd rate.

    i'll send the 2nd rate to you later

    tnx

  34. #34
    PowerPoster
    Join Date
    Jun 2013
    Posts
    4,885

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Mehdi Jazini View Post
    ...actually we can use your professional code instead of ADO... goodbye ADO
    I wouldn't ditch it that fast - since it can reliably handle quite a few more things than Elroys code
    does currently - which was perhaps thought as a fast hacked example, to be adapted on your own -
    but then you already got a few of those also from others...

    The code as it currently is, doesn't handle Unicode - it will choke on *.csv Files which have
    line-separators different from vbCRLF, it tends to get progressively slower, the larger the files get
    (prolonging a String-Array per Redim-Preserve will work Ok for files below 10000 lines or so -
    but can be handled better) - also reading FileContent "Byte by Byte" is not really a good idea IMO...

    Here's a site where you can download some real-world CSV-samples:
    http://support.spatialkey.com/spatia...mple-csv-data/

    And e.g. on the first downloadable sample-file (the insurance-data, which contains about 36000 records)
    the code you got from Elroy would need about 10seconds or so (just to parse it) - whereas the JET-driver
    will hand out the results over to you in a nice format (a Recordset) after only about 500msecs (~factor 20 faster).

    If you really plan to import larger files (with 100,000 lines and more), then such performance-considerations
    might play a role (depending on your or your customers needs/expectations).

    To finish what I started - here's a small UTF8-UTF16 File-Transcoding routine, which (as it is currently) -
    should work for transcoding of Utf8-CSV-Files of up to ~100MB - above that you would need to change
    to a different approach (e.g. the csv-parser in the vbRichClient5-library - which is even faster than ADO/JET,
    the codebank actually has an example for it: http://www.vbforums.com/showthread.p...-and-ADOJet%29 ).

    Ok, here the Transcoding-Routine - and how it works in conjunction with UTF8-encoded files
    and the JET-TextDriver (+ an MSHFlexGrid, to finally show the retrieved Recordset-Content).

    Into a Form (needs an instance MSHFlexGrid1 and a Command1 on it):
    Code:
    Option Explicit
     
    Private Declare Function MultiByteToWideChar& Lib "kernel32" (ByVal CodePage&, ByVal dwFlags&, MultiBytes As Any, ByVal cBytes&, ByVal pWideChars&, ByVal cWideChars&)
    
    Private Sub Command1_Click()
    'UTF8-transcoding related code
    Dim CsvPath As String, CsvSrcFile As String, CsvDstFile As String
      CsvPath = "c:\temp\"
      CsvSrcFile = "FL_insurance_sample.csv" 'downloadable from here (first example): http://support.spatialkey.com/spatialkey-sample-csv-data/
      CsvDstFile = "FL_insurance_sampleUTF16.csv"
      TranscodeUTF8toUTF16 CsvPath & CsvSrcFile, CsvPath & CsvDstFile
    
    'ADO-Csv-import related stuff
    Dim Cnn As New ADODB.Connection, Rs As ADODB.Recordset, T!
      T = Timer
      Cnn.CursorLocation = adUseClient
      Cnn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=""text;FMT=Delimited;CharacterSet=UNICODE;HDR=YES"";Data Source=" & CsvPath
      
      Set Rs = Cnn.Execute("Select * From [" & CsvDstFile & "]")
      Set Rs.ActiveConnection = Nothing 'disconnect the Rs from the DataSource
      Caption = Rs.RecordCount & " Records imported in " & Timer - T
      
      Set MSHFlexGrid1.DataSource = Rs 'fill a FlexGrid with the Recordsets content
      Cnn.Close
    End Sub
     
     
    Sub TranscodeUTF8toUTF16(SrcFileName As String, DstFileName As String)
    Dim FNr&, BLen&, B() As Byte, Offs&, WLen&, BW() As Byte
    
    On Error GoTo 1
      BLen = FileLen(SrcFileName)
      If BLen = 0 Then Err.Raise vbObjectError, , "FileLen=0, nothing to transcode"
      
      FNr = FreeFile: Open SrcFileName For Binary Access Read As FNr
      ReDim B(0 To BLen - 1)
      Get FNr, , B 'read the UTF8-Bytes
      If BLen > 2 Then Offs = IIf(B(0) = 239 And B(1) = 187 And B(2) = 191, 3, 0)
      If BLen > Offs Then
        WLen = MultiByteToWideChar(65001, 0, B(Offs), BLen - Offs, 0, 0)
        ReDim BW(0 To WLen * 2 - 1)
        MultiByteToWideChar 65001, 0, B(Offs), BLen - Offs, VarPtr(BW(0)), WLen
      End If
      Close FNr
      
      FNr = FreeFile: Open DstFileName For Binary Access Write As FNr
      Put FNr, , BW 'write the WideChars
    1:
      If FNr Then Close FNr
      If Err Then Err.Raise Err.Number, Err.Source & ".TranscodeUTF8toUTF16", Err.Description
    End Sub
    Olaf

  35. #35
    PowerPoster
    Join Date
    Jun 2013
    Posts
    4,885

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Elroy View Post
    That file you attached to post #31 isn't Unicode. It's pure ASCII.
    The fact that a file has "pure ASCII-content" doesn't necessarily mean, that its content
    wasn't produced by an UTF8-encoding Export-Routine (at the server-end of the gmail-
    WebApp or by other services).

    When there's no explicit UTF8-BOM in the file, then you can not tell an UTF8-exported
    File (in case it contains only "english Char-Values" < 128) from a "real ASCII-file" ...
    UTF8 is backward-compatible to ASCII.

    @Mehdi - I'd try with a GMail-exported CSV, which contains contact-data of people
    from a few diffferent countries.

    Olaf

  36. #36
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    6,222

    Re: CSV files and a big problem with notes field!

    hahahaha, geez, my code is already taking the hits. And YES, I wrote it yesterday Mehdi. I didn't have anything better to do and I'm having fun on these forums. Also, I've been at this a LOOONNNGGG time. It took me something between one and two hours, but now I truly am bragging.

    Well, you've got 4gig of memory to play with when using VB6, so unless you're files are VERY large, I wouldn't worry about memory. But yes, the fact that I read the Ascii/Unicode file one byte at a time, it may get a bit slow. I've got other routines to create buffers for reading binary, but that would take more work.

    Also, Schmidt, if you study my code, you'll see that I focused on the {LF} only as a line terminator (just tossing the {CR} that weren't surrounded by quotes). That should make it handle Unix files just as well as Windows files.

    Also, as previously stated, it wouldn't take too much to use my previously posted LoadStringFromUnicodeFile function (post #29) and adapt the entire parsing routine to work with Unicode. Hardly anything would need to be changed except that it gets its bytes (errr, I mean, characters) from the loaded string rather than directly from disk. I'll admit that that would be getting several copies of the file in memory, but all the pieces are there for a pretty nice routine.

    EDIT: I guess, beyond minimal work, the only thing it won't handle is a UTF8 file. Those things are beasts, and I've never really messed with them that much.
    Last edited by Elroy; Oct 27th, 2014 at 11:23 AM.

  37. #37
    PowerPoster
    Join Date
    Jun 2013
    Posts
    4,885

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Elroy View Post
    Well, you've got 4gig of memory to play with when using VB6,
    Nah - typically it's more like 1.8GB (from the theoretical 2GB) we have available with VB6-Apps,
    until the VB-Mem-allocator starts choking on larger (or already midsized) allocations.

    Quote Originally Posted by Elroy View Post
    But yes, the fact that I read the Ascii/Unicode file one byte at a time, it may get a bit slow.
    Well, I'd say factor 20 is not only "a bit slower" than it could be - and what contributes to the
    performance-penalty is also the char-by-char-appending when gathering the Line-content.
    Then you parse for fields a second time (unnecessarily) - and that also one char at a time -
    so there's quite some room for further improvements on that front.
    Also your simple negation in case of quotes is currently error-prone, since there's no reset
    of the Flag to 'False' in case you encounter a (non-quoted) comma... a single double-quote
    is often used to specify inches on (otherwise unquoted) number-fields as a 'units'-specifier.

    How other CSV-parsers treat certain "hard to parse" fields, is e.g. pointed out here:
    http://www.xbeat.net/vbspeed/c_ParseCSV.php


    Quote Originally Posted by Elroy View Post
    ... you'll see that I focused on the {LF} only as a line terminator (just tossing the {CR} that weren't surrounded by quotes).
    That should make it handle Unix files just as well as Windows files.
    Right you are - was a bit imprecise there - though CSV-files which contain vbCR-only are not uncommon -
    e.g. the examples from the link I've posted: http://support.spatialkey.com/spatia...mple-csv-data/
    come in that category.

    Quote Originally Posted by Elroy View Post
    ... the only thing it won't handle is a UTF8 file.
    A quite common format these days - it's (due to its compactness) quite common in WebApps -
    whereas UTF16-format is not that often encountered in the wild.

    Quote Originally Posted by Elroy View Post
    Those things are beasts, and I've never really messed with them that much.
    It's not really handled very differently from ANSI-content (just another CodePage-EnumValue) -
    in either case the MultiByteToWideChar-API is used (too bad, that VBs StrConv-function
    only supports the different ANSI-Codepages over its last optional LCID-param).

    Though it's only a few lines of code - once you have the necessary API-Declaration in place.

    The Transcoding-Routine I've posted already contains the necessary bits - and a few other
    UTF8-to-UTF16-routines were recently posted and discussed as well here:
    http://www.vbforums.com/showthread.p...y-Line-by-Line

    Olaf

  38. #38
    Frenzied Member
    Join Date
    Mar 2008
    Posts
    1,175

    Re: CSV files and a big problem with notes field!

    And I guess it all turns to mush if we try to read a csv file created on a computer in e.g. France with one created in e.g. England unless we have prior knowledge of the Regional settings active when/ where it was created; and then I think we would need to parse the whole file character by character.

  39. #39
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    6,222

    Re: CSV files and a big problem with notes field!

    Hmmm, Schmidt, I suppose I sort of agree and disagree about the memory thing. 32 bit applications (which a VB6 program is) have a 4gig virtual address space. And, in fact, with the understanding of how forms, classes, and user controls are actually overlays in the EXE, you can have binary code that's even much larger than that so long as you don't try to pull it all into memory at the same time (with Load Form and Set MyObj = New SomeClass statements). However, you're right in that pretty much all variables and arrays (depending on the situation) are capped at a max of 2gig.

    If we're truly dealing with Ascii/Unicode type files, this is still one HECK of a lot of space. I'm not sure I've ever seen an ASCII or Unicode file that's over a few meg, but I'm sure they're out there. Many years ago, I used to be like Mehdi and afraid to use memory. However, machines I make installations on these days have a minimum of 2gig of RAM (usually more like 8), so it's just not a problem.

    Now I did recently post a binary file search routine here, and it was certainly disk based, but I view ASCII/Unicode files a bit differently.

    I'm trying to be done with this thread, but, considering how boolean variables are initialized as false, I'm pretty sure I handle everything except for one situation, which I did think about. And that's an escaped quote mark inside of quotes. Some CSV formats have a way of doing that, possibly with another escape character. But the following is handled: 1) a line opening a quote but not closing it before EOL, 2) a line terminating in a comma, 3) a null string such as ,"", 4) repeating commas such as ,,,, 4) a comma inside of quotes (which should be treated as part of the field).

    Regarding the speed, like I said, I hammered this out in an hour or so. It'd be fairly easy to write a buffer procedure for reading the data faster. It'd also be fairly easy to do some Mid$(s, iSpot, 1) = Chr$(bb) work to speed up string processing. Hmmm, you know what? You won't hurt my feelings one teensy weensy bit if you take my work, make those improvements to it, and re-post.

    Ahhh, and thanks for the thread. It had some interesting info in it. One thing that was fascinating (sort of expanding a bit on what it said) was to automate Notepad to use it as a UTF-8 to UTF-16 converter. It's certainly built into all later versions of Notepad. There is some other interesting code on that thread though, and I'll be studying it. I certainly wouldn't mind having a bulletproof UTF-8 to UTF-16 converter in my toolbox.

    Schmidt, you take care,
    Elroy

    EDIT: Note to Mehdi. Yes, if you do pull out any of the variables in that routine, and making their scope "module" or "global" then you probably do want to reset their initial values to False, "", and 0. And Erase all the arrays.
    Last edited by Elroy; Oct 27th, 2014 at 02:40 PM.

  40. #40
    PowerPoster
    Join Date
    Jun 2013
    Posts
    4,885

    Re: CSV files and a big problem with notes field!

    Quote Originally Posted by Elroy View Post
    Hmmm, Schmidt, I suppose I sort of agree and disagree about the memory thing.
    ...However, you're right in that pretty much all variables and arrays (depending on the situation) are capped at a max of 2gig.
    Thought that's what was relevant - and what we were discussing...

    I was talking from practical experience - and as for the theoretical limit of 2GB for
    mem-allocations in a 32Bit-process, we can reach only about 1.6 - 1.8 GB in practice
    (depending on fragmentation).

    The below code chokes here on my machine, after reaching 169 allocated chunks of 10MB each.
    (about 1.7 GB).

    When we change the chunksize to a larger one - e.g. 100MB - then the loop is
    exited already at 13 successfully allocated chunks (1.3 GB) here on a Win8-machine with 8GB Ram.

    Code:
    Option Explicit
    
    Private Type tMemAllocTest
      B() As Byte
    End Type
    
    Const MemoryChunks As Long = 200
    
    Private MemAllocTest(1 To MemoryChunks) As tMemAllocTest
    
    Private Sub Form_Load()
    Dim i As Long
    Const ByteArrSize As Long = 10& * 1024& * 1024& '10MB per Chunk
    
    On Error GoTo 1
     For i = 1 To MemoryChunks 'let's see how many 10MB-memory-chunks we can allocate
       ReDim MemAllocTest(i).B(1 To ByteArrSize)
     Next
    1: MsgBox "the system was able, to allocate only " & i - 1 & " 10MB-chunks"
    End Sub
    Quote Originally Posted by Elroy View Post
    Hmmm, you know what? You won't hurt my feelings one teensy weensy bit if you take my work, make those improvements to it, and re-post.
    Nah - I leave that to you - I have my 2 cents of CSV-parsing made available to the community
    in compiled form (allowing filesizes above 4GB, as well as Unicode-Filenames) - and its usage would look like this:

    Code:
    Option Explicit
    
    Implements ICSVCallback
    Private CSV As cCSV
    
    Private Sub Form_Load()
        Set CSV = New_c.CSV
            CSV.ColSepChar = ","
            CSV.QuotChar = Chr$(34)
        CSV.ParseFile "c:\temp\google.csv", Me
    End Sub
    
    Private Function ICSVCallback_NewValue(ByVal RowNr As Long, ByVal ColNr As Long, B() As Byte, _
                                           ByVal BValStartPos As Long, ByVal BValLen As Long) As Long
      Debug.Print RowNr, ColNr, CSV.GetStringValue(B, BValStartPos, BValLen)
    End Function
    Quote Originally Posted by Elroy View Post
    EDIT: Note to Mehdi. Yes, if you do pull out any of the variables in that routine, and making their scope "module" or "global" then you probably do want to reset their initial values to False, "", and 0. And Erase all the arrays.
    As the example above shows - a class-encapsulation in conjunction with an interface-based callback
    allows for a quite generic and fast "SAX-like" Field-parsing - just a suggestion...

    Olaf

Page 1 of 2 12 LastLast

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Featured


Click Here to Expand Forum to Full Width