[Access 2010] Quotes are Screwing Up Recordset of Tab Delimited Data
I have a tab delimited file. One record contains double quotes.
I'm reading the text TSV file using ADODB and writing the results to MS Access. That double quote is really screwing things up. It appears to end the recordset!
The only answer I see from Google is to read the TSV file using VBA, not ADODB. But this really isn't an answer -- it avoids the problem. I want to use ADODB. It's hard to believe there's no way to read TSV files with a double quote in them. Any ideas?
This is the code that opens the ADODB connection and recordset:
Code:
Public Sub openADODBConnection()
Set c_cn = New ADODB.Connection
c_cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=""" & c_crFolder.path & """;" & _
"Extended Properties=""Text;HDR=Yes;FMT=Delimited"""
If c_cn.State <> adStateOpen Then
Call Me.closeDBConnection(exit_failure, "DB Connection Failed.")
Call MsgBox("ADO Connection failed with: " & c_crFolder.path, , "Connection Err")
Exit Sub
End If
End Sub
Public Sub openADODBRecordset(table As Tables)
Dim file As String, sql As String
Set c_rs = New ADODB.Recordset
If table = Tables.bulkpor Then
file = "FFIEC CDR Call Bulk POR " & Me.dateString & ".txt"
ElseIf table = Tables.rc Then
file = "FFIEC CDR Call Schedule RC " & Me.dateString & ".txt"
End If
sql = "SELECT * FROM [" & file & "]"
c_rs.Open sql, Me.dbConnection, adOpenForwardOnly, adLockReadOnly, adCmdText
If c_rs.State <> adStateOpen Then Call Me.closeDBRecordSet(exit_failure, "Recordset Err")
End Sub
Additionally, I use a schema.init file to force Access to read the file using tab delimiters.
Code:
[FFIEC CDR Call Schedule RCRII 06302014(1 of 2).txt]
Format=TabDelimited
There surely has to be a way to handle this, right? If I had to use a ham-fisted approach, I'd probably pre-process the files to remove the double quotes physically rather than switch to reading the TSV files with a VBA file read/split operation.
Re: [Access 2010] Quotes are Screwing Up Recordset of Tab Delimited Data
I don't know if this will work, but Chr(34) is the ASCII code for double quotes. I know you can use that in VBA. So I'm understanding you correctly you want to search for the double quotes. You can replaced them with Chr(34).
Re: [Access 2010] Quotes are Screwing Up Recordset of Tab Delimited Data
IS there a reason to use ADO for this? Why not just open the file, read each line, split on the TAB and write it out to the database?
I suspect it's because the parser isn't expecting the quote identifier to be in the middle of the data. I know with CSV files, I have to indicate if the fields are quote identified so it knows the difference between a comma in the middle of a quote (part of the value) vs outside the quote (separating values).
You don't show it, but what about the next record in the file... did it get read in OK?
Also, a couple things to try to help narrow down the issue - put double quotes around the whole field, replacing the double quotes with single quotes, and removing the quotes all together.
I'm curious as to what the results would be. I have my suspicions.
Re: [Access 2010] Quotes are Screwing Up Recordset of Tab Delimited Data
i was easily able to duplicate your problem
if a field is enclosed in quotes they are stripped out, no problem, unless the quotes are required to be kept
but if a field contains a quote within the text, the quote and the rest of that record are terminated, the next record is fine (up to any quote)
i found no solution within ADO
the only simple solution, to keep using the TSV as a recordset, is to, first, open the file for i/o and replace all quotes with some other character, i did test, single quotes do not give the same problem
I don't think it's possible to read TSV files with ADO without a proper schema.ini.
there is a registry setting, that can be changed
i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case. Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next
dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part
come back and mark your original post as resolved if your problem is fixed
pete
Re: [Access 2010] Quotes are Screwing Up Recordset of Tab Delimited Data
I have a toy problem that illustrates the behavior which I've attached. The zip file contains:
an empty Access 2010 file with only a VBA module (just run the only procedure)
a data file (data.txt)
a readme file describing the problem
a schema file (schema.ini) which is necessary for reading TSV files
I've also investigated the problem a little more and learned a bit more about its behavior. This feels like a bug because the behavior is inconsistent. Not only is the rest of the given recordset null (the rest of the recordsets are unaffected and normal) but what shows up depends on where the quoted material is.
When the quoted material is at the beginning of the record, the quoted material is printed.
When the quoted material is in the middle of the record, the beginning unquoted material is printed.
When the quoted material is at the end of the record, the non-quoted material is printed.
Case A) Entire field is quoted: No problem here.
Input: A<TAB>"hello and goodbye"<TAB>C
Field 0: "A"
Field 1: "hello and goodbye"
Field 2: "C"
Case B) Beginning material is quoted: Problem exists.
Input: A<TAB>"hello" and goodbye<TAB>C
Field 0: A
Field 1: hello
Field 2: Null
Case B) Middle material is quoted: Problem exists.
Input: A<TAB>hello "and" goodbye<TAB>C
Field 0: A
Field 1: hello
Field 2: Null
Case B) End material is quoted: Problem exists.
Input: A<TAB>hello and "goodbye"<TAB>C
Field 0: A
Field 1: hello and
Field 2: Null
In case you don't have Access 2010, I'm quoting my code here so you can cut and paste into whatever version of Access you have (don't forget to include a reference to Microsoft ActiveX Data Objects 2.x.
Code:
Option Compare Database
Option Explicit
''' Requires a reference to Microsoft ActiveX Data Objects 2.x DLL
'''
Public Sub test()
Dim cn As ADODB.Connection, rs As ADODB.Recordset
Dim path As String, sql As String, vTmp As Variant, s As String
Set cn = New ADODB.Connection
Set rs = New ADODB.Recordset
path = CurrentProject.path
' Open the connection
'
cn.Open _
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=""" & path & """;" & _
"Extended Properties=""Text;HDR=Yes;FMT=Delimited"""
If cn.State <> ADODB.ObjectStateEnum.adStateOpen Then
Call MsgBox("ADO Connection failed", , "Error: ADO Connection")
GoTo DONE
End If
' Open the recordset
'
sql = "SELECT * FROM [data.txt]"
rs.Open sql, cn, adOpenForwardOnly, adLockReadOnly, adCmdText
If rs.State <> ADODB.ObjectStateEnum.adStateOpen Then
Call MsgBox("ADO Recordset failed", , "Error: ADO Recordset")
GoTo DONE
End If
' Print recordsets
'
Do While rs.EOF = False
s = vbNullString
For Each vTmp In rs.Fields
s = s & vTmp & ","
Next vTmp
Debug.Print s
rs.MoveNext
Loop
DONE:
On Error Resume Next
rs.Close
Set rs = Nothing
cn.Close
Set rs = Nothing
On Error GoTo 0
End Sub
Re: [Access 2010] Quotes are Screwing Up Recordset of Tab Delimited Data
You certainly don't remember me, but when I was a developer at Fitch Ratings you saved my butt a few times, and if I have semi-decent VBA coding skills, you're definitely to blame for that. Thank you!
Originally Posted by westconn1
i was easily able to duplicate your problem
if a field is enclosed in quotes they are stripped out, no problem, unless the quotes are required to be kept
but if a field contains a quote within the text, the quote and the rest of that record are terminated, the next record is fine (up to any quote)
That's pretty much what I found, too.
Originally Posted by westconn1
i was easily able to duplicate your problem
i found no solution within ADO
the only simple solution, to keep using the TSV as a recordset, is to, first, open the file for i/o and replace all quotes with some other character, i did test, single quotes do not give the same problem
D'oh! I was afraid it would come to that, but I was crossing my fingers there was some other way to side-step the problem than to do I/O before reading the file with ADO.
Originally Posted by westconn1
i was easily able to duplicate your problem
there is a registry setting, that can be changed
Yeah, I came across that in my Googling on how to read TSV files (I've never worked with them before), but since the people I work with aren't tech savvy I guess I wasn't considering that. My bad!
Re: [Access 2010] Quotes are Screwing Up Recordset of Tab Delimited Data
Well after my earlier foot in keyboard moment , I did some digging and found this
The Format option in Schema.ini specifies the format of the text file. The Text IISAM can read the format automatically from most character-delimited files. You can use any single character as a delimiter in the file except the double quotation mark (").
Source: https://msdn.microsoft.com/en-us/lib...=vs.85%29.aspx
What this does not say is, is that if you must have a double-quote between the delimiters then they must be double double-quote characters the same as you would do in your VBA code to insert a double-quote into a string constant.
Warning! The following is meant as a joke.
Since you appear adamant to use ADODB to accomplish this, there is hope.
Second warning! This may make you sick.
Read it line by line with ADODB and split the lines.
Code:
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Extended Properties='Text;';" & _
"Data Source='" & path & "';"
Dim rs As ADODB.Recordset
Set rs = New ADODB.Recordset
rs.Open "Select * From [Tabbed.txt]", conn, adOpenForwardOnly, adLockReadOnly, adCmdText
Dim i As Integer
Dim fieldNames() As String
Dim dataFields() As String
While Not rs.EOF
fieldNames = Split(rs(0), vbTab)
rs.MoveNext
While Not rs.EOF
dataFields = Split(rs(0), vbTab)
For i = 0 To UBound(fieldNames)
Debug.Print fieldNames(i) & ";" & vbTab & dataFields(i)
Next i
rs.MoveNext
Wend
Wend
The "Schema.ini" would look like this:
Code:
[Tabbed.txt]
ColNameHeader=False
MaxScanRows=0
Format=FixedLength
CharacterSet=OEM
Rem set the Width to be longer than you need.
Col1=F1 Text Width 1000
Re: [Access 2010] Quotes are Screwing Up Recordset of Tab Delimited Data
Hey TnTinMN!
I think I see what you're doing. You're basically gulping the entire line (treating it as a fixed width data file) using an ADODB.Recordset and doing the splitting yourself via Split (and moving one past the header).
I think maybe that's the best solution here. Preprocessing all the files would be really wasteful. So be it. Thanks for the idea!
Re: [Access 2010] Quotes are Screwing Up Recordset of Tab Delimited Data
Originally Posted by caffeine
I think maybe that's the best solution here.
You jest I hope.
In Access VBA IDE, open the Help system (F1) and search for "Line Input" to see an example of reading a file. Then you can use the logic that I outlined to parse each line using the Split function.
Re: [Access 2010] Quotes are Screwing Up Recordset of Tab Delimited Data
depending on the size of the file
Code:
open "somepath\filename" for input as 1
strfile = input(lof(1),#1)
close 1
open "somepath\filename" for output as 1 ' change filename not to overwrite original if desired
print #1, replace(strfile, """", "'") 'replace quotes with single quote
close 1
of course the file may be too big for this simplistic solution, but you have not specified anywhere
if there are multiple file to process, you could convert them all first, or in turn before opening as recordset
i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case. Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next
dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part
come back and mark your original post as resolved if your problem is fixed
pete