-
May 28th, 2009, 07:39 AM
#1
Thread Starter
Fanatic Member
[RESOLVED] seriously wacky left() behaviour
I have a program which takes the html from a web page, removes the formatting and leaves a list of conjugations for the verb. A sample of the result is here:
Barrer : sweep
*present indicative
Yo barro
Tú barres
Él/usted barre
Nosotros barremos
Vosotros barréis
Ellos/ustedes barren
*Imperfect:
Yo barría
Tú barrías
Él/usted barría
Nosotros barríamos
Vosotros barríais
Ellos/ustedes barrían
*preterite:
Yo barrí
Etc . . .
etc . . .
I inserted the "*" markers because I want to skip those header lines when I write the data to the database.
It starts off fine with the first line, "barrer : sweep".
For the second line (the first one that needs to be ignored), iPos is 1, sFirst is "*", and it executes the code in Case "*".
So far, so good.
It also works fine for the next 6 lines that I need and adds them to the database record.
The problem comes with "*Imperfect:".
iPos = 3, sItem = "" and sFirst = "".
The same thing happens with every line after that which starts with "*", and iPos = 3 for all of them.
WHY 3!!!!!!
What am I missing?
PLEASE HELP! This is driving me batty!
Code:
Option Explicit
Private cn As ADODB.Connection
Private rs As ADODB.Recordset
Private sHTML As String
Private sItem() As String
Code:
Private Sub LoadDatabase()
    Dim strConn As String
    Dim iBasePtr As Integer
    Dim iLoopCtr As Integer
    Dim sFirst As String
    Dim iPos As Integer
    Set cn = New ADODB.Connection
    ' strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Users\hp\Documents\VBProgs\Get From Web\SpanishVerbs.mdb;Persist Security Info=False"
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Users\hp\Documents\VBProgs\Get From Web\SpanishVerbs.mdb;Persist Security Info=False"
    Set rs = New ADODB.Recordset
    rs.Open "verblist", cn, adOpenKeyset, adLockPessimistic, adCmdTable
    rs.AddNew 'adding new record
    'here we need two counts so that when we encounter a field with a * at the beginning
    'it doesn't add it to the database
    iBasePtr = 1
    For iLoopCtr = 0 To UBound(sItem)
        iPos = InStr(1, sItem(iLoopCtr), "*", vbTextCompare)
        If iPos > 1 Then
            sItem(iLoopCtr) = Left(sItem(iLoopCtr), iPos - 1)
        End If
        sFirst = Left(sItem(iLoopCtr), 1)
        Select Case sFirst
            Case "*"
                iBasePtr = iBasePtr 'do nothing and DON'T increase iBasePtr
            Case Else
                rs.Fields(iBasePtr).Value = sItem(iLoopCtr)
                iBasePtr = iBasePtr + 1
        End Select
    Next iLoopCtr
    rs.Update 'this updates the recordset
    rs.Close
    Set rs = Nothing
    cn.Close 'close the connection before releasing it
    Set cn = Nothing
End Sub
-
May 28th, 2009, 08:07 AM
#2
Re: seriously wacky left() behaviour
Where do you fill sItem? I see no code assigning anything to it, and the problem is very likely in the code that parses the HTML: there may be an extra character, such as a newline, that breaks your code.
As for InStr with vbTextCompare, it is unnecessary when you are looking for a special character such as "*". Just use vbBinaryCompare; it is faster.
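A minimal sketch of the check described above, assuming the goal is "skip any line whose very first character is the * marker". The helper name StartsWithMarker is hypothetical and not part of the original project. It uses vbBinaryCompare as suggested, and it only returns True when InStr finds the marker at position 1, so a line that starts with hidden characters in front of the "*" is not treated as a header.

```vb
' Hypothetical helper, not part of the original project.
' Returns True only when the "*" header marker is the very first character.
' vbBinaryCompare compares raw character codes, which is enough for "*".
Private Function StartsWithMarker(ByVal sLine As String) As Boolean
    ' InStr returns 0 when "*" is absent (including for an empty string),
    ' and a position greater than 1 when something precedes the marker.
    StartsWithMarker = (InStr(1, sLine, "*", vbBinaryCompare) = 1)
End Function
```

Inside the loop in LoadDatabase this could replace the Left/Select Case test, for example `If Not StartsWithMarker(sItem(iLoopCtr)) Then ...`. Note that an item such as vbCrLf & "*Imperfect:" returns False here, because InStr reports position 3 for the marker.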
-
May 28th, 2009, 02:20 PM
#3
Thread Starter
Fanatic Member
Re: seriously wacky left() behaviour
Sorry, the code that parses the HTML is below. I can't see how it would be picking up a newline, because I also write the list to a listbox and everything looks fine in there.
Code:
Private Sub GetList()
    'Tags 1 and 2 mark the text at the beginning and end of the area that I want
    Const Tag1 = "<CENTER><FONT color=#0033ff size=4><STRONG>"
    Const Tag2 = "<CENTER><STRONG><FONT color=#ff0000>"
    Dim i As Long, sWhole As String
    sItem = Split(sHTML, Tag1) 'split the string based on the start of the block
    For i = 1 To UBound(sItem) '-- (i - 1) is used to remove the first item (0)
        '- effectively chops off the start of the string
        'using this method allows for more than one instance of the required string
        sItem(i - 1) = Split(sItem(i), Tag2, 2)(0)
    Next
    ReDim Preserve sItem(UBound(sItem) - 1) '-- one item short to effectively remove the last part of the string
    sWhole = sItem(0) 'We only get one instance for this application
    sWhole = Replace(sWhole, "<BR><BR>", "<BR>")
    sWhole = Replace(sWhole, "<STRONG><FONT color=#ff0000>", "")
    sWhole = Replace(sWhole, "</STRONG>", "")
    sWhole = Replace(sWhole, "<STRONG>", "*") 'used to mark the headers which will be ignored
    sWhole = Replace(sWhole, "</FONT>", "")
    sWhole = Replace(sWhole, "<FONT color=#0033ff size=4>", "")
    sWhole = Replace(sWhole, "</CENTER>", "")
    sWhole = Replace(sWhole, "</TD>", "")
    sWhole = Replace(sWhole, "</TR>", "")
    sWhole = Replace(sWhole, "<TR>", "")
    sWhole = Replace(sWhole, "<TD 50%?? 2??>", "")
    sWhole = Replace(sWhole, "<TD 25%??>", "")
    sWhole = Replace(sWhole, "</TABLE>", "")
    sWhole = Replace(sWhole, "</DIV>", "")
    sWhole = Replace(sWhole, "</TBODY>", "")
    sWhole = Replace(sWhole, "<BR>", "$") 'put a placemarker in between the lines
    sItem = Split(sWhole, "$") 'then split it up
    ReDim Preserve sItem(UBound(sItem))
    For i = 0 To UBound(sItem)
        If i = 1 Then 'it's complicated format info. Easier to ignore it
            List1.AddItem "*present indicative"
            sItem(i) = "*present indicative"
        Else
            List1.AddItem sItem(i) 'Add it to the list
        End If
    Next
    LoadDatabase
End Sub
Last edited by Españolita; May 28th, 2009 at 02:23 PM.
-
May 28th, 2009, 02:34 PM
#4
Re: seriously wacky left() behaviour
I don't see the problem right away. But if you are curious what the first two characters are when iPos = 3:
Code:
If iPos = 3 Then
    Debug.Print Asc(Left$(sItem(iLoopCtr), 1)), Asc(Mid$(sItem(iLoopCtr), 2, 1))
End If
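If you want to see every character in an item rather than just the first two, a small helper can list all the character codes. This is a sketch only; the name CharCodes is hypothetical and not from the thread. It uses AscW so that accented letters such as "ú" also come back as a single code.

```vb
' Hypothetical diagnostic helper, not part of the original project.
' Returns the character codes of sText separated by single spaces.
Private Function CharCodes(ByVal sText As String) As String
    Dim i As Long
    Dim sOut As String
    For i = 1 To Len(sText)
        If i > 1 Then sOut = sOut & " "
        ' CStr (unlike Str) adds no leading space for positive numbers
        sOut = sOut & CStr(AscW(Mid$(sText, i, 1)))
    Next i
    CharCodes = sOut
End Function
```

For example, `Debug.Print CharCodes(sItem(iLoopCtr))` inside the loop would print "13 10 42 ..." for an item that begins with a carriage return, a line feed and then the "*" marker.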
-
May 28th, 2009, 02:53 PM
#5
Thread Starter
Fanatic Member
Re: seriously wacky left() behaviour
hi LaVolpe,
The first two characters are 13 and 10 every time.
*scratches head* Isn't that CR and LF (carriage return, line feed)?
Last edited by Españolita; May 28th, 2009 at 02:57 PM.
-
May 28th, 2009, 03:05 PM
#6
Thread Starter
Fanatic Member
Re: seriously wacky left() behaviour
Well, that's sorted it out.
I used Replace on those two characters and it solved the problem.
So, just out of interest, why didn't they show up when I put the lines in the listbox?
-
May 28th, 2009, 03:06 PM
#7
Re: seriously wacky left() behaviour
13 and 10 are vbCr and vbLf; together they make vbCrLf (the same as vbNewLine on Windows). That explains things, I would think. You may want to strip them out of your sHTML string before you start processing it.
Edited: I see we posted at about the same time. Glad you resolved it. ListBoxes don't display carriage returns as line breaks, but I would have expected them to show 2 vertical bars, one for each character (13 and 10).
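The thread doesn't show the exact Replace calls used for the fix, so the following is only a sketch of the approach suggested above: remove the line breaks from the raw HTML once, before any splitting. The helper name StripLineBreaks is hypothetical. It assumes the line breaks only ever sit between tags, as they did here; if a break could separate two words, replacing it with a space (and trimming each item afterwards) would be safer than replacing it with "".

```vb
' Hypothetical helper, not part of the original project.
' Removes CR/LF pairs, then any lone CR or LF, from sText.
Private Function StripLineBreaks(ByVal sText As String) As String
    sText = Replace(sText, vbCrLf, "")
    sText = Replace(sText, vbCr, "")
    sText = Replace(sText, vbLf, "")
    StripLineBreaks = sText
End Function
```

Called as `sHTML = StripLineBreaks(sHTML)` at the top of GetList, every item produced by the later Split would then start with its real first character, so the "*" test in LoadDatabase sees the marker at position 1.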