Getting stuff from html code

**Stealthrt** · Apr 20th, 2008, 12:39 PM

Hey all, I need some help figuring out how to get what I need out of some HTML code:

Code:

<tr>
	<td style="padding: 2px 2px 2px 5px"><span class="modForm">IN</span></td>
	<td align="right" class="">91&nbsp;&nbsp;</td>
	<td align="right" class="">6&nbsp;&nbsp;</td>
</tr>
<tr>
	<td style="padding: 2px 2px 2px 5px"><span class="modForm">Other</span></td>
	<td align="right" class="">9&nbsp;&nbsp;</td>
	<td align="right" class="">11&nbsp;&nbsp;</td>
</tr>

The above code is just part of the overall HTML that will be put inside a text box. There are a total of 8 "<td align="right" class="">" tags that I need to find. After finding each one, I need just the part between it and the "</td>". So in the example above:

Code:

<td align="right" class="">91&nbsp;&nbsp;</td>

I would only need:

Code:

91

Each time it finds one, I will increment a counter (Do Until X = 8) and write the value to a text file. The text file should end up looking like this:

Code:

91,6,9,11,...

Any help would be great!

David

**LaVolpe** · Apr 20th, 2008, 12:56 PM

Have you searched this forum for "parsing html"? Lots of good examples.
Also, if you are counting the td tags (td means table data), you may not want a plain Do Until. Instead, combine it with a check for the tr tag (tr means table row). That way, if you reach </tr> before x = 8, you know to abort the Do loop.

**Stealthrt** · Apr 20th, 2008, 01:19 PM

OK LaVolpe, this is what I have now:

Code:

iStart = InStr(Text1.Text, "<td align=""right"" class="""">") + 26

theNumber1 = Replace(Mid$(Text1.Text, iStart, 10), ">", "")
theNumber1 = Replace(theNumber1, "n", "")
theNumber1 = Replace(theNumber1, "b", "")
theNumber1 = Replace(theNumber1, "s", "")
theNumber1 = Replace(theNumber1, "p", "")
theNumber1 = Replace(theNumber1, ";", "")
theNumber1 = Replace(theNumber1, "&", "")

It gets the first number just fine now, but like I said, I have 8 of them in total. How would I set up the loop for that using my code above?

Thanks,

David

**LaVolpe** · Apr 20th, 2008, 01:26 PM

I think you are going about it incorrectly. As I mentioned, there are some good posts on this site; I believe DigiRev posted one in the code bank section. Anyway, you want to search for the beginning of the tag, i.e., "<td ". Once you find that, look for the "</td". You want everything after the first > character found from the position of "<td " up to the position just before the "</td".

Recommend placing the parsing in a function. Pass the function the offsets where <td and </td were found, and let it return the value parsed out. Then you'd simply call this function within a loop and exit the loop when you get 8 items or cannot find another <td tag.

HTML tags: <tag>data</tag>
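
A minimal sketch of the approach described above, assuming the cell text never contains a nested ">" before the value and that attribute values contain no ">" characters. The names ParseCell and PrintCells are made up for illustration and do not come from the code bank post mentioned above. The opening tag is a parameter, so either the generic "<td " or a longer, more specific prefix can be passed in.

```vb
Option Explicit

' Returns the text after the first ">" found from lTagStart,
' up to the character just before lCloseStart (where "</td" begins).
Private Function ParseCell(ByRef sHtml As String, ByVal lTagStart As Long, _
                           ByVal lCloseStart As Long) As String
    Dim lGt As Long

    lGt = InStr(lTagStart, sHtml, ">")
    If lGt > 0 And lGt < lCloseStart Then
        ParseCell = Mid$(sHtml, lGt + 1, lCloseStart - lGt - 1)
    End If
End Function

' Prints up to lMax cell values for tags that begin with sOpenTag.
Private Sub PrintCells(ByRef sHtml As String, ByVal sOpenTag As String, ByVal lMax As Long)
    Dim lTag As Long, lClose As Long, lCount As Long

    lTag = InStr(1, sHtml, sOpenTag, vbTextCompare)
    Do While lTag > 0 And lCount < lMax
        lClose = InStr(lTag, sHtml, "</td", vbTextCompare)
        If lClose = 0 Then Exit Do          ' unterminated cell: give up
        Debug.Print ParseCell(sHtml, lTag, lClose)
        lCount = lCount + 1
        ' Continue after this cell so the same tag is not matched again
        lTag = InStr(lClose, sHtml, sOpenTag, vbTextCompare)
    Loop
End Sub
```

A call such as PrintCells Text1.Text, "<td align=""right""", 8 would restrict the search to the right-aligned cells. The important detail is the start argument of InStr: each search resumes from the end of the previous cell instead of from position 1.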

**Stealthrt** · Apr 20th, 2008, 01:28 PM

But LaVolpe... there are 100+ <td tags within the full HTML code. What I posted is just a small part of it. However, <td align="right" class=""> appears only 8 times in the full HTML code.

I just need to know how to write a loop so that it doesn't keep finding the same <td align="right" class=""> (the 91 value) every time I call that code.

David

**Stealthrt** · Apr 20th, 2008, 01:31 PM

And for the record...

Code:

91&nbsp;&
91&bsp;&
91&sp;&
91&p;&
91&;&
91&&
91

That's what the code below does, one Replace at a time:

Code:

theNumber1 = Mid$(Text1.Text, iStart, 10)
theNumber1 = Replace(theNumber1, "n", "")
theNumber1 = Replace(theNumber1, "b", "")
theNumber1 = Replace(theNumber1, "s", "")
theNumber1 = Replace(theNumber1, "p", "")
theNumber1 = Replace(theNumber1, ";", "")
theNumber1 = Replace(theNumber1, "&", "")

David

**DigiRev** · Apr 20th, 2008, 02:54 PM

LaVolpe's suggestion is exactly how I'd do it.

Also, you may want to just make a special trim function...ex:

Code:

Private Sub Form_Load()
    Dim strHTML As String
    
    strHTML = "&nbsp;&nbsp; 92   &nbsp;"
    strHTML = HTMLTrim(strHTML)
    MsgBox strHTML, , Len(strHTML)
End Sub

Private Function HTMLTrim(ByRef Expression As String) As String
    HTMLTrim = Trim$(Replace(Expression, "&nbsp;", ""))
End Function

I attached a quick example of extracting out the numbers.

**Stealthrt** · Apr 23rd, 2008, 09:43 AM

How can I use the code below to loop through the entire HTML code and get all the matching values?

Code:

s = GetSubstring(Text1.text, "<td align='right' class=''>", "</td>")

Code:

Private Function GetSubstring(ByVal sDoc As String, ByVal openingTag As String, ByVal closingTag As String) As String
    Dim i As Long
    
    If Len(sDoc) > 0 And Len(openingTag) > 0 And Len(closingTag) > 0 Then
        If InStr(sDoc, openingTag) > 0 Then
            i = InStr(sDoc, openingTag)
            sDoc = Right(sDoc, Len(sDoc) - i - Len(openingTag) + 1)
            If InStr(sDoc, closingTag) > 0 Then
                i = InStr(sDoc, closingTag)
                GetSubstring = Left(sDoc, i - 1)
            End If
        End If
    End If
End Function

Any help would be great!

David
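
One possible way to loop with a GetSubstring-style helper is to let the function remember where it stopped. The sketch below is an assumption-laden variant, not code from this thread: GetNextSubstring, SaveValues, and the file name values.txt are hypothetical, and the double quotes in Text1 are first converted to single quotes so the single-quoted search tag can match.

```vb
Option Explicit

' Hypothetical variant of GetSubstring. lPos is passed ByRef and moved past
' each match, so the next call continues from there. lPos = 0 means "no more".
Private Function GetNextSubstring(ByRef sDoc As String, ByVal openingTag As String, _
                                  ByVal closingTag As String, ByRef lPos As Long) As String
    Dim lStart As Long, lEnd As Long

    If Len(openingTag) = 0 Or Len(closingTag) = 0 Then lPos = 0: Exit Function
    If lPos < 1 Then lPos = 1

    lStart = InStr(lPos, sDoc, openingTag)
    If lStart = 0 Then lPos = 0: Exit Function
    lStart = lStart + Len(openingTag)

    lEnd = InStr(lStart, sDoc, closingTag)
    If lEnd = 0 Then lPos = 0: Exit Function

    GetNextSubstring = Mid$(sDoc, lStart, lEnd - lStart)
    lPos = lEnd + Len(closingTag)
End Function

' Collects every matching value and writes them as one comma-separated line.
Private Sub SaveValues()
    Dim sHtml As String, sItem As String, sOut As String
    Dim lPos As Long, iFile As Integer

    sHtml = Replace$(Text1.Text, """", "'")
    lPos = 1
    Do
        sItem = GetNextSubstring(sHtml, "<td align='right' class=''>", "</td>", lPos)
        If lPos = 0 Then Exit Do
        sItem = Trim$(Replace$(sItem, "&nbsp;", ""))
        If Len(sOut) > 0 Then sOut = sOut & ","
        sOut = sOut & sItem
    Loop

    iFile = FreeFile
    Open App.Path & "\values.txt" For Output As #iFile   ' file name is an assumption
    Print #iFile, sOut
    Close #iFile
End Sub
```

Because the position is tracked separately, the HTML string itself is never modified, and an empty cell can still be told apart from the end of the document by checking lPos rather than the returned text.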

**MarkT** · Apr 23rd, 2008, 01:38 PM

See if this is helpful

Code:

Option Explicit
'requires that you add a reference to the Microsoft HTML Object Library

Private Function GetHTML() As String
Dim strHTML(18) As String

    strHTML(0) = "<html>"
    strHTML(1) = "<head>"
    strHTML(2) = "<title>Sample</title>"
    strHTML(3) = "</head>"
    strHTML(4) = "<body>"
    strHTML(5) = "<table>"
    strHTML(6) = "<tr>"
    strHTML(7) = "<td style='padding: 2px 2px 2px 5px'><span class='modForm'>IN</span></td>"
    strHTML(8) = "<td align='right' class=''>91&nbsp;&nbsp;</td>"
    strHTML(9) = "<td align='right' class=''>6&nbsp;&nbsp;</td>"
    strHTML(10) = "</tr>"
    strHTML(11) = "<tr>"
    strHTML(12) = "<td style='padding: 2px 2px 2px 5px'><span class='modForm'>Other</span></td>"
    strHTML(13) = "<td align='right' class=''>9&nbsp;&nbsp;</td>"
    strHTML(14) = "<td align='right' class=''>11&nbsp;&nbsp;</td>"
    strHTML(15) = "</tr>"
    strHTML(16) = "</table>"
    strHTML(17) = "</body>"
    strHTML(18) = "</html>"
    
    GetHTML = Join(strHTML, vbCrLf)
    
End Function

Private Sub Command1_Click()
Dim td As IHTMLElementCollection
Dim attr As IHTMLAttributeCollection
Dim objDoc As MSHTML.HTMLDocument
Dim i As Integer
    
    'Create the document and load the html
    Set objDoc = New MSHTML.HTMLDocument
    objDoc.body.innerHTML = GetHTML

    'Wait for the html to load
    Do
        DoEvents
    Loop Until objDoc.readyState = "complete"

    'Create a collection of td elements
    Set td = objDoc.getElementsByTagName("TD")
    'Loop through the collection
    For i = 0 To td.length - 1
 '       Debug.Print td.Item(i).outerHTML
 
        'Check if the attributes are correct
        Set attr = td.Item(i).Attributes
        If attr.Item("class").Value = "" And attr.Item("align").Value = "right" Then
            Debug.Print td.Item(i).innerText
        End If
    Next i

    Set td = Nothing
    Set objDoc = Nothing
End Sub

**Stealthrt** · Apr 26th, 2008, 08:32 PM

Alrighty, I'm trying my best to solve my current problem! Here is the code:

Code:

sHTML = Replace$(Text1.Text, """", "'")
sTag = "class='fname'"
s = GetSubstring(sHTML, sTag, "class='flinks'>")
'sHTML = Replace$(sHTML, sTag, vbNullString, 1, 1)

If InStr(s, "class='fstatus'") = 0 Then
Else
    sTag = "class='fstatus'> "
    Do While InStr(sHTML, sTag) > 0
        s = GetSubstring(sHTML, sTag, "</span>")
        sHTML = Replace$(sHTML, sTag, vbNullString, 1, 1)
        theStatusSaying = s & vbCr

        s = GetSubstring(sHTML, "class='fupdt'>", "</div>")
        sHTML = Replace$(sHTML, "class='fupdt'>", vbNullString, 1, 1)
        theStatus(x) = Replace(theStatusSaying, "&#039;", "'") & s
        x = x + 1
    Loop

    Loading.XP_ProgressBar1.Value = 95
    DoEvents
    sTag = ".jpg' alt='"
    x = 0
    Do While InStr(sHTML, sTag) > 0
        s = GetSubstring(sHTML, sTag, "' style='width:200px;'")
        s = Replace(s, "'", "")
        sHTML = Replace$(sHTML, sTag, vbNullString, 1, 1)

        If InStr(s, "<img") = 0 And s <> "" Then
            If theStatus(x) = "" Then
                theStatus(x) = "No updates"
            End If
            CreateBalloon Me.PicFBFriends(pictureCount), PicFBFriends(pictureCount).hwnd, theStatus(x), szBalloon, False, s, etiInfo
            pictureCount = pictureCount + 1
            x = x + 1
            DoEvents
        End If
    Loop
End If

The problem is that if the first check, "If InStr(s, "class='fstatus'") = 0 Then", is true (InStr returns 0), everything else is skipped and it never moves on to the next instance of the string. I need it to keep going, but I can't figure out how to do that! It only happens when the first block doesn't contain "class='fstatus'".

Any help would be great!

David

**Caskbill** · Apr 26th, 2008, 10:48 PM

I think you're making this more complicated than need be. A Do-Loop and InStr will do it all. I've done plenty of 'reading' HTML codes and getting what I need.

The key is to find what is common to every element you want to capture. For example, is every number you want to capture followed by &nbsp;? If so, you can key on that, then back up several characters to get the number. You can do subsequent keys on "</tr>" or whatever else takes you away from the last capture and gets you set to find the next one.

If all are exactly the same except the first one, put in a flag (boolean) and for the first one look for string A$, but after that only look for string B$.

I generally paste the HTML code into Notepad, then use CTRL-F to search for key phrases, testing what they find in the code, until I narrow down to a key that gets my desired elements only.

Can you just paste the full HTML code into a notepad file and attach it here so we can take a look at it?
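
Until the full HTML is available, one way to apply the "find what is common to every element" idea to the status code above is to cut the page into one block per entry and test each block on its own. This is only a sketch: it assumes every entry starts with class='fname' (which the posted code suggests but the thread never confirms), and the names Between and ReadStatuses are hypothetical.

```vb
Option Explicit

' Returns the text between sOpen and sClose inside sChunk,
' or an empty string when either marker is missing.
Private Function Between(ByRef sChunk As String, ByVal sOpen As String, _
                         ByVal sClose As String) As String
    Dim lStart As Long, lEnd As Long

    lStart = InStr(1, sChunk, sOpen)
    If lStart = 0 Then Exit Function
    lStart = lStart + Len(sOpen)
    lEnd = InStr(lStart, sChunk, sClose)
    If lEnd > 0 Then Between = Mid$(sChunk, lStart, lEnd - lStart)
End Function

' Fills sStatus() with one entry per block and returns the number of blocks.
' A block without a status gets "No updates" instead of stopping the loop.
Private Function ReadStatuses(ByRef sHtml As String, ByRef sStatus() As String) As Long
    Dim sBlocks() As String
    Dim i As Long, s As String

    sBlocks = Split(sHtml, "class='fname'")
    If UBound(sBlocks) < 1 Then Exit Function     ' marker not found at all

    ReDim sStatus(0 To UBound(sBlocks) - 1)
    For i = 1 To UBound(sBlocks)                  ' element 0 is the text before the first entry
        s = Between(sBlocks(i), "class='fstatus'> ", "</span>")
        If Len(s) = 0 Then
            sStatus(i - 1) = "No updates"
        Else
            sStatus(i - 1) = Replace$(s, "&#039;", "'")
        End If
    Next i
    ReadStatuses = UBound(sBlocks)
End Function
```

Since each block is examined independently, a missing class='fstatus' in the first entry only affects that entry, and no special flag for the first item is needed.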
