Thread: Getting stuff from html code

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Sep 2003
    Posts
    227

    Getting stuff from html code

    Hey all, I need some help figuring out how to extract what I need from some HTML code:
    Code:
    <tr>
        <td style="padding: 2px 2px 2px 5px"><span class="modForm">IN</span></td>
        <td align="right" class="">91&nbsp;&nbsp;</td>
        <td align="right" class="">6&nbsp;&nbsp;</td>
    </tr>
    <tr>
        <td style="padding: 2px 2px 2px 5px"><span class="modForm">Other</span></td>
        <td align="right" class="">9&nbsp;&nbsp;</td>
        <td align="right" class="">11&nbsp;&nbsp;</td>
    </tr>
    The above code is just part of the overall HTML that will be put inside a text box. There are a total of 8 <td align="right" class=""> tags that I need to find. After finding each one, I need just the part between it and the "</td>". So in the example above:
    Code:
    <td align="right" class="">91&nbsp;&nbsp;</td>
    I would only need:
    Code:
    91
    Each time it finds one, I will increment a counter (Do Until X = 8) and write the value to a text file. So the text file should end up looking like this:
    Code:
    91,6,9,11,...
    Any help would be great!

    David

  2. #2
    VB-aholic & Lovin' It LaVolpe
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Getting stuff from html code

    Have you searched this forum for "parsing html"? There are lots of good examples.
    Also, if you are counting the td tags (td means table data), you may not want a plain Do Until. Instead, combine it with a check for the closing tr tag (tr means table row). That way, if you reach </tr> before x = 8, you know to abort the Do loop.
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Sep 2003
    Posts
    227

    Re: Getting stuff from html code

    OK LaVolpe, this is what I have now:
    Code:
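    ' InStr returns the position of the "<"; the tag is 27 characters long,
    ' so + 26 lands on its closing ">" (removed by the first Replace below)
    iStart = InStr(Text1.Text, "<td align=""right"" class="""">") + 26
    
    ' Take 10 characters from there, then strip the ">" and the &nbsp; characters
    theNumber1 = Replace(Mid$(Text1.Text, iStart, 10), ">", "")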
    theNumber1 = Replace(theNumber1, "n", "")
    theNumber1 = Replace(theNumber1, "b", "")
    theNumber1 = Replace(theNumber1, "s", "")
    theNumber1 = Replace(theNumber1, "p", "")
    theNumber1 = Replace(theNumber1, ";", "")
    theNumber1 = Replace(theNumber1, "&", "")
    It gets the first number just fine now, but like I said, I have 8 of them in total. So how would I set up the loop for that using my code above?

    Thanks,

    David

  4. #4
    VB-aholic & Lovin' It LaVolpe
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Getting stuff from html code

    I think you are going about it incorrectly. As I mentioned, there are some good posts on this site; I believe DigiRev posted one in the code bank section. Anyway, you want to search for the beginning of the tag, i.e., "<td ". Once you find that, look for the closing "</td>". You want everything after the first ">" character found from the position of "<td " up to the position just before the "</td>".

    I recommend placing the parsing in a function. Pass the function the offsets where "<td" and "</td" were found and let it return the parsed value. Then you simply call this function within a loop and exit the loop when you have 8 items or cannot find another "<td" tag.

    HTML tags: <tag>data</tag>
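
    The steps above could be sketched roughly as follows. This is only a minimal sketch: the names ParseCell and ExtractCells are made up for illustration, stripping "&nbsp;" is an assumption based on the sample HTML, and nested tables are not handled. Because it takes every td it meets, cells containing other markup (such as the span in the first column) come back with that markup intact, so the caller would still need to filter.
    Code:
    Option Explicit
    
    ' Hypothetical helper: returns the text between the first ">" found
    ' from tagStart and the character just before tagEnd.
    Private Function ParseCell(ByRef html As String, ByVal tagStart As Long, ByVal tagEnd As Long) As String
        Dim gt As Long
    
        gt = InStr(tagStart, html, ">")
        If gt > 0 And gt < tagEnd Then
            ParseCell = Trim$(Replace(Mid$(html, gt + 1, tagEnd - gt - 1), "&nbsp;", ""))
        End If
    End Function
    
    ' Hypothetical driver: collects up to maxItems cell values, comma separated.
    Private Function ExtractCells(ByRef html As String, ByVal maxItems As Long) As String
        Dim pos As Long, tagStart As Long, tagEnd As Long
        Dim found As Long
        Dim result As String
    
        pos = 1
        Do While found < maxItems
            tagStart = InStr(pos, html, "<td", vbTextCompare)
            If tagStart = 0 Then Exit Do            ' no further <td tag
            tagEnd = InStr(tagStart, html, "</td", vbTextCompare)
            If tagEnd = 0 Then Exit Do              ' unterminated cell
    
            If found > 0 Then result = result & ","
            result = result & ParseCell(html, tagStart, tagEnd)
            found = found + 1
            pos = tagEnd + 1                        ' resume after this cell
        Loop
    
        ExtractCells = result
    End Function
    The important detail is the first argument to InStr: passing the position where the previous search ended is what keeps the loop from finding the same tag over and over.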

  5. #5

    Thread Starter
    Addicted Member
    Join Date
    Sep 2003
    Posts
    227

    Re: Getting stuff from html code

    But LaVolpe, there are 100+ <td tags in the full HTML code. What I posted is just a small part of it. However, <td align="right" class=""> appears only 8 times in the full HTML code.

    I just need to know how to write a loop so that it doesn't keep finding the same <td align="right" class=""> (the 91 value) every time I call that code.

    David

  6. #6

    Thread Starter
    Addicted Member
    Join Date
    Sep 2003
    Posts
    227

    Re: Getting stuff from html code

    And for the record...
    Code:
    91&nbsp;&
    91&bsp;&
    91&sp;&
    91&p;&
    91&;&
    91&&
    91
    That's what the code below does, one Replace at a time:
    Code:
    theNumber1 = Mid$(Text1.Text, iStart, 10)
    theNumber1 = Replace(theNumber1, "n", "")
    theNumber1 = Replace(theNumber1, "b", "")
    theNumber1 = Replace(theNumber1, "s", "")
    theNumber1 = Replace(theNumber1, "p", "")
    theNumber1 = Replace(theNumber1, ";", "")
    theNumber1 = Replace(theNumber1, "&", "")
    David

  7. #7
    "Digital Revolution"
    Join Date
    Mar 2005
    Posts
    4,471

    Re: Getting stuff from html code

    LaVolpe's suggestion is exactly how I'd do it.

    Also, you may want to write a special trim function, for example:

    Code:
    Private Sub Form_Load()
        Dim strHTML As String
        
        strHTML = "&nbsp;&nbsp; 92   &nbsp;"
        strHTML = HTMLTrim(strHTML)
        MsgBox strHTML, , Len(strHTML)
    End Sub
    
    Private Function HTMLTrim(ByRef Expression As String) As String
        HTMLTrim = Trim$(Replace(Expression, "&nbsp;", ""))
    End Function
    I attached a quick example of extracting out the numbers.

  8. #8

    Thread Starter
    Addicted Member
    Join Date
    Sep 2003
    Posts
    227

    Re: Getting stuff from html code

    How can I use the code below to loop through the entire HTML code and get all of the matching values?
    Code:
    s = GetSubstring(Text1.text, "<td align='right' class=''>", "</td>")
    Code:
    Private Function GetSubstring(ByVal sDoc As String, ByVal openingTag As String, ByVal closingTag As String) As String
        Dim i As Long
        
        If Len(sDoc) > 0 And Len(openingTag) > 0 And Len(closingTag) > 0 Then
            If InStr(sDoc, openingTag) > 0 Then
                i = InStr(sDoc, openingTag)
                sDoc = Right(sDoc, Len(sDoc) - i - Len(openingTag) + 1)
                If InStr(sDoc, closingTag) > 0 Then
                    i = InStr(sDoc, closingTag)
                    GetSubstring = Left(sDoc, i - 1)
                End If
            End If
        End If
    End Function
    Any help would be great!

    David
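
    One possible way to loop, sketched under assumptions: GetSubstring always searches from the start of the document, so the sketch below uses a hypothetical variant, GetNextSubstring, that takes a start position and moves it past each match. The names GetNextSubstring and ExtractValues are invented here, and the double quotes in the text box are converted to single quotes first so that the single-quoted tag actually matches.
    Code:
    ' Hypothetical variant of GetSubstring: searches from startPos, returns the
    ' value in sValue, and advances startPos past the closing tag.
    Private Function GetNextSubstring(ByRef sDoc As String, ByVal openingTag As String, _
            ByVal closingTag As String, ByRef startPos As Long, ByRef sValue As String) As Boolean
        Dim iOpen As Long
        Dim iClose As Long
    
        If startPos < 1 Or Len(openingTag) = 0 Or Len(closingTag) = 0 Then Exit Function
    
        iOpen = InStr(startPos, sDoc, openingTag)
        If iOpen = 0 Then Exit Function
        iOpen = iOpen + Len(openingTag)         ' first character after the tag
    
        iClose = InStr(iOpen, sDoc, closingTag)
        If iClose = 0 Then Exit Function
    
        sValue = Mid$(sDoc, iOpen, iClose - iOpen)
        startPos = iClose + Len(closingTag)     ' next search resumes here
        GetNextSubstring = True
    End Function
    
    Private Sub ExtractValues()
        Dim sHTML As String, sValue As String, sOut As String
        Dim lPos As Long, x As Long
    
        sHTML = Replace$(Text1.Text, """", "'") ' normalise quotes to match the tag
        lPos = 1
        Do Until x = 8
            If Not GetNextSubstring(sHTML, "<td align='right' class=''>", "</td>", lPos, sValue) Then Exit Do
            If x > 0 Then sOut = sOut & ","
            sOut = sOut & Trim$(Replace(sValue, "&nbsp;", ""))
            x = x + 1
        Loop
    
        Debug.Print sOut
    End Sub
    With the four-cell sample from the first post in Text1, this should print 91,6,9,11. Writing sOut to a text file instead of the Immediate window is left out of the sketch.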

  9. #9
    PowerPoster
    Join Date
    Jun 2001
    Location
    Trafalgar, IN
    Posts
    4,141

    Re: Getting stuff from html code

    See if this is helpful
    Code:
    Option Explicit
    'requires that you add a reference to the Microsoft HTML Object Library
    
    Private Function GetHTML() As String
    Dim strHTML(18) As String
    
        strHTML(0) = "<html>"
        strHTML(1) = "<head>"
        strHTML(2) = "<title>Sample</title>"
        strHTML(3) = "</head>"
        strHTML(4) = "<body>"
        strHTML(5) = "<table>"
        strHTML(6) = "<tr>"
        strHTML(7) = "<td style='padding: 2px 2px 2px 5px'><span class='modForm'>IN</span></td>"
        strHTML(8) = "<td align='right' class=''>91&nbsp;&nbsp;</td>"
        strHTML(9) = "<td align='right' class=''>6&nbsp;&nbsp;</td>"
        strHTML(10) = "</tr>"
        strHTML(11) = "<tr>"
        strHTML(12) = "<td style='padding: 2px 2px 2px 5px'><span class='modForm'>Other</span></td>"
        strHTML(13) = "<td align='right' class=''>9&nbsp;&nbsp;</td>"
        strHTML(14) = "<td align='right' class=''>11&nbsp;&nbsp;</td>"
        strHTML(15) = "</tr>"
        strHTML(16) = "</table>"
        strHTML(17) = "</body>"
        strHTML(18) = "</html>"
        
        GetHTML = Join(strHTML, vbCrLf)
        
    End Function
    
    Private Sub Command1_Click()
    Dim td As IHTMLElementCollection
    Dim attr As IHTMLAttributeCollection
    Dim objDoc As MSHTML.HTMLDocument
    Dim i As Integer
        
        'Create the document and load the html
        Set objDoc = New MSHTML.HTMLDocument
        objDoc.body.innerHTML = GetHTML
    
        'Wait for the html to load
        Do
            DoEvents
        Loop Until objDoc.readyState = "complete"
    
        'Create a collection of td elements
        Set td = objDoc.getElementsByTagName("TD")
        'Loop through the collection
        For i = 0 To td.length - 1
     '       Debug.Print td.Item(i).outerHTML
     
            'Check if the attributes are correct
            Set attr = td.Item(i).Attributes
            If attr.Item("class").Value = "" And attr.Item("align").Value = "right" Then
                Debug.Print td.Item(i).innerText
            End If
        Next i
    
        Set td = Nothing
        Set objDoc = Nothing
    End Sub

  10. #10

    Thread Starter
    Addicted Member
    Join Date
    Sep 2003
    Posts
    227

    Re: Getting stuff from html code

    Alrighty, I'm trying my best to solve my current problem! Here is the code:
    Code:
    sHTML = Replace$(Text1.Text, """", "'")
        sTag = "class='fname'"
        s = GetSubstring(sHTML, sTag, "class='flinks'>")
        'sHTML = Replace$(sHTML, sTag, vbNullString, 1, 1)
            
        If InStr(s, "class='fstatus'") = 0 Then
        Else
            sTag = "class='fstatus'> "
            Do While InStr(sHTML, sTag) > 0
                s = GetSubstring(sHTML, sTag, "</span>")
                sHTML = Replace$(sHTML, sTag, vbNullString, 1, 1)
                theStatusSaying = s & vbCr
                    
                s = GetSubstring(sHTML, "class='fupdt'>", "</div>")
                sHTML = Replace$(sHTML, "class='fupdt'>", vbNullString, 1, 1)
                theStatus(x) = Replace(theStatusSaying, "&#039;", "'") & s
                x = x + 1
            Loop
            
        Loading.XP_ProgressBar1.Value = 95
        DoEvents
        sTag = ".jpg' alt='"
        x = 0
        Do While InStr(sHTML, sTag) > 0
            s = GetSubstring(sHTML, sTag, "' style='width:200px;'")
            s = Replace(s, "'", "")
            sHTML = Replace$(sHTML, sTag, vbNullString, 1, 1)
            
            If InStr(s, "<img") = 0 And s <> "" Then
                If theStatus(x) = "" Then
                    theStatus(x) = "No updates"
                End If
                CreateBalloon Me.PicFBFriends(pictureCount), PicFBFriends(pictureCount).hwnd, theStatus(x), szBalloon, False, s, etiInfo
                pictureCount = pictureCount + 1
                x = x + 1
                DoEvents
            End If
        Loop
        End If
    The problem is that if the first check, "If InStr(s, "class='fstatus'") = 0 Then", is true, it skips everything and doesn't go on to the next instance of the string. I need it to keep going, but I can't figure out how to do that. It only happens when the first string doesn't contain "class='fstatus'".

    Any help would be great!

    David
    Last edited by Stealthrt; Apr 26th, 2008 at 08:52 PM.
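
    One way the control flow could be restructured, offered only as a sketch: the single If/Else around everything means the Else branch (both loops) is skipped whenever the first entry has no status. Checking each entry separately avoids that. The sketch assumes every entry begins with "class='fname'", which is a guess about the page layout; CollectStatuses is a made-up name, and GetSubstring is the function posted earlier in this thread.
    Code:
    ' Hypothetical helper: fills theStatus() with one status per entry.
    ' Assumes sHTML already has its double quotes replaced by single quotes.
    Private Sub CollectStatuses(ByRef sHTML As String, ByRef theStatus() As String)
        Dim blocks() As String
        Dim i As Long, x As Long
        Dim s As String
    
        ' Assumption: each entry starts with class='fname'
        blocks = Split(sHTML, "class='fname'")
    
        ' blocks(0) is whatever precedes the first entry, so start at 1
        For i = 1 To UBound(blocks)
            If x > UBound(theStatus) Then Exit For
    
            If InStr(blocks(i), "class='fstatus'") = 0 Then
                ' No status in this entry: record a default and keep going
                theStatus(x) = "No updates"
            Else
                s = GetSubstring(blocks(i), "class='fstatus'> ", "</span>")
                theStatus(x) = Replace(s, "&#039;", "'")
            End If
            x = x + 1
        Next i
    End Sub
    Because the check happens once per block, an entry without a status no longer stops the remaining entries from being processed.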

  11. #11
    Hyperactive Member
    Join Date
    Oct 2007
    Location
    Indiana
    Posts
    295

    Re: Getting stuff from html code

    I think you're making this more complicated than it needs to be. A Do-Loop and InStr will do it all. I've done plenty of 'reading' HTML code and getting what I need out of it.

    The key is to find what is common to every element you want to capture. For example, is every number you want to capture followed by &nbsp;? If so, you can key on that. Then just back up several characters to get the number. You can do subsequent keys on "</tr>" or whatever else takes you away from the last capture and gets you set to find the next one.

    If all are exactly the same except the first one, put in a flag (boolean) and for the first one look for string A$, but after that only look for string B$.
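
    A rough sketch of the "key and back up" idea, assuming every wanted number consists only of digits and is immediately followed by "&nbsp;". The function name NumbersBeforeNbsp is invented for illustration, and any other digits that happen to sit directly in front of an "&nbsp;" elsewhere in the page would be captured too.
    Code:
    ' Hypothetical helper: returns a comma-separated list of the digit runs
    ' that sit directly in front of an "&nbsp;".
    Private Function NumbersBeforeNbsp(ByRef html As String) As String
        Const NBSP As String = "&nbsp;"
        Dim pos As Long, keyPos As Long, startPos As Long
        Dim result As String
    
        pos = 1
        Do
            keyPos = InStr(pos, html, NBSP)
            If keyPos = 0 Then Exit Do
    
            ' Back up while the preceding character is a digit
            startPos = keyPos
            Do While startPos > 1
                If Mid$(html, startPos - 1, 1) Like "[0-9]" Then
                    startPos = startPos - 1
                Else
                    Exit Do
                End If
            Loop
    
            ' Only record something if at least one digit was found
            If startPos < keyPos Then
                If Len(result) > 0 Then result = result & ","
                result = result & Mid$(html, startPos, keyPos - startPos)
            End If
    
            pos = keyPos + Len(NBSP)    ' move past this key before searching again
        Loop
    
        NumbersBeforeNbsp = result
    End Function
    A second "&nbsp;" that directly follows the first is preceded by ";" rather than a digit, so it is skipped without producing a duplicate.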

    I generally paste the HTML code into Notepad, then use CTRL-F to search for key phrases, testing what they find in the code, until I narrow down to a key that gets my desired elements only.

    Can you just paste the full HTML code into a notepad file and attach it here so we can take a look at it?
