The above code is just part of the overall HTML that will be put inside a text box. There are a total of 8 "<td align="right" class="">" tags that I need to find. After finding those, I need just the part after the tag, up to the "</td>". So in the example above:
Code:
<td align="right" class="">91 </td>
I would only need:
Code:
91
Each time it finds one, I will increment a counter (Do Until X = 8) and write the value to a text file. So the text file should end up looking like this:
Have you searched this forum for "parsing html"? Lots of good examples.
Also, if counting the td tags (I think td means table data), you may not want a plain Do Until; rather, combine it with a check for the tr tag (I think tr means table row). That way, if you reach </tr> before x = 8, you know to abort the Do loop.
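A rough sketch of that row check, in case it helps. The function name CountCellsInRow and its parameters are made up for illustration; it only counts the "<td" tags that appear before the next "</tr", and stops early at a cap such as 8.
Code:
' Hypothetical helper: counts "<td" tags from rowStart up to the next "</tr",
' stopping early once maxCells have been seen. rowStart must be >= 1.
Private Function CountCellsInRow(ByRef html As String, ByVal rowStart As Long, ByVal maxCells As Long) As Long
    Dim rowEnd As Long
    Dim pos As Long
    Dim n As Long
    ' The row ends at the next closing tr tag, or at the end of the text if there is none
    rowEnd = InStr(rowStart, html, "</tr", vbTextCompare)
    If rowEnd = 0 Then rowEnd = Len(html) + 1
    pos = InStr(rowStart, html, "<td", vbTextCompare)
    Do While pos > 0 And pos < rowEnd And n < maxCells
        n = n + 1
        ' Search again just past the tag that was counted
        pos = InStr(pos + 3, html, "<td", vbTextCompare)
    Loop
    CountCellsInRow = n
End Function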
Insomnia is just a byproduct of, "It can't be done"
I think you are going about it incorrectly. As I mentioned, there are some good posts on this site; I believe DigiRev posted one in the code bank section. Anyway, you want to search for the beginning of the tag, i.e., "<td ". Once you find that, look for the "</td". You will want everything after the first > character found from the position of "<td " and ending at the position just before the "</td".
Recommend placing the parsing in a subroutine. Pass the function the offsets where <td and </td were found. Let the function return the value parsed out. Then you'd simply call this function within a loop and exit the loop when you get 8 items or cannot find another <td tag
HTML tags: <tag>data</tag>
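Something along these lines might work as a starting point. ParseCell and FirstCells are names invented for this sketch, and it assumes the opening tag contains no ">" inside an attribute value. The point is that each InStr call starts from the position of the previous match, so the loop never finds the same tag twice.
Code:
' Hypothetical helper: returns the text between the first ">" found at or after
' tdPos and the character just before closePos.
Private Function ParseCell(ByRef html As String, ByVal tdPos As Long, ByVal closePos As Long) As String
    Dim gtPos As Long
    gtPos = InStr(tdPos, html, ">")
    If gtPos > 0 And gtPos < closePos Then
        ParseCell = Mid$(html, gtPos + 1, closePos - gtPos - 1)
    End If
End Function

' Hypothetical helper: collects up to maxItems cell values for the given opening tag.
Private Function FirstCells(ByRef html As String, ByVal openTag As String, ByVal maxItems As Long) As Collection
    Dim result As New Collection
    Dim tdPos As Long
    Dim closePos As Long
    tdPos = InStr(1, html, openTag, vbTextCompare)
    Do While tdPos > 0 And result.Count < maxItems
        closePos = InStr(tdPos, html, "</td", vbTextCompare)
        If closePos = 0 Then Exit Do
        result.Add ParseCell(html, tdPos, closePos)
        ' Continue from the closing tag so the same cell is not matched again
        tdPos = InStr(closePos, html, openTag, vbTextCompare)
    Loop
    Set FirstCells = result
End Function
Passing "<td align=""right"" class="""">" as openTag and 8 as maxItems should limit the results to the cells asked about, provided the text box holds the tag exactly as written.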
But LaVolpe... there are some 100+ <td tags within the full HTML code; what I posted is just a small part of it. However, <td align=""right"" class=""""> is only present in the full HTML code 8 times.
I just need to know how to do a loop so that it doesn't just keep finding the same <td align=""right"" class=""""> (the 91 value) every time I call that code.
Also, you may want to just make a special trim function, e.g.:
Code:
Private Sub Form_Load()
    Dim strHTML As String
    strHTML = " 92&nbsp;"
    strHTML = HTMLTrim(strHTML)
    MsgBox strHTML, , Len(strHTML)
End Sub
'Strips the &nbsp; entity, then trims ordinary leading/trailing spaces
Private Function HTMLTrim(ByRef Expression As String) As String
    HTMLTrim = Trim$(Replace(Expression, "&nbsp;", ""))
End Function
I attached a quick example of extracting out the numbers.
How can I use the code below to loop through the entire HTML code to get all the values associated with it?
Code:
s = GetSubstring(Text1.text, "<td align='right' class=''>", "</td>")
Code:
Private Function GetSubstring(ByVal sDoc As String, ByVal openingTag As String, ByVal closingTag As String) As String
    Dim i As Long
    If Len(sDoc) > 0 And Len(openingTag) > 0 And Len(closingTag) > 0 Then
        If InStr(sDoc, openingTag) > 0 Then
            i = InStr(sDoc, openingTag)
            sDoc = Right(sDoc, Len(sDoc) - i - Len(openingTag) + 1)
            If InStr(sDoc, closingTag) > 0 Then
                i = InStr(sDoc, closingTag)
                GetSubstring = Left(sDoc, i - 1)
            End If
        End If
    End If
End Function
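One way to loop with GetSubstring, sketched under a few assumptions: it relies on GetSubstring exactly as posted above, the sub name SaveValues and the outPath parameter are made up, and the values are assumed to end in the literal text &nbsp;. After each match the first remaining opening tag is removed with Replace$, so the next pass finds the following one.
Code:
' Hypothetical sub: writes up to 8 matching cell values to a text file, one per line.
Private Sub SaveValues(ByVal sHTML As String, ByVal outPath As String)
    Const OPEN_TAG As String = "<td align='right' class=''>"
    Dim fileNum As Integer
    Dim found As Long
    Dim s As String

    ' Normalise double quotes to single quotes so OPEN_TAG can match
    sHTML = Replace$(sHTML, """", "'")
    fileNum = FreeFile
    Open outPath For Output As #fileNum
    Do While InStr(sHTML, OPEN_TAG) > 0 And found < 8
        s = GetSubstring(sHTML, OPEN_TAG, "</td>")
        ' Assumption: strip a trailing &nbsp; entity before writing the value
        Print #fileNum, Trim$(Replace$(s, "&nbsp;", ""))
        ' Remove only the first occurrence so the next pass sees the next tag
        sHTML = Replace$(sHTML, OPEN_TAG, vbNullString, 1, 1)
        found = found + 1
    Loop
    Close #fileNum
End Sub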
Option Explicit
'requires that you add a reference to the Microsoft HTML Object Library
Private Function GetHTML() As String
    Dim strHTML(18) As String
    strHTML(0) = "<html>"
    strHTML(1) = "<head>"
    strHTML(2) = "<title>Sample</title>"
    strHTML(3) = "</head>"
    strHTML(4) = "<body>"
    strHTML(5) = "<table>"
    strHTML(6) = "<tr>"
    strHTML(7) = "<td style='padding: 2px 2px 2px 5px'><span class='modForm'>IN</span></td>"
    strHTML(8) = "<td align='right' class=''>91 </td>"
    strHTML(9) = "<td align='right' class=''>6 </td>"
    strHTML(10) = "</tr>"
    strHTML(11) = "<tr>"
    strHTML(12) = "<td style='padding: 2px 2px 2px 5px'><span class='modForm'>Other</span></td>"
    strHTML(13) = "<td align='right' class=''>9 </td>"
    strHTML(14) = "<td align='right' class=''>11 </td>"
    strHTML(15) = "</tr>"
    strHTML(16) = "</table>"
    strHTML(17) = "</body>"
    strHTML(18) = "</html>"
    GetHTML = Join(strHTML, vbCrLf)
End Function
Private Sub Command1_Click()
    Dim td As IHTMLElementCollection
    Dim attr As IHTMLAttributeCollection
    Dim objDoc As MSHTML.HTMLDocument
    Dim i As Integer
    'Create the document and load the html
    Set objDoc = New MSHTML.HTMLDocument
    objDoc.body.innerHTML = GetHTML
    'Wait for the html to load
    Do
        DoEvents
    Loop Until objDoc.readyState = "complete"
    'Create a collection of td elements
    Set td = objDoc.getElementsByTagName("TD")
    'Loop through the collection
    For i = 0 To td.length - 1
        ' Debug.Print td.Item(i).outerHTML
        'Check if the attributes are correct
        Set attr = td.Item(i).Attributes
        If attr.Item("class").Value = "" And attr.Item("align").Value = "right" Then
            Debug.Print td.Item(i).innerText
        End If
    Next i
    Set td = Nothing
    Set objDoc = Nothing
End Sub
Alrighty, I'm trying my best to solve my current problem! Here is the code:
Code:
sHTML = Replace$(Text1.Text, """", "'")
sTag = "class='fname'"
s = GetSubstring(sHTML, sTag, "class='flinks'>")
'sHTML = Replace$(sHTML, sTag, vbNullString, 1, 1)
If InStr(s, "class='fstatus'") = 0 Then
Else
    sTag = "class='fstatus'> "
    Do While InStr(sHTML, sTag) > 0
        s = GetSubstring(sHTML, sTag, "</span>")
        sHTML = Replace$(sHTML, sTag, vbNullString, 1, 1)
        theStatusSaying = s & vbCr
        s = GetSubstring(sHTML, "class='fupdt'>", "</div>")
        sHTML = Replace$(sHTML, "class='fupdt'>", vbNullString, 1, 1)
        theStatus(x) = Replace(theStatusSaying, "'", "'") & s
        x = x + 1
    Loop
    Loading.XP_ProgressBar1.Value = 95
    DoEvents
    sTag = ".jpg' alt='"
    x = 0
    Do While InStr(sHTML, sTag) > 0
        s = GetSubstring(sHTML, sTag, "' style='width:200px;'")
        s = Replace(s, "'", "")
        sHTML = Replace$(sHTML, sTag, vbNullString, 1, 1)
        If InStr(s, "<img") = 0 And s <> "" Then
            If theStatus(x) = "" Then
                theStatus(x) = "No updates"
            End If
            CreateBalloon Me.PicFBFriends(pictureCount), PicFBFriends(pictureCount).hwnd, theStatus(x), szBalloon, False, s, etiInfo
            pictureCount = pictureCount + 1
            x = x + 1
            DoEvents
        End If
    Loop
End If
The problem is that if the first check, "If InStr(s, "class='fstatus'") = 0 Then", is true, it exits out of everything and doesn't keep going to the next instance of the string. I need it to keep going, but I am unable to figure out a way of doing it! It only happens when the first string doesn't contain "class='fstatus'".
Any help would be great!
David
Last edited by Stealthrt; Apr 26th, 2008 at 08:52 PM.
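One possible way around the early exit, sketched on the assumption that each entry runs from "class='fname'" to the next "class='flinks'>" and that quotes have already been normalised to single quotes. CollectStatuses is a made-up name. The fstatus check is done once per entry inside the loop, so an entry without a status gets a placeholder and the scan carries on to the next entry.
Code:
' Hypothetical helper: returns one status string per fname block.
Private Function CollectStatuses(ByRef sHTML As String) As Collection
    Const NAME_TAG As String = "class='fname'"
    Const LINKS_TAG As String = "class='flinks'>"
    Const STATUS_TAG As String = "class='fstatus'>"
    Dim result As New Collection
    Dim blockStart As Long, blockEnd As Long
    Dim p As Long, q As Long
    Dim block As String

    blockStart = InStr(1, sHTML, NAME_TAG)
    Do While blockStart > 0
        ' Cut out one entry; fall back to the end of the text if there is no closing marker
        blockEnd = InStr(blockStart, sHTML, LINKS_TAG)
        If blockEnd = 0 Then blockEnd = Len(sHTML) + 1
        block = Mid$(sHTML, blockStart, blockEnd - blockStart)

        p = InStr(block, STATUS_TAG)
        If p > 0 Then
            p = p + Len(STATUS_TAG)
            q = InStr(p, block, "</span>")
            If q = 0 Then q = Len(block) + 1
            result.Add Trim$(Mid$(block, p, q - p))
        Else
            ' No status in this entry: record a placeholder and keep going
            result.Add "No updates"
        End If
        ' InStr returns 0 when the start position is past the end, which ends the loop
        blockStart = InStr(blockEnd, sHTML, NAME_TAG)
    Loop
    Set CollectStatuses = result
End Function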
I think you're making this more complicated than need be. A Do-Loop and InStr will do it all. I've done plenty of 'reading' HTML codes and getting what I need.
The key is you need to find what is common to every element you want to capture. For example, is every number you want to capture followed by &nbsp;? If so, you can key on that. Then just back up several characters to get the number. You can do subsequent keys on "</tr>" or whatever else takes you away from the last capture and gets you set to find the next one.
If all are exactly the same except the first one, put in a flag (boolean) and for the first one look for string A$, but after that only look for string B$.
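A minimal sketch of the key-and-back-up idea, assuming the key is the literal text &nbsp; and that the values are plain digits. The function name NumbersBeforeNbsp is invented for the example.
Code:
' Hypothetical helper: for every "&nbsp;" found, backs up over the digits
' immediately before it and collects them as a string.
Private Function NumbersBeforeNbsp(ByRef html As String) As Collection
    Const NBSP_KEY As String = "&nbsp;"
    Dim result As New Collection
    Dim pos As Long
    Dim startPos As Long

    pos = InStr(1, html, NBSP_KEY)
    Do While pos > 0
        startPos = pos
        ' Walk backwards while the previous character is a digit
        Do While startPos > 1
            If Mid$(html, startPos - 1, 1) Like "#" Then
                startPos = startPos - 1
            Else
                Exit Do
            End If
        Loop
        ' Keep the match only if at least one digit preceded the key
        If startPos < pos Then result.Add Mid$(html, startPos, pos - startPos)
        ' Move past this key before searching for the next one
        pos = InStr(pos + Len(NBSP_KEY), html, NBSP_KEY)
    Loop
    Set NumbersBeforeNbsp = result
End Function
If other cells in the page also end in &nbsp;, this will pick those up too, so a narrower key may be needed.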
I generally paste the HTML code into Notepad, then use CTRL-F to search for key phrases, testing what they find in the code, until I narrow down to a key that gets my desired elements only.
Can you just paste the full HTML code into a notepad file and attach it here so we can take a look at it?