Not sure which way to go: parsing HTML
OK, wow, I'm not sure how to go about asking this question...
So let me explain my dilemma. I am getting two different versions of the HTML source for the same page. Then again, I might just be going about this the wrong way.
OK, I started out by going to http://www.edmunds.com/flipper/do/Me...rstNav=Gallery
Then I grabbed the page source by right-clicking and choosing the page-source option from the context menu. So far so good, right?
Looking at the page source, I can see right away the data I need:
Code:
{
    id: '20245103',
    thumb: '/pictures/VEHICLE/2009/Ford/2009.ford.f-150.20245103-ST.jpg',
    full: '/pictures/VEHICLE/2009/Ford/2009.ford.f-150.20245103-E.jpg',
    caption: '2009 Ford F-150 FX4 Extended Cab Shown',
    credits: '(Photo courtesy of Ford Motor Company)',
    desc: '2009 Ford F-150 FX4 Extended Cab Shown',
    title: '2010 Ford F-150'
}
,{
    id: '20245121',
    thumb: '/pictures/VEHICLE/2009/Ford/2009.ford.f-150.20245121-ST.jpg',
    full: '/pictures/VEHICLE/2009/Ford/2009.ford.f-150.20245121-E.jpg',
    caption: '2009 Ford F-150 Platinum Crew Cab Shown',
    credits: '(Photo courtesy of Ford Motor Company)',
    desc: '2009 Ford F-150 Platinum Crew Cab Shown',
    title: '2010 Ford F-150'
}
The lines that begin with full: hold the paths of the images I need. Ultimately, I want to download every one of the full: images in the source into a folder.
But this source is different from what I am used to: the data is not in HTML tags at all, but in what looks like a list of JavaScript objects.
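Because the full: values are plain text inside that JavaScript-style list rather than in table cells, one way to pull them out is a regular expression run over the whole source string. A minimal sketch, assuming every entry is written as full: followed by a single-quoted path, as in the excerpt above; the module and function names are made up for illustration:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module GalleryParser
    ' Matches text such as  full: '/pictures/VEHICLE/2009/Ford/2009.ford.f-150.20245103-E.jpg'
    ' \b stops "full:" from matching inside a longer key; group 1 is the path between the quotes.
    ' Assumption: the page always uses single quotes around the path.
    Private ReadOnly FullPattern As New Regex("\bfull:\s*'([^']+)'")

    ' Returns every "full:" path in the order it appears, skipping duplicates.
    Public Function ExtractFullImagePaths(pageSource As String) As List(Of String)
        Dim paths As New List(Of String)()
        For Each m As Match In FullPattern.Matches(pageSource)
            Dim imagePath As String = m.Groups(1).Value
            If Not paths.Contains(imagePath) Then paths.Add(imagePath)
        Next
        Return paths
    End Function
End Module
```

The result is a list of site-relative paths such as /pictures/VEHICLE/2009/Ford/2009.ford.f-150.20245103-E.jpg. If the real page quotes the values differently, the pattern would need adjusting.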
Normally I would use a WebBrowser control and then do something like this:
Code:
web1.Navigate("http://www.edmunds.com/flipper/do/MediaNav/make=50/model=F-150/firstNav=Gallery")
Do Until web1.ReadyState = WebBrowserReadyState.Complete
    Application.DoEvents()
Loop
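Since the full: entries already appear in the raw page source, a WebBrowser control and the DoEvents loop may not be needed for this page at all. A minimal sketch that downloads the markup directly with System.Net.WebClient, assuming the server returns the same source to WebClient that it returns to the browser (some sites vary their response by browser, so that is worth checking); DownloadPageSource is a hypothetical helper name:

```vbnet
Imports System
Imports System.Net

Module PageFetcher
    ' Downloads the page markup as the server sends it, i.e. the original
    ' source rather than anything a browser has rebuilt. No WebBrowser control,
    ' no DoEvents loop, and no JavaScript execution are involved.
    Public Function DownloadPageSource(url As String) As String
        Using client As New WebClient()
            Return client.DownloadString(url)
        End Using
    End Function
End Module
```

For example: Dim source As String = DownloadPageSource("http://www.edmunds.com/flipper/do/MediaNav/make=50/model=F-150/firstNav=Gallery"). WebClient does not run scripts, which should not matter here because the data is already written into the source.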
Once the page has loaded, I would find the table that holds the data, for example:
Code:
Dim myTable As HtmlElement = wb2.Document.All("dgCurrent")
Here dgCurrent comes from another page I parsed over a year ago.
Once I have the table, long story short, I loop over its rows:
Code:
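'DECLARE THE 2 ARRAYLISTS THAT HOLD THE CELL TEXT
Dim myPubAr As New ArrayList()
Dim myLocAr As New ArrayList()
'MAKE SURE WE GOT A VALID TABLE OBJECT
If myTable IsNot Nothing Then
    'LOOP ALL ROW ELEMENTS IN TABLE
    For Each MyElement As HtmlElement In myTable.GetElementsByTagName("TR")
        'THERE SHOULD BE 2 TD ELEMENTS PER ROW
        Dim myTDTags As HtmlElementCollection = MyElement.GetElementsByTagName("TD")
        'IF WE GOT 2 TD ELEMENTS, WRITE THEM TO THE 2 ARRAYLISTS
        If myTDTags.Count = 2 Then
            myPubAr.Add(myTDTags(0).InnerText)
            myLocAr.Add(myTDTags(1).InnerText)
        End If
    Next
End If
'THEN DISPLAY IT IN A RICHTEXTBOX (CStr TURNS AN EMPTY CELL'S Nothing INTO "")
For i As Integer = 0 To myPubAr.Count - 1
    rtb1.AppendText(CStr(myPubAr(i)) & vbTab & CStr(myLocAr(i)) & vbNewLine)
Next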
However, the code above was written for a page whose data sat in ordinary HTML tags (a table with rows and cells).
In this case, I am not sure how to get at the data.
Although I did stumble onto something: looking at the InnerHtml instead gives the HTML source a different structure.
But I am not sure how to parse the InnerHtml.
Loading this page in a WebBrowser control and simply doing this:
Code:
Dim myHtml As String
myHtml = web1.Document.Body.InnerHtml
rtb1.Text = myHtml
What shows up in the RichTextBox is the same page, but the markup is cleaner and looks like something I could parse. For the life of me, though, I can't figure out how to parse the InnerHtml.
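InnerHtml is just a String, so it can be searched like any other text. It probably looks tidier because the browser regenerates it from the document it has already parsed instead of returning the original file. A minimal sketch that walks the string with IndexOf rather than a regular expression; it assumes the full: values survive in the InnerHtml (for example inside a script element) and accepts either quote character in case the regenerated text differs from the original. ExtractQuotedValuesAfter is a hypothetical helper:

```vbnet
Imports System
Imports System.Collections.Generic

Module InnerHtmlScanner
    ' Finds each occurrence of key (for example "full:") in html and returns the
    ' text inside the quotes that follow it. Single or double quotes are accepted
    ' (assumption: regenerated markup may not keep the original quote style).
    Public Function ExtractQuotedValuesAfter(html As String, key As String) As List(Of String)
        Dim values As New List(Of String)()
        Dim pos As Integer = html.IndexOf(key, StringComparison.Ordinal)
        Do While pos >= 0
            Dim i As Integer = pos + key.Length
            ' Skip spaces, tabs and line breaks between the key and the opening quote.
            Do While i < html.Length AndAlso Char.IsWhiteSpace(html(i))
                i += 1
            Loop
            If i < html.Length AndAlso (html(i) = "'"c OrElse html(i) = """"c) Then
                Dim quote As Char = html(i)
                Dim closing As Integer = html.IndexOf(quote, i + 1)
                If closing < 0 Then Exit Do ' unterminated value: stop scanning
                values.Add(html.Substring(i + 1, closing - i - 1))
                i = closing + 1
            End If
            ' Continue searching after the value just read (or after the key).
            pos = html.IndexOf(key, i, StringComparison.Ordinal)
        Loop
        Return values
    End Function
End Module
```

Called as ExtractQuotedValuesAfter(web1.Document.Body.InnerHtml, "full:"), it should return the quoted paths in page order.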
Guys, I need help.
What I want to accomplish here is to get all the full-size images and download them to a folder.
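For the download step, the full: values are site-relative paths, so each one has to be combined with the site's address before it can be fetched. A minimal sketch, assuming the images are served from the same host as the gallery page; the module, the helper names and the target folder are illustrative, and relativePaths stands for whatever list of full: values you extracted:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net

Module ImageDownloader
    ' Resolves a site-relative path such as /pictures/VEHICLE/.../2009.ford.f-150.20245103-E.jpg
    ' against the page address, giving an absolute http:// URL.
    Public Function ToAbsoluteUrl(pageUrl As String, relativePath As String) As String
        Return New Uri(New Uri(pageUrl), relativePath).AbsoluteUri
    End Function

    ' Uses the last segment of the URL path as the local file name.
    Public Function LocalFileName(imageUrl As String) As String
        Return Path.GetFileName(New Uri(imageUrl).LocalPath)
    End Function

    ' Downloads each image into targetFolder (created if missing), skipping files already there.
    Public Sub DownloadImages(pageUrl As String, relativePaths As IEnumerable(Of String), targetFolder As String)
        Directory.CreateDirectory(targetFolder)
        Using client As New WebClient()
            For Each relativePath As String In relativePaths
                Dim imageUrl As String = ToAbsoluteUrl(pageUrl, relativePath)
                Dim destination As String = Path.Combine(targetFolder, LocalFileName(imageUrl))
                If Not File.Exists(destination) Then
                    client.DownloadFile(imageUrl, destination)
                End If
            Next
        End Using
    End Sub
End Module
```

For example, DownloadImages(pageUrl, paths, "C:\FordImages") would save each picture under its original file name (the folder name is only a placeholder). Skipping files that already exist makes it safe to rerun after an interrupted download.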
Any help is much appreciated.