I am trying to determine if websites i have exchanged links with are still linking back to my site.
the link back to me could be on any page so i need to search every page until found
i have all the main sites url's but how can i find all the other pages ?
Waiting for a full featured smart phone with out marrying a provider
Go Android
Go raiders
Basically you just need to get the html from the site and scan through it there are a few ways to do this, one of which is the trusty INSTR() function. I think there are some other functions that you can use the web browser control for but those I have never really looked at. Never had much desire nor need to do any web scraping and I use 3rd party tools for any HTTP or FTP functions in my own programs so I can't be of much help beyond this. Maybe someone else can chime in and give more info.
i bought a program a few yrs ago that did what i needed, but now it crashes with a type mismatch and get no response from the vendor. I bought another a few days, but already got a refund because of errors and crashes Most of the programs do not scan the whole remote site, but want you to supply the url of where the link is located.
At this point I would be happy just to get all the page urls on a wesite
Waiting for a full featured smart phone with out marrying a provider
Go Android
Go raiders
All that does is extract links from the web site beginning at the first pages referenced.
For example, if you used http://www.vbforums.com as the argument for the first page it will get all links found on that page and very page referenced by the links
It's easy to use. Just download the ActiveX control, pop it on your Form, run the code example.
Here's is that code in a VB Project. You will need to get your own ActiveX and install it so it will be able to use with the project in the zip file.
Last edited by jmsrickland; Feb 19th, 2013 at 05:06 PM.
Anything I post is an example only and is not intended to be the only solution, the total solution nor the final solution to your request nor do I claim that it is. If you find it useful then it is entirely up to you to make whatever changes necessary you feel are adequate for your purposes.
All that does is extract links from the web site beginning at the first pages referenced.
For example, if you used http://www.vbforums.com as the argument for the first page it will get all links found on that page and very page referenced by the links
It's easy to use. Just download the ActiveX control, pop it on your Form, run the code example.
Here's is that code in a VB Project. You will need to get your own ActiveX and install it so it will be able to use with the project in the zip file.
thanks for adding that. that would work if i could count on their main sites having links to all their pages, but i know that won't work. The spider will work sending it one page at a time and check for links back to me. i know it would have to run a long time as one i purchased ran all night. some sites i link to have over 6500 pages.
The deal i have with linking websites is:
You link to me and i will link to you. You remove my link and i will remove yours. some of the links i have had for over 10 years, but need to check them.
Too bad the software i purchased worked. The method worked perfect for me.
You give it the url to your links page and it collects all the outgoing links and scans each site and gives a report
1. Not found
2.Direct link
3.Timed out
4.404 page not found
and a few more
Last edited by isnoend07; Feb 19th, 2013 at 06:22 PM.
Waiting for a full featured smart phone with out marrying a provider
Go Android
Go raiders
What do you mean too bad the software you purchased worked. If I purchased any software I would want it to work.
Anyway, are you satisfied with using that ActiveX on the sample I posted. It's basically the same as what you linked to I just changed it to work for me and for testing I used this site to start with. I can't use my own sites as they have no links to other sites but you can figure out what and where to begin.
Anything I post is an example only and is not intended to be the only solution, the total solution nor the final solution to your request nor do I claim that it is. If you find it useful then it is entirely up to you to make whatever changes necessary you feel are adequate for your purposes.
What do you mean too bad the software you purchased worked. If I purchased any software I would want it to work.
That was a typo I meant too bad it did not work. If it did work this post would have never happened, did not realize it was so difficult to get all the page urls when you know the Home pages url
Anyway, are you satisfied with using that ActiveX on the sample I posted. It's basically the same as what you linked to I just changed it to work for me and for testing I used this site to start with. I can't use my own sites as they have no links to other sites but you can figure out what and where to begin.
That was a typo I meant too bad it did not work. If it did work this post would have never happened, did not realize it was so difficult to get all the page urls when you know the Home pages url
Waiting for a full featured smart phone with out marrying a provider
Go Android
Go raiders
So the website you gave, which website (Link) would you like to find out of it? If you can work on a better example, we might be able to help.
Meanwhile I am working an example i could give you as soon as i can finish it.
Run the code and see it will find the page, but this is far from done
what i mean far from done is it only check the main Url and the link off of the main Url
you could remove duplicates from listbox and keep going with the listbox's items URL and grab all links from that url
but this is where it becomes harder because there is alot of links that will be the same
for example, the home link
I guess you could make an array with all the links you have visited (grabbed the links off) and check once in a while to make sure the listbox does not contain any of the sites already visited
thanks for adding that
The problem with adding a html page. It adds another step in link building process and the link may have been moved to another page.
I have found this; http://www.planet-source-code.com/vb...32760&lngWId=1
with some tweaking i think it will work, if not i will try your code, thanks for adding it
Waiting for a full featured smart phone with out marrying a provider
Go Android
Go raiders
Run the code and see it will find the page, but this is far from done
what i mean far from done is it only check the main Url and the link off of the main Url
you could remove duplicates from listbox and keep going with the listbox's items URL and grab all links from that url
but this is where it becomes harder because there is alot of links that will be the same
for example, the home link
I guess you could make an array with all the links you have visited (grabbed the links off) and check once in a while to make sure the listbox does not contain any of the sites already visited
i will try to work on this a bit more tomorrow
thanks for adding that
you are right about duplicates, but it did find "gardenbuildings.com"
does it ever stop?
Waiting for a full featured smart phone with out marrying a provider
Go Android
Go raiders
thanks for adding that
you are right about duplicates, but it did find "gardenbuildings.com"
does it ever stop?
I added this by Hack:
Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
(ByVal hwnd As Long, _
ByVal wMsg As Long, _
ByVal wParam As Long, _
lParam As Any) As Long
Private Const LB_FINDSTRINGEXACT = &H1A2
Private Sub AddToListbox(strListBItem As String, LB As ListBox)
Dim lngListIndex As Long
' lngListIndex is the ListIndex if the item is found
lngListIndex = SendMessage(LB.hwnd, LB_FINDSTRINGEXACT, -1, ByVal strListBItem)
If lngListIndex = -1 Then
LB.AddItem strListBItem
Else
MsgBox strListBItem & " is already in the Listbox"
Exit Sub
End If
End Sub
Private Sub Command1_Click()
AddToListBox Text1.Text, List1
End Sub
Waiting for a full featured smart phone with out marrying a provider
Go Android
Go raiders
Run the code and see it will find the page, but this is far from done
what i mean far from done is it only check the main Url and the link off of the main Url
you could remove duplicates from listbox and keep going with the listbox's items URL and grab all links from that url
but this is where it becomes harder because there is alot of links that will be the same
for example, the home link
I guess you could make an array with all the links you have visited (grabbed the links off) and check once in a while to make sure the listbox does not contain any of the sites already visited
i will try to work on this a bit more tomorrow
I am fiddling with your code this seems to be what i need and seems to work
I am adding to not include duplicates+
added another listbox with about 50 sites i am linking to that get filled from my sites links page.
Plan on adding a report at the end of each sites results
Thanks for adding that code
Waiting for a full featured smart phone with out marrying a provider
Go Android
Go raiders
I will be here all night and access to pc.
So if you want to ask questions it is the time.
I will be working on it a bit again, hopefully i can get it to work a bit better.
My project is not complete!
If you notice it does work good, but it only looks at the main page (gets all links) and also it looks at the links from the main page... so all the links collected from main page that are from the same website (see listbox, it filters the website if you noticed, it only add the links that are related to main page)... for example if its www.google.com if the link is www.vbforums.com/register it will filter that link because it is not related it is not google it is vbforums. So anyways, like i said it gets the main site "www.google.com" links and from all those links collected it will browse to them and get the links from those pages.
Now the only problem left is keep going through the listbox (filtered) and keep open those links and grabbing all the links from those page. You will notice some duplicates in there, as you said your working on.
I thought, all the links you already visited and grabbed links from, perhaps could go into an array or another listbox and maybe every 3-4 pages you could loop through listbox removing duplicates AND removing links that were already visited. Why would we want to do this?? Because if you dont then you would look at www.google.com about 100 times (just example) because some webpage have a "Home" button and every page will have that "Home" button, meaning it will grab that link every time. You will keep grabbing links from the "home" page every time and keep making duplicates of the links.
Why is it working right now? It's because normally if you are linked to a website (maybe not all the time) your link will not be hidding behind 10 pages, it should be very close like in Partner Links off of the main page, meaning only one link away from the main page!
Sorry if this is a little confusing i am just explaining what my program does so you understand and can modify it, but I'll be glad to help you through this as much as i can, since i am not an expert in programming myself!!
Last edited by Max187Boucher; Feb 20th, 2013 at 07:01 PM.
I will be here all night and access to pc.
So if you want to ask questions it is the time.
I will be working on it a bit again, hopefully i can get it to work a bit better.
My project is not complete!
If you notice it does work good, but it only looks at the main page (gets all links) and also it looks at the links from the main page... so all the links collected from main page that are from the same website (see listbox, it filters the website if you noticed, it only add the links that are related to main page)... for example if its www.google.com if the link is www.vbforums.com/register it will filter that link because it is not related it is not google it is vbforums. So anyways, like i said it gets the main site "www.google.com" links and from all those links collected it will browse to them and get the links from those pages.
Now the only problem left is keep going through the listbox (filtered) and keep open those links and grabbing all the links from those page. You will notice some duplicates in there, as you said your working on.
I thought, all the links you already visited and grabbed links from, perhaps could go into an array or another listbox and maybe every 3-4 pages you could loop through listbox removing duplicates AND removing links that were already visited. Why would we want to do this?? Because if you dont then you would look at www.google.com about 100 times (just example) because some webpage have a "Home" button and every page will have that "Home" button, meaning it will grab that link every time. You will keep grabbing links from the "home" page every time and keep making duplicates of the links.
Why is it working right now? It's because normally if you are linked to a website (maybe not all the time) your link will not be hidding behind 10 pages, it should be very close like in Partner Links off of the main page, meaning only one link away from the main page!
Sorry if this is a little confusing i am just explaining what my program does so you understand and can modify it, but I'll be glad to help you through this as much as i can, since i am not an expert in programming myself!!
thanks for your help on this
I have been away a few hours,but am now getting back to the code. i have handled the duplicates(see post #17)
i notice that a message box opens when found, but what if it's not found
or some links i have had for years what if they no longer exist or a timed out.
I have changed your code here:
For i = 0 To UBound(ArrTemp)
If InStr(1, ArrTemp(i), Replace(Replace(txtURL.Text, "http://", ""), "www.", "")) <> 0 Then
AddToListbox ArrTemp(i), lstLinks
Private Sub AddToListbox(strListBItem As String, LB As ListBox)
Dim lngListIndex As Long
' lngListIndex is the ListIndex if the item is found
lngListIndex = SendMessage(LB.hwnd, LB_FINDSTRINGEXACT, -1, ByVal strListBItem)
If lngListIndex = -1 Then
LB.AddItem strListBItem
mPagesScanned = mPagesScanned + 1
lblPagesScanned = "Pages Scanned " & mPagesScanned
Else
'MsgBox strListBItem & " is already in the Listbox"
Exit Sub
End If
End Sub
at this point I am not sure what i need, i think this part of the code needs moving until all the pages are don scanning:
Set LinkCollection = Nothing
Set Element = Nothing
Set objDoc = Nothing
Set objHtml = Nothing
Yes ?
Waiting for a full featured smart phone with out marrying a provider
Go Android
Go raiders
i fixed code but i did not add anything else... i didnt work on this to much but hopefully tomorrow.
Code:
Option Explicit
Const MyLink = "gardenbuildings.com"
'NO "http://www."
'OR ".com/otherstuffhere" ....
'EXAMPLE: "google.com", "vbforums.com"
'AND Make sure letters are all LowerCase :)
Private IE As Object
Private TimerSeconds As Integer
Private TimerMinutes As Integer
Private iUbound As Long
Private WebPageName As String
Private PageLinks() As String
Private MainPageLinks() As String
Private Function HtmlObject_Search(sUrl As String) As String()
Dim i As Integer
Dim ArrTemp() As String
Dim objHtml As MSHTML.HTMLDocument
Dim objDoc As MSHTML.HTMLDocument
Dim LinkCollection As MSHTML.IHTMLElementCollection
Dim Element As MSHTML.HTMLInputElement
If Me.chkIgnoreErrors.Value = vbChecked Then On Error Resume Next Else On Error GoTo ErrExit
Set objHtml = New HTMLDocument
Set objDoc = objHtml.createDocumentFromUrl(sUrl, vbNullString)
Do While objDoc.readyState <> "complete"
DoEvents
Loop
If WebPageName = "" Then WebPageName = objDoc.location
Set LinkCollection = objDoc.getElementsByTagName("a")
ReDim ArrTemp(LinkCollection.length)
For Each Element In LinkCollection
ArrTemp(i) = Element
If InStr(1, ArrTemp(i), MyLink) <> 0 Then
MsgBox "Found Link" & vbNewLine & ArrTemp(i)
lblStatus.Caption = "Found Website"
Exit Function
End If
i = i + 1
Next Element
HtmlObject_Search = ArrTemp
Set LinkCollection = Nothing
Set Element = Nothing
Set objDoc = Nothing
Set objHtml = Nothing
ChangeStatus ""
Erase PageLinks
Exit Function
ErrExit:
ChangeStatus "Error while searching (" & Err.Description & ")"
End Function
Private Function InternetExplorer_Search(sUrl As String) As String()
Dim i As Integer
Dim ArrTemp() As String
Dim LinkCollection As MSHTML.IHTMLElementCollection
Dim Element As MSHTML.HTMLInputElement
If Me.chkIgnoreErrors.Value = vbChecked Then On Error Resume Next Else On Error GoTo ErrExit
IE.navigate sUrl
Do While IE.readyState <> 4
DoEvents
Loop
If WebPageName = "" Then WebPageName = IE.LocationURL
Set LinkCollection = IE.document.getElementsByTagName("a")
ReDim ArrTemp(LinkCollection.length)
For Each Element In LinkCollection
DoEvents
ArrTemp(i) = Element
If InStr(1, ArrTemp(i), MyLink) <> 0 Then
MsgBox "Found Link" & vbNewLine & ArrTemp(i)
lblStatus.Caption = "Found Website"
Exit Function
End If
i = i + 1
Next Element
InternetExplorer_Search = ArrTemp
ChangeStatus ""
Set LinkCollection = Nothing
Set Element = Nothing
Erase PageLinks
Exit Function
ErrExit:
ChangeStatus "Error while searching (" & Err.Description & ")"
End Function
Private Sub ContinueSearch()
End Sub
Private Sub cmdStart_Click()
Dim i As Integer
Dim k As Integer
Dim ArrLinks() As String
Dim ArrTemp() As String
txtURL.Text = Replace(txtURL.Text, "https://", "")
txtURL.Text = Replace(txtURL.Text, "http://", "")
txtURL.Text = Replace(txtURL.Text, "www.", "")
txtURL.Text = Replace(txtURL.Text, "/", "")
Timer1.Enabled = True
WebPageName = ""
If optInternetExplorer.Value = True Then
ChangeStatus "Loading IE Object......"
Set IE = CreateObject("InternetExplorer.Application")
IE.silent = True
ArrLinks = InternetExplorer_Search("http://www." & txtURL.Text & "/")
For k = 0 To UBound(ArrLinks)
ChangeStatus ArrLinks(k)
ArrTemp = InternetExplorer_Search(ArrLinks(k))
lstLinks.Visible = False
If lblStatus.Caption = "Found Website" Then Exit For
For i = 0 To UBound(ArrTemp)
If InStr(1, ArrTemp(i), txtURL.Text) <> 0 Then
lstLinks.AddItem ArrTemp(i)
If LCase(ArrTemp(i)) = LCase(WebPageName) Then
lstLinks.RemoveItem lstLinks.ListCount - 1
End If
End If
Next i
lstLinks.Visible = True
Next k
IE.Quit
Set IE = Nothing
Else
ChangeStatus "Loading HTML Object......"
ArrLinks = HtmlObject_Search("http://www." & txtURL.Text & "/")
For k = 0 To UBound(ArrLinks)
ChangeStatus ArrLinks(k)
ArrTemp = HtmlObject_Search(ArrLinks(k))
lstLinks.Visible = False
If lblStatus.Caption = "Found Website" Then Exit For
For i = 0 To UBound(ArrTemp)
If InStr(1, ArrTemp(i), txtURL.Text) <> 0 Then
lstLinks.AddItem ArrTemp(i)
If LCase(ArrTemp(i)) = LCase(WebPageName) Then
lstLinks.RemoveItem lstLinks.ListCount - 1
End If
End If
Next i
lstLinks.Visible = True
Next k
End If
Timer1.Enabled = False
ChangeStatus lblStatus.Caption & Space(3) & "Total Time: " & CStr(TimerMinutes) & "min " & CStr(TimerSeconds) & "sec"
Debug.Print ""
TimerSeconds = 0
TimerMinutes = 0
End Sub
Private Sub ChangeStatus(sText As String)
lblStatus.Caption = sText
End Sub
Private Sub Timer1_Timer()
TimerSeconds = TimerSeconds + 1
If TimerSeconds = 60 Then
TimerSeconds = 0
TimerMinutes = TimerMinutes + 1
End If
End Sub
i fixed code but i did not add anything else... i didnt work on this to much but hopefully tomorrow.
Code:
Option Explicit
Const MyLink = "gardenbuildings.com"
'NO "http://www."
'OR ".com/otherstuffhere" ....
'EXAMPLE: "google.com", "vbforums.com"
'AND Make sure letters are all LowerCase :)
Private IE As Object
Private TimerSeconds As Integer
Private TimerMinutes As Integer
Private iUbound As Long
Private WebPageName As String
Private PageLinks() As String
Private MainPageLinks() As String
Private Function HtmlObject_Search(sUrl As String) As String()
Dim i As Integer
Dim ArrTemp() As String
Dim objHtml As MSHTML.HTMLDocument
Dim objDoc As MSHTML.HTMLDocument
Dim LinkCollection As MSHTML.IHTMLElementCollection
Dim Element As MSHTML.HTMLInputElement
If Me.chkIgnoreErrors.Value = vbChecked Then On Error Resume Next Else On Error GoTo ErrExit
Set objHtml = New HTMLDocument
Set objDoc = objHtml.createDocumentFromUrl(sUrl, vbNullString)
Do While objDoc.readyState <> "complete"
DoEvents
Loop
If WebPageName = "" Then WebPageName = objDoc.location
Set LinkCollection = objDoc.getElementsByTagName("a")
ReDim ArrTemp(LinkCollection.length)
For Each Element In LinkCollection
ArrTemp(i) = Element
If InStr(1, ArrTemp(i), MyLink) <> 0 Then
MsgBox "Found Link" & vbNewLine & ArrTemp(i)
lblStatus.Caption = "Found Website"
Exit Function
End If
i = i + 1
Next Element
HtmlObject_Search = ArrTemp
Set LinkCollection = Nothing
Set Element = Nothing
Set objDoc = Nothing
Set objHtml = Nothing
ChangeStatus ""
Erase PageLinks
Exit Function
ErrExit:
ChangeStatus "Error while searching (" & Err.Description & ")"
End Function
Private Function InternetExplorer_Search(sUrl As String) As String()
Dim i As Integer
Dim ArrTemp() As String
Dim LinkCollection As MSHTML.IHTMLElementCollection
Dim Element As MSHTML.HTMLInputElement
If Me.chkIgnoreErrors.Value = vbChecked Then On Error Resume Next Else On Error GoTo ErrExit
IE.navigate sUrl
Do While IE.readyState <> 4
DoEvents
Loop
If WebPageName = "" Then WebPageName = IE.LocationURL
Set LinkCollection = IE.document.getElementsByTagName("a")
ReDim ArrTemp(LinkCollection.length)
For Each Element In LinkCollection
DoEvents
ArrTemp(i) = Element
If InStr(1, ArrTemp(i), MyLink) <> 0 Then
MsgBox "Found Link" & vbNewLine & ArrTemp(i)
lblStatus.Caption = "Found Website"
Exit Function
End If
i = i + 1
Next Element
InternetExplorer_Search = ArrTemp
ChangeStatus ""
Set LinkCollection = Nothing
Set Element = Nothing
Erase PageLinks
Exit Function
ErrExit:
ChangeStatus "Error while searching (" & Err.Description & ")"
End Function
Private Sub ContinueSearch()
End Sub
Private Sub cmdStart_Click()
Dim i As Integer
Dim k As Integer
Dim ArrLinks() As String
Dim ArrTemp() As String
txtURL.Text = Replace(txtURL.Text, "https://", "")
txtURL.Text = Replace(txtURL.Text, "http://", "")
txtURL.Text = Replace(txtURL.Text, "www.", "")
txtURL.Text = Replace(txtURL.Text, "/", "")
Timer1.Enabled = True
WebPageName = ""
If optInternetExplorer.Value = True Then
ChangeStatus "Loading IE Object......"
Set IE = CreateObject("InternetExplorer.Application")
IE.silent = True
ArrLinks = InternetExplorer_Search("http://www." & txtURL.Text & "/")
For k = 0 To UBound(ArrLinks)
ChangeStatus ArrLinks(k)
ArrTemp = InternetExplorer_Search(ArrLinks(k))
lstLinks.Visible = False
If lblStatus.Caption = "Found Website" Then Exit For
For i = 0 To UBound(ArrTemp)
If InStr(1, ArrTemp(i), txtURL.Text) <> 0 Then
lstLinks.AddItem ArrTemp(i)
If LCase(ArrTemp(i)) = LCase(WebPageName) Then
lstLinks.RemoveItem lstLinks.ListCount - 1
End If
End If
Next i
lstLinks.Visible = True
Next k
IE.Quit
Set IE = Nothing
Else
ChangeStatus "Loading HTML Object......"
ArrLinks = HtmlObject_Search("http://www." & txtURL.Text & "/")
For k = 0 To UBound(ArrLinks)
ChangeStatus ArrLinks(k)
ArrTemp = HtmlObject_Search(ArrLinks(k))
lstLinks.Visible = False
If lblStatus.Caption = "Found Website" Then Exit For
For i = 0 To UBound(ArrTemp)
If InStr(1, ArrTemp(i), txtURL.Text) <> 0 Then
lstLinks.AddItem ArrTemp(i)
If LCase(ArrTemp(i)) = LCase(WebPageName) Then
lstLinks.RemoveItem lstLinks.ListCount - 1
End If
End If
Next i
lstLinks.Visible = True
Next k
End If
Timer1.Enabled = False
ChangeStatus lblStatus.Caption & Space(3) & "Total Time: " & CStr(TimerMinutes) & "min " & CStr(TimerSeconds) & "sec"
Debug.Print ""
TimerSeconds = 0
TimerMinutes = 0
End Sub
Private Sub ChangeStatus(sText As String)
lblStatus.Caption = sText
End Sub
Private Sub Timer1_Timer()
TimerSeconds = TimerSeconds + 1
If TimerSeconds = 60 Then
TimerSeconds = 0
TimerMinutes = TimerMinutes + 1
End If
End Sub
and you could add your dupe code in there
Hi
Thanks for your help
I have put your code in a loop, but keep getting errors like this:
Private Sub cmdStart_Click()
Dim i As Integer
For i = 0 To List1.ListCount - 1
Call Start(List1.List(i))
lstLinks.Clear
lblPagesScanned = ""
Next i
End Sub