Results 1 to 24 of 24

Thread: Copy all websites page names from vb 6

  1. #1

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Copy all websites page names from vb 6

    I am trying to determine if websites i have exchanged links with are still linking back to my site.
    the link back to me could be on any page so i need to search every page until found
    i have all the main sites url's but how can i find all the other pages ?
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  2. #2
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,206

    Re: Copy all websites page names from vb 6

    Scan the page for links that have the same domain and put the matches in a list. Repeat for all pages found this way.

  3. #3

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    thanks
    How would i do that ?
    Example:
    http://www.diyplas.co.uk/
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  4. #4

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    I found this but don't know how to use it ;
    http://www.example-code.com/vb/spider_simpleCrawler.asp
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  5. #5
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,206

    Re: Copy all websites page names from vb 6

    Basically you just need to get the html from the site and scan through it there are a few ways to do this, one of which is the trusty INSTR() function. I think there are some other functions that you can use the web browser control for but those I have never really looked at. Never had much desire nor need to do any web scraping and I use 3rd party tools for any HTTP or FTP functions in my own programs so I can't be of much help beyond this. Maybe someone else can chime in and give more info.

  6. #6

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    i bought a program a few yrs ago that did what i needed, but now it crashes with a type mismatch and get no response from the vendor. I bought another a few days, but already got a refund because of errors and crashes Most of the programs do not scan the whole remote site, but want you to supply the url of where the link is located.
    At this point I would be happy just to get all the page urls on a wesite
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  7. #7
    PowerPoster
    Join Date
    Jan 2008
    Posts
    11,074

    Re: Copy all websites page names from vb 6

    Quote Originally Posted by isnoend07 View Post
    I found this but don't know how to use it ;
    http://www.example-code.com/vb/spider_simpleCrawler.asp
    All that does is extract links from the web site beginning at the first pages referenced.

    For example, if you used http://www.vbforums.com as the argument for the first page it will get all links found on that page and very page referenced by the links

    It's easy to use. Just download the ActiveX control, pop it on your Form, run the code example.

    Here's is that code in a VB Project. You will need to get your own ActiveX and install it so it will be able to use with the project in the zip file.
    Attached Files Attached Files
    Last edited by jmsrickland; Feb 19th, 2013 at 05:06 PM.


    Anything I post is an example only and is not intended to be the only solution, the total solution nor the final solution to your request nor do I claim that it is. If you find it useful then it is entirely up to you to make whatever changes necessary you feel are adequate for your purposes.

  8. #8

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    Quote Originally Posted by jmsrickland View Post
    All that does is extract links from the web site beginning at the first pages referenced.

    For example, if you used http://www.vbforums.com as the argument for the first page it will get all links found on that page and very page referenced by the links

    It's easy to use. Just download the ActiveX control, pop it on your Form, run the code example.

    Here's is that code in a VB Project. You will need to get your own ActiveX and install it so it will be able to use with the project in the zip file.
    thanks for adding that. that would work if i could count on their main sites having links to all their pages, but i know that won't work. The spider will work sending it one page at a time and check for links back to me. i know it would have to run a long time as one i purchased ran all night. some sites i link to have over 6500 pages.
    The deal i have with linking websites is:
    You link to me and i will link to you. You remove my link and i will remove yours. some of the links i have had for over 10 years, but need to check them.
    Too bad the software i purchased worked. The method worked perfect for me.
    You give it the url to your links page and it collects all the outgoing links and scans each site and gives a report
    1. Not found
    2.Direct link
    3.Timed out
    4.404 page not found
    and a few more
    Last edited by isnoend07; Feb 19th, 2013 at 06:22 PM.
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  9. #9
    PowerPoster
    Join Date
    Jan 2008
    Posts
    11,074

    Re: Copy all websites page names from vb 6

    What do you mean too bad the software you purchased worked. If I purchased any software I would want it to work.

    Anyway, are you satisfied with using that ActiveX on the sample I posted. It's basically the same as what you linked to I just changed it to work for me and for testing I used this site to start with. I can't use my own sites as they have no links to other sites but you can figure out what and where to begin.


    Anything I post is an example only and is not intended to be the only solution, the total solution nor the final solution to your request nor do I claim that it is. If you find it useful then it is entirely up to you to make whatever changes necessary you feel are adequate for your purposes.

  10. #10

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    Quote Originally Posted by jmsrickland View Post
    What do you mean too bad the software you purchased worked. If I purchased any software I would want it to work.
    That was a typo I meant too bad it did not work. If it did work this post would have never happened, did not realize it was so difficult to get all the page urls when you know the Home pages url

    Anyway, are you satisfied with using that ActiveX on the sample I posted. It's basically the same as what you linked to I just changed it to work for me and for testing I used this site to start with. I can't use my own sites as they have no links to other sites but you can figure out what and where to begin.
    That was a typo I meant too bad it did not work. If it did work this post would have never happened, did not realize it was so difficult to get all the page urls when you know the Home pages url
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  11. #11
    PowerPoster
    Join Date
    Aug 2011
    Location
    B.C., Canada
    Posts
    2,887

    Re: Copy all websites page names from vb 6

    Did you get it working?

    So the website you gave, which website (Link) would you like to find out of it? If you can work on a better example, we might be able to help.
    Meanwhile I am working an example i could give you as soon as i can finish it.

  12. #12
    PowerPoster
    Join Date
    Aug 2011
    Location
    B.C., Canada
    Posts
    2,887

    Re: Copy all websites page names from vb 6

    i just started.. looks easier than it is of course

    so for the link you gave us, look what in red is in Related Link (Page 2)

    Code:
    Option Explicit
    
    Const MyLink = "gardenbuildings.com"
    Run the code and see it will find the page, but this is far from done
    what i mean far from done is it only check the main Url and the link off of the main Url
    you could remove duplicates from listbox and keep going with the listbox's items URL and grab all links from that url

    but this is where it becomes harder because there is alot of links that will be the same
    for example, the home link
    I guess you could make an array with all the links you have visited (grabbed the links off) and check once in a while to make sure the listbox does not contain any of the sites already visited

    i will try to work on this a bit more tomorrow
    Attached Files Attached Files

  13. #13
    Banned
    Join Date
    Nov 2012
    Posts
    1,171

    Re: Copy all websites page names from vb 6

    sorry posted edited removed my codes ,dun wana wast time posting anymore
    sing u ignore main thing for ur app , do it hard way
    Last edited by ladoo; Feb 20th, 2013 at 06:09 PM.

  14. #14

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    thanks for adding that
    The problem with adding a html page. It adds another step in link building process and the link may have been moved to another page.
    I have found this;
    http://www.planet-source-code.com/vb...32760&lngWId=1
    with some tweaking i think it will work, if not i will try your code, thanks for adding it
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  15. #15

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    Quote Originally Posted by Max187Boucher View Post
    i just started.. looks easier than it is of course

    so for the link you gave us, look what in red is in Related Link (Page 2)

    Code:
    Option Explicit
    
    Const MyLink = "gardenbuildings.com"
    Run the code and see it will find the page, but this is far from done
    what i mean far from done is it only check the main Url and the link off of the main Url
    you could remove duplicates from listbox and keep going with the listbox's items URL and grab all links from that url

    but this is where it becomes harder because there is alot of links that will be the same
    for example, the home link
    I guess you could make an array with all the links you have visited (grabbed the links off) and check once in a while to make sure the listbox does not contain any of the sites already visited

    i will try to work on this a bit more tomorrow
    thanks for adding that
    you are right about duplicates, but it did find "gardenbuildings.com"
    does it ever stop?
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  16. #16

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    for some reason i have stopped getting email notifactions on this thread. I will investigate
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  17. #17

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    Quote Originally Posted by isnoend07 View Post
    thanks for adding that
    you are right about duplicates, but it did find "gardenbuildings.com"
    does it ever stop?
    I added this by Hack:

    Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
    (ByVal hwnd As Long, _
    ByVal wMsg As Long, _
    ByVal wParam As Long, _
    lParam As Any) As Long

    Private Const LB_FINDSTRINGEXACT = &H1A2

    Private Sub AddToListbox(strListBItem As String, LB As ListBox)

    Dim lngListIndex As Long

    ' lngListIndex is the ListIndex if the item is found
    lngListIndex = SendMessage(LB.hwnd, LB_FINDSTRINGEXACT, -1, ByVal strListBItem)
    If lngListIndex = -1 Then
    LB.AddItem strListBItem
    Else
    MsgBox strListBItem & " is already in the Listbox"
    Exit Sub
    End If
    End Sub

    Private Sub Command1_Click()
    AddToListBox Text1.Text, List1
    End Sub
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  18. #18

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    Quote Originally Posted by Max187Boucher View Post
    i just started.. looks easier than it is of course

    so for the link you gave us, look what in red is in Related Link (Page 2)

    Code:
    Option Explicit
    
    
    
    
    
    
    
    
    Const MyLink = "gardenbuildings.com"
    Run the code and see it will find the page, but this is far from done
    what i mean far from done is it only check the main Url and the link off of the main Url
    you could remove duplicates from listbox and keep going with the listbox's items URL and grab all links from that url

    but this is where it becomes harder because there is alot of links that will be the same
    for example, the home link
    I guess you could make an array with all the links you have visited (grabbed the links off) and check once in a while to make sure the listbox does not contain any of the sites already visited

    i will try to work on this a bit more tomorrow
    I am fiddling with your code this seems to be what i need and seems to work
    I am adding to not include duplicates+
    added another listbox with about 50 sites i am linking to that get filled from my sites links page.
    Plan on adding a report at the end of each sites results
    Thanks for adding that code
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  19. #19
    PowerPoster
    Join Date
    Aug 2011
    Location
    B.C., Canada
    Posts
    2,887

    Re: Copy all websites page names from vb 6

    I will be here all night and access to pc.
    So if you want to ask questions it is the time.
    I will be working on it a bit again, hopefully i can get it to work a bit better.
    My project is not complete!

    If you notice it does work good, but it only looks at the main page (gets all links) and also it looks at the links from the main page... so all the links collected from main page that are from the same website (see listbox, it filters the website if you noticed, it only add the links that are related to main page)... for example if its www.google.com if the link is www.vbforums.com/register it will filter that link because it is not related it is not google it is vbforums. So anyways, like i said it gets the main site "www.google.com" links and from all those links collected it will browse to them and get the links from those pages.

    Now the only problem left is keep going through the listbox (filtered) and keep open those links and grabbing all the links from those page. You will notice some duplicates in there, as you said your working on.
    I thought, all the links you already visited and grabbed links from, perhaps could go into an array or another listbox and maybe every 3-4 pages you could loop through listbox removing duplicates AND removing links that were already visited. Why would we want to do this?? Because if you dont then you would look at www.google.com about 100 times (just example) because some webpage have a "Home" button and every page will have that "Home" button, meaning it will grab that link every time. You will keep grabbing links from the "home" page every time and keep making duplicates of the links.

    Why is it working right now? It's because normally if you are linked to a website (maybe not all the time) your link will not be hidding behind 10 pages, it should be very close like in Partner Links off of the main page, meaning only one link away from the main page!

    Sorry if this is a little confusing i am just explaining what my program does so you understand and can modify it, but I'll be glad to help you through this as much as i can, since i am not an expert in programming myself!!
    Last edited by Max187Boucher; Feb 20th, 2013 at 07:01 PM.

  20. #20

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    Quote Originally Posted by Max187Boucher View Post
    I will be here all night and access to pc.
    So if you want to ask questions it is the time.
    I will be working on it a bit again, hopefully i can get it to work a bit better.
    My project is not complete!

    If you notice it does work good, but it only looks at the main page (gets all links) and also it looks at the links from the main page... so all the links collected from main page that are from the same website (see listbox, it filters the website if you noticed, it only add the links that are related to main page)... for example if its www.google.com if the link is www.vbforums.com/register it will filter that link because it is not related it is not google it is vbforums. So anyways, like i said it gets the main site "www.google.com" links and from all those links collected it will browse to them and get the links from those pages.

    Now the only problem left is keep going through the listbox (filtered) and keep open those links and grabbing all the links from those page. You will notice some duplicates in there, as you said your working on.
    I thought, all the links you already visited and grabbed links from, perhaps could go into an array or another listbox and maybe every 3-4 pages you could loop through listbox removing duplicates AND removing links that were already visited. Why would we want to do this?? Because if you dont then you would look at www.google.com about 100 times (just example) because some webpage have a "Home" button and every page will have that "Home" button, meaning it will grab that link every time. You will keep grabbing links from the "home" page every time and keep making duplicates of the links.

    Why is it working right now? It's because normally if you are linked to a website (maybe not all the time) your link will not be hidding behind 10 pages, it should be very close like in Partner Links off of the main page, meaning only one link away from the main page!

    Sorry if this is a little confusing i am just explaining what my program does so you understand and can modify it, but I'll be glad to help you through this as much as i can, since i am not an expert in programming myself!!
    thanks for your help on this
    I have been away a few hours,but am now getting back to the code. i have handled the duplicates(see post #17)
    i notice that a message box opens when found, but what if it's not found
    or some links i have had for years what if they no longer exist or a timed out.
    I have changed your code here:
    For i = 0 To UBound(ArrTemp)
    If InStr(1, ArrTemp(i), Replace(Replace(txtURL.Text, "http://", ""), "www.", "")) <> 0 Then
    AddToListbox ArrTemp(i), lstLinks

    Private Sub AddToListbox(strListBItem As String, LB As ListBox)

    Dim lngListIndex As Long

    ' lngListIndex is the ListIndex if the item is found
    lngListIndex = SendMessage(LB.hwnd, LB_FINDSTRINGEXACT, -1, ByVal strListBItem)
    If lngListIndex = -1 Then
    LB.AddItem strListBItem
    mPagesScanned = mPagesScanned + 1
    lblPagesScanned = "Pages Scanned " & mPagesScanned
    Else
    'MsgBox strListBItem & " is already in the Listbox"
    Exit Sub
    End If
    End Sub
    at this point I am not sure what i need, i think this part of the code needs moving until all the pages are don scanning:
    Set LinkCollection = Nothing
    Set Element = Nothing
    Set objDoc = Nothing
    Set objHtml = Nothing
    Yes ?
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  21. #21
    PowerPoster
    Join Date
    Aug 2011
    Location
    B.C., Canada
    Posts
    2,887

    Re: Copy all websites page names from vb 6

    i fixed code but i did not add anything else... i didnt work on this to much but hopefully tomorrow.

    Code:
    Option Explicit
    
    Const MyLink = "gardenbuildings.com"
    'NO   "http://www."
    'OR   ".com/otherstuffhere" ....
    'EXAMPLE: "google.com", "vbforums.com"
    'AND   Make sure letters are all LowerCase  :)
    
    Private IE As Object
    Private TimerSeconds As Integer
    Private TimerMinutes As Integer
    Private iUbound As Long
    Private WebPageName As String
    Private PageLinks() As String
    Private MainPageLinks() As String
    
    
    Private Function HtmlObject_Search(sUrl As String) As String()
    Dim i As Integer
    Dim ArrTemp() As String
    Dim objHtml As MSHTML.HTMLDocument
    Dim objDoc As MSHTML.HTMLDocument
    Dim LinkCollection As MSHTML.IHTMLElementCollection
    Dim Element As MSHTML.HTMLInputElement
    
    If Me.chkIgnoreErrors.Value = vbChecked Then On Error Resume Next Else On Error GoTo ErrExit
           
        Set objHtml = New HTMLDocument
        Set objDoc = objHtml.createDocumentFromUrl(sUrl, vbNullString)
          
          Do While objDoc.readyState <> "complete"
            DoEvents
          Loop
          
        If WebPageName = "" Then WebPageName = objDoc.location
      
        Set LinkCollection = objDoc.getElementsByTagName("a")
        
            ReDim ArrTemp(LinkCollection.length)
    
              For Each Element In LinkCollection
                ArrTemp(i) = Element
                
                If InStr(1, ArrTemp(i), MyLink) <> 0 Then
                  MsgBox "Found Link" & vbNewLine & ArrTemp(i)
                  lblStatus.Caption = "Found Website"
                  Exit Function
                End If
                i = i + 1
              Next Element
        
        HtmlObject_Search = ArrTemp
        
        Set LinkCollection = Nothing
        Set Element = Nothing
        Set objDoc = Nothing
        Set objHtml = Nothing
          
      ChangeStatus ""
      
        Erase PageLinks
        Exit Function
        
    ErrExit:
      ChangeStatus "Error while searching (" & Err.Description & ")"
    End Function
    
    
    
    Private Function InternetExplorer_Search(sUrl As String) As String()
    Dim i As Integer
    Dim ArrTemp() As String
    Dim LinkCollection As MSHTML.IHTMLElementCollection
    Dim Element As MSHTML.HTMLInputElement
    
      If Me.chkIgnoreErrors.Value = vbChecked Then On Error Resume Next Else On Error GoTo ErrExit
     
          IE.navigate sUrl
          
          Do While IE.readyState <> 4
            DoEvents
          Loop
          
          If WebPageName = "" Then WebPageName = IE.LocationURL
      
          Set LinkCollection = IE.document.getElementsByTagName("a")
          
            ReDim ArrTemp(LinkCollection.length)
              For Each Element In LinkCollection
              DoEvents
                ArrTemp(i) = Element
                If InStr(1, ArrTemp(i), MyLink) <> 0 Then
                  MsgBox "Found Link" & vbNewLine & ArrTemp(i)
                  lblStatus.Caption = "Found Website"
                  Exit Function
                End If
                i = i + 1
              Next Element
    
          InternetExplorer_Search = ArrTemp
          
          ChangeStatus ""
          
          Set LinkCollection = Nothing
          Set Element = Nothing
          Erase PageLinks
          
        Exit Function
        
    ErrExit:
      ChangeStatus "Error while searching (" & Err.Description & ")"
    End Function
    
    
    Private Sub ContinueSearch()
    
    End Sub
    
    
    Private Sub cmdStart_Click()
    Dim i As Integer
    Dim k As Integer
    Dim ArrLinks() As String
    Dim ArrTemp() As String
      
      txtURL.Text = Replace(txtURL.Text, "https://", "")
      txtURL.Text = Replace(txtURL.Text, "http://", "")
      txtURL.Text = Replace(txtURL.Text, "www.", "")
      txtURL.Text = Replace(txtURL.Text, "/", "")
      
      Timer1.Enabled = True
      WebPageName = ""
      
        If optInternetExplorer.Value = True Then
          ChangeStatus "Loading IE Object......"
          Set IE = CreateObject("InternetExplorer.Application")
          IE.silent = True
          
          ArrLinks = InternetExplorer_Search("http://www." & txtURL.Text & "/")
    
            For k = 0 To UBound(ArrLinks)
              
              ChangeStatus ArrLinks(k)
              ArrTemp = InternetExplorer_Search(ArrLinks(k))
    
                lstLinks.Visible = False
                If lblStatus.Caption = "Found Website" Then Exit For
                    
                    For i = 0 To UBound(ArrTemp)
                      If InStr(1, ArrTemp(i), txtURL.Text) <> 0 Then
                          lstLinks.AddItem ArrTemp(i)
                            If LCase(ArrTemp(i)) = LCase(WebPageName) Then
                                lstLinks.RemoveItem lstLinks.ListCount - 1
                            End If
                      End If
                    Next i
                    
                lstLinks.Visible = True
                
            Next k
            IE.Quit
            Set IE = Nothing
            
        Else
        
          ChangeStatus "Loading HTML Object......"
          ArrLinks = HtmlObject_Search("http://www." & txtURL.Text & "/")
            For k = 0 To UBound(ArrLinks)
              
              ChangeStatus ArrLinks(k)
              ArrTemp = HtmlObject_Search(ArrLinks(k))
    
                lstLinks.Visible = False
                If lblStatus.Caption = "Found Website" Then Exit For
                    
                    For i = 0 To UBound(ArrTemp)
                      If InStr(1, ArrTemp(i), txtURL.Text) <> 0 Then
                          lstLinks.AddItem ArrTemp(i)
                            If LCase(ArrTemp(i)) = LCase(WebPageName) Then
                                lstLinks.RemoveItem lstLinks.ListCount - 1
                              End If
                      End If
                    Next i
                    
                lstLinks.Visible = True
                
            Next k
    
      
        End If
      Timer1.Enabled = False
      
      ChangeStatus lblStatus.Caption & Space(3) & "Total Time: " & CStr(TimerMinutes) & "min " & CStr(TimerSeconds) & "sec"
      Debug.Print ""
      TimerSeconds = 0
      TimerMinutes = 0
    End Sub
    
    Private Sub ChangeStatus(sText As String)
      lblStatus.Caption = sText
    End Sub
    
    Private Sub Timer1_Timer()
      TimerSeconds = TimerSeconds + 1
        If TimerSeconds = 60 Then
          TimerSeconds = 0
          TimerMinutes = TimerMinutes + 1
        End If
    End Sub
    and you could add your dupe code in there

  22. #22

    Thread Starter
    PowerPoster isnoend07's Avatar
    Join Date
    Feb 2007
    Posts
    3,237

    Re: Copy all websites page names from vb 6

    Quote Originally Posted by Max187Boucher View Post
    i fixed code but i did not add anything else... i didnt work on this to much but hopefully tomorrow.

    Code:
    Option Explicit
    
    Const MyLink = "gardenbuildings.com"
    'NO   "http://www."
    'OR   ".com/otherstuffhere" ....
    'EXAMPLE: "google.com", "vbforums.com"
    'AND   Make sure letters are all LowerCase  :)
    
    Private IE As Object
    Private TimerSeconds As Integer
    Private TimerMinutes As Integer
    Private iUbound As Long
    Private WebPageName As String
    Private PageLinks() As String
    Private MainPageLinks() As String
    
    
    Private Function HtmlObject_Search(sUrl As String) As String()
    Dim i As Integer
    Dim ArrTemp() As String
    Dim objHtml As MSHTML.HTMLDocument
    Dim objDoc As MSHTML.HTMLDocument
    Dim LinkCollection As MSHTML.IHTMLElementCollection
    Dim Element As MSHTML.HTMLInputElement
    
    If Me.chkIgnoreErrors.Value = vbChecked Then On Error Resume Next Else On Error GoTo ErrExit
           
        Set objHtml = New HTMLDocument
        Set objDoc = objHtml.createDocumentFromUrl(sUrl, vbNullString)
          
          Do While objDoc.readyState <> "complete"
            DoEvents
          Loop
          
        If WebPageName = "" Then WebPageName = objDoc.location
      
        Set LinkCollection = objDoc.getElementsByTagName("a")
        
            ReDim ArrTemp(LinkCollection.length)
    
              For Each Element In LinkCollection
                ArrTemp(i) = Element
                
                If InStr(1, ArrTemp(i), MyLink) <> 0 Then
                  MsgBox "Found Link" & vbNewLine & ArrTemp(i)
                  lblStatus.Caption = "Found Website"
                  Exit Function
                End If
                i = i + 1
              Next Element
        
        HtmlObject_Search = ArrTemp
        
        Set LinkCollection = Nothing
        Set Element = Nothing
        Set objDoc = Nothing
        Set objHtml = Nothing
          
      ChangeStatus ""
      
        Erase PageLinks
        Exit Function
        
    ErrExit:
      ChangeStatus "Error while searching (" & Err.Description & ")"
    End Function
    
    
    
    Private Function InternetExplorer_Search(sUrl As String) As String()
    Dim i As Integer
    Dim ArrTemp() As String
    Dim LinkCollection As MSHTML.IHTMLElementCollection
    Dim Element As MSHTML.HTMLInputElement
    
      If Me.chkIgnoreErrors.Value = vbChecked Then On Error Resume Next Else On Error GoTo ErrExit
     
          IE.navigate sUrl
          
          Do While IE.readyState <> 4
            DoEvents
          Loop
          
          If WebPageName = "" Then WebPageName = IE.LocationURL
      
          Set LinkCollection = IE.document.getElementsByTagName("a")
          
            ReDim ArrTemp(LinkCollection.length)
              For Each Element In LinkCollection
              DoEvents
                ArrTemp(i) = Element
                If InStr(1, ArrTemp(i), MyLink) <> 0 Then
                  MsgBox "Found Link" & vbNewLine & ArrTemp(i)
                  lblStatus.Caption = "Found Website"
                  Exit Function
                End If
                i = i + 1
              Next Element
    
          InternetExplorer_Search = ArrTemp
          
          ChangeStatus ""
          
          Set LinkCollection = Nothing
          Set Element = Nothing
          Erase PageLinks
          
        Exit Function
        
    ErrExit:
      ChangeStatus "Error while searching (" & Err.Description & ")"
    End Function
    
    
    Private Sub ContinueSearch()
    
    End Sub
    
    
    Private Sub cmdStart_Click()
    Dim i As Integer
    Dim k As Integer
    Dim ArrLinks() As String
    Dim ArrTemp() As String
      
      txtURL.Text = Replace(txtURL.Text, "https://", "")
      txtURL.Text = Replace(txtURL.Text, "http://", "")
      txtURL.Text = Replace(txtURL.Text, "www.", "")
      txtURL.Text = Replace(txtURL.Text, "/", "")
      
      Timer1.Enabled = True
      WebPageName = ""
      
        If optInternetExplorer.Value = True Then
          ChangeStatus "Loading IE Object......"
          Set IE = CreateObject("InternetExplorer.Application")
          IE.silent = True
          
          ArrLinks = InternetExplorer_Search("http://www." & txtURL.Text & "/")
    
            For k = 0 To UBound(ArrLinks)
              
              ChangeStatus ArrLinks(k)
              ArrTemp = InternetExplorer_Search(ArrLinks(k))
    
                lstLinks.Visible = False
                If lblStatus.Caption = "Found Website" Then Exit For
                    
                    For i = 0 To UBound(ArrTemp)
                      If InStr(1, ArrTemp(i), txtURL.Text) <> 0 Then
                          lstLinks.AddItem ArrTemp(i)
                            If LCase(ArrTemp(i)) = LCase(WebPageName) Then
                                lstLinks.RemoveItem lstLinks.ListCount - 1
                            End If
                      End If
                    Next i
                    
                lstLinks.Visible = True
                
            Next k
            IE.Quit
            Set IE = Nothing
            
        Else
        
          ChangeStatus "Loading HTML Object......"
          ArrLinks = HtmlObject_Search("http://www." & txtURL.Text & "/")
            For k = 0 To UBound(ArrLinks)
              
              ChangeStatus ArrLinks(k)
              ArrTemp = HtmlObject_Search(ArrLinks(k))
    
                lstLinks.Visible = False
                If lblStatus.Caption = "Found Website" Then Exit For
                    
                    For i = 0 To UBound(ArrTemp)
                      If InStr(1, ArrTemp(i), txtURL.Text) <> 0 Then
                          lstLinks.AddItem ArrTemp(i)
                            If LCase(ArrTemp(i)) = LCase(WebPageName) Then
                                lstLinks.RemoveItem lstLinks.ListCount - 1
                              End If
                      End If
                    Next i
                    
                lstLinks.Visible = True
                
            Next k
    
      
        End If
      Timer1.Enabled = False
      
      ChangeStatus lblStatus.Caption & Space(3) & "Total Time: " & CStr(TimerMinutes) & "min " & CStr(TimerSeconds) & "sec"
      Debug.Print ""
      TimerSeconds = 0
      TimerMinutes = 0
    End Sub
    
    Private Sub ChangeStatus(sText As String)
      lblStatus.Caption = sText
    End Sub
    
    Private Sub Timer1_Timer()
      TimerSeconds = TimerSeconds + 1
        If TimerSeconds = 60 Then
          TimerSeconds = 0
          TimerMinutes = TimerMinutes + 1
        End If
    End Sub
    and you could add your dupe code in there
    Hi
    Thanks for your help
    I have put your code in a loop, but keep getting errors like this:

    Private Sub cmdStart_Click()
    Dim i As Integer

    For i = 0 To List1.ListCount - 1
    Call Start(List1.List(i))
    lstLinks.Clear
    lblPagesScanned = ""
    Next i
    End Sub

    The items in the listbox are in this format:
    http://www.solarlightsplusmore.com/


    Sub Start(URL As String)
    Ihave changed this line
    'ArrLinks = InternetExplorer_Search("http://www." & txtURL.Text & "/")

    to this
    ArrLinks = HtmlObject_Search(URL)

    I have tried both engines IE and HTML with the same results

    I think some variables may need resetting between calls
    Yes ?
    Waiting for a full featured smart phone with out marrying a provider
    Go Android
    Go raiders

  23. #23
    PowerPoster
    Join Date
    Aug 2011
    Location
    B.C., Canada
    Posts
    2,887

    Re: Copy all websites page names from vb 6

    the problem is in the address i would think... what's in the URL variable?

  24. #24
    PowerPoster
    Join Date
    Aug 2011
    Location
    B.C., Canada
    Posts
    2,887

    Re: Copy all websites page names from vb 6

    it would help me to see whole project.. i do not know whats in list1 either

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width