
Thread: [RESOLVED] How to Extract this text from html

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Nov 2005
    Posts
    153

    [RESOLVED] How to Extract this text from html

    Hi all,
    is there a way to extract this text from the current page?

    222.92.45.228:20861
    86.12.56.187:7212
    212.12.185.115:8080
    62.150.77.94:27640
    24.6.236.232:8081
    203.70.47.83:4355
    80.55.8.227:23176
    66.226.34.68:31733
    69.210.211.186:22788
    68.53.26.248:7212
    24.75.91.18:29122
    58.145.97.93:50050
    59.14.44.110:50050
    58.236.22.118:50050
    59.187.231.124:50050
    61.32.111.51:50050
    61.40.64.46:50050
    61.38.147.214:50050
    61.101.5.200:50050
    61.106.84.127:50050
    68.87.66.101:553
    69.147.39.38:553
    69.147.27.250:553
    61.32.75.186:8002
    24.255.32.185:8081
    125.250.185.108:9597
    203.133.27.128:15802
    203.160.1.170:553
    211.194.117.204:50050
    211.213.153.162:50050
    211.236.210.87:50050
    211.172.142.151:50050
    210.114.183.194:50050
    217.216.149.200:3382

    I'm trying to get a list of proxies from some pages.
    Last edited by _Conan_; Mar 21st, 2006 at 02:37 PM.
    This is The Prophet MOHAMMAD!
    =======================
    My SuperMen:
    RhinoBull, gigemboy, jmcilhinney, |2eM!x, Edneeis and Hack

  2. #2
    Kasracer (KrisSiegel.com)
    Join Date
    Jul 2003
    Location
    USA, Maryland
    Posts
    4,985

    Re: How to Extract this text from html

    Um, you could use a WebBrowser object and grab the text through one of its properties. Or you could use a Regex to try to extract the text.

    Honestly, if you are generating these IP addresses yourself, it would be better to put them into an XML file.

    If you didn't generate them, then scraping them is not a good idea. Many proxies require special agreements (even free ones), and what is allowed on some may be against the rules on others. Plus, you never know when the HTML will change, which could throw off your parsing badly. The site could even go down.
    KrisSiegel.com - My Personal Website with my blog and portfolio
    Don't Forget to Rate Posts!

    Free Icons: FamFamFam, VBCorner, VBAccelerator
    Useful Links: System.Security.SecureString Managed DPAPI Overview Part 1 Managed DPAPI Overview Part 2 MSDN, MSDN2, Comparing the Timer Classes

  3. #3
    PowerPoster
    Join Date
    Aug 2005
    Location
    College Station, TX
    Posts
    4,521

    Re: How to Extract this text from html

    If you can read the HTML into one long string, then you can use a Regex to find all of the matches inside it, something like:
    VB Code:
        Dim TestString As String = "this 192.168.1.32:84 is a test 192.43.234.43:54"
        Dim Regex As New System.Text.RegularExpressions.Regex("\d*.\d*.\d*.\d*:\d*")
        Dim MyMatches As System.Text.RegularExpressions.MatchCollection = Regex.Matches(TestString)
        For Each Match As System.Text.RegularExpressions.Match In MyMatches
            MessageBox.Show(Match.Value) 'shows each IP address
        Next
    Replace TestString with your HTML string.

  4. #4

    Thread Starter
    Addicted Member

    Re: How to Extract this text from html

    Many thanks, all, and thanks again to gigemboy; that was really nice code.
    But when I run it against the current page:
    VB Code:
        Dim wc As New Net.WebClient
        Dim sSource As String = ""
        Dim reader As New IO.StreamReader(wc.OpenRead("http://www.vbforums.com/showthread.php?t=394163"))
        sSource = reader.ReadToEnd
        Dim Regex As New System.Text.RegularExpressions.Regex("\d*.\d*.\d*.\d*:\d*")
        Dim MyMatches As System.Text.RegularExpressions.MatchCollection = Regex.Matches(sSource)
        For Each Match As System.Text.RegularExpressions.Match In MyMatches
            MessageBox.Show(Match.Value) 'shows each IP address
        Next

    I got message boxes for strings like these:

        htt:
        bst:
        ssc:

    So how can I fix this?


    Edit: is there a way to check whether a string is a proxy?
    Last edited by _Conan_; Mar 20th, 2006 at 08:33 PM.
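Checking whether a proxy actually works means connecting through it, but checking whether a matched string is at least a well-formed IPv4 address:port pair can be done offline. The sketch below is an illustration, not code from this thread: ProxyFormat, IsWellFormedProxy and IsNumberInRange are hypothetical names, and it assumes plain dotted-decimal IPv4 addresses (no host names, no IPv6).

```vbnet
Imports System

' Hypothetical helper (not from the thread). It only checks the FORMAT of an
' IPv4 "address:port" string; it never contacts the host, so it cannot tell
' whether a working proxy is listening there.
Public Module ProxyFormat

    Public Function IsWellFormedProxy(ByVal candidate As String) As Boolean
        If candidate Is Nothing Then Return False

        ' Expect exactly one colon: "a.b.c.d" before it, the port after it.
        Dim parts As String() = candidate.Trim().Split(":"c)
        If parts.Length <> 2 Then Return False

        ' The address must be exactly four dot-separated octets, each 0-255.
        Dim octets As String() = parts(0).Split("."c)
        If octets.Length <> 4 Then Return False
        For Each octet As String In octets
            If Not IsNumberInRange(octet, 3, 0, 255) Then Return False
        Next

        ' TCP ports run from 1 to 65535.
        Return IsNumberInRange(parts(1), 5, 1, 65535)
    End Function

    ' True if text is 1..maxDigits ASCII digits whose value lies between min and max.
    Private Function IsNumberInRange(ByVal text As String, ByVal maxDigits As Integer, _
                                     ByVal min As Integer, ByVal max As Integer) As Boolean
        If text.Length = 0 OrElse text.Length > maxDigits Then Return False
        For Each ch As Char In text
            If ch < "0"c OrElse ch > "9"c Then Return False
        Next
        Dim value As Integer = Integer.Parse(text)
        Return value >= min AndAlso value <= max
    End Function

End Module
```

For example, IsWellFormedProxy("222.92.45.228:20861") returns True, while "htt:", "02/11/27 03:06" and "1.2.3.4:70000" all return False.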

  5. #5

    Thread Starter
    Addicted Member

    Re: How to Extract this text from html

    I tried this and it works (not bad), but I'm sure it's not a good solution:

    VB Code:
        Dim wc As New Net.WebClient
        Dim sSource As String = ""
        Dim reader As New IO.StreamReader(wc.OpenRead("http://www.vbforums.com/showthread.php?t=394163"))
        sSource = reader.ReadToEnd
        Dim Regex As New System.Text.RegularExpressions.Regex("\d*.\d*.\d*.\d*:\d*")
        Dim MyMatches As System.Text.RegularExpressions.MatchCollection = Regex.Matches(sSource)
        For Each Match As System.Text.RegularExpressions.Match In MyMatches
            If Not Match.Value.Length < 15 Then
                MsgBox(Match.Value)
            End If
        Next

    Can anyone help?

  6. #6

    Thread Starter
    Addicted Member

    Re: How to Extract this text from html

    Finally I did it. Can anyone tell me whether this is right or not?

    VB Code:
        Dim wc As New Net.WebClient
        Dim sSource As String = ""
        Dim reader As New IO.StreamReader(wc.OpenRead("http://www.vbforums.com/showthread.php?t=394163"))
        sSource = reader.ReadToEnd
        Dim Regex As New System.Text.RegularExpressions.Regex("[0-9]+.[0-9]+.[0-9]+.[0-9]+:[0-9]+")
        Dim MyMatches As System.Text.RegularExpressions.MatchCollection = Regex.Matches(sSource)
        For Each Match As System.Text.RegularExpressions.Match In MyMatches
            'If Not Match.Value.Length < 15 Then
            MsgBox(Match.Value)
            'End If
        Next

  7. #7

    Thread Starter
    Addicted Member

    Re: [RESOLVED] How to Extract this text from html

    LOL... when I tried the code on a few sites, I got back results like this:


    12.150.244.9:8080
    216.50.61.168:8000
    216.120.43.88:8080
    213.25.170.98:8080
    213.162.13.82:8080
    02/11/27 03:06
    02/11/27 03:07
    02/11/27 03:07
    02/11/27 03:10
    02/11/27 03:14
    02/11/27 03:15
    02/11/27 03:15
    02/11/27 03:16
    02/11/27 03:16
    02/11/27 03:19
    02/11/27 03:21
    02/11/27 03:21
    02/11/27 03:22
    02/11/27 03:23
    02/11/27 03:24
    02/11/27 03:25
    02/11/27 03:26
    02/11/27 03:26
    02/11/27 03:27
    02/11/27 03:35
    02/11/27 03:35
    02/11/27 03:36
    02/11/27 03:37
    02/11/27 03:39
    02/11/27 03:41
    02/11/27 03:43
    03-19-2006 11:50
    02-22-2006 10:18
    03-18-2006 12:04
    03-18-2006 05:13
    03-01-2006_20:04
    16-11-2005_10:27

    So, can anyone help?

  8. #8
    PowerPoster

    Re: [Un-RESOLVED] How to Extract this text from html

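    My original post worked for my example, but when I tested it on this page, I saw that it didn't. An unescaped period in a regex matches any character, so the pattern also matched things like "htt:". All that was needed was a backslash in front of each period, like below: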
    VB Code:
        Dim wc As New Net.WebClient
        Dim reader As New IO.StreamReader(wc.OpenRead("http://www.vbforums.com/showthread.php?t=394163"))
        Dim sSource As String = reader.ReadToEnd
        reader.Close() 'also closes the stream opened by OpenRead
        Dim Regex As New System.Text.RegularExpressions.Regex("\d*\.\d*\.\d*\.\d*:\d*")
        Dim MyMatches As System.Text.RegularExpressions.MatchCollection = Regex.Matches(sSource)
        For Each Match As System.Text.RegularExpressions.Match In MyMatches
            MessageBox.Show(Match.Value) 'shows each IP:port match
        Next
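The \d* parts can still match zero digits, so even the escaped pattern would accept a bare "...:". A minimal sketch of a stricter pattern, assuming the proxies always appear as a dotted-decimal IPv4 address plus a numeric port. ProxyExtractor and ExtractProxies are hypothetical names, and ArrayList is used so the code should also compile on .NET 1.1:

```vbnet
Imports System.Collections
Imports System.Text.RegularExpressions

' Sketch of a stricter pattern (an assumption, not the code from this thread):
'   \b                  start at a word boundary, so a match never starts mid-number
'   (?:\d{1,3}\.){3}    three groups of 1-3 digits, each followed by a LITERAL dot
'   \d{1,3}             the fourth group of 1-3 digits
'   :\d{1,5}            a colon followed by 1-5 port digits
'   \b                  end at a word boundary, so "1.2.3.4:123456" is not cut short
Public Module ProxyExtractor

    Private ReadOnly ProxyPattern As New Regex("\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b")

    ' Returns each distinct "address:port" string in the HTML, in order of first appearance.
    Public Function ExtractProxies(ByVal html As String) As ArrayList
        Dim found As New ArrayList()
        For Each m As Match In ProxyPattern.Matches(html)
            If Not found.Contains(m.Value) Then
                found.Add(m.Value)
            End If
        Next
        Return found
    End Function

End Module
```

Dates such as "02/11/27 03:06" and "03-19-2006 11:50" no longer match because they contain no literal dots. Note that \d{1,3} still accepts out-of-range values such as 999, so if that matters, check each number separately after matching.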

  9. #9

    Thread Starter
    Addicted Member

    Re: [Un-RESOLVED] How to Extract this text from html

    Many thanks, dude.

    Edit:
    Dude, is there a way to use a proxy with WebClient?

  10. #10
    PowerPoster

    Re: [RESOLVED] How to Extract this text from html

    I actually made a post about the proxy question a while back and was able to find it:

    http://www.vbforums.com/showthread.php?t=378043

    The guy never replied to say whether it actually worked, however...

  11. #11

    Thread Starter
    Addicted Member

    Re: [RESOLVED] How to Extract this text from html

    I tried, but it doesn't work. I think the only way is moving to VS.NET 2005, because I read somewhere that WebClient has a Proxy property.
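You may not need VS 2005 for this. As far as I know, HttpWebRequest already exposes a Proxy property in .NET 1.x, while WebClient only gained its own Proxy property in .NET 2.0. A sketch of both, assuming proxyHost and proxyPort come from one of the scraped address:port pairs; ProxyDownload and its functions are hypothetical names, and nothing here checks that the proxy actually works:

```vbnet
Imports System.IO
Imports System.Net

' Sketch only: proxyHost/proxyPort are placeholders for a scraped "address:port" pair.
Public Module ProxyDownload

    ' .NET 1.x and later: HttpWebRequest has its own Proxy property.
    Public Function CreateProxiedRequest(ByVal url As String, ByVal proxyHost As String, _
                                         ByVal proxyPort As Integer) As HttpWebRequest
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Proxy = New WebProxy(proxyHost, proxyPort)
        Return request
    End Function

    ' Sends the request through the proxy and reads the whole response body as text.
    Public Function DownloadThroughProxy(ByVal url As String, ByVal proxyHost As String, _
                                         ByVal proxyPort As Integer) As String
        Dim request As HttpWebRequest = CreateProxiedRequest(url, proxyHost, proxyPort)
        Dim response As WebResponse = request.GetResponse()
        Try
            Dim reader As New StreamReader(response.GetResponseStream())
            Return reader.ReadToEnd()
        Finally
            response.Close() ' releases the connection and the response stream
        End Try
    End Function

    ' .NET 2.0 and later (VS 2005): WebClient exposes a Proxy property as well.
    Public Function CreateProxiedWebClient(ByVal proxyHost As String, ByVal proxyPort As Integer) As WebClient
        Dim wc As New WebClient()
        wc.Proxy = New WebProxy(proxyHost, proxyPort)
        Return wc
    End Function

End Module
```

Usage would be something like DownloadThroughProxy("http://www.vbforums.com/", "222.92.45.228", 20861); a dead proxy will typically surface as a WebException from GetResponse, so wrap the call in Try/Catch when looping over a scraped list.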

    Many thanks, Gigem. I tried to rate your post but got:

    You must spread some Reputation around before giving it to gigemboy again.

    Cheers.

  12. #12
    PowerPoster

    Re: [RESOLVED] How to Extract this text from html

    Well, now that I look at it, I think that example just uses the proxy settings configured in Internet Explorer...

  13. #13

    Thread Starter
    Addicted Member

    Re: [RESOLVED] How to Extract this text from html

    The way I did it was to change the Internet Explorer proxy setting and reset it when the function finished, and it works well.
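Changing Internet Explorer's settings affects every program on the machine. On .NET 2.0 or later, a narrower alternative is to swap only this process's default proxy (WebRequest.DefaultWebProxy) and restore it in a Finally block; requests that don't set their own Proxy should pick it up. On 1.x the rough counterpart was, as far as I know, GlobalProxySelection.Select. A sketch; DefaultProxySwap, RunWithDefaultProxy and the Work delegate are hypothetical names:

```vbnet
Imports System.Net

' Sketch for .NET 2.0 or later (WebRequest.DefaultWebProxy does not exist in 1.x).
' Only this process is affected; Internet Explorer's own settings are untouched.
Public Module DefaultProxySwap

    ' Hypothetical delegate type for "the work to do while the proxy is active",
    ' e.g. a function that downloads a page and returns its HTML.
    Public Delegate Function Work() As String

    Public Function RunWithDefaultProxy(ByVal proxy As IWebProxy, ByVal job As Work) As String
        Dim previous As IWebProxy = WebRequest.DefaultWebProxy
        WebRequest.DefaultWebProxy = proxy
        Try
            Return job.Invoke()
        Finally
            ' Put the old proxy back even if job throws, like resetting the IE setting afterwards.
            WebRequest.DefaultWebProxy = previous
        End Try
    End Function

End Module
```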

    But Mr. Gigemboy, is there a way to extract only the hyperlinks from the source of the page?
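A regular expression can pull out the href values, with the usual caveat that a regex is not an HTML parser: comments, script code or unusual markup can confuse it. A rough sketch, assuming links appear as href="...", href='...' or an unquoted href=...; LinkExtractor and ExtractLinks are hypothetical names:

```vbnet
Imports System.Collections
Imports System.Text.RegularExpressions

' Rough sketch: pulls the value of every href attribute out of raw HTML.
Public Module LinkExtractor

    ' "href", optional spaces, "=", optional spaces, then one of:
    '   "double-quoted value" | 'single-quoted value' | unquoted value up to space or ">"
    ' .NET allows the same group name ("url") in each alternative.
    Private ReadOnly HrefPattern As New Regex( _
        "href\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))", _
        RegexOptions.IgnoreCase)

    ' Returns every href value in document order (duplicates included).
    Public Function ExtractLinks(ByVal html As String) As ArrayList
        Dim links As New ArrayList()
        For Each m As Match In HrefPattern.Matches(html)
            links.Add(m.Groups("url").Value)
        Next
        Return links
    End Function

End Module
```

Relative links such as page2.html come back exactly as written; to turn them into full addresses you could combine them with the page address, for example New Uri(New Uri(pageUrl), link).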
