
Thread: [RESOLVED] [2005] Reading Webpages

  1. #1

    Thread Starter
    Fanatic Member Jumpercables's Avatar
    Join Date
    Jul 2005
    Location
    Colorado
    Posts
    592

    [RESOLVED] [2005] Reading Webpages

    Is there any way, given a website URL, to read the webpage source (i.e. the HTML and any other code) and report back how many times each word was used?

    C# - .NET 1.1 / .NET 2.0

    "Take everything I say with a grain of salt, sometimes I'm right, sometimes I'm wrong but in the end we've both learned something."
    _____________________
    Regular Expressions Library
    Connection String
    API Functions
    Database FAQ & Tutorial

  2. #2
    Hyperactive Member
    Join Date
    Oct 2005
    Posts
    257

    Re: [2005] Reading Webpages

    I'm not entirely sure about the downloading part, but once the page is downloaded you should be able to open the HTML file, read the whole thing in with a StreamReader, and then use InStr (I think) or one of those types of functions to test how many times the word occurs.
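
    A minimal sketch of the StreamReader plus InStr idea above. It assumes the page has already been saved locally; the file name "page.html", the search word, and the helper name CountOccurrences are placeholders made up for illustration. Note that InStr matches substrings, so searching for "the" also counts "other".

    ```vbnet
    Imports System
    Imports System.IO
    Imports Microsoft.VisualBasic

    Module CountWordSketch
        ' Counts non-overlapping, case-insensitive occurrences of word in text.
        Function CountOccurrences(ByVal text As String, ByVal word As String) As Integer
            If String.IsNullOrEmpty(word) Then Return 0
            Dim count As Integer = 0
            ' InStr is 1-based and returns 0 when there is no further match.
            Dim position As Integer = InStr(1, text, word, CompareMethod.Text)
            While position > 0
                count += 1
                ' Resume the search just past the match that was found.
                position = InStr(position + word.Length, text, word, CompareMethod.Text)
            End While
            Return count
        End Function

        Sub Main()
            ' "page.html" is a hypothetical path to the already-downloaded page.
            Dim html As String
            Using reader As New StreamReader("page.html")
                html = reader.ReadToEnd()
            End Using
            Console.WriteLine(CountOccurrences(html, "the"))
        End Sub
    End Module
    ```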
    I tried to end process on Visual Studio 2005
    but PETA stopped me saying it's smart enough
    to be a living creature

  3. #3
    "The" RedHeadedLefty
    Join Date
    Aug 2005
    Location
    College Station, TX Preferred Nickname: Gig Current Mood: Just Peachy Turnons: String Manipulation
    Posts
    4,495

    Re: [2005] Reading Webpages

    Use a combination of reading the HTML into a string and then using a regular expression to return the matches of the text you want to find. I replied a while back to a similar question, and both parts can be found in this thread:

    http://www.vbforums.com/showthread.php?t=367537

    ***EDIT - You are probably better off using a regex MatchCollection (not shown in that thread). Then all you need to do to get the number of instances of the word is read the Count of the MatchCollection.

    An example of using a matchcollection: http://www.vbforums.com/showthread.php?t=379735
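
    For reference, a rough sketch of the MatchCollection approach. This is not the code from the linked threads. It assumes WebClient.DownloadString (.NET 2.0) for fetching the page, uses http://www.example.com/ as a placeholder URL, and the helper name CountWords is made up. Because it runs on the raw source, tag and attribute names are counted as words too.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Net
    Imports System.Text.RegularExpressions

    Module WordFrequencySketch
        ' Builds a word -> count table for the given text, ignoring case.
        Function CountWords(ByVal text As String) As Dictionary(Of String, Integer)
            Dim counts As New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
            ' \w+ treats each run of letters, digits, or underscores as one word.
            Dim matches As MatchCollection = Regex.Matches(text, "\w+")
            For Each m As Match In matches
                If counts.ContainsKey(m.Value) Then
                    counts(m.Value) += 1
                Else
                    counts(m.Value) = 1
                End If
            Next
            Return counts
        End Function

        Sub Main()
            ' Placeholder URL; substitute the page you want to read.
            Dim html As String
            Using client As New WebClient()
                html = client.DownloadString("http://www.example.com/")
            End Using

            ' Count of one specific word: just the Count of the MatchCollection.
            Dim single As MatchCollection = Regex.Matches(html, "\bhtml\b", RegexOptions.IgnoreCase)
            Console.WriteLine("html: " & single.Count)

            ' Count of every word found in the source.
            For Each pair As KeyValuePair(Of String, Integer) In CountWords(html)
                Console.WriteLine("{0}: {1}", pair.Key, pair.Value)
            Next
        End Sub
    End Module
    ```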

  4. #4
    Raging swede Atheist's Avatar
    Join Date
    Aug 2005
    Location
    Sweden
    Posts
    8,020

    Re: [2005] Reading Webpages

    When I did something like this a while back, I downloaded the whole HTML file, read it into a string, and then deleted the file. It might be a dodgy solution, but it worked and it was surprisingly fast too.
    Let me know if you'd like to know more.
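
    A sketch of how the download, read, delete sequence described above might look. This is an assumption about the approach, not the poster's actual code; the URL and the helper name ReadPageViaTempFile are placeholders.

    ```vbnet
    Imports System
    Imports System.IO
    Imports System.Net

    Module DownloadToTempFileSketch
        ' Downloads url to a temporary file, reads it into a string,
        ' and deletes the file again even if something goes wrong.
        Function ReadPageViaTempFile(ByVal url As String) As String
            Dim tempPath As String = Path.GetTempFileName()
            Try
                Using client As New WebClient()
                    client.DownloadFile(url, tempPath)
                End Using
                Return File.ReadAllText(tempPath)
            Finally
                File.Delete(tempPath)
            End Try
        End Function

        Sub Main()
            ' Placeholder URL; substitute the page you want to read.
            Dim html As String = ReadPageViaTempFile("http://www.example.com/")
            Console.WriteLine(html.Length)
        End Sub
    End Module
    ```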

  5. #5

    Thread Starter
    Fanatic Member Jumpercables's Avatar
    Join Date
    Jul 2005
    Location
    Colorado
    Posts
    592

    Re: [2005] Reading Webpages

    I see, thanks for the advice. That leads me to my next question: how can I download the HTML file from the website?


  6. #6
    Frenzied Member the182guy's Avatar
    Join Date
    Nov 2005
    Location
    Cheshire, UK
    Posts
    1,473

    Re: [2005] Reading Webpages

    Do it the way web browsers do it: have a look at the TcpClient class. You just need to connect the socket to the website's domain (e.g. google.com) on port 80, the HTTP port. Once connected, send
    Code:
    "GET /folder/file.html HTTP/1.1" & vbCrLf & "Host: google.com" & vbCrLf & vbCrLf
    HTTP/1.1 requires the Host line, so include it. The server will return the HTML along with an HTTP header, so you need to cut off the header, and you will have the pure HTML just as a web browser would.
    Chris
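
    A rough sketch of the raw-socket approach described above, with some assumptions stated up front. It sends an HTTP/1.0 request with "Connection: close" rather than HTTP/1.1, so the server closes the connection when done and does not use chunked encoding, which a plain "cut off the header" step cannot handle. The host, the path, UTF-8 decoding, and the helper names FetchRaw and StripHeader are placeholders, and redirects and HTTPS are not handled.

    ```vbnet
    Imports System
    Imports System.IO
    Imports System.Net.Sockets
    Imports System.Text
    Imports Microsoft.VisualBasic

    Module RawHttpGetSketch
        ' Returns everything after the blank line that ends the HTTP header.
        Function StripHeader(ByVal response As String) As String
            Dim separator As String = vbCrLf & vbCrLf
            Dim headerEnd As Integer = response.IndexOf(separator, StringComparison.Ordinal)
            If headerEnd < 0 Then Return response
            Return response.Substring(headerEnd + separator.Length)
        End Function

        ' Sends a bare GET request over TCP port 80 and returns the raw response.
        Function FetchRaw(ByVal host As String, ByVal path As String) As String
            Dim client As New TcpClient(host, 80)
            Try
                Dim stream As NetworkStream = client.GetStream()
                Dim request As String = "GET " & path & " HTTP/1.0" & vbCrLf & _
                                        "Host: " & host & vbCrLf & _
                                        "Connection: close" & vbCrLf & vbCrLf
                Dim requestBytes As Byte() = Encoding.ASCII.GetBytes(request)
                stream.Write(requestBytes, 0, requestBytes.Length)
                ' Read until the server closes the connection.
                Dim reader As New StreamReader(stream, Encoding.UTF8)
                Return reader.ReadToEnd()
            Finally
                client.Close()
            End Try
        End Function

        Sub Main()
            ' Placeholder host and path.
            Dim raw As String = FetchRaw("www.example.com", "/")
            Console.WriteLine(StripHeader(raw))
        End Sub
    End Module
    ```

    For everyday use, WebClient or HttpWebRequest deals with headers, redirects, and encodings for you; the socket version is mainly useful for seeing what goes over the wire.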

  7. #7
    "The" RedHeadedLefty
    Join Date
    Aug 2005
    Location
    College Station, TX Preferred Nickname: Gig Current Mood: Just Peachy Turnons: String Manipulation
    Posts
    4,495

    Re: [2005] Reading Webpages

    Quote Originally Posted by Jumpercables
    I see, thanks for the advice. That leads me to my next question: how can I download the HTML file from the website?
    Did you ignore my post? There is a searchpage function in the first link that reads the HTML into a string, and the second link is an example of using regex on a string to return matches.

  8. #8

    Thread Starter
    Fanatic Member Jumpercables's Avatar
    Join Date
    Jul 2005
    Location
    Colorado
    Posts
    592

    Re: [2005] Reading Webpages

    I must have overlooked it, sorry gigemboy. Thanks! I will get started.

