
Thread: getting html through winsock

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2005
    Posts
    1,168

    getting html through winsock

    I am trying to use Winsock to get an HTML page. I'm able to get something with the code below, but I'm receiving header information as well. How do I do this properly?

    Code:
       'ws (winsock object) is declared and instantiated outside as a global
            Dim bt() As Byte 
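            'HTTP line endings are CR LF (vbCrLf, not LF CR), and HTTP/1.1 requires a Host header.
            '"www.example.com" is a placeholder for the server ws is connected to.
            bt = System.Text.ASCIIEncoding.ASCII.GetBytes("GET / HTTP/1.1" & vbCrLf & "Host: www.example.com" & vbCrLf & vbCrLf)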
            Dim x As Integer = 1
    
            ws.SendData(bt)
    Note: I know other ways to get web pages; I just want to learn how to do it using Winsock.
    Last edited by benmartin101; Mar 6th, 2007 at 05:51 PM.

  2. #2
    Super Moderator manavo11's Avatar
    Join Date
    Nov 2002
    Location
    Around the corner from si_the_geek
    Posts
    7,171

    Re: getting html through winsock

    By headers, do you mean the stylesheets and the <head> tag? Do you get only that, or that plus the rest of the HTML? Shouldn't you expect the whole document rather than just the <body> tag?



  3. #3
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: getting html through winsock

    You'll always get at least a few headers. You just need to trim them off (making use of the important ones as you go).
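
    A minimal sketch of that trimming in classic VB, assuming the whole response has already been collected into one string. The header block ends at the first blank line (CR LF CR LF). SplitResponse and its parameter names are made up for illustration.

    ```vb
    'Splits a raw HTTP response into its header block and its body.
    'SplitResponse, rawText, headerText and bodyText are illustrative names only.
    Public Function SplitResponse(ByVal rawText As String, _
                                  ByRef headerText As String, _
                                  ByRef bodyText As String) As Boolean
        Dim pos As Long
        'The headers end at the first empty line, i.e. two CR LF pairs in a row
        pos = InStr(rawText, vbCrLf & vbCrLf)
        If pos = 0 Then Exit Function        'blank line not received yet; returns False
        headerText = Left$(rawText, pos - 1)
        bodyText = Mid$(rawText, pos + 4)    'skip the four CR LF CR LF characters
        SplitResponse = True
    End Function
    ```

    The status line and headers stay available in headerText, so the important ones (Content-Type, Content-Length and so on) can still be read before the body is used.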

  4. #4

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2005
    Posts
    1,168

    Re: getting html through winsock

    They look like headers. I think they're headers, and the body is there as well. I'll try just removing them and see how it goes. Thanks.

  5. #5
    Super Moderator manavo11's Avatar
    Join Date
    Nov 2002
    Location
    Around the corner from si_the_geek
    Posts
    7,171

    Re: getting html through winsock

    Can you post an example of the data you receive?



  6. #6
    Frenzied Member
    Join Date
    Aug 2000
    Location
    O!
    Posts
    1,177

    Re: getting html through winsock

    Quote Originally Posted by benmartin101
    They look like headers. I think they're headers, and the body is there as well. I'll try just removing them and see how it goes. Thanks.
    Is this what you are looking for?

    If getHTML is a string containing everything returned by a POST or GET, then
    Code:
      Str = Mid$(getHTML, InStr(getHTML, "<html"))
    will strip the header info.

    You will of course have to take into consideration the fact that "html" may be capitalized. And if you want other info such as the DOCTYPE then the InStr code will have to change accordingly.
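
    A hedged sketch of how the capitalization caveat could be handled in classic VB. StripBeforeHtml is an illustrative name. It also guards the case where no "<html" tag is found, because InStr then returns 0 and Mid$ does not accept a start position of 0.

    ```vb
    'Returns getHTML from the first "<html" tag onward, whatever its capitalization.
    'StripBeforeHtml is an illustrative name only.
    Public Function StripBeforeHtml(ByVal getHTML As String) As String
        Dim pos As Long
        'vbTextCompare makes the search case-insensitive: "<html", "<HTML", "<Html"
        pos = InStr(1, getHTML, "<html", vbTextCompare)
        If pos > 0 Then
            StripBeforeHtml = Mid$(getHTML, pos)
        Else
            'No tag found: return the text unchanged instead of calling Mid$ with 0
            StripBeforeHtml = getHTML
        End If
    End Function
    ```

    To keep the DOCTYPE as well, the search string would be "<!DOCTYPE" instead of "<html".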

  7. #7
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: getting html through winsock

    You'll also have to consider that, while it is technically "improper", some servers will return garbage after the end of the valid content. These servers generally expect the user agent (your HTTP client) to respect the Content-Length or Transfer-Encoding header.
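
    As a rough illustration of respecting Content-Length in classic VB, the sketch below pulls the value out of a header block that has already been separated from the body. GetContentLength and its variable names are made up for this example, and it does not handle Transfer-Encoding: chunked at all.

    ```vb
    'Returns the Content-Length value from a header block, or -1 if it is absent.
    'GetContentLength and headerText are illustrative names only.
    Public Function GetContentLength(ByVal headerText As String) As Long
        Dim headerLines() As String
        Dim i As Long
        Dim colonPos As Long
        GetContentLength = -1
        headerLines = Split(headerText, vbCrLf)
        For i = LBound(headerLines) To UBound(headerLines)
            colonPos = InStr(headerLines(i), ":")
            If colonPos > 0 Then
                'Header names are case-insensitive, so compare with vbTextCompare
                If StrComp(Trim$(Left$(headerLines(i), colonPos - 1)), _
                           "Content-Length", vbTextCompare) = 0 Then
                    GetContentLength = CLng(Val(Mid$(headerLines(i), colonPos + 1)))
                    Exit Function
                End If
            End If
        Next i
    End Function
    ```

    If the result is 0 or more, something like bodyText = Left$(bodyText, n) would drop anything the server sent past the declared length. That assumes one character per byte, which holds for plain ASCII pages but not necessarily for multi-byte encodings, since Content-Length counts bytes.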

    In the most general terms you can't rely on <HTML> and </HTML> as delimiters either. A text, image, css, script, etc. file won't have these, and as suggested HTML data can have prefixes and even suffixes outside the page markup itself.

    In the end this is why rolling your own code to handle HTTP requests is usually a waste of time. The effort to create code that works better than "doesn't crash, most of the time" just isn't worth it.


    Sometimes of course the effort is useful in expanding your knowledge.

    http://www.ietf.org/rfc/rfc2616.txt
    Last edited by dilettante; Mar 9th, 2007 at 05:41 PM.

  8. #8
    Lively Member Nerd-Man's Avatar
    Join Date
    Dec 2006
    Location
    India
    Posts
    119

    Re: getting html through winsock

    I did that before to grab an HTML page using Winsock. Below is my code...

    Code:
    Private Sub Winsock1_Connect()
     Dim Chunks As String
     'Every header line ends in CR LF; the extra blank line ends the request
     Chunks = "GET /index.html" & " HTTP/1.1" & vbCrLf
     Chunks = Chunks & "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*" & vbCrLf
     Chunks = Chunks & "Accept-Language: en-us" & vbCrLf
     'No Accept-Encoding header is sent: this code cannot decompress a gzip or deflate body
     Chunks = Chunks & "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" & vbCrLf
     Chunks = Chunks & "Host: " & Server.Text & vbCrLf
     Chunks = Chunks & "Connection: Keep-Alive" & vbCrLf & vbCrLf
     Winsock1.SendData Chunks
    End Sub
    
    Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
     Dim Data As String
     Winsock1.GetData Data, vbString, bytesTotal
     'DataArrival can fire several times for one page, so append instead of overwriting
     HTMLSource.Text = HTMLSource.Text & Data
    End Sub
    I think that is how I got the source of an HTML page using Winsock.
