-
Mar 6th, 2007, 05:48 PM
#1
Thread Starter
Frenzied Member
getting html through winsock
I am trying to use Winsock to get an HTML page. I'm able to get something with the code below, but I'm receiving header information as well. How do I do this properly?
Code:
'ws (winsock object) is declared and instantiated outside as a global
Dim bt() As Byte
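'HTTP lines end with CR LF, i.e. Chr(13) & Chr(10), in that order; an HTTP/1.1 request is also expected to carry a Host header
bt = System.Text.ASCIIEncoding.ASCII.GetBytes("GET / HTTP/1.1" & Chr(13) & Chr(10) & Chr(13) & Chr(10))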
Dim x As Integer = 1
ws.SendData(bt)
Note: I know other ways to get webpages, I just want to learn how to do it using winsock.
Last edited by benmartin101; Mar 6th, 2007 at 05:51 PM.
-
Mar 6th, 2007, 05:53 PM
#2
-
Mar 6th, 2007, 06:47 PM
#3
Re: getting html through winsock
You'll always get at least a few headers. You just need to trim them off (making use of the important ones as you go).
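As a rough sketch of the trimming step: in HTTP the header block ends at the first blank line (CR LF CR LF), so splitting there separates the headers from the body whatever the content type is. `SplitResponse` and the sample response are made up for illustration, and the sketch assumes the whole response is already in one string.

```vb
Imports System
Imports Microsoft.VisualBasic

Module HeaderSplitSketch
    ' Hypothetical helper: splits a raw HTTP response at the first blank line.
    ' The header block is returned through "headers", the body as the result.
    Function SplitResponse(ByVal raw As String, ByRef headers As String) As String
        Dim marker As String = vbCrLf & vbCrLf
        Dim pos As Integer = raw.IndexOf(marker, StringComparison.Ordinal)
        If pos < 0 Then
            ' No blank line yet: the header block is incomplete, so there is no body.
            headers = raw
            Return ""
        End If
        headers = raw.Substring(0, pos)
        Return raw.Substring(pos + marker.Length)
    End Function

    Sub Main()
        ' Made-up response text, just to exercise the helper.
        Dim sample As String = "HTTP/1.1 200 OK" & vbCrLf &
                               "Content-Type: text/html" & vbCrLf & vbCrLf &
                               "<html></html>"
        Dim headers As String = ""
        Dim body As String = SplitResponse(sample, headers)
        Console.WriteLine(headers)
        Console.WriteLine("----")
        Console.WriteLine(body)
    End Sub
End Module
```

Keeping the header block around, rather than throwing it away, is what lets you make use of the important headers afterwards.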
-
Mar 7th, 2007, 06:31 PM
#4
Thread Starter
Frenzied Member
Re: getting html through winsock
They look like headers. I think they're headers and the body is there as well. I'll try just removing them and see how it goes. Thanks.
-
Mar 7th, 2007, 09:08 PM
#5
-
Mar 8th, 2007, 05:49 PM
#6
Frenzied Member
Re: getting html through winsock
 Originally Posted by benmartin101
They look like headers. I think they're headers and the body is there as well. I'll try just removing them and see how it goes. Thanks.
Is this what you are looking for?
If getHTML is a string containing everything returned by a POST or GET then
Code:
Str = Mid$(getHTML, InStr(getHTML, "<html"))
will strip the header info.
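You will, of course, have to allow for "html" being capitalized, for example by passing vbTextCompare to InStr. InStr also returns 0 when the text is not found, and Mid$ fails with a start position of 0, so check the result first. If you want other info, such as the DOCTYPE, the InStr search will have to change accordingly.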
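A minimal sketch of that idea in VB.NET, using the same InStr/Mid approach with a case-insensitive search and a guard for the not-found case. `FromHtmlTag` and the sample text are hypothetical names for illustration only.

```vb
Imports System
Imports Microsoft.VisualBasic

Module HtmlStartSketch
    ' Hypothetical helper: returns the text from the first "<html" onward,
    ' matching case-insensitively, or an empty string when the tag is absent.
    Function FromHtmlTag(ByVal getHTML As String) As String
        Dim pos As Integer = InStr(1, getHTML, "<html", CompareMethod.Text)
        If pos = 0 Then
            ' InStr returns 0 for "not found"; Mid would throw on a start of 0.
            Return ""
        End If
        Return Mid(getHTML, pos)
    End Function

    Sub Main()
        ' Made-up response with a capitalized tag.
        Dim sample As String = "HTTP/1.1 200 OK" & vbCrLf & vbCrLf &
                               "<HTML><body>hi</body></HTML>"
        Console.WriteLine(FromHtmlTag(sample))
    End Sub
End Module
```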
-
Mar 9th, 2007, 05:37 PM
#7
Re: getting html through winsock
You'll also have to consider that, while it is technically improper, some servers return garbage after the end of the valid content. These servers generally expect the user agent (your HTTP client) to respect the Content-Length or Transfer-Encoding header.
In the most general terms you can't rely on <HTML> and </HTML> as delimiters either. A text, image, css, script, etc. file won't have these, and as suggested HTML data can have prefixes and even suffixes outside the page markup itself.
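To illustrate respecting Content-Length, here is a hedged sketch that reads the value out of a header block and cuts the body at that length. `GetContentLength` and `TrimToLength` are hypothetical helpers, the sample data is invented, and the sketch assumes a single-byte encoding so that characters and bytes line up. It does not handle Transfer-Encoding: chunked, which needs its own decoding.

```vb
Imports System
Imports Microsoft.VisualBasic

Module ContentLengthSketch
    ' Hypothetical helper: returns the Content-Length value from a header block,
    ' or -1 when the header is missing or is not a non-negative number.
    Function GetContentLength(ByVal headers As String) As Integer
        Dim separators() As String = {vbCrLf}
        For Each headerLine As String In headers.Split(separators, StringSplitOptions.None)
            Dim colon As Integer = headerLine.IndexOf(":"c)
            If colon > 0 Then
                Dim fieldName As String = headerLine.Substring(0, colon).Trim()
                ' Header field names are case-insensitive.
                If String.Equals(fieldName, "Content-Length", StringComparison.OrdinalIgnoreCase) Then
                    Dim length As Integer
                    If Integer.TryParse(headerLine.Substring(colon + 1).Trim(), length) AndAlso length >= 0 Then
                        Return length
                    End If
                    Return -1
                End If
            End If
        Next
        Return -1
    End Function

    ' Hypothetical helper: drops anything past the declared length.
    ' Assumes one character per byte (for example plain ASCII content).
    Function TrimToLength(ByVal body As String, ByVal length As Integer) As String
        If length >= 0 AndAlso length < body.Length Then
            Return body.Substring(0, length)
        End If
        Return body
    End Function

    Sub Main()
        ' Made-up header block and a body with trailing junk.
        Dim headers As String = "HTTP/1.1 200 OK" & vbCrLf & "content-length: 13"
        Dim body As String = "<html></html>GARBAGE"
        Console.WriteLine(TrimToLength(body, GetContentLength(headers)))
    End Sub
End Module
```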
In the end, this is why rolling your own code to handle HTTP requests is usually a waste of time. The effort to create code that works better than "doesn't crash, most of the time" just isn't worth it.
Sometimes of course the effort is useful in expanding your knowledge.
http://www.ietf.org/rfc/rfc2616.txt
Last edited by dilettante; Mar 9th, 2007 at 05:41 PM.
-
Mar 10th, 2007, 09:54 AM
#8
Lively Member
Re: getting html through winsock
I did this before to grab an HTML page using Winsock. My code is below.
Code:
Private Sub Winsock1_Connect()
    Dim Chunks As String
    ' Every request line ends with CR LF; an empty line ends the header block
    Chunks = "GET /index.html HTTP/1.1" & vbCrLf
    Chunks = Chunks & "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*" & vbCrLf
    Chunks = Chunks & "Accept-Language: en-us" & vbCrLf
    ' Advertising gzip/deflate means the body may arrive compressed and would then need decoding
    Chunks = Chunks & "Accept-Encoding: gzip, deflate" & vbCrLf
    Chunks = Chunks & "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" & vbCrLf
    Chunks = Chunks & "Host: " & Server.Text & vbCrLf
    Chunks = Chunks & "Connection: Keep-Alive" & vbCrLf & vbCrLf
    Winsock1.SendData Chunks
End Sub
Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
    Dim Data As String
    Winsock1.GetData Data, vbString, bytesTotal
    ' DataArrival can fire several times per response, so append instead of overwriting
    HTMLSource.Text = HTMLSource.Text & Data
End Sub
I think that is how I got the source of an HTML page using Winsock.