Hello,
I am using the InternetReadFile API to download the source code of a web page. I have set the buffer size to 1024, and I write each downloaded line to a text file. The problem is that the result is not the same on every run of the program. Sometimes it works fine. Sometimes extra characters are added. Sometimes part of a line is written to the file more than once. Is this expected, or am I doing something wrong here?
Thank you.
Last edited by srisa; May 25th, 2006 at 10:00 AM.
Reason: marking the thread resolved
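The symptoms described in the question (stray characters, parts of lines written twice, output that changes from run to run) are what you get from a chunked read if the whole buffer is written after every call instead of only the bytes actually read. The sketch below is a hypothetical offline simulation, not real WinINet code: `SimulateRead` is a made-up name that copies a string into one reused buffer `BufSize` characters at a time, the way `InternetReadFile` fills `sBuffer`. It then either writes the whole buffer plus a CRLF (what `Print #1, sBuffer` does) or keeps only the characters actually read.

```vb
' Hypothetical offline simulation of a chunked read into one reused buffer.
' Naive = True mimics "Print #1, sBuffer": the whole buffer plus a CRLF per chunk.
' Naive = False keeps only the n characters actually "read" in each call.
Private Function SimulateRead(ByVal Source As String, ByVal BufSize As Long, _
                              ByVal Naive As Boolean) As String
    Dim buf As String, chunk As String, pos As Long, n As Long, result As String
    buf = Space$(BufSize)
    pos = 1
    Do
        chunk = Mid$(Source, pos, BufSize)   ' "" once pos is past the end
        n = Len(chunk)
        If n = 0 Then Exit Do                ' like InternetReadFile reporting 0 bytes read
        Mid$(buf, 1, n) = chunk              ' only the first n characters are overwritten
        pos = pos + n
        If Naive Then
            result = result & buf & vbCrLf   ' stale tail of the previous chunk leaks in
        Else
            result = result & Left$(buf, n)  ' only the valid part of the buffer
        End If
    Loop
    SimulateRead = result
End Function
```

With the source "ABCDEFG" and a 3-character buffer, the naive version returns "ABC", "DEF" and "GEF" on separate lines ("EF" is repeated and line breaks appear that were never in the source), while the other version returns "ABCDEFG". Since a real network read can return fewer bytes than requested, where those stale tails show up can vary between runs.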
Is there a reason why you aren't using the Inet Control? It's a lot more reliable.
KAZAR
The Law Of Programming:
As the Number of Lines of code increases, the number of bugs generated by fixing a bug increases exponentially.
__________________________________ www.startingqbasic.co.uk
I am designing the application for Windows. If Windows is the OS being used, we can expect Internet Explorer to be installed, and where IE is installed, wininet.dll will be present on the user's system. If the application is downloaded from the net, using the APIs will reduce the size of the application, since the Inet control (msinet.ocx) doesn't have to be shipped with it.
Sounds pretty hi-fi.
Well, the real answer is that I am learning VB. As part of that I am trying to use APIs, and that's where InternetReadFile comes in.
The function below should download the entire file to a file of your choice, no matter what buffer size you choose.
VB Code:
Option Explicit
Const INTERNET_OPEN_TYPE_DIRECT = 1
Const INTERNET_OPEN_TYPE_PROXY = 3
Const INTERNET_FLAG_RELOAD = &H80000000
Private Declare Function InternetOpen Lib "wininet" Alias "InternetOpenA" (ByVal sAgent As String, ByVal lAccessType As Long, ByVal sProxyName As String, ByVal sProxyBypass As String, ByVal lFlags As Long) As Long
Private Declare Function InternetCloseHandle Lib "wininet" (ByVal hInet As Long) As Integer
Private Declare Function InternetReadFile Lib "wininet" (ByVal hFile As Long, ByVal sBuffer As String, ByVal lNumBytesToRead As Long, lNumberOfBytesRead As Long) As Integer
Private Declare Function InternetOpenUrl Lib "wininet" Alias "InternetOpenUrlA" (ByVal hInternetSession As Long, ByVal lpszUrl As String, ByVal lpszHeaders As String, ByVal dwHeadersLength As Long, ByVal dwFlags As Long, ByVal dwContext As Long) As Long
Private Sub GetFile(sourcefile As String, destfile As String, buffersize As Long)
    Dim hOpen As Long, hFile As Long, sBuffer As String, Ret As Long, fnum As Integer
    'Create a buffer for the file we're going to download
    sBuffer = Space$(buffersize)
    hOpen = InternetOpen(App.Title, INTERNET_OPEN_TYPE_DIRECT, vbNullString, vbNullString, 0)
    hFile = InternetOpenUrl(hOpen, sourcefile, vbNullString, 0, INTERNET_FLAG_RELOAD, 0)
    fnum = FreeFile
    Open destfile For Output As #fnum
    Do
        If InternetReadFile(hFile, sBuffer, buffersize, Ret) = 0 Then Exit Do 'read failed
        If Ret = 0 Then Exit Do 'success with 0 bytes read = end of data
        'Write only the Ret bytes actually read; the trailing ";" stops Print # adding a CRLF
        Print #fnum, Left$(sBuffer, Ret);
    Loop
    Close #fnum
    InternetCloseHandle hFile
    InternetCloseHandle hOpen
End Sub
Thank you. The function as such is working fine. But how do I check for errors during the download, such as a connection timeout or a network disconnection? During one of the trial runs the session timed out, and the result was an incomplete download.
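One way to catch the failures asked about here, sketched under assumptions: WinINet lets you set timeouts on the session handle with `InternetSetOption`, and `InternetReadFile` returns 0 when a read fails, after which `Err.LastDllError` holds the Windows error code (for example 12002, ERROR_INTERNET_TIMEOUT). `DownloadChecked`, the 30-second timeouts and the 1024-byte buffer are hypothetical choices. The declarations overlap with the ones in the posted code, so this belongs in its own standard module. Some WinINet versions are reported to ignore certain timeout settings, so treat it as a starting point rather than a guarantee.

```vb
' Sketch: download into a string with explicit timeouts and error reporting.
Private Const INTERNET_OPEN_TYPE_DIRECT As Long = 1
Private Const INTERNET_FLAG_RELOAD As Long = &H80000000
Private Const INTERNET_OPTION_CONNECT_TIMEOUT As Long = 2
Private Const INTERNET_OPTION_RECEIVE_TIMEOUT As Long = 6
Private Declare Function InternetOpen Lib "wininet" Alias "InternetOpenA" (ByVal sAgent As String, ByVal lAccessType As Long, ByVal sProxyName As String, ByVal sProxyBypass As String, ByVal lFlags As Long) As Long
Private Declare Function InternetOpenUrl Lib "wininet" Alias "InternetOpenUrlA" (ByVal hInternetSession As Long, ByVal lpszUrl As String, ByVal lpszHeaders As String, ByVal dwHeadersLength As Long, ByVal dwFlags As Long, ByVal dwContext As Long) As Long
Private Declare Function InternetReadFile Lib "wininet" (ByVal hFile As Long, ByVal sBuffer As String, ByVal lNumBytesToRead As Long, lNumberOfBytesRead As Long) As Integer
Private Declare Function InternetCloseHandle Lib "wininet" (ByVal hInet As Long) As Integer
Private Declare Function InternetSetOption Lib "wininet" Alias "InternetSetOptionA" (ByVal hInternet As Long, ByVal lOption As Long, lpBuffer As Any, ByVal lBufferLength As Long) As Long

' Returns True if the whole body was read; otherwise ErrCode holds Err.LastDllError.
Public Function DownloadChecked(ByVal Url As String, Body As String, ErrCode As Long) As Boolean
    Dim hOpen As Long, hFile As Long, sBuffer As String, Ret As Long, timeoutMs As Long
    timeoutMs = 30000                                     ' assumed value: 30 seconds
    hOpen = InternetOpen(App.Title, INTERNET_OPEN_TYPE_DIRECT, vbNullString, vbNullString, 0)
    If hOpen = 0 Then ErrCode = Err.LastDllError: Exit Function
    ' lpBuffer is passed ByRef, i.e. a pointer to the 4-byte Long
    InternetSetOption hOpen, INTERNET_OPTION_CONNECT_TIMEOUT, timeoutMs, 4
    InternetSetOption hOpen, INTERNET_OPTION_RECEIVE_TIMEOUT, timeoutMs, 4
    hFile = InternetOpenUrl(hOpen, Url, vbNullString, 0, INTERNET_FLAG_RELOAD, 0)
    If hFile = 0 Then
        ErrCode = Err.LastDllError                        ' e.g. 12002 = ERROR_INTERNET_TIMEOUT
    Else
        sBuffer = Space$(1024)
        Body = ""
        Do
            If InternetReadFile(hFile, sBuffer, Len(sBuffer), Ret) = 0 Then
                ErrCode = Err.LastDllError                ' the read failed part-way through
                Exit Do
            End If
            If Ret = 0 Then DownloadChecked = True: Exit Do   ' clean end of data
            Body = Body & Left$(sBuffer, Ret)
        Loop
        InternetCloseHandle hFile
    End If
    InternetCloseHandle hOpen
End Function
```

A caller would check the Boolean first and only then parse `Body`; a False result with a nonzero `ErrCode` distinguishes "timed out / dropped" from "finished normally".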
Erm... You could use the InternetCheckConnection function. I'll check when I get back home and post the fix.
KAZAR
Const INTERNET_OPEN_TYPE_DIRECT = 1
Const INTERNET_FLAG_RELOAD = &H80000000
Const FLAG_ICC_FORCE_CONNECTION = &H1
Private Declare Function InternetCheckConnection Lib "wininet.dll" Alias "InternetCheckConnectionA" (ByVal lpszUrl As String, ByVal dwFlags As Long, ByVal dwReserved As Long) As Long
Private Declare Function InternetOpen Lib "wininet" Alias "InternetOpenA" (ByVal sAgent As String, ByVal lAccessType As Long, ByVal sProxyName As String, ByVal sProxyBypass As String, ByVal lFlags As Long) As Long
Private Declare Function InternetCloseHandle Lib "wininet" (ByVal hInet As Long) As Integer
Private Declare Function InternetReadFile Lib "wininet" (ByVal hFile As Long, ByVal sBuffer As String, ByVal lNumBytesToRead As Long, lNumberOfBytesRead As Long) As Integer
Private Declare Function InternetOpenUrl Lib "wininet" Alias "InternetOpenUrlA" (ByVal hInternetSession As Long, ByVal lpszUrl As String, ByVal lpszHeaders As String, ByVal dwHeadersLength As Long, ByVal dwFlags As Long, ByVal dwContext As Long) As Long
Private Sub GetFile(sourcefile As String, destfile As String, buffersize As Long)
    Dim hOpen As Long, hFile As Long, sBuffer As String, Ret As Long, fnum As Integer
    'Wait until the URL can be reached. The API returns 0 on failure and 1 (not VB's True = -1) on success
    Do While InternetCheckConnection(sourcefile, FLAG_ICC_FORCE_CONNECTION, 0) = 0
        DoEvents 'keep the form responsive while waiting
    Loop
    'Create a buffer for the file we're going to download
    sBuffer = Space$(buffersize)
    hOpen = InternetOpen(App.Title, INTERNET_OPEN_TYPE_DIRECT, vbNullString, vbNullString, 0)
    hFile = InternetOpenUrl(hOpen, sourcefile, vbNullString, 0, INTERNET_FLAG_RELOAD, 0)
    fnum = FreeFile
    Open destfile For Output As #fnum
    Do
        If InternetReadFile(hFile, sBuffer, buffersize, Ret) = 0 Then Exit Do 'read failed
        If Ret = 0 Then Exit Do 'end of data
        Print #fnum, Left$(sBuffer, Ret); 'only the bytes read, no extra CRLF
    Loop
    Close #fnum
    InternetCloseHandle hFile
    InternetCloseHandle hOpen
End Sub
Right then: if it can't connect to the desired page, for whatever reason, it will wait until it can and then continue with the download.
Hope that's what you're looking for.
KAZAR
Const INTERNET_OPEN_TYPE_DIRECT = 1
Const INTERNET_FLAG_RELOAD = &H80000000
Const FLAG_ICC_FORCE_CONNECTION = &H1
Private Declare Function InternetCheckConnection Lib "wininet.dll" Alias "InternetCheckConnectionA" (ByVal lpszUrl As String, ByVal dwFlags As Long, ByVal dwReserved As Long) As Long
Private Declare Function InternetOpen Lib "wininet" Alias "InternetOpenA" (ByVal sAgent As String, ByVal lAccessType As Long, ByVal sProxyName As String, ByVal sProxyBypass As String, ByVal lFlags As Long) As Long
Private Declare Function InternetCloseHandle Lib "wininet" (ByVal hInet As Long) As Integer
Private Declare Function InternetReadFile Lib "wininet" (ByVal hFile As Long, ByVal sBuffer As String, ByVal lNumBytesToRead As Long, lNumberOfBytesRead As Long) As Integer
Private Declare Function InternetOpenUrl Lib "wininet" Alias "InternetOpenUrlA" (ByVal hInternetSession As Long, ByVal lpszUrl As String, ByVal lpszHeaders As String, ByVal dwHeadersLength As Long, ByVal dwFlags As Long, ByVal dwContext As Long) As Long
Public timeoutcount As Long
'timeoutlevel is counted in timer ticks of 100 ms, so 50 means 5 seconds
Private Sub GetFile(sourcefile As String, destfile As String, buffersize As Long, timeoutlevel As Long)
    Dim hOpen As Long, hFile As Long, sBuffer As String, Ret As Long, fnum As Integer
    If InternetCheckConnection(sourcefile, FLAG_ICC_FORCE_CONNECTION, 0) = 0 Then
        timeoutcount = 0
        timeouttimer.Interval = 100
        timeouttimer.Enabled = True
        Do Until InternetCheckConnection(sourcefile, FLAG_ICC_FORCE_CONNECTION, 0) <> 0
            DoEvents 'lets timeouttimer_Timer run
            If timeoutcount >= timeoutlevel Then
                timeouttimer.Enabled = False
                MsgBox "Connection Timed Out"
                Exit Sub
            End If
        Loop
        timeouttimer.Enabled = False
    End If
    'Create a buffer for the file we're going to download
    sBuffer = Space$(buffersize)
    hOpen = InternetOpen(App.Title, INTERNET_OPEN_TYPE_DIRECT, vbNullString, vbNullString, 0)
    hFile = InternetOpenUrl(hOpen, sourcefile, vbNullString, 0, INTERNET_FLAG_RELOAD, 0)
    fnum = FreeFile
    Open destfile For Output As #fnum
    Do
        If InternetReadFile(hFile, sBuffer, buffersize, Ret) = 0 Then Exit Do 'read failed
        If Ret = 0 Then Exit Do 'end of data
        Print #fnum, Left$(sBuffer, Ret); 'only the bytes read, no extra CRLF
    Loop
    Close #fnum
    InternetCloseHandle hFile
    InternetCloseHandle hOpen
End Sub

Private Sub timeouttimer_Timer()
    timeoutcount = timeoutcount + 1
End Sub
If it has to wait longer than the set time for a connection, that pops up a timeout message and exits the sub. You have to add a Timer control to the form and name it timeouttimer, though.
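As an alternative to adding a Timer control, the wait can be bounded with VB's built-in `Timer` function (seconds since midnight). This is a sketch under assumptions: `WaitForConnection`, the 200 ms pause via the kernel32 `Sleep` call, and placing it in its own standard module are my choices, not part of the posted code. Note the `<> 0` test: `InternetCheckConnection` returns 1 on success, which is not equal to VB's `True` (-1).

```vb
' Sketch: wait for a connection for at most TimeoutSeconds, without a Timer control.
Private Const FLAG_ICC_FORCE_CONNECTION As Long = &H1
Private Declare Function InternetCheckConnection Lib "wininet.dll" Alias "InternetCheckConnectionA" (ByVal lpszUrl As String, ByVal dwFlags As Long, ByVal dwReserved As Long) As Long
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

Public Function WaitForConnection(ByVal Url As String, ByVal TimeoutSeconds As Single) As Boolean
    Dim started As Single, elapsed As Single
    started = Timer
    Do
        If InternetCheckConnection(Url, FLAG_ICC_FORCE_CONNECTION, 0) <> 0 Then
            WaitForConnection = True
            Exit Function
        End If
        elapsed = Timer - started
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer wrapped past midnight
        If elapsed >= TimeoutSeconds Then Exit Function ' gave up: returns False
        DoEvents                                        ' keep the UI responsive
        Sleep 200                                       ' avoid spinning the CPU between checks
    Loop
End Function
```

In `GetFile`, the whole timer block could then shrink to something like `If Not WaitForConnection(sourcefile, 5) Then MsgBox "Connection Timed Out": Exit Sub`.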
KAZAR
Hmm, though I still think I prefer Inet; at least it always pauses your code until it's done.
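For comparison, a blocking download with the Inet control might look like the sketch below. It assumes a Microsoft Internet Transfer Control (msinet.ocx) named `Inet1` on the form; `DownloadWithInet` and the 30-second `RequestTimeout` are hypothetical choices. `OpenURL` does not return until the transfer has finished, which is the "pauses your code" behaviour mentioned above.

```vb
' Sketch: blocking download with the Inet control (assumes a control named Inet1).
Private Sub DownloadWithInet(ByVal Url As String, ByVal DestFile As String)
    Dim html As String, fnum As Integer
    Inet1.RequestTimeout = 30              ' seconds; an assumed value
    html = Inet1.OpenURL(Url, icString)    ' returns only when the whole page is in
    fnum = FreeFile
    Open DestFile For Output As #fnum
    Print #fnum, html;                     ' trailing ";" so no extra CRLF is appended
    Close #fnum
End Sub
```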
KAZAR
Well, before calling InternetOpen, I check the connection using InternetCheckConnection. If a connection can be established I start downloading; otherwise I don't. To check whether the entire file has been downloaded, I store the whole file in a string variable and use the InStr function to see whether the </html> tag is present. If it is, the entire file has been downloaded; otherwise it hasn't. Is this approach OK, or am I overlooking something here? And thanks for the interest. I have posted three or four times in the API section before, but this is the first time more than one person has taken the time to reply.
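The `</html>` check described above fits in one line; the sketch below (`IsCompletePage` is a hypothetical name) passes `vbTextCompare` so that an upper-case `</HTML>` also counts.

```vb
' Sketch of the completeness check described above.
' vbTextCompare makes InStr case-insensitive, so "</HTML>" also matches.
Public Function IsCompletePage(ByVal Html As String) As Boolean
    IsCompletePage = InStr(1, Html, "</html>", vbTextCompare) > 0
End Function
```

This is only a heuristic: browsers accept pages without a closing `</html>`, and the text can also occur inside a script string. A more general signal is that the `InternetReadFile` loop ended with a successful call that read 0 bytes rather than with a failed call.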
Time now for the specifics. I came across a project: on the music.yahoo.com site there is a list of videos, separated alphabetically. What the client wants is a list of all the videos listed there in this format: artist name | video name | a number. The number: when we hover the mouse over the video name, javascript:playVideos(23457181) or some such number is displayed in the status bar. That number is what's needed. The artist name and video name are hyperlinks. What I am doing is downloading the entire file and searching for these strings
search1 = "http://music.yahoo.com/ar-"
search2 = "javascript:playVideos"
and trying to retrieve the values. Is this the right approach? If you want, I will upload the project I have written.
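Since each page holds many artists and videos, the search has to continue past each match. A minimal sketch, assuming the page is already in a string: `FindAll` is a hypothetical helper that restarts `InStr` just after the previous hit and collects every start position of `search1` or `search2`.

```vb
' Sketch: collect the start position of every occurrence of Needle in Html.
Public Function FindAll(ByVal Html As String, ByVal Needle As String) As Collection
    Dim hits As New Collection, pos As Long
    pos = InStr(1, Html, Needle, vbBinaryCompare)
    Do While pos > 0
        hits.Add pos
        ' continue after this match; starting at pos again would find the same hit forever
        pos = InStr(pos + Len(Needle), Html, Needle, vbBinaryCompare)
    Loop
    Set FindAll = hits
End Function
```

Each position can then be used as the starting point for extracting the link text, e.g. `For Each p In FindAll(html, search1)`.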
Last edited by srisa; Apr 27th, 2006 at 10:03 AM.
Reason: typo
Meaning you want to retrieve whatever comes after that value, until...
KAZAR
This is what the a tag for an artist looks like; from it I retrieve the value between the > and < signs (the inner text):
<a href="http://music.yahoo.com/ar-298276---The-International-Noise-Conspiracy" title="The (International) Noise Conspiracy">The (International) Noise Conspiracy</a>.
This is what it looks like for the JavaScript link:
<a href="javascript:playVideos(2153097)" class="listheader" title="Reproduction Of Death">Reproduction Of Death</a>
The value between the parentheses gives the number, and the value between the > and < signs (the inner text again) gives the album title. I use the InStr function to find the positions of these symbols and then the Mid$ function to get the value I want.
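The InStr/Mid$ approach described above can be wrapped in two small helpers. This is a sketch: `InnerText` and `ParenValue` are hypothetical names, and both assume a single `<a ...>...</a>` tag has already been cut out of the page.

```vb
' Text between the first ">" and the following "<", i.e. the link's inner text.
Public Function InnerText(ByVal Tag As String) As String
    Dim gt As Long, lt As Long
    gt = InStr(1, Tag, ">")
    If gt = 0 Then Exit Function
    lt = InStr(gt + 1, Tag, "<")
    If lt = 0 Then Exit Function
    InnerText = Mid$(Tag, gt + 1, lt - gt - 1)
End Function

' Text between the first "(" and the following ")", e.g. the playVideos number.
Public Function ParenValue(ByVal Tag As String) As String
    Dim op As Long, cp As Long
    op = InStr(1, Tag, "(")
    If op = 0 Then Exit Function
    cp = InStr(op + 1, Tag, ")")
    If cp = 0 Then Exit Function
    ParenValue = Mid$(Tag, op + 1, cp - op - 1)
End Function
```

`ParenValue` is only meant for the playVideos links: on the artist tag above, the first parentheses are in the title "The (International) Noise Conspiracy", so it would return "International".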
Last edited by srisa; Apr 28th, 2006 at 07:22 AM.
Reason: typos again
If the name of the album/song is the same in both tags, are the two numbers (the one after ar- and the one in parentheses) the same?
KAZAR
The number after ar- is the artist number, which the site's programmer uses as an easy reference to the artist. The first value is the artist name, the value between the parentheses is the number, and the last value is the title. The string http://music.yahoo.com/ar is common to all the hyperlinks that refer to artists, so I search for that text to get the artist's name. The number in parentheses might be some sort of reference to the video in a database or something like that.
Sorry for going into hibernation for so long. Extracting the strings is OK. The problem is that the downloaded content is not coherent: the code is out of order, some lines appear twice, and so on. I changed the file extension to .html and opened it in IE. The page did not look like the original page; there were some errors.
Maybe using InternetReadFile is complicated. Is there any way to know whether the downloaded file is in order?
Yes. A faster download, and to see whether any part of the code is unnecessary. And if you have the patience to start all over again with InternetReadFile: why does it get the data with duplications and other distortions?
Sorry, but I have no idea why the downloaded content wouldn't match the original source... It shouldn't differ, since from what I understand it just downloads the HTML code, right? So opening the downloaded data in a browser should show an identical page?
As for speed, the only restriction should be your internet connection. There is no way to speed that up at all...
I am attaching the zipped project. In it, the google.com home page is downloaded using both InternetReadFile and Inet. The result from each is written to a separate file, named usingapi.html and usinginet.html.
When I open them in IE, the result from Inet is more coherent and closer to the actual page than the API one. The file is small, so there aren't many discrepancies; as the file size grows, so do the distortions.
Please have a look at the project and let me know if I am doing something wrong with the InternetReadFile method.
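To see where the two downloads part ways, rather than judging by how the pages render, the files can be compared byte by byte. A sketch under assumptions: `FirstDifference` and `ReadAllBytes` are hypothetical helpers, and reading each file whole is fine for page-sized files.

```vb
' Reads a whole file into Data (1-based); returns its length in bytes.
Private Function ReadAllBytes(ByVal Path As String, Data() As Byte) As Long
    Dim fnum As Integer, size As Long
    size = FileLen(Path)
    If size > 0 Then
        ReDim Data(1 To size)
        fnum = FreeFile
        Open Path For Binary Access Read As #fnum
        Get #fnum, 1, Data
        Close #fnum
    End If
    ReadAllBytes = size
End Function

' First 1-based byte offset where the files differ, or 0 if they are identical.
Public Function FirstDifference(ByVal FileA As String, ByVal FileB As String) As Long
    Dim a() As Byte, b() As Byte, lenA As Long, lenB As Long, i As Long, shorter As Long
    lenA = ReadAllBytes(FileA, a)
    lenB = ReadAllBytes(FileB, b)
    shorter = IIf(lenA < lenB, lenA, lenB)
    For i = 1 To shorter
        If a(i) <> b(i) Then FirstDifference = i: Exit Function
    Next i
    ' identical prefix: they differ only if one file is longer
    If lenA <> lenB Then FirstDifference = shorter + 1
End Function
```

For example, `Debug.Print FirstDifference(App.Path & "\usingapi.html", App.Path & "\usinginet.html")`. If the API file was written with `Print #1, sBuffer`, I'd expect the first difference at or before the end of the first 1024-byte chunk, where the extra CRLF or stale buffer contents begin.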
True, but that doesn't explain the discrepancies between the API and Inet.
KAZAR
OK, we don't seem to be making much headway with this thread. I will mark it "resolved with Inet". If I find out why InternetReadFile is giving problems, I will update this thread.
I have to admit that I have been lazy. Using Google I found this bit of code, which looks like VBScript. I made a few changes so that it conforms to VB.
Here is the code
VB Code:
Dim hopen As Long, hfile As Long, bytesread As Long