-
Oct 23rd, 2020, 09:51 PM
#1
Thread Starter
Hyperactive Member
Read more the page source with StreamReader?
I have an application that reads the page source of a Google Drive folder URL and uses it to gather information about the contents of that folder. However, I have run into an issue. I now need to read a folder that has many files in it, but the program doesn't read all of the files. I noticed that it stops where Google Drive's buffer limit seems to be (somewhere around the 50th file). There are more files in there, but I don't know how to get them to show up in the source code I am reading.
This is the code I am currently using. I have tried increasing the buffer on my StreamReader to an insane amount just to test it, but the result is the same. Any suggestions?
Code:
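' Requires Imports System.IO and Imports System.Net
Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
Dim bf As Integer = 40000000 ' oversized read buffer, tried only as a test
Dim sourcecode As String
' Using blocks dispose the response and the reader once the page has been read
Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
    Using s As New StreamReader(response.GetResponseStream(), System.Text.Encoding.ASCII, False, bf)
        sourcecode = s.ReadToEnd()
    End Using
End Using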
Last edited by Frabulator; Oct 24th, 2020 at 01:10 AM.
-
Oct 24th, 2020, 03:56 AM
#2
Re: Read more the page source with StreamReader?
You might be better off using the provided API (https://developers.google.com/drive) rather than trying to scrape HTML.
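As a rough idea of what that could look like, here is a minimal sketch that calls the Drive v3 files.list REST endpoint directly and keeps following nextPageToken until every file in the folder has been listed. It rests on several assumptions that are not from this thread: the folder is shared publicly so that a plain API key is enough, System.Text.Json is available (built into .NET Core 3.0 and later, a NuGet package on .NET Framework), and the names DriveListingSketch, BuildListUrl, ListAllFileNames, folderId and apiKey are made up for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Net.Http
Imports System.Text.Json

Module DriveListingSketch
    ' Builds the URL for one files.list request.
    ' folderId and apiKey are placeholders the caller must supply.
    Function BuildListUrl(folderId As String, apiKey As String, pageToken As String) As String
        Dim query As String = Uri.EscapeDataString("'" & folderId & "' in parents and trashed = false")
        Dim url As String = "https://www.googleapis.com/drive/v3/files" &
            "?q=" & query &
            "&pageSize=100" &
            "&fields=" & Uri.EscapeDataString("nextPageToken,files(id,name)") &
            "&key=" & Uri.EscapeDataString(apiKey)
        ' Only requests after the first one carry a page token.
        If Not String.IsNullOrEmpty(pageToken) Then
            url &= "&pageToken=" & Uri.EscapeDataString(pageToken)
        End If
        Return url
    End Function

    ' Requests page after page until the response no longer has a nextPageToken.
    Function ListAllFileNames(folderId As String, apiKey As String) As List(Of String)
        Dim names As New List(Of String)
        Dim pageToken As String = Nothing
        Using client As New HttpClient()
            Do
                Dim url As String = BuildListUrl(folderId, apiKey, pageToken)
                ' Blocking on the task keeps the sketch short; Await it in real code.
                Dim json As String = client.GetStringAsync(url).GetAwaiter().GetResult()
                Using doc As JsonDocument = JsonDocument.Parse(json)
                    Dim files As JsonElement
                    If doc.RootElement.TryGetProperty("files", files) Then
                        For Each f As JsonElement In files.EnumerateArray()
                            names.Add(f.GetProperty("name").GetString())
                        Next
                    End If
                    Dim tokenElement As JsonElement
                    If doc.RootElement.TryGetProperty("nextPageToken", tokenElement) Then
                        pageToken = tokenElement.GetString()
                    Else
                        pageToken = Nothing
                    End If
                End Using
            Loop While pageToken IsNot Nothing
        End Using
        Return names
    End Function
End Module
```

Calling DriveListingSketch.ListAllFileNames("your-folder-id", "your-api-key") should then return the names from every page rather than only the first batch. If the folder is not public, the same paging loop applies, but the request would need an OAuth access token instead of the key parameter.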
-
Oct 24th, 2020, 08:05 AM
#3
Thread Starter
Hyperactive Member
Re: Read more the page source with StreamReader?
Originally Posted by PlausiblyDamp
The issue I am running into with using the API is that it asks users to log in, authorize the program and then go through steps to verify. That is a lot of extra steps that are not needed, or at least that is how I feel.
If there is a way to completely circumvent this, then I would be down for using the API.
Unless I am completely missing something and this login stuff is just a one-time step that only I see and not the end user.
Last edited by Frabulator; Oct 24th, 2020 at 08:51 AM.
-
Oct 24th, 2020, 09:42 AM
#4
Re: Read more the page source with StreamReader?
Normally those steps would only be required the first time the application is run; after that it shouldn't need authenticating again. In the long run the API is going to be a much more reliable solution, and it isn't going to break if Google Drive's design changes either.
-
Oct 24th, 2020, 11:49 AM
#5
Re: Read more the page source with StreamReader?
Relying on web scraping reminds me of Dwight's directions to Schrute Farms from The Office:
"156 paces from the light red mailbox, make a left. Walk until you hear the beehive."
You are relying on things not changing, things that you have zero control over.
-
Oct 25th, 2020, 11:25 AM
#6
Thread Starter
Hyperactive Member
Re: Read more the page source with StreamReader?
Thank you all very much! I am looking into the Google API. I agree with everyone saying that I am fighting a losing battle because things change. However, I am still wondering if there is a way to increase the buffer limit in the page source?