
Thread: Read more the page source with StreamReader?

  1. #1

    Thread Starter
    Hyperactive Member Frabulator's Avatar
    Join Date
    Jan 2015
    Posts
    311

    Read more the page source with StreamReader?

    I have an application that reads the page source of a Google Drive folder URL and uses it to gather information about the folder's contents. However, I have run into an issue. I now need to read a folder that has many files in it, but the program doesn't read all of them. I noticed that it stops where Google Drive's buffer limit seems to be (somewhere around the 50th file). There are more files in the folder, but I don't know how to get them to show up in the source code I read.

    This is the code I am currently using. I have tried increasing the buffer on my StreamReader to an insane amount just to test it, but the result is the same. Any suggestions?


    Code:
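                ' Requires: Imports System.IO and Imports System.Net
                Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
                Dim bf As Integer = 40000000 ' oversized reader buffer, set this high only as a test
                Dim sourcecode As String
                ' Using blocks dispose the response and the reader once the read completes
                Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
                    Using s As New StreamReader(response.GetResponseStream(), System.Text.Encoding.ASCII, False, bf)
                        sourcecode = s.ReadToEnd()
                    End Using
                End Using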
    Last edited by Frabulator; Oct 24th, 2020 at 01:10 AM.
    Oops, There it goes. Yep... my brain stopped...
    _________________________________

  2. #2
    Frenzied Member PlausiblyDamp's Avatar
    Join Date
    Dec 2016
    Location
    Newport, UK
    Posts
    1,279

    Re: Read more the page source with StreamReader?

    You might be better using the provided API https://developers.google.com/drive rather than trying to scrape HTML.
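
    On the buffer itself: as far as I know, the bufferSize argument of StreamReader only sets the size of the reader's internal chunk buffer, and ReadToEnd returns everything the server sent regardless of that value. If the source stops at around 50 files, the most likely explanation (an assumption, I have not inspected the page) is that Google Drive loads the remaining entries later via script, so they are never in the initial HTML. A minimal sketch with an in-memory stream standing in for the response; the BufferSizeDemo and ReadAll names are made up for the illustration:

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BufferSizeDemo
    ' Reads an in-memory stream to the end using the given StreamReader buffer size.
    Function ReadAll(data As Byte(), bufferSize As Integer) As String
        Using ms As New MemoryStream(data)
            Using reader As New StreamReader(ms, Encoding.UTF8, False, bufferSize)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        ' 100,000 characters of payload, far larger than the small buffer below.
        Dim payload As Byte() = Encoding.UTF8.GetBytes(New String("x"c, 100000))

        Dim small As String = ReadAll(payload, 128)
        Dim large As String = ReadAll(payload, 1048576)

        ' Both reads return the complete payload; only the internal chunk size differs.
        Console.WriteLine(small.Length)
        Console.WriteLine(large.Length)
        Console.WriteLine(small = large)
    End Sub
End Module
```

    Both calls return all 100,000 characters, so a larger buffer cannot pull in content that the server never sent in the first place.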

  3. #3

    Thread Starter
    Hyperactive Member Frabulator's Avatar
    Join Date
    Jan 2015
    Posts
    311

    Re: Read more the page source with StreamReader?

    Quote Originally Posted by PlausiblyDamp View Post
    You might be better using the provided API https://developers.google.com/drive rather than trying to scrape HTML.
    The issue I am running into with using the API is that it asks users to log in, authorize the program, and then go through verification steps. That is a lot of extra steps that are not needed, or at least that is how I feel.

    If there is a way to completely circumvent this, then I would be down for using the API.

    Unless I am completely missing something and this login stuff is just a one-time shot that only I see and not the end user.
    Last edited by Frabulator; Oct 24th, 2020 at 08:51 AM.

  4. #4
    Frenzied Member PlausiblyDamp's Avatar
    Join Date
    Dec 2016
    Location
    Newport, UK
    Posts
    1,279

    Re: Read more the page source with StreamReader?

    Normally those steps are only required the first time the application is run; after that it shouldn't need authenticating again. In the long run the API is going to be a much more reliable solution, and it won't break if Google Drive's design changes.
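
    For what it's worth, here is a rough sketch of listing a folder through the API, page by page. It assumes the Google.Apis.Drive.v3 NuGet package is installed. It also assumes the folder is shared publicly, in which case an API key may be enough and the end user would not have to log in; check that against the Drive documentation before relying on it. "YOUR_API_KEY", "YOUR_FOLDER_ID", and the DriveFolderListing and ListFolder names are placeholders, not anything from the original code:

```vbnet
Imports System
Imports System.Collections.Generic
Imports Google.Apis.Drive.v3
Imports Google.Apis.Services

Module DriveFolderListing
    ' Collects the names of all files in a folder, following NextPageToken
    ' until the API reports that there are no more pages.
    Function ListFolder(service As DriveService, folderId As String) As List(Of String)
        Dim names As New List(Of String)
        Dim pageToken As String = Nothing
        Do
            Dim request As FilesResource.ListRequest = service.Files.List()
            request.Q = $"'{folderId}' in parents and trashed = false"
            request.Fields = "nextPageToken, files(id, name)"
            request.PageSize = 100
            request.PageToken = pageToken

            Dim result = request.Execute()
            For Each f In result.Files
                names.Add(f.Name)
            Next
            pageToken = result.NextPageToken
        Loop While Not String.IsNullOrEmpty(pageToken)
        Return names
    End Function

    Sub Main()
        ' Assumption: an API key is sufficient because the folder is public.
        Dim service As New DriveService(New BaseClientService.Initializer() With {
            .ApiKey = "YOUR_API_KEY",
            .ApplicationName = "FolderLister"
        })
        For Each fileName In ListFolder(service, "YOUR_FOLDER_ID")
            Console.WriteLine(fileName)
        Next
    End Sub
End Module
```

    The loop is the part that replaces the "stops at 50 files" behaviour: each response carries a token for the next page, and the listing is complete only when that token comes back empty.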

  5. #5
    Frenzied Member
    Join Date
    Nov 2017
    Posts
    1,230

    Re: Read more the page source with StreamReader?

    Relying on web scraping reminds me of Dwight's directions to Schrute Farms from The Office:

    "156 paces from the light red mailbox, make a left. Walk until you hear the beehive."

    You are relying on things not changing, things that you have zero control over.

  6. #6

    Thread Starter
    Hyperactive Member Frabulator's Avatar
    Join Date
    Jan 2015
    Posts
    311

    Re: Read more the page source with StreamReader?

    Thank you all very much! I am looking into the Google API. I agree with everyone saying that I am fighting a losing battle because things change. However, I am still wondering whether there is a way to increase the buffer limit in the page source.
