retrieving google documents contents

**victorb17** · May 3rd, 2013, 09:33 AM

I am trying to extract the contents of a Google Drive file. It is a Google document. I have the code to pull the entire document, including metadata, etc. I saw the Google API page concerning this: https://developers.google.com/drive/...m_google_drive

I am having trouble parsing the body out of what my code returns. Has anyone done this? Thanks, Victor.

Code:

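    ' Needs Imports System.IO and Imports System.Net at the top of the file
    Dim request As HttpWebRequest = DirectCast(WebRequest.Create("insert google doc url here"), HttpWebRequest)
    Dim response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
    ' Named responseStream so the variable does not shadow the Stream type
    Dim responseStream As Stream = response.GetResponseStream()
    Dim reader As StreamReader = New StreamReader(responseStream)
    ' Shows the raw response body exactly as the server sent it
    MsgBox(reader.ReadToEnd())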

**jayinthe813** · May 4th, 2013, 12:54 AM

What do you mean by "parse the body out of"? Do you mean read the content? Per the documentation, it looks like this:

Code:

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(downloadUrl));
    auth.ApplyAuthenticationToRequest(request);

    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    System.IO.Stream stream = response.GetResponseStream();
    StreamReader reader = new StreamReader(stream);
    return reader.ReadToEnd(); // this returns the string

What problem are you running into, or what exceptions are thrown? What does reader.ReadToEnd() return for you?

**victorb17** · May 6th, 2013, 12:37 PM

It returns huge amounts of data. You can find the body contents if you search the document. To clarify: I put the reader output into a textbox, then moved it over to Microsoft Word for easier viewing. I managed to see the document title, and further down I found the contents, or 'body'. However, it was not readable because it looked something like: ~3()}$example title()#}}P(#/{}{3$o}example body text,.#*{32}{4uh}tero(#{}E}{}teno!){}.

To give you an idea of how much extra there is: I had one word in the body, and the returned file fills 19 pages in Word.

I am not sure how to parse the body from the rest. It seems impossible unless Google can do it on their side and send you just the actual contents with no metadata.

**dunfiddlin** · May 6th, 2013, 12:48 PM

It's a Google document so it has to be read in Google's software. You would get a very similar result if you read a Word document or indeed the much simpler Rich Text Format as plain text. I would have thought that was obvious. If you want plain text then you need to either save the original document in that form at the server or simply copy and paste from the document in situ.
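For what it's worth, having the server hand back plain text might be sketched roughly as below. This is only a sketch under stated assumptions: it assumes the Drive API v3 `files/{fileId}/export` endpoint with `mimeType=text/plain` (an endpoint that postdates this thread), and it assumes you already have a valid OAuth 2.0 access token from somewhere else. The module name, the function names, and the `fileId` and `accessToken` parameters are made up for illustration and are not from the code posted above.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module DriveExportSketch

    ' Builds the export URL that asks Drive to convert the document to plain text.
    ' Assumption: Drive API v3 export endpoint; fileId is a placeholder the caller supplies.
    Function BuildPlainTextExportUrl(fileId As String) As String
        Return "https://www.googleapis.com/drive/v3/files/" &
               Uri.EscapeDataString(fileId) &
               "/export?mimeType=" & Uri.EscapeDataString("text/plain")
    End Function

    ' Downloads the exported text. accessToken is assumed to be a valid
    ' OAuth 2.0 bearer token obtained elsewhere (hypothetical parameter).
    Function DownloadPlainText(fileId As String, accessToken As String) As String
        Dim request As HttpWebRequest =
            DirectCast(WebRequest.Create(BuildPlainTextExportUrl(fileId)), HttpWebRequest)
        request.Headers.Add("Authorization", "Bearer " & accessToken)

        ' Using blocks close the response and the reader even if reading fails
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

End Module
```

If the endpoint behaves as assumed, `DownloadPlainText` should return only the document text, so there would be nothing left to parse out on the client side.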
