Thread: Get contents of Word document

  1. #1

    Thread Starter
    Stack Overflow moderator
    Join Date
    May 2008
    Location
    British Columbia, Canada
    Posts
    2,824

    How can I get the plain-text content of a Word document? It's for indexing all of the words in it, so I don't need any formatting.

  2. #2
    weirddemon
    Join Date
    Jan 2009
    Location
    USA
    Posts
    3,826

    Is it safe to assume you've already tried a StreamReader?

  3. #3

    Thread Starter

    I can't figure out the format. I looked at it: a bunch of FFs, a bunch of 00s, the text, a bunch more 00s, a bunch of metadata, etc.
    Last edited by minitech; Jan 16th, 2010 at 12:33 PM.

  4. #4
    Pradeep1210
    Join Date
    Apr 2004
    Location
    Inside the CPU...
    Posts
    6,614

    Try this:
    vb.net Code:
    Imports Word = Microsoft.Office.Interop.Word   ' goes at the top of the file

    ' Start Word and open the document.
    Dim wdApp As New Word.Application
    Dim wdDoc As Word.Document = wdApp.Documents.Open("C:\Temp\Test.doc")

    ' Range.Text is the simplest way to get all of the text. Alternatively, you can walk wdDoc.Paragraphs etc.
    Dim myText As String = wdDoc.Range.Text

    ' Close the document without saving, then shut Word down.
    wdDoc.Close(SaveChanges:=False)
    wdApp.Quit(SaveChanges:=False)
    You need to add a reference to Microsoft.Office.Interop.Word in your project.

  5. #5

    Thread Starter

    I was hoping not to have to use the Word control... I have to index over 50 thousand files. Thanks, though. I'll use it.

    Does anyone have another way?

  6. #6
    Pradeep1210

    Quote, originally posted by minitech:
    I was hoping not to have to use the Word control... I have to index over 50 thousand files. Thanks, though. I'll use it.

    Does anyone have another way?

    You need to use either Word itself or a third-party component capable of reading Word files.

    When you need to open that many files, don't start the Word application for each file. Start it only once. That should be quite fast: hardly 15 minutes or so.

    1. Open Word.Application only once.
    2. Open a Word document, read its text, and close it.
    3. Repeat for all files.
    4. Close the Word application.
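
    The four steps above could be sketched roughly like this. It is only an outline under some assumptions: the module name, the ExtractText routine, the indexText callback and the "*.doc" search pattern are made up for illustration, and the project is assumed to reference Microsoft.Office.Interop.Word. Files that Word refuses to open are skipped so that one bad document does not stop the whole run.

    ```vbnet
    Imports System.IO
    Imports System.Runtime.InteropServices
    Imports Word = Microsoft.Office.Interop.Word

    Module WordTextExtractor

        ' Hypothetical helper: reads the plain text of every matching file under
        ' rootFolder and hands (file path, text) to the caller's indexing routine.
        Sub ExtractText(ByVal rootFolder As String, ByVal indexText As Action(Of String, String))
            ' 1. Open Word.Application only once.
            Dim wdApp As New Word.Application
            wdApp.Visible = False

            Try
                ' 3. Repeat for all files.
                For Each filePath As String In Directory.GetFiles(rootFolder, "*.doc", SearchOption.AllDirectories)
                    Dim wdDoc As Word.Document = Nothing
                    Try
                        ' 2. Open the document, read its text, and close it.
                        wdDoc = wdApp.Documents.Open(CObj(filePath))
                        indexText(filePath, wdDoc.Range.Text)
                    Catch ex As COMException
                        ' Assumption: a file Word cannot open is logged and skipped.
                        Console.Error.WriteLine("Skipped " & filePath & ": " & ex.Message)
                    Finally
                        If wdDoc IsNot Nothing Then wdDoc.Close(SaveChanges:=False)
                    End Try
                Next
            Finally
                ' 4. Close the Word application, even if something went wrong.
                wdApp.Quit(SaveChanges:=False)
            End Try
        End Sub

    End Module
    ```

    You would call it with something like ExtractText("C:\Temp", AddressOf AddToIndex), where AddToIndex is your own Sub taking the file path and the text. Passing the text to a callback instead of collecting it all in a list keeps memory use flat across tens of thousands of files.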

  7. #7

    Thread Starter

    I know. It's just that it usually takes 5 minutes, so it's kind of a big step down.
