-
Jan 16th, 2010, 11:45 AM
#1
Thread Starter
Stack Overflow moderator
Get contents of Word document
How can I get the plain-text content of a Word document? (It's for indexing all the words in it, so I don't need any formatting.)
-
Jan 16th, 2010, 12:27 PM
#2
Re: Get contents of Word document
Is it safe to assume you've already tried a StreamReader?
-
Jan 16th, 2010, 12:28 PM
#3
Thread Starter
Stack Overflow moderator
Re: Get contents of Word document
I can't figure out the format. I looked at the raw file: a bunch of FF bytes, a bunch of 00 bytes, the text, more 00s, a bunch of metadata, etc.
Last edited by minitech; Jan 16th, 2010 at 12:33 PM.
-
Jan 16th, 2010, 02:44 PM
#4
Re: Get contents of Word document
Try this:
vb.net Code:
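Imports Word = Microsoft.Office.Interop.Word   ' at the top of the code file

Dim wdApp As New Word.Application
' Not "As New Word.Document": that would create an extra, unused blank document.
Dim wdDoc As Word.Document = wdApp.Documents.Open("C:\Temp\Test.doc")
Dim myText As String = wdDoc.Range().Text '<-- this is the simplest way to get all text. Alternatively you may use paragraphs etc. too.
wdDoc.Close(SaveChanges:=False)
wdApp.Quit(SaveChanges:=False)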
You would need to add a reference to Microsoft.Office.Interop.Word in your project.
-
Jan 16th, 2010, 02:59 PM
#5
Thread Starter
Stack Overflow moderator
Re: Get contents of Word document
I was hoping not to have to use the Word control... I have to index over 50 thousand files. Thanks, though. I'll use it. 
Does anyone have another way?
-
Jan 16th, 2010, 03:11 PM
#6
Re: Get contents of Word document
 Originally Posted by minitech
I was hoping not to have to use the Word control... I have to index over 50 thousand files. Thanks, though. I'll use it.
Does anyone have another way?
You would need to use either Word or some third-party control capable of reading Word files.
When you need to open so many files, don't start the Word application for each file; start it only once. That should be reasonably fast, perhaps 15 minutes or so.
1. Open Word.Application only once.
2. Open each Word document, read its text, and close it.
3. Repeat for all files.
4. Close the Word application.
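The four steps above could be sketched roughly like this. It is a minimal sketch, assuming Word is installed and the project references Microsoft.Office.Interop.Word as in post #4; the module name WordIndexSketch, the helpers ExtractWords and IndexFolder, and the choice of "\w+" as the word-splitting rule are illustrative assumptions, not anything from the thread.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions
Imports Word = Microsoft.Office.Interop.Word

Module WordIndexSketch

    ' Hypothetical helper: splits extracted text into distinct lower-case words.
    ' "\w+" treats Word's control characters (Chr(13) paragraph marks,
    ' Chr(7) table-cell markers, etc.) as separators.
    Function ExtractWords(ByVal text As String) As HashSet(Of String)
        Dim words As New HashSet(Of String)()
        For Each m As Match In Regex.Matches(text, "\w+")
            words.Add(m.Value.ToLowerInvariant())
        Next
        Return words
    End Function

    ' Hypothetical driver: maps each file path to the set of words it contains,
    ' starting Word only once for the whole folder.
    Function IndexFolder(ByVal folder As String) As Dictionary(Of String, HashSet(Of String))
        Dim index As New Dictionary(Of String, HashSet(Of String))()
        Dim wdApp As New Word.Application()                 ' 1. open Word only once
        Try
            ' On Windows, "*.doc" also matches *.docx (three-character extension rule).
            For Each docPath As String In Directory.GetFiles(folder, "*.doc")
                ' 2. open the document, read its text, close it
                Dim wdDoc As Word.Document = wdApp.Documents.Open(docPath, AddToRecentFiles:=False)
                Try
                    index(docPath) = ExtractWords(wdDoc.Range().Text)
                Finally
                    wdDoc.Close(SaveChanges:=False)
                End Try
            Next                                            ' 3. repeat for all files
        Finally
            wdApp.Quit(SaveChanges:=False)                  ' 4. close Word at the end
        End Try
        Return index
    End Function

End Module
```

Usage, with a hypothetical folder: `Dim index = WordIndexSketch.IndexFolder("C:\Temp\Docs")`. The outer Try/Finally makes sure Word is shut down even if a file fails to open, rather than leaving a hidden WINWORD.EXE process running.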
-
Jan 16th, 2010, 03:15 PM
#7
Thread Starter
Stack Overflow moderator
Re: Get contents of Word document
I know. It usually only takes about 5 minutes, so it's quite a slowdown.