-
Feb 20th, 2012, 07:15 AM
#1
Thread Starter
Junior Member
Parsing Word Document & Export to Excel
Hi...
I'm a third year software design student and I'm currently on my work placement. I have never used VB before, but my research has led me to believe that this is the best method...I have been given the task of programatically parsing hundreds of word (2003) documents for information stored in certain tables. The documents comprise of paragraphs and tables. I need to extract the info in preferred tables and export it to excel...Can anyone help please. I know it can be done but I've been researching this for a week now and I really am pulling my hair out at this stage. If anyone can help or point me in the right direction or even link a solid tutorial I would be so, so grateful.
Kind regards,
DJ-DOO
-
Feb 20th, 2012, 10:18 AM
#2
Re: Parsing Word Document & Export to Excel
JG
... If your problem is fixed don't forget to mark your threads as resolved using the Thread Tools menu ...
-
Feb 20th, 2012, 03:39 PM
#3
Re: Parsing Word Document & Export to Excel
how do you define which tables in each document contain information you want?
vb Code:
for each t in doc.tables ' where doc is a document object if t.cell(1, 1).range.text = mystring then ' or some other criteria 'do stuff here end if next
are you writing your code in vb6 or inside some office application (VBA)?
i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next
dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part
come back and mark your original post as resolved if your problem is fixed
pete
-
Feb 21st, 2012, 12:24 PM
#4
Member
Re: Parsing Word Document & Export to Excel
I think I would drive this from within an Excel VBA module. Hopefully, all your Word docs are in one directory! If that directory is a constant, you can hard-code your file path; otherwise you're going to have to set up a way to search all in one directory and then ask if you need to do more, if so navigate to the next directory and start again.
You can set a document object for each opened document, and search for the table you need. I hope there's something consistent and unique that identifies only the one table you need in each doc?? Likewise, are the table cells you need consistent for each applicable table?
As you copy the data into Excel, is it the same data type each time? For instance: first column = date; second = name; third = amount?? As you change docs, do you want to skip a line in the Excel sheet? Do you want each doc's info on a separate sheet?
Figuring all this out ahead of time will save lots of re-writing!!
Ed
-
Mar 14th, 2012, 06:51 AM
#5
Thread Starter
Junior Member
Re: Parsing Word Document & Export to Excel
Hey Guys thank you so much for all of your input.
As per Ed_from_AZ's suggestion I decided to drive from within an application. However, my requirements have since changed slightly.
I now need to parse selected chapters within the word document. So at the moment I'm using a checklist populated from a config file (all chapter names) (requirement). Based on the selection(s) I have to locate chapter(s) within the word doc (chapter headings) once identified I need to copy that full chapter export into an excel doc. An issue is also the chapter names appear numerous times throughout the doc, so the only way to differentiate the chapter heading from any other mention is by heading style. All the chapter headings are in 'h2' style. Chapters include both paragraphs and tables. I am running out of excel, I wish to export to a different spreadsheet.
Here is what I've got so far, this does locate the chapter heading, but that is it. It is also quite slow, these documents are 20,000+ words, so if I select the final chapter, it takes 30-40 seconds to hit. I would like to keep the code pretty much along the same lines (same logic), but if anyone can please help I would greatly appreciate it!!
I have made pretty good progress (I think) but unfortunately progress is slowing and I'm coming up against a brick wall as I don't really have much experience in vba
HTML Code:
'====================================================================
' POPULATING LIST BOX WITH DATA IN
' CONFIG WORKSHEET
'=====================================================================
Private Sub UserForm_Initialize()
ListBox1.ListFillRange = "Config!A1:A45"
End Sub
'======================================================================
' PROCESSING LISTBOX SELECTION
'======================================================================
Public Sub Parse_Click()
'======================================================================
' DECLARING VARIABLES
'======================================================================
Dim i As Long
Dim C As New Collection
Dim Path As String
With ListBox1
For i = 0 To .ListCount - 1
'Add all selected items to a collection
If .Selected(i) Then C.Add .List(i)
Next
End With
'Nothing selected, nothing to do
If C.Count = 0 Then Exit Sub
With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Select Folder to Process and Click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then Exit Sub
Path = .SelectedItems(1)
If Right(Path, 1) <> "\" Then Path = Path + "\"
'Remove any "
Path = Replace(Path, """", "")
End With
If Dir$(Path & "*.doc") = "" Then
MsgBox "No files found"
Exit Sub
End If
'Install an error handler (remove if you have any)
On Error GoTo Errorhandler
ParseDoc Path, C
Exit Sub
Errorhandler:
MsgBox "Error " & Err.Number & ": " & Err.Description
End Sub
'======================================================================
' PARSING WORD DOC FOR
' SELECTED ITEMS
'======================================================================
Public Sub ParseDoc(ByVal strPath As String, ByVal Items As Collection)
Dim objExcel As Object 'Excel.Application
Dim ExcelBook As Object 'Excel.Workbook
Dim WasOpen As Boolean
Dim oDoc As Document
Dim oPara As Paragraph
Dim strFilename As String
Dim Item
Dim Rng As Range
Dim objWord As Word.Application
Set objWord = New Word.Application
objWord.Visible = True
'Setting Location of Excel Spread for Parsed Details
Const WorkBookName As String = "C:\Users\edoogar\Documents\ParseProject\ParseDetails.xls"
'Set objWord = New Word.Application
On Error Resume Next
WasOpen = True
Set objExcel = GetObject(, "Excel.Application")
If objExcel Is Nothing Then
Set objExcel = CreateObject("Excel.Application")
If objExcel Is Nothing Then _
Err.Raise 1000, "ParseDoc", "Excel is not accessible"
objExcel.Visible = True
WasOpen = False
End If
Set ExcelBook = objExcel.Workbooks.Open(Filename:=WorkBookName)
If ExcelBook Is Nothing Then
If WasOpen Then objExcel.Quit
Err.Raise 1001, "ParseDoc", "Can not open " & WorkBookName
End If
On Error GoTo 0
WordBasic.DisableAutoMacros 1
strFilename = Dir$(strPath & "*.doc")
While Len(strFilename) <> 0
Set oDoc = objWord.Documents.Open(Filename:=strPath & strFilename, AddToRecentFiles:=False)
For Each oPara In oDoc.Paragraphs
For Each Item In Items
If InStr(1, oPara.Range, Item) > 0 Then
If InStr(1, oPara.Style, "H2") > 0 Then
oPara.Range.Select
MsgBox "You have found the string!"
GoTo CloseDoc
End If
End If
Next
Next
CloseDoc:
oDoc.Close wdDoNotSaveChanges
strFilename = Dir$()
Wend
WordBasic.DisableAutoMacros 0
objWord.Quit
End Sub
-
Mar 14th, 2012, 08:38 AM
#6
Thread Starter
Junior Member
Re: Parsing Word Document & Export to Excel
Now that I've changed to vba, maybe I should repost in office development forum
-
Mar 14th, 2012, 08:47 AM
#7
Member
Re: Parsing Word Document & Export to Excel
When you say: "I am running out of excel, I wish to export to a different spreadsheet.", do you mean you're running out of room in the sheet to place your data? And you want to use a different program? Or you want to overflow onto a new worksheet? Why aren't you using the 2007 or 2010 versions (xlsx or xlsm)? These give you over a million rows and columns to XFD, vice 65536 rows and columns to IV.
Your config file: it contains the doc name and the specific chapters within that doc that you need to find? If not, how are you knowing which chpter in which doc?
When you say: "I need to copy that full chapter export into an excel doc" and "Chapters include both paragraphs and tables.", what do you need to copy into Excel and what is it supposed to look like when you're done? Whole chapters in one cell? Paragraphs in individual cells, and tables in an Excel table? Are there any special requirements that must be met to allow this data to be manipulated within Excel? Or is this just a report and the inrformation is imported for display only?
One option to get the whole chapter might be:
-- Open doc
-- Set bookmarks to every H2 chapter heading
-- Iterate through bookmarks until you hit one that matches the text in your config file
(This assumes you have an array or at least a string of all the chapter titles for that doc that you can match against.)
-- When you get a hit, grab the index number of that bookmark, and set a range from doc.bookmark(index) to doc.bookmark(index + 1).
That should give you a range to that chapter; with that, you can copy, parse, find tables, whatever.
Here's some sites that might help you:
http://www.gmayor.com/Word_pages.htm
http://wordfaqs.mvps.org/
http://word.mvps.org/FAQs/MacrosVBA/index.htm
http://word.mvps.org/FAQs/InterDev/E...ateBinding.htm
http://word.mvps.org/FAQs/InterDev/C...WordFromXL.htm
http://msdn.microsoft.com/en-us/library/Aa203688
Good luck!
Ed
-
Mar 14th, 2012, 08:51 AM
#8
Member
Re: Parsing Word Document & Export to Excel
Oh - and if you're running this from within Excel VBA, you might consider pulling your config file list (if it's CSV or similar format) into a sheet in your Excel workbook, vice holding populated arrays in memory. Then you can just iterate down the listings in the sheet by row and column and know what you're looking for.
Ed
-
Mar 14th, 2012, 09:36 AM
#9
Thread Starter
Junior Member
Re: Parsing Word Document & Export to Excel
hi Ed_from_AZ....sorry I'm running in from within excel vba, not running out of excel...
the config file is a worksheet within the workbook.
That's a good idea about setting up the bookmarks, I will look into that. Thank you for your suggestions and help
The data imported into Excel is just for viewing/reporting purposes at the moment, tbh, they're not entirely sure what way they want it laid out in the spreadsheet yet.
-
Mar 15th, 2012, 06:05 AM
#10
Thread Starter
Junior Member
Re: Parsing Word Document & Export to Excel
Ed_from_AZ...There are already bookmarks within the doc, so I'm just wondering, the word documents I'm working on have an index, each chapter listed in the index has a hyperlink that when clicked brings you straight to the chapter. Can I use these in someway to extract the whole chapter between hyperlinks?
I would really appreciate a dig out with this as I've a progress meeting on Tues as we've a bank hol here on mon.
Thank you in advance
-
Mar 15th, 2012, 09:25 AM
#11
Member
Re: Parsing Word Document & Export to Excel
Sounds like an auto-genereated Table of Contents. And with that, each of your chapter headings would indeed have a bookmark as a place for the hyperlink in the TOC to jump to. Sure, those bookmarks can be used - if you know what they are and they are consistent between docs.
You're looking at a loop, which contains repeating instructions to be run for each desired chapter in a doc - and then needs to be repeated for each desired doc. If the chapter heading bookmarks are all the same in each doc for each chapter you need to parse, then by all means use what's there. Otherwise, I'd set up my own bookmarks just so I could write code that knows exactly what to look for each and every time.
In "air code", what I envision would look something like:
vb Code:
For Each doc in your collection of docs
' Set bookmarks
numBkmark = 1
For Each H2 header
Set range to header range
Set bookmark to range _
bkmk.name = "Chapter_" & numBkmark
numBkmark = numBkmark + 1
Next H2 header
' Iterate bkmks to find the right chapter
numBkmark = 1
For Each bkmark in doc.bookmarks
If bkmark.name = "Chapter_" & numBkmark Then
If bkmark.name is in your list of chapters
' We've got a match!
Set a range _
Start:= bkmark.range.start, _
End:= next bookmark.range.start
'This gives you a range for everything
'between the two chapter headings
'Now you can extract the text, tables, etc.
End If
End If
numBkmark = numBlmark + 1
Next bkmark
Next doc
Anyway, that's how I would start it.
Hope I said it so it makes sense to you, too!! :8>)
Ed
-
Mar 15th, 2012, 09:50 AM
#12
Member
Re: Parsing Word Document & Export to Excel
What I see as the building blocks for this are:
-- A way to create a collection of documents to work on, or
A method to use the doc path and name to get to the doc
-- A list that identifies which chapters you need to parse:
Is there a consistent chapter title? or
Will you need to identify a certain table or text within a chapter?
-- A loop that will iterate all the chapter headings or chapters, identify the right one(s), and grab the table, text, photo, whatever
-- A procedure that will pull the desired element(s) from the doc into the workbook in a useable manner
Does each doc and/or chapter need its own worksheet?
Are you going to try to aggregate some or all of this info? (Don't do it on the fly - get it all into the workbook first, and then you can work with it.)
Can you get someone to give you an idea of what a "useable manner" looks like??
-- Error handling:
Doc doesn't have the chapter - do you need to send someone a red flag?
What's in the doc range refuses to copy over into the worksheet - how should you handle it?
The doc range exceeds the limits of the worksheet - how should you handle it?
Someone else is using the doc when your code wants it - can you detect that and come back to it?
Flesh out the project as much as possible: flowchart the operation; detect loops within loops and If/Then branches; look at what you need to indentify everything you must work with - doc, workbook, worksheet, ranges in both Word and Excel, bookmarks, paragraphs, tables, etc. What are the blind spots? What are the possible hiccups and errors?
Craft some questions and put *them* on the spot!! What do you need from them to write good code? You can write any code they want, but you can't read their minds to know what they want. Intelligent questions can make you look better than lots of info.
Cheers!
Ed
-
Mar 15th, 2012, 11:02 AM
#13
Thread Starter
Junior Member
Re: Parsing Word Document & Export to Excel
..that's great...thanks...
I toggled the field codes for the index....
this is the result....
{HYPERLINK\|"_Toc292701022"}
{HYPERLINK\|"_Toc292701023"}
|
|
V
{HYPERLINK\|"_Toc292701074"}
So these are the bookmarks/hyperlinks for each chapter, you can see there are 52 chapters. They are in numerical order, so I'm guessing I could use these? You see I'm populating a multi list box from a config file (which is a list of the chapters), so based on the selection, thats the chapter I need to parse.
At the moment selecting a folder to process.
At the moment the requirement is to parse the whole chapter and "dump" it into an excel spread sheet. So tbh, if I was that far for Tues, I would be extremely happy!!
-
Mar 15th, 2012, 11:20 AM
#14
Member
Re: Parsing Word Document & Export to Excel
If you can translate "Toc292701022" into the chapter you need, then you're good to go. Try using the GoTo feature and type in one of these bookmark names and see if you go directly to the desired chapter.
Are you going to have to do all these one at a time? Open a doc, populate a list box, manually choose a chapter, and then run your code? Or is this just to create a demo of yhow you can get the info into Excel?
What does a chapter look like? Is it all paragraphs of plain text? Will you have graphics, graphs, charts, tables, hyperlinks?
What versions of Word and Excel are you using?
-
Mar 16th, 2012, 03:45 AM
#15
Thread Starter
Junior Member
Re: Parsing Word Document & Export to Excel
I tried putting in the link into goto but to no avail....
How it works at the moment, as I'm running this out of excel vba, I open the workbook (multi list already populated), I tick relevant chapters, then hit a parse button, this then opens a folder picker, which allows me to select the folder that contains the documents for parsing and hit ok, the program (at the moment), then opens the documents one by one and goes through them, finds the heading brings up a text box (for testing) and closes the doc, and moves onto the next...
I hope to find the heading, extract the chapter, then copy it to a worksheet, then close the doc and move to the next.
That is how it is now, eventually they want to link it to an internal tool within the company to parse docs stored in that, but I don't need to worry about that at the moment.
Chapters contain both paragraphs and tables..
I'm using office 2003
Thank you very much for your help thus far, I've gotten this far and now I'm pulling my hair out!
Last edited by DJ-DOO; Mar 16th, 2012 at 07:42 AM.
-
Mar 16th, 2012, 04:32 AM
#16
Thread Starter
Junior Member
Re: Parsing Word Document & Export to Excel
could you set ranges by heading style rather than bookmarks?? so from one h2 style to the next?? I don't know and again, I'm really appreciative of your help as I'm quite inexperienced in vba, I am proficient in Java & c++, but when it comes to vba, it's absolutely learn as you go...
Just to add, I have posted a similar style question to this on another forum, I found out today that cross posting is a no no, I honestly wasn't aware of this, I thought that you post to forums to ask the experts and hope for some good advice....I've posted the link to my other post below. Apologies!
http://www.vbaexpress.com/forum/show...d=1#post262800
Last edited by DJ-DOO; Mar 16th, 2012 at 04:35 AM.
-
Mar 16th, 2012, 07:36 AM
#17
Thread Starter
Junior Member
-
Mar 16th, 2012, 07:43 AM
#18
Thread Starter
Junior Member
Re: Parsing Word Document & Export to Excel
I was wrong the original bookmarks work!!
_Toc292701022
_Toc292701023
|
|
V
_Toc292701074"
-
Mar 16th, 2012, 04:18 PM
#19
Member
Re: Parsing Word Document & Export to Excel
I've come up with something - hope it works for you. I tried to attach the files, but it wouldn't go. So here's the code that I put in a code module in an Excel file.
Sheet1 of this file is set up with your chapter headers in column A, and an X by some of them in column B. The code runs down column B - if there's an X (or x - it will catch either case), it grabs the text of that chapter.
Hope it gives you a start.
Ed
Code:
Sub GetMyStuff()
'*******************
'Object declarations
'
'To declare Word objects, set a reference
'to the Word object library.
'Tools >> References >> Microsoft Word
'
'(Since both Word and Excel have range objects,
'but they have different Methods and Properties,
'we must specifically declare as Word or Excel
'range to avoid confusion.)
'*******************
'Set Excel objects
Dim XLwkb1 As Workbook 'This workbook
Dim XLwkb2 As Workbook 'Workbook for data
Dim XLwks1 As Worksheet 'Worksheet with config
Dim XLwks2 As Worksheet 'Worksheet to hold data
Dim XLrngC1 As Excel.Range 'Individual cell in config wksht
Dim XLrngC2 As Excel.Range 'Individual cell in data wksht
Dim XLrngW1 As Excel.Range 'Range of cells in wks1
Dim XLrngW2 As Excel.Range 'Range of cells in wks2
'Set Word objects
Dim appWD As Word.Application
Dim WDdoc As Document 'Report document
Dim WDrngCh As Word.Range 'Range for one complete chapter
Dim WDrngHdr1 As Word.Range 'Range for one chapter header
Dim WDrngHdr2 As Word.Range 'Range for next chapter header
'Other objects
Dim strRptP As String 'Complete path for report
Dim strRptN As String 'Doc name for report
Dim strChp1 As String 'Title of chapter to grab
Dim strChp2 As String 'Title of next chapter
Dim numRow As Long 'Used for Excel row number
Dim arrName 'Used to extract doc name from path
'*******************
'Main routine
'*******************
'Set initial Excel objects
Set XLwkb1 = ActiveWorkbook
Set XLwks1 = XLwkb1.Worksheets("Sheet1")
'Find and open your Word doc
Set appWD = New Word.Application
strRptP = Application.GetOpenFilename
Set WDdoc = appWD.Documents.Open(strRptP)
appWD.Visible = True
'Get name of report
strRptN = WDdoc.Name
'Remove doc type from name
arrName = Split(strRptN, ".")
strRptN = arrName(0)
'Initialize the search range
Set WDrngHdr1 = WDdoc.Content.Duplicate
WDrngHdr1.Collapse wdCollapseStart 'This point is now the beginning of the document
Set WDrngCh = WDrngHdr1.Duplicate
'Create new workbook for search results
Set XLwkb2 = Workbooks.Add
Set XLwks2 = XLwkb2.Worksheets("Sheet1")
'Format first column
With XLwks2.Range("A:A")
.WrapText = True
.ColumnWidth = 65
End With
'Stop screen flashing
'Comment out or remove if you want to observe it working while testing
'Application.ScreenUpdating = False
'appWD.ScreenUpdating = False
'Find your X
For Each XLrngC1 In XLwks1.Range("B2:B12") 'Iterate through cells B2:B12 to find the X
If XLrngC1 = "x" Or XLrngC1 = "X" Then 'Big X or small x in cell
strChp1 = XLrngC1.Offset(0, -1) 'Get text in cell on same row, one column back
strChp2 = XLrngC1.Offset(1, -1) 'Get text from one row down, one column back
'Set the search range to the unsearched portion of the document
WDrngCh.SetRange _
Start:=WDrngHdr1.End, _
End:=WDdoc.Content.End
'Find chapter name and grab if it's style Heading2
pointGetChp1: 'Set a return point
WDrngCh.Find.Execute Findtext:=strChp1, Forward:=wdFindStop
'Check for Style; if wrong, it's not the chapter heading
If WDrngCh.Style <> "Heading 2" Then _
GoTo pointGetChp1 'Go back and try again
'GoTo CleanUp
'If we're here, we have found the heading for the chapter we need.
'WDrngCh encompases just the text we have found, which is only the heading.
'We need to find the heading of the chapter after this, the get the text in between.
'Set the Hdr1 range to a duplicate of the current range
Set WDrngHdr1 = WDrngCh.Duplicate
'Set the Chapter range to the rest of the document
WDrngCh.SetRange _
Start:=WDrngHdr1.End, _
End:=WDdoc.Content.End
'If the last chapter heading on the config worksheet is X'd, then strChp2 is blank.
'We'll skip the next Find and just grab the remaining text
If strChp2 <> "" Then
'Repeat Find to get next chapter name if it's style Heading2
pointGetChp2: 'Set a return point
WDrngCh.Find.Execute Findtext:=strChp2, Forward:=wdFindStop
'Check for Style; if wrong, it's not the chapter heading
If WDrngCh.Style <> "Heading 2" Then _
GoTo pointGetChp2 'Go back and try again
'Set the Hdr2 range to a duplicate of the current range
Set WDrngHdr2 = WDrngCh.Duplicate
'Get the text in between
WDrngCh.SetRange _
Start:=WDrngHdr1.End, _
End:=WDrngHdr2.Start
End If
'Set our text into new workbook, Sheet1
'Find the first empty cell in Column A and add 2
numRow = XLwks2.Range("A10000").End(xlUp).Row + 2
'Put chapter heading in first cell
XLwks2.Range("A" & numRow).Value = WDrngHdr1.Text
'Skip a cell and copy chapter text
WDrngCh.Copy
XLwks2.Range("A" & numRow + 2).PasteSpecial xlPasteValues
'If you stop here and look at the pasted-in results,
'Excel has put each paragraph in one cell.
'Not sure what it will do with tables, graphics, etc.
'This completes the grab for one chapter
'Lather, rinse, repeat until finished
End If
Next XLrngC1
CleanUp:
'Screen updating on
Application.ScreenUpdating = True
appWD.ScreenUpdating = True
'Save and close the Excel file
'If you have a designated folder, add it into the Filename
XLwkb2.SaveAs Filename:=WDdoc.Path & "\" & strRptN
XLwkb2.Close
'Close the report doc
WDdoc.Close wdDoNotSaveChanges
appWD.Quit 'Closes and quits Word
Set appWD = Nothing 'Releases the Word object
End Sub
-
Mar 16th, 2012, 04:44 PM
#20
Member
Re: Parsing Word Document & Export to Excel
I uploaded the files to my web site so maybe you can download them from there.
www.FamilyFunMagic.com/Stuff/MyReport.doc
www.FamilyFunMagic.com/Stuff/MyReport.xls
Ed
-
Mar 21st, 2012, 04:45 AM
#21
Thread Starter
Junior Member
-
Mar 21st, 2012, 01:16 PM
#22
Member
Re: Parsing Word Document & Export to Excel
I did some playing with your sample doc. I don't think we're going to be able to use the Table of Contents bookmarks. (Probably someone else with a lot more experience in Word VBA could, but I don't think I'm going to get there!) Word has a lot of hidden bookmarks. The TOC bookmarks are hidden, and you can't access them by code unless you know the secret password:
vb Code:
ActiveDocument.Bookmarks.ShowHidden = True
Once that vat is open, though, there's a ton of stuff! Now you have to sift and sort through all the hidden bookmarks by name and range and anything else you can think of. Because your search text is goin to be found in several of them - I don't know why. And going to the place in the document for that specific bookmark is also difficult.
So I changed the code to search for your chapter/subchapter text and verify the style. The code searches from the end of the document for the text in the Config file, so it bypasses the TOC altogether. The only hiccup during test was that your Heading 2 style overflowed into your body text at one point, so it caught the caption text in the body paragraph instead of the header. And it doesn't treat tables very well - but that can all be smoothed out as they decide what they want to do with the data.
So here's the updated code. I do have several Stop points in there to allow you to see what's going on. Very useful for me since I have two monitors and I can put the code on one and the test doc on the other!
Code:
Sub GetMyStuff()
'*******************
'Object declarations
'
'To declare Word objects, set a reference
'to the Word object library.
'Tools >> References >> Microsoft Word
'
'(Since both Word and Excel have range objects,
'but they have different Methods and Properties,
'we must specifically declare as Word or Excel
'range to avoid confusion.)
'*******************
'Set Excel objects
Dim XLwkb1 As Workbook 'This workbook
Dim XLwkb2 As Workbook 'Workbook for data
Dim XLwks1 As Worksheet 'Worksheet with config
Dim XLwks2 As Worksheet 'Worksheet to hold data
Dim XLrngC1 As Excel.Range 'Individual cell in config wksht
Dim XLrngC2 As Excel.Range 'Individual cell in data wksht
Dim XLrngW1 As Excel.Range 'Range of cells in wks1
Dim XLrngW2 As Excel.Range 'Range of cells in wks2
'Set Word objects
Dim appWD As Word.Application
Dim WDdoc As Document 'Report document
Dim WDrngCh As Word.Range 'Range for one complete chapter
Dim WDrngHdr1 As Word.Range 'Range for one chapter header
Dim WDrngHdr2 As Word.Range 'Range for next chapter header
'Other objects
Dim strRptP As String 'Complete path for report
Dim strRptN As String 'Doc name for report
Dim strChp1 As String 'Title of chapter to grab
Dim strChp2 As String 'Title of next chapter
Dim strStyle As String 'Name of text style
Dim lastRow As Long 'Last row to search in config book
Dim numRow As Long 'Used for row number in data wkb
Dim arrName 'Used to extract doc name from path
Dim valAutoSec 'Word AutomationSecurity setting
'*******************
'Main routine
'*******************
'Set initial Excel objects
Set XLwkb1 = ActiveWorkbook
Set XLwks1 = XLwkb1.Worksheets("Sheet1")
'Find and open your Word doc
Set appWD = New Word.Application
'Turn off macros
On Error Resume Next
valAutoSec = appWD.AutomationSecurity
appWD.AutomationSecurity = msoAutomationSecurityForceDisable
On Error GoTo 0
strRptP = Application.GetOpenFilename
If strRptP <> "" Or _
strRptP = "False" Then
Set WDdoc = appWD.Documents.Open(strRptP)
appWD.Visible = True
'Get name of report
strRptN = WDdoc.Name
'Remove doc type from name
arrName = Split(strRptN, ".")
strRptN = arrName(0)
Else
MsgBox "No file selected"
GoTo CleanUp
End If
'Create new workbook for search results
Set XLwkb2 = Workbooks.Add
Set XLwks2 = XLwkb2.Worksheets("Sheet1")
'Format first column
With XLwks2.Range("A:A")
.WrapText = True
.ColumnWidth = 65
End With
'Stop screen flashing
'Comment out or remove if you want to observe it working while testing
'Application.ScreenUpdating = False
'appWD.ScreenUpdating = False
'Find your X
'Get the last row of config requirements
lastRow = XLwks1.Range("A10000").End(xlUp).Row
'Iterate through cells in Col B to find the X
For Each XLrngC1 In XLwks1.Range("B2:B" & lastRow)
'********
'For checking code
Debug.Print XLrngC1.Row
Stop
'********
If XLrngC1.Value = "x" Or XLrngC1.Value = "X" Then 'Big X or small x in cell
strChp1 = XLrngC1.Offset(0, -1) 'Get text in cell on same row, one column back
strChp2 = XLrngC1.Offset(1, -1) 'Get text from one row down, one column back
Set WDrngCh = WDdoc.Content.Duplicate
'Find chapter name
pointGetChp1: 'Set a return point
WDrngCh.Find.Execute Findtext:=strChp1, Forward:=False, MatchCase:=True, Wrap:=wdFindStop
If WDrngCh Is Nothing Then
MsgBox strChp1 & " not found in this document."
GoTo CleanUp
End If
'********
'For checking code
WDrngCh.Select
Stop
'********
'GoTo CleanUp
'Check for Style
strStyle = WDrngCh.Style
Stop
'GoTo CleanUp
If InStr(1, strStyle, "Heading 2") = 0 Then _
GoTo pointGetChp1 'Go back and try again
'GoTo CleanUp
'Set range to chapter heading
Set WDrngHdr1 = WDrngCh.Duplicate
'********
'For checking code
WDrngHdr1.Select
Stop
'********
'Reset the search range
' Set WDrngCh = WDdoc.Content.Duplicate
WDrngCh.SetRange WDrngHdr1.End, WDdoc.Content.End
'If the last chapter heading on the config worksheet is X'd, then strChp2 is blank.
'We'll skip the next Find and just grab the remaining text
If strChp2 <> "" Then
'Repeat Find to get next chapter name
pointGetChp2: 'Set a return point
WDrngCh.Find.Execute Findtext:=strChp2, Forward:=False, MatchCase:=True, Wrap:=wdFindStop
If WDrngCh Is Nothing Then
MsgBox strChp2 & " not found in this document."
GoTo CleanUp
End If
'********
'For checking code
WDrngCh.Select
Stop
'********
'GoTo CleanUp
'Check for Style
strStyle = WDrngCh.Style
Stop
'GoTo CleanUp
If InStr(1, strStyle, "Heading 2") = 0 Then _
GoTo pointGetChp2 'Go back and try again
'Set range to chapter heading
Set WDrngHdr2 = WDrngCh.Duplicate
'********
'For checking code
WDrngHdr2.Select
Stop
'********
'Get the text in between
WDrngCh.SetRange _
Start:=WDrngHdr1.End, _
End:=WDrngHdr2.Start
Else
Stop
WDrngCh.SetRange _
Start:=WDrngHdr1.End, _
End:=WDdoc.Content.End
End If
'********
'For checking code
WDrngCh.Select
Stop
'********
'Set our text into new workbook, Sheet1
'Find the first empty cell in Column A and add 2
numRow = XLwks2.Range("A10000").End(xlUp).Row + 2
'Put chapter heading in first cell
XLwks2.Range("A" & numRow).Value = WDrngHdr1.Text
'Skip a cell and copy chapter text
WDrngCh.Copy
XLwks2.Range("A" & numRow + 2).PasteSpecial xlPasteValues
'If you stop here and look at the pasted-in results,
'Excel has put each paragraph in one cell.
'Not sure what it will do with tables, graphics, etc.
'This completes the grab for one doc
'Lather, rinse, repeat until finished
End If
Next XLrngC1
CleanUp:
'Screen updating on
Application.ScreenUpdating = True
appWD.ScreenUpdating = True
appWD.AutomationSecurity = valAutoSec
'Save and close the Excel file
'If you have a designated folder, add it into the Filename
XLwkb2.SaveAs Filename:=WDdoc.Path & "\" & strRptN
XLwkb2.Close
'Close the report doc
WDdoc.Close wdDoNotSaveChanges
appWD.DisplayAlerts = wdAlertsNone
appWD.Quit 'Closes and quits Word
Set appWD = Nothing 'Releases the Word object
appWD.DisplayAlerts = wdAlertsAll
End Sub
-
Mar 22nd, 2012, 10:28 AM
#23
Thread Starter
Junior Member
Re: Parsing Word Document & Export to Excel
Ed I'm wondering, based on my selection in the multi list box, it is possible to put an 'x' in the corresponding cell in the next column? That way, I could incorporate your answer into my program. The parsing has to be based on selection, so consider the following scenario:
If you select a chapter heading in the multi list...an x in placed in the adjacent relevant cell. That way when your code searches for the x in column b it will be there based on selection, then once parse we can clear column b?
Last edited by DJ-DOO; Mar 22nd, 2012 at 10:33 AM.
-
Mar 22nd, 2012, 11:23 AM
#24
Member
Re: Parsing Word Document & Export to Excel
Well, the col B 'x' was just my way of setting up the list of what chapters to grab. It just made it easy to know which two headings we were working with, because for my code you needed the chapter heading you wanted to grab and the next one in line so you knew where to stop selecting.
So you're populating a ListBox ... from "Config!A1:A45"? I would:
-- grab the returned value from the ListBox
-- find the row number of that value in your A1:A45 list
-- drop down one row to get the next value
-- use those two values as WDrngHdr1 and WDrngHdr2
That eliminates the need for an extra column of "X"s next to your Config list.
The other thing you could do is simply view the Config list page and set it up so when you put an "X" by a heading, it grabs it, runs the code, and then clears the "X". Can be fired either by a Selection_Change or Worksheet_Change macro in the worksheet or by a button in the worksheet. That would eliminate the ListBox altogether. That would seem simpler to me - but I don't know why you have the requirements for the ListBox.
Last edited by Ed_from_AZ; Mar 22nd, 2012 at 12:29 PM.
-
Mar 23rd, 2012, 06:25 AM
#25
Thread Starter
Junior Member
-
Mar 23rd, 2012, 02:02 PM
#26
Member
Re: Parsing Word Document & Export to Excel
No problem at all with the hand-holding! I'd love to say I learned all this all by myself - but truth be told my hand was held many a time! Two things I have learned though - F1 (Help) and Google are your good friends.
With a ListBox, Value returns the value of the selected item as a text string, and ListIndex returns the position in the list of the selected item as Long I think. A gotcha here - the ListIndex starts with 0, not 1. So if you populate your listbox in the same order as the items are listed in A1:A45, Value will show you the selected header, and ListIndex + 1 will give you the row it's in on the Config sheet. Something like this:
Code:
Private Sub ListBox1_Change()
ListBox1.Tag = ListBox1.ListIndex
End Sub
I put a CommandButton on the form to call the routine:
Code:
Private Sub CommandButton1_Click()
Call GetMyStuff
Unload Me
End Sub
And in the main routine, you do something like this:
Code:
'Stop screen flashing
'Comment out or remove if you want to observe it working while testing
'Application.ScreenUpdating = False
'appWD.ScreenUpdating = False
lastRow = UserForm1.ListBox1.Tag
''''
''''
''''
'''''Find your X
'''''Get the last row of config requirements
''''lastRow = XLwks1.Range("A10000").End(xlUp).Row
'''''Iterate through cells in Col B to find the X
''''For Each XLrngC1 In XLwks1.Range("B2:B" & lastRow)
''''
'''' '********
'''' 'For checking code
'''' Debug.Print XLrngC1.Row
'''' Stop
'''' '********
''''
'''' If XLrngC1.Value = "x" Or XLrngC1.Value = "X" Then 'Big X or small x in cell
'''' strChp1 = XLrngC1.Offset(0, -1) 'Get text in cell on same row, one column back
'''' strChp2 = XLrngC1.Offset(1, -1) 'Get text from one row down, one column back
If lastRow <> "" Then
Set XLrngC1 = XLwks1.Range("A" & lastRow + 1)
strChp1 = XLrngC1.Value 'Get text in cell
strChp2 = XLrngC1.Offset(1, 0) 'Get text from one row down
Set WDrngCh = WDdoc.Content.Duplicate
'Find chapter name
pointGetChp1: 'Set a return point
WDrngCh.Find.Execute Findtext:=strChp1, Forward:=False, MatchCase:=True, Wrap:=wdFindStop
And don't forget to comment out the end of the With loop:
Code:
End If
''''Next XLrngC1
CleanUp:
Okay - I've tested a bit of this, but not all of this.
Shout back if it doesn't make sense.
Ed
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|