Results 1 to 26 of 26

Thread: Parsing Word Document & Export to Excel

  1. #1

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Parsing Word Document & Export to Excel

    Hi...

    I'm a third year software design student and I'm currently on my work placement. I have never used VB before, but my research has led me to believe that this is the best method...I have been given the task of programatically parsing hundreds of word (2003) documents for information stored in certain tables. The documents comprise of paragraphs and tables. I need to extract the info in preferred tables and export it to excel...Can anyone help please. I know it can be done but I've been researching this for a week now and I really am pulling my hair out at this stage. If anyone can help or point me in the right direction or even link a solid tutorial I would be so, so grateful.


    Kind regards,
    DJ-DOO

  2. #2
    Frenzied Member
    Join Date
    May 2006
    Location
    some place in the cloud
    Posts
    1,886

    Re: Parsing Word Document & Export to Excel

    JG


    ... If your problem is fixed don't forget to mark your threads as resolved using the Thread Tools menu ...

  3. #3
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Parsing Word Document & Export to Excel

    how do you define which tables in each document contain information you want?

    vb Code:
    1. for each t in doc.tables    ' where doc is a document object
    2.   if t.cell(1, 1).range.text = mystring then ' or some other criteria
    3.     'do stuff here
    4.   end if
    5. next
    are you writing your code in vb6 or inside some office application (VBA)?
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  4. #4
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    I think I would drive this from within an Excel VBA module. Hopefully, all your Word docs are in one directory! If that directory is a constant, you can hard-code your file path; otherwise you're going to have to set up a way to search all in one directory and then ask if you need to do more, if so navigate to the next directory and start again.

    You can set a document object for each opened document, and search for the table you need. I hope there's something consistent and unique that identifies only the one table you need in each doc?? Likewise, are the table cells you need consistent for each applicable table?

    As you copy the data into Excel, is it the same data type each time? For instance: first column = date; second = name; third = amount?? As you change docs, do you want to skip a line in the Excel sheet? Do you want each doc's info on a separate sheet?

    Figuring all this out ahead of time will save lots of re-writing!!

    Ed

  5. #5

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    Hey Guys thank you so much for all of your input.

    As per Ed_from_AZ's suggestion I decided to drive from within an application. However, my requirements have since changed slightly.

    I now need to parse selected chapters within the word document. So at the moment I'm using a checklist populated from a config file (all chapter names) (requirement). Based on the selection(s) I have to locate chapter(s) within the word doc (chapter headings) once identified I need to copy that full chapter export into an excel doc. An issue is also the chapter names appear numerous times throughout the doc, so the only way to differentiate the chapter heading from any other mention is by heading style. All the chapter headings are in 'h2' style. Chapters include both paragraphs and tables. I am running out of excel, I wish to export to a different spreadsheet.

    Here is what I've got so far, this does locate the chapter heading, but that is it. It is also quite slow, these documents are 20,000+ words, so if I select the final chapter, it takes 30-40 seconds to hit. I would like to keep the code pretty much along the same lines (same logic), but if anyone can please help I would greatly appreciate it!!


    I have made pretty good progress (I think) but unfortunately progress is slowing and I'm coming up against a brick wall as I don't really have much experience in vba

    HTML Code:
    
    '====================================================================
    '               POPULATING LIST BOX WITH DATA IN
    '                      CONFIG WORKSHEET
    '=====================================================================
    Private Sub UserForm_Initialize()
    
    ListBox1.ListFillRange = "Config!A1:A45"
    
    End Sub
    
    '======================================================================
    '                   PROCESSING LISTBOX SELECTION
    '======================================================================
    
    Public Sub Parse_Click()
    
    '======================================================================
    '                       DECLARING VARIABLES
    '======================================================================
    
      Dim i As Long
      Dim C As New Collection
      Dim Path As String
    
      With ListBox1
        For i = 0 To .ListCount - 1
          'Add all selected items to a collection
          If .Selected(i) Then C.Add .List(i)
        Next
      End With
    
      'Nothing selected, nothing to do
      If C.Count = 0 Then Exit Sub
    
      With Application.FileDialog(msoFileDialogFolderPicker)
        .Title = "Select Folder to Process and Click OK"
        .AllowMultiSelect = False
        .InitialView = msoFileDialogViewList
        If .Show <> -1 Then Exit Sub
    
        Path = .SelectedItems(1)
        If Right(Path, 1) <> "\" Then Path = Path + "\"
        'Remove any "
        Path = Replace(Path, """", "")
      End With
    
      If Dir$(Path & "*.doc") = "" Then
        MsgBox "No files found"
        Exit Sub
      End If
      
      'Install an error handler (remove if you have any)
      On Error GoTo Errorhandler
      ParseDoc Path, C
      Exit Sub
      
    Errorhandler:
      MsgBox "Error " & Err.Number & ": " & Err.Description
    End Sub
    
    '======================================================================
    '                   PARSING WORD DOC FOR
    '                       SELECTED ITEMS
    '======================================================================
    
    Public Sub ParseDoc(ByVal strPath As String, ByVal Items As Collection)
      Dim objExcel As Object 'Excel.Application
      Dim ExcelBook As Object 'Excel.Workbook
      Dim WasOpen As Boolean
      Dim oDoc As Document
      Dim oPara As Paragraph
      Dim strFilename As String
      Dim Item
      Dim Rng As Range
      Dim objWord As Word.Application
      Set objWord = New Word.Application
      objWord.Visible = True
        
      'Setting Location of Excel Spread for Parsed Details
      Const WorkBookName As String = "C:\Users\edoogar\Documents\ParseProject\ParseDetails.xls"
    
      'Set objWord = New Word.Application
      On Error Resume Next
      WasOpen = True
      Set objExcel = GetObject(, "Excel.Application")
      If objExcel Is Nothing Then
        Set objExcel = CreateObject("Excel.Application")
        If objExcel Is Nothing Then _
          Err.Raise 1000, "ParseDoc", "Excel is not accessible"
        objExcel.Visible = True
        WasOpen = False
      End If
      
      Set ExcelBook = objExcel.Workbooks.Open(Filename:=WorkBookName)
      If ExcelBook Is Nothing Then
        If WasOpen Then objExcel.Quit
        Err.Raise 1001, "ParseDoc", "Can not open " & WorkBookName
      End If
      On Error GoTo 0
    
      WordBasic.DisableAutoMacros 1
      strFilename = Dir$(strPath & "*.doc")
      While Len(strFilename) <> 0
        Set oDoc = objWord.Documents.Open(Filename:=strPath & strFilename, AddToRecentFiles:=False)
    
            For Each oPara In oDoc.Paragraphs
              For Each Item In Items
                If InStr(1, oPara.Range, Item) > 0 Then
                 If InStr(1, oPara.Style, "H2") > 0 Then
                  oPara.Range.Select
                  MsgBox "You have found the string!"
                  GoTo CloseDoc
                 End If
                End If
              Next
            Next  
                  
    CloseDoc:
        oDoc.Close wdDoNotSaveChanges
        strFilename = Dir$()
        
      Wend
      WordBasic.DisableAutoMacros 0
      objWord.Quit
    End Sub
    

  6. #6

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    Now that I've changed to vba, maybe I should repost in office development forum

  7. #7
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    When you say: "I am running out of excel, I wish to export to a different spreadsheet.", do you mean you're running out of room in the sheet to place your data? And you want to use a different program? Or you want to overflow onto a new worksheet? Why aren't you using the 2007 or 2010 versions (xlsx or xlsm)? These give you over a million rows and columns to XFD, vice 65536 rows and columns to IV.

    Your config file: it contains the doc name and the specific chapters within that doc that you need to find? If not, how are you knowing which chpter in which doc?

    When you say: "I need to copy that full chapter export into an excel doc" and "Chapters include both paragraphs and tables.", what do you need to copy into Excel and what is it supposed to look like when you're done? Whole chapters in one cell? Paragraphs in individual cells, and tables in an Excel table? Are there any special requirements that must be met to allow this data to be manipulated within Excel? Or is this just a report and the inrformation is imported for display only?

    One option to get the whole chapter might be:
    -- Open doc
    -- Set bookmarks to every H2 chapter heading
    -- Iterate through bookmarks until you hit one that matches the text in your config file
    (This assumes you have an array or at least a string of all the chapter titles for that doc that you can match against.)
    -- When you get a hit, grab the index number of that bookmark, and set a range from doc.bookmark(index) to doc.bookmark(index + 1).

    That should give you a range to that chapter; with that, you can copy, parse, find tables, whatever.

    Here's some sites that might help you:
    http://www.gmayor.com/Word_pages.htm
    http://wordfaqs.mvps.org/
    http://word.mvps.org/FAQs/MacrosVBA/index.htm
    http://word.mvps.org/FAQs/InterDev/E...ateBinding.htm
    http://word.mvps.org/FAQs/InterDev/C...WordFromXL.htm
    http://msdn.microsoft.com/en-us/library/Aa203688

    Good luck!
    Ed

  8. #8
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    Oh - and if you're running this from within Excel VBA, you might consider pulling your config file list (if it's CSV or similar format) into a sheet in your Excel workbook, vice holding populated arrays in memory. Then you can just iterate down the listings in the sheet by row and column and know what you're looking for.

    Ed

  9. #9

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    hi Ed_from_AZ....sorry I'm running in from within excel vba, not running out of excel...

    the config file is a worksheet within the workbook.

    That's a good idea about setting up the bookmarks, I will look into that. Thank you for your suggestions and help

    The data imported into Excel is just for viewing/reporting purposes at the moment, tbh, they're not entirely sure what way they want it laid out in the spreadsheet yet.

  10. #10

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    Ed_from_AZ...There are already bookmarks within the doc, so I'm just wondering, the word documents I'm working on have an index, each chapter listed in the index has a hyperlink that when clicked brings you straight to the chapter. Can I use these in someway to extract the whole chapter between hyperlinks?

    I would really appreciate a dig out with this as I've a progress meeting on Tues as we've a bank hol here on mon.

    Thank you in advance

  11. #11
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    Sounds like an auto-genereated Table of Contents. And with that, each of your chapter headings would indeed have a bookmark as a place for the hyperlink in the TOC to jump to. Sure, those bookmarks can be used - if you know what they are and they are consistent between docs.

    You're looking at a loop, which contains repeating instructions to be run for each desired chapter in a doc - and then needs to be repeated for each desired doc. If the chapter heading bookmarks are all the same in each doc for each chapter you need to parse, then by all means use what's there. Otherwise, I'd set up my own bookmarks just so I could write code that knows exactly what to look for each and every time.

    In "air code", what I envision would look something like:

    vb Code:
    1. For Each doc in your collection of docs
    2.   ' Set bookmarks
    3.   numBkmark = 1
    4.   For Each H2 header
    5.     Set range to header range
    6.     Set bookmark to range _
    7.       bkmk.name = "Chapter_" & numBkmark
    8.     numBkmark = numBkmark + 1
    9.   Next H2 header
    10.  
    11.   ' Iterate bkmks to find the right chapter
    12.   numBkmark = 1
    13.   For Each bkmark in doc.bookmarks
    14.     If bkmark.name = "Chapter_" & numBkmark Then
    15.       If bkmark.name is in your list of chapters
    16.         ' We've got a match!
    17.         Set a range _
    18.           Start:= bkmark.range.start, _
    19.           End:= next bookmark.range.start
    20.         'This gives you a range for everything
    21.         'between the two chapter headings
    22.         'Now you can extract the text, tables, etc.
    23.  
    24.       End If
    25.     End If
    26.     numBkmark = numBlmark + 1
    27.   Next bkmark
    28. Next doc
    Anyway, that's how I would start it.
    Hope I said it so it makes sense to you, too!! :8>)

    Ed

  12. #12
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    What I see as the building blocks for this are:

    -- A way to create a collection of documents to work on, or
    A method to use the doc path and name to get to the doc
    -- A list that identifies which chapters you need to parse:
    Is there a consistent chapter title? or
    Will you need to identify a certain table or text within a chapter?
    -- A loop that will iterate all the chapter headings or chapters, identify the right one(s), and grab the table, text, photo, whatever
    -- A procedure that will pull the desired element(s) from the doc into the workbook in a useable manner
    Does each doc and/or chapter need its own worksheet?
    Are you going to try to aggregate some or all of this info? (Don't do it on the fly - get it all into the workbook first, and then you can work with it.)
    Can you get someone to give you an idea of what a "useable manner" looks like??
    -- Error handling:
    Doc doesn't have the chapter - do you need to send someone a red flag?
    What's in the doc range refuses to copy over into the worksheet - how should you handle it?
    The doc range exceeds the limits of the worksheet - how should you handle it?
    Someone else is using the doc when your code wants it - can you detect that and come back to it?

    Flesh out the project as much as possible: flowchart the operation; detect loops within loops and If/Then branches; look at what you need to indentify everything you must work with - doc, workbook, worksheet, ranges in both Word and Excel, bookmarks, paragraphs, tables, etc. What are the blind spots? What are the possible hiccups and errors?

    Craft some questions and put *them* on the spot!! What do you need from them to write good code? You can write any code they want, but you can't read their minds to know what they want. Intelligent questions can make you look better than lots of info.

    Cheers!
    Ed

  13. #13

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    ..that's great...thanks...

    I toggled the field codes for the index....

    this is the result....

    {HYPERLINK\|"_Toc292701022"}
    {HYPERLINK\|"_Toc292701023"}
    |
    |
    V
    {HYPERLINK\|"_Toc292701074"}

    So these are the bookmarks/hyperlinks for each chapter, you can see there are 52 chapters. They are in numerical order, so I'm guessing I could use these? You see I'm populating a multi list box from a config file (which is a list of the chapters), so based on the selection, thats the chapter I need to parse.

    At the moment selecting a folder to process.

    At the moment the requirement is to parse the whole chapter and "dump" it into an excel spread sheet. So tbh, if I was that far for Tues, I would be extremely happy!!

  14. #14
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    If you can translate "Toc292701022" into the chapter you need, then you're good to go. Try using the GoTo feature and type in one of these bookmark names and see if you go directly to the desired chapter.

    Are you going to have to do all these one at a time? Open a doc, populate a list box, manually choose a chapter, and then run your code? Or is this just to create a demo of yhow you can get the info into Excel?

    What does a chapter look like? Is it all paragraphs of plain text? Will you have graphics, graphs, charts, tables, hyperlinks?

    What versions of Word and Excel are you using?

  15. #15

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    I tried putting in the link into goto but to no avail....

    How it works at the moment, as I'm running this out of excel vba, I open the workbook (multi list already populated), I tick relevant chapters, then hit a parse button, this then opens a folder picker, which allows me to select the folder that contains the documents for parsing and hit ok, the program (at the moment), then opens the documents one by one and goes through them, finds the heading brings up a text box (for testing) and closes the doc, and moves onto the next...

    I hope to find the heading, extract the chapter, then copy it to a worksheet, then close the doc and move to the next.

    That is how it is now, eventually they want to link it to an internal tool within the company to parse docs stored in that, but I don't need to worry about that at the moment.

    Chapters contain both paragraphs and tables..

    I'm using office 2003

    Thank you very much for your help thus far, I've gotten this far and now I'm pulling my hair out!
    Last edited by DJ-DOO; Mar 16th, 2012 at 07:42 AM.

  16. #16

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    could you set ranges by heading style rather than bookmarks?? so from one h2 style to the next?? I don't know and again, I'm really appreciative of your help as I'm quite inexperienced in vba, I am proficient in Java & c++, but when it comes to vba, it's absolutely learn as you go...

    Just to add, I have posted a similar style question to this on another forum, I found out today that cross posting is a no no, I honestly wasn't aware of this, I thought that you post to forums to ask the experts and hope for some good advice....I've posted the link to my other post below. Apologies!

    http://www.vbaexpress.com/forum/show...d=1#post262800
    Last edited by DJ-DOO; Mar 16th, 2012 at 04:35 AM.

  17. #17

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    I have managed to locate the bookmarks, they were hidden bookmarks, now unfortunately they aren't in order (numerically).

    e.g.

    _Toc129482794
    _Toc292701023
    _Toc63137014
    _Toc63137015

    these do work!

    Now my thinking is if there is some way (a way of which I don't know) to incorporate these with the chapter listings in the config file, then once selected in the multi list box, i can go to bookmark rather than heading.

    which will also make it easier to create a range...

    Here's an example of my config list

    Introduction
    Document history
    Purpose
    Scope
    Prerequisites and Dependencies
    Open Issues
    Decision Log
    Assumptions
    Input Requirements
    Main Requirements
    External Stakeholder Requirements

    I've matched each bookmark to each chapter e.g.

    _Toc129482794 Introduction
    _Toc292701023 Document history
    _Toc63137014 Purpose
    _Toc63137015 Scope
    _Toc63137016 Prerequisites and Dependencies
    _Toc63137021 Open Issues
    _Toc63137022 Decision Log
    _Toc129482805 Assumptions
    _Toc63137023 Input Requirements
    _Toc63137024 Main Requirements
    _Toc292701032 External Stakeholder Requirements

    Would you have any thoughts on this?
    This will also speed up my program!!

    Thanking you in advance!

  18. #18

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    I was wrong the original bookmarks work!!

    _Toc292701022
    _Toc292701023
    |
    |
    V
    _Toc292701074"

  19. #19
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    I've come up with something - hope it works for you. I tried to attach the files, but it wouldn't go. So here's the code that I put in a code module in an Excel file.

    Sheet1 of this file is set up with your chapter headers in column A, and an X by some of them in column B. The code runs down column B - if there's an X (or x - it will catch either case), it grabs the text of that chapter.

    Hope it gives you a start.

    Ed

    Code:
    Sub GetMyStuff()
    
    '*******************
    'Object declarations
    '
    'To declare Word objects, set a reference
    'to the Word object library.
    'Tools >> References >> Microsoft Word
    '
    '(Since both Word and Excel have range objects,
    'but they have different Methods and Properties,
    'we must specifically declare as Word or Excel
    'range to avoid confusion.)
    '*******************
    
    'Set Excel objects
    Dim XLwkb1        As Workbook     'This workbook
    Dim XLwkb2        As Workbook     'Workbook for data
    Dim XLwks1        As Worksheet    'Worksheet with config
    Dim XLwks2        As Worksheet    'Worksheet to hold data
    Dim XLrngC1       As Excel.Range  'Individual cell in config wksht
    Dim XLrngC2       As Excel.Range  'Individual cell in data wksht
    Dim XLrngW1       As Excel.Range  'Range of cells in wks1
    Dim XLrngW2       As Excel.Range  'Range of cells in wks2
    
    'Set Word objects
    Dim appWD         As Word.Application
    Dim WDdoc         As Document     'Report document
    Dim WDrngCh       As Word.Range   'Range for one complete chapter
    Dim WDrngHdr1     As Word.Range   'Range for one chapter header
    Dim WDrngHdr2     As Word.Range   'Range for next chapter header
    
    'Other objects
    Dim strRptP       As String       'Complete path for report
    Dim strRptN       As String       'Doc name for report
    Dim strChp1       As String       'Title of chapter to grab
    Dim strChp2       As String       'Title of next chapter
    
    Dim numRow        As Long         'Used for Excel row number
    Dim arrName                       'Used to extract doc name from path
    
    '*******************
    'Main routine
    '*******************
    
    'Set initial Excel objects
    Set XLwkb1 = ActiveWorkbook
    Set XLwks1 = XLwkb1.Worksheets("Sheet1")
    
    'Find and open your Word doc
    Set appWD = New Word.Application
    strRptP = Application.GetOpenFilename
    Set WDdoc = appWD.Documents.Open(strRptP)
    appWD.Visible = True
    'Get name of report
    strRptN = WDdoc.Name
    'Remove doc type from name
    arrName = Split(strRptN, ".")
    strRptN = arrName(0)
    
    'Initialize the search range
    Set WDrngHdr1 = WDdoc.Content.Duplicate
    WDrngHdr1.Collapse wdCollapseStart  'This point is now the beginning of the document
    Set WDrngCh = WDrngHdr1.Duplicate
    
    'Create new workbook for search results
    Set XLwkb2 = Workbooks.Add
    Set XLwks2 = XLwkb2.Worksheets("Sheet1")
    'Format first column
    With XLwks2.Range("A:A")
      .WrapText = True
      .ColumnWidth = 65
    End With
    
    'Stop screen flashing
    'Comment out or remove if you want to observe it working while testing
    'Application.ScreenUpdating = False
    'appWD.ScreenUpdating = False
    
    'Find your X
    For Each XLrngC1 In XLwks1.Range("B2:B12")  'Iterate through cells B2:B12 to find the X
      If XLrngC1 = "x" Or XLrngC1 = "X" Then    'Big X or small x in cell
        strChp1 = XLrngC1.Offset(0, -1)         'Get text in cell on same row, one column back
        strChp2 = XLrngC1.Offset(1, -1)         'Get text from one row down, one column back
        
        'Set the search range to the unsearched portion of the document
        WDrngCh.SetRange _
          Start:=WDrngHdr1.End, _
          End:=WDdoc.Content.End
          
        'Find chapter name and grab if it's style Heading2
    pointGetChp1:   'Set a return point
        WDrngCh.Find.Execute Findtext:=strChp1, Forward:=wdFindStop
        
        'Check for Style; if wrong, it's not the chapter heading
        If WDrngCh.Style <> "Heading 2" Then _
          GoTo pointGetChp1   'Go back and try again
      'GoTo CleanUp
        'If we're here, we have found the heading for the chapter we need.
        'WDrngCh encompases just the text we have found, which is only the heading.
        'We need to find the heading of the chapter after this, the get the text in between.
        
        'Set the Hdr1 range to a duplicate of the current range
        Set WDrngHdr1 = WDrngCh.Duplicate
        'Set the Chapter range to the rest of the document
        WDrngCh.SetRange _
          Start:=WDrngHdr1.End, _
          End:=WDdoc.Content.End
          
        'If the last chapter heading on the config worksheet is X'd, then strChp2 is blank.
        'We'll skip the next Find and just grab the remaining text
        If strChp2 <> "" Then
        
          'Repeat Find to get next chapter name if it's style Heading2
    pointGetChp2:     'Set a return point
        WDrngCh.Find.Execute Findtext:=strChp2, Forward:=wdFindStop
          
          'Check for Style; if wrong, it's not the chapter heading
          If WDrngCh.Style <> "Heading 2" Then _
            GoTo pointGetChp2   'Go back and try again
        
          'Set the Hdr2 range to a duplicate of the current range
          Set WDrngHdr2 = WDrngCh.Duplicate
        
          'Get the text in between
          WDrngCh.SetRange _
            Start:=WDrngHdr1.End, _
            End:=WDrngHdr2.Start
        End If
        
        'Set our text into new workbook, Sheet1
        'Find the first empty cell in Column A and add 2
        numRow = XLwks2.Range("A10000").End(xlUp).Row + 2
        
        'Put chapter heading in first cell
        XLwks2.Range("A" & numRow).Value = WDrngHdr1.Text
        
        'Skip a cell and copy chapter text
        WDrngCh.Copy
        XLwks2.Range("A" & numRow + 2).PasteSpecial xlPasteValues
        'If you stop here and look at the pasted-in results,
        'Excel has put each paragraph in one cell.
        'Not sure what it will do with tables, graphics, etc.
    
        'This completes the grab for one chapter
        'Lather, rinse, repeat until finished
      End If
    Next XLrngC1
    
    CleanUp:
    'Screen updating on
    Application.ScreenUpdating = True
    appWD.ScreenUpdating = True
    
    'Save and close the Excel file
    'If you have a designated folder, add it into the Filename
    XLwkb2.SaveAs Filename:=WDdoc.Path & "\" & strRptN
    XLwkb2.Close
    
    'Close the report doc
    WDdoc.Close wdDoNotSaveChanges
    appWD.Quit           'Closes and quits Word
    Set appWD = Nothing  'Releases the Word object
    
    End Sub

  20. #20
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    I uploaded the files to my web site so maybe you can download them from there.

    www.FamilyFunMagic.com/Stuff/MyReport.doc
    www.FamilyFunMagic.com/Stuff/MyReport.xls

    Ed

  21. #21

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    Hi Ed,

    Thank you for putting so much work into the answer for me.

    However, unfortunately, I'm trying to, once selected, go directly to that bookmark, then set the range from that bookmark to the next, then I can copy that range....thank you so much for helping me so far, it really has been appreciated.

    Any thoughts on how I might proceed using the table of contents bookmarks? If you do feel inclined to have a look at this problem with me, I've attached a sample doc of which I would be working on...

    Thanks again!!
    Attached Files Attached Files

  22. #22
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    I did some playing with your sample doc. I don't think we're going to be able to use the Table of Contents bookmarks. (Probably someone else with a lot more experience in Word VBA could, but I don't think I'm going to get there!) Word has a lot of hidden bookmarks. The TOC bookmarks are hidden, and you can't access them by code unless you know the secret password:
    vb Code:
    1. ActiveDocument.Bookmarks.ShowHidden = True
    Once that vat is open, though, there's a ton of stuff! Now you have to sift and sort through all the hidden bookmarks by name and range and anything else you can think of. Because your search text is goin to be found in several of them - I don't know why. And going to the place in the document for that specific bookmark is also difficult.

    So I changed the code to search for your chapter/subchapter text and verify the style. The code searches from the end of the document for the text in the Config file, so it bypasses the TOC altogether. The only hiccup during test was that your Heading 2 style overflowed into your body text at one point, so it caught the caption text in the body paragraph instead of the header. And it doesn't treat tables very well - but that can all be smoothed out as they decide what they want to do with the data.

    So here's the updated code. I do have several Stop points in there to allow you to see what's going on. Very useful for me since I have two monitors and I can put the code on one and the test doc on the other!

    Code:
    Sub GetMyStuff()
    
    '*******************
    'Object declarations
    '
    'To declare Word objects, set a reference
    'to the Word object library.
    'Tools >> References >> Microsoft Word
    '
    '(Since both Word and Excel have range objects,
    'but they have different Methods and Properties,
    'we must specifically declare as Word or Excel
    'range to avoid confusion.)
    '*******************
    
    'Set Excel objects
    Dim XLwkb1        As Workbook     'This workbook
    Dim XLwkb2        As Workbook     'Workbook for data
    Dim XLwks1        As Worksheet    'Worksheet with config
    Dim XLwks2        As Worksheet    'Worksheet to hold data
    Dim XLrngC1       As Excel.Range  'Individual cell in config wksht
    Dim XLrngC2       As Excel.Range  'Individual cell in data wksht
    Dim XLrngW1       As Excel.Range  'Range of cells in wks1
    Dim XLrngW2       As Excel.Range  'Range of cells in wks2
    
    'Set Word objects
    Dim appWD         As Word.Application
    Dim WDdoc         As Document     'Report document
    Dim WDrngCh       As Word.Range   'Range for one complete chapter
    Dim WDrngHdr1     As Word.Range   'Range for one chapter header
    Dim WDrngHdr2     As Word.Range   'Range for next chapter header
    
    'Other objects
    Dim strRptP       As String       'Complete path for report
    Dim strRptN       As String       'Doc name for report
    Dim strChp1       As String       'Title of chapter to grab
    Dim strChp2       As String       'Title of next chapter
    Dim strStyle      As String       'Name of text style
    
    Dim lastRow       As Long         'Last row to search in config book
    Dim numRow        As Long         'Used for row number in data wkb
    Dim arrName                       'Used to extract doc name from path
    Dim valAutoSec                    'Word AutomationSecurity setting
    
    '*******************
    'Main routine
    '*******************
    
    'Set initial Excel objects
    Set XLwkb1 = ActiveWorkbook
    Set XLwks1 = XLwkb1.Worksheets("Sheet1")
    
    'Find and open your Word doc
    Set appWD = New Word.Application
    
    'Turn off macros
    On Error Resume Next
    valAutoSec = appWD.AutomationSecurity
    appWD.AutomationSecurity = msoAutomationSecurityForceDisable
    On Error GoTo 0
    
    strRptP = Application.GetOpenFilename
    If strRptP <> "" Or _
       strRptP = "False" Then
      Set WDdoc = appWD.Documents.Open(strRptP)
      appWD.Visible = True
      'Get name of report
      strRptN = WDdoc.Name
      'Remove doc type from name
      arrName = Split(strRptN, ".")
      strRptN = arrName(0)
    Else
      MsgBox "No file selected"
      GoTo CleanUp
    End If
    
    'Create new workbook for search results
    Set XLwkb2 = Workbooks.Add
    Set XLwks2 = XLwkb2.Worksheets("Sheet1")
    'Format first column
    With XLwks2.Range("A:A")
      .WrapText = True
      .ColumnWidth = 65
    End With
    
    'Stop screen flashing
    'Comment out or remove if you want to observe it working while testing
    'Application.ScreenUpdating = False
    'appWD.ScreenUpdating = False
    
    'Find your X
    'Get the last row of config requirements
    lastRow = XLwks1.Range("A10000").End(xlUp).Row
    'Iterate through cells in Col B to find the X
    For Each XLrngC1 In XLwks1.Range("B2:B" & lastRow)
      
        '********
        'For checking code
        Debug.Print XLrngC1.Row
        Stop
        '********
        
      If XLrngC1.Value = "x" Or XLrngC1.Value = "X" Then    'Big X or small x in cell
        strChp1 = XLrngC1.Offset(0, -1)         'Get text in cell on same row, one column back
        strChp2 = XLrngC1.Offset(1, -1)         'Get text from one row down, one column back
        
        Set WDrngCh = WDdoc.Content.Duplicate
        
        'Find chapter name
    pointGetChp1:   'Set a return point
        WDrngCh.Find.Execute Findtext:=strChp1, Forward:=False, MatchCase:=True, Wrap:=wdFindStop
    
      If WDrngCh Is Nothing Then
        MsgBox strChp1 & " not found in this document."
        GoTo CleanUp
      End If
    
        '********
        'For checking code
        WDrngCh.Select
        Stop
        '********
    
        'GoTo CleanUp
        'Check for Style
        strStyle = WDrngCh.Style
        Stop
        'GoTo CleanUp
        If InStr(1, strStyle, "Heading 2") = 0 Then _
          GoTo pointGetChp1   'Go back and try again
        'GoTo CleanUp
        
        'Set range to chapter heading
        Set WDrngHdr1 = WDrngCh.Duplicate
        
        '********
        'For checking code
        WDrngHdr1.Select
        Stop
        '********
    
        'Reset the search range
       ' Set WDrngCh = WDdoc.Content.Duplicate
        
        WDrngCh.SetRange WDrngHdr1.End, WDdoc.Content.End
        
        'If the last chapter heading on the config worksheet is X'd, then strChp2 is blank.
        'We'll skip the next Find and just grab the remaining text
        If strChp2 <> "" Then
    
          'Repeat Find to get next chapter name
    pointGetChp2:     'Set a return point
        WDrngCh.Find.Execute Findtext:=strChp2, Forward:=False, MatchCase:=True, Wrap:=wdFindStop
    
          If WDrngCh Is Nothing Then
            MsgBox strChp2 & " not found in this document."
            GoTo CleanUp
          End If
          
          '********
          'For checking code
          WDrngCh.Select
          Stop
          '********
    'GoTo CleanUp
    
          'Check for Style
          strStyle = WDrngCh.Style
          Stop
          'GoTo CleanUp
          If InStr(1, strStyle, "Heading 2") = 0 Then _
            GoTo pointGetChp2   'Go back and try again
    
          'Set range to chapter heading
          Set WDrngHdr2 = WDrngCh.Duplicate
          
          '********
          'For checking code
          WDrngHdr2.Select
          Stop
          '********
        
          'Get the text in between
          WDrngCh.SetRange _
            Start:=WDrngHdr1.End, _
            End:=WDrngHdr2.Start
        
        Else
        Stop
          WDrngCh.SetRange _
            Start:=WDrngHdr1.End, _
            End:=WDdoc.Content.End
            
        End If
        
          '********
          'For checking code
          WDrngCh.Select
          Stop
          '********
        
        'Set our text into new workbook, Sheet1
        'Find the first empty cell in Column A and add 2
        numRow = XLwks2.Range("A10000").End(xlUp).Row + 2
        
        'Put chapter heading in first cell
        XLwks2.Range("A" & numRow).Value = WDrngHdr1.Text
        
        'Skip a cell and copy chapter text
        WDrngCh.Copy
        XLwks2.Range("A" & numRow + 2).PasteSpecial xlPasteValues
        'If you stop here and look at the pasted-in results,
        'Excel has put each paragraph in one cell.
        'Not sure what it will do with tables, graphics, etc.
    
        'This completes the grab for one doc
        'Lather, rinse, repeat until finished
      End If
    Next XLrngC1
    
    CleanUp:
    'Screen updating on
    Application.ScreenUpdating = True
    appWD.ScreenUpdating = True
    appWD.AutomationSecurity = valAutoSec
    
    'Save and close the Excel file
    'If you have a designated folder, add it into the Filename
    XLwkb2.SaveAs Filename:=WDdoc.Path & "\" & strRptN
    XLwkb2.Close
    
    'Close the report doc
    WDdoc.Close wdDoNotSaveChanges
    appWD.DisplayAlerts = wdAlertsNone
    appWD.Quit           'Closes and quits Word
    Set appWD = Nothing  'Releases the Word object
    appWD.DisplayAlerts = wdAlertsAll
    
    End Sub

  23. #23

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    Ed I'm wondering, based on my selection in the multi list box, it is possible to put an 'x' in the corresponding cell in the next column? That way, I could incorporate your answer into my program. The parsing has to be based on selection, so consider the following scenario:

    If you select a chapter heading in the multi list...an x in placed in the adjacent relevant cell. That way when your code searches for the x in column b it will be there based on selection, then once parse we can clear column b?
    Last edited by DJ-DOO; Mar 22nd, 2012 at 10:33 AM.

  24. #24
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    Well, the col B 'x' was just my way of setting up the list of what chapters to grab. It just made it easy to know which two headings we were working with, because for my code you needed the chapter heading you wanted to grab and the next one in line so you knew where to stop selecting.

    So you're populating a ListBox ... from "Config!A1:A45"? I would:
    -- grab the returned value from the ListBox
    -- find the row number of that value in your A1:A45 list
    -- drop down one row to get the next value
    -- use those two values as WDrngHdr1 and WDrngHdr2

    That eliminates the need for an extra column of "X"s next to your Config list.

    The other thing you could do is simply view the Config list page and set it up so when you put an "X" by a heading, it grabs it, runs the code, and then clears the "X". Can be fired either by a Selection_Change or Worksheet_Change macro in the worksheet or by a button in the worksheet. That would eliminate the ListBox altogether. That would seem simpler to me - but I don't know why you have the requirements for the ListBox.
    Last edited by Ed_from_AZ; Mar 22nd, 2012 at 12:29 PM.

  25. #25

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word Document & Export to Excel

    I am required to develop a tool to be used within the organisation to allow users to parse relevant chapters of the IS/IP reports. So they wish to be able to click on the relevant chapter to parse. The idea is to make it user friendly, and one of the requirements is a multi list box in order to click on the chapter they wish to parse.

    If you have a quick squiz at my code below, where I populate the checklist, I then add each "checked item" into a collection.

    Could you suggest as to how I

    -- grab the returned value from the ListBox
    -- find the row number of that value in your A1:A45 list
    -- drop down one row to get the next value
    -- use those two values as WDrngHdr1 and WDrngHdr2


    I know at this stage you probably feel like you're holding my hand, .....but I have worked my way up to this point with no or very little experience in VBA, it's just at this stage, i'm kinda stuck

    Thank you so much for your help thus far

  26. #26
    Member
    Join Date
    Dec 2010
    Posts
    37

    Re: Parsing Word Document & Export to Excel

    No problem at all with the hand-holding! I'd love to say I learned all this all by myself - but truth be told my hand was held many a time! Two things I have learned though - F1 (Help) and Google are your good friends.

    With a ListBox, Value returns the value of the selected item as a text string, and ListIndex returns the position in the list of the selected item as Long I think. A gotcha here - the ListIndex starts with 0, not 1. So if you populate your listbox in the same order as the items are listed in A1:A45, Value will show you the selected header, and ListIndex + 1 will give you the row it's in on the Config sheet. Something like this:

    Code:
    Private Sub ListBox1_Change()
    
    ListBox1.Tag = ListBox1.ListIndex
    
    End Sub
    I put a CommandButton on the form to call the routine:

    Code:
    Private Sub CommandButton1_Click()
      Call GetMyStuff
      Unload Me
    End Sub
    And in the main routine, you do something like this:

    Code:
    'Stop screen flashing
    'Comment out or remove if you want to observe it working while testing
    'Application.ScreenUpdating = False
    'appWD.ScreenUpdating = False
    
    lastRow = UserForm1.ListBox1.Tag
    
    ''''
    ''''
    ''''
    '''''Find your X
    '''''Get the last row of config requirements
    ''''lastRow = XLwks1.Range("A10000").End(xlUp).Row
    '''''Iterate through cells in Col B to find the X
    ''''For Each XLrngC1 In XLwks1.Range("B2:B" & lastRow)
    ''''
    ''''    '********
    ''''    'For checking code
    ''''    Debug.Print XLrngC1.Row
    ''''    Stop
    ''''    '********
    ''''
    ''''  If XLrngC1.Value = "x" Or XLrngC1.Value = "X" Then    'Big X or small x in cell
    ''''    strChp1 = XLrngC1.Offset(0, -1)         'Get text in cell on same row, one column back
    ''''    strChp2 = XLrngC1.Offset(1, -1)         'Get text from one row down, one column back
    
      If lastRow <> "" Then
        
        Set XLrngC1 = XLwks1.Range("A" & lastRow + 1)
        strChp1 = XLrngC1.Value                 'Get text in cell
        strChp2 = XLrngC1.Offset(1, 0)          'Get text from one row down
        
        Set WDrngCh = WDdoc.Content.Duplicate
        
        'Find chapter name
    pointGetChp1:   'Set a return point
        WDrngCh.Find.Execute Findtext:=strChp1, Forward:=False, MatchCase:=True, Wrap:=wdFindStop
    And don't forget to comment out the end of the With loop:
    Code:
      End If
    ''''Next XLrngC1
    
    CleanUp:
    Okay - I've tested a bit of this, but not all of this.
    Shout back if it doesn't make sense.

    Ed

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width