Results 1 to 9 of 9

Thread: Parsing Word doc and export to Excel

  1. #1

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Parsing Word doc and export to Excel

    Hello Everyone!

    I am a third year student and I am on my work placement I have been assigned a task of parsing 100s of documents and export them to excel. At this time I've only 3 weeks experience in vba....any help would be greatly appreciated.

    I need to parse selected chapters within the word documents. So at the moment I'm using a checklist populated from a config file (all chapter names) (requirement). Based on the selection(s) I have to locate chapter(s) within the word doc (chapter headings) once identified I need to copy that full chapter export into an excel doc. An issue is also the chapter names appear numerous times throughout the doc, so the only way to differentiate the chapter heading from any other mention is by heading style. All the chapter headings are in 'h2' style. Chapters include both paragraphs and tables. I am running out of excel, I wish to export to a different spreadsheet.

    Here is what I've got so far, this does locate the chapter heading, but that is it. It is also quite slow, these documents are 20,000+ words, so if I select the final chapter, it takes 30-40 seconds to hit, which let's face it is faaaarrr to slow...I would like to keep the code pretty much along the same lines (same logic), but if anyone can please help I would greatly appreciate it!!

    Any idea on how to extract the chapter(s) and perhaps suggest a more efficient method that would speed up my program.

    I have made pretty good progress (I think) but unfortunately progress is slowing and I'm coming up against a brick wall as I don't really have much experience in vba


    HTML Code:
    '====================================================================
    '               POPULATING LIST BOX WITH DATA IN
    '                      CONFIG WORKSHEET
    '=====================================================================
    Private Sub UserForm_Initialize()
    
    ListBox1.ListFillRange = "Config!A1:A45"
    
    End Sub
    
    '======================================================================
    '                   PROCESSING LISTBOX SELECTION
    '======================================================================
    
    Public Sub Parse_Click()
    
    '======================================================================
    '                       DECLARING VARIABLES
    '======================================================================
    
      Dim i As Long
      Dim C As New Collection
      Dim Path As String
    
      With ListBox1
        For i = 0 To .ListCount - 1
          'Add all selected items to a collection
          If .Selected(i) Then C.Add .List(i)
        Next
      End With
    
      'Nothing selected, nothing to do
      If C.Count = 0 Then Exit Sub
    
      With Application.FileDialog(msoFileDialogFolderPicker)
        .Title = "Select Folder to Process and Click OK"
        .AllowMultiSelect = False
        .InitialView = msoFileDialogViewList
        If .Show <> -1 Then Exit Sub
    
        Path = .SelectedItems(1)
        If Right(Path, 1) <> "\" Then Path = Path + "\"
        'Remove any "
        Path = Replace(Path, """", "")
      End With
    
      If Dir$(Path & "*.doc") = "" Then
        MsgBox "No files found"
        Exit Sub
      End If
      
      'Install an error handler (remove if you have any)
      On Error GoTo Errorhandler
      ParseDoc Path, C
      Exit Sub
      
    Errorhandler:
      MsgBox "Error " & Err.Number & ": " & Err.Description
    End Sub
    
    '======================================================================
    '                   PARSING WORD DOC FOR
    '                       SELECTED ITEMS
    '======================================================================
    
    Public Sub ParseDoc(ByVal strPath As String, ByVal Items As Collection)
      Dim objExcel As Object 'Excel.Application
      Dim ExcelBook As Object 'Excel.Workbook
      Dim WasOpen As Boolean
      Dim oDoc As Document
      Dim oPara As Paragraph
      Dim strFilename As String
      Dim Item
      Dim Rng As Range
      Dim objWord As Word.Application
      Set objWord = New Word.Application
      objWord.Visible = True
        
      'Setting Location of Excel Spread for Parsed Details
      Const WorkBookName As String = "C:\Users\edoogar\Documents\ParseProject\ParseDetails.xls"
    
      'Set objWord = New Word.Application
      On Error Resume Next
      WasOpen = True
      Set objExcel = GetObject(, "Excel.Application")
      If objExcel Is Nothing Then
        Set objExcel = CreateObject("Excel.Application")
        If objExcel Is Nothing Then _
          Err.Raise 1000, "ParseDoc", "Excel is not accessible"
        objExcel.Visible = True
        WasOpen = False
      End If
      
      Set ExcelBook = objExcel.Workbooks.Open(Filename:=WorkBookName)
      If ExcelBook Is Nothing Then
        If WasOpen Then objExcel.Quit
        Err.Raise 1001, "ParseDoc", "Can not open " & WorkBookName
      End If
      On Error GoTo 0
    
      WordBasic.DisableAutoMacros 1
      strFilename = Dir$(strPath & "*.doc")
      While Len(strFilename) <> 0
        Set oDoc = objWord.Documents.Open(Filename:=strPath & strFilename, AddToRecentFiles:=False)
    
            For Each oPara In oDoc.Paragraphs
              For Each Item In Items
                If InStr(1, oPara.Range, Item) > 0 Then
                 If InStr(1, oPara.Style, "H2") > 0 Then
                  oPara.Range.Select
                  MsgBox "You have found the string!"
                  GoTo CloseDoc
                 End If
                End If
              Next
            Next  
                  
    CloseDoc:
        oDoc.Close wdDoNotSaveChanges
        strFilename = Dir$()
        
      Wend
      WordBasic.DisableAutoMacros 0
      objWord.Quit
    End Sub
    
    
    

  2. #2
    Fanatic Member dmaruca's Avatar
    Join Date
    May 2006
    Location
    Jacksonville, FL
    Posts
    577

    Re: Parsing Word doc and export to Excel

    You should check to see if the style is H2 first before you loop through your collection. You could save the oPara.Range.Text to a string before looping through the collection so it doesn't have to navigate the object every time you do an Instr. Those could help speed it up a little.

    If there is a table of contents based on H2, you could search that instead of the document.

    If these are docx files, you could load the document xml and parse it that way instead. This would probably be your best bet since looping paragraphs is a lot slower.

    That's all the help I can provide from this far off. Hope it helps.

  3. #3

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word doc and export to Excel

    hi dmaruca...

    The index is in normal style, which lists the chapter headings, and the actual chapter headings are in h2 style. Unfortunately they're .doc files, so I can't parse xml. Could you please explain a little more about saving oPara.Range.Text to a string??

    I would appreciate that.

    Thank you for your assistance thus far.

    Regards

  4. #4
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Parsing Word doc and export to Excel

    having read your other thread again, i do not believe it makes any difference at all whether you had done this in vb6 or from within excel or word

    now you are working within excel you no longer need to create objects of excel, you can just use the application object

    you should be able to use words find method for some of your parsing
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  5. #5

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word doc and export to Excel

    Hi

    Could I use bookmarks as I've been discussing with Ed in the vb 6 and earlier forum?

    There is an index page on which all chapters are listed all chapters have hyperlinks, and there are invisible bookmark at each chapter heading. I have listed a snippet of the chapters and their corresponding bookmarks below.

    _Toc292701022 Introduction
    _Toc292701023 Document history
    _Toc292701024 Purpose
    _Toc292701025 Scope
    _Toc292701026 Prerequisites and Dependencies
    _Toc292701027 Open Issues
    _Toc292701028 Decision Log
    _Toc292701029 Assumptions
    _Toc292701030 Input Requirements
    _Toc292701031 Main Requirements
    _Toc292701032 External Stakeholder Requirements
    _Toc292701033 System Functionality and Characteristics impacts
    _Toc292701034 Statement of problem
    _Toc292701035 Use Case Impacts

    If anyone could help me I would appreciate it. I just need to find out how to link bookmarks to selection and then how to set a range, I would be so so so so grateful...I know this logic will work, it's just the implementation of it, I have 3 weeks experience in vba and I am learning as I go from forums and the web etc.. so I would be very grateful for assistance

    Regards,

    Gary

  6. #6
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Parsing Word doc and export to Excel

    post a sample document (.doc), with TOC, and i will see if i can find a simple solution to select the required chapters,
    i assume that you want all the chapters in the TOC, i am not sure that bookmarks help in this case, unless you want to get all the text at a book mark up till the next one
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  7. #7

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word doc and export to Excel

    Hi westconn1

    I've attached a test doc, I've shortened it and I've just substituted the sensitive information with latin ...on the index page, there is the TOC & I've listed the relevant bookmark addresses underneath, they're in order of the TOC.

    I would be looking to extract the entire chapter from top to bottom (text/tables etc) so from one bookmark to the next basically.

    Thank you very much for your help on this, it's much appreciated.

    Regards..
    Attached Files Attached Files

  8. #8
    New Member
    Join Date
    Mar 2012
    Posts
    5

    Re: Parsing Word doc and export to Excel

    Hello,
    Thanks for this value able post, i fond this post really informative.
    Your work is really admire able.
    Last edited by bravo_; Mar 24th, 2012 at 07:18 AM.

  9. #9

    Thread Starter
    Junior Member
    Join Date
    Feb 2012
    Location
    Ireland
    Posts
    23

    Re: Parsing Word doc and export to Excel

    Hi westconn1...any joy after?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width