I am a third year student and I am on my work placement I have been assigned a task of parsing 100s of documents and export them to excel. At this time I've only 3 weeks experience in vba....any help would be greatly appreciated.
I need to parse selected chapters within the word documents. So at the moment I'm using a checklist populated from a config file (all chapter names) (requirement). Based on the selection(s) I have to locate chapter(s) within the word doc (chapter headings) once identified I need to copy that full chapter export into an excel doc. An issue is also the chapter names appear numerous times throughout the doc, so the only way to differentiate the chapter heading from any other mention is by heading style. All the chapter headings are in 'h2' style. Chapters include both paragraphs and tables. I am running out of excel, I wish to export to a different spreadsheet.
Here is what I've got so far, this does locate the chapter heading, but that is it. It is also quite slow, these documents are 20,000+ words, so if I select the final chapter, it takes 30-40 seconds to hit, which let's face it is faaaarrr to slow...I would like to keep the code pretty much along the same lines (same logic), but if anyone can please help I would greatly appreciate it!!
Any idea on how to extract the chapter(s) and perhaps suggest a more efficient method that would speed up my program.
I have made pretty good progress (I think) but unfortunately progress is slowing and I'm coming up against a brick wall as I don't really have much experience in vba
HTML Code:
'====================================================================
' POPULATING LIST BOX WITH DATA IN
' CONFIG WORKSHEET
'=====================================================================
Private Sub UserForm_Initialize()
ListBox1.ListFillRange = "Config!A1:A45"
End Sub
'======================================================================
' PROCESSING LISTBOX SELECTION
'======================================================================
Public Sub Parse_Click()
'======================================================================
' DECLARING VARIABLES
'======================================================================
Dim i As Long
Dim C As New Collection
Dim Path As String
With ListBox1
For i = 0 To .ListCount - 1
'Add all selected items to a collection
If .Selected(i) Then C.Add .List(i)
Next
End With
'Nothing selected, nothing to do
If C.Count = 0 Then Exit Sub
With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Select Folder to Process and Click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then Exit Sub
Path = .SelectedItems(1)
If Right(Path, 1) <> "\" Then Path = Path + "\"
'Remove any "
Path = Replace(Path, """", "")
End With
If Dir$(Path & "*.doc") = "" Then
MsgBox "No files found"
Exit Sub
End If
'Install an error handler (remove if you have any)
On Error GoTo Errorhandler
ParseDoc Path, C
Exit Sub
Errorhandler:
MsgBox "Error " & Err.Number & ": " & Err.Description
End Sub
'======================================================================
' PARSING WORD DOC FOR
' SELECTED ITEMS
'======================================================================
Public Sub ParseDoc(ByVal strPath As String, ByVal Items As Collection)
Dim objExcel As Object 'Excel.Application
Dim ExcelBook As Object 'Excel.Workbook
Dim WasOpen As Boolean
Dim oDoc As Document
Dim oPara As Paragraph
Dim strFilename As String
Dim Item
Dim Rng As Range
Dim objWord As Word.Application
Set objWord = New Word.Application
objWord.Visible = True
'Setting Location of Excel Spread for Parsed Details
Const WorkBookName As String = "C:\Users\edoogar\Documents\ParseProject\ParseDetails.xls"
'Set objWord = New Word.Application
On Error Resume Next
WasOpen = True
Set objExcel = GetObject(, "Excel.Application")
If objExcel Is Nothing Then
Set objExcel = CreateObject("Excel.Application")
If objExcel Is Nothing Then _
Err.Raise 1000, "ParseDoc", "Excel is not accessible"
objExcel.Visible = True
WasOpen = False
End If
Set ExcelBook = objExcel.Workbooks.Open(Filename:=WorkBookName)
If ExcelBook Is Nothing Then
If WasOpen Then objExcel.Quit
Err.Raise 1001, "ParseDoc", "Can not open " & WorkBookName
End If
On Error GoTo 0
WordBasic.DisableAutoMacros 1
strFilename = Dir$(strPath & "*.doc")
While Len(strFilename) <> 0
Set oDoc = objWord.Documents.Open(Filename:=strPath & strFilename, AddToRecentFiles:=False)
For Each oPara In oDoc.Paragraphs
For Each Item In Items
If InStr(1, oPara.Range, Item) > 0 Then
If InStr(1, oPara.Style, "H2") > 0 Then
oPara.Range.Select
MsgBox "You have found the string!"
GoTo CloseDoc
End If
End If
Next
Next
CloseDoc:
oDoc.Close wdDoNotSaveChanges
strFilename = Dir$()
Wend
WordBasic.DisableAutoMacros 0
objWord.Quit
End Sub
You should check to see if the style is H2 first before you loop through your collection. You could save the oPara.Range.Text to a string before looping through the collection so it doesn't have to navigate the object every time you do an Instr. Those could help speed it up a little.
If there is a table of contents based on H2, you could search that instead of the document.
If these are docx files, you could load the document xml and parse it that way instead. This would probably be your best bet since looping paragraphs is a lot slower.
That's all the help I can provide from this far off. Hope it helps.
The index is in normal style, which lists the chapter headings, and the actual chapter headings are in h2 style. Unfortunately they're .doc files, so I can't parse xml. Could you please explain a little more about saving oPara.Range.Text to a string??
having read your other thread again, i do not believe it makes any difference at all whether you had done this in vb6 or from within excel or word
now you are working within excel you no longer need to create objects of excel, you can just use the application object
you should be able to use words find method for some of your parsing
i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case. Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next
dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part
come back and mark your original post as resolved if your problem is fixed
pete
Could I use bookmarks as I've been discussing with Ed in the vb 6 and earlier forum?
There is an index page on which all chapters are listed all chapters have hyperlinks, and there are invisible bookmark at each chapter heading. I have listed a snippet of the chapters and their corresponding bookmarks below.
_Toc292701022 Introduction
_Toc292701023 Document history
_Toc292701024 Purpose
_Toc292701025 Scope
_Toc292701026 Prerequisites and Dependencies
_Toc292701027 Open Issues
_Toc292701028 Decision Log
_Toc292701029 Assumptions
_Toc292701030 Input Requirements
_Toc292701031 Main Requirements
_Toc292701032 External Stakeholder Requirements
_Toc292701033 System Functionality and Characteristics impacts
_Toc292701034 Statement of problem
_Toc292701035 Use Case Impacts
If anyone could help me I would appreciate it. I just need to find out how to link bookmarks to selection and then how to set a range, I would be so so so so grateful...I know this logic will work, it's just the implementation of it, I have 3 weeks experience in vba and I am learning as I go from forums and the web etc.. so I would be very grateful for assistance
post a sample document (.doc), with TOC, and i will see if i can find a simple solution to select the required chapters,
i assume that you want all the chapters in the TOC, i am not sure that bookmarks help in this case, unless you want to get all the text at a book mark up till the next one
i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case. Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next
dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part
come back and mark your original post as resolved if your problem is fixed
pete
I've attached a test doc, I've shortened it and I've just substituted the sensitive information with latin ...on the index page, there is the TOC & I've listed the relevant bookmark addresses underneath, they're in order of the TOC.
I would be looking to extract the entire chapter from top to bottom (text/tables etc) so from one bookmark to the next basically.
Thank you very much for your help on this, it's much appreciated.