|
-
Mar 14th, 2012, 09:02 AM
#1
Thread Starter
Junior Member
Parsing Word doc and export to Excel
Hello Everyone!
I am a third year student and I am on my work placement I have been assigned a task of parsing 100s of documents and export them to excel. At this time I've only 3 weeks experience in vba....any help would be greatly appreciated.
I need to parse selected chapters within the word documents. So at the moment I'm using a checklist populated from a config file (all chapter names) (requirement). Based on the selection(s) I have to locate chapter(s) within the word doc (chapter headings) once identified I need to copy that full chapter export into an excel doc. An issue is also the chapter names appear numerous times throughout the doc, so the only way to differentiate the chapter heading from any other mention is by heading style. All the chapter headings are in 'h2' style. Chapters include both paragraphs and tables. I am running out of excel, I wish to export to a different spreadsheet.
Here is what I've got so far, this does locate the chapter heading, but that is it. It is also quite slow, these documents are 20,000+ words, so if I select the final chapter, it takes 30-40 seconds to hit, which let's face it is faaaarrr to slow...I would like to keep the code pretty much along the same lines (same logic), but if anyone can please help I would greatly appreciate it!!
Any idea on how to extract the chapter(s) and perhaps suggest a more efficient method that would speed up my program.
I have made pretty good progress (I think) but unfortunately progress is slowing and I'm coming up against a brick wall as I don't really have much experience in vba 
HTML Code:
'====================================================================
' POPULATING LIST BOX WITH DATA IN
' CONFIG WORKSHEET
'=====================================================================
Private Sub UserForm_Initialize()
ListBox1.ListFillRange = "Config!A1:A45"
End Sub
'======================================================================
' PROCESSING LISTBOX SELECTION
'======================================================================
Public Sub Parse_Click()
'======================================================================
' DECLARING VARIABLES
'======================================================================
Dim i As Long
Dim C As New Collection
Dim Path As String
With ListBox1
For i = 0 To .ListCount - 1
'Add all selected items to a collection
If .Selected(i) Then C.Add .List(i)
Next
End With
'Nothing selected, nothing to do
If C.Count = 0 Then Exit Sub
With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Select Folder to Process and Click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then Exit Sub
Path = .SelectedItems(1)
If Right(Path, 1) <> "\" Then Path = Path + "\"
'Remove any "
Path = Replace(Path, """", "")
End With
If Dir$(Path & "*.doc") = "" Then
MsgBox "No files found"
Exit Sub
End If
'Install an error handler (remove if you have any)
On Error GoTo Errorhandler
ParseDoc Path, C
Exit Sub
Errorhandler:
MsgBox "Error " & Err.Number & ": " & Err.Description
End Sub
'======================================================================
' PARSING WORD DOC FOR
' SELECTED ITEMS
'======================================================================
Public Sub ParseDoc(ByVal strPath As String, ByVal Items As Collection)
Dim objExcel As Object 'Excel.Application
Dim ExcelBook As Object 'Excel.Workbook
Dim WasOpen As Boolean
Dim oDoc As Document
Dim oPara As Paragraph
Dim strFilename As String
Dim Item
Dim Rng As Range
Dim objWord As Word.Application
Set objWord = New Word.Application
objWord.Visible = True
'Setting Location of Excel Spread for Parsed Details
Const WorkBookName As String = "C:\Users\edoogar\Documents\ParseProject\ParseDetails.xls"
'Set objWord = New Word.Application
On Error Resume Next
WasOpen = True
Set objExcel = GetObject(, "Excel.Application")
If objExcel Is Nothing Then
Set objExcel = CreateObject("Excel.Application")
If objExcel Is Nothing Then _
Err.Raise 1000, "ParseDoc", "Excel is not accessible"
objExcel.Visible = True
WasOpen = False
End If
Set ExcelBook = objExcel.Workbooks.Open(Filename:=WorkBookName)
If ExcelBook Is Nothing Then
If WasOpen Then objExcel.Quit
Err.Raise 1001, "ParseDoc", "Can not open " & WorkBookName
End If
On Error GoTo 0
WordBasic.DisableAutoMacros 1
strFilename = Dir$(strPath & "*.doc")
While Len(strFilename) <> 0
Set oDoc = objWord.Documents.Open(Filename:=strPath & strFilename, AddToRecentFiles:=False)
For Each oPara In oDoc.Paragraphs
For Each Item In Items
If InStr(1, oPara.Range, Item) > 0 Then
If InStr(1, oPara.Style, "H2") > 0 Then
oPara.Range.Select
MsgBox "You have found the string!"
GoTo CloseDoc
End If
End If
Next
Next
CloseDoc:
oDoc.Close wdDoNotSaveChanges
strFilename = Dir$()
Wend
WordBasic.DisableAutoMacros 0
objWord.Quit
End Sub
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|