Thread: Read MS Word Document Title

  1. #1

    Thread Starter
    Hyperactive Member
    Join Date
    Apr 2014
    Posts
    362

    Read MS Word Document Title

    Sorry, I named the thread wrong. The proper name is "Read MS Word Document Author Property".
    I need to read the Author property of an MS Word document. When the document is created by my VB program, the program sets its Author property and saves the file. Later I need to read the Author back and process the document depending on its value. I wrote this code:
    Code:
    Private Function IsDBBasedProposal(FileName As String) As Boolean
    Dim objWord As Object
    Dim wd As Object
        ' objWord is a fresh local variable, so it is always Nothing at this
        ' point and a new Word instance is started on every call.
        Set objWord = CreateObject("Word.Application")
    
        DoEvents
    
        Set wd = objWord.Documents.Open(FileName)
    
        ' Compare the built-in Author property with the marker value.
        IsDBBasedProposal = (wd.BuiltInDocumentProperties("Author") = "CoordinatorDB")
    
        wd.Close False   ' close without saving changes
        DoEvents
        objWord.Quit
        Set wd = Nothing
        Set objWord = Nothing
    End Function
    It gives exactly what I want, but it is slow. On a local drive it takes 2 seconds; over the network it takes even longer.
    Is there any way to read the Author property of an MS Word document without creating a Word object and opening the document? I hope that would be faster.
    I tried FileSystemObject with no success.

    Thank you
    Last edited by chapran; Feb 16th, 2018 at 03:58 PM.

  2. #2
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,853

    Re: Read MS Word Document Title

    Well, I can tell you "the hard way" to do it. And this is only if we're talking about DOCX (or DOTX, DOCM, DOTM) type files. These files are actually just zipped files with many other files within them. From VB6, I'd use wqweto's unzip utility (which can be found here). It's a pure VB6 solution with all source code available.

    Once you're in a position to unzip a file, if you unzip one of these DOCX files, you'll see a sub-folder named docProps. Within this docProps folder, you'll find a file named core.xml.

    This is a plain-text XML file (UTF-8 encoded, so it reads as plain ASCII/ANSI unless it contains non-ASCII characters), and will open with Notepad (or using VB6 Line Input statements). Also, there are several XML parsers floating around these forums.

    Within this core.xml file, you'll find a <dc:creator> tag. Within that tag will be the document's author. For instance, you might find something like:

    <dc:creator>Elroy</dc:creator>

    That's what you're supposedly looking for.
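
    As a rough illustration of that last step, here is a minimal sketch that pulls the author out of the text of core.xml once the file has been unzipped and read into a string. The function name GetCreatorFromCoreXml is hypothetical and not part of any library. It assumes a plain <dc:creator>...</dc:creator> pair, does no real XML parsing, and does not decode XML entities or non-ASCII characters.

    ```vb
    ' Hypothetical helper (not from any library): returns the text between
    ' <dc:creator> and </dc:creator>, or "" if the pair is not found.
    ' Assumes core.xml was already extracted from the DOCX and read into sXml.
    Public Function GetCreatorFromCoreXml(ByVal sXml As String) As String
        Const OPEN_TAG As String = "<dc:creator>"
        Const CLOSE_TAG As String = "</dc:creator>"
        Dim lStart As Long
        Dim lEnd As Long

        ' Find the opening tag and step past it.
        lStart = InStr(1, sXml, OPEN_TAG, vbTextCompare)
        If lStart = 0 Then Exit Function
        lStart = lStart + Len(OPEN_TAG)

        ' Find the matching closing tag after that position.
        lEnd = InStr(lStart, sXml, CLOSE_TAG, vbTextCompare)
        If lEnd = 0 Then Exit Function

        GetCreatorFromCoreXml = Mid$(sXml, lStart, lEnd - lStart)
    End Function
    ```

    For example, Debug.Print GetCreatorFromCoreXml("<dc:creator>Elroy</dc:creator>") prints Elroy. A proper XML parser is the safer choice if the author can contain characters such as & or accented letters.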

    Best Of Luck,
    Elroy

    EDIT1: Just to put it all together, you might try opening one of these DOCX files with something like 7Zip and snooping around in it.
    Last edited by Elroy; Feb 16th, 2018 at 08:44 PM.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  3. #3

    Thread Starter
    Hyperactive Member

    Re: Read MS Word Document Title

    Symantec Endpoint will not let me open the file; it reports a virus.

  4. #4

    Thread Starter
    Hyperactive Member

    Re: Read MS Word Document Title

    I found this:
    https://support.microsoft.com/en-us/...erties-when-yo
    I used the DLL and modified the code from the example to this:
    Code:
    Public Function IsDBBasedProposal(FileName As String) As Boolean
    Dim objSummProps As Object
    Dim objDocumentProps As Object
       
        If Len(FileName) = 0 Then
            IsDBBasedProposal = False
            Exit Function
        End If
        
        ' Requires Microsoft's Dsofile.dll to be registered on the machine.
        Set objDocumentProps = CreateObject("DSOFile.OleDocumentProperties")
        ' Open read-only; the properties are only being read here.
        objDocumentProps.Open FileName, True, 2 'dsoOptionOpenReadOnlyIfNoWriteAccess
        
        Set objSummProps = objDocumentProps.SummaryProperties
        IsDBBasedProposal = (objSummProps.Author = "CoordinatorDB")
        
        ' Release the file without saving anything.
        objDocumentProps.Close False
    End Function
    It works perfectly.

  5. #5
    PowerPoster Elroy's Avatar

    Re: Read MS Word Document Title

    Open what file? Your DOCX file? Open it how? With 7Zip? Not sure what to tell you. Maybe your DOCX file is infected.

  6. #6
    PowerPoster Elroy's Avatar

    Re: Read MS Word Document Title

    It looks like a nice solution, even if it does require a new dependency. I'm sure that Dsofile.dll is just doing what I outlined. But hey ho. Now you don't have to jump through all the hoops I outlined.

    Take Care,
    Elroy

  7. #7

    Thread Starter
    Hyperactive Member

    Re: Read MS Word Document Title

    Quote Originally Posted by Elroy
    Open what file? Your DOCX file? Open it how? With 7Zip? Not sure what to tell you. Maybe your DOCX file is infected.
    You gave me the link. There is a button there to download the UnzipClass-master.zip file. I downloaded it and tried to unzip it. I got a message that one of the internal files is infected, and the unzip process was stopped.

  8. #8
    PowerPoster Elroy's Avatar

    Re: Read MS Word Document Title

    Ahhh, you seem to be correct. It seems the Project1.exe in wqweto's example is infected. I've only ever taken the source code and incorporated it into my own project. I'll let wqweto know through a PM that his example is infected.

    Take Care,
    Elroy

  9. #9
    PowerPoster ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    3,046

    Re: Read MS Word Document Title

    Hi,

    you could use the Shell.

    This returns the Author of every file located in the folder TestText.
    I have text, Excel and Word files in that folder, and all the authors
    are returned.

    Code:
    Private Sub Command3_Click()
    Dim sFile As Variant
    Dim i As Long
    Dim oShell: Set oShell = CreateObject("Shell.Application")
    Dim oDir:   Set oDir = oShell.Namespace("c:\TestText")
    For Each sFile In oDir.Items
       List1.AddItem oDir.GetDetailsOf(sFile, 9) ' column 9 is the Author on this PC
    Next
    
    For i = 0 To 40
       Debug.Print i, oDir.GetDetailsOf(oDir.Items, i) ' list the column names that are available to read
    Next
    End Sub
    EDIT:
    with Listview
    Code:
    Private Sub Command3_Click()
    Dim sFile As Variant
    Dim i As Long
    Dim Li As ListItem
     
    Dim oShell: Set oShell = CreateObject("Shell.Application")
    Dim oDir:   Set oDir = oShell.Namespace("c:\TestText")
    With ListView1
        .View = lvwReport
        .LabelEdit = lvwManual
        .FullRowSelect = True
        .ListItems.Clear
        .ColumnHeaders.Clear
        .ColumnHeaders.Add , , "Filename", 3000
        .ColumnHeaders.Add , , "File Type", 1500, vbLeftJustify
        .ColumnHeaders.Add , , "Author", 1200, vbLeftJustify
        .ColumnHeaders.Add , , "Kb", 900, vbCenter
    
        For Each sFile In oDir.Items
            List1.AddItem oDir.GetDetailsOf(sFile, 5) ' column 5; see the index list printed below
            Set Li = .ListItems.Add(, , sFile)
            Li.SubItems(1) = oDir.GetDetailsOf(sFile, 2) ' file type
            Li.SubItems(2) = oDir.GetDetailsOf(sFile, 9) ' author
            Li.SubItems(3) = oDir.GetDetailsOf(sFile, 1) ' size
        Next
    End With
    
    For i = 0 To 40
        Debug.Print i, oDir.GetDetailsOf(oDir.Items, i) ' list the available column names
    Next
    End Sub
    regards
    Chris
    Last edited by ChrisE; Feb 17th, 2018 at 02:47 AM.
    to hunt a species to extinction is not logical !
    since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.

  10. #10
    PowerPoster wqweto's Avatar
    Join Date
    May 2011
    Location
    Sofia, Bulgaria
    Posts
    5,120

    Re: Read MS Word Document Title

    Quote Originally Posted by Elroy
    It seems the Project1.exe in wqweto's example is infected. I've only ever taken the source code and incorporated it into my own project. I'll let wqweto know through a PM that his example is infected.
    Thanks, Elroy.

    It turns out it's the other way around. Over the past 6 years this unzip class has been used by lots of droppers and other malware, so its signatures got included in most anti-virus databases, and Project1.exe is now falsely recognized as a virus or related malware.

    Rest assured the sample executable is clean and the warning is a false alarm, but I just removed it from the repo -- it's a bad practice to keep binaries under source control anyway.

    I personally would never use this unzip class as it's dog slow, being a native VB6 implementation. The latest ZipArchive would be much better for the job of browsing and uncompressing .docx files, and much faster for sure.

    It's a single class too, and can be trimmed to an extract-only version (one that can only unzip) with conditional compilation like ZIP_NOCOMPRESS = 1

    cheers,
    </wqw>

  11. #11
    PowerPoster Elroy's Avatar

    Re: Read MS Word Document Title

    Ahhh, thanks for getting this straightened out wqweto. It's actually your ZipArchive version that I use.

    I just did a quick Google search and bumped into the other one to give chapran a way to unzip files in VB6.

    Chapran, if you still have your ears on and you're at all interested in digging out the author the "hard" way, use wqweto's latest ZipArchive to get it done. It sounds like you're on your way though with the Dsofile.dll from Microsoft. Sorry about the somewhat old link.

    Take Care,
    Elroy

  12. #12

    Thread Starter
    Hyperactive Member

    Re: Read MS Word Document Title

    Thanks a lot to all who tried to help me.
    The way I found is much easier than the way recommended here.

  13. #13
    PowerPoster ChrisE's Avatar

    Re: Read MS Word Document Title

    Quote Originally Posted by chapran
    Thanks a lot to all who tried to help me.
    The way I found is much easier than the way recommended here.
    and how ?

  14. #14

    Thread Starter
    Hyperactive Member

    Re: Read MS Word Document Title

    Quote Originally Posted by ChrisE
    and how ?
    I posted the code with DSOFile above

  15. #15
    PowerPoster ChrisE's Avatar

    Re: Read MS Word Document Title

    Quote Originally Posted by chapran
    I posted the code with DSOFile above
    I didn't see that, but adding a DLL just to get the Author from documents?
    Last edited by ChrisE; Feb 17th, 2018 at 10:34 AM.

  16. #16

    Thread Starter
    Hyperactive Member

    Re: Read MS Word Document Title

    I tried the download suggested here. It was infected. Then I was informed that the infected file had been removed, so I downloaded it again. No virus this time, fine. When I start the project, it cannot load the Form because it doesn't see the [...]. I created a new project hoping to add the class. When I added the class, its code had many red lines. My knowledge doesn't allow me to find what's wrong with those lines.
    So I decided to use the approach I found elsewhere.
    The way offered here is too complex for my limited knowledge.

    Thank you.

  17. #17
    PowerPoster ChrisE's Avatar

    Re: Read MS Word Document Title

    ah I see,
    well glad you got it working now

    regards
    Chris

  18. #18
    PowerPoster wqweto's Avatar

    Re: Read MS Word Document Title

    FYI, here is a code snippet to retrieve docProps using cZipArchive as Elroy described in second post:
    Code:
    Option Explicit
    
    '--- for MultiByteToWideChar
    Private Const CP_UTF8                       As Long = 65001
    
    Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, lpMultiByteStr As Any, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
    
    Private Sub Form_Load()
        Dim oProps      As Object
        Dim vKey        As Variant
    
        Set oProps = GetOpenXmlDocProps("D:\TEMP\Biff12\clip2.xlsb")
        For Each vKey In oProps.Keys
            Debug.Print vKey & ": " & oProps.Item(vKey)
        Next
    End Sub
    
    '--- returns a dictionary of the core properties; empty if anything fails
    Public Function GetOpenXmlDocProps(FileName As String) As Object
        Dim oRetVal     As Object
        Dim baCoreXml() As Byte
        Dim oRoot       As Object
        Dim oNode       As Object
    
        Set oRetVal = CreateObject("Scripting.Dictionary")
        oRetVal.CompareMode = vbTextCompare
        '--- pull docProps/core.xml out of the zip into a byte array
        With New cZipArchive
            If Not .OpenArchive(FileName) Then
                GoTo QH
            End If
            If Not .Extract(baCoreXml, "docProps/core.xml") Then
                GoTo QH
            End If
        End With
        '--- parse the XML and copy each child of cp:coreProperties
        With CreateObject("MSXML2.DOMDocument")
            .LoadXml FromUtf8Array(baCoreXml)
            .setProperty "SelectionNamespaces", "xmlns:cp=""http://schemas.openxmlformats.org/package/2006/metadata/core-properties"""
            Set oRoot = .selectSingleNode("//cp:coreProperties")
            If oRoot Is Nothing Then
                GoTo QH
            End If
            For Each oNode In oRoot.childNodes
                oRetVal.Item(oNode.baseName) = oNode.Text
            Next
        End With
    QH:
        Set GetOpenXmlDocProps = oRetVal
    End Function
    
    '--- converts a UTF-8 byte array to a VB6 (UTF-16) string
    Public Function FromUtf8Array(baText() As Byte) As String
        Dim lSize           As Long
    
        FromUtf8Array = String$(2 * UBound(baText), 0)
        lSize = MultiByteToWideChar(CP_UTF8, 0, baText(0), UBound(baText) + 1, StrPtr(FromUtf8Array), Len(FromUtf8Array))
        FromUtf8Array = Left$(FromUtf8Array, lSize)
    End Function
    The `GetOpenXmlDocProps` function usually fetches "creator", "lastModifiedBy", "created" and "modified" as keys of the returned dictionary.
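
    To connect this back to the original question, a minimal sketch of chapran's check built on top of it might look like the following. The name IsDBBasedProposalOpenXml is hypothetical. It assumes GetOpenXmlDocProps from the snippet above (and therefore cZipArchive) is in the same project, and that the marker value is the "CoordinatorDB" string from the first post.

    ```vb
    ' Hypothetical wrapper: True when the "creator" core property of an
    ' Open XML file (.docx, .xlsx, ...) equals the marker value.
    ' Depends on GetOpenXmlDocProps from the snippet above.
    Public Function IsDBBasedProposalOpenXml(FileName As String) As Boolean
        Dim oProps As Object

        ' Always returns a dictionary; it is empty if the archive or
        ' docProps/core.xml could not be read.
        Set oProps = GetOpenXmlDocProps(FileName)

        ' Exists() avoids adding an empty "creator" key as a side effect.
        If oProps.Exists("creator") Then
            IsDBBasedProposalOpenXml = (oProps.Item("creator") = "CoordinatorDB")
        End If
    End Function
    ```

    This only applies to the zip-based formats; an old binary .doc file has no docProps/core.xml, so the function would simply return False for it.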

    cheers,
    </wqw>
    Last edited by wqweto; Feb 17th, 2018 at 11:27 AM.

  19. #19
    PowerPoster
    Join Date
    Jul 2010
    Location
    NYC
    Posts
    5,652

    Re: Read MS Word Document Title

    chapran, DSOFile is working for you for newer docx files? Gave me errors for the new zip-based ones.

    ChrisE, same for going through the shell for me; it works with docx for you? I tried the code you posted just to be sure; I can't seem to access the properties through the files' IPropertyStore either. Was really hoping to not have to parse xml.

  20. #20
    PowerPoster ChrisE's Avatar

    Re: Read MS Word Document Title

    Hi ,

    see the image for results with different files

    [Attached image: Author.jpg]

    you could upload a docx File
    I will try to read the Author with the code I have.

    regards
    Chris

  21. #21
    PowerPoster Elroy's Avatar

    Re: Read MS Word Document Title

    Chris,

    It'd be much better if you just posted your code. Two reasons: 1) I believe they frown on the uploading of Office files, as they're not ANSI and they're not VB6; and 2) if your code works, then others could use it, specifically for the purposes of this thread.

    Best Regards,
    Elroy

  22. #22
    PowerPoster ChrisE's Avatar

    Re: Read MS Word Document Title

    Quote Originally Posted by Elroy
    Chris,

    It'd be much better if you just posted your code. Two reasons: 1) I believe they frown on the uploading of Office files, as they're not ANSI and they're not VB6; and 2) if your code works, then others could use it, specifically for the purposes of this thread.

    Best Regards,
    Elroy
    Hi Elroy,

    see Post#9
    EDIT: with Listview
    that is the Code I used

    I tried this under Win7 64-bit. I only have VB6 installed on that PC; there is no Office installed. As you can see from the image, no Author is shown, but I don't receive any error running the code.

    I wonder why fafalone is getting an error.

    Here is an image (Win7 64-bit, no Office installed):
    [Attached image: AuthorWin7.jpg]

    EDIT:
    I changed the code (Author column index 20 instead of 9) and now I get the Author from .doc and .xls files

    Code:
    Private Sub Command3_Click()
    Dim sFile As Variant
    Dim i As Long
    Dim Li As ListItem
     
    Dim oShell: Set oShell = CreateObject("Shell.Application")
    Dim oDir:   Set oDir = oShell.Namespace("c:\TestText")
    With ListView1
        .View = lvwReport
        .LabelEdit = lvwManual
        .FullRowSelect = True
        .ListItems.Clear
        .ColumnHeaders.Clear
        .ColumnHeaders.Add , , "Filename", 3000
        .ColumnHeaders.Add , , "File Type", 1500, vbLeftJustify
        .ColumnHeaders.Add , , "Author", 1200, vbLeftJustify
        .ColumnHeaders.Add , , "Kb", 900, vbCenter
    
        For Each sFile In oDir.Items
            List1.AddItem oDir.GetDetailsOf(sFile, 5) ' column 5; see the index list printed below
            Set Li = .ListItems.Add(, , sFile)
            Li.SubItems(1) = oDir.GetDetailsOf(sFile, 2)  ' file type
            Li.SubItems(2) = oDir.GetDetailsOf(sFile, 20) ' author (index 20 on this Win7 PC)
            Li.SubItems(3) = oDir.GetDetailsOf(sFile, 1)  ' size
        Next
    End With
    
    For i = 0 To 40
        Debug.Print i, oDir.GetDetailsOf(oDir.Items, i) ' list the available column names
    Next
    End Sub
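
    Since the Author column was index 9 in the earlier version of this code and 20 here, one way to avoid hard-coding the number is to look the column up by its header text. The following is only a sketch: FindDetailsColumn and DemoFindAuthorColumn are hypothetical names, and the header text "Authors" is an assumption that depends on the Windows version and display language. It relies on the same header-listing call as the Debug.Print loop above.

    ```vb
    ' Hypothetical helper: returns the index of the details column whose
    ' header text equals sHeader, or -1 if it is not among the first
    ' lMaxCols + 1 columns.
    Public Function FindDetailsColumn(ByVal oDir As Object, ByVal sHeader As String, _
                                      Optional ByVal lMaxCols As Long = 40) As Long
        Dim i As Long

        FindDetailsColumn = -1
        For i = 0 To lMaxCols
            ' Same call the thread uses to print the list of column names.
            If StrComp(oDir.GetDetailsOf(oDir.Items, i), sHeader, vbTextCompare) = 0 Then
                FindDetailsColumn = i
                Exit For
            End If
        Next
    End Function

    ' Usage sketch: print the author of every file in the test folder.
    Public Sub DemoFindAuthorColumn()
        Dim oShell As Object
        Dim oDir As Object
        Dim sFile As Variant
        Dim lAuthorCol As Long

        Set oShell = CreateObject("Shell.Application")
        Set oDir = oShell.Namespace("c:\TestText")

        ' "Authors" is an assumed header text; adjust it to what the list shows.
        lAuthorCol = FindDetailsColumn(oDir, "Authors")
        If lAuthorCol < 0 Then Exit Sub

        For Each sFile In oDir.Items
            Debug.Print sFile, oDir.GetDetailsOf(sFile, lAuthorCol)
        Next
    End Sub
    ```

    This does not change what the Shell can read from a given file type; it only removes the dependency on a fixed column number.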
    regards
    Chris
    Last edited by ChrisE; Feb 18th, 2018 at 12:41 PM.

  23. #23
    PowerPoster Elroy's Avatar

    Re: Read MS Word Document Title

    @ChrisE: Ahhh, very good. I'll get out of the way now. Y'all take care.

  24. #24

    Thread Starter
    Hyperactive Member

    Re: Read MS Word Document Title

    Quote Originally Posted by fafalone
    chapran, DSOFile is working for you for newer docx files?
    My application uses .doc documents as templates because these features were written before Microsoft introduced .docx. So I did not try it with .docx, and for my situation it is not important. My current goal is to find a way to identify how a particular document was created. The old documents either have a blank Author property or a value not set by my program. New documents will have special values (for instance "From DB", "Based On Template", etc.) in Author, which will be used to decide how to work with the document.
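
    A minimal sketch of that decision step, assuming the Author string has already been read by one of the methods in this thread, could look like this. The enum and function names are hypothetical; the two marker strings are the examples given above.

    ```vb
    ' Hypothetical names; the marker strings are the examples from the post.
    Public Enum ProposalOrigin
        poUnknown = 0           ' blank Author, or a value the program did not set
        poFromDB = 1
        poBasedOnTemplate = 2
    End Enum

    ' Maps the Author value of a document to the way it was created.
    Public Function ClassifyByAuthor(ByVal sAuthor As String) As ProposalOrigin
        Select Case Trim$(sAuthor)
            Case "From DB"
                ClassifyByAuthor = poFromDB
            Case "Based On Template"
                ClassifyByAuthor = poBasedOnTemplate
            Case Else
                ' Old documents end up here.
                ClassifyByAuthor = poUnknown
        End Select
    End Function
    ```

    Keeping the marker strings in one place like this means the code that writes the Author and the code that reads it cannot drift apart as easily.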

  25. #25
    PowerPoster
    Join Date
    Jul 2010
    Location
    NYC
    Posts
    5,652

    Re: Read MS Word Document Title

    @ChrisE, it's still not working for docx/xlsx files created with Office 2013. No runtime error is raised; it just returns blanks for authors (20) and title (21) (the numbers are from the list of properties that your code prints out at the bottom)...

    5: 2/17/2018 9:32 PM
    2: Microsoft Word Document
    20:
    21:
    1: 29.5 KB

    5: 2/17/2018 9:34 PM
    2: Microsoft Excel Worksheet
    20:
    21:
    1: 10.6 KB

    5: 2/17/2018 9:34 PM
    2: Microsoft Word Document
    20:
    21:
    1: 16.5 KB

    I'm on Win7 x64 without Office installed as well. Windows Explorer displays, and can edit these properties, I wonder if it's really just manually digging into the xml. (Your code, like my IPropertyStore based method, works fine on some old Office 97-2003 docs I have; just not the new xml based ones)
    Last edited by fafalone; Feb 19th, 2018 at 07:49 PM.

  26. #26
    PowerPoster ChrisE's Avatar

    Re: Read MS Word Document Title

    Quote Originally Posted by fafalone
    @ChrisE, it's still not working for docx/xlsx files created with Office 2013. No runtime error is raised; it just returns blanks for authors (20) and title (21) (the numbers are from the list of properties that your code prints out at the bottom)...

    5: 2/17/2018 9:32 PM
    2: Microsoft Word Document
    20:
    21:
    1: 29.5 KB

    5: 2/17/2018 9:34 PM
    2: Microsoft Excel Worksheet
    20:
    21:
    1: 10.6 KB

    5: 2/17/2018 9:34 PM
    2: Microsoft Word Document
    20:
    21:
    1: 16.5 KB

    I'm on Win7 x64 without Office installed as well. Windows Explorer displays, and can edit these properties, I wonder if it's really just manually digging into the xml. (Your code, like my IPropertyStore based method, works fine on some old Office 97-2003 docs I have; just not the new xml based ones)
    It sure looks like M$ wants us digging. If you think about it... we just want to know the Author...
    When I have time I'll get a new version of Office and install it, but I don't think it will change anything.

    wqweto has shown us how to do it; it would have been nice the other way.


    regards
    Chris
