VBA parsing Word document into Excel with problems [resolved]
Hello all,
As part of the "next big thing" project, my team (me and one other guy) needs to take a huge and very ugly Word document, grab some data out of tables in the document, and populate an Excel worksheet with the results. If this sounds like it shouldn't be that hard, I'm inclined to agree.
However there is a fairly agravating snag:
Whoever created the Word document decided to use a non Windows standard font. Over all this has not created too many problems, but in an unfortunate number of cases somewhere between Word and Excel an unkown character gets converted into a square character and placed in the current cell amidst the text or numbers. If you can see a square here -> , these are the little buggers we're having troubles with. If not, imagine a small square that fills you with rage and you'll get my meaning.
We have tried to do any number of code workarounds to be rid of our friends the squares. We've tried Triming the line, we've written a new trim function that's more robust, we've written a trim function that theoretically will change any ASCII character outside of certain ranges into a blank space. We've used ASCII funtions to take care of line feeds, new lines, tabs, carriage returns, and all manner of other more obscure ASCII characters. Some of this worked, but ultimately we can't get rid of some of the squares.
I'm sure by now you're thinking, "Change the stupid font and be done with it." Well, there are two problems. First of all we're not supposed to even be allowed to see this document, and second we tried that already anyways, so no such luck. The most important detail is we need a dash "-" from this wacky font to be replaced with a dash from a normal font. If we could detect it we'd just replace it, but for everything we've tried we just cannot get the parser to even acknowledge the existance of that square and.
Personally I'm expecting the answer to be, "Wow dude, you're hosed." However, I'm throwing hope to the winds and asking anyway. After reading through a lot of posts here I've been quite impressed with the knowledgeable people here. Hopefully one of you has a solution or some suggestion of something we haven't tried yet.
Thanks for any help!
-S
Re: VBA parsing Word document into Excel with problems
Welcome to the Forums ethyachk.
Where to start? First you know that you can not do any parsing on the square once its in Excel. The
square is just a windows representation of an un-supported Font character. One thing you could do is
to try to save a version of the document using a different document format or character encoding (if its
also not standard) like an rtf document. Then parse that new document and hopefully you will have
all chars.
HTH
Re: VBA parsing Word document into Excel with problems
Thanks for the reply!
I am very aware that the parsing must be done before it hits Excel unfortunately. I'm quite knowledgeable about what I can't do at this point. :) I just tried your convert to RTF suggestion, which I'm kicking myself for not thinking of on my own. On the up side it gets rid of one set of those unrecognized characters! On the down side it also creates a pile of brand new ones. I'm not giving up on your idea yet but the work day is coming to a close. If I do not get the opportunity to try this out thoroughly enough before quittin' time, I will post on Monday with further results.
-S
Re: VBA parsing Word document into Excel with problems
Ok, glad to have helped give you another way to look at it. Also, try different formats too to see which one
gets you the closest so you will do the least amount of fixing up.
Re: VBA parsing Word document into Excel with problems
Well, I've tried RTF, Text, DOS Text, Encoded Text, and HTML. So far, no luck. Each one looks promising until I find something psychotic that the encoding did and turned something (or somethings) into a square. HTML was the closest, but it still was not free of encoding errors. I will have to put this off until Monday, but I will fight with it more then if I have the time.
In case you were wondering, "it can't be done" is a perfectly acceptable answer. If it can't be done we should have the authority to insist on standard fonts.
-S
Re: VBA parsing Word document into Excel with problems
Since the document was created on another system with that special font installed on it. Any other system
may or may not be able to display it correctly. I would insist that you get a requirement for a standard font.
If they change the font again or ??? they are going to say "why doesn't it work now?" and look back to you guys.
HTH
Re: VBA parsing Word document into Excel with problems
I am going to have to agree with you on that one. After all this work, I'm fairly certain that there's no reasonable way to do what would be required for this situation. The problem is someone else breaking standards more than code, so it's time to unleash the power that is my manager! :bigyello:
That's more or less what I was expecting the final answer to be, but you've still managed to give me more understanding of VB! I'll be reading and posting here more. Thanks for all your help!
-S