Yeah, it’s tricky. I’m trying a solution based on splitting it word by word, then weeding out the parts I don’t want. I have the complete file as a properly formatted PDF, and I can also convert it to a properly formatted DOCX. The problem is that the words are arranged in columns.
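
Here is a rough Python sketch of the word-by-word idea. It assumes the words have already been pulled out of the PDF together with their page coordinates, and that the columns are equal width. `Word`, `read_columns`, `page_width`, `n_columns` and `keep` are made-up names for illustration, not part of any PDF library, and the filter shown is just a placeholder for whatever "parts I don't want" turns out to mean.

```python
from typing import Callable, List, NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge of the word on the page (assumed available)
    y: float  # distance from the top of the page (assumed available)


def read_columns(
    words: List[Word],
    page_width: float,
    n_columns: int,
    keep: Callable[[Word], bool] = lambda w: True,
) -> List[str]:
    """Return word texts in reading order: column by column, top to bottom."""
    col_width = page_width / n_columns

    def column_of(w: Word) -> int:
        # Assumption: equal-width columns; clamp so a word on the right
        # margin still lands in the last column.
        return min(int(w.x // col_width), n_columns - 1)

    # Weed out unwanted words first, then order by column, line, position.
    kept = [w for w in words if keep(w)]
    kept.sort(key=lambda w: (column_of(w), w.y, w.x))
    return [w.text for w in kept]


# Two columns on a 600-unit-wide page, plus a page number to drop.
words = [
    Word("alpha", 10, 10),
    Word("gamma", 310, 10),
    Word("beta", 10, 30),
    Word("delta", 310, 30),
    Word("7", 300, 780),
]
result = read_columns(
    words,
    page_width=600,
    n_columns=2,
    keep=lambda w: not w.text.isdigit(),  # placeholder filter
)
print(result)  # ['alpha', 'beta', 'gamma', 'delta']
```

If the columns are not equal width, the `column_of` step would need real column boundaries instead of a fixed `col_width`.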