[RESOLVED] Replace/Regex.Replace?
I have a large badly formatted wordlist...
Quote:
ability (n) able
(adj) • be able to
about (adv & prep)
• about 500 students (adv) •
The film is about a small
boy.
(prep) above (adj,
adv & prep) abroad
(adv) absent (adj)
absolutely (adv)
• The movie was absolutely
awful. accent(n)
• She has a beautiful
French accent. accept (v) access
(n)
• disabled access •
internet access accident
(n) accommodation (n)
accompany (v) according to
(prep phr) account (n)
accountant (n) accurate (adj)
ache (n) achieve (v) across (adv
& prep) act (n & v)
• in the second act (of the
play)
(n)
• to act in a play (v)
• to act strangely (v) action
(n) active (adj)
actor (n) actress
(n) actually
(adv)
I need to format that so each (dictionary) word is on its own line, with each line ending in (n), (v), etc.
There are bullet-pointed words and sentences in there that I want to remove completely, so the result would be...
Quote:
ability (n)
able (adj)
about (adv & prep)
above (adj, adv & prep)
abroad (adv)
absent (adj)
absolutely (adv)
accent (n)
accept (v)
access (n)
accident (n)
accommodation (n)
accompany (v)
account (n)
accountant (n)
accurate (adj)
ache (n)
achieve (v)
across (adv & prep)
act (n & v)
action (n)
active (adj)
actor (n)
actress (n)
actually (adv)
Can anyone help with a regex pattern or something which can do that?
Re: Replace/Regex.Replace?
Ouch!
Not going to be easy...
I see some issues:
Quote:
ability (n) able
(adj) • be able to
about (adv & prep)
• about 500 students (adv) •
The film is about a small
boy.
(prep) above (adj,
adv & prep) abroad
(adv) absent (adj)
absolutely (adv)
• The movie was absolutely
awful. accent(n)
• She has a beautiful
French accent. accept (v) access
(n)
• disabled access •
internet access accident
(n) accommodation (n)
You want to remove "• be able to " --> OK
But you also want to remove "• about 500 students (adv) • " (including "students (adv)"!)
and this: "The film is about a small
boy.
(prep)" up to that point, but keep "above (adj,"
You want to remove "• The movie was absolutely
awful." up to that point, but keep " accent(n) "
You want to remove "• She has a beautiful
French accent." up to that point, but keep "accept (v)"
and so on... and that is just at first glance.
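To make those cases concrete, here is a rough Python sketch (not something posted in the thread) that flattens the text and grabs every word sitting directly in front of a part-of-speech tag. The abbreviation list and the name `tagged_words` are assumptions for illustration only, and the pattern sticks to constructs that .NET's Regex class also accepts. It copes with "boy. (prep)" and "accent(n)", but it cannot tell an example word from a headword.

```python
import re

# Abbreviations assumed from the posted sample; the full file may use more.
POS = r"(?:prep phr|prep|adj|adv|n|v)"
# A tag is one or more abbreviations joined by "," or "&", e.g. "(adj, adv & prep)".
TAG = rf"\(\s*{POS}(?:\s*[,&]\s*{POS})*\s*\)"
# A candidate is the single word directly in front of a tag.
CANDIDATE = re.compile(rf"([A-Za-z][A-Za-z'-]*)\s*({TAG})")


def tagged_words(text):
    """Return (word, tag) pairs found after collapsing all line breaks."""
    flat = " ".join(text.split())
    return [(m.group(1), " ".join(m.group(2).split()))
            for m in CANDIDATE.finditer(flat)]


sample = """ability (n) able
(adj) • be able to
about (adv & prep)
• about 500 students (adv) •
The film is about a small
boy.
(prep) above (adj,
adv & prep) abroad
(adv)"""

for word, tag in tagged_words(sample):
    print(word, tag)
```

On this excerpt the "(prep)" after "boy." is skipped because a full stop sits in between, but "students (adv)" still comes through alongside the real entries, so a pattern alone is not enough and some extra filtering is needed.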
Re: Replace/Regex.Replace?
Yeah, it’s tricky. I’m trying a solution based on splitting it word by word, then weeding out the parts I don’t want. I have the complete file as a properly formatted pdf and I can also convert it to a properly formatted docx. The problem is that the words are arranged in columns.
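A minimal Python sketch of that split-then-weed-out idea, with assumptions that are mine rather than anything confirmed in the thread. It splits the text into tokens, treats any word directly followed by a part-of-speech tag as a candidate, and then keeps only the longest alphabetically ordered run of candidates (a longest non-decreasing subsequence). That last step assumes the headwords in the full file are in alphabetical order, as they are in the sample. The abbreviation set and all function names are hypothetical.

```python
import re
from bisect import bisect_right

# Abbreviations seen in the posted sample; the full file may need more.
ABBREVIATIONS = {"n", "v", "adj", "adv", "prep", "phr"}

# One token per parenthesised group, bullet, word, or run of other characters.
TOKEN = re.compile(r"\([^()]*\)|•|[A-Za-z][A-Za-z'-]*|[^\sA-Za-z()•]+")
WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")


def is_tag(token):
    """True for "(n)" or "(adj, adv & prep)", False for "(of the play)"."""
    if not token.startswith("("):
        return False
    parts = [p for p in re.split(r"[\s,&]+", token[1:-1]) if p]
    return bool(parts) and all(p in ABBREVIATIONS for p in parts)


def candidates(text):
    """(word, tag) for every word that sits directly in front of a tag."""
    tokens = TOKEN.findall(text)
    found = []
    for word, nxt in zip(tokens, tokens[1:]):
        if WORD.fullmatch(word) and is_tag(nxt):
            found.append((word, " ".join(nxt.split())))
    return found


def longest_sorted_run(keys):
    """Indices of one longest non-decreasing subsequence of keys."""
    tails, tail_index, previous = [], [], []
    for i, key in enumerate(keys):
        pos = bisect_right(tails, key)
        previous.append(tail_index[pos - 1] if pos else None)
        if pos == len(tails):
            tails.append(key)
            tail_index.append(i)
        else:
            tails[pos] = key
            tail_index[pos] = i
    kept = []
    i = tail_index[-1] if tail_index else None
    while i is not None:
        kept.append(i)
        i = previous[i]
    return kept[::-1]


def clean_wordlist(text):
    """One "word (tag)" line per candidate that fits the alphabetical run."""
    found = candidates(text)
    keep = longest_sorted_run([word.lower() for word, _ in found])
    return [f"{found[i][0]} {found[i][1]}" for i in keep]
```

Calling `clean_wordlist` on the quoted sample from the first post should drop "students (adv)", "play (v)" and "strangely (v)" because they break the alphabetical order. Two known weaknesses: only the last word before a tag is kept, so a multi-word entry such as "according to (prep phr)" is seen as "to" and discarded, and an example word that happens to fall in alphabetical order would slip through.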
Re: Replace/Regex.Replace?
I might try to copy and paste it into Excel. If it splits it into the fields I want, I think I can work with that…
Re: [RESOLVED] Replace/Regex.Replace?
I tried pasting the tabulated data into Excel and refining the list with VBA. So far, it hasn’t been too bad. I’d prefer a 10-second algorithm, but this text is particularly nasty, with all of those bullet points and with truncations in places where they’re hard to fix… I’ll mark this resolved.