Thread: [RESOLVED] Replace/Regex.Replace?

  1. #1

    Thread Starter
    eXtreme Programmer .paul.'s Avatar
    Join Date
    May 2007
    Location
    Chelmsford UK
    Posts
    25,978

    [RESOLVED] Replace/Regex.Replace?

    I have a large, badly formatted word list...

    ability (n) able
    (adj) • be able to
    about (adv & prep)
    • about 500 students (adv) •
    The film is about a small
    boy.
    (prep) above (adj,
    adv & prep) abroad
    (adv) absent (adj)
    absolutely (adv)
    • The movie was absolutely
    awful. accent(n)
    • She has a beautiful
    French accent. accept (v) access
    (n)
    • disabled access •
    internet access accident
    (n) accommodation (n)
    accompany (v) according to
    (prep phr) account (n)
    accountant (n) accurate (adj)
    ache (n) achieve (v) across (adv
    & prep) act (n & v)
    • in the second act (of the
    play)
    (n)
    • to act in a play (v)
    • to act strangely (v) action
    (n) active (adj)
    actor (n) actress
    (n) actually
    (adv)
    I need to format that so that each (dictionary) word is on its own line, with each line ending in (n), (v), etc.
    There are bullet-pointed words and sentences in there that I want to remove completely, so the result would be...

    ability (n)
    able (adj)
    about (adv & prep)
    above (adj, adv & prep)
    abroad (adv)
    absent (adj)
    absolutely (adv)
    accent (n)
    accept (v)
    access (n)
    accident (n)
    accommodation (n)
    accompany (v)
    account (n)
    accountant (n)
    accurate (adj)
    ache (n)
    achieve (v)
    across (adv & prep)
    act (n & v)
    action (n)
    active (adj)
    actor (n)
    actress (n)
    actually (adv)
    Can anyone help with a regex pattern or something which can do that?

  2. #2
    PowerPoster Zvoni's Avatar
    Join Date
    Sep 2012
    Location
    To the moon and then left
    Posts
    5,010

    Re: Replace/Regex.Replace?

    Ouch!
    Not going to be easy...
    I see some issues:

    ability (n) able
    (adj) • be able to
    about (adv & prep)
    • about 500 students (adv) •
    The film is about a small
    boy.
    (prep) above (adj,
    adv & prep) abroad
    (adv) absent (adj)
    absolutely (adv)
    • The movie was absolutely
    awful. accent(n)
    • She has a beautiful
    French accent. accept (v) access
    (n)
    • disabled access •
    internet access accident
    (n) accommodation (n)
    You want to remove "• be able to" --> OK

    But you also want to remove "• about 500 students (adv) •" (including "students (adv)"!)
    and everything in "The film is about a small boy. (prep)", while keeping "above (adj,"

    You want to remove "• The movie was absolutely awful.", while keeping "accent(n)"

    You want to remove "• She has a beautiful French accent.", while keeping "accept (v)"

    and so on... and that's just at first glance.
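
    To make those cases concrete, here is a minimal Python sketch (not from the thread; the tag vocabulary and the names TAG, ENTRY and candidates are assumptions for illustration). It flattens the line breaks and pulls out every "word (tag)" pair. On the opening lines it also picks up "students (adv)", which is exactly the kind of false hit that has to be weeded out afterwards.

```python
import re

# Assumed tag vocabulary; the real file probably needs more abbreviations.
TAG = r"(?:prep phr|adj|adv|prep|n|v)"
# One word, an optional space, then a bracketed tag list such as "(adj, adv & prep)".
ENTRY = re.compile(rf"\b([A-Za-z][A-Za-z'-]*)\s?(\({TAG}(?:\s*[,&]\s*{TAG})*\))")


def candidates(text):
    """Return every 'word (tag)' pair found after flattening whitespace."""
    flat = " ".join(text.split())  # undo the hard line breaks
    return [f"{m.group(1)} {m.group(2)}" for m in ENTRY.finditer(flat)]


SAMPLE = """ability (n) able
(adj) • be able to
about (adv & prep)
• about 500 students (adv) •
The film is about a small
boy.
(prep) above (adj,
adv & prep) abroad
(adv) absent (adj)
"""

for entry in candidates(SAMPLE):
    print(entry)
```

    The orphaned "(prep)" after "boy." is skipped because no word sits directly in front of it, but nothing in the pattern itself can tell "students (adv)" from a real entry.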

  3. #3

    Thread Starter
    eXtreme Programmer .paul.'s Avatar

    Re: Replace/Regex.Replace?

    Yeah, it’s tricky. I’m trying a solution based on splitting it word by word, then weeding out the parts I don’t want. I have the complete file as a properly formatted PDF, and I can also convert it to a properly formatted DOCX. The problem is that the words are arranged in columns.
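
    One possible way to do the weeding, sketched in Python rather than VB and not necessarily what was actually done here: treat every "word (tag)" pair as a candidate, then use the fact that a dictionary list is alphabetical and keep only the longest alphabetically ordered run of candidates. The tag vocabulary and the names longest_sorted_run and clean_wordlist are assumptions for illustration.

```python
import re
from bisect import bisect_right

# Assumed tag vocabulary; extend it to cover the rest of the file.
TAG = r"(?:prep phr|adj|adv|prep|n|v)"
ENTRY = re.compile(rf"\b([A-Za-z][A-Za-z'-]*)\s?(\({TAG}(?:\s*[,&]\s*{TAG})*\))")


def longest_sorted_run(keys):
    """Indices of a longest non-decreasing subsequence of keys."""
    tails = []      # tails[k]: smallest last key of any run of length k + 1
    tails_idx = []  # position in keys of that last key
    prev = [None] * len(keys)  # predecessor links for rebuilding the run
    for i, key in enumerate(keys):
        k = bisect_right(tails, key)
        prev[i] = tails_idx[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(key)
            tails_idx.append(i)
        else:
            tails[k] = key
            tails_idx[k] = i
    run = []
    i = tails_idx[-1] if tails_idx else None
    while i is not None:
        run.append(i)
        i = prev[i]
    return run[::-1]


def clean_wordlist(text):
    """Return 'word (tag)' lines, dropping hits that break alphabetical order."""
    flat = " ".join(text.split())  # undo the hard line breaks
    hits = [(m.group(1), m.group(2)) for m in ENTRY.finditer(flat)]
    keep = longest_sorted_run([word.lower() for word, _ in hits])
    return [f"{hits[i][0]} {hits[i][1]}" for i in keep]


SAMPLE = """about (adv & prep)
• about 500 students (adv) •
The film is about a small
boy.
(prep) above (adj,
adv & prep) abroad
(adv) absent (adj)
"""

print("\n".join(clean_wordlist(SAMPLE)))
```

    This is a heuristic, not a full solution. An example word that happens to fit alphabetically between its neighbours would slip through, and multi-word headwords such as "according to" are dropped because only the last word is captured and it falls out of order.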

  4. #4

    Thread Starter
    eXtreme Programmer .paul.'s Avatar

    Re: Replace/Regex.Replace?

    I might try to copy and paste it into Excel. If it splits it into the fields I want, I think I can work with that…

  5. #5

    Thread Starter
    eXtreme Programmer .paul.'s Avatar

    Re: [RESOLVED] Replace/Regex.Replace?

    I tried pasting the tabulated data into Excel and refining the list with VBA. So far it hasn’t been too bad. I’d prefer a 10-second algorithm, but this text is particularly nasty, with all of those bullet points and truncations in places where they’re hard to fix… I’ll mark this resolved.
