
Thread: [RegEx][Preg] Find text between two tags

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Jan 2008
    Posts
    1,754

    Question [RegEx][Preg] Find text between two tags

    I need a regular expression to find the text between two XML tags.

    I have tried the following expressions:

    Code:
    /<w:p[^Pr][^>]*>(([^(<w:p>)])|([^(<w:p>)]))*<\/w:p>/
    /<w:p[^Pr][^>]*>[^<\/w:p>]*<\/w:p>/
    The problem is that the entire XML file is being matched, so I need something between the tags that stops the match at the first </w:p> it encounters.
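
    A likely reason the character-class attempts misbehave, sketched in Python's re module as a stand-in for PHP's preg functions (character classes behave the same way in PCRE): a negated class such as [^<\/w:p>] excludes the individual characters <, /, w, :, p and >, not the string </w:p>. The helper name and the sample strings below are made up for illustration.

```python
import re

# A negated character class matches ONE character that is not in the set.
# The set here is the six characters < / w : p >, not the string "</w:p>".
not_in_set = re.compile(r'[^</w:p>]*')

def leading_run(text):
    """Return the longest prefix made only of characters outside the set."""
    return not_in_set.match(text).group(0)

print(repr(leading_run('Okay...</w:p>')))  # stops at '<'
print(repr(leading_run('Hello world')))    # stops at the 'w' in 'world'
print(repr(leading_run('plain text')))     # stops immediately at 'p'
```

    So any ordinary text containing a w, p or colon ends the run early, which is probably not the intent. The same applies to [^Pr]: it requires exactly one character after <w:p that is neither P nor r.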

    An example of the XML file is as follows:

    Life is Good.
    <w:p>
    <w:pPr>
    <w:b />
    Random text here.
    </w:pPr>
    Okay...
    </w:p>
    More text.
    Yo..
    <w:p src=".." href=".." style="..">Yoyo</w:p>
    <w:p src=".." href=".." style=".."/>Yoyo2</w:p>
    I want to match each pair of w:p tags together with the content between them. The example above should produce 3 matches, with the following output:

    Match 1:

    <w:p>
    <w:pPr>
    <w:b />
    Random text here.
    </w:pPr>
    Okay...
    </w:p>

    Match 2:

    <w:p src=".." href=".." style="..">Yoyo</w:p>

    Match 3:

    <w:p src=".." href=".." style=".."/>Yoyo2</w:p>
    Can anyone offer some guidance/advice?
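
    One way to get those three matches, as a minimal sketch rather than a definitive answer: use a lazy quantifier (.*?) so the match stops at the first </w:p>, and a lookahead so that <w:pPr> is not mistaken for <w:p>. It is written in Python's re module for illustration; the names SAMPLE, W_P_BLOCK and find_w_p_blocks are hypothetical, and the sketch assumes w:p elements are never nested inside each other.

```python
import re

# Sample input from the post, with the forum indentation removed.
SAMPLE = """Life is Good.
<w:p>
<w:pPr>
<w:b />
Random text here.
</w:pPr>
Okay...
</w:p>
More text.
Yo..
<w:p src=".." href=".." style="..">Yoyo</w:p>
<w:p src=".." href=".." style=".."/>Yoyo2</w:p>
"""

# <w:p         literal tag name
# (?=[\s>/])   the next character must end the name, so <w:pPr> is rejected
# [^>]*>       the rest of the opening tag
# .*?          as little as possible (lazy); DOTALL lets it cross newlines
# </w:p>       the first closing tag that follows
W_P_BLOCK = re.compile(r'<w:p(?=[\s>/])[^>]*>.*?</w:p>', re.DOTALL)

def find_w_p_blocks(xml_text):
    """Return every <w:p ...>...</w:p> block as a string."""
    return W_P_BLOCK.findall(xml_text)

for number, block in enumerate(find_w_p_blocks(SAMPLE), start=1):
    print(f"Match {number}:\n{block}\n")
```

    The same pattern should carry over to PHP as preg_match_all('#<w:p(?=[\s>/])[^>]*>.*?</w:p>#s', $xml, $matches), where the s modifier plays the role of DOTALL. If w:p elements can be nested, a regular expression of this kind will pair the wrong tags and an XML parser is the safer tool.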

  2. #2

    Thread Starter

    Re: [RegEx][Preg] Find text between two tags

    I've improved the expression to this:

    Code:
    <\s*(w:p)(\s+[^>]*>)|(\s*>)([^[<\s*\/\s*(w:p)\s*>]]*)<\s*\/\s*(w:p)\s*>
    This expression gives me the full opening w:p tag, e.g. <w:p id=23 class="..." etc="ugotthepoint">, in one match and the closing </w:p> in another. However, I am not getting the content between the two tags; it shows up in my match as "" (null). This is with preg (in PHP).

    EDIT: Just for clarification, the (\s+[^>]*>)|(\s*>) part is meant to prevent tags similar to w:p, such as w:pPr, from being matched.
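
    One thing that may contribute to the split matches (a guess from reading the pattern, not something tested against the original file): | has the lowest precedence in a regular expression, so an ungrouped (\s+[^>]*>)|(\s*>) divides the whole pattern into two independent alternatives instead of choosing between two endings for the opening tag. A small Python sketch of the difference, with simplified hypothetical patterns and input:

```python
import re

text = '<w:p id="1">Yoyo'

# Ungrouped: reads as  "<w:p(\s+[^>]*>)"  OR  "(\s*>)Yoyo".
ungrouped = re.compile(r'<w:p(\s+[^>]*>)|(\s*>)Yoyo')

# Grouped: the alternation is confined to the end of the opening tag.
grouped = re.compile(r'<w:p(?:\s+[^>]*>|\s*>)Yoyo')

def matched_text(pattern, subject):
    """Return the text matched at the start of subject, or None."""
    found = pattern.match(subject)
    return found.group(0) if found else None

print(matched_text(ungrouped, text))  # only the opening tag
print(matched_text(grouped, text))    # opening tag plus the text after it
```

    Wrapping the alternation in a non-capturing group (?:...) keeps everything after it attached to both branches.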

    EDIT: I don't think anyone is actually reading this, but I've modified my expression to this:

    Code:
    #<\s*(w:p)\s*[^(\/>)]*>([^(<\/\1>)]*)<\s*\/\1\s*>#is
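
    A possible variation on that last expression, sketched in Python's re module under the assumption that the goal is the content between the tags: \1 is a backreference only outside a character class, so the content is captured with a lazy (.*?) in its own group rather than with [^...]. The names TAG_CONTENT and inner_texts are hypothetical.

```python
import re

# Group 1 captures the tag name, group 2 the content between the tags.
# (?=[\s>/]) stops look-alike tags such as <w:pPr> from matching.
# \1 after the content refers back to whatever group 1 matched.
TAG_CONTENT = re.compile(
    r'<\s*(w:p)(?=[\s>/])[^>]*>(.*?)<\s*/\s*\1\s*>',
    re.IGNORECASE | re.DOTALL,  # counterparts of PHP's i and s modifiers
)

def inner_texts(xml_text):
    """Return the content of each w:p element found in xml_text."""
    return [m.group(2) for m in TAG_CONTENT.finditer(xml_text)]

print(inner_texts('<w:p id="23">Yoyo</w:p> and <w:p>\nOkay...\n</w:p>'))
```

    In PHP the second capture group would be the third entry, $matches[2], of the array filled in by preg_match_all.
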
    Last edited by noahssite; Oct 10th, 2011 at 10:40 AM. Reason: added
