Thread: Get the Text in between two words (such as HTML Tags) without RegEx


    Thread Starter
    Vectris (Fanatic Member)
    Join Date: Dec 2008, Location: USA, Posts: 941

    Since I've seen this asked a lot, here is a function that gets all the text in between two other strings (tags, words, etc.) without using RegEx.

    Code:
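        Private Function GetTagContents(ByVal Source As String, ByVal startTag As String, ByVal endTag As String) As List(Of String)
            Dim StringsFound As New List(Of String)
            ' Index of the text right after the first startTag (startTag.Length - 1 if there is none)
            Dim Index As Integer = Source.IndexOf(startTag) + startTag.Length
            ' An empty startTag would match everywhere and loop forever, so return the empty list
            If String.IsNullOrEmpty(startTag) Then Return StringsFound
    
            ' IndexOf returned -1 exactly when Index = -1 + startTag.Length
            While Index <> startTag.Length - 1
                ' A startTag with no endTag after it would make the Substring length negative, so stop
                If Source.IndexOf(endTag, Index) = -1 Then Exit While
                StringsFound.Add(Source.Substring(Index, Source.IndexOf(endTag, Index) - Index))
                ' Look for the next startTag, starting from the text we just read
                Index = Source.IndexOf(startTag, Index) + startTag.Length
            End While
    
            Return StringsFound
        End Function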
    Example Scenario:
    If Source is set to "I {b}love{/b} the word {b}life{/b} don't you?" and you pass "{b}" and "{/b}" as the start and end tags, the List {"love", "life"} is returned. If the tags don't appear at all in Source, the list's Count will be 0.
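    To try the example scenario yourself, here is a minimal console sketch. The module name TagDemo and the Main routine are my own additions for illustration; GetTagContents is the function from this post with one extra check so a start tag without an end tag stops the loop instead of throwing.

```vbnet
Imports System
Imports System.Collections.Generic

Module TagDemo
    ' GetTagContents as posted, plus a check that stops at a startTag with no endTag
    Function GetTagContents(ByVal Source As String, ByVal startTag As String, ByVal endTag As String) As List(Of String)
        Dim StringsFound As New List(Of String)
        If String.IsNullOrEmpty(startTag) Then Return StringsFound
        Dim Index As Integer = Source.IndexOf(startTag) + startTag.Length

        While Index <> startTag.Length - 1
            If Source.IndexOf(endTag, Index) = -1 Then Exit While
            StringsFound.Add(Source.Substring(Index, Source.IndexOf(endTag, Index) - Index))
            Index = Source.IndexOf(startTag, Index) + startTag.Length
        End While

        Return StringsFound
    End Function

    Sub Main()
        ' The example scenario: prints "love" and then "life"
        Dim found As List(Of String) = GetTagContents("I {b}love{/b} the word {b}life{/b} don't you?", "{b}", "{/b}")
        For Each s As String In found
            Console.WriteLine(s)
        Next

        ' No tags at all: the list comes back empty, so this prints 0
        Console.WriteLine(GetTagContents("no tags here", "{b}", "{/b}").Count)
    End Sub
End Module
```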

    Explanation:
    The first two statements are just variable declarations, although the second one also goes ahead and searches for our first match with:
    Code:
    Source.IndexOf(startTag) + startTag.Length
    As you can see, it's just a normal IndexOf, which gives us the index of the first startTag. Then I add startTag.Length. The reason for the addition is that we don't want the index of the startTag itself; we want the index of the text after it, so adding startTag's length to its index gives us the position directly after the tag.
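    Here is that arithmetic on the example string, as a small sketch. FirstContentIndex is a hypothetical helper name I made up for illustration; it is just the expression used to initialize Index.

```vbnet
Imports System

Module StartIndexDemo
    ' Hypothetical helper for illustration: the same expression used to initialize Index
    Function FirstContentIndex(ByVal Source As String, ByVal startTag As String) As Integer
        Return Source.IndexOf(startTag) + startTag.Length
    End Function

    Sub Main()
        Dim Source As String = "I {b}love{/b} the word {b}life{/b} don't you?"
        Console.WriteLine(Source.IndexOf("{b}"))                      ' 2: where "{b}" itself begins
        Console.WriteLine(FirstContentIndex(Source, "{b}"))           ' 5: skips the 3 characters of "{b}"
        Console.WriteLine(Source(FirstContentIndex(Source, "{b}")))   ' l: the first letter of "love"
    End Sub
End Module
```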

    Then comes the While Loop. Our condition is:
    Code:
            While Index <> startTag.Length - 1
    When IndexOf can't find the string, it returns -1. We can't just write "While Index <> -1", because we always add startTag.Length to the IndexOf result to get the index of the text after the tag. So a "not found" result shows up in Index as -1 + startTag.Length, which is the same value as the startTag.Length - 1 used in the code.
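    If the adjusted sentinel feels awkward, one option is to keep IndexOf's raw result and compare it with -1 before adding the length. This is a sketch of that alternative, not the posted code; GetTagContentsRaw is a hypothetical name, and it should return the same results as GetTagContents (including searching for the next startTag from Index).

```vbnet
Imports System
Imports System.Collections.Generic

Module SentinelDemo
    ' Hypothetical alternative: same search as GetTagContents, but the loop tests
    ' IndexOf's raw result against -1 before startTag.Length is added
    Function GetTagContentsRaw(ByVal Source As String, ByVal startTag As String, ByVal endTag As String) As List(Of String)
        Dim StringsFound As New List(Of String)
        If String.IsNullOrEmpty(startTag) Then Return StringsFound
        Dim tagPos As Integer = Source.IndexOf(startTag)

        While tagPos <> -1
            Dim Index As Integer = tagPos + startTag.Length   ' text starts right after the tag
            Dim endPos As Integer = Source.IndexOf(endTag, Index)
            If endPos = -1 Then Exit While                    ' unclosed tag: stop
            StringsFound.Add(Source.Substring(Index, endPos - Index))
            tagPos = Source.IndexOf(startTag, Index)
        End While

        Return StringsFound
    End Function

    Sub Main()
        Dim startTag As String = "{b}"
        ' A failed search gives -1 + 3 = 2, which is startTag.Length - 1
        Console.WriteLine("no tags".IndexOf(startTag) + startTag.Length)
        For Each s As String In GetTagContentsRaw("I {b}love{/b} the word {b}life{/b} don't you?", "{b}", "{/b}")
            Console.WriteLine(s)
        Next
    End Sub
End Module
```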

    Then comes the line of the loop that does the real work:
    Code:
                StringsFound.Add(Source.Substring(Index, Source.IndexOf(endTag, Index) - Index))
    It starts with StringsFound.Add, so as you can tell we're going to add the string we just found to the list. (If no startTag was found, the loop never runs at all thanks to its condition.) We don't have that string yet, only its starting index, so inside the Add call we also work out the rest of the string between the tags. We take a Substring of Source, since we already know where the string starts from when we declared Index. Substring's second argument is a length, not an end position, so we search for the endTag with IndexOf and subtract Index from its position: the distance from the start of the text to the endTag is exactly the length of the text. Notice that we don't add endTag's length like we did with startTag; the index of endTag is already the position right after the last character we want, so it doesn't need adjusting.
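    A small sketch of that length calculation, and of what happens when no endTag follows the text. TextUntil is a hypothetical helper I named for illustration; it returns Nothing in that case, whereas calling Substring with the negative length throws an ArgumentOutOfRangeException.

```vbnet
Imports System

Module SubstringDemo
    ' Hypothetical helper for illustration: the text from Index up to the next endTag,
    ' or Nothing when no endTag follows (instead of letting Substring throw)
    Function TextUntil(ByVal Source As String, ByVal Index As Integer, ByVal endTag As String) As String
        Dim endIndex As Integer = Source.IndexOf(endTag, Index)   ' where endTag begins
        If endIndex = -1 Then Return Nothing
        ' Substring takes (startIndex, length), so turn the end position into a length
        Return Source.Substring(Index, endIndex - Index)
    End Function

    Sub Main()
        Dim Source As String = "I {b}love{/b} the word {b}life{/b} don't you?"
        ' Index 5 is right after "{b}"; "{/b}" begins at 9, so the length is 9 - 5 = 4
        Console.WriteLine(TextUntil(Source, 5, "{/b}"))   ' love

        ' Without the check: no "{/b}" after index 3, so the length is -1 - 3 = -4
        Try
            Dim unclosed As String = "{b}oops"
            Console.WriteLine(unclosed.Substring(3, unclosed.IndexOf("{/b}", 3) - 3))
        Catch ex As ArgumentOutOfRangeException
            Console.WriteLine("Substring threw ArgumentOutOfRangeException")
        End Try
    End Sub
End Module
```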

    Notice that this IndexOf has an extra parameter. That's because we'll be moving on to the next set of tags, and we don't want to get the index of the same (first) endTag every time! The second parameter is the index at which to start looking for the endTag. That one is easy: we want to start looking right where the text begins, so we just use Index.
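    To see why the start position matters, here is a sketch using the example string at the second match. The constant Sample and the hard-coded Index of 26 are just for illustration (26 is where "life" begins).

```vbnet
Imports System

Module SecondMatchDemo
    Public Const Sample As String = "I {b}love{/b} the word {b}life{/b} don't you?"

    Sub Main()
        Dim Index As Integer = 26   ' first character of "life", right after the second "{b}"

        ' Without a start position, IndexOf always finds the FIRST "{/b}", at 9,
        ' which is before Index, so 9 - 26 would be a negative length
        Console.WriteLine(Sample.IndexOf("{/b}"))          ' 9
        ' Starting the search at Index finds the "{/b}" that closes "life"
        Console.WriteLine(Sample.IndexOf("{/b}", Index))   ' 30
        Console.WriteLine(Sample.Substring(Index, Sample.IndexOf("{/b}", Index) - Index))   ' life
    End Sub
End Module
```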

    Then comes the last line of the loop:
    Code:
                Index = Source.IndexOf(startTag, Index) + startTag.Length
    It's nearly the same as the line where we declared Index, and in fact it does the same job. Can you spot the difference? There is a second parameter for this IndexOf, for the same reason as in the Substring line: we don't want to find the same startTag over and over again. Since the text we just read starts right after the startTag we found, searching from that same Index gets us the next startTag after it.
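    One side effect of restarting the search at Index: when startTag and endTag are the same string (say a double quote), the closing quote gets found as the next opening one, so the text between two quoted words is returned too. Below is a sketch of a variant that resumes the search after the endTag instead. GetTagContentsAfterEnd is a hypothetical name, and the switch to StringComparison.Ordinal (plain character-by-character matching, no culture rules) is my own design choice, not part of the original function.

```vbnet
Imports System
Imports System.Collections.Generic

Module AfterEndTagDemo
    ' Hypothetical variant of GetTagContents: resumes searching after the endTag it just used,
    ' and compares ordinally (character by character, no culture rules)
    Function GetTagContentsAfterEnd(ByVal Source As String, ByVal startTag As String, ByVal endTag As String) As List(Of String)
        Dim StringsFound As New List(Of String)
        If String.IsNullOrEmpty(startTag) OrElse String.IsNullOrEmpty(endTag) Then Return StringsFound

        Dim tagPos As Integer = Source.IndexOf(startTag, StringComparison.Ordinal)
        While tagPos <> -1
            Dim Index As Integer = tagPos + startTag.Length
            Dim endPos As Integer = Source.IndexOf(endTag, Index, StringComparison.Ordinal)
            If endPos = -1 Then Exit While   ' unclosed startTag: nothing more to collect

            StringsFound.Add(Source.Substring(Index, endPos - Index))
            ' Continue after the endTag, so it can't be mistaken for the next startTag
            tagPos = Source.IndexOf(startTag, endPos + endTag.Length, StringComparison.Ordinal)
        End While

        Return StringsFound
    End Function

    Sub Main()
        ' startTag and endTag are both a double quote here: prints "hi" and then "bye"
        For Each s As String In GetTagContentsAfterEnd("say ""hi"" and ""bye""", """", """")
            Console.WriteLine(s)
        Next
    End Sub
End Module
```

    The trade-off is with repeated start tags: for "{b}a{b}b{/b}" this variant returns only "a{b}b", while the original loop returns "a{b}b" and then "b". Neither version handles properly nested tags.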

    And that's it for the loop. The last line simply returns the List(Of String), StringsFound, as the result of the function.
    Last edited by Vectris; Sep 8th, 2009 at 07:35 PM.