Thread: Get the Text inbetween two words (such as HTML Tags) without RegEx

  1. #1

    Thread Starter
    Fanatic Member Vectris's Avatar
    Join Date
    Dec 2008
    Location
    USA
    Posts
    941

    Get the Text inbetween two words (such as HTML Tags) without RegEx

    Since I've seen this asked a lot here is the function for getting all the text in between two other strings (or tags, words, etc.).

    Code:
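        Private Function GetTagContents(ByVal Source As String, ByVal startTag As String, ByVal endTag As String) As List(Of String)
            Dim StringsFound As New List(Of String)
            ' Index of the first character after the first start tag.
            Dim Index As Integer = Source.IndexOf(startTag) + startTag.Length
    
            ' IndexOf returns -1 when nothing is found, so Index equals startTag.Length - 1 once the start tags run out.
            While Index <> startTag.Length - 1
                ' Take the text from Index up to the next end tag.
                StringsFound.Add(Source.Substring(Index, Source.IndexOf(endTag, Index) - Index))
                ' Move on to the text after the next start tag.
                Index = Source.IndexOf(startTag, Index) + startTag.Length
            End While
    
            Return StringsFound
        End Function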
    Example Scenario:
    If Source is set to "I {b}love{/b} the word {b}life{/b} don't you?" and you pass "{b}" and "{/b}" as the starting and ending tags, the List {"love","life"} is returned. If the tags don't appear at all in the Source string, the list's Count will be 0.
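
    The scenario above can be run as a small console program. This is only a sketch: the module name ExampleScenario and the Main routine are made up for illustration, and the function body is copied from this post so the example compiles on its own.

```vbnet
Imports System
Imports System.Collections.Generic

Module ExampleScenario
    ' Copy of the function from this post, so the example is self-contained.
    Private Function GetTagContents(ByVal Source As String, ByVal startTag As String, ByVal endTag As String) As List(Of String)
        Dim StringsFound As New List(Of String)
        Dim Index As Integer = Source.IndexOf(startTag) + startTag.Length

        While Index <> startTag.Length - 1
            StringsFound.Add(Source.Substring(Index, Source.IndexOf(endTag, Index) - Index))
            Index = Source.IndexOf(startTag, Index) + startTag.Length
        End While

        Return StringsFound
    End Function

    Sub Main()
        Dim found As List(Of String) = GetTagContents("I {b}love{/b} the word {b}life{/b} don't you?", "{b}", "{/b}")
        Console.WriteLine(found.Count)                         ' 2
        Console.WriteLine(String.Join(", ", found.ToArray()))  ' love, life

        ' No tags in the source: the list comes back empty.
        Console.WriteLine(GetTagContents("no tags here", "{b}", "{/b}").Count)  ' 0
    End Sub
End Module
```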

    Explanation:
    The first two lines are variable declarations, although the second one also searches for the first match with:
    Code:
    Source.IndexOf(startTag) + startTag.Length
    As you can see, it's just a normal IndexOf, which gives us the index of the first start tag, but then I add startTag.Length. The reason for this addition is that we don't want the index of the start tag itself; we want the index of the text after it, so adding the length of startTag to its index gives us the position of whatever comes directly after the tag.
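
    To make the arithmetic concrete, here is a small sketch with a shortened source string. The names tagIndex and textIndex are only for illustration and are not part of the function above.

```vbnet
Imports System

Module StartIndexDemo
    Sub Main()
        Dim Source As String = "I {b}love{/b} it"
        Dim startTag As String = "{b}"

        Dim tagIndex As Integer = Source.IndexOf(startTag)     ' where the tag itself starts
        Dim textIndex As Integer = tagIndex + startTag.Length  ' first character after the tag

        Console.WriteLine(tagIndex)                     ' 2
        Console.WriteLine(textIndex)                    ' 5
        Console.WriteLine(Source.Substring(textIndex))  ' love{/b} it
    End Sub
End Module
```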

    Then comes the While Loop. Our condition is:
    Code:
            While Index <> startTag.Length - 1
    As you know, when IndexOf can't find the string, it returns -1. We can't just write "While Index <> -1", though, because we always add startTag.Length to the result of IndexOf to get the index of the text after the tag. So "not found" is really -1 + startTag.Length, or, switched around to be easier to read, startTag.Length - 1 as in the code.
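
    A quick sketch of that "not found" value, using a made-up source string with no tags in it. A tag that is found always gives an Index of at least startTag.Length, so the two cases can't be confused.

```vbnet
Imports System

Module LoopConditionDemo
    Sub Main()
        Dim Source As String = "no tags here"
        Dim startTag As String = "{b}"

        ' IndexOf returns -1 here, so Index ends up as -1 + 3.
        Dim Index As Integer = Source.IndexOf(startTag) + startTag.Length

        Console.WriteLine(Index)                        ' 2
        Console.WriteLine(Index = startTag.Length - 1)  ' True
    End Sub
End Module
```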

    Then comes the first line of the loop:
    Code:
                StringsFound.Add(Source.Substring(Index, Source.IndexOf(endTag, Index) - Index))
    It starts off with StringsFound.Add, so as you can tell we're going to add the string we just found to the list. If no string was found, the loop never runs, thanks to its condition. At this point we still don't have the string to add, just its starting index, so inside the Add call we also find the rest of the string in between the tags. We take a Substring of Source starting at Index, which we already know from the declaration. For the length of the substring, we search for endTag using IndexOf and subtract Index from the result. Notice that we don't add the end tag's length like we did with startTag: the index of endTag is the position right after the last character of the string we need, so it doesn't need adjusting.

    Notice that there is an extra parameter in this IndexOf, though. Later on we move to the next set of tags, and we don't want to get the index of the same endTag every time! The second parameter is the index at which to start looking for endTag. That's easy: we want to start looking right where the text starts, so we just use Index.
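
    Here is a sketch of why the start index matters, looking only at the second pair of tags. The shortened source string and the use of LastIndexOf to jump straight to the second start tag are just for this illustration.

```vbnet
Imports System

Module EndTagSearchDemo
    Sub Main()
        Dim Source As String = "I {b}love{/b} the word {b}life{/b}"
        Dim startTag As String = "{b}"
        Dim endTag As String = "{/b}"

        ' Start of the text inside the second pair of tags.
        Dim Index As Integer = Source.LastIndexOf(startTag) + startTag.Length

        Console.WriteLine(Index)                          ' 26
        Console.WriteLine(Source.IndexOf(endTag))         ' 9, the first end tag again
        Console.WriteLine(Source.IndexOf(endTag, Index))  ' 30, the end tag we actually want

        ' Length of the substring = index of the end tag minus the start of the text.
        Console.WriteLine(Source.Substring(Index, Source.IndexOf(endTag, Index) - Index))  ' life
    End Sub
End Module
```

    Without the second parameter the length would come out as 9 - 26, which is negative, and Substring would throw.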

    Then comes the last line of the loop:
    Code:
                Index = Source.IndexOf(startTag, Index) + startTag.Length
    It's nearly the same as the line where we declared Index; in fact it does the same thing. Can you spot the difference? Yes, there is a second parameter for this IndexOf. We add it for the same reason as in the first line of the loop: we don't want to find the same startTag over and over again. Index points at the text after the startTag we just found, so searching from that same index gets us the next startTag after it.
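
    To watch Index move along, this sketch runs the same loop but prints each value of Index instead of collecting the text. The shortened source string is only an example.

```vbnet
Imports System

Module AdvanceIndexDemo
    Sub Main()
        Dim Source As String = "I {b}love{/b} the word {b}life{/b}"
        Dim startTag As String = "{b}"
        Dim Index As Integer = Source.IndexOf(startTag) + startTag.Length

        While Index <> startTag.Length - 1
            Console.WriteLine(Index)  ' prints 5, then 26
            ' Search from Index so the start tag we already handled is skipped.
            Index = Source.IndexOf(startTag, Index) + startTag.Length
        End While
    End Sub
End Module
```

    If the second parameter were left out, IndexOf would return 2 every time and the loop would never end.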

    And that's it for the loop. The last line simply returns the List(Of String), StringsFound, as the result of the function.
    Last edited by Vectris; Sep 8th, 2009 at 07:35 PM.
    If your problem is solved, click the Thread Tools button at the top and mark your topic as Resolved!

    If someone helped you out, click the button on their post and leave them a comment to let them know they did a good job

    __________________
    My Vb.Net CodeBank Submissions:
    Microsoft Calculator Clone
    Custom TextBox Restrictions
    Get the Text inbetween HTML Tags (or two words)

  2. #2
    Addicted Member
    Join Date
    Jun 2008
    Location
    Macedonia
    Posts
    188

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    What if there is more than one dog tag? How do I get all of them?

    asdf<lol>rrrrr<dog>ruff</dog>akdje</lol><dog>mouse</dog>

  3. #3

    Thread Starter
    Fanatic Member Vectris's Avatar
    Join Date
    Dec 2008
    Location
    USA
    Posts
    941

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    You could probably modify the code somehow to do that, or learn the RegEx way. If you wanna try this way then look at IndexOf() and its second parameter, the starting index.
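
    One way to follow that hint, as a rough sketch: keep a position to search from and pass it as the second parameter of IndexOf. The variable names (searchFrom, tagIndex, textIndex, endIndex) are made up for this example, and the source string is the one from post #2.

```vbnet
Imports System

Module DogTagDemo
    Sub Main()
        Dim Source As String = "asdf<lol>rrrrr<dog>ruff</dog>akdje</lol><dog>mouse</dog>"
        Dim startTag As String = "<dog>"
        Dim endTag As String = "</dog>"
        Dim searchFrom As Integer = 0

        Do
            ' Second parameter: only look at or after searchFrom.
            Dim tagIndex As Integer = Source.IndexOf(startTag, searchFrom)
            If tagIndex = -1 Then Exit Do

            Dim textIndex As Integer = tagIndex + startTag.Length
            Dim endIndex As Integer = Source.IndexOf(endTag, textIndex)
            If endIndex = -1 Then Exit Do

            Console.WriteLine(Source.Substring(textIndex, endIndex - textIndex))  ' ruff, then mouse
            searchFrom = endIndex + endTag.Length
        Loop
    End Sub
End Module
```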

  4. #4
    Addicted Member
    Join Date
    Jun 2008
    Location
    Macedonia
    Posts
    188

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    I use this and it works for me. It's not the best code, but it's all I could come up with. Sorry for my English.

    Vb.net Code:
    Private Function GetTagContents(ByVal Source As String, ByVal startTag As String, ByVal endTag As String) As String
        ' Index of the first character after the first start tag.
        Dim firstIndex As Integer = Source.IndexOf(startTag) + startTag.Length
        Dim text As String = txtString.Text

        ' Remove everything up to and including the first end tag from the text box,
        ' so the next call finds the next pair of tags.
        txtString.Text = text.Remove(0, Source.IndexOf(startTag) + startTag.Length + (Source.Substring(firstIndex, Source.IndexOf(endTag) - firstIndex)).Length + endTag.Length)

        Return Source.Substring(firstIndex, Source.IndexOf(endTag) - firstIndex)
    End Function

    Private Sub btnPokaziRez_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnPokaziRez.Click
        ' s keeps the original text; it is only used to count the start tags.
        Dim s As String = txtString.Text
        Dim i As Integer = s.IndexOf(txtPocTag.Text)

        ' One call per start tag found in the original text.
        Do While (i <> -1)
            ListBox1.Items.Add(GetTagContents(txtString.Text, txtPocTag.Text, txtZavTag.Text))
            i = s.IndexOf(txtPocTag.Text, i + 1)
        Loop
    End Sub

  5. #5

    Thread Starter
    Fanatic Member Vectris's Avatar
    Join Date
    Dec 2008
    Location
    USA
    Posts
    941

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    Ok I updated the code so that it will find all the text in between tags, not just the first result. It will now return a List(Of String) with all the text in between tags that it finds.

    Don't forget to check the .Count in case no results are found.

  6. #6
    Addicted Member
    Join Date
    Jun 2008
    Location
    Macedonia
    Posts
    188

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    Yes, this is better code than mine. Thanks anyway.

  7. #7
    New Member CASchryver's Avatar
    Join Date
    Jul 2009
    Location
    PA
    Posts
    9

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    Sweet. Was just thinking about figuring out how to do this about an hour ago and then I stumble upon this when I wasn't even looking for it. Thanks for the code!

  8. #8
    Hyperactive Member csKanna's Avatar
    Join Date
    Dec 2005
    Location
    Tech-Tips-Now.com
    Posts
    339

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    please ignore.

    wrong thread
    Last edited by csKanna; Aug 22nd, 2009 at 02:25 AM.
    Kanna

  9. #9
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    When looking for the endTag, shouldn't you use Index + 1 as the start index in the IndexOf function? When the tags are different it won't be a problem, but if the start and end tags are the same it keeps on finding the same start tag over and over, right? Didn't try it, but that's what I thought lol.
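
    A small trace can help answer that question. This is only a sketch that copies the loop from post #1 into a console program and uses the same string, "|", as both start and end tag; the source string is made up. Stepping through it by hand, the loop does not appear to get stuck on one tag. Instead, each end tag is picked up again as the next start tag, so the text between the pairs is returned too, and the last lookup fails because no closing tag is left.

```vbnet
Imports System

Module SameTagTrace
    Sub Main()
        Dim Source As String = "x|a|y|b|z"
        Dim tag As String = "|"   ' used as both start and end tag
        Dim Index As Integer = Source.IndexOf(tag) + tag.Length

        Try
            While Index <> tag.Length - 1
                ' Prints a, then y (the text between the two pairs), then b.
                Console.WriteLine(Source.Substring(Index, Source.IndexOf(tag, Index) - Index))
                Index = Source.IndexOf(tag, Index) + tag.Length
            End While
        Catch ex As ArgumentOutOfRangeException
            ' After "b" there is no further "|", so IndexOf returns -1
            ' and Substring is asked for a negative length.
            Console.WriteLine("Substring failed: no closing tag left")
        End Try
    End Sub
End Module
```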

  10. #10
    Lively Member Blupig's Avatar
    Join Date
    Apr 2008
    Posts
    118

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    A friend of mine actually created something called GetBetweenAll, which adds a series of parsed strings to an array (for example, if you wanted to grab a bunch of items that were enveloped in the same tags within a table or something).

    I don't think he'd mind if I posted it here (it's a little bit inefficient in terms of adding the items to a list, but that's easily modifiable):

    Vb.net Code:
    Public Sub GBA(ByRef strSource As String, ByRef strStart As String, ByRef strEnd As String, _
                   ByVal lstAdd As ListBox, Optional ByRef startPos As Integer = 0)
        Dim iPos As Integer, iEnd As Integer, strResult As String, lenStart As Integer = strStart.Length

        Do Until iPos = -1
            strResult = String.Empty
            iPos = strSource.IndexOf(strStart, startPos)
            ' Stop when there are no more start tags.
            If iPos = -1 Then Exit Do
            iEnd = strSource.IndexOf(strEnd, iPos + lenStart)
            ' Stop on an unmatched start tag; otherwise startPos never advances and the loop never ends.
            If iEnd = -1 Then Exit Do
            strResult = strSource.Substring(iPos + lenStart, iEnd - (iPos + lenStart))
            lstAdd.Items.Add(strResult)
            startPos = iPos + lenStart
        Loop

    End Sub

  11. #11

    Thread Starter
    Fanatic Member Vectris's Avatar
    Join Date
    Dec 2008
    Location
    USA
    Posts
    941

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    @blupig
    So you mean it basically does the same thing as my code? I'd rather you post your own topic for it then.

    Mine's fewer lines so I don't think it's as useful, that is if they do the same thing. Props to your friend for writing it, but I'd rather you make a topic for it.

    @Nick
    I'll look at that and test it out. There are several other things that could cause problems, such as an unequal number of start and end tags. I sort of assume that the user of this code is going to be using it in a good-tag environment where the tag counts match and the start and end tags are different. Still, I'll look at it later and post back with what I find.
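
    For anyone who can't rely on a good-tag environment, here is a minimal sketch of a more defensive variant. It is an assumption about one possible fix, not the function from post #1: the name GetTagContentsSafe is made up, it stops at the first start tag that has no end tag, and it resumes searching after the end tag so identical start and end tags pair up correctly.

```vbnet
Imports System
Imports System.Collections.Generic

Module SaferTagContents
    ' Hypothetical variant for illustration only.
    Function GetTagContentsSafe(ByVal Source As String, ByVal startTag As String, ByVal endTag As String) As List(Of String)
        Dim StringsFound As New List(Of String)
        ' An empty tag would match everywhere, so return nothing in that case.
        If String.IsNullOrEmpty(Source) OrElse String.IsNullOrEmpty(startTag) OrElse String.IsNullOrEmpty(endTag) Then Return StringsFound

        Dim searchFrom As Integer = 0
        Do
            Dim tagIndex As Integer = Source.IndexOf(startTag, searchFrom)
            If tagIndex = -1 Then Exit Do          ' no more start tags

            Dim textIndex As Integer = tagIndex + startTag.Length
            Dim endIndex As Integer = Source.IndexOf(endTag, textIndex)
            If endIndex = -1 Then Exit Do          ' unmatched start tag: stop instead of throwing

            StringsFound.Add(Source.Substring(textIndex, endIndex - textIndex))
            ' Resume after the end tag, so an end tag is never reused as a start tag.
            searchFrom = endIndex + endTag.Length
        Loop

        Return StringsFound
    End Function

    Sub Main()
        ' Identical start and end tags.
        Console.WriteLine(String.Join(", ", GetTagContentsSafe("x|a|y|b|z", "|", "|").ToArray()))            ' a, b
        ' Second start tag has no end tag.
        Console.WriteLine(String.Join(", ", GetTagContentsSafe("{b}one{/b} {b}two", "{b}", "{/b}").ToArray()))  ' one
    End Sub
End Module
```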
    Last edited by Vectris; Sep 8th, 2009 at 07:33 PM.

  12. #12
    New Member
    Join Date
    Sep 2009
    Posts
    6

    Re: Get the Text inbetween two words (such as HTML Tags) without RegEx

    @Vectris, do you have an example of how to use this code? I can't figure out how to apply it right now. I am scraping the contents of a web page.
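
    A possible usage sketch for page content, under a few assumptions: the html string below is a hard-coded stand-in for whatever your scraper downloaded, the module name ScrapeUsage is made up, and the function is copied from post #1 so the example compiles on its own. Note that the tags have to match the page text exactly, including case.

```vbnet
Imports System
Imports System.Collections.Generic

Module ScrapeUsage
    ' Copy of the function from post #1.
    Private Function GetTagContents(ByVal Source As String, ByVal startTag As String, ByVal endTag As String) As List(Of String)
        Dim StringsFound As New List(Of String)
        Dim Index As Integer = Source.IndexOf(startTag) + startTag.Length

        While Index <> startTag.Length - 1
            StringsFound.Add(Source.Substring(Index, Source.IndexOf(endTag, Index) - Index))
            Index = Source.IndexOf(startTag, Index) + startTag.Length
        End While

        Return StringsFound
    End Function

    Sub Main()
        ' Stand-in for the downloaded page source.
        Dim html As String = "<ul><li>Apples</li><li>Pears</li></ul>"
        Dim items As List(Of String) = GetTagContents(html, "<li>", "</li>")

        ' Check .Count first in case nothing was found.
        If items.Count = 0 Then
            Console.WriteLine("No matches")
        Else
            For Each item As String In items
                Console.WriteLine(item)   ' Apples, then Pears
            Next
        End If
    End Sub
End Module
```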
