
Thread: splitting strings

  1. #1

    Thread Starter
    Lively Member
    Join Date
    Feb 2003
    Posts
    92

    splitting strings

    I need to split HTML code. I have the code as a string, and I need to get all the information between "text4" and the next "text4". There are several strings to extract, so I need to go through the complete page. How do I do this?

  2. #2
    Conquistador
    Join Date
    Dec 1999
    Location
    Australia
    Posts
    4,527
    I'm sure something very similar to this has been posted before; have a look.

    If you can't find it, I'll try to help.

  3. #3

    Thread Starter
    Lively Member
    Join Date
    Feb 2003
    Posts
    92

    I have been looking

    I haven't come up with anything yet. I am trying this code, but I'm not sure how to use it properly.

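    Private Sub Command2_Click()
        Dim stemp As String, tmp As String
        Dim pos As Long

        stemp = text1.Text

        ' Skip past the first "text4" marker
        pos = InStr(stemp, "text4")
        If pos = 0 Then Exit Sub
        tmp = Mid$(stemp, pos + Len("text4"))

        ' Keep everything up to the next "text4" marker
        pos = InStr(tmp, "text4")
        If pos > 0 Then box1.Text = Left$(tmp, pos - 1)
    End Sub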

  4. #4
    INXSIVE Bruce Fox's Avatar
    Join Date
    Sep 2001
    Location
    Melbourne, Australia
    Posts
    7,429
    Post an extract of the text you're trying to parse.

  5. #5

    Thread Starter
    Lively Member
    Join Date
    Feb 2003
    Posts
    92

    Here is an extract of the text

    I need to get the "NEEDED INFO" out.

    All of the information I need for each record is between one "text4" and the next. Each field needed within the record starts with "text5".




    <td width="20" valign="top"><span class="text4">5.
    </span></td><td valign="top"><span class="text5"><b>NEEDED INFO(COMPANY)</b></span></td><td><img src="../images/pixel.gif" width="1" height="1"></td><td align="right" rowspan="3">
    <table border="0" cellpadding="0" cellspacing="0"><tr><td><A href="map.asp?SEARCH_AFFILIATE_DATA.x=0&amp;DBAFFILIATEID=%7B3EA51E13%2DC923%2D4FA5%2DBCEB%2DF693C623D24F%7D&amp;INNERCODE=072595A101A&amp;language=En" title="Street Level Map"><img width="25" height="22" border="0" src="../images/directorymap.gif" alt="Street Level Map"></A></td><td><A href="iti.asp?Search_iti.x=0&amp;DBAFFILIATEID=%7B3EA51E13%2DC923%2D4FA5%2DBCEB%2DF693C623D24F%7D&amp;INNERCODE=072595A101A&amp;ITI_START_COUNTRYCODE=US&amp;ITI_START_CITYNAME=COLUMBUS&amp;ITI_START_ZIPCODE=39702&amp;ITI_START_STATE=MS&amp;ITI_START_ADDRESS=&amp;SESSIONID={1F5033B2-C668-43E4-A89B-725ADC1FD992}"><img width="25" height="22" border="0" src="../images/directorydriveme.gif"></A></td></tr></table></td></tr><tr><td width="20"><img src="../images/pixel.gif" width="10" height="1"></td><td colspan="2"><span class="text5">NEEDED INFO(ADDRESS1)<br>NEEDED INFO(ADDRESS2)</span></td></tr><tr><td width="20"><img src="../images/pixel.gif" width="10" height="1"></td><td colspan="2"><span class="text5">NEEDED INFO (CITY<STATE<ZIP) </span></td></tr><tr><td width="20"><img src="../images/pixel.gif" width="10" height="1"></td><td colspan="3"><span class="text5">
    NEEDEDINFO (PHONE AND FAX)</span></td></tr><tr><td width="20"><img src="../images/pixel.gif" width="10" height="1"></td><td colspan="3"><span class="text5"><A target="_blank" href="http://www.caseih.com/DEALERS/johnsonimp">http://www.caseih.com/DEALERS/johnsonimp</A></span></td></tr><tr><td colspan="4"><img src="../images/pixel.gif" width="1" height="10"></td></tr></table></td><td><img src="../images/pixel.gif" width="8" height="10"></td><td class="bordersearch"><img src="../images/pixel.gif" width="1" height="1"></td></tr><tr><td class="bordersearch"><img src="../images/pixel.gif" width="1" height="1"></td><td colspan="3" class="bordersearch"><img src="../images/pixel.gif" width="1" height="1"></td><td class="bordersearch"><img src="../images/pixel.gif" width="1" height="1"></td></tr><tr><td class="bordersearch"><img src="../images/pixel.gif" width="1" height="1"></td><td><img src="../images/pixel.gif" width="8" height="10"></td><td valign="top"><table width="100%" border="0" cellpadding="0" cellspacing="0"><tr><td width="20" valign="top"><span class="text4">6.

  6. #6
    PowerPoster
    Join Date
    Nov 2002
    Location
    Manila
    Posts
    7,629
    So what should be the final output? You need to eliminate some tags and the info nested in them?

  7. #7
    New Member
    Join Date
    Sep 2003
    Location
    Berkhamsted, England
    Posts
    4
    So you want almost all of the html, starting with ">5. and ending with ">6. ?

    That doesn't seem right to me...

  8. #8

    Thread Starter
    Lively Member
    Join Date
    Feb 2003
    Posts
    92
    If you look, all of the needed info is between "text4" and the next "text4".

    This is a contact record, so between one text4 and the next there are several items that start with text5. I need to pull the name, address, city, state, zip, email and website, all of which start with text5 (within text4).

  9. #9
    PowerPoster
    Join Date
    Nov 2002
    Location
    Manila
    Posts
    7,629
    I have this code lying around... what it does is remove all HTML tags leaving the text.

    VB Code:
    Private Sub Command1_Click()

       Dim strWorking As String
       Dim strOutput As String
       Dim lngPosLessThan As Long
       Dim lngPosGreaterThan As Long

       strWorking = Text1.Text
       Do While Len(strWorking) > 0
          If Left$(strWorking, 1) = "<" Then
             'Drop the tag; stop if it is never closed, otherwise the loop would not end
             lngPosGreaterThan = InStr(1, strWorking, ">")
             If lngPosGreaterThan = 0 Then Exit Do
             strWorking = Mid$(strWorking, lngPosGreaterThan + 1)
          Else
             lngPosLessThan = InStr(1, strWorking, "<")
             If lngPosLessThan > 0 Then
                'Move non-tag string to strOutput
                strOutput = strOutput & Left$(strWorking, lngPosLessThan - 1)
                strWorking = Mid$(strWorking, lngPosLessThan)
             Else   'no other tag in string
                strOutput = strOutput & strWorking
                strWorking = ""
             End If
          End If
       Loop
       Text2.Text = strOutput
    End Sub

    In case you're willing to enhance it to fit your needs.

  10. #10
    Conquistador
    Join Date
    Dec 1999
    Location
    Australia
    Posts
    4,527
    This was my method for stripping HTML tags, may also be of some use

    VB Code:
    Function RemoveHTML(strHTML As String) As String
        Dim tagOpen As Long, tagClose As Long
        Do
            tagOpen = InStr(1, strHTML, "<")
            tagClose = InStr(1, strHTML, ">")
            strHTML = Replace(strHTML, Mid(strHTML, tagOpen, tagClose - tagOpen + 1), "")
            Debug.Print strHTML
        Loop Until InStr(1, strHTML, "<") = 0
        RemoveHTML = strHTML
    End Function

    If it doesn't help at all, I'll write some custom code for you ;/

  11. #11
    Hyperactive Member
    Join Date
    Nov 2002
    Location
    india
    Posts
    418
    hi,

    if each record is separated by text4 then

    use this

    split(entiretext,"text4")

  12. #12
    Frenzied Member Spajeoly's Avatar
    Join Date
    Mar 2003
    Location
    Utah
    Posts
    1,068
    Originally posted by da_silvy
    This was my method for stripping HTML tags, may also be of some use

    VB Code:
    Function RemoveHTML(strHTML As String) As String
        Dim tagOpen As Long, tagClose As Long
        Do
            tagOpen = InStr(1, strHTML, "<")
            tagClose = InStr(1, strHTML, ">")
            strHTML = Replace(strHTML, Mid(strHTML, tagOpen, tagClose - tagOpen + 1), "")
            Debug.Print strHTML
        Loop Until InStr(1, strHTML, "<") = 0
        RemoveHTML = strHTML
    End Function

    If it doesn't help at all, I'll write some custom code for you ;/
    Can I make a small suggestion?

    VB Code:
    Function RemoveHTML(strHTML As String) As String
        Dim tagOpen As Long, tagClose As Long
        Do
            tagOpen = InStr(1, strHTML, "<")
            ' Start looking for ">" at the "<" that was just found
            tagClose = InStr(tagOpen, strHTML, ">")
            strHTML = Replace(strHTML, Mid(strHTML, tagOpen, tagClose - tagOpen + 1), "")
            Debug.Print strHTML
        Loop Until InStr(1, strHTML, "<") = 0
        RemoveHTML = strHTML
    End Function

    Wouldn't that tiny change stop it from finding a stray '>' that might come before tagOpen?

    Just a thought

  13. #13
    Conquistador
    Join Date
    Dec 1999
    Location
    Australia
    Posts
    4,527
    Far from perfect but it should do what you want, you'll need to tweak it yourself

    VB Code:
    Private Sub Form_Load()
        ' Long rather than Integer, so pages over 32,767 characters do not overflow
        Dim StartPos As Long, EndPos As Long
        Dim SearchText As String, IsRecord As Boolean
        Dim strText As String
        StartPos = 1

        SearchText = Text1

        SearchText = Replace(SearchText, vbCrLf, " ")
        ' Clean up double spaces ?
        While InStr(1, SearchText, "  ")
            SearchText = Replace(SearchText, "  ", " ")
        Wend

        While InStr(StartPos, SearchText, "<span class=""") <> 0
            StartPos = InStr(StartPos, SearchText, "<span class=""")
            EndPos = InStr(StartPos, SearchText, "</span>")

            ' A text4 span starts a new record; any other span is a field
            If Mid(SearchText, StartPos + Len("<span class="""), Len("text#")) = "text4" Then
                IsRecord = True
            Else
                IsRecord = False
            End If

            StartPos = StartPos + Len("<span class=""text#"">")
            'strText = Mid(SearchText, StartPos, EndPos - StartPos)
            strText = RemoveHTML(Trim(Mid(SearchText, StartPos, EndPos - StartPos)))

            ' Handle your stuff here
            If IsRecord Then
                List1.AddItem strText
            Else
                List1.AddItem vbTab & strText
            End If

            StartPos = EndPos
        Wend

        Text1 = SearchText

    End Sub


    Function RemoveHTML(strHTML As String) As String
        Dim tagOpen As Long, tagClose As Long
        Do
            ' Nothing to strip unless both brackets are present
            If InStr(1, strHTML, "<") = 0 Or InStr(1, strHTML, ">") = 0 Then Exit Do
            tagOpen = InStr(1, strHTML, "<")
            tagClose = InStr(1, strHTML, ">")
            strHTML = Replace(strHTML, Mid(strHTML, tagOpen, tagClose - tagOpen + 1), "")
            Debug.Print strHTML
        Loop Until InStr(1, strHTML, "<") = 0
        RemoveHTML = strHTML
    End Function

  14. #14
    Frenzied Member yrwyddfa's Avatar
    Join Date
    Aug 2001
    Location
    England
    Posts
    1,253
    Well-formed HTML (that is, XHTML) is actually XML. This means you should be able to use an XML parser to analyse the HTML file.

    (saves you writing awkward code)
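
    As a rough illustration of the XML route, the sketch below loads a small well-formed fragment with MSXML (late-bound through CreateObject, so no project reference is needed) and pulls out the text5 spans with an XPath query. The sample string and the Sub name are made up for the example. It only works when every tag is closed, so markup with bare <br> or <img> tags, like the extract posted earlier, would be rejected by loadXML.

    ```vb
    ' Sketch only: assumes MSXML is installed and the markup is well-formed XML.
    Sub ListText5Spans()
        Dim doc As Object, nodes As Object, node As Object
        Dim strXml As String

        ' Hypothetical sample; note that <br/> is closed
        strXml = "<table><tr><td><span class=""text4"">5.</span></td>" & _
                 "<td><span class=""text5"">COMPANY</span></td></tr>" & _
                 "<tr><td><span class=""text5"">ADDRESS1<br/>ADDRESS2</span></td></tr></table>"

        Set doc = CreateObject("MSXML2.DOMDocument")
        doc.async = False
        doc.setProperty "SelectionLanguage", "XPath"

        If Not doc.loadXML(strXml) Then
            ' Malformed markup ends up here
            Debug.Print "Parse error: " & doc.parseError.reason
            Exit Sub
        End If

        ' Every span whose class attribute is exactly "text5"
        Set nodes = doc.selectNodes("//span[@class='text5']")
        For Each node In nodes
            Debug.Print node.Text
        Next
    End Sub
    ```

    For pages that are not well-formed, the string-searching approach in the other posts is probably the more practical option.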

  15. #15
    Conquistador
    Join Date
    Dec 1999
    Location
    Australia
    Posts
    4,527
    Originally posted by Spajeoly
    Can I make a small suggestion?

    VB Code:
    Function RemoveHTML(strHTML As String) As String
        Dim tagOpen As Long, tagClose As Long
        Do
            tagOpen = InStr(1, strHTML, "<")
            ' Start looking for ">" at the "<" that was just found
            tagClose = InStr(tagOpen, strHTML, ">")
            strHTML = Replace(strHTML, Mid(strHTML, tagOpen, tagClose - tagOpen + 1), "")
            Debug.Print strHTML
        Loop Until InStr(1, strHTML, "<") = 0
        RemoveHTML = strHTML
    End Function

    Wouldn't that tiny change stop it from finding a stray '>' that might come before tagOpen?

    Just a thought
    Heh, it was a five-minute function posted as a reply in the CodeBank. Ideally that shouldn't actually happen: any literal ">" or "<" characters in the text would be written as HTML entities, i.e. &gt; etc.

    It's a tighter piece of code with that small adjustment though, gw.

    The code isn't perfect; it also fails if there's no < or > in there, for which there's a small fix in my last post ;p
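
    Putting the two fixes from this exchange together (start the ">" search at tagOpen, and stop when either bracket is missing), a variant might look like the sketch below. The name StripTags and the ByVal parameter are assumptions for illustration, not code from the posts above. It walks the string once instead of calling Replace in a loop, and it treats an unclosed "<" as ordinary text.

    ```vb
    ' Sketch only: StripTags is a hypothetical name, not code from this thread.
    Function StripTags(ByVal strHTML As String) As String
        Dim tagOpen As Long, tagClose As Long
        Dim pos As Long
        Dim strOut As String

        pos = 1
        Do
            tagOpen = InStr(pos, strHTML, "<")
            If tagOpen = 0 Then Exit Do            ' no more tags

            ' Search for ">" only after the "<", so a stray ">" earlier on is ignored
            tagClose = InStr(tagOpen, strHTML, ">")
            If tagClose = 0 Then Exit Do           ' unclosed "<": keep the rest as text

            ' Keep the text in front of the tag, then skip the tag itself
            strOut = strOut & Mid$(strHTML, pos, tagOpen - pos)
            pos = tagClose + 1
        Loop

        ' Append whatever follows the last complete tag
        StripTags = strOut & Mid$(strHTML, pos)
    End Function
    ```

    For example, StripTags("<b>NEEDED INFO</b> a > b") should give "NEEDED INFO a > b". Because the argument is passed ByVal, the caller's string is left untouched.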

  16. #16
    Fanatic Member JPicasso's Avatar
    Join Date
    Aug 2001
    Location
    Kalamazoo, MI
    Posts
    843
    Originally posted by jayakumar
    hi,

    if each record is separated by text4 then

    use this

    split(entiretext,"text4")

    This way is easiest.
    Merry Christmas

  17. #17
    Conquistador
    Join Date
    Dec 1999
    Location
    Australia
    Posts
    4,527
    It doesn't actually do what he wants, if you check the rest of the posts.

  18. #18
    Conquistador
    Join Date
    Dec 1999
    Location
    Australia
    Posts
    4,527
    I couldn't send this via PM so here it is

    VB Code:
    ' Long rather than Integer, so pages over 32,767 characters do not overflow
    Dim StartPos As Long, EndPos As Long, BRPos As Long
    Dim SearchText As String, IsRecord As Boolean, strBR() As String
    Dim FirstField As Boolean, DealerCount As Integer, DealerPrefix As String
    Dim strText As String
    Dim i As Long

    Dim strInput As String
    Open "c:\caseih.txt" For Input As #1

    ' Line Input reads whole lines; plain Input would split the HTML at every comma
    Do Until EOF(1)
        Line Input #1, strInput
        SearchText = SearchText & strInput & vbCrLf
    Loop

    Close #1

    StartPos = 1
    DealerCount = 0

    ' Skip everything before the dealer list
    SearchText = Mid(SearchText, InStr(1, SearchText, "../images/powered.gif"))
    SearchText = Replace(SearchText, vbCrLf, "")

    While InStr(StartPos, SearchText, "<span class=""") <> 0
        StartPos = InStr(StartPos, SearchText, "<span class=""")
        EndPos = InStr(StartPos, SearchText, "</span>")

        ' The span right after a text4 span is the first field of a dealer
        FirstField = False
        If IsRecord Then FirstField = True
        If Mid(SearchText, StartPos + Len("<span class="""), Len("text#")) = "text4" Then
            IsRecord = True
        Else
            IsRecord = False
        End If

        StartPos = StartPos + Len("<span class=""text#"">")
        strText = Mid(SearchText, StartPos, EndPos - StartPos)
        'strText = RemoveHTML(Trim(Mid(SearchText, StartPos, EndPos - StartPos)))

        strText = Replace(strText, "<b>", "")
        strText = Replace(strText, "</b>", "")

        ' Handle your stuff here
        If Not IsRecord Then
            DealerPrefix = ""
            If FirstField Then
                DealerCount = DealerCount + 1
                DealerPrefix = CStr(DealerCount) & "."
            End If

            ' Skip fields that contain a link
            If InStr(1, LCase(strText), "<a ") = 0 Then
                BRPos = InStr(1, strText, "<br>")
                If BRPos <> 0 Then
                    ' One list entry per <br>-separated line
                    strBR = Split(strText, "<br>")
                    For i = 0 To UBound(strBR)
                        If Len(Trim(strBR(i))) <> 0 Then
                            List1.AddItem DealerPrefix & vbTab & Trim(strBR(i))
                            DealerPrefix = ""
                        End If
                    Next
                Else
                    List1.AddItem DealerPrefix & vbTab & strText
                End If
            End If
        End If
        StartPos = EndPos
    Wend

    I saved the website to a text file and then used that. It parses it all; you just need to decide what you want to do with the information.

  19. #19

    Thread Starter
    Lively Member
    Join Date
    Feb 2003
    Posts
    92
    Hey,
    it looks like it will work, but on this line I get an error.
    VB Code:
    SearchText = Mid(SearchText, InStr(1, SearchText, "../images/powered.gif"))

    The error says "Invalid procedure call or argument" (run-time error 5).

  20. #20
    Conquistador
    Join Date
    Dec 1999
    Location
    Australia
    Posts
    4,527
    I was parsing the entire website ;o

    There are occurrences of spans using that class before the dealer information

    That "powered.gif" is the powered by map blah blah, and the dealers follow...

    You can just remove that line, the code's no different (really) from what I posted before.
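
    Run-time error 5 on that line is what Mid raises when its start argument is 0, which is what InStr returns when the marker is not in the string. A minimal guard, assuming you want to keep the whole text when the marker is missing (TrimToMarker is a hypothetical helper name, not part of the posted code):

    ```vb
    ' Sketch only: TrimToMarker is a hypothetical helper, not part of the posted code.
    Function TrimToMarker(ByVal strText As String, ByVal strMarker As String) As String
        Dim pos As Long

        pos = InStr(1, strText, strMarker)
        If pos > 0 Then
            ' Marker found: drop everything in front of it
            TrimToMarker = Mid$(strText, pos)
        Else
            ' Marker missing: Mid$(strText, 0) would raise run-time error 5,
            ' so return the text unchanged instead
            TrimToMarker = strText
        End If
    End Function
    ```

    It could then stand in for the failing line, e.g. SearchText = TrimToMarker(SearchText, "../images/powered.gif").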

  21. #21

    Thread Starter
    Lively Member
    Join Date
    Feb 2003
    Posts
    92

    I guess I'm still doing something wrong.

    The program goes straight from
    VB Code:
        While InStr(StartPos, SearchText, "<span class=""") <> 0
    to
    VB Code:
    End Sub

    Here is the complete code.

    VB Code:
    Private Sub Command1_Click()
        Dim StartPos As Integer, EndPos As Integer, BRPos As Integer
        Dim SearchText As String, IsRecord As Boolean, strBR() As String
        Dim FirstField As Boolean, DealerCount As Integer, DealerPrefix As String
        Dim strText As String
        Dim strInput As String
        Open "c:\caseih.txt" For Input As #1

        Do Until EOF(1)
            Input #1, strInput
            SearchText = SearchText & strInput & vbCrLf
        Loop
        Close #1

        StartPos = 1
        DealerCount = 0

        'SearchText = Mid(SearchText, InStr(1, SearchText, "../images/powered.gif"))
        SearchText = Replace(SearchText, vbCrLf, "")

        While InStr(StartPos, SearchText, "<span class=""") <> 0
            StartPos = InStr(StartPos, SearchText, "<span class=""")
            EndPos = InStr(StartPos, SearchText, "</span>")

            FirstField = False
            If IsRecord Then FirstField = True
            If Mid(SearchText, StartPos + Len("<span class="""), Len("text#")) = "text4" Then
                IsRecord = True
            Else
                IsRecord = False
            End If
            StartPos = StartPos + Len("<span class=""text#"">")
            strText = Mid(SearchText, StartPos, EndPos - StartPos)
            'strText = RemoveHTML(Trim(Mid(SearchText, StartPos, EndPos - StartPos)))
            strText = Replace(strText, "<b>", "")
            strText = Replace(strText, "</b>", "")

            ' Handle your stuff here
            If Not IsRecord Then
                DealerPrefix = ""
                If FirstField Then
                    DealerCount = DealerCount + 1
                    DealerPrefix = CStr(DealerCount) & "."
                End If
                If InStr(1, LCase(strText), "<a ") = 0 Then
                    BRPos = InStr(1, strText, "<br>")
                    If BRPos <> 0 Then
                        strBR = Split(strText, "<br>")
                        For i = 0 To UBound(strBR)
                            If Len(Trim(strBR(i))) <> 0 Then
                                List1.AddItem DealerPrefix & vbTab & Trim(strBR(i))
                                DealerPrefix = ""
                            End If
                        Next
                    Else
                        List1.AddItem DealerPrefix & vbTab & strText
                    End If
                End If
            End If
            StartPos = EndPos
        Wend

    End Sub

  22. #22
    Conquistador
    Join Date
    Dec 1999
    Location
    Australia
    Posts
    4,527
    Have you set the search text or anything?

    You can't use my code "as is"; you have to do some work ._.

  23. #23

    Thread Starter
    Lively Member
    Join Date
    Feb 2003
    Posts
    92

    SearchText

    What do you mean by setting the SearchText? It is the string of HTML.
    I'm sorry, but I don't understand what to do.

    Thanks for your help.

  24. #24
    Conquistador
    Join Date
    Dec 1999
    Location
    Australia
    Posts
    4,527
    Well, you have to adjust the code for what you are doing.

    Where are you getting the text that you want to parse from?

  25. #25
    Fanatic Member JPicasso's Avatar
    Join Date
    Aug 2001
    Location
    Kalamazoo, MI
    Posts
    843
    Split([entireHTMLstring], "text4") gives you an array containing
    the text between all the text4's.

    If you then run Split(text4array(n), "text5") on each element (1, 2, 3, 4, etc.), it will give you the text between all the text5's.

    Of course, then you have to clean it up, because I don't think you want half-tags in it.

    Use the Replace function.
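
    A rough sketch of that two-level Split, assuming the markup uses the quoted form <span class="text4"> shown in the extract earlier in the thread. ListFields and its variable names are hypothetical. Element 0 of each array is whatever came before the first marker, so both loops start at 1.

    ```vb
    ' Sketch only: ListFields and its variable names are hypothetical.
    Sub ListFields(ByVal strHTML As String)
        Dim records() As String, fields() As String
        Dim i As Long, j As Long
        Dim startPos As Long, endPos As Long
        Dim strField As String

        ' One element per record; element 0 is the text before the first "text4"
        records = Split(strHTML, "text4")
        For i = 1 To UBound(records)
            ' One element per field; element 0 holds the record number
            fields = Split(records(i), "text5")
            For j = 1 To UBound(fields)
                strField = fields(j)
                ' Each piece begins with the tail of the opening tag, up to ">"
                startPos = InStr(1, strField, ">")
                ' The field text ends at the closing </span>
                endPos = InStr(1, strField, "</span>")
                If startPos > 0 And endPos > startPos Then
                    strField = Mid$(strField, startPos + 1, endPos - startPos - 1)
                    ' Clean up the remaining tags with Replace
                    strField = Replace(strField, "<b>", "")
                    strField = Replace(strField, "</b>", "")
                    strField = Replace(strField, "<br>", ", ")
                    Debug.Print i & vbTab & Trim$(strField)
                End If
            Next j
        Next i
    End Sub
    ```

    Two caveats: the split also fires if "text4" or "text5" appears anywhere else on the page, and fields that contain other tags (such as the website link) still come out with those tags in them.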
    Merry Christmas

  26. #26

    Thread Starter
    Lively Member
    Join Date
    Feb 2003
    Posts
    92

    Here is the text I am trying to parse.

    da_silvy,
    here is the text I am trying to parse.

    Any help is appreciated.
    Attached Files

  27. #27
    Conquistador
    Join Date
    Dec 1999
    Location
    Australia
    Posts
    4,527
    That website's a joke; it's not HTML Transitional at all.

    VB Code:
    ' Long holds any string position; Double is not needed here
    Dim StartPos As Long, EndPos As Long, BRPos As Long
    Dim SearchText As String, LowerText As String
    Dim IsRecord As Boolean, strBR() As String
    Dim FirstField As Boolean, DealerCount As Integer, DealerPrefix As String
    Dim strText As String, strInput As String
    Dim i As Long

    ' Line Input reads whole lines; plain Input would split the HTML at every comma
    Open "c:\caseih.htm" For Input As #1
    Do Until EOF(1)
        Line Input #1, strInput
        SearchText = SearchText & strInput & vbCrLf
    Loop
    Close #1

    StartPos = 1
    DealerCount = 0

    ' Skip everything before the dealer list, if the marker is present
    If InStr(1, SearchText, "/powered.gif") > 0 Then
        SearchText = Mid(SearchText, InStr(1, SearchText, "/powered.gif"))
    End If
    SearchText = Replace(SearchText, vbCrLf, "")
    LowerText = LCase(SearchText)   ' lower-case copy for case-insensitive searches

    ' Matches unquoted class attributes, e.g. <span class=text4>
    Do While InStr(StartPos, LowerText, "<span class=") <> 0
        StartPos = InStr(StartPos, LowerText, "<span class=")
        EndPos = InStr(StartPos, LowerText, "</span>")
        If EndPos = 0 Then Exit Do   ' unclosed span: stop rather than fail in Mid

        ' The span right after a text4 span is the first field of a dealer
        FirstField = False
        If IsRecord Then FirstField = True
        If Mid(LowerText, StartPos + Len("<span class="), Len("text#")) = "text4" Then
            IsRecord = True
        Else
            IsRecord = False
        End If

        StartPos = StartPos + Len("<span class=text#>")
        strText = Mid(SearchText, StartPos, EndPos - StartPos)
        'strText = RemoveHTML(Trim(Mid(SearchText, StartPos, EndPos - StartPos)))

        strText = Replace(strText, "<b>", "")
        strText = Replace(strText, "</b>", "")

        ' Handle your stuff here
        If Not IsRecord Then
            DealerPrefix = ""
            If FirstField Then
                DealerCount = DealerCount + 1
                DealerPrefix = CStr(DealerCount) & "."
            End If

            ' Skip fields that contain a link
            If InStr(1, LCase(strText), "<a ") = 0 Then
                BRPos = InStr(1, strText, "<br>")
                If BRPos <> 0 Then
                    ' One list entry per <br>-separated line
                    strBR = Split(strText, "<br>")
                    For i = 0 To UBound(strBR)
                        If Len(Trim(strBR(i))) <> 0 Then
                            List1.AddItem DealerPrefix & vbTab & Trim(strBR(i))
                            DealerPrefix = ""
                        End If
                    Next
                Else
                    List1.AddItem DealerPrefix & vbTab & strText
                End If
            End If
        End If
        StartPos = EndPos
    Loop

    You'll need to finish the clean-up yourself.
