Sep 17th, 2003, 08:17 PM
#1
Thread Starter
Lively Member
Splitting strings
I need to split HTML code. I have the code as a string, and I need to get all the information between "text4" and "text4". There are several strings to extract, so I need to go through the complete page. How do I do this?
Sep 17th, 2003, 08:38 PM
#2
Conquistador
I'm sure something incredibly similar to this has been posted before; have a look.
If you can't find it, I'll try to help.
Sep 17th, 2003, 08:48 PM
#3
Thread Starter
Lively Member
I have been looking
I haven't come up with anything yet. I am trying this code, but I'm not sure how to use it properly.
Private Sub Command2_Click()
stemp = text1.Text
tmp = Mid$(stemp, InStr(stemp, "text4") + 1)
box1.Text = Left(tmp, InStr(tmp, "text4") - 1)
End Sub
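A note on the snippet above: the "+ 1" only steps one character past the start of the first "text4", so the result would still begin with "ext4". Stepping past the whole marker with Len, and checking InStr for 0 before calling Mid or Left, avoids that. A minimal sketch, assuming a hypothetical helper called TextBetween (not part of the original code) and that printing to the Immediate window is good enough for a first test:
VB Code:
' Hypothetical helper: returns the text between the first Marker found at or
' after StartAt and the next Marker. StartAt is moved to the closing marker,
' or set to 0 when no complete pair is left.
Private Function TextBetween(ByVal Source As String, ByVal Marker As String, ByRef StartAt As Long) As String
    Dim p1 As Long, p2 As Long
    TextBetween = ""
    p1 = InStr(StartAt, Source, Marker)
    If p1 = 0 Then
        StartAt = 0
        Exit Function
    End If
    p1 = p1 + Len(Marker)            ' step past the whole opening marker
    p2 = InStr(p1, Source, Marker)
    If p2 = 0 Then
        StartAt = 0
        Exit Function
    End If
    TextBetween = Mid$(Source, p1, p2 - p1)
    StartAt = p2                     ' the closing marker is also the next opening marker
End Function

Private Sub Command2_Click()
    Dim pos As Long, chunk As String
    pos = 1
    Do
        chunk = TextBetween(text1.Text, "text4", pos)
        If pos = 0 Then Exit Do
        Debug.Print chunk
    Loop
End Sub
Note that whatever follows the last "text4" on the page has no closing marker, so this sketch drops it; that case would need handling separately.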
Sep 17th, 2003, 09:07 PM
#4
Post an extract of the text you're trying to parse.
Sep 18th, 2003, 06:41 AM
#5
Thread Starter
Lively Member
Here is an extract of the text
I need to get the "NEEDED INFO" out.
All of the information I need for each record is between one "text4" and the next. Each field needed within the record starts with "text5".
<td width="20" valign="top"><span class="text4">5.
</span></td><td valign="top"><span class="text5"><b>NEEDED INFO(COMPANY)</b></span></td><td><img src="../images/pixel.gif" width="1" height="1"></td><td align="right" rowspan="3">
<table border="0" cellpadding="0" cellspacing="0"><tr><td><A href="map.asp?SEARCH_AFFILIATE_DATA.x=0&DBAFFILIATEID=%7B3EA51E13%2DC923%2D4FA5%2DBCEB%2DF693C623D24F%7D&INNERCODE=072595A101A&language=En" title="Street Level Map"><img width="25" height="22" border="0" src="../images/directorymap.gif" alt="Street Level Map"></A></td><td><A href="iti.asp?Search_iti.x=0&DBAFFILIATEID=%7B3EA51E13%2DC923%2D4FA5%2DBCEB%2DF693C623D24F%7D&amp;INNERCODE=072595A101A&ITI_START_COUNTRYCODE=US&ITI_START_CITYNAME=COLUMBUS&ITI_START_ZIPCODE=39702&ITI_START_STATE=MS&ITI_START_ADDRESS=&SESSIONID={1F5033B2-C668-43E4-A89B-725ADC1FD992}"><img width="25" height="22" border="0" src="../images/directorydriveme.gif"></A></td></tr></table></td></tr><tr><td width="20"><img src="../images/pixel.gif" width="10" height="1"></td><td colspan="2"><span class="text5">NEEDED INFO(ADDRESS1)<br>NEEDED INFO(ADDRESS2)</span></td></tr><tr><td width="20"><img src="../images/pixel.gif" width="10" height="1"></td><td colspan="2"><span class="text5">NEEDED INFO (CITY<STATE<ZIP) </span></td></tr><tr><td width="20"><img src="../images/pixel.gif" width="10" height="1"></td><td colspan="3"><span class="text5">
NEEDEDINFO (PHONE AND FAX)</span></td></tr><tr><td width="20"><img src="../images/pixel.gif" width="10" height="1"></td><td colspan="3"><span class="text5"><A target="_blank" href="http://www.caseih.com/DEALERS/johnsonimp">http://www.caseih.com/DEALERS/johnsonimp </A></span></td></tr><tr><td colspan="4"><img src="../images/pixel.gif" width="1" height="10"></td></tr></table></td><td><img src="../images/pixel.gif" width="8" height="10"></td><td class="bordersearch"><img src="../images/pixel.gif" width="1" height="1"></td></tr><tr><td class="bordersearch"><img src="../images/pixel.gif" width="1" height="1"></td><td colspan="3" class="bordersearch"><img src="../images/pixel.gif" width="1" height="1"></td><td class="bordersearch"><img src="../images/pixel.gif" width="1" height="1"></td></tr><tr><td class="bordersearch"><img src="../images/pixel.gif" width="1" height="1"></td><td><img src="../images/pixel.gif" width="8" height="10"></td><td valign="top"><table width="100%" border="0" cellpadding="0" cellspacing="0"><tr><td width="20" valign="top"><span class="text4">6.
Sep 18th, 2003, 10:52 AM
#6
So what should be the final output? You need to eliminate some tags and the info nested in them?
Sep 18th, 2003, 10:52 AM
#7
New Member
So you want almost all of the html, starting with ">5. and ending with ">6. ?
That doesn't seem right to me...
Sep 18th, 2003, 04:27 PM
#8
Thread Starter
Lively Member
If you look, all of the needed info is between "text4" and "text4".
This is a contact record, so between one text4 and the next there are several items that start with text5. I need to pull the name, address, city, state, zip, email and website, all of which start with text5 (within text4).
Sep 18th, 2003, 06:58 PM
#9
I have this code lying around... what it does is remove all HTML tags leaving the text.
VB Code:
Private Sub Command1_Click()
    Dim strWorking As String
    Dim strOutput As String
    Dim lngPosLessThan As Long
    Dim lngPosGreaterThan As Long
    strWorking = Text1.Text
    Do While Len(strWorking) > 0
        If Left(strWorking, 1) = "<" Then
            'Skip the tag; stop if it is never closed, or the loop would not end
            lngPosGreaterThan = InStr(1, strWorking, ">")
            If lngPosGreaterThan = 0 Then Exit Do
            strWorking = Mid(strWorking, lngPosGreaterThan + 1)
        Else
            lngPosLessThan = InStr(1, strWorking, "<")
            If lngPosLessThan > 0 Then
                'Move non-Tag string to strOutput
                strOutput = strOutput & Left(strWorking, lngPosLessThan - 1)
                strWorking = Mid(strWorking, lngPosLessThan)
            Else 'no other tag in string.
                strOutput = strOutput & strWorking
                strWorking = ""
            End If
        End If
    Loop
    Text2.Text = strOutput
End Sub
In case you're willing to enhance it to fit your needs.
Sep 19th, 2003, 12:30 AM
#10
Conquistador
This was my method for stripping HTML tags, may also be of some use
VB Code:
Function RemoveHTML(strHTML As String) As String
Do
tagOpen = InStr(1, strHTML, "<")
tagClose = InStr(1, strHTML, ">")
strHTML = Replace(strHTML, Mid(strHTML, tagOpen, tagClose - tagOpen + 1), "")
Debug.Print strHTML
Loop Until InStr(1, strHTML, "<") = 0
RemoveHTML = strHTML
End Function
If it doesn't help at all, I'll write some custom code for you ;/
Sep 19th, 2003, 07:26 AM
#11
Hyperactive Member
hi,
if each record is separated by text4 then
use this
split(entiretext,"text4")
Sep 19th, 2003, 07:39 AM
#12
Frenzied Member
Originally posted by da_silvy
This was my method for stripping HTML tags, may also be of some use
VB Code:
Function RemoveHTML(strHTML As String) As String
Do
tagOpen = InStr(1, strHTML, "<")
tagClose = InStr(1, strHTML, ">")
strHTML = Replace(strHTML, Mid(strHTML, tagOpen, tagClose - tagOpen + 1), "")
Debug.Print strHTML
Loop Until InStr(1, strHTML, "<") = 0
RemoveHTML = strHTML
End Function
If it doesn't help at all, I'll write some custom code for you ;/
Can I make a small suggestion?
VB Code:
Function RemoveHTML(strHTML As String) As String
Do
tagOpen = InStr(1, strHTML, "<")
tagClose = InStr(tagOpen, strHTML, ">")
strHTML = Replace(strHTML, Mid(strHTML, tagOpen, tagClose - tagOpen + 1), "")
Debug.Print strHTML
Loop Until InStr(1, strHTML, "<") = 0
RemoveHTML = strHTML
End Function
Would that tiny change not help it not find a stray '>' that might be before the tagOpen?
Just a thought
Sep 19th, 2003, 07:56 AM
#13
Conquistador
Far from perfect but it should do what you want, you'll need to tweak it yourself
VB Code:
Private Sub Form_Load()
Dim StartPos As Long, EndPos As Long ' Long, since positions in a big page can exceed 32,767
Dim SearchText As String, IsRecord As Boolean
Dim strText As String
StartPos = 1
SearchText = Text1
SearchText = Replace(SearchText, vbCrLf, " ")
' Clean up double spaces
While InStr(1, SearchText, "  ") <> 0
SearchText = Replace(SearchText, "  ", " ")
Wend
While InStr(StartPos, SearchText, "<span class=""") <> 0
StartPos = InStr(StartPos, SearchText, "<span class=""")
EndPos = InStr(StartPos, SearchText, "</span>")
If Mid(SearchText, StartPos + Len("<span class="""), Len("text#")) = "text4" Then
IsRecord = True
Else
IsRecord = False
End If
StartPos = StartPos + Len("<span class=""text#"">")
'strText = Mid(SearchText, StartPos, EndPos - StartPos)
strText = RemoveHTML(Trim(Mid(SearchText, StartPos, EndPos - StartPos)))
' Handle your stuff here
If IsRecord Then
List1.AddItem strText
Else
List1.AddItem vbTab & strText
End If
StartPos = EndPos
Wend
Text1 = SearchText
End Sub
Function RemoveHTML(strHTML As String) As String
Do
If InStr(1, strHTML, "<") = 0 Or InStr(1, strHTML, ">") = 0 Then Exit Do
tagOpen = InStr(1, strHTML, "<")
tagClose = InStr(1, strHTML, ">")
strHTML = Replace(strHTML, Mid(strHTML, tagOpen, tagClose - tagOpen + 1), "")
Debug.Print strHTML
Loop Until InStr(1, strHTML, "<") = 0
RemoveHTML = strHTML
End Function
Sep 19th, 2003, 10:19 AM
#14
Frenzied Member
Well-formed HTML written as XHTML is actually XML. This means you should be able to use an XML parser to analyse the HTML file, provided the page really is well-formed.
(That saves you writing awkward code.)
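For anyone who wants to try the parser route, here is a rough sketch using MSXML through late binding. It assumes MSXML is installed on the machine, and ListText5Spans is just an illustrative name. It only works if the markup really is well-formed; the extract posted earlier has unclosed img tags and bare & characters, so I'd expect loadXML to reject it as it stands.
VB Code:
' Sketch only: assumes MSXML is installed and strXHTML holds well-formed markup.
Private Sub ListText5Spans(ByVal strXHTML As String)
    Dim doc As Object, nodes As Object, node As Object
    Set doc = CreateObject("MSXML2.DOMDocument")
    doc.async = False                       ' load synchronously
    If Not doc.loadXML(strXHTML) Then
        ' loadXML returns False when the text is not well-formed XML
        Debug.Print "Not well-formed: " & doc.parseError.reason
        Exit Sub
    End If
    ' Every <span class="text5"> anywhere in the document
    Set nodes = doc.selectNodes("//span[@class='text5']")
    For Each node In nodes
        Debug.Print Trim$(node.Text)        ' text content with inner tags removed
    Next
End Sub
If the page fails to load, the string-based approaches in this thread are probably the more practical option.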
Sep 19th, 2003, 10:38 AM
#15
Conquistador
Originally posted by Spajeoly
Can I make a small suggestion?
VB Code:
Function RemoveHTML(strHTML As String) As String
Do
tagOpen = InStr(1, strHTML, "<")
tagClose = InStr(tagOpen, strHTML, ">")
strHTML = Replace(strHTML, Mid(strHTML, tagOpen, tagClose - tagOpen + 1), "")
Debug.Print strHTML
Loop Until InStr(1, strHTML, "<") = 0
RemoveHTML = strHTML
End Function
Would that tiny change not help it not find a stray '>' that might be before the tagOpen?
Just a thought
Heh, it was a 5-minute function written in reply to a CodeBank post. Ideally that shouldn't actually happen: any literal ">" and "<" characters would be written as HTML entities, i.e. &gt; etc.
It's a tighter piece of code with that small adjustment though, gw.
The code isn't perfect; it also fails if there's no < or > in there, for which there's a small fix in my last post ;p
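Putting both points from this exchange together (search for ">" starting from the "<", and bail out when either one is missing), a tag stripper could look roughly like the sketch below. StripTags is a made-up name so it doesn't clash with the RemoveHTML versions above, and it cuts out one tag per pass with Left$ and Mid$ rather than using Replace.
VB Code:
' Hypothetical variant of RemoveHTML, for illustration only.
Function StripTags(ByVal strHTML As String) As String
    Dim tagOpen As Long, tagClose As Long
    Do
        tagOpen = InStr(1, strHTML, "<")
        If tagOpen = 0 Then Exit Do             ' no tags left
        tagClose = InStr(tagOpen, strHTML, ">") ' only look after the "<"
        If tagClose = 0 Then Exit Do            ' unterminated tag: leave the rest alone
        ' Keep everything before the tag and everything after it
        strHTML = Left$(strHTML, tagOpen - 1) & Mid$(strHTML, tagClose + 1)
    Loop
    StripTags = strHTML
End Function
The argument is passed ByVal so the caller's string is left untouched.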
Sep 19th, 2003, 02:11 PM
#16
Fanatic Member
Originally posted by jayakumar
hi,
if each record is separated by text4 then
use this
split(entiretext,"text4")
This way is easiest.
Sep 19th, 2003, 10:27 PM
#17
Conquistador
It doesn't actually do what he wants, if you check the rest of the posts.
Sep 20th, 2003, 05:30 AM
#18
Conquistador
I couldn't send this via PM so here it is
VB Code:
Dim StartPos As Long, EndPos As Long, BRPos As Long
Dim SearchText As String, IsRecord As Boolean, strBR() As String
Dim FirstField As Boolean, DealerCount As Integer, DealerPrefix As String
Dim strText As String
Dim strInput As String
Open "c:\caseih.txt" For Input As #1
Do Until EOF(1)
Input #1, strInput
SearchText = SearchText & strInput & vbCrLf
Loop
Close #1
StartPos = 1
DealerCount = 0
SearchText = Mid(SearchText, InStr(1, SearchText, "../images/powered.gif"))
SearchText = Replace(SearchText, vbCrLf, "")
While InStr(StartPos, SearchText, "<span class=""") <> 0
StartPos = InStr(StartPos, SearchText, "<span class=""")
EndPos = InStr(StartPos, SearchText, "</span>")
FirstField = False
If IsRecord Then FirstField = True
If Mid(SearchText, StartPos + Len("<span class="""), Len("text#")) = "text4" Then
IsRecord = True
Else
IsRecord = False
End If
StartPos = StartPos + Len("<span class=""text#"">")
strText = Mid(SearchText, StartPos, EndPos - StartPos)
'strText = RemoveHTML(Trim(Mid(SearchText, StartPos, EndPos - StartPos)))
strText = Replace(strText, "<b>", "")
strText = Replace(strText, "</b>", "")
' Handle your stuff here
If Not IsRecord Then
DealerPrefix = ""
If FirstField Then
DealerCount = DealerCount + 1
DealerPrefix = CStr(DealerCount) & "."
End If
If InStr(1, LCase(strText), "<a ") = 0 Then
BRPos = InStr(1, strText, "<br>")
If BRPos <> 0 Then
strBR = Split(strText, "<br>")
For i = 0 To UBound(strBR)
If Len(Trim(strBR(i))) <> 0 Then
List1.AddItem DealerPrefix & vbTab & Trim(strBR(i))
DealerPrefix = ""
End If
Next
Else
List1.AddItem DealerPrefix & vbTab & strText
End If
End If
End If
StartPos = EndPos
Wend
I saved the website to a text file and then used that. It parses it all; you just need to decide what you want to do with the information.
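One thing to watch when loading the file: Input # parses each line as comma-delimited data, so a line containing commas arrives in pieces and the commas are lost (as far as I know it also treats double quotes specially). Line Input # reads a whole line as-is. Another option is to pull the entire file into a string in one go, sketched below with a hypothetical ReadWholeFile helper:
VB Code:
' Hypothetical helper: returns the complete contents of a file as one string.
Private Function ReadWholeFile(ByVal strPath As String) As String
    Dim f As Integer
    Dim strBuffer As String
    f = FreeFile                          ' next unused file number
    Open strPath For Binary Access Read As #f
    strBuffer = Space$(LOF(f))            ' size the buffer to the file length
    Get #f, , strBuffer                   ' fill the buffer in a single read
    Close #f
    ReadWholeFile = strBuffer
End Function
Usage would be something like SearchText = ReadWholeFile("c:\caseih.txt"), with the path being whatever file the page was saved to.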
Sep 20th, 2003, 08:48 AM
#19
Thread Starter
Lively Member
Hey
Looks like it will work, but on this line I get an error.
VB Code:
SearchText = Mid(SearchText, InStr(1, SearchText, "../images/powered.gif"))
The error says "Invalid procedure call or argument", run-time error 5.
Sep 20th, 2003, 09:54 AM
#20
Conquistador
I was parsing the entire website ;o
There are occurrences of spans using that class before the dealer information.
That "powered.gif" is the "powered by" map image blah blah, and the dealers follow it...
You can just remove that line; the code's no different (really) from what I posted before.
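For reference, run-time error 5 is what Mid raises when its start argument is 0, and 0 is what InStr returns when the text isn't found. If the line is kept, a guard along these lines should avoid the error (lngMarker is just an illustrative variable name):
VB Code:
Dim lngMarker As Long
lngMarker = InStr(1, SearchText, "../images/powered.gif")
' Only trim the leading part of the page when the marker is actually present
If lngMarker > 0 Then
    SearchText = Mid(SearchText, lngMarker)
End If
When the marker is missing, SearchText is simply left as it was.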
Sep 20th, 2003, 11:30 AM
#21
Thread Starter
Lively Member
I guess I'm still doing something wrong.
The program goes from
VB Code:
While InStr(StartPos, SearchText, "<span class=""") <> 0
to
Here is the complete code.
VB Code:
Private Sub Command1_Click()
Dim StartPos As Integer, EndPos As Integer, BRPos As Integer
Dim SearchText As String, IsRecord As Boolean, strBR() As String
Dim FirstField As Boolean, DealerCount As Integer, DealerPrefix As String
Dim strText As String
Dim strInput As String
Open "c:\caseih.txt" For Input As #1
Do Until EOF(1)
Input #1, strInput
SearchText = SearchText & strInput & vbCrLf
Loop
Close #1
StartPos = 1
DealerCount = 0
'SearchText = Mid(SearchText, InStr(1, SearchText, "../images/powered.gif"))
SearchText = Replace(SearchText, vbCrLf, "")
While InStr(StartPos, SearchText, "<span class=""") <> 0
StartPos = InStr(StartPos, SearchText, "<span class=""")
EndPos = InStr(StartPos, SearchText, "</span>")
FirstField = False
If IsRecord Then FirstField = True
If Mid(SearchText, StartPos + Len("<span class="""), Len("text#")) = "text4" Then
IsRecord = True
Else
IsRecord = False
End If
StartPos = StartPos + Len("<span class=""text#"">")
strText = Mid(SearchText, StartPos, EndPos - StartPos)
' strText = RemoveHTML(Trim(Mid(SearchText, StartPos, EndPos - StartPos)))
strText = Replace(strText, "<b>", "")
strText = Replace(strText, "</b>", "")
' Handle your stuff here
If Not IsRecord Then
DealerPrefix = ""
If FirstField Then
DealerCount = DealerCount + 1
DealerPrefix = CStr(DealerCount) & "."
End If
If InStr(1, LCase(strText), "<a ") = 0 Then
BRPos = InStr(1, strText, "<br>")
If BRPos <> 0 Then
strBR = Split(strText, "<br>")
For i = 0 To UBound(strBR)
If Len(Trim(strBR(i))) <> 0 Then
List1.AddItem DealerPrefix & vbTab & Trim(strBR(i))
DealerPrefix = ""
End If
Next
Else
List1.AddItem DealerPrefix & vbTab & strText
End If
End If
End If
StartPos = EndPos
Wend
End Sub
Sep 20th, 2003, 11:37 AM
#22
Conquistador
Have you set the search text or anything?
You can't use my code "as is"; you have to do some work ._.
Sep 20th, 2003, 12:31 PM
#23
Thread Starter
Lively Member
searchtext
What do you mean by setting the search text? It is the string of HTML.
I'm sorry, but I don't understand what to do.
Thanks for your help.
Sep 20th, 2003, 01:26 PM
#24
Conquistador
Well, you have to adjust the code for what you are doing.
Where are you getting the text that you want to parse from?
Sep 22nd, 2003, 07:53 AM
#25
Fanatic Member
Split([entireHTMLstring], "text4") gives you an array containing
the text between all the text4's.
If you then run Split(text4array(i), "text5") on each element in turn, it will give you the text between all the text5's.
Of course, then you've got to clean it up, because I don't think you want half-tags in it.
Use the Replace function.
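A rough sketch of that two-level Split, assuming the whole page is already in a string. SplitRecords is an illustrative name, and the clean-up here trims each piece at the end of the opening tag and at the closing span rather than relying on Replace alone:
VB Code:
' Sketch only: prints each text5 field, prefixed with its record number.
Private Sub SplitRecords(ByVal strHTML As String)
    Dim records() As String, fields() As String
    Dim r As Long, f As Long, cut As Long
    Dim strField As String
    records = Split(strHTML, "text4")
    ' Element 0 is whatever came before the first "text4", so start at 1
    For r = 1 To UBound(records)
        fields = Split(records(r), "text5")
        ' Element 0 is the record number and markup before the first "text5"
        For f = 1 To UBound(fields)
            strField = fields(f)
            ' Each piece begins with the tail of the opening tag, so skip to the first ">"
            cut = InStr(strField, ">")
            If cut > 0 Then strField = Mid$(strField, cut + 1)
            ' Each piece also runs on past the field, so stop at the closing span
            cut = InStr(1, strField, "</span>", vbTextCompare)
            If cut > 0 Then strField = Left$(strField, cut - 1)
            Debug.Print r & vbTab & Trim$(strField)
        Next f
    Next r
End Sub
Inner tags such as <b>, <br> and <A ...> are still present in the output, so a Replace or one of the tag-stripping functions from earlier in the thread would be needed on top of this.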
Sep 22nd, 2003, 05:09 PM
#26
Thread Starter
Lively Member
da_silvy,
Here is the text I am trying to parse.
Any help is appreciated.
Attached Files
Sep 23rd, 2003, 03:34 AM
#27
Conquistador
That website's a joke; it's not HTML Transitional at all.
VB Code:
Dim StartPos As Long, EndPos As Long, BRPos As Long
Dim SearchText As String, IsRecord As Boolean, strBR() As String
Dim FirstField As Boolean, DealerCount As Integer, DealerPrefix As String
Dim strText As String
Dim strInput As String
Open "c:\caseih.htm" For Input As #1
Do Until EOF(1)
Input #1, strInput
SearchText = SearchText & strInput & vbCrLf
Loop
Close #1
StartPos = 1
DealerCount = 0
SearchText = Mid(SearchText, InStr(1, SearchText, "/powered.gif"))
SearchText = Replace(SearchText, vbCrLf, "")
While InStr(StartPos, LCase(SearchText), "<span class=") <> 0
StartPos = InStr(StartPos, LCase(SearchText), "<span class=")
EndPos = InStr(StartPos, LCase(SearchText), "</span>")
FirstField = False
If IsRecord Then FirstField = True
If Mid(LCase(SearchText), StartPos + Len("<span class="), Len("text#")) = "text4" Then
IsRecord = True
Else
IsRecord = False
End If
StartPos = StartPos + Len("<span class=text#>")
strText = Mid(SearchText, StartPos, EndPos - StartPos)
'strText = RemoveHTML(Trim(Mid(SearchText, StartPos, EndPos - StartPos)))
strText = Replace(strText, "<b>", "")
strText = Replace(strText, "</b>", "")
' Handle your stuff here
If Not IsRecord Then
DealerPrefix = ""
If FirstField Then
DealerCount = DealerCount + 1
DealerPrefix = CStr(DealerCount) & "."
End If
If InStr(1, LCase(strText), "<a ") = 0 Then
BRPos = InStr(1, strText, "<br>")
If BRPos <> 0 Then
strBR = Split(strText, "<br>")
For i = 0 To UBound(strBR)
If Len(Trim(strBR(i))) <> 0 Then
List1.AddItem DealerPrefix & vbTab & Trim(strBR(i))
DealerPrefix = ""
End If
Next
Else
List1.AddItem DealerPrefix & vbTab & strText
End If
End If
End If
StartPos = EndPos
Wend
You need to fix up the clean-up yourself.