-
Feb 16th, 2000, 09:23 PM
#1
Thread Starter
New Member
I'm trying to retrieve text from an HTML file.
The text I want starts after the
<tt> tag and ends before the </tt> tag.
Whenever I loop through the file to get the text, I get one chunk of data. That is, I can't loop through the file line by line and check for the existence of the <tt> tag so I can read the text after it.
Can anybody help me, please?
-
Feb 16th, 2000, 10:51 PM
#2
Hyperactive Member
I'm not sure how to format this so it's going to look a little ugly. Let me know if you have any questions.
Function GetTT(strFileName As String)
    Dim strText As String, strLine As String
    Dim booFound As Boolean, intJ As Integer
    Dim intFreeFile As Integer
    intFreeFile = FreeFile
    Open strFileName For Input As #intFreeFile
    Do Until EOF(intFreeFile)
        Line Input #intFreeFile, strLine
        'strLine is each line of the file
        intJ = InStr(strLine, "tt")
        If intJ = 0 Then 'wasn't found in this line
            If booFound = True Then
                'have found first tt, so add the whole line
                strText = strText & strLine
            Else
                'haven't found the first <tt>
            End If
        Else
            If booFound = True Then
                'already found beginning, so add
                strText = strText & Mid$(strLine, 1, intJ - 1)
                strLine = Mid$(strLine, intJ + 1)
                Exit Do
            Else
                booFound = True
                strText = Mid$(strLine, 1, intJ - 1)
                strLine = Mid$(strLine, intJ + 1)
                'have found first instance of it
            End If
        End If
    Loop
    Close #intFreeFile
    GetTT = strText
End Function
I haven't tested this and the formatting sucks but try it out.
-
Feb 16th, 2000, 10:56 PM
#3
Hyperactive Member
Oops, I realized something: when it finds the first instance of tt, it should grab everything after it, not before it. It should also check the rest of the line for the end tag. Use this:
Dim intI As Integer
intJ = InStr(strLine, "tt")
If intJ = 0 Then 'wasn't found in this line
    If booFound = True Then
        'have found first tt so add
        strText = strText & strLine
    Else
        'haven't found the first <tt>
    End If
Else
    If booFound = True Then
        'already found beginning, so add
        strText = strText & Mid$(strLine, 1, intJ - 1)
        Exit Do
    Else
        booFound = True
        strText = Mid$(strLine, intJ + 3) 'goes after the <tt>
        intI = InStr(strText, "tt")
        If intI <> 0 Then
            'this means that the end </tt> is on this line
            strText = Mid$(strText, 1, intI - 1)
            Exit Do
        End If
        'have found first instance of it
    End If
End If
[This message has been edited by netSurfer (edited 02-17-2000).]
-
Feb 16th, 2000, 11:23 PM
#4
Thread Starter
New Member
When I display a message to view what is in "strLine", I don't get the data line by line.
All I get is one chunk of data that is
neither the first line in the file nor the whole file.
It is just one chunk of the file and that is it.
How can I read this file line by line and get the data from every <tt>...</tt>?
It seems to me that there is no carriage return at the end of each line, or something like that.
How can I overcome this problem?
Please help. I've been working on this for a while with no results!
-
Feb 16th, 2000, 11:31 PM
#5
Hyperactive Member
You have the same problem I had: the web file only had Chr(10) at the end of each line, and it needs Chr(13) too. Try this to change the file:
Function Readweb(strFile As String)
    Dim fnum As Long
    Dim WholeFile As String
    Dim char As String
    Dim lines() As String
    Dim i As Integer
    Dim charPos As Long 'InStr returns a Long; an Integer overflows on files over 32K
    Dim intI As Integer
    Dim intFreeFile As Integer
    'this will take a web created file and convert it so that I can read it
    'the web file is missing chr(13) - carriage return and I need to add it
    'it leaves the original file - makes a new .tmp file
    fnum = FreeFile
    Open strFile For Binary As fnum ' Open file.
    WholeFile = Space(LOF(fnum)) 'make a buffer the size of the file
    Get fnum, , WholeFile 'fill the buffer with the file contents
    Close fnum
    charPos = InStr(WholeFile, Chr(10))
    While charPos <> 0
        charPos = InStr(WholeFile, Chr(10))
        ReDim Preserve lines(i + 1)
        If charPos <> 0 Then
            If WholeFile <> "" Then
                lines(i) = Left(WholeFile, charPos - 1)
                'trim the text
                WholeFile = Mid(WholeFile, charPos + 1)
                i = i + 1
            End If
        End If
    Wend
    ReDim Preserve lines(i)
    lines(i) = WholeFile 'whatever is left after the last chr(10)
    intFreeFile = FreeFile
    'this has taken the whole file and dumped it into the lines() array
    'now go through the lines array and remove all the "
    Dim intJ As Integer, strTemp As String, strTemp2 As String
    For intI = 0 To i 'this is the array
        strTemp = lines(intI)
        charPos = InStr(strTemp, Chr(34)) 'get "
        While charPos <> 0
            strTemp2 = strTemp2 & Mid(strTemp, 1, charPos - 1) 'get up to the "
            strTemp = Mid(strTemp, charPos + 1) 'keep everything after the "
            charPos = InStr(strTemp, Chr(34))
        Wend
        strTemp2 = strTemp2 & strTemp
        If strTemp2 <> "" Then
            lines(intI) = strTemp2
        End If
        strTemp2 = ""
    Next intI
    'swap the extension for tmp (assumes a three-letter extension such as .htm)
    strFile = Mid(strFile, 1, Len(strFile) - 3) & "tmp"
    Open strFile For Output Access Write As #intFreeFile
    For intI = 0 To i
        Print #intFreeFile, lines(intI) 'Print # ends each line with chr(13) + chr(10)
    Next intI
    Close #intFreeFile
End Function
This is a function I use that opens a web file and adds the proper characters at the end of each line. It then goes through and removes the " characters, which is just an internal thing I needed to do. You could replace that part with the code I gave you before, but instead of reading the file, read the lines array like I do. If you're not sure what I mean, let me know. I'm at work, so I don't have the time to post all of it. It then writes the changes to a temp file. You probably don't have to do that last step.
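If you are on VB6, the line-splitting part can probably be done in a few lines. This is only a sketch: SplitIntoLines and DemoSplitIntoLines are names I made up, and it assumes the built-in Replace and Split functions are available (they are in VB6, not in VB5). It takes text that is already in a string, such as WholeFile above, and returns one array element per line no matter whether the lines end in Chr(10), Chr(13) or both.

```vb
' Sketch only: SplitIntoLines is a hypothetical helper, not part of the function above.
' Assumes VB6, where Replace() and Split() are built in.
Public Function SplitIntoLines(ByVal strText As String) As String()
    ' Turn CR+LF pairs and lone CRs into a single LF so every
    ' line ending looks the same, then cut the text at each LF.
    strText = Replace(strText, vbCrLf, vbLf)
    strText = Replace(strText, vbCr, vbLf)
    SplitIntoLines = Split(strText, vbLf)
End Function

' Hypothetical usage: prints each line to the Immediate window.
Public Sub DemoSplitIntoLines()
    Dim strLines() As String
    Dim i As Long
    strLines = SplitIntoLines("one" & vbLf & "two" & vbCrLf & "three")
    For i = LBound(strLines) To UBound(strLines)
        Debug.Print strLines(i)
    Next i
End Sub
```

The array comes back zero-based, so it can be walked with the same kind of For loop as the lines() array.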
-
Feb 16th, 2000, 11:44 PM
#6
Thread Starter
New Member
That is exactly the problem here.
But I don't need to get rid of the ", so where exactly should the other code you sent me go?
-
Feb 17th, 2000, 01:35 AM
#7
Thread Starter
New Member
Shouldn't "Space(LOF(fnum))" give me
the whole file?
I'm getting only a part of it.
When I display "strTemp" to see what is in the array, I get a part of the file. It stops the first time it finds "</tt>".
How can I keep going to the end of the file?
-
Feb 17th, 2000, 01:52 AM
#8
Hyperactive Member
You're right, the Exit Do should be Exit For.
That is if you want to break out after the first </tt>.
strTemp is only a piece of the file at a time. The file is dumped into the lines() array and strTemp reads the array one element at a time. Depending on what you want to do with the information, you would need to do different things. What I would do is, just before
Next intI
put:
strTTArray(intI) = strText
strText = ""
then at the top:
Dim strTTArray() As String
then in between:
End If
Wend
and
Dim intJ As Integer, strTemp As String, strTemp2 As String, intK As Integer
put this:
ReDim strTTArray(i)
Then remove the Exit For's.
This will go through the file, putting the info in between <tt> and </tt> in the strTTArray. Then you can do what you want with it.
Hope this helps.
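To make the Exit Do / Exit For difference concrete, here is a small sketch. FindFirstLine is a made-up helper, not something from the code above, and it assumes the array passed in has already been dimensioned. Exit For leaves only the For loop, so anything after Next still runs. Exit Function would leave the whole function at once, and Exit Do inside a For...Next loop with no Do loop around it will not compile.

```vb
' Sketch only: FindFirstLine is a hypothetical helper used to show Exit For.
' Returns the index of the first line containing strFind, or -1 if none does.
Public Function FindFirstLine(lines() As String, ByVal strFind As String) As Long
    Dim i As Long
    FindFirstLine = -1                      ' -1 means "no line matched"
    For i = LBound(lines) To UBound(lines)
        If InStr(1, lines(i), strFind, vbTextCompare) > 0 Then
            FindFirstLine = i
            Exit For                        ' leaves the For loop only
        End If
    Next i
    ' Execution continues here after Exit For, so cleanup code placed
    ' at this point still runs. Exit Function would skip it.
End Function
```

If you want every match instead of the first one, leave the Exit For out and store each result as you go.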
-
Feb 17th, 2000, 02:22 AM
#9
Thread Starter
New Member
This is all the code I have so far. I'm getting some of the info between <tt> and </tt> into the strTTArray, but not all of it.
Sometimes I get something in the array that is not from between <tt> and </tt>.
All I need is the info between <tt> and
</tt> from every line, so I don't have to add more code to get just a piece of it.
Sorry for asking so much.
Function Readweb(strFile As String)
    Dim fnum As Long
    Dim WholeFile As String
    Dim char As String
    Dim lines() As String
    Dim i As Integer
    Dim charPos As Integer
    Dim intI As Integer
    Dim intFreeFile As Integer
    Dim strTTArray() As String
    'this will take a web created file and convert it so that I can read it
    'the web file is missing chr(13) - carriage return and I need to add it
    'it leaves the original file - makes a new .tmp file
    fnum = FreeFile
    Open strFile For Binary As fnum ' Open file.
    WholeFile = Space(LOF(fnum))
    Get fnum, , WholeFile
    Close fnum
    charPos = InStr(WholeFile, Chr(10))
    While charPos <> 0
        charPos = InStr(WholeFile, Chr(10))
        ReDim Preserve lines(i + 1)
        If charPos <> 0 Then
            If WholeFile <> "" Then
                lines(i) = Left(WholeFile, charPos - 1)
                'trim the text
                WholeFile = Mid(WholeFile, charPos + 1)
                i = i + 1
            End If
        End If
    Wend
    ReDim strTTArray(i)
    Dim intJ As Integer, strTemp As String, strTemp2 As String, intK As Integer
    For intI = 0 To i 'this is the array
        strTemp = lines(intI)
        'MsgBox strTemp
        intJ = InStr(strTemp, "tt")
        If intJ = 0 Then 'wasn't found in this line
            If boofound = True Then
                'have found first tt so add
                strText = strText & strTemp
            Else
                'haven't found the first <tt>
            End If
        Else
            If boofound = True Then
                'already found beginning, so add
                strText = strText & Mid(strTemp, 1, intJ - 1)
                'Exit For
            Else
                boofound = True
                strText = Mid$(strTemp, intJ + 3) 'goes after the <tt>
                intK = InStr(strText, "tt")
                If intK <> 0 Then
                    'this means that the end </tt> is on this line
                    strText = Mid(strText, 1, intK - 1)
                    'Exit For
                End If
                'have found first instance of it
            End If
        End If
        'now strText should have all the text between <tt> and </tt>
        'do what you want with it
        strTTArray(intI) = strText
        strText = ""
        MsgBox strTTArray(intI)
    Next intI
    Exit Function
End Function
-
Feb 17th, 2000, 04:16 AM
#10
Hyperactive Member
Tell you what, I'm at work so can't really test this right now but, if you email me the code that you have as well as a sample HTML page that has those tags, I'll play with it a bit at home and can send it back to you. Sorry I can't do it now but I don't have enough time.
------------------
'cos Buzby says so!'
-
Feb 17th, 2000, 04:29 AM
#11
Thread Starter
New Member
I have e-mailed the code and a sample of the
HTML file.
Thank you for your help, it's really appreciated.
-
Feb 17th, 2000, 04:31 AM
#12
Hyperactive Member
Great, I will take a look as soon as I get home. I should be able to send you back something tonight. I'll try to remember to post the finished function here too.
-
Feb 17th, 2000, 11:40 AM
#13
Hyperactive Member
Ok, I believe that this code should work. Let me know if it doesn't and I'll tweak it. The main difference is I use 2 variables to tell me whether the starting tag has been found or not.
Function ReadWeb(strFile As String)
    Dim fnum As Long
    Dim WholeFile As String
    Dim lines() As String
    Dim i As Integer
    Dim charPos As Long 'InStr returns a Long; an Integer overflows on files over 32K
    Dim intI As Integer
    Dim strTTArray() As String
    Dim strText As String
    'this reads a web created file whose lines end in chr(10) only
    'and splits it into the lines() array
    fnum = FreeFile
    Open strFile For Binary As fnum ' Open file.
    WholeFile = Space(LOF(fnum))
    Get fnum, , WholeFile
    Close fnum
    charPos = InStr(WholeFile, Chr(10))
    While charPos <> 0
        charPos = InStr(WholeFile, Chr(10))
        ReDim Preserve lines(i + 1)
        If charPos <> 0 Then
            If WholeFile <> "" Then
                lines(i) = Left(WholeFile, charPos - 1)
                'trim the text
                WholeFile = Mid(WholeFile, charPos + 1)
                i = i + 1
            End If
        End If
    Wend
    ReDim Preserve lines(i)
    lines(i) = WholeFile 'whatever is left after the last chr(10)
    ReDim strTTArray(i)
    Dim intJ As Integer, strTemp As String, intK As Integer
    Dim booFirst As Boolean, booBetween As Boolean
    For intI = 0 To i 'this is the array
        strTemp = lines(intI) 'get the current line
        'MsgBox strTemp
        If booFirst = True Then
            'already found beginning tag, so look for the end tag
            intJ = InStr(1, strTemp, "</tt>", vbTextCompare)
            If intJ = 0 Then
                'no end tag on this line, so add the whole line
                strText = strText & strTemp
            Else
                strText = strText & Mid$(strTemp, 1, intJ - 1)
                booFirst = False
                'set first to false so that it looks for the <tt> again
                booBetween = True
                'have found closing tag
            End If
        Else
            intJ = InStr(1, strTemp, "<tt>", vbTextCompare) 'check for <tt>
            If intJ <> 0 Then 'this is beginning tag
                booFirst = True
                booBetween = False
                strText = Mid$(strTemp, intJ + 4) 'goes after the <tt>
                intK = InStr(1, strText, "</tt>", vbTextCompare) 'have to check to see if tag ends on this line
                If intK <> 0 Then
                    'this means that the end </tt> is on this line
                    strText = Mid$(strText, 1, intK - 1)
                    booFirst = False
                    booBetween = True
                    'so have found closing tag
                End If
            End If
        End If
        If booBetween = True Then
            'this means that have grabbed data in between the 2 tags, store it and clear variables
            strTTArray(intI) = strText
            strText = ""
            booBetween = False
            booFirst = False
            MsgBox strTTArray(intI)
        End If
    Next intI
    'now strTTArray should contain all the data in between the TT tags
End Function
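If the line-by-line approach keeps giving trouble, another option is to search the whole file text at once, so it does not matter whether the tags share a line or span several lines. This is a rough sketch, not tested against your file: ExtractTagText is a name I made up, it assumes plain tags with no attributes and no nesting, and it returns the pieces in a Collection.

```vb
' Sketch only: ExtractTagText is a hypothetical helper, not the function above.
' It returns every piece of text found between <tag> and </tag>.
Public Function ExtractTagText(ByVal strHtml As String, ByVal strTag As String) As Collection
    Dim colFound As New Collection
    Dim strOpen As String, strClose As String
    Dim lngStart As Long, lngEnd As Long

    strOpen = "<" & strTag & ">"
    strClose = "</" & strTag & ">"

    ' vbTextCompare makes the search ignore case, so <TT> matches too.
    lngStart = InStr(1, strHtml, strOpen, vbTextCompare)
    Do While lngStart > 0
        lngStart = lngStart + Len(strOpen)      ' first character after the opening tag
        lngEnd = InStr(lngStart, strHtml, strClose, vbTextCompare)
        If lngEnd = 0 Then Exit Do              ' no closing tag left, so stop
        colFound.Add Mid$(strHtml, lngStart, lngEnd - lngStart)
        ' carry on searching after the closing tag
        lngStart = InStr(lngEnd + Len(strClose), strHtml, strOpen, vbTextCompare)
    Loop

    Set ExtractTagText = colFound
End Function
```

You would call it with the string read from the file, for example Set col = ExtractTagText(WholeFile, "tt"), and then loop through col with For Each.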
-
Feb 17th, 2000, 12:05 PM
#14
Thread Starter
New Member
I'm not really sure where to place the code you gave me before into this last piece of code.
Please help.
-
Feb 17th, 2000, 12:09 PM
#15
Addicted Member
http://members.xoom.com/micahcarrick...s/mcHTMLv1.bas
That's a module that I wrote a while ago for my first project in VB. It was a program that scans through HTML and adds Width= and Height= properties to the IMG tags.
Somewhere in that module is a function that will return a string of text from a tag given the tag.
You can take that out of it and/or modify it if you need to. It has a bunch of other functions too.
Like I said though ... I wrote it when I first started in VB so it may have some minor syntactical impracticalities. (hehe)
------------------
Micah Carrick
http://micah.carrick.com
ICQ: 53480225
-
Feb 17th, 2000, 12:19 PM
#16
Hyperactive Member
Ok, here you go:
Function Readweb(strFile As String)
    Dim fnum As Long
    Dim WholeFile As String
    Dim char As String
    Dim lines() As String
    Dim i As Integer
    Dim charPos As Integer
    Dim intI As Integer
    Dim intFreeFile As Integer
    'this will take a web created file and convert it so that I can read it
    'the web file is missing chr(13) - carriage return and I need to add it
    'it leaves the original file - makes a new .tmp file
    fnum = FreeFile
    Open strFile For Binary As fnum ' Open file.
    WholeFile = Space(LOF(fnum))
    Get fnum, , WholeFile
    Close fnum
    charPos = InStr(WholeFile, Chr(10))
    While charPos <> 0
        charPos = InStr(WholeFile, Chr(10))
        ReDim Preserve lines(i + 1)
        If charPos <> 0 Then
            If WholeFile <> "" Then
                lines(i) = Left(WholeFile, charPos - 1)
                'trim the text
                WholeFile = Mid(WholeFile, charPos + 1)
                i = i + 1
            End If
        End If
    Wend
    Dim intJ As Integer, strTemp As String, strTemp2 As String, intK As Integer
    For intI = 0 To i 'this is the array
        strTemp = lines(intI)
        intJ = InStr(strTemp, "tt")
        If intJ = 0 Then 'wasn't found in this line
            If boofound = True Then
                'have found first tt so add
                strText = strText & strTemp
            Else
                'haven't found the first <tt>
            End If
        Else
            If boofound = True Then
                'already found beginning, so add
                strText = strText & Mid(strTemp, 1, intJ - 1)
                Exit Do
            Else
                boofound = True
                strText = Mid$(strTemp, intJ + 3) 'goes after the <tt>
                intK = InStr(strText, "tt")
                If intK <> 0 Then
                    'this means that the end </tt> is on this line
                    strText = Mid(strText, 1, intK - 1)
                    Exit For
                End If
                'have found first instance of it
            End If
        End If
        'now strText should have all the text between <tt> and </tt>
        'do what you want with it
    Next intI
End Function
I don't know how many times you want to loop this, but you could then write strText into a file or into another array; it's up to you.
Ok, I'm an idiot, I should have edited it better the first time.
[This message has been edited by netSurfer (edited 02-17-2000).]
-
Feb 17th, 2000, 12:32 PM
#17
So Unbanned
So mainly you're trying to make it filter out all the tags?
------------------
DiGiTaIErRoR
VB, QBasic, Iptscrae, HTML
Quote: There are no stupid questions, just stupid people.
-
Feb 17th, 2000, 12:41 PM
#18
Thread Starter
New Member
No man, you are not an idiot, you are good.
I'm working on it right now; if I need more help I'll let you know.
Thank you very much.
-
Feb 17th, 2000, 12:42 PM
#19
Hyperactive Member
I hope it works; I haven't had a chance to actually test what I gave you. If you have any questions, feel free to email me.
-
Feb 17th, 2000, 12:46 PM
#20
Thread Starter
New Member
In the code where you wrote "Exit Do",
shouldn't this be "Exit Function", because you are not looping through the file any more? Instead you are using the array.
Just checking.