
Thread: How do I retrieve text from HTML File, Help Please

  1. #1

    Thread Starter
    New Member
    Join Date
    Feb 2000
    Posts
    13


    I'm trying to retrieve text from an HTML file. The text I need starts after the <tt> tag and ends before the </tt> tag.

    Any time I loop through the file to get the text, I get one chunk of data. What I mean is that I can't loop through the file line by line and check for the <tt> tag to read the text after it.

    Can anybody help me, please?
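    For reference, the goal can be stated compactly. Below is a minimal Python sketch (not VB6). The names `TT_PATTERN` and `extract_tt` are hypothetical. It assumes the whole file fits in memory, that `<tt>` blocks are not nested, and that the tags carry no attributes. A non-greedy regular expression over the entire text finds every block, so it does not matter which characters end the lines.

```python
import re

# Non-greedy match between <tt> and </tt>. DOTALL lets "." cross line breaks,
# so it does not matter whether lines end in LF, CR+LF or CR.
# IGNORECASE also accepts <TT>...</TT>.
TT_PATTERN = re.compile(r"<tt>(.*?)</tt>", re.IGNORECASE | re.DOTALL)


def extract_tt(html):
    """Return the text of every <tt>...</tt> pair, in document order."""
    return TT_PATTERN.findall(html)


if __name__ == "__main__":
    sample = "<p>Run <tt>setup.exe</tt> then\n<tt>config\nfile</tt>.</p>"
    print(extract_tt(sample))  # ['setup.exe', 'config\nfile']
```

    A regex is fine for simple, well-formed pages like the one described here. It is not a general HTML parser.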


  2. #2
    Hyperactive Member
    Join Date
    Jun 1999
    Location
    Calgary Alberta
    Posts
    359


    I'm not sure how to format this so it's going to look a little ugly. Let me know if you have any questions.

    Function GetTT(strFileName As String)

        Dim strText As String, strLine As String
        Dim booFound As Boolean, intJ As Integer
        Dim intFreeFile As Integer

        intFreeFile = FreeFile

        Open strFileName For Input As #intFreeFile

        Do Until EOF(intFreeFile)

            Line Input #intFreeFile, strLine

            'strLine is each line of the file

            intJ = InStr(strLine, "tt")
            If intJ = 0 Then 'wasn't found in this line
                If booFound = True Then
                    'have found first tt so add
                    strText = strText & Mid(strLine, 1, intJ - 1)
                    strLine = Mid$(strLine, intJ + 1)
                Else
                    'haven't found the first <tt>
                End If
            Else
                If booFound = True Then
                    'already found beginning, so add
                    strText = strText & Mid(strLine, 1, intJ - 1)
                    strLine = Mid$(strLine, intJ + 1)
                    Exit Do
                Else
                    booFound = True
                    strText = Mid$(strLine, 1, intJ - 1)
                    strLine = Mid$(strLine, intJ + 1)
                    'have found first instance of it
                End If
            End If
        Loop

        Close #intFreeFile
        GetTT = strText

    End Function

    I haven't tested this and the formatting sucks but try it out.
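    The loop above is a small state machine. booFound records whether the opening tag has been seen, and text is accumulated until the closing tag turns up. Below is a Python sketch of that idea. The name `first_tt_block` is hypothetical. The sketch assumes the file is already split into lines. Unlike the VB above, it searches for the full `<tt>` and `</tt>` tags rather than the bare letters "tt".

```python
def first_tt_block(lines):
    """Return the text of the first <tt>...</tt> block, or None if there is none.

    Lines inside the block are joined with "\n" so the original line breaks survive.
    """
    found = False          # plays the role of booFound
    parts = []
    for line in lines:
        if not found:
            start = line.find("<tt>")
            if start == -1:
                continue                        # still looking for the opening tag
            found = True
            line = line[start + len("<tt>"):]   # keep only the text after <tt>
        end = line.find("</tt>")
        if end != -1:
            parts.append(line[:end])            # text before </tt>, then stop
            return "\n".join(parts)
        parts.append(line)                      # the whole line lies inside the block
    return None  # opening tag never found, or never closed


if __name__ == "__main__":
    print(first_tt_block(["<html>", "x <tt>one", "two</tt> y"]))
```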

  3. #3
    Hyperactive Member
    Join Date
    Jun 1999
    Location
    Calgary Alberta
    Posts
    359


    Oops, I realized something: when I find the first "tt", I should grab everything after it, not before it. I should also check the rest of that line for the end tag. Use this:

    Dim intI As Integer

    intJ = InStr(strLine, "tt")
    If intJ = 0 Then 'wasn't found in this line
        If booFound = True Then
            'have found first tt so add
            strText = strText & strLine
        Else
            'haven't found the first <tt>
        End If
    Else
        If booFound = True Then
            'already found beginning, so add
            strText = strText & Mid(strLine, 1, intJ - 1)
            Exit Do
        Else
            booFound = True
            strText = Mid$(strLine, intJ + 3) 'goes after the <tt>
            intI = InStr(strText, "tt")
            If intI <> 0 Then
                'this means that the end </tt> is on this line
                strText = Mid(strText, 1, intI - 1)
                Exit Do
            End If
            'have found first instance of it
        End If
    End If

    [This message has been edited by netSurfer (edited 02-17-2000).]
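    The fragment above handles two cases: the opening and closing tags can sit on the same line, and the search string matters. Searching for the bare letters "tt" also matches words such as "http" or "button". That may explain stray text showing up in the results later in this thread. Below is a short Python illustration. The helper name `tt_spans_in_line` is hypothetical. It searches for the complete tags and collects every pair on one line.

```python
def tt_spans_in_line(line):
    """Return the text of every complete <tt>...</tt> pair on a single line."""
    spans = []
    pos = 0
    while True:
        start = line.find("<tt>", pos)
        if start == -1:
            break
        start += len("<tt>")
        end = line.find("</tt>", start)
        if end == -1:
            break  # the block continues on a later line; not handled here
        spans.append(line[start:end])
        pos = end + len("</tt>")
    return spans


if __name__ == "__main__":
    line = 'See <a href="http://x">x</a>, <tt>a</tt> and <tt>b</tt>'
    print(line.find("tt"))          # 14: the "tt" inside "http", not a tag
    print(tt_spans_in_line(line))   # ['a', 'b']
```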

  4. #4

    Thread Starter
    New Member
    Join Date
    Feb 2000
    Posts
    13


    When I display a message box to see what is in "strLine", I don't get the data line by line. All I get is one chunk of data that is neither the first line of the file nor the whole file.

    It is just one chunk of the file and that is it.

    How can I read this file line by line and get the data from every <tt>...</tt>?

    It seems to me that there is no carriage return at the end of each line, or something like that.

    How can I overcome this problem?

    Please help. I've been working on this for a while with no results!!!

  5. #5
    Hyperactive Member
    Join Date
    Jun 1999
    Location
    Calgary Alberta
    Posts
    359


    You have the same problem I had: the web file only had Chr(10) (line feed) at the end of each line, and Line Input needs Chr(13) (carriage return) too. Try this to convert the file:

    Function Readweb(strFile As String)
        Dim fnum As Long
        Dim WholeFile As String
        Dim char As String
        Dim lines() As String
        Dim i As Integer
        Dim charPos As Long 'InStr returns a Long; an Integer overflows on files over 32,767 characters
        Dim intI As Integer
        Dim intFreeFile As Integer

        'this will take a web created file and convert it so that I can read it
        'the web file is missing chr(13) - carriage return and I need to add it
        'it leaves the original file - makes a new .tmp file

        fnum = FreeFile

        Open strFile For Binary As fnum ' Open file.
        WholeFile = Space(LOF(fnum))
        Get fnum, , WholeFile
        Close fnum

        charPos = InStr(WholeFile, Chr(10))

        While charPos <> 0
            charPos = InStr(WholeFile, Chr(10))
            ReDim Preserve lines(i + 1)
            If charPos <> 0 Then
                If WholeFile <> "" Then
                    lines(i) = Left(WholeFile, charPos - 1)
                    'trim the text
                    WholeFile = Mid(WholeFile, charPos + 1)
                    i = i + 1
                End If
            End If
        Wend
        ReDim Preserve lines(i) 'also allocates lines() if the file had no chr(10)
        lines(i) = WholeFile    'keep any text after the last chr(10)

        intFreeFile = FreeFile
        'this has taken the whole file and dumped it into lines() array. Need to open the lines array and remove all the "
        Dim intJ As Integer, strTemp As String, strTemp2 As String

        For intI = 0 To i 'this is the array
            strTemp = lines(intI)
            'that should have taken the string out
            charPos = InStr(strTemp, Chr(34)) 'get "
            While charPos <> 0
                strTemp2 = strTemp2 & Mid(strTemp, 1, charPos - 1) 'get up to the "
                strTemp = Mid(strTemp, charPos + 1) 'drop off everything else
                charPos = InStr(strTemp, Chr(34))
            Wend
            strTemp2 = strTemp2 & strTemp
            If strTemp2 <> "" Then
                lines(intI) = strTemp2
            End If
            strTemp2 = ""
        Next intI

        'swap the extension for .tmp (assumes the file name has one)
        strFile = Left(strFile, InStrRev(strFile, ".") - 1) & ".tmp"
        Open strFile For Output Access Write As #intFreeFile
        For intI = 0 To i
            Print #intFreeFile, lines(intI) 'Print # ends each line with Chr(13) & Chr(10)
        Next intI
        Close #intFreeFile

    End Function

    This is a function I use that opens a web file and splits it into lines at each Chr(10). It then goes through and removes the " characters. That is just an internal thing I needed to do. You could replace that part with the code I gave you before, but instead of reading the file, read the lines array like I do. If you're not sure what I mean, let me know. I'm at work, so I don't have time to post all of it. Finally, it writes the lines to a .tmp file. Print # ends each line with Chr(13) & Chr(10), which adds the missing carriage returns. You probably don't need to do this.
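    The root cause in this thread is the line terminator. Line Input did not treat a lone Chr(10) as the end of a line, so the file came back as one long chunk. Below is a Python sketch of terminator-agnostic splitting. The names `NEWLINE`, `split_any_newline` and `read_lines` are hypothetical, and the Latin-1 decoding is an assumption about the page's encoding.

```python
import re

# CR+LF is listed first so a Windows line ending counts as one break, not two.
NEWLINE = re.compile(r"\r\n|\r|\n")


def split_any_newline(text):
    """Split text into lines whether they end in LF, CR+LF or CR.

    A trailing terminator produces a final empty string.
    """
    return NEWLINE.split(text)


def read_lines(path, encoding="latin-1"):
    """Read the whole file at once (like Open ... For Binary), then split it.

    The Latin-1 default is an assumption; use the page's real encoding.
    """
    with open(path, "rb") as f:
        data = f.read()
    return split_any_newline(data.decode(encoding))


if __name__ == "__main__":
    print(split_any_newline("a\nb\r\nc\rd"))  # ['a', 'b', 'c', 'd']
```

    Splitting in memory avoids writing a temporary file just to add the missing carriage returns.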

  6. #6

    Thread Starter
    New Member
    Join Date
    Feb 2000
    Posts
    13


    That is exactly the problem here.
    But I don't need to get rid of the " characters, so where exactly should the other code you sent me go?

  7. #7

    Thread Starter
    New Member
    Join Date
    Feb 2000
    Posts
    13


    Shouldn't "Space(LOF(fnum))" give me the whole file?

    I'm getting only part of it.
    When I display "strTemp" to see what is in the array, I get part of the file. It stops the first time it finds "</tt>".
    How can I keep going to the end of the file?

  8. #8
    Hyperactive Member
    Join Date
    Jun 1999
    Location
    Calgary Alberta
    Posts
    359


    You're right, the Exit Do should be Exit For.

    That is only if you want to break out after the first </tt>.

    strTemp only holds one piece of the file at a time. The file is dumped into the lines() array, and strTemp reads the array one element at a time. What you do next depends on what you want to do with the information. Here is what I would do. Just before

        Next intI

    put:

        strTTArray(intI) = strText
        strText = ""

    Then at the top:

        Dim strTTArray() As String

    Then between:

        End If
        Wend

    and

        Dim intJ As Integer, strTemp As String, strTemp2 As String, intK As Integer

    put this:

        ReDim strTTArray(i)

    Then remove the Exit For lines.

    This will go through the file, putting the text between <tt> and </tt> into strTTArray. Then you can do what you want with it.

    Hope this helps.

  9. #9

    Thread Starter
    New Member
    Join Date
    Feb 2000
    Posts
    13


    This is all the code I have so far. I'm getting some of the text between <tt> and </tt> into strTTArray, but not all of it.

    Sometimes I get something in the array that does not come from between <tt> and </tt>.

    All I need is the text between <tt> and </tt> from every line, so that I don't have to add more code to extract just a piece of it.

    Sorry for asking so much.

    Function Readweb(strFile As String)
        Dim fnum As Long
        Dim WholeFile As String
        Dim char As String
        Dim lines() As String
        Dim i As Integer
        Dim charPos As Integer
        Dim intI As Integer
        Dim intFreeFile As Integer
        Dim strTTArray() As String

        'this will take a web created file and convert it so that I can read it
        'the web file is missing chr(13) - carriage return and I need to add it
        'it leaves the original file - makes a new .tmp file

        fnum = FreeFile

        Open strFile For Binary As fnum ' Open file.

        WholeFile = Space(LOF(fnum))
        Get fnum, , WholeFile
        Close fnum

        charPos = InStr(WholeFile, Chr(10))

        While charPos <> 0
            charPos = InStr(WholeFile, Chr(10))
            ReDim Preserve lines(i + 1)
            If charPos <> 0 Then
                If WholeFile <> "" Then
                    lines(i) = Left(WholeFile, charPos - 1)
                    'trim the text
                    WholeFile = Mid(WholeFile, charPos + 1)
                    i = i + 1
                End If
            End If
        Wend

        ReDim strTTArray(i)

        Dim intJ As Integer, strTemp As String, strTemp2 As String, intK As Integer

        For intI = 0 To i 'this is the array

            strTemp = lines(intI)
            'MsgBox strTemp
            intJ = InStr(strTemp, "tt")
            If intJ = 0 Then 'wasn't found in this line
                If boofound = True Then
                    'have found first tt so add
                    strText = strText & strTemp
                Else
                    'haven't found the first <tt>
                End If
            Else
                If boofound = True Then
                    'already found beginning, so add
                    strText = strText & Mid(strTemp, 1, intJ - 1)
                    'Exit For
                Else
                    boofound = True
                    strText = Mid$(strTemp, intJ + 3) 'goes after the <tt>
                    intK = InStr(strText, "tt")
                    If intK <> 0 Then
                        'this means that the end </tt> is on this line
                        strText = Mid(strText, 1, intK - 1)
                        'Exit For
                    End If
                    'have found first instance of it
                End If
            End If

            'now strText should have all the text between <tt> and </tt>
            'do what you want with it

            strTTArray(intI) = strText

            strText = ""

            MsgBox strTTArray(intI)
        Next intI

        Exit Function

    End Function

  10. #10
    Hyperactive Member
    Join Date
    Jun 1999
    Location
    Calgary Alberta
    Posts
    359


    Tell you what: I'm at work, so I can't really test this right now, but if you email me the code you have and a sample HTML page that contains those tags, I'll play with it a bit at home and send it back to you. Sorry I can't do it now, but I don't have enough time.

    ------------------
    'cos Buzby says so!'

  11. #11

    Thread Starter
    New Member
    Join Date
    Feb 2000
    Posts
    13


    I have e-mailed the code and a sample of the HTML file.

    Thank you for your help; it's really appreciated.

  12. #12
    Hyperactive Member
    Join Date
    Jun 1999
    Location
    Calgary Alberta
    Posts
    359


    Great, I will take a look as soon as I get home. I should be able to send you back something tonight. I'll try to remember to post the finished function here too.

  13. #13
    Hyperactive Member
    Join Date
    Jun 1999
    Location
    Calgary Alberta
    Posts
    359


    OK, I believe this code should work. Let me know if it doesn't and I'll tweak it. The main difference is that I use two variables to track the state: booFirst says whether the opening tag has been found, and booBetween says that a complete block has just been captured.


    Function ReadWeb(strFile As String)
        Dim fnum As Long
        Dim WholeFile As String
        Dim char As String
        Dim lines() As String
        Dim i As Integer
        Dim charPos As Long 'InStr returns a Long; an Integer overflows on files over 32,767 characters
        Dim intI As Integer
        Dim intFreeFile As Integer
        Dim strTTArray() As String
        Dim strText As String

        'this will take a web created file that is missing chr(13) - carriage return
        'and split it into the lines() array at each chr(10)
        fnum = FreeFile
        Open strFile For Binary As fnum ' Open file.

        WholeFile = Space(LOF(fnum))
        Get fnum, , WholeFile
        Close fnum
        charPos = InStr(WholeFile, Chr(10))
        While charPos <> 0
            charPos = InStr(WholeFile, Chr(10))
            ReDim Preserve lines(i + 1)
            If charPos <> 0 Then
                If WholeFile <> "" Then
                    lines(i) = Left(WholeFile, charPos - 1)
                    'trim the text
                    WholeFile = Mid(WholeFile, charPos + 1)
                    i = i + 1
                End If
            End If
        Wend
        ReDim Preserve lines(i) 'also allocates lines() if the file had no chr(10)
        lines(i) = WholeFile    'keep any text after the last chr(10)

        ReDim strTTArray(i)
        Dim intJ As Integer, strTemp As String, strTemp2 As String, intK As Integer
        Dim booFirst As Boolean, booBetween As Boolean

        For intI = 0 To i 'this is the array
            strTemp = lines(intI) 'get the next line
            'MsgBox strTemp
            'inside a block, look for the closing tag; otherwise look for the opening tag
            If booFirst = True Then
                intJ = InStr(strTemp, "</tt>")
            Else
                intJ = InStr(strTemp, "<tt>")
            End If
            If intJ = 0 Then 'wasn't found in this line
                'need to add this line to strText if have already found the beginning tag
                If booFirst = True Then
                    strText = strText & strTemp
                End If
            Else
                'found a tag
                If booFirst = True Then
                    'already found beginning tag, so this is the end tag
                    strText = strText & Mid(strTemp, 1, intJ - 1)
                    booFirst = False
                    'set first to false so that it looks for the <tt> again
                    booBetween = True
                    'have found closing tag
                Else 'this is the beginning tag
                    booFirst = True
                    booBetween = False
                    strText = Mid$(strTemp, intJ + 4) 'goes after the <tt>
                    intK = InStr(strText, "</tt>") 'have to check to see if tag ends on this line
                    If intK <> 0 Then
                        'this means that the end </tt> is on this line
                        strText = Mid(strText, 1, intK - 1)
                        booFirst = False
                        booBetween = True
                        'so have found closing tag
                    End If
                End If
            End If

            If booBetween = True Then
                'this means that have grabbed data between the 2 tags, store it and clear variables
                strTTArray(intI) = strText
                strText = ""
                booBetween = False
                booFirst = False
                MsgBox strTTArray(intI)
            End If
        Next intI

        'now strTTArray should contain all the data between the TT tags
        '(slots for lines that did not close a block stay empty)
        ReadWeb = strTTArray

    End Function
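    The two flags above track "inside a block" and "just finished a block." Below is a Python sketch of the same state-machine technique using a single `inside` flag. The name `collect_tt` is hypothetical. The sketch assumes lowercase, non-nested tags. It also copes with several pairs on one line and with a block that spans lines, and it appends to a growing list instead of keeping one slot per line.

```python
def collect_tt(lines):
    """Collect the text of every <tt>...</tt> block, including blocks that span lines.

    Lines within one block are joined with "\n"; an unclosed block at the end is dropped.
    """
    results = []          # grows as needed instead of one slot per line
    inside = False        # True between <tt> and </tt>
    buffer = []           # pieces of the block currently being read
    for line in lines:
        pos = 0
        while True:
            if not inside:
                start = line.find("<tt>", pos)
                if start == -1:
                    break                       # no more opening tags on this line
                inside = True
                pos = start + len("<tt>")
            else:
                end = line.find("</tt>", pos)
                if end == -1:
                    buffer.append(line[pos:])   # block continues on the next line
                    break
                buffer.append(line[pos:end])
                results.append("\n".join(buffer))
                buffer = []
                inside = False
                pos = end + len("</tt>")        # keep scanning after the closing tag
    return results


if __name__ == "__main__":
    sample = ["x<tt>a</tt><tt>b", "c</tt> y", "<tt>d</tt>"]
    print(collect_tt(sample))  # ['a', 'b\nc', 'd']
```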


  14. #14

    Thread Starter
    New Member
    Join Date
    Feb 2000
    Posts
    13


    I'm not really sure where to put the code you gave me before in this last piece of code.

    Please help.

  15. #15
    Addicted Member
    Join Date
    May 1999
    Location
    Californ-I- A
    Posts
    207


    http://members.xoom.com/micahcarrick...s/mcHTMLv1.bas

    That's a module that I wrote a while ago for my first project in VB. It was a program that scans through HTML and adds Width= and Height= properties to the IMG tags.

    Somewhere in that module is a function that returns the text inside a tag, given the tag.

    You can take that out of it and/or modify it if you need to. It has a bunch of other functions too.

    Like I said though ... I wrote it when I first started in VB so it may have some minor syntactical impracticalities. (hehe)



    ------------------
    Micah Carrick
    http://micah.carrick.com
    ICQ: 53480225
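    The module linked above is not reproduced here, so its function names are unknown. As a swapped-in alternative technique, Python's standard-library `html.parser` can return the text inside a given tag without manual string searching. The sketch below uses the hypothetical names `TagTextCollector` and `get_tag_text`. Nested occurrences of the same tag are merged into the outermost one.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()   # HTMLParser reports tag names in lowercase
        self.depth = 0           # > 0 while inside the target tag
        self.current = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def get_tag_text(html, tag):
    """Return the text of every <tag>...</tag> element in the HTML string."""
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    print(get_tag_text("<p>Use <TT>dir</TT> or <tt>ls\n-l</tt></p>", "tt"))
```

    A real parser handles upper-case tags, attributes and entity references, which hand-written InStr searches tend to miss.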


  16. #16
    Hyperactive Member
    Join Date
    Jun 1999
    Location
    Calgary Alberta
    Posts
    359


    Ok, here you go:

    Function Readweb(strFile As String)
        Dim fnum As Long
        Dim WholeFile As String
        Dim char As String
        Dim lines() As String
        Dim i As Integer
        Dim charPos As Long 'InStr returns a Long; an Integer overflows on files over 32,767 characters
        Dim intI As Integer
        Dim intFreeFile As Integer

        'this will take a web created file that is missing chr(13) - carriage return
        'and split it into the lines() array at each chr(10)

        fnum = FreeFile

        Open strFile For Binary As fnum ' Open file.
        WholeFile = Space(LOF(fnum))
        Get fnum, , WholeFile
        Close fnum

        charPos = InStr(WholeFile, Chr(10))

        While charPos <> 0
            charPos = InStr(WholeFile, Chr(10))
            ReDim Preserve lines(i + 1)
            If charPos <> 0 Then
                If WholeFile <> "" Then
                    lines(i) = Left(WholeFile, charPos - 1)
                    'trim the text
                    WholeFile = Mid(WholeFile, charPos + 1)
                    i = i + 1
                End If
            End If
        Wend

        Dim intJ As Integer, strTemp As String, strTemp2 As String, intK As Integer

        For intI = 0 To i 'this is the array
            strTemp = lines(intI)

            intJ = InStr(strTemp, "tt")
            If intJ = 0 Then 'wasn't found in this line
                If boofound = True Then
                    'have found first tt so add
                    strText = strText & strTemp
                Else
                    'haven't found the first <tt>
                End If
            Else
                If boofound = True Then
                    'already found beginning, so add
                    strText = strText & Mid(strTemp, 1, intJ - 1)
                    Exit Do
                Else
                    boofound = True
                    strText = Mid$(strTemp, intJ + 3) 'goes after the <tt>
                    intK = InStr(strText, "tt")
                    If intK <> 0 Then
                        'this means that the end </tt> is on this line
                        strText = Mid(strText, 1, intK - 1)
                        Exit For
                    End If
                    'have found first instance of it
                End If
            End If

            'now strText should have all the text between <tt> and </tt>
            'do what you want with it

        Next intI

    End Function

    I don't know how many times you want to loop this, but you could then write strText into a file or into another array; it's up to you.


    OK, I'm an idiot, I should have edited it better the first time.

    [This message has been edited by netSurfer (edited 02-17-2000).]
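    Writing the collected results to a file, as suggested above, takes only a few lines. Below is a Python sketch with the hypothetical names `format_results` and `write_results`. It writes one result per line with CR+LF endings, so the output file does not reproduce the missing-carriage-return problem from earlier in the thread. The Latin-1 encoding is an assumption.

```python
def format_results(results):
    """Join extracted strings into lines that each end in CR+LF (Chr(13) & Chr(10))."""
    return "".join(text + "\r\n" for text in results)


def write_results(results, path, encoding="latin-1"):
    """Write the extracted strings to a file, one per line.

    Binary mode stops Python from translating the line endings; the
    Latin-1 default is an assumption about the text's encoding.
    """
    with open(path, "wb") as f:
        f.write(format_results(results).encode(encoding))


if __name__ == "__main__":
    print(repr(format_results(["setup.exe", "config"])))  # 'setup.exe\r\nconfig\r\n'
```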

  17. #17
    So Unbanned DiGiTaIErRoR's Avatar
    Join Date
    Apr 1999
    Location
    /dev/null
    Posts
    4,111


    So mainly you're trying to make it filter out all the tags?

    ------------------
    DiGiTaIErRoR
    VB, QBasic, Iptscrae, HTML
    Quote: There are no stupid questions, just stupid people.

  18. #18

    Thread Starter
    New Member
    Join Date
    Feb 2000
    Posts
    13


    No, man, you're not an idiot; you're good.

    I'm working on it right now; if I need more help, I'll let you know.

    Thank you very much.

  19. #19
    Hyperactive Member
    Join Date
    Jun 1999
    Location
    Calgary Alberta
    Posts
    359


    I hope it works; I haven't had a chance to actually test what I gave you. If you have any questions, feel free to email me.

  20. #20

    Thread Starter
    New Member
    Join Date
    Feb 2000
    Posts
    13


    In the code where you wrote "Exit Do":

    Shouldn't this be "Exit Function", because you are not looping through the file any more? Instead you are using the array.

    Just checking.
