Thread: PARSING HTML CODE FOR TAGS.. HOW?

  1. #1

    Thread Starter
    Lively Member
    Join Date
    Jun 2000
    Posts
    99
    How can a program parse HTML code for <A> tags and then, inside each tag, find the string href="mailto:"? I need to build a database of people with e-mail addresses, so I need to isolate only the e-mail address, excluding "?subject=". For example,

    <A href="mailto:[email protected]?subject=hello">

    How can VB find only "[email protected]"? Some code would help; I'm fairly new to VB.
    ___________________________
    Chris

  2. #2
    Fanatic Member gwdash's Avatar
    Join Date
    Aug 2000
    Location
    Minnesota
    Posts
    666
    You can use the InStr Function to do this. For example
    Code:
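    Private Function GetEMail(ByVal InputStr As String) As String
        ' Long rather than Integer, so strings over 32,767 characters work
        Dim i As Long, b As Long
        ' Find "mailto:" (case-insensitive) rather than the first ":"
        i = InStr(1, InputStr, "mailto:", vbTextCompare)
        If i > 0 Then
            ' Step past "mailto:" to the first character of the address
            i = i + Len("mailto:")
            ' The address ends at "?" (as in "?subject="),
            ' else at the closing quote, else at the end of the string
            b = InStr(i, InputStr, "?")
            If b = 0 Then b = InStr(i, InputStr, """")
            If b = 0 Then b = Len(InputStr) + 1
            If b > i Then
                GetEMail = Mid$(InputStr, i, b - i)
                Exit Function
            End If
        End If
        GetEMail = "Invalid Address"
    End Function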
    The GetEMail function accepts the original string and returns only the e-mail address.
    You can modify this function to suit your needs.

  3. #3
    Former Admin/Moderator MartinLiss's Avatar
    Join Date
    Sep 1999
    Location
    San Jose, CA
    Posts
    33,431
    Code:
        Dim strHTML As String
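        ' Declared As Long because a full HTML page can be longer
        ' than the 32,767 characters an Integer position can hold
        Dim intStartPos As Long
        Dim intEndPos As Long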
        
        strHTML = "<A href=""mailto:[email protected]?subject=hello"">"
        
        ' InStr returns the starting position of the "mailto:" string
        intStartPos = InStr(1, strHTML, "mailto:")
        
        ' Calculate the starting position of the email
        ' address by adding the length of "mailto:"
        intStartPos = intStartPos + Len("mailto:")
        
        ' I don't know much about HTML, so I'm going to assume
        ' that the "?" will always follow the email address,
        ' so this locates the "?"
        intEndPos = InStr(intStartPos, strHTML, "?")
        
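        ' InStr returns 0 when there is no "?", and Mid$ raises an
        ' error on a negative length, so check before extracting
        If intEndPos > intStartPos Then
            MsgBox Mid$(strHTML, intStartPos, intEndPos - intStartPos)
        End If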
        
        ' If the "?" is not always there, then you can create similar
        ' code to look for .com, .edu, .org, .gov instead
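
    Since the original question is about building a list of addresses from a whole page, here is a rough sketch of the same InStr/Mid$ idea run in a loop. The function name GetAllEMails and its variable names are made up for illustration, and it assumes every href value is closed by a double quote, which may not hold for all HTML.
    Code:
        Private Function GetAllEMails(ByVal strHTML As String) As Collection
            Dim colResult As New Collection
            Dim lngStart As Long, lngEnd As Long, lngQuery As Long

            ' vbTextCompare makes the search case-insensitive
            lngStart = InStr(1, strHTML, "mailto:", vbTextCompare)
            Do While lngStart > 0
                ' Step past "mailto:" to the first character of the address
                lngStart = lngStart + Len("mailto:")

                ' Assumption: the href value ends at the next double quote
                lngEnd = InStr(lngStart, strHTML, """")
                If lngEnd = 0 Then Exit Do

                ' Cut at "?" (e.g. "?subject=") if it comes before the quote
                lngQuery = InStr(lngStart, strHTML, "?")
                If lngQuery > 0 And lngQuery < lngEnd Then lngEnd = lngQuery

                ' Skip empty addresses such as href="mailto:"
                If lngEnd > lngStart Then
                    colResult.Add Mid$(strHTML, lngStart, lngEnd - lngStart)
                End If

                ' Continue the search after the address just handled
                lngStart = InStr(lngEnd, strHTML, "mailto:", vbTextCompare)
            Loop

            Set GetAllEMails = colResult
        End Function
    To try it, loop over the returned Collection:
    Code:
        Dim vAddr As Variant
        For Each vAddr In GetAllEMails(strHTML)
            Debug.Print vAddr
        Next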
