
Thread: String Extraction II

  1. #1
    Guest
    Here's the situation. I'm taking web pages and extracting the data out of them. I need to take the text between the HTML tags and write it to another file. I know how to read the page and write to the new file; I just can't figure out how to get only the part of the string that sits between the tags. For example:
    For this line...<TD WIDTH="20"><B><U>1</U></B></TD>
    I'll need to get just the number 1

    For this line...<TD ALIGN="RIGHT" WIDTH="60">Half Ounce</TD>
    I'll need to get just "Half Ounce"

    Does anyone know how to accomplish this?

  2. #2
    kedaman (transcendental analytic)
    Join Date: Mar 2000, Location: 0x002F2EA8, Posts: 7,221
    Try this out. Here text holds the page's HTML, and your output file has to be opened in Binary mode; change #1 to whatever file number you opened it with.
    Code:
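    ' text holds the page's HTML; file #1 must already be open For Binary.
    Dim X As Long, Y As Long, piece As String
    X = InStr(text, ">")                   ' end of the first tag
    Do While X > 0
      Y = InStr(X + 1, text, "<")          ' start of the next tag
      If Y = 0 Then Y = Len(text) + 1      ' no more tags: take the rest
      piece = Trim$(Mid$(text, X + 1, Y - X - 1))
      If Len(piece) > 0 Then               ' skip empty gaps such as "><"
        piece = piece & vbCrLf             ' one value per line in the output
        Put #1, , piece
      End If
      If Y > Len(text) Then Exit Do
      X = InStr(Y + 1, text, ">")          ' end of the following tag
    Loop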
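For anyone who wants to try the logic outside VB, here is a rough Python sketch of the same scan: find the end of a tag (">"), then the start of the next one ("<"), and keep the non-empty text in between. The function name text_between_tags is made up for illustration. Like the VB loop, it assumes no ">" or "<" appears inside attribute values, and it does not decode entities such as &amp;.

```python
def text_between_tags(html: str) -> list[str]:
    """Return the non-empty text runs found between a '>' and the next '<'.

    Hypothetical helper mirroring the VB loop: locate the end of a tag,
    then the start of the next tag, and keep whatever lies in between.
    """
    pieces = []
    x = html.find(">")                  # end of the first tag (-1 if none)
    while x != -1:
        y = html.find("<", x + 1)       # start of the next tag
        if y == -1:
            y = len(html)               # no more tags: take the rest
        piece = html[x + 1:y].strip()
        if piece:                       # skip empty gaps such as "><"
            pieces.append(piece)
        if y == len(html):
            break
        x = html.find(">", y + 1)       # end of the following tag
    return pieces
```

Called on the two lines from the first post, it returns ["1"] and ["Half Ounce"].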

  3. #3
    Junior Member
    Join Date: Dec 1999, Location: Germany, Posts: 17
    Use the Microsoft HTML Object Library (MSHTML.TLB) so you don't need to extract everything manually: it gives you a complete object model of the page that you can work with instead of searching through strings.
    Frank
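MSHTML is a Windows COM library, so it can't be shown portably here. As a stand-in for the same parser-based idea, this sketch uses Python's standard html.parser module (a swapped-in technique, not MSHTML's API) to collect the text of each TD cell; the names CellTextParser and cell_texts are hypothetical. Because a parser tracks which element it is inside, nested tags such as <B><U> and entities such as &amp; are handled without any manual string searching.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text inside each <td> cell, including text in nested tags."""

    def __init__(self) -> None:
        super().__init__()              # convert_charrefs=True decodes &amp; etc.
        self.cells: list[str] = []
        self._depth = 0                 # > 0 while inside a <td> element
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":                 # html.parser lower-cases tag names
            self._depth += 1
            if self._depth == 1:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.cells.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth > 0:             # ignore text outside table cells
            self._buffer.append(data)


def cell_texts(html: str) -> list[str]:
    """Return the stripped text of every top-level <td> cell in html."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells
```

For example, cell_texts('<TD WIDTH="20"><B><U>1</U></B></TD>') returns ["1"].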
