Thread: Using VB to parse HTML pages?

  1. #1

    Thread Starter
    New Member
    Join Date
    Sep 2000
    Location
    Lafayette, Indiana
    Posts
    4

    I need to know if/how VB can parse generic HTML files and turn them into a full-blown HTML page. Currently I use Perl with the HTML::TreeBuilder module, which builds the HTML structure and stores the HTML data in hashes to be referenced at will.

    I think there are similar libraries for C++, but I am wondering whether Visual Basic has anything comparable for parsing files, to simplify the development of an app that I have already written in Perl. Any input would be great.

    (Where is Visual 7.0 when you need it!!!!...Visual PERL to be included with it :-)

    Thank you,
    Joe Y.

  2. #2
    Hyperactive Member
    Join Date
    Mar 2000
    Location
    Canada
    Posts
    264
    So, let me get this straight ...

    You have a bunch of HTML files, and you want to parse them and get some information out of them?

    If answer="yes" then
    You can use VB to open the file for reading, put it all in a
    variable and use the Mid/Left/Right/InStr functions to get
    your information.
    else
    Please explain.
    end if

    :-)
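
    A minimal sketch of that approach in VB6, assuming the file is small enough to read in one go. The names ReadTextFile and TextBetween and the path in the usage line are made up for illustration:

    ```vb
    ' Reads a whole file into a string.
    Public Function ReadTextFile(ByVal path As String) As String
        Dim f As Integer
        Dim contents As String

        ' Binary mode would silently create a missing file, so check first.
        If Len(Dir$(path)) = 0 Then Err.Raise 53   ' 53 = File not found

        f = FreeFile
        Open path For Binary Access Read As #f
        contents = Space$(LOF(f))   ' size the buffer to the file length
        Get #f, , contents          ' fill the buffer from the file
        Close #f

        ReadTextFile = contents
    End Function

    ' Returns the text between the first startTag and the next endTag,
    ' or "" if either one is missing. The comparison ignores case.
    Public Function TextBetween(ByVal source As String, _
                                ByVal startTag As String, _
                                ByVal endTag As String) As String
        Dim p1 As Long, p2 As Long

        p1 = InStr(1, source, startTag, vbTextCompare)
        If p1 = 0 Then Exit Function
        p1 = p1 + Len(startTag)          ' first character after the start tag

        p2 = InStr(p1, source, endTag, vbTextCompare)
        If p2 = 0 Then Exit Function

        TextBetween = Mid$(source, p1, p2 - p1)
    End Function
    ```

    Usage would look something like: Debug.Print TextBetween(ReadTextFile("C:\stories\story1.html"), "<title>", "</title>")
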
    In the beginning the universe was created. This has made a lot of people very angry and is generally regarded as a bad idea.

    - Douglas Adams
    The Hitchhiker's Guide to the Galaxy

  3. #3

    Thread Starter
    New Member
    Join Date
    Sep 2000
    Location
    Lafayette, Indiana
    Posts
    4
    Sorry for the poor English, I've been programming too much lately.

    The situation is this:

    Files are exported to very generic HTML files (stories),
    and what I need to do is take that generic code, parse it (maybe place each file into a nested structure of associative arrays, termed 'hashes' in Perl, to index and call paragraphs, headlines, etc. directly), and merge it into a template HTML page ready for the web.

    I have programmed this in Perl and use it every day. The problem is that it's not pretty; it just works like a horse. I have looked at programming the GUI in Visual C++ and embedding Perl in the C++ code. <~~Major Pain!

    VB is rather easy for setting up the GUI, and all I need to do is program for the captured events.

    So, in a less wordy version (0.01 alpha)

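    Dim UglyHTML      'path of the exported (generic) HTML file
    Dim TemplateHTML  'path of the template HTML file
    Dim PrettyHTML    'path of the finished page
    Dim rawHTML, tempHTML

    open UglyHTML
    rawHTML = contents of UglyHTML 'set rawHTML to hold the input html code

    open TemplateHTML
    tempHTML = contents of TemplateHTML & rawHTML
    'from the template HTML add the rawHTML to it

    output pretty HTML from the tempHTML to PrettyHTML, save, close.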

    :-}
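
    For what it's worth, that outline might look roughly like this in real VB6. This is only a sketch: instead of appending the raw HTML to the template it swaps it in for a placeholder comment, and the "<!--CONTENT-->" marker plus the names BuildPage, LoadFile and SaveFile are assumptions, not anything that exists yet. Replace() needs VB6.

    ```vb
    ' Builds a finished page by inserting the exported HTML into a template.
    ' "<!--CONTENT-->" is an assumed placeholder inside the template file.
    Public Sub BuildPage(ByVal uglyPath As String, _
                         ByVal templatePath As String, _
                         ByVal prettyPath As String)
        Dim rawHTML As String, tempHTML As String

        rawHTML = LoadFile(uglyPath)          ' the generic exported story
        tempHTML = LoadFile(templatePath)     ' the page template
        tempHTML = Replace(tempHTML, "<!--CONTENT-->", rawHTML)
        SaveFile prettyPath, tempHTML
    End Sub

    ' Reads a whole file into a string.
    Private Function LoadFile(ByVal path As String) As String
        Dim f As Integer
        Dim contents As String

        If Len(Dir$(path)) = 0 Then Err.Raise 53   ' 53 = File not found
        f = FreeFile
        Open path For Binary Access Read As #f
        contents = Space$(LOF(f))
        Get #f, , contents
        Close #f
        LoadFile = contents
    End Function

    ' Writes a string to a file, replacing any existing file.
    Private Sub SaveFile(ByVal path As String, ByVal contents As String)
        Dim f As Integer

        f = FreeFile
        Open path For Output As #f
        Print #f, contents;     ' trailing semicolon: no extra line break
        Close #f
    End Sub
    ```
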




  4. #4
    Monday Morning Lunatic parksie
    Join Date
    Mar 2000
    Location
    Mashin' on the motorway
    Posts
    8,169
    Why don't you use Tk and make a GUI in Perl?
    I refuse to tie my hands behind my back and hear somebody say "Bend Over, Boy, Because You Have It Coming To You".
    -- Linus Torvalds

  5. #5
    Hyperactive Member
    Join Date
    Mar 2000
    Posts
    292
    Or maybe use your existing Perl program and just call it from your VB GUI?
    "People who think they know everything are a great annoyance to those of us who do."

  6. #6

    Thread Starter
    New Member
    Join Date
    Sep 2000
    Location
    Lafayette, Indiana
    Posts
    4
    I do not want to use Tk or the Win32::GUI module for Perl to do the GUI. I would rather have a portable .exe with very few dependencies on whatever else is installed (although I have not completely written off going that way).

    'noone' --- that is a great approach, although, I do not know how to call non-window programs or scripts...How would one handle error events from the DOS window?

    Thanks again :-}

  7. #7
    Hyperactive Member
    Join Date
    Mar 2000
    Location
    Canada
    Posts
    264
    OK, maybe I am a bit slow on the uptake ... (I was at work for 12 hours yesterday, and today would have been the same, but my monitor died and then I decided to go home).

    If you need to get a block of text and "convert" it to HTML (is that it ?)

    Then there are lots of commands that you can use to parse text:

    stringVal = Left(s, n) ' returns the first n characters of s
    stringVal = Right(s, n) ' returns the last n characters of s
    intVal = InStr(string1, string2) ' returns the position of string2 in string1 (0 if not found)
    stringVal = Mid(s, start, length) ' returns length characters of s, beginning at position start

    intVal = Len(s) ' returns the length of a string

    string1 = "<html>" & string1

    Be aware that textboxes in VB are restricted to a certain size (I don't remember how much, but not too big ...), so you might want to use arrays to store the information.

    I hope it helped a bit, tell me if you need more ..
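
    As a rough illustration of chaining those functions, here is a sketch that walks through a string and collects every paragraph. It assumes the paragraphs are written as plain <p>...</p> pairs with no attributes, and it uses a Collection rather than an array just to avoid resizing. GetParagraphs is a made-up name.

    ```vb
    ' Collects the text of every <p>...</p> block into a Collection.
    ' Assumes plain <p> tags without attributes and with closing tags.
    Public Function GetParagraphs(ByVal html As String) As Collection
        Dim result As New Collection
        Dim pos As Long, startPos As Long, endPos As Long

        pos = 1
        Do
            ' Find the next opening tag at or after pos (case-insensitive).
            startPos = InStr(pos, html, "<p>", vbTextCompare)
            If startPos = 0 Then Exit Do
            startPos = startPos + Len("<p>")

            ' Find the closing tag that follows it.
            endPos = InStr(startPos, html, "</p>", vbTextCompare)
            If endPos = 0 Then Exit Do

            result.Add Mid$(html, startPos, endPos - startPos)
            pos = endPos + Len("</p>")     ' continue after this paragraph
        Loop

        Set GetParagraphs = result
    End Function
    ```

    For example, GetParagraphs("<p>one</p><P>two</P>") gives a collection with two items, "one" and "two".
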
    In the beginning the universe was created. This has made a lot of people very angry and is generally regarded as a bad idea.

    - Douglas Adams
    The Hitchhiker's Guide to the Galaxy

  8. #8
    Monday Morning Lunatic parksie
    Join Date
    Mar 2000
    Location
    Mashin' on the motorway
    Posts
    8,169
    Joseph - there is a compiler somewhere which packages perl.exe along with all the dependencies (although the result is a bit big). Alternatively, if you make sure VBScript is installed on the target machine, I think you can use its regexp object from VB.
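
    A sketch of the regexp idea, assuming the VBScript regexp object (ProgID "VBScript.RegExp") is registered on the machine and is recent enough to support SubMatches (VBScript 5.5, as far as I know). It only handles href values in double quotes, and GetLinks is a made-up name.

    ```vb
    ' Returns the href value of every <a> tag in html,
    ' using the late-bound VBScript regexp object.
    Public Function GetLinks(ByVal html As String) As Collection
        Dim re As Object, m As Object
        Dim result As New Collection

        Set re = CreateObject("VBScript.RegExp")
        ' Pattern as the regexp engine sees it: <a\s[^>]*href\s*=\s*"([^"]*)"
        re.Pattern = "<a\s[^>]*href\s*=\s*""([^""]*)"""
        re.IgnoreCase = True
        re.Global = True              ' find all matches, not just the first

        For Each m In re.Execute(html)
            result.Add m.SubMatches(0)    ' the text captured inside the quotes
        Next

        Set GetLinks = result
    End Function
    ```

    For example, GetLinks("<a href=""a.html"">A</a>") gives a collection holding the single item "a.html".
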
    I refuse to tie my hands behind my back and hear somebody say "Bend Over, Boy, Because You Have It Coming To You".
    -- Linus Torvalds

  9. #9
    Hyperactive Member
    Join Date
    Mar 2000
    Posts
    292
    I was thinking you could use shell to call your perl proggy and capture the command line output to a file.
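
    Something along these lines, perhaps. This sketch goes through the Windows Script Host shell object instead of VB's own Shell function, because its Run method can wait for the program to finish and hand back an exit code. It assumes Windows Script Host is installed and perl is on the PATH. RunPerl and its parameters are made-up names, and on Win9x the command interpreter may not pass the exit code through.

    ```vb
    ' Runs a Perl script, waits for it to finish, and returns the exit code.
    ' Standard output is redirected to outputPath.
    Public Function RunPerl(ByVal scriptPath As String, _
                            ByVal inputPath As String, _
                            ByVal outputPath As String) As Long
        Dim sh As Object
        Dim cmd As String

        Set sh = CreateObject("WScript.Shell")

        ' Builds: <command interpreter> /c perl "script" "input" > "output"
        cmd = Environ$("COMSPEC") & " /c perl """ & scriptPath & """ """ & _
              inputPath & """ > """ & outputPath & """"

        ' 0 = hidden window, True = wait until the command has finished
        RunPerl = sh.Run(cmd, 0, True)
    End Function
    ```

    A non-zero return value would then be the cue to show an error in the GUI, and the output file can be read back in like any other text file.
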
    "People who think they know everything are a great annoyance to those of us who do."

  10. #10

    Thread Starter
    New Member
    Join Date
    Sep 2000
    Location
    Lafayette, Indiana
    Posts
    4

    Weeelll, All...

    I think I have it. Just need to stomp about in VB for a bit but the basic logic is there. Just need to program a bit in BASIC again to learn the functions available to us.

    The overall direction I'm heading in is this...

    1. open directory -> get filenames
    2. filenames -> map HTML Tree (for links)

    3. open template HTML file based on filename -> Place
    generic HTML into 'content' section of Template.

    4. place HREF's on dynamic Indexes
    5. have a cup of coffee. ;-)
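
    A first stab at steps 1 and 4 in VB6, as a sketch only: it lists the .html files in a folder with Dir$ and builds a list of links to them. BuildIndex is a made-up name, and Dir$ returns the files in no guaranteed order.

    ```vb
    ' Returns an HTML list of links to every .html file in a folder.
    Public Function BuildIndex(ByVal folder As String) As String
        Dim fileName As String
        Dim links As String

        ' Make sure the folder ends with a backslash.
        If Right$(folder, 1) <> "\" Then folder = folder & "\"

        fileName = Dir$(folder & "*.html")      ' first matching file, or ""
        Do While Len(fileName) > 0
            links = links & "<li><a href=""" & fileName & """>" & _
                    fileName & "</a></li>" & vbCrLf
            fileName = Dir$                     ' next matching file, or ""
        Loop

        BuildIndex = "<ul>" & vbCrLf & links & "</ul>"
    End Function
    ```
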

    Thank you all for the help. I'll try to answer any questions about Perl... but I'm a relative newbie.
