Using VB to parse HTML pages?
Joseph Youngquist
Sep 29th, 2000, 11:23 AM
I need to know whether (and how) I can parse generic HTML files and turn them into full-blown HTML pages. Currently I use Perl with the HTML::TreeBuilder module, which builds the HTML structure and stores the HTML data in hashes that can be referenced at will.
I think there are similar libraries for C++, but I'm wondering whether Visual Basic has anything comparable for parsing files, to simplify the development of an app that I have already written in Perl. Any input would be great.
(Where is Visual Studio 7.0 when you need it!!!! ...with Visual Perl included :-)
Thank you,
Joe Y.
jyoungqu@journal-courier.com
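For comparison with what HTML::TreeBuilder does, here is a minimal tree-building parser sketched in Python (not VB) on top of the standard html.parser module. The Node and TreeBuilder classes and the find_all/text_of helpers are hypothetical names for illustration, not a port of the Perl module, and the sketch only copes with reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are never pushed on the stack.
VOID = {"br", "hr", "img", "meta", "link", "input"}


class Node:
    """One element: its tag, its attributes, and its children
    (child Nodes and text strings, in document order)."""
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a nested Node tree under a synthetic 'root' node."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]  # path from the root to the currently open element

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest open element with this tag;
        # a closing tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only runs between tags
            self.stack[-1].children.append(data)


def find_all(node, tag):
    """Depth-first list of every descendant Node with the given tag."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == tag:
                found.append(child)
            found.extend(find_all(child, tag))
    return found


def text_of(node):
    """Concatenated text of a node and all of its descendants."""
    return "".join(c if isinstance(c, str) else text_of(c) for c in node.children)
```

Feed it a page with builder.feed(page) followed by builder.close(); find_all(builder.root, "p") then returns the paragraph nodes in document order, which is roughly the "reference it at will" behavior described above.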
asabi
Sep 30th, 2000, 01:43 AM
So, let me get this straight...
you have a bunch of HTML files, and you want to parse them and get some information out of them?
If answer="yes" then
You can use VB to open the file for reading, put it all in a
variable, and use the Mid/Left/Right/InStr functions to get
your information.
else
Please explain.
end if
:-)
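A rough sketch of this read-it-all-then-search approach, written in Python for illustration: str.find stands in for VB's InStr and slicing for Mid. The helper names (read_file, between, all_between) are hypothetical, the Latin-1 encoding is an assumption, and the marker match is case-sensitive, so <P> and <p> would need separate handling.

```python
def read_file(path, encoding="latin-1"):
    """Read a whole file into one string (the 'put it all in a variable' step).
    The encoding is an assumption; match it to whatever wrote the file."""
    with open(path, encoding=encoding) as f:
        return f.read()


def between(text, start_marker, end_marker, start_pos=0):
    """Return (chunk, next_pos) for the first text between the two markers at or
    after start_pos, or (None, -1) if either marker is missing.
    str.find plays the role of InStr, except it is 0-based and returns -1, not 0."""
    i = text.find(start_marker, start_pos)
    if i == -1:
        return None, -1
    i += len(start_marker)
    j = text.find(end_marker, i)
    if j == -1:
        return None, -1
    return text[i:j], j + len(end_marker)


def all_between(text, start_marker, end_marker):
    """Every chunk between the markers, in document order."""
    results, pos = [], 0
    while True:
        chunk, pos = between(text, start_marker, end_marker, pos)
        if chunk is None:
            return results
        results.append(chunk)
```

Typical use would be all_between(read_file("story.htm"), "<p>", "</p>"), where story.htm is a placeholder file name.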
Joseph Youngquist
Sep 30th, 2000, 04:43 PM
Sorry for the poor English; I've been programming too much lately.
The situation is this:
Files (stories) are exported as very generic HTML.
What I need to do is take that generic code, parse it (perhaps loading each file into an associative array, what Perl calls a hash, so that paragraphs, headlines, etc. can be indexed and referenced directly), and place it into a template HTML page ready for the web.
I have programmed this in Perl and use it every day. The problem is that it's not pretty; it just works like a horse. I have looked at programming the GUI in Visual C++ and embedding Perl in the C++ code. <~~ Major pain!
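To illustrate the "index paragraphs and headlines directly" idea, here is a Python sketch that loads one story into a dictionary of lists (roughly Perl's hash of arrays). It assumes headlines are <h1> and paragraphs are <p>; StoryIndexer and index_story are hypothetical names, and the tag-to-key mapping would need adjusting to whatever the exporter actually emits.

```python
from html.parser import HTMLParser


class StoryIndexer(HTMLParser):
    """Collects the text of headline and paragraph elements into a dict of lists.
    Assumption: headlines are <h1> and paragraphs are <p>."""
    ROLES = {"h1": "headline", "p": "paragraphs"}

    def __init__(self):
        super().__init__()
        self.story = {"headline": [], "paragraphs": []}
        self._current = None  # key of the element we are inside, if any
        self._buffer = []     # text pieces seen since that element opened

    def _flush(self):
        # Store the text gathered for the open element (if any) and reset.
        if self._current:
            self.story[self._current].append("".join(self._buffer).strip())
        self._current, self._buffer = None, []

    def handle_starttag(self, tag, attrs):
        if tag in self.ROLES:
            self._flush()  # old HTML often leaves <p> unclosed
            self._current = self.ROLES[tag]

    def handle_endtag(self, tag):
        if tag in self.ROLES:
            self._flush()

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # a trailing unclosed <p> still counts


def index_story(html):
    """Parse one story and return {"headline": [...], "paragraphs": [...]}."""
    parser = StoryIndexer()
    parser.feed(html)
    parser.close()
    return parser.story
```

After story = index_story(page), story["headline"][0] is the headline and story["paragraphs"][2] the third paragraph, ready to be dropped into a template.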
VB is rather easy for setting up the GUI, and all I need to do is program for the captured events.
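So, in a less wordy version (0.01 alpha):
Dim rawHTML As String, templateHTML As String, prettyHTML As String, f As Integer
f = FreeFile 'the file names below are placeholders
Open "ugly.htm" For Input As #f
rawHTML = Input$(LOF(f), #f) 'rawHTML holds the input HTML code
Close #f
f = FreeFile
Open "template.htm" For Input As #f
templateHTML = Input$(LOF(f), #f)
Close #f
'put rawHTML into the template's content spot (assumes a <!--CONTENT--> marker;
'simply appending with & would land it after </html>)
prettyHTML = Replace(templateHTML, "<!--CONTENT-->", rawHTML)
f = FreeFile
Open "pretty.htm" For Output As #f
Print #f, prettyHTML;
Close #f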
:-}
parksie
Sep 30th, 2000, 05:13 PM
Why don't you use Tk and make a GUI in Perl?
noone
Sep 30th, 2000, 06:59 PM
Or maybe use your existing Perl program and just call it from your VB GUI?
Joseph Youngquist
Oct 1st, 2000, 10:38 AM
I do not want to use Tk or the Win32::GUI module for Perl to do the GUI. I would rather have a portable .exe with very few dependencies on whatever else is installed (although I have not completely written off going that way).
'noone': that is a great approach, although I do not know how to call console (non-windowed) programs or scripts. How would one handle errors reported in the DOS window?
Thanks again :-}
asabi
Oct 1st, 2000, 11:19 AM
OK, maybe I'm a bit slow on the uptake... (I spent 12 hours at work yesterday and today was shaping up the same, until my monitor died and I decided to go home.)
If you need to take a block of text and "convert" it to HTML (is that it?),
then there are lots of functions you can use to parse text:
stringVal = Left(string, number) ' returns the first number characters
stringVal = Right(string, number) ' returns the last number characters
intVal = InStr(string1, string2) ' returns the position of string2 in string1 (0 if not found)
stringVal = Mid(string, start, length) ' returns length characters beginning at position start
intVal = Len(string) ' returns the length of a string
string1 = "<html>" & string1 ' & joins (concatenates) two strings
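Because VB's string functions are 1-based and InStr returns 0 on failure, it is easy to be off by one when moving between them and other languages. The Python sketch below mimics Left, Right, InStr and Mid (the vb_* names are hypothetical) to make those semantics explicit, in particular that Mid's third argument is a length, not an end position.

```python
def vb_left(s, n):
    """Left(s, n): the first n characters."""
    return s[:n]


def vb_right(s, n):
    """Right(s, n): the last n characters (all of s if n >= len(s))."""
    return s[max(len(s) - n, 0):]


def vb_instr(haystack, needle, start=1):
    """Like InStr([start,] haystack, needle): 1-based position of needle, or 0 if absent."""
    return haystack.find(needle, start - 1) + 1


def vb_mid(s, start, length=None):
    """Mid(s, start[, length]): start is 1-based and length is a character COUNT."""
    begin = start - 1
    return s[begin:] if length is None else s[begin:begin + length]
```

For example, with start = vb_instr(s, "<body>") + len("<body>"), the text between <body> and </body> is vb_mid(s, start, vb_instr(s, "</body>") - start), which is the same arithmetic you would write with InStr, Mid and Len in VB.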
Note that VB text boxes are limited in how much text they can hold (I don't remember exactly how much, but it's not very big), so you might want to use arrays to store the information.
I hope this helps a bit; tell me if you need more.
parksie
Oct 1st, 2000, 01:50 PM
Joseph: there is a compiler somewhere that packages perl.exe along with all its dependencies (although the result is a bit big). Also, I think that if you make sure VBScript is installed on the target machine, you can use its regular expressions from VB.
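As an illustration of the regular-expression route, here is a pattern that pulls link targets out of a page (the kind of thing the link-mapping step needs), written with Python's re module rather than VBScript's RegExp object. The pattern and the links helper are assumptions for illustration; regexes like this cope with simple, predictable markup, not with every valid HTML document.

```python
import re

# Captures the value of an href attribute inside an <a ...> tag, whether quoted
# with " or ' or unquoted. IGNORECASE covers <A HREF=...> from older exporters.
# Limitation: a value containing spaces is cut off at the first space.
HREF = re.compile(r"""<a\s[^>]*?href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def links(html):
    """All href targets in the page, in document order."""
    return HREF.findall(html)
```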
noone
Oct 1st, 2000, 04:19 PM
I was thinking you could use Shell to call your Perl proggy and capture its command-line output in a file.
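A minimal sketch of the "call the Perl script and capture its output" idea, using Python's subprocess module as a stand-in for the Shell-plus-redirect approach. It captures stdout and stderr and returns the exit code, which is one general answer to the error-handling question: treat a nonzero exit code, or anything written to stderr, as a failure. run_script is a hypothetical helper; a real call might look like run_script(["perl", "convert.pl", "story.htm"]), where the script and file names are placeholders.

```python
import subprocess
import sys


def run_script(args, timeout=60):
    """Run a console program and capture what it would otherwise print to the
    console. Returns (exit_code, stdout_text, stderr_text).
    Raises subprocess.TimeoutExpired if it runs longer than timeout seconds,
    and FileNotFoundError if the program itself cannot be found."""
    completed = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # The current Python interpreter stands in for perl.exe in this demo.
    code, out, err = run_script([sys.executable, "-c", "print('hello')"])
    if code != 0 or err:
        print("script failed:", code, err)
    else:
        print(out.strip())
```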
Joseph Youngquist
Oct 3rd, 2000, 08:33 AM
Weeelll, All...
I think I have it. I just need to stomp about in VB for a bit, but the basic logic is there; I need to program in BASIC again for a while to learn the functions available to us.
The overall direction I'm heading in is this:
1. open directory -> get filenames
2. filenames -> map HTML Tree (for links)
3. open template HTML file based on filename -> place generic HTML into 'content' section of template.
4. place HREFs on dynamic indexes
5. have a cup of coffee. ;-)
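The plan above might look roughly like this as a Python sketch (not VB): it lists the story files in a directory, drops each one into the template's content section, and writes an index page of links. Several things are assumptions: a single template with a hypothetical <!--CONTENT--> marker (rather than a template chosen per filename), Latin-1 encoding, and no link-mapping step 2. build_site is a hypothetical name; out_dir should differ from src_dir so the originals are not overwritten, and a story named index.html would collide with the generated index.

```python
import html
import tempfile
from pathlib import Path

# Hypothetical marker for the template's 'content' section.
PLACEHOLDER = "<!--CONTENT-->"
ENCODING = "latin-1"  # assumption about how the exporter writes files


def build_site(src_dir, template_text, out_dir):
    """Wrap every .htm/.html story in src_dir with template_text, write the
    results to out_dir, add an index.html linking to them, and return the
    story file names in sorted order."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    for story in sorted(src.iterdir()):  # step 1: get the filenames
        if not story.is_file() or story.suffix.lower() not in (".htm", ".html"):
            continue
        body = story.read_text(encoding=ENCODING)
        # step 3: place the generic HTML into the template's content section
        page = template_text.replace(PLACEHOLDER, body)
        (out / story.name).write_text(page, encoding=ENCODING)
        names.append(story.name)
    # step 4: an index page with an HREF for every story
    items = "\n".join(
        f'<li><a href="{html.escape(n)}">{html.escape(n)}</a></li>' for n in names
    )
    index = template_text.replace(PLACEHOLDER, f"<ul>\n{items}\n</ul>")
    (out / "index.html").write_text(index, encoding=ENCODING)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stories = Path(tmp, "stories")
        stories.mkdir()
        (stories / "fire.htm").write_text("<p>Fire downtown.</p>", encoding=ENCODING)
        template = "<html><body>" + PLACEHOLDER + "</body></html>"
        print(build_site(stories, template, Path(tmp, "site")))  # ['fire.htm']
```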
Thank you all for the help. I'll try to answer any questions about Perl... but I'm a relative newbie.
vbforums.com
Copyright Internet.com Inc., All Rights Reserved.