-
Feb 27th, 2001, 11:15 AM
#1
Thread Starter
Junior Member
I'm starting the process of creating a CD-ROM distribution and will be writing the system in VB.
One of the functions of the CD should be free-text searching across the 10,000 (or so) documents stored on it (HTML or text).
I'm pretty comfortable with our website implementation of this, as I'll be using ColdFusion with Verity as the indexing software. Works a treat. Trust me.
Anyone have any fine ideas on products that'll allow me to implement the same speedy search results / indexing on the CD-ROM using a VB program?
I've seen a few products on the market but at prices of $6000+ (with extra licensing costs too! Marvellous...)
As this CD is likely to be for fairly limited distribution, it's simply not worth the cost. Has anyone seen any MUCH cheaper products that'll do the job?
-
Feb 27th, 2001, 12:10 PM
#2
Retired VBF Adm1nistrator
Why don't you just write a search algorithm yourself?
First go through the drive and record all filenames + paths in an array.
Then go through the array and read each entire file into a buffer:
Code:
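Dim f As Integer, Buffer As String
f = FreeFile                     ' next unused file number
Open file(i) For Binary Access Read As #f
Buffer = Space$(LOF(f))          ' Get only fills a string that is already sized
Get #f, , Buffer
Close #f
If (InStr(Buffer, SomeWord) <> 0) Then
    'Word was found in that file ...
End If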
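For the first step (recording all filenames + paths), here is a rough sketch of one way to do it. It uses a Collection rather than an array and the FileSystemObject from the Microsoft Scripting Runtime, which is my assumption, not a requirement. CollectFiles and the D:\docs path are made-up names for illustration only.
Code:
' Sketch only: needs a project reference to "Microsoft Scripting Runtime".
Private fso As New Scripting.FileSystemObject

Private Sub CollectFiles(ByVal fld As Scripting.Folder, ByVal found As Collection)
    Dim f As Scripting.File
    Dim subFld As Scripting.Folder
    For Each f In fld.Files
        ' Keep only the document types mentioned in the first post
        Select Case LCase$(fso.GetExtensionName(f.Path))
            Case "htm", "html", "txt"
                found.Add f.Path      ' full path, ready to pass to Open later
        End Select
    Next f
    For Each subFld In fld.SubFolders
        CollectFiles subFld, found    ' recurse into subfolders
    Next subFld
End Sub
You would call it once on the root folder, something like :
Code:
Dim found As New Collection
CollectFiles fso.GetFolder("D:\docs"), found   ' hypothetical path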
- jamie
Microsoft MVP : Visual Developer - Visual Basic [2004-2005]
-
Feb 28th, 2001, 05:28 AM
#3
Retired VBF Adm1nistrator
Well, another thing you could do is list all the files and put them into an array. Then, kind of like below:
Code:
Dim var_array() As String
Dim Buffer As String
Open file(i) For Binary Access Read As #1
Buffer = Space$(LOF(1))          ' size the buffer first, or Get reads nothing
Get #1, , Buffer
var_array() = Split(Buffer, " ")
Close #1
The above piece of code would then put every word (a word being any string separated by a space), into an array. Then you could iterate through the array using a loop, and add the word to a dictionary.
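One thing to watch with a plain Split on spaces: line breaks and tabs stay glued to the words, and "Search" and "search" count as different words. Below is a rough sketch of tidying the buffer first. It assumes Buffer already holds the file contents as in the code above; the lower-casing and the Replace calls are an assumption about what you'd want, not something tested on your documents.
Code:
Buffer = LCase$(Buffer)                 ' treat "Search" and "search" as one word
Buffer = Replace(Buffer, vbCr, " ")     ' line breaks and tabs become separators too
Buffer = Replace(Buffer, vbLf, " ")
Buffer = Replace(Buffer, vbTab, " ")
var_array = Split(Buffer, " ")
' Runs of spaces leave empty strings in var_array,
' so skip any entry where Len(var_array(i)) = 0 when indexing
This does nothing about HTML tags or punctuation; those would need their own parsing step.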
The dictionary object is quite cool: it can add about 4500 words to a dictionary in less than 2 seconds (well, on this P-III 650 anyway).
You would do it something like :
Code:
' Needs a project reference to "Microsoft Scripting Runtime"
Private d As New Dictionary
Private Sub Add(x As String, y As String)
    d.Add x, y
End Sub
Private Sub SomeFunction()
    Dim i As Long
    For i = 0 To UBound(var_array)
        ' Only the first file a word is found in gets recorded here
        If (d.Exists(var_array(i)) = False) Then
            Add var_array(i), filename_it_was_found_in
        End If
    Next i
End Sub
You could then store the word list on the CD-ROM somewhere (encrypted, I'd say), and load it at runtime. In one of my apps I load 4300 entries into a dictionary at runtime, and it takes less than a second.
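A rough sketch of the save and load steps, without the encryption. It writes one "word, tab, item" line per entry, so it assumes the words contain no tabs or line breaks and that the items are plain strings. SaveIndex and LoadIndex are made-up names, and d is the dictionary from the code above.
Code:
Private Sub SaveIndex(ByVal path As String)
    Dim f As Integer, k As Variant
    f = FreeFile
    Open path For Output As #f
    For Each k In d.Keys
        Print #f, k & vbTab & d.Item(k)     ' one entry per line
    Next k
    Close #f
End Sub

Private Sub LoadIndex(ByVal path As String)
    Dim f As Integer, s As String, p As Long
    d.RemoveAll
    f = FreeFile
    Open path For Input As #f
    Do While Not EOF(f)
        Line Input #f, s
        p = InStr(s, vbTab)                 ' split at the first tab
        If p > 0 Then d.Add Left$(s, p - 1), Mid$(s, p + 1)
    Loop
    Close #f
End Sub
You'd run SaveIndex once when building the CD, and LoadIndex when the program starts.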
- jamie
-
Feb 28th, 2001, 06:24 AM
#4
Thread Starter
Junior Member
Hey, smart!
Yep, I like that one!
I was worried about access times more than anything, but judging by your example timings, maybe I shouldn't be!
...Checked out dictionary object & you're right! It's a good one...
I guess maybe I could add another dimension for the number of times the word appears in the file (increment on each find) and I'd have the basis of a "weighted" search as well.
(Maybe that rules out the dictionary object, as it needs 3 dimensions - but I'll keep thinking!)
Thanks very much for the help on this!!
Cheers, Shaun.
-
Feb 28th, 2001, 07:19 AM
#5
Retired VBF Adm1nistrator
Well, you could do something like:
Code:
Add word(i), "file1;file2;file3"
Separate the files that the word appears in with a semicolon, and then use the Split function later:
Code:
Dim var_array() As String
Dim j As Long
' d.Item(key) returns the stored value, i.e. the "file1;file2;file3" string
var_array() = Split(d.Item(word(i)), ";")
For j = 0 To UBound(var_array)
    Debug.Print "Word : " & word(i) & " appears in : " & var_array(j)
Next j
I may not have the dictionary syntax exactly right. It's been a while since I've coded with it, but it's something along those lines.
As for a multidimensional dynamic array, I'd avoid it. They just eat memory.
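If you also want the per-file count Shaun mentioned, one option that avoids a 3-dimensional array is to make each dictionary item another dictionary that maps file name to count. This is only a sketch under that assumption: AddOccurrence is a made-up name, d is the word dictionary from earlier, and this scheme replaces the semicolon-separated strings rather than combining with them.
Code:
Private Sub AddOccurrence(ByVal word As String, ByVal fileName As String)
    Dim perFile As Dictionary
    If d.Exists(word) Then
        Set perFile = d.Item(word)          ' existing file -> count table
    Else
        Set perFile = New Dictionary
        d.Add word, perFile                 ' first time this word is seen
    End If
    If perFile.Exists(fileName) Then
        perFile.Item(fileName) = perFile.Item(fileName) + 1
    Else
        perFile.Add fileName, 1
    End If
End Sub
Call it once for every word you parse out of a file. Afterwards d.Item(word).Item(fileName) gives the number of times that word appeared in that file, which is the basis for a weighted search.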
- jamie
Last edited by plenderj; Feb 28th, 2001 at 07:32 AM.
-
Feb 28th, 2001, 07:30 AM
#6
Thread Starter
Junior Member
Yep! and Yep! again!
Even better, you're right!!
Thanks again! 
Shaun.
-
Feb 28th, 2001, 07:40 AM
#7
Retired VBF Adm1nistrator
I don't know if you know this, but with dynamic multidimensional arrays, ReDim Preserve can only change the last dimension, and the change applies across the whole array.
For example:
There is a 2-dimensional array called var_array(),
i.e. var_array(x, y).
Let's say the array has been dimensioned so that
var_array(99, 99) is the upper bound of the entire array.
With the default lower bound of 0, that's 100*100 = 10,000 array elements at the moment (it would be 99*99 = 9801 with Option Base 1).
Each array element will take up a minimum of 1 byte of memory (the Byte data type uses the least memory).
Then you do:
ReDim Preserve var_array(99, 128)
The total number of elements is now 100*129 = 12,900 (or 99*128 = 12,672 with Option Base 1).
One might think that only index 99 of the first dimension now has the larger second dimension and the others keep the old size, but the array stays rectangular, so the new bound applies to every row.
Then if you were to use more than 2 dimensions, you're just wasting memory big-time.
So I use them sparingly.
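A quick way to see this for yourself. This is a small sketch that assumes the default Option Base 0, so an upper bound of 99 means 100 entries per dimension; ArrayGrowthDemo is just a made-up name.
Code:
Private Sub ArrayGrowthDemo()
    Dim var_array() As Byte
    ReDim var_array(99, 99)
    ' Elements = entries in dimension 1 times entries in dimension 2
    Debug.Print (UBound(var_array, 1) + 1) * (UBound(var_array, 2) + 1)   ' 10000
    ReDim Preserve var_array(99, 128)   ' only the last dimension may change
    Debug.Print (UBound(var_array, 1) + 1) * (UBound(var_array, 2) + 1)   ' 12900
    ' ReDim Preserve var_array(128, 128) would fail here with
    ' run-time error 9 (Subscript out of range)
End Sub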
- jamie
-
Mar 2nd, 2001, 07:42 AM
#8
Thread Starter
Junior Member
1/2 way there...
...OK,OK...proof that this forum is cool...
Largely thanks to the advice given, I'm halfway there!
Created a database indexed on all of the words, all parsed nice and neat, along with the files they appear in and the number of times... Been testing the results and they seem accurate.
It's pretty slow to do the indexing (there are quite a few parsing algorithms needed), but as long as retrieval is quick I don't care!
Now to the retrieval program...
Thanks again & I'll be avoiding the arrays too!
-
Mar 2nd, 2001, 07:49 AM
#9
Retired VBF Adm1nistrator
Well, you only have to do the parsing once.
So I'd spend as much time on it as it takes, and have every word etc. indexed.
Then in future you just look up the index.
But you know that bit already.
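If the index ends up in a dictionary with the semicolon-separated file lists from earlier in the thread, the lookup can be as short as the sketch below. FindFiles is a made-up name and d is the dictionary from my earlier posts; a database-backed index like yours would run a query here instead.
Code:
Private Function FindFiles(ByVal word As String) As String()
    If d.Exists(word) Then
        FindFiles = Split(d.Item(word), ";")   ' one array entry per file
    Else
        FindFiles = Split("", ";")             ' empty array, UBound is -1
    End If
End Function
Because the "not found" case returns an empty array, the caller can loop from 0 to UBound without a special check:
Code:
Dim hits() As String, j As Long
hits = FindFiles("verity")
For j = 0 To UBound(hits)
    Debug.Print hits(j)
Next j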
- jamie