
Thread: [RESOLVED] Text Search Ideas?

  1. #1

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    [RESOLVED] Text Search Ideas?

    I have an application with about 1.5 MB of text in about 40 documents.

    These are stored with about another 1 MB of images in a Jet 4.0 database, each image being more or less a banner illustration for each document. The data is read-only right now though minor user editing might be added in the future.

    Users can enter search patterns, which are applied with the Jet SQL LIKE operator (case-insensitive) to locate the matching documents. This will become more important later when the number of documents triples. Performance is fine right now even when I insert 3 copies of everything to bloat the database out for testing.


    Once the user has the list of matching documents they can pick and read among them. They can also search within the documents (find, find next).

    I am looking for suggestions about using the same wildcard search within documents that is used to get the list of matching documents.

    The searched/displayed text is plain text. It is displayed in a RichTextBox. It is Unicode data, but all of the characters fall within the Windows-1252 ANSI subset, so they'd all display fine even in a TextBox, and converting the text to ANSI for searching would be possible if it helped.


    To give an example:

    User does a document search asking for: blood-*rush

    This gets changed into: [MemoField] LIKE "%blood-%rush%"

    ... in the SQL query. It matches phrases like "blood-soaked gold rush" just fine.

    The user chooses one of the matched documents and now wants to "find, find next" within the document using the same case-insensitive pattern they had entered already.

    Any ideas?

    The RichTextBox.Find method works for basic finds, but it doesn't do pattern matching as far as I can tell.

  2. #2
    Frenzied Member
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    1,360

    Re: Text Search Ideas?

    The FindText method of the TOM ITextRange object seems to support regex search - maybe that would do the trick? See the tomMatchPattern flag comments (haven't tried it myself though): https://docs.microsoft.com/en-us/win...range-findtext

  3. #3

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    Yeah, that looks interesting but I can't find a single thing documenting what form of "regular expressions" it accepts and testing it didn't seem to do as I had hoped.

    It may only apply to Word's implementation of TOM. It might also be in a very late RichEdit version (8.0? 9.0?) but I haven't found any evidence of that. RichTextBox uses RichEdit 2.0, and I still have no description of the regular expression syntax used anyway.

    I may have been hoping for too much.

  4. #4
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    18,392

    Re: Text Search Ideas?

    Thinking that if your criteria contains simple wildcards and nothing much fancier, you could probably replicate it with a relatively small routine. Idea...

    1. Parse criteria out into Steps usable for InStr()
    2. Execute each step in a loop, exiting if any step fails
    i.e., again thinking out loud
    Criteria: %blood-%rush%
    Step 1: x = InStr(text, "blood-")
    Step 2: If x = 0 then abort else y = InStr(x + Len("blood-"), text, "rush")
    Loop ends with 2 steps & if y = 0 then no match else keep x, the position of the match in step 1, for "find next", "find previous"

    I think you can see where I was going with that
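
    The stepping idea above could be sketched roughly as follows. This is only a sketch under assumptions: FindStarPattern is a made-up name, it handles only the * wildcard (not ?, # or [] lists), and it takes the leftmost hit for each step rather than the shortest overall match. It also reports the matched length, so the caller knows how many characters to select.

```vb
Option Explicit

' Hypothetical helper (not from the thread): finds the first match of a
' "*"-only wildcard pattern at or after StartPos (1-based).
' Returns the 1-based start position, or 0 if there is no match.
' MatchLen receives the number of characters the match spans.
Public Function FindStarPattern(ByVal Text As String, ByVal Pattern As String, _
                                ByVal StartPos As Long, ByRef MatchLen As Long) As Long
    Dim Parts() As String
    Dim i As Long
    Dim Pos As Long
    Dim FirstPos As Long
    Dim NextPos As Long

    MatchLen = 0
    If StartPos < 1 Then StartPos = 1
    Parts = Split(Pattern, "*")         'one literal chunk per step
    Pos = StartPos
    For i = 0 To UBound(Parts)
        If Len(Parts(i)) > 0 Then       'skip empty chunks from leading/trailing/double *
            NextPos = InStr(Pos, Text, Parts(i), vbTextCompare)
            If NextPos = 0 Then Exit Function   'a step failed: no match
            If FirstPos = 0 Then FirstPos = NextPos
            Pos = NextPos + Len(Parts(i))       'next step starts after this chunk
        End If
    Next
    If FirstPos > 0 Then
        FindStarPattern = FirstPos
        MatchLen = Pos - FirstPos
    End If
End Function
```

    For "find next" one might call it with RichTextBox1.SelStart + 2 as StartPos, then set SelStart to the result minus 1 and SelLength to MatchLen when the result is non-zero.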

  5. #5

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    That's an idea.

    Right now they can search for documents using basically the entire Like pattern capability. *, ?, #, []'ed character lists/ranges.

    It isn't perfect because I accept the VB/VBA Like pattern and try to convert it to an ANSI SQL-92 pattern that uses the %/_ instead of */? symbols. If they had a % in their input pattern it becomes an _ (underscore, "any single char"). If they had a leading or trailing * I remove those before adding % at both ends.

    Code:
        With TextBox
            Text = .Text
            If Left$(Text, 1) = "*" Then Text = Mid$(Text, 2)
            If Right$(Text, 1) = "*" Then Text = Left$(Text, Len(Text) - 1)
            .Text = Text
            Pattern = "%" _
                    & Replace$(Replace$(Replace$(Text, _
                                                 "?", _
                                                 "_"), _
                                        "%", _
                                        "_"), _
                               "*", _
                               "%") _
                    & "%"
        End With

  6. #6

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    Come to think of it, since SQL Server doesn't accept the # match symbol and it uses ^ instead of ! for "not" I suppose that Jet SQL will do the same. I'll have to test that and see.

  7. #7

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    Weird.

    The # doesn't work as a "digit" wild card. However [^a-z] fails and [!a-z] works! So not entirely SQL Server compatible I guess.

  8. #8
    Frenzied Member
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    1,360

    Re: Text Search Ideas?

    Bummer - I tried to get any kind of regex working with ITextRange.FindText, but nothing worked for me either (not even with Krool's RTB wrapper). Maybe it is Word only after all.

  9. #9
    PowerPoster ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    2,056

    Re: Text Search Ideas?

    Hi,

    not sure if I understand your requirements, but this would search in a RTB for a word or a part of the word

    Code:
    Private Sub Command4_Click()
       Dim Wort(2) As String
       Dim i As Long
       Dim j As Long
       Dim k As Long
       Dim z As Long
          Wort(0) = "om" 'part of a Word
          Wort(1) = " God " 'whole word
          Wort(2) = "inn" 'part of a word
     
          For i = LBound(Wort) To UBound(Wort)
             j = 1
             Do
    '            k = InStr(j, UCase(RichTextBox1.Text), UCase(Wort(i)))
                k = InStr(j, RichTextBox1.Text, Wort(i))
    
                If k = 0 Then
                   Exit Do
                End If
                With RichTextBox1
                   .SelStart = k - 1
                   .SelLength = Len(Wort(i))
                   .SelColor = vbBlue
                End With
                j = k + 1
                z = z + 1
             Loop
          Next
          MsgBox z
    End Sub
    to hunt a species to extinction is not logical !
    since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.

  10. #10

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    If you just want to do simple searches like that you can use the Find() method. That works fine. It just can't search for occurrences of a Like pattern.

    I may just settle for these simple searches though.

    I have an idea that can use the patterns to search. It sounds slow but may be good enough. But the problem is that even though I'll know where to start the selection of the found text I won't know how many characters to select. The * can match runs of zero to many characters.

  11. #11
    Frenzied Member
    Join Date
    Feb 2017
    Posts
    1,843

    Re: Text Search Ideas?

    I agree with LaVolpe, you need to decompose the search string.

    Also ... when you have several words or elements in the search string, is it important that they be close to each other?
    Because if that doesn't matter, you can highlight all the matching words and navigate them sequentially (with some button or F3), but if proximity matters, you need to define how many words apart the words can be from one another.

    Also, if proximity matters, you would need to re-filter the result of the SQL search.

  12. #12
    PowerPoster ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    2,056

    Re: Text Search Ideas?

    then you could try to search the RTB with Regex

    this pattern would return all words starting with H no matter how many letters come after it
    Code:
    H(\S+)\s?

  13. #13
    Frenzied Member wqweto's Avatar
    Join Date
    May 2011
    Posts
    1,555

    Re: Text Search Ideas?

    Here is an idea how to convert wildcards to (escaped) regexp patterns:

    Code:
    Option Explicit

    '=========================================================================
    ' Methods
    '=========================================================================

    Private Function pvRtbFindForward(oCtl As RichTextBox, sWildcard As String, Optional StartPos As Long) As Boolean
        Dim oMatches        As Object

        Set oMatches = pvInitRegExp(pvToPattern(sWildcard)).Execute(Mid$(oCtl.Text, StartPos + 1))
        With oMatches
            If .Count > 0 Then
                oCtl.SelStart = .Item(0).FirstIndex + StartPos
                oCtl.SelLength = .Item(0).Length
                '--- success
                pvRtbFindForward = True
            End If
        End With
    End Function

    Private Function pvRtbFindReverse(oCtl As RichTextBox, sWildcard As String, Optional StartPos As Long) As Boolean
        Dim oMatches        As Object

        If StartPos <= 0 Then
            Exit Function
        End If
        Set oMatches = pvInitRegExp(pvToPattern(StrReverse(sWildcard))).Execute(StrReverse(Left$(oCtl.Text, StartPos - 1)))
        With oMatches
            If .Count > 0 Then
                oCtl.SelStart = StartPos - 1 - .Item(0).FirstIndex - .Item(0).Length
                oCtl.SelLength = .Item(0).Length
                '--- success
                pvRtbFindReverse = True
            End If
        End With
    End Function

    Private Function pvToPattern(sWildcard As String) As String
        Dim esc             As String
        Dim vSplit          As Variant
        Dim lIdx            As Long

        '-- split wildcard to [text1, symbol1, text2, symbol2, text3, ...] array
        '-- note: esc can be a random symbol in U+E000 to U+F8FF range -- Private Use Area (PUA)
        esc = ChrW$(&HE1B6)
        vSplit = Split(pvInitRegExp("[*?%]").Replace(sWildcard, esc & "$&" & esc), esc)
        '-- escape plain-texts and convert wildcard symbols to regexp patterns
        For lIdx = 0 To UBound(vSplit)
            Select Case vSplit(lIdx)
            Case "*"
                '-- note: *? suffix is for non-greedy any length match
                vSplit(lIdx) = ".*?"
            Case "?", "%"
                vSplit(lIdx) = "."
            Case Else
                vSplit(lIdx) = pvEscapeText(vSplit(lIdx))
            End Select
        Next
        pvToPattern = Join(vSplit, vbNullString)
    End Function

    Private Function pvEscapeText(ByVal sPattern As String) As String
        pvEscapeText = pvInitRegExp("[.*+?^${}()/|[\]\\]").Replace(sPattern, "\$&")
    End Function

    Private Function pvInitRegExp(sPattern As String) As Object
        Set pvInitRegExp = CreateObject("VBScript.RegExp")
        pvInitRegExp.Global = True
        pvInitRegExp.IgnoreCase = True
        pvInitRegExp.Pattern = sPattern
    End Function

    '=========================================================================
    ' Control events
    '=========================================================================

    Private Sub Text1_Change()
        pvRtbFindForward RichTextBox1, Text1.Text
    End Sub

    Private Sub Command1_Click()
        pvRtbFindForward RichTextBox1, Text1.Text, RichTextBox1.SelStart + 1
    End Sub

    Private Sub Form_KeyDown(KeyCode As Integer, Shift As Integer)
        Select Case KeyCode + Shift * &H1000&
        Case vbKeyF3
            Command1_Click
        Case vbKeyF3 + vbShiftMask * &H1000&
            pvRtbFindReverse RichTextBox1, Text1.Text, RichTextBox1.SelStart + RichTextBox1.SelLength
        End Select
    End Sub

    Private Sub Form_Load()
        KeyPreview = True
        RichTextBox1.HideSelection = False
    End Sub

    Private Sub Form_Resize()
        Dim dblTop          As Double

        If WindowState <> vbMinimized Then
            dblTop = 60
            Text1.Move dblTop, dblTop, ScaleWidth - Command1.Width - 2 * dblTop
            Command1.Move Text1.Left + Text1.Width, dblTop, Command1.Width, Text1.Height
            dblTop = Text1.Top + Text1.Height + 60
            RichTextBox1.Move 0, dblTop, ScaleWidth, ScaleHeight - dblTop
        End If
    End Sub
    You'll need a RichTextBox1, Text1 and Command1 and a reference to Microsoft Rich Textbox Control 6.0 for the sample to run. It implements both forward F3 and backward search w/ Shift+F3

    cheers,
    </wqw>

  14. #14

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    I'm not after really powerful search syntax, this application isn't a programmer's tool. I wanted it to be simple to use, preferably be able to use the pattern typed in one TextBox for both searches.

    I can't just ignore things that only work in one search and let them slide, because it would be confusing to search for documents that match only to find that nothing matches anywhere within one of those documents.


    Now I think the answer might be to restrict the pattern used in the SQL search to the limits of RTB.Find() method searching, i.e. no wildcard symbols.

    That discards a lot of power but might achieve the more important goal of using a single pattern entered in one TextBox and not producing confusing "found it, but can't find it" results.

    That would be anticlimactic, dumbing down the first kind of search to match the second kind. But I think it might be a better decision. Searching will mainly be for individual words or short phrases by non-programmer end users anyway so wildcard patterns are probably overkill.

  15. #15

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    Hmm, sounds like RTB.Find() can only find-forward so find-previous is a lot more work.

    I might look to TOM for find/next/prev.

    EM_FINDTEXTEX w/o FR_DOWN flag requires RichEdit 2.0 and the RTB wraps that but puts it into 1.0 emulation mode. I don't want to wrap a Win32 RichEdit control if I don't have to.

  16. #16
    Frenzied Member
    Join Date
    Feb 2017
    Posts
    1,843

    Re: Text Search Ideas?

    Or get the plain text and use InStr and InStrRev.
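
    A minimal sketch of that suggestion, assuming a literal (no wildcard) search string and case-insensitive matching. FindNextPlain and FindPrevPlain are made-up names. The backward search looks only at the text before the current selection, so it will not return a match that overlaps the selection.

```vb
Option Explicit

' Hypothetical helpers (names not from the thread). Both return a 0-based
' offset suitable for RichTextBox.SelStart, or -1 when nothing is found.
Public Function FindNextPlain(ByVal Text As String, ByVal Needle As String, _
                              ByVal SelStart As Long) As Long
    Dim Pos As Long

    FindNextPlain = -1
    If Len(Needle) = 0 Then Exit Function
    'SelStart is 0-based and InStr is 1-based, so SelStart + 2 begins
    'one character past the start of the current selection
    Pos = InStr(SelStart + 2, Text, Needle, vbTextCompare)
    If Pos > 0 Then FindNextPlain = Pos - 1
End Function

Public Function FindPrevPlain(ByVal Text As String, ByVal Needle As String, _
                              ByVal SelStart As Long) As Long
    Dim Pos As Long

    FindPrevPlain = -1
    If Len(Needle) = 0 Or SelStart <= 0 Then Exit Function
    'Search only the text that lies before the current selection
    Pos = InStrRev(Left$(Text, SelStart), Needle, -1, vbTextCompare)
    If Pos > 0 Then FindPrevPlain = Pos - 1
End Function
```

    Passing -1 as SelStart to FindNextPlain starts the search at the very first character, which suits the initial "find". On a hit, the caller would set SelStart to the returned offset and SelLength to Len(Needle).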

  17. #17
    PowerPoster
    Join Date
    Jun 2013
    Posts
    4,379

    Re: Text Search Ideas?

    I'd use a Browser-Control as the "Visualizer" (easy to include via Controls.Add) -
    and for the DB-Search an SQLite-Fulltext-Search can be easily applied.

    These two make a good team, since SQLite's FTS5 comes with a built-in highlight() SQL function,
    which allows you to "mark" the found tokens (within plain-text-documents) with arbitrary pre- and suffixes.

    A simple <prefix>...<suffix> pair (to visualize text as "highlighted HTML") would be:
    <b>...</b> (to mark the found tokens which were a match, in bold)...

    In the code-snippet below (which only requires an empty Form and a RC5-reference)
    I've chosen: <span class='fts-found' style='background-color:yellow'> ... </span>

    The "unique span-classname" ('fts-found') will then make it possible -
    (after the "enhanced plain-text-content" was rendered as HTML),
    to easily find and navigate between the "found and spanned tokens":
    - by getting a List of them from the BrowserControl-Document via: .getElementsByClassName("fts-found")
    - and on that returned List, one can then <F3> via zerobased indexes, doing: FoundList(Idx).scrollIntoView

    Code:
    Option Explicit
    
    Private Cnn As cConnection, CmdInsert As cCommand, CmdSearch As cSelectCommand
    Private WithEvents WB As VBControlExtender, Doc As Object
    
    Private Sub Form_Load()
      Caption = "Click at the Form-Area (not the Browser-Area)"
     
      Set Cnn = New_c.Connection(, DBCreateInMemory) 'SQLite InMemory-Connection
          Cnn.Execute "Create Virtual Table Docs Using fts5(DocTxt)" 'a Fulltext-Table (with a single Field: DocTxt)
          
      'now two Commands (one for inserts, and one for the search)
      Set CmdInsert = Cnn.CreateCommand("Insert Into Docs Values(?)")
      Set CmdSearch = Cnn.CreateSelectCommand("Select highlight(Docs, 0, ?, ?) As HiText FROM Docs(?)")
      
      'Ok, we also need a webbrowser-control
      Set WB = Controls.Add("Shell.Explorer.2", "WB")
          WB.Visible = True
          WB.object.navigate2 "about:blank"
      Do Until WB.object.readyState = 4: DoEvents: Loop
      Set Doc = WB.object.Document
      
      'and finally we insert a single PlainText-Document into the FTS-Table (for testing)
      Dim DocLines(99) As String
          DocLines(0) = "Subject: fool me once..."
          DocLines(1) = "..."
          DocLines(98) = "Regards from,"
          DocLines(99) = "foo@bar.com"
      InsertNewDoc Join(DocLines, vbCrLf)
    End Sub
    
    Private Sub Form_Click()
      With FindDocs("fo*")
        If .RecordCount = 0 Then MsgBox "No Text-Document(s) found": Exit Sub
        
        'visualize the highlighted Plain-Text (of the first found DocTxt-Record) in the Browser-Control
        Doc.Body.innerhtml = Replace(.Fields(0).Value, vbCrLf, "<br>")
        
        'all the "yellow-spans" sqlite has surrounded the found tokens with,
        'were given the class-name: "fts-found", which makes it quite easy -
        'to retrieve a filtered list from the HTML-Document via getElementsByClassName()
        Dim FoundList As Object, i As Long
        Set FoundList = Doc.getElementsByClassName("fts-found")
        '...and this could be used with repeated <F3> KeyPresses
        For i = 0 To FoundList.length - 1
          FoundList(i).scrollIntoView
          MsgBox "Scrolled to Element: " & (i + 1)
        Next
      End With
    End Sub
    
    'just two small helpers around the appropriate Command-Objects
    Sub InsertNewDoc(DocText As String)
      CmdInsert.SetText 1, DocText
      CmdInsert.Execute
    End Sub
    Function FindDocs(sToFind As String) As cRecordset
      CmdSearch.SetText 1, "<span class='fts-found' style='background-color:yellow'>"
      CmdSearch.SetText 2, "</span>"
      CmdSearch.SetText 3, Replace(Replace(sToFind, "-", "NOT "), " or ", " OR ", , , 1) '<- make it more google-compatible
      On Error GoTo 1 'there could be errors due to incomplete search-terms...
         Set FindDocs = CmdSearch.Execute
    1 If Err Then Set FindDocs = Cnn.OpenRecordset("Select 0 Where 0") 'in that case return a valid Rs with RecordCount 0
    End Function
    Olaf
    Last edited by Schmidt; Oct 10th, 2019 at 09:19 AM. Reason: Update of the Find-Routine (google-search-compatiblity + some hardening)

  18. #18

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    Yes, those are some other possible options. Thanks.

  19. #19
    Frenzied Member wqweto's Avatar
    Join Date
    May 2011
    Posts
    1,555

    Re: Text Search Ideas?

    Quote Originally Posted by dilettante View Post
    I'm not after really powerful search syntax, this application isn't a programmer's tool. I wanted it to be simple to use, preferably be able to use the pattern typed in one TextBox for both searches.
    My idea was to convert a simple wildcard pattern (e.g. aa*bb) to a correct regexp pattern (i.e. aa.*?bb). That means first escaping the "plain-text" parts (which might contain regexp control symbols like [ ] \ .) and then converting the wildcard characters with a simple mapping: * -> .*? (using ? for a non-greedy match), ? -> . and % -> . This is all done in the pvToPattern function (which is the only interesting function in the code snippet above).

    cheers,
    </wqw>

  20. #20

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    Good point. I appreciate the input.

  21. #21
    Frenzied Member
    Join Date
    Feb 2017
    Posts
    1,843

    Re: Text Search Ideas?

    It seems to me that you don't want to develop a very sophisticated search system, and that you have it already done and working.
    But since you are asking for ideas, I just wanted to add to the pool the idea of a completely different approach.

    It is building an inverted index.
    This is the technology used by web search engines. It can handle a large number of documents.
    But it would need much coding and a database.
    This technology can be further developed to search by lemma and by synonyms.
    It can be used to make a small preview of the text found on each document and to rank the documents based on some criteria (such as if the words repeat more times, or if they are closer to each other).

    One issue is that the index is built at a point in time, and if the documents are changed after that, the index is not automatically updated.
    In the traditional form, you need to build the whole index again to reflect any change (when documents were changed or new documents were added).

    An index that is updated immediately upon document changes can be developed, but that would add even more complexity.

    Basically, an inverted index is this:
    You have a dictionary with all the words, a number (Word_ID) for each word of the dictionary.
    Then, at the time of the indexing, for each document you look for each word in the document and build a database storing the Word_ID of each word and the position.
    Then instead of text you have numbers.
    You can have also another table with only what words are used on each document, and how many times they repeat (without the positions of the words inside the document).

    Then it makes the searches very fast, because you look for indexed numbers, not text.
    In some milliseconds you can have the list of documents that use some set of words. And it can handle the data of many documents.

    Then when you have one document opened, you can use the same information to highlight the words inside the document (if you have the positions of the words indexed).

    Perhaps this idea is for the future, if you need to add the ability to handle many more documents.
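
    To make the structure concrete, here is a small in-memory sketch of an inverted index. It is a simplification of the description above: it uses nested Scripting.Dictionary objects instead of database tables and Word_ID numbers, its tokenizer only recognizes a-z, digits and apostrophes (accented letters would need extra handling), and IndexDocument, AddPosting and PositionsOf are made-up names.

```vb
Option Explicit

' Sketch only: lower-cased word -> Dictionary(DocID -> Collection of 1-based positions)
Private mIndex As Object

Public Sub IndexDocument(ByVal DocID As Long, ByVal Text As String)
    Dim i As Long
    Dim Start As Long
    Dim Ch As String

    If mIndex Is Nothing Then Set mIndex = CreateObject("Scripting.Dictionary")
    Text = LCase$(Text) & " "           'trailing space flushes the last word
    For i = 1 To Len(Text)
        Ch = Mid$(Text, i, 1)
        If Ch Like "[a-z0-9']" Then
            If Start = 0 Then Start = i 'a new word begins here
        ElseIf Start > 0 Then
            AddPosting Mid$(Text, Start, i - Start), DocID, Start
            Start = 0
        End If
    Next
End Sub

Private Sub AddPosting(ByVal Word As String, ByVal DocID As Long, ByVal Pos As Long)
    Dim Docs As Object

    If Not mIndex.Exists(Word) Then mIndex.Add Word, CreateObject("Scripting.Dictionary")
    Set Docs = mIndex(Word)
    If Not Docs.Exists(DocID) Then Docs.Add DocID, New Collection
    Docs(DocID).Add Pos                 'remember where the word occurs
End Sub

' Returns the positions of Word inside one document (empty Collection if none)
Public Function PositionsOf(ByVal Word As String, ByVal DocID As Long) As Collection
    Set PositionsOf = New Collection
    If mIndex Is Nothing Then Exit Function
    Word = LCase$(Word)
    If mIndex.Exists(Word) Then
        If mIndex(Word).Exists(DocID) Then Set PositionsOf = mIndex(Word)(DocID)
    End If
End Function
```

    The keys of mIndex(Word) give the list of documents that contain a word, and the stored positions could drive the highlighting once a document is open.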

  22. #22

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    I looked at that and then did some testing with 40 documents, then 60, then 100. I was glad I did because even searching a Jet MEMO column using LIKE in SQL it is far quicker than I had hoped. The data will be read-only in normal use so no locking overhead.

    Worked great for almost zero effort aside from sanitizing the pattern string input.


    The issue was that if I allowed fancy patterns with wildcards and such I needed an exact equivalent for find/next/prev within a selected/viewed document. Then there wouldn't be any confusion over two search pattern syntaxes and one TextBox could be used for both search types.

    Turned out the users don't want wildcards anyway. I brought it up in a prototype and two people got excited then immediately got confused. The other two testers wanted no part of it.

    The entire problem was simplified. Now I only need to transform any characters they type that would be wildcard symbols and I'm all done. Jet SQL finds documents, TOM can find/next/prev within a document.
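
    A minimal sketch of that transformation, assuming the query goes through ADO so that LIKE uses the % and _ wildcards plus [] character lists, and assuming Jet treats a bracketed wildcard symbol as a literal (worth verifying against the real database). EscapeLikeLiteral is a made-up name.

```vb
Option Explicit

' Hypothetical helper (name not from the thread). Assumes the ANSI-92
' style wildcards seen through ADO: %, _ and [] character lists.
Public Function EscapeLikeLiteral(ByVal Text As String) As String
    Dim i As Long
    Dim Ch As String
    Dim Result As String

    For i = 1 To Len(Text)
        Ch = Mid$(Text, i, 1)
        Select Case Ch
        Case "%", "_", "["
            'Wrap each wildcard symbol in brackets so it matches itself
            Result = Result & "[" & Ch & "]"
        Case Else
            Result = Result & Ch
        End Select
    Next
    EscapeLikeLiteral = Result
End Function
```

    The search pattern would then be "%" & EscapeLikeLiteral(Text1.Text) & "%". Passing it as a command parameter rather than concatenating it into the SQL text would avoid having to double up quote characters as well.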


    All I have to do now is add highlighting/bookmarking selected paragraphs, which they want stored separately from the other data.

  23. #23
    Hyperactive Member
    Join Date
    Mar 2018
    Posts
    340

    Re: Text Search Ideas?

    Quote Originally Posted by dilettante View Post
    I brought it up in a prototype and two people got excited then immediately got confused. The other two testers wanted no part of it.
    Always happens. Then they will ask you to add wildcards 6 months from now when they realize they need more robust search.

  24. #24

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    You could be right.

    Right now they have no search at all though, so they are a bit giddy to have any improvement over the old web site currently holding the data hostage. Their pattern of use has them bouncing all over among the documents looking things up and comparing them. This is so much quicker with desktop local storage vs. hosted data captivity.


    I'll mark this resolved even though my resolution has been to retrench to lowered expectations. Thanks to all of you who generously offered me alternative approaches.
    Last edited by dilettante; Oct 10th, 2019 at 07:04 PM.

  25. #25
    Addicted Member
    Join Date
    Aug 2016
    Posts
    174

    Re: Text Search Ideas?

    Quote Originally Posted by dilettante View Post
    I looked at that and then did some testing with 40 documents, then 60, then 100. I was glad I did because even searching a Jet MEMO column using LIKE in SQL it is far quicker than I had hoped. The data will be read-only in normal use so no locking overhead.

    Worked great for almost zero effort aside from sanitizing the pattern string input.


    The issue was that if I allowed fancy patterns with wildcards and such I needed an exact equivalent for find/next/prev within a selected/viewed document. Then there wouldn't be any confusion over two search pattern syntaxes and one TextBox could be used for both search types.

    Turned out the users don't want wildcards anyway. I brought it up in a prototype and two people got excited then immediately got confused. The other two testers wanted no part of it.

    The entire problem was simplified. Now I only need to transform any characters they type that would be wildcard symbols and I'm all done. Jet SQL finds documents, TOM can find/next/prev within a document.


    All I have to do now is add highlighting/bookmarking selected paragraphs, which they want stored separately from the other data.
    Global Search from whole database and local search from current opened rtf is a bonus for users. Good to have!

    It seems to be a knowledge management project.

  26. #26

    Thread Starter
    PowerPoster
    Join Date
    Feb 2006
    Posts
    20,549

    Re: Text Search Ideas?

    Quote Originally Posted by DaveDavis View Post
    It seems to be a knowledge management project.
    Actually it is more humorous than that, at least to me:

    Fan-created TV series transcripts they pore over obsessively as they repeatedly analyze episodes trying to support their theories regarding mysteries, events, and character motivations. To assist them in arguing with each other they have accepted the content at one web site as canon. Luckily the site they've agreed on allows me to scrape the data for them as long as I "pace" the process with a 60 second delay between downloads.

    The content is a bit of a travesty of typos and punctuation flaws so I have scripted a cleanup process to patch up the text. Aside from misspellings and some problems with accented letters the text has tons of line-break errors.

    I get to choose the screencap pictures myself as long as I avoid "blank" ones. Not as critical to them since the text is king, but they want them to make the application "look prettier." I suspect most users will just select the "hide pictures" option quickly anyway.
