Thread: VB6 Automated Source Code Processing Helper

  1. #41
    PowerPoster
    Join Date
    Jan 2020
    Posts
    3,746

    Re: VB6 Automated Source Code Processing Helper

    vbflexgrid
    VbPcre2

    Is there automated code that downloads multiple projects from open source websites (or from multiple download addresses) and then compiles them into multiple DLLs?

  2. #42
    PowerPoster
    Join Date
    Jan 2020
    Posts
    3,746

    Re: VB6 Automated Source Code Processing Helper

    The primary goal of this project is to have a comprehensive wrapper for PCRE2 in an ActiveX DLL for use in VB6 or other COM supporting languages.

    The secondary goal of this project is to be a drop-in replacement for the VBScript RegExp object.
    Can the JS regular expression engine also be used? Which of these three methods is faster and more convenient?

  3. #43

    Thread Starter
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by xiaoyao View Post
    If you can provide the complete source code, don't use other DLLs.
    Sorry, but my stuff probably isn't going to be of interest to you if you don't want to use other DLLs/OCXs. I almost always use at least RC5/RC6 in the stuff I publish, because it saves a lot of time and hassle, and I don't mind having the dependency.

    Quote Originally Posted by xiaoyao View Post
    If you can implement all the lexical analysis of the source code, and even add new syntax, that would be perfect.
    That's going way beyond the scope of the goal of this project. It's only intended to give you access to the logical lines of your source (with a bit of extra metadata to help you determine the type of line) so you can make pattern-based substitutions, test for things that don't meet your coding standards, automatically insert boilerplate code, etc. So it's "dumb" on purpose - the brains are the ISourceProcessor implementing classes that you write yourself to do whatever you want with the code.

    Quote Originally Posted by xiaoyao View Post
    If you use multithreading, the speed should be several times faster, right? For example, a CPU with 6 cores uses 12 multithreads for processing.
    Multithreading is a possibility that I might look into...it adds some complexity, and I'm not sure whether the overhead will guarantee a massive win, but it might be worth a try.

  4. #44

    Thread Starter
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by xiaoyao View Post
    The primary goal of this project is to have a comprehensive wrapper for PCRE2 in an ActiveX DLL for use in VB6 or other COM supporting languages.

    The secondary goal of this project is to be a drop-in replacement for the VBScript RegExp object.
    Can the JS regular expression engine also be used? Which of these three methods is faster and more convenient?
    There's a thread comparing the speed of various regex engines, starting here: https://www.vbforums.com/showthread....=1#post5444643

    VBPCRE2 does not perform very well - I need to take a pass at optimizing it, but I haven't had the time/inclination. Instead, the minimal PCRE2 (.BAS modules only, no Classes) performs much better: https://www.vbforums.com/showthread....=1#post5444745

    That said, it might depend on your workload which engine does the best (although it looks like @wqweto's vbPeg (introduced in this post) kicks some serious "you know what").

  5. #45
    PowerPoster
    Join Date
    Jan 2020
    Posts
    3,746

    Re: VB6 Automated Source Code Processing Helper

    VbPcre2: there is no problem with this source code. At first, I thought I needed to download something I wasn't familiar with.
    In the past, I downloaded web pages to collect commodity information in my own way, and the code often needed to be modified. If I use regular expressions, I just need to maintain a table of regular expression patterns.
    It would also be convenient if VB6 project files could be parsed with regular expressions, including the control properties in form files.

    Every function, every procedure, and every block of program code can be parsed with regular expressions, which is also very convenient.

  6. #46
    Frenzied Member
    Join Date
    Aug 2020
    Posts
    1,421

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by jpbro View Post
    Thanks @SDO - it's still very much a work in progress, so make sure you have backups before you use it on your important stuff!

    Also, you might want to wait an hour or two before you spend some time with it, I have another update coming shortly.
    Quote Originally Posted by jpbro View Post
    I just updated the source code in the first post.

    I primarily focused on performance improvements for this release, so the biggest feature of the latest version is that it is significantly more efficient. On my machine, the example processor now takes about 4 minutes for a project with almost 1000 source files and approximately 775,000 lines. The previous version took about 14 minutes, so not bad! This was achieved through a few means:

    • I removed the dependency on the class based VBPCRE2.dll PCRE2 wrapper, and I am now using a minimal straight to pcre2-16.dll .BAS wrapper.
    • I swapped out some RC6.cArrayList use for plain-old VB6 String Arrays. The cArrayList was overkill because I'm not inserting/adding/removing elements which are activities where the cArrayList really shines.
    • I took some logic out of the VB6 code for finding lines and moved it into SQL statements (and added some hopefully useful indexes for the queries to work against, though I haven't had time to check the query plans).
    • Enabled all the usual compiler optimizations.
    • Mapped the source code string data to an Integer SafeArray. Loops/tests run against the array, which avoids a lot of string operations/comparisons.


    I also fixed a few more bugs and added some other niceties:

    • Better progress and elapsed time indicators on the Progress window.
    • Added total source lines count to the Stats tab.
    • The Log tab now auto updates to show the # of rows as log level types are toggled on/off.
    • I've added a System folder where you should drop pcre2-16.dll. VBPCRE2.dll is no longer required.


    I think that's everything for now, enjoy
    I tested VB6SourceProcessor4 and it worked very well. I scanned my projects with it and it revealed that one of my cls files was stored in the wrong path, which was great.

    A few months ago, an important file in one of my projects was stored in the wrong path (c:\windows\system), and when I reinstalled windows, the file was lost.

    Now, VB6SourceProcessor4 has helped me avoid similar pitfalls.

    Also, the day before yesterday, I found a small bug in VB6SourceProcessor2, and I was about to report it, but this bug no longer exists in VB6SourceProcessor4.

    You shared another outstanding tool, just like you did before. I believe VB6SourceProcessor will be of great help to my new project (a project analysis tool), thank you, jpbro.
    Last edited by SearchingDataOnly; Mar 23rd, 2023 at 10:02 PM.

  7. #47
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    234

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by jpbro View Post
    WORK IN PROGRESS

    This is a work in progress - it does a pretty good job of doing what it is trying to do, but there are definitely holes in the implementation. If you can provide any source code that fails, I will be happy to fix my code to accommodate it!

    WHAT IS THIS?

    This project parses VBG & VBP files, building a list of all source code files (with limitations, see the KNOWN/EXPECTED PROBLEMS section below). You can then loop through each source file, and each line of code therein to perform just about any line-level custom processing you can imagine. The project includes fast PCRE2 regex support for finding matching lines of code (in forward and reverse directions).

    When you find a line you are looking for, you can optionally replace the text with something else, and when you are done you can save the file back to disk.
    ...
    Hope some of you find this project useful!
    Dear JpBro,

    First of all, very many thanks for conceptualising a project of this sort. Of course, Thanks a TON for the project itself.

    Actually, I came here by chance. How I came here is as follows:

    1. Recently, I started using VbScript RegEx as UDF in Sql queries (All thanks, as ever, to great Olaf; https://www.vbforums.com/showthread....=1#post5588333)

    2. When I wished to use lookbehinds, I understood that vbr (Vbscript Regex) would not support the same. I immediately remembered your vbpcre2 which I had used some time ago in a project. So, I started using it as an UDF in Sql queries (thanks again to great Olaf for guiding me on the same) but then I found it too slow when compared to vbr.

    3. Before reporting the same in the vbpcre2 thread, I thought I will make some explorations at my end so that I can share my findings in case my observation that pcre2 is slower than vbr was wrong.

    4. Thereafter, before reporting my observation, I thought I will better search in the net on "vb6 pcre2 is slower than regex" and see what it says.

    5. The above search listed this thread as the first page and that's how I am here.

    6. Well, now that I know (after quickly going through some posts of this thread) that vbpcre2 is indeed slower than vbr, I have a few questions:
    --
    a. You have written in one of the posts that you started accessing pcre2-16.dll directly instead of from the class-based wrapper (vbpcre2.dll). Is it possible for me also to do the same? If so, how?

    b. By accessing pcre2-16.dll directly, did it increase the speed of processing to be as fast as vbr at least, if not faster? Or, is it still slower?

    c. Any plans of upgrading vbpcre2 to the latest pcre2 version?

    d. You have written that you are not getting time to work on increasing the speed of the existing vbpcre2.dll. I sincerely prayed just now that you do get the time - either for increasing the speed of vbpcre2.dll OR for increasing the speed of processing of your direct accessing of pcre2-16.dll so that the speed is at least as fast as vbr and if possible faster than vbr itself. That would be a fantastic big big big boon to the society. I prayed well for Olaf to get time too, somehow, to join hands with you (much the same way Tanner was able to do, years back) and succeed in increasing the speed of either vbpcre2 or 'the process of direct accessing of pcre2-16.dll' to be much much faster than VBScript Regex.
    --

    Coming back to your parser project (which is what this thread is all about), I am sure finding out methods not having "On error goto" at the top will be very useful for me (since so far I have been doing it manually only). Once I download the project and get time to start exploring it (I have noted your warnings well), I am sure I will find many more uses. As for suggestions (as asked by you), I am sure there is a tool/feature already existing, neatly tabulating the number of lines in each .frm, .bas, .cls, etc. and the total number of lines under each of these heads (frm, cls, bas, etc.) and finally in the whole project itself. As I explore, I will get to know anyway, whether this feature exists or not. Nevertheless, since I am here now writing, just thought of quickly letting you know this need, in case it is not already existing. Thanks a TON, once again.
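As a rough illustration of the "On Error GoTo" check mentioned above, here is a minimal sketch in Python (not VB6, and not how VB6SourceProcessor itself works). The function name and the regular expressions are assumptions made for this example. It ignores line continuations and only checks whether an `On Error` statement appears anywhere inside each procedure, not specifically at the top.

```python
import re

# Hypothetical patterns for this sketch; real VB6 source has more edge cases
# (line continuations, conditional compilation, attributes, and so on).
PROC_START = re.compile(
    r"^\s*(?:(?:Public|Private|Friend)\s+)?(?:Static\s+)?"
    r"(?:Sub|Function|Property\s+(?:Get|Let|Set))\s+(\w+)",
    re.IGNORECASE,
)
PROC_END = re.compile(r"^\s*End\s+(?:Sub|Function|Property)\b", re.IGNORECASE)
ON_ERROR = re.compile(r"^\s*On\s+Error\b", re.IGNORECASE)


def procedures_without_on_error(source):
    """Return the names of procedures that contain no On Error statement."""
    missing = []
    current = None        # name of the procedure being scanned, if any
    has_handler = False
    for line in source.splitlines():
        if current is None:
            started = PROC_START.match(line)
            if started:
                current = started.group(1)
                has_handler = False
        elif PROC_END.match(line):
            if not has_handler:
                missing.append(current)
            current = None
        elif ON_ERROR.match(line):
            has_handler = True
    return missing


SAMPLE = """\
Private Sub Good()
    On Error GoTo Fail
    Exit Sub
Fail:
End Sub

Public Function Bad() As Long
    Bad = 1
End Function
"""

print(procedures_without_on_error(SAMPLE))
```

The same loop could also tally line counts per file type by incrementing a counter keyed on the file extension.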

    God Bless you, jpbro and tanner. God Bless all.

    Happy Christmas week! . And, a very Happy New year!

    Kind Regards.

  8. #48

    Thread Starter
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by softv View Post
    First of all, very many thanks for conceptualising a project of this sort. Of course, Thanks a TON for the project itself.
    Thank you for the kind words. This project is very rough, and I don't have much time to improve it so don't thank me too much yet

    Quote Originally Posted by softv View Post
    Actually, I came here by chance. How I came here is as follows:
    Thank you also for the detailed steps on how you got here, it gives me some good context after being away from the project for so long. Maybe I'm just an LLM after all!

    Quote Originally Posted by softv View Post
    a. You have written in one of the posts that you started accessing pcre2-16.dll directly instead of from the class-based wrapper (vbpcre2.dll). Is it possible for me also to do the same? If so, how?
    Yup! Just add the *Pcre.bas and *Regex.bas modules to your project and you can work against it.

    Just dump VbPcre2.dll - it's ancient code and basically junk.


    Quote Originally Posted by softv View Post
    b. By accessing pcre2-16.dll directly, did it increase the speed of processing to be as fast as vbr at least, if not faster? Or, is it still slower?
    It increased speed dramatically, but I didn't benchmark it against anything else (feel free to run and post benchmarks if you care to). That said, based on my past results in performance tests after some optimization passes, I do OK, but if you want peak performance you should look elsewhere.

    Quote Originally Posted by softv View Post
    c. Any plans of upgrading vbpcre2 to the latest pcre2 version?
    0% plans at this point unless you have a compelling reason?

    I hope that helps a bit, and I appreciate the rest of your post, but it will take more time than I have to respond properly right now. Happy New Year to you too!
    Last edited by jpbro; Dec 29th, 2023 at 12:53 AM.

  9. #49
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    234

    Re: VB6 Automated Source Code Processing Helper

    Thanks a lot for your kind replies, JpBro, even amidst your hectic schedules.

    I indeed wanted to do some benchmarking. So, I started using 'Pcre.bas' and 'RegEx.bas' but understood after some time that there is no option to make 'RegexMatch' function to do global matching. I was assuredly thinking that global matching would be definitely present in RegEx.bas since I thought VB6SourceProcessor would be definitely doing global matching. So, I spent a few hours in digging more - going through the VB6SourceProcessor code and VBPCRE2's 'cPcre2' class code. That led me to get to "my own understanding" that unless 'Execute2' kind of code (present in cPcre2 class) is present in 'RegEx.bas', global matching is not possible. When I find time, I shall try to include that kind of code in RegEx.bas and then do the benchmarking.

    All said and done, if I am completely wrong in my above "my own understanding" and global pattern matching is indeed possible via RegEx.bas itself, my sincere apologies. In that case, if and when you find time, kindly let me know how to do global matching using the existing RegEx.bas code itself.

    When time permits, I shall also study what Olaf has written in Post #29 (https://www.vbforums.com/showthread....=1#post5390811) and see whether it is possible to implement that solution as UDF too in SQLite SQL queries. If possible to implement, then I shall get to know more details from Olaf regarding Jscript9 - whether using Jscript9 is seamlessly supported in all versions of all Windows OSes (incl. Win11), etc.

    By the way, I wanted to write the following in my earlier message itself but forgot. i.e. I simply loved the following which you have written in https://github.com/jpbro/VbPcre2. Had a hearty laugh.
    --
    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

    Some people when confronted with a problem, a regular expression library, and a need for access to said library in VB6 think "I know I'll write a wrapper for PCRE2 in VB6". Now they have three problems
    --


    // 0% plans at this point unless you have a compelling reason? //
    Well, I just thought that there may be some more advanced pattern matchings possible in the latest version of pcre2. Nothing else. From what you have written, it seems that that is not the case.

    Thanks for your New Year wishes.

    Kind Regards.

  10. #50

    Thread Starter
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by softv View Post
    I indeed wanted to do some benchmarking. So, I started using 'Pcre.bas' and 'RegEx.bas' but understood after some time that there is no option to make 'RegexMatch' function to do global matching. I was assuredly thinking that global matching would be definitely present in RegEx.bas since I thought VB6SourceProcessor would be definitely doing global matching. So, I spent a few hours in digging more - going through the VB6SourceProcessor code and VBPCRE2's 'cPcre2' class code. That led me to get to "my own understanding" that unless 'Execute2' kind of code (present in cPcre2 class) is present in 'RegEx.bas', global matching is not possible. When I find time, I shall try to include that kind of code in RegEx.bas and then do the benchmarking.

    All said and done, if I am completely wrong in my above "my own understanding" and global pattern matching is indeed possible via RegEx.bas itself, my sincere apologies. In that case, if and when you find time, kindly let me know how to do global matching using the existing RegEx.bas code itself.
    It's been a few years since I dug into pcre, but AFAIR there is no flag that replicates a "/g" or global search (if that functionality exists in a new version, that would be a compelling reason to update though!). I think you just have to loop against the previous end index+1 and build your own global resultset. That's probably one of the places where my old VbPcre2 code was inefficient and could be improved. You seem quite smart, so if you compare the full VbPcre2 matching code to the minimal module stuff, I think you will find your answer.
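The "loop from the previous end index" idea can be sketched as follows. This is a Python illustration using the standard `re` module, not the VbPcre2 or RegEx.bas code; `find_all` is a hypothetical helper name. Python reports the end offset as exclusive, so the sketch resumes at the end offset itself and steps one extra character only after an empty match. Whether a "+1" is needed elsewhere depends on how the engine reports its end offset.

```python
import re


def find_all(pattern, text):
    """Collect (start, end, matched_text) for every match by re-running the
    search from the end of the previous match."""
    regex = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        # Search the whole string starting at pos (no slicing), so anchors
        # and lookbehinds still see the text before pos.
        match = regex.search(text, pos)
        if match is None:
            break
        results.append((match.start(), match.end(), match.group(0)))
        # Resume at the end of this match; after an empty match, advance one
        # character so the loop cannot stall at the same position.
        pos = match.end() if match.end() > match.start() else match.end() + 1
    return results


print(find_all(r"\d+", "a1bb22ccc333"))
```

Passing a start position rather than slicing the string is the important design choice here: it keeps the reported offsets relative to the original text.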

    Anyway, give it a shot and you might spark another benchmarking thread...I think wqweto won the last one, but perhaps you will win the next

    Quote Originally Posted by softv View Post
    When time permits, I shall also study what Olaf has written in Post #29 (https://www.vbforums.com/showthread....=1#post5390811) and see whether it is possible to implement that solution as UDF too in SQLite SQL queries. If possible to implement, then I shall get to know more details from Olaf regarding Jscript9 - whether using Jscript9 is seamlessly supported in all versions of all Windows OSes (incl. Win11), etc.

    Quote Originally Posted by softv View Post
    By the way, I wanted to write the following in my earlier message itself but forgot. i.e. I simply loved the following which you have written in https://github.com/jpbro/VbPcre2. Had a hearty laugh.
    --
    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

    Some people when confronted with a problem, a regular expression library, and a need for access to said library in VB6 think "I know I'll write a wrapper for PCRE2 in VB6". Now they have three problems
    --
    Hehe, and I didn't know it at the time, but it turns out the full quote should be:

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

    Some people when confronted with a problem, a regular expression library, and a need for access to said library in VB6 think "I know I'll write a wrapper for PCRE2 in VB6". Now they have three problems.

    Some people - upon "solving" the above problems, would never have thought that they would be supporting their "solutions" 5 years later
    I'm joking of course! It's always nice when anything I've put out there gets a bit of use in the real world.

  11. #51
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    234

    Re: VB6 Automated Source Code Processing Helper

    There is a clear winner here, on the eve of the New Year. And, that is you, my dear JpBro! Well, based on "my own present needs", as far as "my own benchmarking tests" go, you have won indeed, so far.

    Well, first of all, thanks for replying, so promptly again. I wonder how experts like you get time to reply so readily and promptly!

    // if you compare the full VbPcre2 matching code to the minimal module stuff, I think you will find your answer. //
    Yes, even before seeing your reply, I had read this answer (https://stackoverflow.com/a/70247959) 2 days back, at https://stackoverflow.com/questions/...ing-using-pcre - and that helped understand the need for the 'exec2' code in your vbpcre2, where extracting all matched strings is done based on ovector offsets.

    Actually, based on my present need, which is just to search for string patterns in a database's words, I wrote a "pcrTest" routine (a minimal version of your RegExMatch routine) which would just tell whether match happened or not (i.e. just return True or False), given a string and a pattern.

    Based on the above, for a non-English database (with around 40K unique words), my test results (all times in milliseconds), for pcr, jsr and vbr resp., were as follows:
    ==========
    Run mode   Script engine   pcr       jsr       vbr
    IDE        JScript9        190-220   150-170   330-360
    exe        JScript9        110-130   140-160   110-130
    IDE        JScript         190-220   1500+     330-360
    exe        JScript         110-130   340+      110-130
    ==========
    pcr - using pcrTest, jsr - using ActiveScript, vbr - using VbRegEx

    All timings were obtained using New_C.Timing. Was using GetTickCount earlier. 'New_c.Timing' was greatly convenient, after getting to know about it.

    In my pattern, 2 negative lookaheads and 1 negative lookbehind were present in the case of pcr.
    2 negative lookaheads alone were present in the case of jsr and vbr since they did not support lookbehinds.

    I thought Jscript at least would support lookbehinds, but it did not. Maybe that is also the IE-based Jscript, which does not support lookbehind. I tried to see whether ActiveScript supports V8 for 'language'. It did not seem to. I have to ask Olaf whether there is any way I can make ActiveScript use a flavour of Jscript which supports lookbehinds. I think if that flavour is V8, it would support all other possible features of RegEx too, as in PCRE2.

    Coming to my pattern, since it has an extra lookbehind, that is more overhead (as per my understanding on what I read from net) but yet pcrTest excelled! In the jsr and vbr patterns, I handled the lookbehind using sqlite's substring. As far as I have read and understood from the net, it seems substring is somewhat faster than lookbehind (in the particular case pattern I have considered). If my understanding is correct, then, that adds further more applause to pcrTest!
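To make the lookbehind-versus-substring comparison concrete, here is a small Python sketch. The word list and both patterns are hypothetical (the actual patterns used in the tests above are not shown in this thread). It only demonstrates that a negative lookbehind can sometimes be replaced by a simpler pattern plus a separate substring test, and it makes no claim about speed.

```python
import re

# Hypothetical word list and patterns, chosen only for illustration.
WORDS = ["happy", "unhappy", "veryhappy", "unhappily", "un"]

# Variant 1: an engine with lookbehind support (as PCRE2 has).
WITH_LOOKBEHIND = re.compile(r"(?<!un)happy$")

# Variant 2: an engine without lookbehind; the prefix test is done separately.
WITHOUT_LOOKBEHIND = re.compile(r"happy$")


def match_with_lookbehind(word):
    return WITH_LOOKBEHIND.search(word) is not None


def match_with_substring(word):
    if WITHOUT_LOOKBEHIND.search(word) is None:
        return False
    # Compare the two characters immediately before the trailing "happy".
    return word[-7:-5] != "un"


print([w for w in WORDS if match_with_lookbehind(w)])
print([w for w in WORDS if match_with_substring(w)])
```

Both variants select the same words here. For a less regular pattern the substring rewrite may not be possible at all, which is where lookbehind support matters.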

    Note-1: The pattern was returning around 40 records from the database (with around 40K rows), as result. As written in earlier messages, the pattern was passed to a UDF in a SqlQuery.
    Note-2: Strangely, with substring used, Jscript did not perform well, compared to Jscript9, as one can see from the results.
    Note-3: If and when I get time, I shall remember to run the tests in my Win11 system too and see the results.
    Note-4: The timings include the process of loading the result rows in Krool's FlexGrid too. As of now, I am using Olaf's BindTo methodology (https://www.vbforums.com/showthread....=1#post5514140) for the same. I am yet to explore and use the recent 'FlexGrid' enhancements of Krool's.

    For an English-only database (just a test database), my test results (all times in milliseconds), for pcr, jsr and vbr resp., were as follows:
    ==========
    Run mode   Script engine   pcr       jsr       vbr
    IDE        JScript9        700-720   680-700   790-810
    exe        JScript9        570-590   630-640   550-570
    exe        JScript         570-590   640-670   550-570
    ==========

    One positive lookahead was present in my pattern for all of pcr, jsr and vbr.
    The pattern was a simple arbitrary pattern, "^(?=xyz).*", on a database of 50001 rows where all records are "xyzabc" except the last one, which is "abcxyz". So, the number of records returned is 50K for the aforesaid pattern.

    Actually, after doing my tests yesterday with Jscript9 alone, I was thinking of sharing just those today. But this morning, after becoming curious about what you really meant by wqweto's tests, I went and saw that portion of the thread. So, I included Jscript also in my tests. By the way, the benchmarking test project provided in post #190 (by dreammanor) gives a very long time of 18+ seconds in my system (Win10, with 16GB RAM) for wqweto's method. I tried twice or thrice. I did not have time to explore to see why. Maybe a small tweak in wqweto's code, perhaps to meet some recent requirements, would do the trick. I don't know. Just as I wrote the aforesaid, I tested it again now and the results (in millisecs) are 4.6, 7.7, 5.6, 7.4, 19.2 for vbr, js9, js, pcr, wqw resp.

    // that would be a compelling reason to update though! //
    Yes indeed. That would be a really compelling reason. In this context, I saw this - https://github.com/jpcre2/jpcre2 - it has a global matching option. Whoever has time (you, wqweto, ...) can look into how it is done in the C++ wrapper and port it to VB6, in case you find that what is done in the C++ wrapper is better than all that has been done so far with regard to VB PCRE2 wrappers. I saw this too - https://www.pcre.org/current/doc/htm...est.html#SEC11 (Finding all matches in a string). So, maybe one or another expert, if and when they find time, can study the code of pcre2Test too and see in what best optimized way it can be ported to VB6.

    // You seem quite smart //
    "Uhoh! I need to handle this". hahaha! . Honestly, I am just a fledgling coder, compared to you all experts. I am just someone trying to make as best use as I can, of you all experts' monumental free sharing of your invaluable expertise, in my projects, and see what best benefits I can provide for free to the society, through my free apps. I wish you all experts were near me (with all the time in the world! ) so that things like what I am asking here and there in the vbforums can be readily provided in a platter (as soon as asked) by experts like you so that the society gets benefited faster and faster. Well, I know that is not possible, as such. So, I chug along (so to say), plodding (so to say), with limited expertise (compared to you all experts), doing whatever best I can, in creating free apps, to help the society, with all of you experts' free help (which is being offered aplenty in our vbForums in so many different ways). And, feel immensely happy (for the opportunity to immensely Thank the Glory of the Love of the Lord Almighty for everything), when benefited users share their heartfelt and hearty happiness.

    // I'm joking of course! //
    Truly, it's always a great feeling when experts like you add a bit of fun in your replies and coding. Particularly, when one is coding alone, with need for complex regexes too sometimes (as in your vb6SourceProcessor), one indeed needs a heavy dose of laughter, now and then. Thanks a lot for providing the same. I have started loving your way of writing like that (Uhoh!) before starting to handle special cases.

    // It's always nice when anything I've put out there gets a bit of use in the real world. //
    Coming to your vsp (vb6SourceProcessor), I have given an initial run to it and felt very glad that it accumulates lines for each and every form, bas, etc. So, I can work on it myself, when I find time, to build that kind of neat table which I mentioned about in one of my earlier posts. Having said that, if ever you find time to enhance your vsp, to meet my and many other users' needs, that would be great too. Recently, I tried RubberDuck but it could not parse my project's codes fully. So, I could not use any of its features. So, that way, your vsp would be of great help for me, in times to come, I believe. Already, the stats it has given me, for my present ongoing project is great and very useful. Thanks a TON. God Bless you. God Bless all.

    In all Humbleness.

    Kind Regards.
    Last edited by softv; Dec 31st, 2023 at 10:21 AM.

  12. #52
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by softv View Post
    For an English-only database (just a test database), my test results (all times in milliseconds), for pcr, jsr and vbr resp., were as follows:
    ==========
    Run mode   Script engine   pcr       jsr       vbr
    IDE        JScript9        700-720   680-700   790-810
    exe        JScript9        570-590   630-640   550-570
    exe        JScript         570-590   640-670   550-570
    ==========

    One positive lookahead was present in my pattern for all of pcr, jsr and vbr.
    The pattern was a simple arbitrary pattern, "^(?=xyz).*", on a database of 50001 rows where all records are "xyzabc" except the last one, which is "abcxyz". So, the number of records returned is 50K for the aforesaid pattern.
    Not sure how to interpret your test results, because I get different timings (for 50,001 records).

    FWIW...
    Here's my JScript9-based UDF:
    Code:
    Option Explicit
    
    Implements RC6.IFunction
    
    Private WithEvents SC As cActiveScript, CO As Object
     
    Private Sub Class_Initialize()
      Set SC = New_c.ActiveScript("JScript9", False, False)
          SC.AddCode "var oRegEx=null;" & vbCrLf & _
                     "function RegExInit(sPat){" & vbCrLf & _
                     "    oRegEx = new RegExp(sPat, 'ig')" & vbCrLf & _
                     "}" & vbCrLf & _
                     "function RegExSearch(s){" & vbCrLf & _
                     "    return (s.search(oRegEx) != -1)" & vbCrLf & _
                     "}"
      Set CO = SC.CodeObject 'func-calls work fastest, when we use the CodeObject
    End Sub
     
    Private Property Get iFunction_DefinedNames() As String
      iFunction_DefinedNames = "RegExp" 'tell SQLite, which functionname we are using
    End Property
    
    Private Sub iFunction_Callback(ByVal ZeroBasedNameIndex As Long, ByVal ParamCount As Long, UDF As cUDFMethods)
      If ParamCount <> 2 Then UDF.SetResultError "RegExp needs two parameters!": Exit Sub
      
      If UDF.GetType(2) = SQLite_NULL Then 'if the second param (the Field or Expression to search) is Null...
         UDF.SetResultNull '...then return a Null here as well
         
      Else 'normal case (no further sanity-checks)
         Static stPat As String
         If stPat <> UDF.GetText(1) Then 'make sure to re-init the JS RegEx object only when the pattern changes
            stPat = UDF.GetText(1): CO.RegExInit UDF.GetText(1)
         End If
         UDF.SetResultInt32 CO.RegExSearch(UDF.GetText(2))
      End If
    End Sub
    The above JScript9-based regexp-matcher then produces similar timing-results to the VBScript-based RegExp-COM-Object implementation.
    (roughly 55msec to find the 50,000 matches) - though keep in mind, that the JScript9-regexp is more powerful regarding "support for complex patterns".

    Here's InMemory-DB-based Test-Form-Code:
    Code:
    Option Explicit
    
    Private Cnn As cConnection, Rs As cRecordset
    
    Private Sub Form_Load()
     
      Set Cnn = New_c.Connection(, DBCreateInMemory)
          Cnn.AddUserDefinedFunction New cRegExpUDF2
      
      Cnn.Execute "Create Table T(ID Integer Primary Key, Fld1 Text)"
      Dim i As Long
      For i = 1 To 50000
        Cnn.ExecCmd "Insert Into T(Fld1) Values(?)", "xyzabc"
      Next
      Cnn.ExecCmd "Insert Into T(Fld1) Values(?)", "abcxyz" 'make it 50001 records
    End Sub
     
    Private Sub Form_Click()
      New_c.Timing True
        Set Rs = Cnn.GetRs("Select Count(*) From T Where Fld1 RegExp ?", "^(?=xyz).*")
      Me.Caption = "FoundMatches: " & Rs(0).Value & New_c.Timing
    End Sub
    Sorry to jpbro, for posting all this regexp-stuff in this thread here...

    Olaf

  13. #53
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    234

    Re: VB6 Automated Source Code Processing Helper

    Following is an extract from https://www.vbforums.com/showthread....=1#post5627191 (post #286)
    --
    Edit-1:
    Just now, after posting the above, I checked out my referred post in JpBro's thread and I notice a new reply therein, from you!!!, Olaf. I will now go and read that fully.

    Edit-2:
    First of all, as ever, thanks a lot for your reply in JpBro's thread. I had not made that init/reinit optimisation (as in your code) in 'pcrTest' either. So, I started to make them (so that my future benchmarking will be on even scales). And, "to whatever level I could" do^ the optimisation, with my limited knowledge, and "to whatever level I have tested so far", as per my initial findings, I see a dramatic increase in processing speed in JpBro's pcrTest, w.r.t the non-English database. Once I have done a thorough testing for both non-English and English databases, I will share my results in JpBro's thread. At that time, I would request you to kindly help me know (if possible and if and when you find time) the ideal manner in which the pcrTest code shall be, so that it will carry the highest optimisation. In case my initial findings prove later to be not correct, I shall inform that also here. Thank you soooooo much once again for taking time, even amidst your hectic schedules, to give such clear-cut code snippets. Remaining in all humbleness, as always.
    (^) I don't know whether I have handled the memory free-ings correctly. That's one more reason for which I have requested your kind help above. I wanted to post pcrTest code in my last reply in JpBro's thread itself but forgot.
    --

    With reference to the above, dear Olaf, I have tested more and so far I have not found any changes in my initial findings.
    JpBro's 'pcrTest' continues to be much faster than it was earlier, w.r.t the non-English database (even if I use substring instead of lookbehind). W.r.t the English database, I found 'pcrTest' to have become slightly faster. However, I would definitely like to test for some more time tomorrow. So, I shall post my revised test-results and also the 'pcrTest' code, some time tomorrow. Kindly please bear with me until then.

    Thanks once again to both JpBro and you for all the help you both have rendered so far.

    In all humbleness.

    Kind regards.

  14. #54

    Thread Starter
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: VB6 Automated Source Code Processing Helper

    I've put together a project with slightly modified version of Olaf's test code (changed so that only 25,000 of the 50,000 rows will match just to simulate a more realistic workload). I've also added in my modPcre2.bas and modRegex.bas files, with a slight modification to cache the compiled regex handles and only cleanup/rebuild when the pattern changes (or you pass an empty pattern to force a cleanup) - thanks to Olaf for inspiring that change with his Jscript9 example.

    Here it is:

    Jscript9VsPcre2Test.zip
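
    Roughly, the caching idea described above looks like the sketch below. It is written in JavaScript (the language of the JScript9 snippet earlier in the thread) purely as an illustration and is not the real modRegex.bas code: compilePattern and freeHandle are hypothetical stand-ins for the native compile/free calls, and the counters exist only to show how often each one runs.

```javascript
// Hypothetical stand-ins for the native compile/free calls; they only count calls.
var compileCount = 0, freeCount = 0;
function compilePattern(pattern) {
  var regex = new RegExp(pattern, "i");   // throws on an invalid pattern
  compileCount++;
  return { regex: regex };                // the "handle"
}
function freeHandle(handle) {
  freeCount++;
  handle.regex = null;
}

// Single-slot cache: keep the last compiled handle, rebuild only when the pattern changes.
var cachedPattern = "";
var cachedHandle = null;

function cachedTest(text, pattern) {
  if (pattern !== cachedPattern) {
    // Pattern changed (or an empty pattern was passed to force a cleanup):
    // free the old handle before doing anything else.
    if (cachedHandle !== null) { freeHandle(cachedHandle); cachedHandle = null; }
    cachedPattern = "";
    if (pattern !== "") {
      cachedHandle = compilePattern(pattern);
      cachedPattern = pattern;            // remember it only after a successful compile
    }
  }
  if (cachedHandle === null) return false; // nothing compiled (empty pattern)
  return cachedHandle.regex.test(text);
}

// Usage: three rows, one pattern, then an empty pattern to force the cleanup.
var sampleRows = ["xyzabc", "abcxyz", "xyzabc"];
var hits = 0;
for (var i = 0; i < sampleRows.length; i++) {
  if (cachedTest(sampleRows[i], "^(?=xyz).*")) hits++;
}
cachedTest("", "");
console.log(hits, compileCount, freeCount);
```

    The point of the single slot is that a query which calls the function once per row with the same pattern pays for one compile and one free in total, instead of one of each per row.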


    After compiling with all the usual optimizations selected, PCRE2 gets ~2x the performance of JScript9 for me. Assuming I haven't made a mistake in my test code, it would be interesting to see what results you guys are getting:

    [Attached image: 2024-01-01_16-49-05.jpg]


    Quote Originally Posted by Schmidt View Post
    Sorry to jpbro, for posting all this regexp-stuff in this thread here...

    Olaf
    No problem at all. Though maybe it would have been better to have this conversation in the VBPCRE2 thread.

  15. #55
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by jpbro View Post
    After compiling with all the usual optimizations selected, PCRE2 gets ~2x the performance of JScript9 for me.

    Assuming I haven't made a mistake in my test code, it would be interesting to see what results you guys are getting:
    Using your demo I get the same about factor 2 relation (in IDE/PCode ~1.5):
    JScript9: 57msec
    PCRE2: 24msec

    Have first wondered, why there was no logic for "Pattern-Precompile/Caching" in your UDF-Class,
    but then found it implemented directly in your *.bas-modules match-function...

    So, there's not much more left over to optimize for the both cases I guess...
    (unless the pcre-folks have worked wonders, and sped it up in recent years by another "factor x").

    A regex-engine which is "making waves" in recent benchmarks, is the rust-crate "rure" -
    Though have only found 64bit win-binaries in a longer google-session ...
    (so far tried to avoid installing rust, compiling it myself)

    Olaf

  16. #56

    Thread Starter
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by Schmidt View Post
    Using your demo I get the same about factor 2 relation (in IDE/PCode ~1.5):
    JScript9: 57msec
    PCRE2: 24msec
    Thanks for confirming the results.


    Quote Originally Posted by Schmidt View Post
    Have first wondered, why there was no logic for "Pattern-Precompile/Caching" in your UDF-Class,
    but then found it implemented directly in your *.bas-modules match-function...
    Yeah, I figured it was a good feature to have in the base module, so I put it in there.

    Quote Originally Posted by Schmidt View Post
    So, there's not much more left over to optimize for the both cases I guess...
    (unless the pcre-folks have worked wonders, and sped it up in recent years by another "factor x").
    I took a dive through the PCRE2 release notes, and didn't see much related to performance, so I suspect you are right.

    Quote Originally Posted by Schmidt View Post
    A regex-engine which is "making waves" in recent benchmarks, is the rust-crate "rure" -
    Though have only found 64bit win-binaries in a longer google-session ...
    (so far tried to avoid installing rust, compiling it myself)
    Thanks for the info, I'd not heard of "rure" before. I looked at the source and it doesn't seem too big (<700 lines?) unless I've missed something. I wonder if it would be possible to re-implement it in VB6 or if it makes use of any Rust language features that would make that impossible (I don't know enough about Rust to say)... I might be crazy enough to give it a shot if I can understand the Rust code well enough.

  17. #57

    Thread Starter
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: VB6 Automated Source Code Processing Helper

    Oh nevermind - if the regex-automata stuff is part of the whole shebang, then it's a heck of a lot of code.

  18. #58

    Thread Starter
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by jpbro View Post
    I looked at the source and it doesn't seem too big (<700 lines?) unless I've missed something. I wonder if it would be possible to re-implement it in VB6 or if it makes use of any Rust language features that would make that impossible (I don't know enough about Rust to say)...
    LMAO I was looking at the rure.rs in the C API subfolder! I knew I was being optimistic :P I guess we could hope for a VB6 wrapper around this size, but we'd need a 32-bit DLL. Wish I had heard about this before the Christmas break, it might have been fun to try to build it and see how it performs.

  19. #59

    Thread Starter
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by softv View Post
    There is a clear winner here, on the eve of the New Year. And, that is you, my dear JpBro! Well, based on "my own present needs", as far as "my own benchmarking tests" go, you have won indeed, so far.
    Woohoo, I won something! JK, glad that something I worked on combined with your own study and work, produced a viable solution to your problem. You might want to replace modRegex.bas with the latest version in post #54 to see if it performs even better for you.

    Quote Originally Posted by softv View Post
    Well, first of all, thanks for replying, so promptly again. I wonder how experts like you get time to reply so readily and promptly!

    Happy to help/reply as quickly as I can. As you can see by the previous couple of "stupid mistake" posts, I'm not sure that I'm an expert yet, especially compared to some of the gurus that frequent this forum. That said, I can't speak for them, but I personally take great enjoyment in developing new knowledge while helping people here, especially when they are inquisitive and show a propensity to internalize advice and go off and solve their specific problems themselves based on that advice, like you have been doing. Beyond the new stuff I learn personally, your investment and exploration of the issue shows growth, and that makes my time invested feel doubly worth it.

    Quote Originally Posted by softv View Post
    Actually, based on my present need, which is just to search for string patterns in a database's words, I wrote a "pcrTest" routine (a minimal version of your RegExMatch routine) which would just tell whether match happened or not (i.e. just return True or False), given a string and a pattern.

    Based on the above, for a non-English database (with around 40K unique words), my test results (all times in milliseconds), for pcr, jsr and vbr resp., were as follows:
    ==========
    190-220, 150-170, 330-360 (when run in IDE)
    110-130, 140-160, 110-130 (when run as exe) with Jscript9

    190-220, 1500+, 330-360 (when run in IDE) with Jscript
    110-130, 340+, 110-130 (when run as exe)
    ==========
    pcr - using pcrTest, jsr - using ActiveScript, vbr - using VbRegEx

    All timings were obtained using New_C.Timing. Was using GetTickCount earlier. 'New_c.Timing' was greatly convenient, after getting to know about it.
    There are some funky timings there, but I'd be interested to see your timing comparisons with my modified version of Olaf's test project in post #54 (https://www.vbforums.com/showthread.php?869555-VB6-Automated-Source-Code-Processing-Helper&p=5627256&viewfull=1#post5627256).
    I'm reading through the rest of your notes, but I have to take a break so I might not respond until tomorrow.

    Happy New Year BTW

  20. #60
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    234

    Re: VB6 Automated Source Code Processing Helper

    Dear JpBro,

    Happy Happy Happy New Year.

    I feel joyous that my findings gave you happiness. Thanks for all your kind words. You should have been near me to see how happy I was, reading them. Ultimately, that was yet another opportunity for me to thank the Lord Almighty. It also made me feel that, finally finally finally, "perhaps" I can place one humble request of mine to a vbForum expert. About the same later.

    Now, going through the recent replies from you and Olaf, I am not sure whether you read my post #53 from yesterday - https://www.vbforums.com/showthread....=1#post5627239 - if not read yet, kindly please read. Therein I have written that, based on Olaf's wonderful tips, it became necessary for me to write my own optimized code for 'pcrTest', and that led 'pcrTest' to run dramatically faster. I have also written that I did not know whether all the optimization I did was done in the correct manner, since my coding proficiency is limited (compared to all you experts).

    Well, in the context of all that I have written in my post #53, I share the timings of my test-results below, first, and then the 'pcrTest' code. All timings given below are average timings (on running the same tests quite a few times and observing the timings). The timings are in milliseconds.

    For non-English Database (for pcr, jsr and vbr resp.)
    --
    63, 116, 69 (when I use substring in pcrTest pattern too so that the testing of "pcr,jsr,vbr" is on even scales)
    55, 116, 69 (wow! this is when I use lookbehind instead of substring in pcrTest pattern)
    --

    If 63ms was great, 55ms is even better. Really fantastic. Outstanding. Because it all now looks instantaneous here for me to see Krool's flxGrid display the required records (based on Olaf's magical 'BindTo' method) in just 55 milliseconds. Instantaneousness is always the case whenever matching records are fewer for a particular 'search string, pattern string' pair, which is a great feel visually. Thanks, JpBro.

    For English Database (for pcr, jsr and vbr resp.)
    --
    520, 600, 520
    --

    For the cnt(*) query of Olaf, just comparing pcr and jsr alone, resp.
    --
    52, 115
    --

    The code of 'pcrTest'
    pcrTest's code is at the bottom of this message, to which I have not made any changes since my post #53. I thought I will make them (based on your own revised optimised code), after hearing your/Olaf's comments on my own optimized code of 'pcrTest'.

    Actually, I started doing the optimization going by the 'Static' way only. But, later realised, that it will necessitate me to have a statement like "If p_RegexToMatch = vbNullString" (as in your code). So, I decided to just move the handles' declarations to the top of the module and have a separate cleanup routine (pcrCleanup). I decided to call this cleanup routine immediately after every call that is made to pcrTest in my project. In other words, this cleanup routine would be called only after the SqlQuery completes and fetches all the records to the flxGrid. And, I shall remember to call this cleanup routine every time pcrTest is called. I read in the internet that if IDE or app stops abruptly (for one or other reason), then unfreed memory handles will remain unfreed. So, I decided to call cleanup immediately after a call to pcrTest.
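
    A minimal sketch of that approach (module-level handle, separate cleanup routine, cleanup called once the whole query has finished) is given below. It is in JavaScript only for illustration, and openHandle, closeHandle, pcrTestSketch, cleanupSketch and countMatches are hypothetical names, not part of any real module. It makes one addition beyond what is described above: the previous handle is freed when the pattern changes, since otherwise every new pattern would leave the old handle allocated.

```javascript
// Hypothetical stand-ins for a native handle that must be freed explicitly.
var liveHandles = 0;
function openHandle(pattern) {
  var regex = new RegExp(pattern, "i");   // throws on an invalid pattern
  liveHandles++;
  return { regex: regex };
}
function closeHandle(handle) {
  liveHandles--;
  handle.regex = null;
}

var moduleHandle = null;   // "module-level" handle, shared across calls
var modulePattern = "";

function cleanupSketch() {
  if (moduleHandle !== null) { closeHandle(moduleHandle); moduleHandle = null; }
  modulePattern = "";
}

function pcrTestSketch(text, pattern) {
  if (moduleHandle === null || pattern !== modulePattern) {
    cleanupSketch();                      // free the previous handle before replacing it
    moduleHandle = openHandle(pattern);
    modulePattern = pattern;
  }
  return moduleHandle.regex.test(text);
}

// Run a whole "query" and always clean up afterwards, even if matching throws.
function countMatches(rows, pattern) {
  var count = 0;
  try {
    for (var i = 0; i < rows.length; i++) {
      if (pcrTestSketch(rows[i], pattern)) count++;
    }
  } finally {
    cleanupSketch();
  }
  return count;
}
```

    With this shape the caller cannot forget the cleanup, because it lives next to the loop that needs the handle rather than at every call site.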

    So, my question (to both of you, dear JpBro and dear Olaf) is whether the above-explained approach of mine is also correct in every way? If it is correct, then, shall I go ahead with it? Because, that would obviate the need to have a line like "If p_RegexToMatch = vbNullString" (which may consume a few milliseconds, I felt, esp. for large number of records).

    But, on the other hand, if my approach is not correct and handles should never be declared at the module level (for some reason or other), and 'Static' is the right way to go about it, let me know the same too. Also, educate me on why handles should not be declared at the module level and cleaned up with a separate call, as done by me.

    I noticed one thing. You are cleaning up the handles, JpBro, whenever the pattern changes. I have not done it yet in my pcrTest. Either I missed it (OR) I had a different idea about these handles at the time of coding.

    So, if my approach is correct:
    --
    1. The one change I would have to do is to add the 'cleanup' call whenever pattern changes. Right? Kindly confirm. Also, I have passed "StrPtr(p_TextToSearch)" directly to the function call. Is that okay? Or, should I necessarily store it in l_StrPtr first, as you have done? If so, can those 2 lines involving l_StrPtr be optimized in any way?

    2. As such, if and when you and/or olaf find time, you may kindly please correct all the mistakes in my code and give me your own version of 'pcrTest' with the highest/ideal optimization.
    --

    By the way, olaf, in your code, for my testing needs, I changed the line inside the 'RegExSearch' function to "return (oRegEx.test(s))"
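
    One thing to watch with that change, assuming JScript9 follows standard ECMAScript behaviour here (not verified inside JScript9 in this thread): the RegExp in that snippet is created with the 'ig' flags, and with the 'g' flag test() remembers where the last match ended (lastIndex) and starts the next call from there, even on a different string. search() ignores that state. The sketch below uses made-up variable names and four identical rows to show the difference, plus two ways around it.

```javascript
var pat = "^(?=xyz).*";
var rows = ["xyzabc", "xyzabc", "xyzabc", "xyzabc"];

function countWith(matchFn) {
  var n = 0;
  for (var i = 0; i < rows.length; i++) {
    if (matchFn(rows[i])) n++;
  }
  return n;
}

// Original form: search() with an 'ig' RegExp.
var reSearch = new RegExp(pat, "ig");
var viaSearch = countWith(function (s) { return s.search(reSearch) !== -1; });

// Changed form: test() with the same 'ig' flags keeps lastIndex between calls.
var reTestGlobal = new RegExp(pat, "ig");
var viaTestGlobal = countWith(function (s) { return reTestGlobal.test(s); });

// Option 1: drop the 'g' flag, so test() keeps no state.
var reTestPlain = new RegExp(pat, "i");
var viaTestPlain = countWith(function (s) { return reTestPlain.test(s); });

// Option 2: keep 'g' but reset lastIndex before every call.
var reTestReset = new RegExp(pat, "ig");
var viaTestReset = countWith(function (s) {
  reTestReset.lastIndex = 0;
  return reTestReset.test(s);
});

console.log(viaSearch, viaTestGlobal, viaTestPlain, viaTestReset); // 4 2 4 4
```

    So if test() is used, it is probably safer to create the RegExp with 'i' only, otherwise every second matching row can be reported as a non-match.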


    // Woohoo, I won something! JK, glad that something I worked on combined with your own study and work, produced a viable solution to your problem. You might want to replace modRegex.bas with the latest version in post #54 to see if it performs even better for you. ... .. . I personally take great enjoyment in developing new knowledge while helping people here, especially when they are inquisitive and show a propensity to internalize advice and go off and solve their specific problems themselves based on that advice like you have been doing . Beyond the new stuff I learn personally, your investment and exploration of the issue shows growth, and that makes my time invested feel doubly worth it. //

    I am happy that you appreciate all that I am trying to do as a coder - to study and check out certain things on my own too, so that apart from you experts' invaluable times being saved (to whatever extent), I learn more too, both from experts like you and from the internet also, and more importantly pass on the benefits of those learnings to the society through my free apps. In this particular case of RegEx-ing, I am happy it helped me find you as a winner too, JpBro. All this benchmarking actually started because there was no lookbehind in vbr! Otherwise, obviously, even a few days back, I had no idea of doing this kind of benchmarking at all. And, how the lack of lookbehind in vbr has helped in so many ways, ultimately helped the society! And, giving me one more opportunity to keep thanking the Lord Almighty.

    Seeing the way you write with such personal touch and joy, so encouragingly, so kindly, so humbly, sprinkled with humour too at places, - all of which goes along with my wavelength, I thought "perhaps" I can place the request (mentioned by me at the start of this message) to you. In that regard, is it okay for me to send a 'private message' to you via VbForums, introducing myself? (since that is one way I know, as of now, of sending personal messages to you, which I believe will pave the way for "greater and faster" benefits to the society in times to come). If not okay, absolutely no problems. Actually, I have thought now and then in the past as to whether I shall request any one of the vbForum experts as to whether it is all right for me to send personal messages to them. But, I resisted, because you are all working under hectic schedules, I know. So, it would be quite unfair to even think of such a request. So, so far, I have never placed such a request to anyone, expert or otherwise. This is the first time ever I am placing a request of this sort.

    Happy Happy Happy New Year, once again, to you, dear JpBro.

    In all humbleness.

    Kind Regards.

    Code:
    Option Explicit
    
    Public Type Matches
       FoundMatch As Boolean
       MatchStart As Long
       MatchLen As Long
       Match As String
       SubMatchCount As Long
       SubMatches() As String
    End Type
    
    Private Declare Sub win32_CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory" (ByRef Destination As Any, ByRef Source As Any, ByVal Length As Long)
    
    Private l_CompiledContextHandle As Long
    Private l_CompiledRegexHandle As Long
    Private l_MatchDataHandle As Long
    Private sPatPrev As String
    '
    Public Function RegexSplit(ByVal p_TextToSplit As String, Optional ByVal p_RegexToMatch As String) As String()
       Dim la_Split() As String
       Dim l_NextIndex As Long
       
       Do
          With RegExMatch(p_TextToSplit, p_RegexToMatch)
             If .FoundMatch Then
                ' Found a match
                
                ' Make sure we have enough space for text in our result array
                If l_NextIndex = 0 Then
                   ReDim la_Split(99)
                ElseIf l_NextIndex > UBound(la_Split) Then
                   ReDim Preserve la_Split(l_NextIndex * 2)
                End If
                
                la_Split(l_NextIndex) = Left$(p_TextToSplit, .MatchStart - 1)
                p_TextToSplit = Mid$(p_TextToSplit, .MatchStart + .MatchLen)
                
                l_NextIndex = l_NextIndex + 1
             
             Else
                ' No match found, exit loop
                Exit Do
             End If
          End With
       Loop
       
       If l_NextIndex = 0 Then
          ReDim la_Split(0)
          la_Split(0) = p_TextToSplit
       Else
          If Len(p_TextToSplit) Then
             If UBound(la_Split) < l_NextIndex Then ReDim Preserve la_Split(l_NextIndex)
             
             la_Split(l_NextIndex) = p_TextToSplit
             
             l_NextIndex = l_NextIndex + 1
          End If
          
          ReDim Preserve la_Split(l_NextIndex - 1)
       End If
       
       RegexSplit = la_Split
    End Function
    
    Public Function RegExMatch(ByVal p_TextToSearch As String, Optional ByVal p_RegexToMatch As String, Optional ByVal p_CaseSensitive As Boolean = False) As Matches
       ' Returns a Matches UDT
       
       ' If .FoundMatch = False then no matches were found
       ' If .FoundMatch = True then:
       ' A match was found (with possible submatches depending on the regex).
       '    The full matched text will be stored in .Match as a string
       '    If there are sub-matches, then SubMatch count will be > 0
       '    You can retrieve sub-matches from the .SubMatches member using one-based indexing
       '    so .SubMatches(1) will return sub-match #1, .SubMatches(2) will return sub-match #2, etc...
       '    If .SubMatchCount = 0 then .SubMatches will not be dimensioned, so do not try to access it.
       
       Dim l_CompiledContextHandle As Long
       Dim l_CompiledRegexHandle As Long
       Dim l_MatchDataHandle As Long
       Dim l_MatchContextHandle As Long
       
       Dim l_ErrorNumber As Long
       Dim l_ErrorDesc As String
       Dim l_MatchCount As Long
       Dim l_OvectorPtr As Long
       Dim la_Ovector() As Long
       Dim l_StrPtr As Long
       Dim l_ErrorCode As Long
       Dim l_ErrorPosition As Long
       Dim l_MatchStart As Long
       Dim l_MatchLen As Long
       Dim l_Flags As Long
       
       Dim ii As Long ' Loop counter
          
       'On Error GoTo ErrorHandler
       
       l_CompiledContextHandle = pcre2_compile_context_create(0)
       If l_CompiledContextHandle = 0 Then Err.Raise vbObjectError, , "Could not compile PCRE context! Last DLL Error: " & Err.LastDllError
       
       If Not p_CaseSensitive Then
          l_Flags = PCRE_CO_CASELESS Or PCRE_CO_MULTILINE
       End If
       l_CompiledRegexHandle = pcre2_compile(StrPtr(p_RegexToMatch), Len(p_RegexToMatch), l_Flags, l_ErrorCode, l_ErrorPosition, l_CompiledContextHandle)
       If l_CompiledRegexHandle = 0 Then Err.Raise vbObjectError, , "Could not compile regex! Regex: " & p_RegexToMatch & vbNewLine & "Errorcode: " & l_ErrorCode & ", Error Position: " & l_ErrorPosition
       
       l_MatchDataHandle = pcre2_match_data_create_from_pattern(l_CompiledRegexHandle, 0)
       If l_MatchDataHandle = 0 Then Err.Raise vbObjectError, , "Could not allocate match data! Last DLL Error: " & Err.LastDllError
       
       l_StrPtr = StrPtr(p_TextToSearch)
       If l_StrPtr = 0 Then l_StrPtr = StrPtr("")
       
       l_MatchCount = pcre2_match(l_CompiledRegexHandle, l_StrPtr, Len(p_TextToSearch), 0, 0, l_MatchDataHandle, l_MatchContextHandle)
       
       Select Case l_MatchCount
       Case PCRE2_ERROR_NOMATCH
          ' No matches, that's normal :)
       
       Case Is > 0
          ' Number of matches, store information about matches
          l_OvectorPtr = pcre2_get_ovector_pointer(l_MatchDataHandle)
          
          If l_OvectorPtr = 0 Then
             ' Shouldn't happen!
             Err.Raise vbObjectError, , "Ovector pointer could not be retrieved!"
          End If
          
          win32_CopyMemory l_MatchStart, ByVal l_OvectorPtr, 4
          win32_CopyMemory l_MatchLen, ByVal (l_OvectorPtr + 4), 4
          l_MatchLen = l_MatchLen - l_MatchStart
          
          ReDim la_Ovector(2 * l_MatchCount - 1)
    
    
          win32_CopyMemory la_Ovector(0), ByVal l_OvectorPtr, 2 * l_MatchCount * 4
          
          With RegExMatch
             .FoundMatch = l_MatchCount
             .MatchStart = la_Ovector(0) + 1
             .MatchLen = la_Ovector(1) - la_Ovector(0)
             .Match = Mid$(p_TextToSearch, .MatchStart, .MatchLen)
             
             .SubMatchCount = l_MatchCount - 1
             
             If l_MatchCount > 1 Then
                ReDim .SubMatches(1 To l_MatchCount - 1)
             
                For ii = 1 To l_MatchCount - 1
                   l_MatchStart = la_Ovector(ii * 2) + 1
                   l_MatchLen = la_Ovector(ii * 2 + 1) - l_MatchStart + 1
                   If l_MatchLen > 0 Then
                      .SubMatches(ii) = Mid$(p_TextToSearch, l_MatchStart, l_MatchLen)
                   End If
                Next ii
             End If
          End With
                            
       Case Else
          ' Uhoh! We need to handle these
          Err.Raise vbObjectError - l_MatchCount, , "PCRE Match Error: " & l_MatchCount
          
       End Select
       
    Cleanup:
       'On Error Resume Next
    
       ' Free match data if necessary
       If l_MatchContextHandle <> 0 Then pcre2_match_context_free l_MatchContextHandle: l_MatchContextHandle = 0
       If l_MatchDataHandle <> 0 Then pcre2_match_data_free l_MatchDataHandle: l_MatchDataHandle = 0
       If l_CompiledRegexHandle <> 0 Then pcre2_code_free l_CompiledRegexHandle: l_CompiledRegexHandle = 0
       
       'Free compile context before exiting
       If l_CompiledContextHandle <> 0 Then pcre2_compile_context_free l_CompiledContextHandle: l_CompiledContextHandle = 0
    
    
       If l_ErrorNumber <> 0 Then
          If IsPcre2ErrorCode(l_ErrorNumber) Then
             l_ErrorDesc = l_ErrorDesc & vbNewLine & "PCRE2 Error Message: " & GetPcre2ErrorMessage(l_ErrorNumber)
          Else
             If IsPcre2ErrorCode(vbObjectError - l_ErrorNumber) Then
                l_ErrorDesc = l_ErrorDesc & vbNewLine & "PCRE2 Error Message: " & GetPcre2ErrorMessage(vbObjectError - l_ErrorNumber)
             End If
          End If
          
          On Error GoTo 0
          Err.Raise l_ErrorNumber, , l_ErrorDesc
       End If
    
       Exit Function
    
    ErrorHandler:
       l_ErrorNumber = Err.Number
       l_ErrorDesc = Err.Description
       
       Debug.Assert False
       Resume Cleanup
       
    End Function
    
    Private Function IsPcre2ErrorCode(ByVal p_ErrorCode As Long) As Boolean
       IsPcre2ErrorCode = (p_ErrorCode <= [_PCRE_RC_ERROR_FIRST] And p_ErrorCode >= [_PCRE_RC_ERROR_LAST])
    End Function
    
    Private Function GetPcre2ErrorMessage(ByVal p_ErrorCode As Long) As String
       Dim l_BufferLength As Long
       Dim l_Buffer As String
       Dim l_MessageLength As Long
       
       l_BufferLength = 256
       
       Do
          l_Buffer = Space$(l_BufferLength)
          
          l_MessageLength = pcre2_get_error_message(p_ErrorCode, StrPtr(l_Buffer), l_BufferLength)
          
          If l_MessageLength < 0 Then
             Select Case l_MessageLength
             Case PCRE_RC_ERROR_NOMEMORY
                ' Buffer too small
                l_BufferLength = l_BufferLength * 2
             Case PCRE_RC_ERROR_BADDATA
                ' Bad error code
                
                Exit Do
             Case Else
                Debug.Assert False
                Exit Do
                
             End Select
          End If
       Loop While l_MessageLength < 0
       
       If l_MessageLength < 0 Then
          GetPcre2ErrorMessage = "Unknown error #" & p_ErrorCode & ", PCRE2 error message result #" & l_MessageLength
       Else
          GetPcre2ErrorMessage = Left$(l_Buffer, l_MessageLength)
       End If
    End Function
    '
    Public Function PcrTest(ByVal p_TextToSearch As String, Optional ByVal p_RegexToMatch As String, Optional ByVal p_CaseSensitive As Boolean = False) As Boolean
       ' Returns True if p_TextToSearch matches p_RegexToMatch, otherwise False.
       ' (Minimal version of RegExMatch: no match text or sub-matches are returned.)
       ' The compiled handles are kept at module level and reused while the pattern is unchanged;
       ' call pcrCleanup to free them.
       ' Note: matching is always case-insensitive here; p_CaseSensitive is currently ignored.
       
       'Static l_CompiledContextHandle As Long
       'Static l_CompiledRegexHandle As Long
       
       'Static l_MatchDataHandle As Long
       
       Dim l_MatchContextHandle As Long
       
       Dim l_ErrorNumber As Long
       Dim l_ErrorDesc As String
       Dim l_MatchCount As Long
       'Dim l_OvectorPtr As Long
       'Dim la_Ovector() As Long
       ''''''''''''''''''''''''''''''''''''''''''''Dim l_StrPtr As Long
       Dim l_ErrorCode As Long
       Dim l_ErrorPosition As Long
       'Dim l_MatchStart As Long
       'Dim l_MatchLen As Long
       Dim l_Flags As Long
       
       '''''Dim ii As Long ' Loop counter
          
       On Error GoTo ErrorHandler
             
       'If Trim$(p_TextToSearch) = vbNullString Then
         'Exit Sub
       'End If
             
       If sPatPrev <> p_RegexToMatch Then
         'If Not p_CaseSensitive Then
         'End If
         l_Flags = PCRE_CO_CASELESS
         sPatPrev = p_RegexToMatch
         l_CompiledContextHandle = pcre2_compile_context_create(0)
         If l_CompiledContextHandle = 0 Then Err.Raise vbObjectError, , "Could not compile PCRE context! Last DLL Error: " & Err.LastDllError
         l_CompiledRegexHandle = pcre2_compile(StrPtr(p_RegexToMatch), Len(p_RegexToMatch), l_Flags, l_ErrorCode, l_ErrorPosition, l_CompiledContextHandle)
         If l_CompiledRegexHandle = 0 Then Err.Raise vbObjectError, , "Could not compile regex! Regex: " & p_RegexToMatch & vbNewLine & "Errorcode: " & l_ErrorCode & ", Error Position: " & l_ErrorPosition
         l_MatchDataHandle = pcre2_match_data_create_from_pattern(l_CompiledRegexHandle, 0)
         If l_MatchDataHandle = 0 Then Err.Raise vbObjectError, , "Could not allocate match data! Last DLL Error: " & Err.LastDllError
       End If
          
       'l_StrPtr = StrPtr(p_TextToSearch)
       '''''If l_StrPtr = 0 Then l_StrPtr = StrPtr("")
       
       '''''l_MatchCount = pcre2_match(l_CompiledRegexHandle, l_StrPtr, Len(p_TextToSearch), 0, 0, l_MatchDataHandle, l_MatchContextHandle)
       l_MatchCount = pcre2_match(l_CompiledRegexHandle, StrPtr(p_TextToSearch), Len(p_TextToSearch), 0, 0, l_MatchDataHandle, l_MatchContextHandle)
       
       Select Case l_MatchCount
       Case PCRE2_ERROR_NOMATCH
         ' No matches, that's normal :)
         ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''pcrTest = False
       Case Is > 0
         ' Number of matches, store information about matches
         PcrTest = True
       Case Else
         ' Uhoh! We need to handle these
         Err.Raise vbObjectError - l_MatchCount, , "PCRE Match Error: " & l_MatchCount
       End Select
       
       '''''''''''''''''''''Exit Function
             
    Cleanup:
       'On Error Resume Next
    
       ' Free match data if necessary
       If l_MatchContextHandle <> 0 Then pcre2_match_context_free l_MatchContextHandle: l_MatchContextHandle = 0
       '''''If l_MatchDataHandle <> 0 Then pcre2_match_data_free l_MatchDataHandle: l_MatchDataHandle = 0
       '''''If l_CompiledRegexHandle <> 0 Then pcre2_code_free l_CompiledRegexHandle: l_CompiledRegexHandle = 0
       
       'Free compile context before exiting
       '''''If l_CompiledContextHandle <> 0 Then pcre2_compile_context_free l_CompiledContextHandle: l_CompiledContextHandle = 0
    
    
       If l_ErrorNumber <> 0 Then
          If IsPcre2ErrorCode(l_ErrorNumber) Then
             l_ErrorDesc = l_ErrorDesc & vbNewLine & "PCRE2 Error Message: " & GetPcre2ErrorMessage(l_ErrorNumber)
          Else
             If IsPcre2ErrorCode(vbObjectError - l_ErrorNumber) Then
                l_ErrorDesc = l_ErrorDesc & vbNewLine & "PCRE2 Error Message: " & GetPcre2ErrorMessage(vbObjectError - l_ErrorNumber)
             End If
          End If
          
          On Error GoTo 0
          Err.Raise l_ErrorNumber, , l_ErrorDesc
       End If
    
       Exit Function
    
    ErrorHandler:
       l_ErrorNumber = Err.Number
       l_ErrorDesc = Err.Description
       
       Debug.Assert False
       Resume Cleanup
       
    End Function
    '
    Public Sub pcrCleanup()
      
      sPatPrev = ""
      
      If l_MatchDataHandle <> 0 Then pcre2_match_data_free l_MatchDataHandle: l_MatchDataHandle = 0
      If l_CompiledRegexHandle <> 0 Then pcre2_code_free l_CompiledRegexHandle: l_CompiledRegexHandle = 0
      If l_CompiledContextHandle <> 0 Then pcre2_compile_context_free l_CompiledContextHandle: l_CompiledContextHandle = 0
      
    End Sub
    '
    Last edited by softv; Jan 2nd, 2024 at 08:53 AM.

  21. #61
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by softv View Post
    By the way, olaf, in your code, for my testing needs, I changed the line inside the 'RegExSearch' function to "return (oRegEx.test(s))"
    The test()-method of the RegExp object keeps state (its lastIndex property) in global mode,
    and can therefore give wrong results when the input-string to search through
    is e.g. identical to the prior one.

    For example, if you fill all 50000 DB-Records in the little test-table T with the same Fld1-Values,
    the Regex-search (in case of a positive match) will not report 50000 records found as it should,
    but only "every second one" (25000).

    That's the reason why I've put the bool-expression (s.search(oRegEx) != -1) into place,
    which does not suffer from that "keep last state" behaviour ...
    (only with that expression in place will the JS9-regexp behave identically to the other regexp-engines in the context of an sqlite-UDF in "g"-mode).
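
    The effect described above can be reproduced outside of SQLite with a few lines of plain JavaScript. This is only a minimal sketch: the helper names countWithTest and countWithSearch are made up for the illustration, and it assumes a standard ECMAScript engine (e.g. Node) rather than the RC6/JScript9 host used in this thread.

```javascript
// Hypothetical helpers, not part of the UDF class discussed in this thread.

// Counts matching strings via RegExp.prototype.test.
// With the 'g' flag, test() starts scanning at re.lastIndex, not at index 0.
function countWithTest(re, inputs) {
  var n = 0;
  for (var i = 0; i < inputs.length; i++) {
    if (re.test(inputs[i])) n++;
  }
  return n;
}

// Counts matching strings via String.prototype.search,
// which always scans from the start of the string.
function countWithSearch(re, inputs) {
  var n = 0;
  for (var i = 0; i < inputs.length; i++) {
    if (inputs[i].search(re) !== -1) n++;
  }
  return n;
}

// Four identical inputs that all match the pattern.
var same = ["xyzabc", "xyzabc", "xyzabc", "xyzabc"];
var viaTest = countWithTest(new RegExp("^(?=xyz).*", "gmi"), same);
var viaSearch = countWithSearch(new RegExp("^(?=xyz).*", "gmi"), same);

console.log("test():   " + viaTest + " of " + same.length);   // 2 of 4
console.log("search(): " + viaSearch + " of " + same.length); // 4 of 4
```

    After a successful test() the regex remembers the end of the match in lastIndex, so the next call on an identical string starts at the end of that string, fails, and resets lastIndex to 0. That is where the "every second one" pattern comes from.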

    Olaf
    Last edited by Schmidt; Jan 2nd, 2024 at 12:21 PM.

  22. #62
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    234

    Re: VB6 Automated Source Code Processing Helper

    // will not report 50000 records found as it should, but only "every second one" (25000). //
    Yes, I did observe yesterday that I was getting only 25000 records, and I puzzled over it for a while. Then I understood that the RegExSearch method was probably returning "every second one". But, strangely, all this was happening when I was using the 'Search' method, as in your code. Having read your reply above now, I don't know why that was happening with the 'Search' method.

    Anyway, since only 25K records were returned, I changed the code to use the Test() method, which was the method I was using earlier anyway (i.e. before you suggested the init/reinit optimisation). I was using Test() earlier since I just wanted a True or False result and did not know how to achieve that through the 'Search' method. I also thought Test would be the faster way to check for a match, since it only returns True or False, and it goes along with my usage of the 'Test' method for vbr and pcr too.

    Both the earlier time when I was using Test and now when I am using Test also, I am receiving 50000 records as result only, correctly.

    Now, after reading what you have written, I am wondering why 'Search' method was returning 25000 records for me and not the 'Test' method. I will check again whether I have made any mistake at my end, when using the 'Search' method. If not, then, when time permits, I shall explore more. Meanwhile, if you are able to deduce what possibly could be happening at my end, you can share the same with me.


    Thanks a TON again for all your expert knowledge-sharing. Learning a lot. If you had not told, I would never have gotten time to explore the exact reason for why I was getting 25000 records.

    I take this opportunity to thank you (well, I have lost count of how many times I have thanked you over the years) for one more thing as well. Seeing the way you have used RegExp as a UDF, I was wondering a lot how RegExp alone was able to accept the 'X RegExp Y' format in its call. It intrigued me a lot. I was thinking of asking you since I could not get any clue on how that was possible. But then, at one point, I found the reason on the internet via this answer - https://stackoverflow.com/questions/...use-in-sqlite3 . I marvelled at the time at your knowing so many interesting/beautiful pieces of information in various languages/technologies. God Bless you, olaf. God Bless all.

    Kind Regards.
    Last edited by softv; Jan 2nd, 2024 at 01:02 PM.
    Love is God. God is Love. As Ever, All Glory and Thanks to the Lord Almighty Only, Forever...

    "You say grace before meals. All right. But I say grace before the concert and the opera, and grace before the play and pantomime, and grace before I open a book, and grace before sketching, painting, swimming, fencing, boxing, walking, playing, dancing and grace before I dip the pen in the ink." - G. K. Chesterton

  23. #63
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: VB6 Automated Source Code Processing Helper

    Just for completeness, I've now also integrated the Scripting-RegExp55 Object into the mix again,
    which now slightly outperforms even pcre2 for "simple patterns" (it was slower before only because I had not given it pattern-caching in my earlier UDF-Classes).
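
    The pattern-caching idea mentioned above can be sketched roughly as follows. This is an illustration in plain JavaScript, not the code of the UDF class: the names matches, cachedPattern, cachedRegex and compileCount are assumptions made up for the sketch, and it deliberately compiles without the 'g' flag so that test() carries no position state between calls.

```javascript
// Hypothetical sketch of pattern caching; all names are illustrative only.
var cachedPattern = null; // text of the pattern compiled last
var cachedRegex = null;   // compiled RegExp belonging to cachedPattern
var compileCount = 0;     // how often a compile actually happened

function matches(pattern, text) {
  if (pattern !== cachedPattern) {   // recompile only when the pattern text changes
    cachedRegex = new RegExp(pattern, "mi");
    cachedPattern = pattern;
    compileCount++;
  }
  return cachedRegex.test(text);     // no 'g' flag, so lastIndex is not used
}

// Typical UDF-like usage: same pattern, many rows.
var rows = ["xyzabc", "abc123", "xyzabc"];
var hits = 0;
for (var i = 0; i < rows.length; i++) {
  if (matches("^(?=xyz).*", rows[i])) hits++;
}
console.log(hits + " hits, " + compileCount + " compile(s)"); // 2 hits, 1 compile(s)
```

    In a query that passes the same pattern for every row, the compile cost is then paid once instead of once per row, which is the whole point of the init/reinit optimisation.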

    Results compiled:


    Here is the enhanced UDF-Class (now supporting all 3 variants)...
    jpbro's project additionally needs a reference to "Microsoft VBScript Regular Expressions 5.5".
    Code:
    Option Explicit
    
    Implements RC6.IFunction
    
    Private SC As cActiveScript, CO As Object, RX As RegExp
     
    Private Sub Class_Initialize()
      Set SC = New_c.ActiveScript("JScript9", False, False)
          SC.AddCode "var oRegEx=null;" & vbCrLf & _
                     "function RegExInit(sPat){" & vbCrLf & _
                     "    oRegEx = new RegExp(sPat, 'gmi')" & vbCrLf & _
                     "}" & vbCrLf & _
                     "function RegExSearch(s){" & vbCrLf & _
                     "    return (s.search(oRegEx) != -1)" & vbCrLf & _
                     "}"
      Set CO = SC.CodeObject 'func-calls work fastest, when we use the CodeObject
      
      Set RX = New RegExp 'create a RegExp-instance
          RX.Global = True: RX.MultiLine = True: RX.IgnoreCase = True  'set its default-state to "/gmi"
    End Sub
     
    Private Sub Class_Terminate()
       modRegex.RegexMatch vbNullString  ' Cleanup handles
    End Sub
    
    Private Property Get iFunction_DefinedNames() As String
      iFunction_DefinedNames = "RegExpJScript9,RegExpPcre2,RegExpVBS" 'tell SQLite, which functionname we are using
    End Property
    
    Private Sub iFunction_Callback(ByVal ZeroBasedNameIndex As Long, ByVal ParamCount As Long, UDF As cUDFMethods)
      Static stPat0 As String, stPat2 As String
      
      If ParamCount <> 2 Then UDF.SetResultError "RegExp needs two parameters!": Exit Sub
      
      If UDF.GetType(2) = SQLite_NULL Then 'if the second param (the Field or Expression to search) is Null...
         UDF.SetResultNull '...then return a Null here as well
         
      Else 'normal case (no further sanity-checks)
        Select Case ZeroBasedNameIndex
          Case 0 ' JScript Regex
            If stPat0 <> UDF.GetText(1) Then 'make sure to re-init the js-RegEx-Object only when the pattern changes
               stPat0 = UDF.GetText(1): CO.RegExInit stPat0
            End If
            UDF.SetResultInt32 CO.RegExSearch(UDF.GetText(2))
            
          Case 1 'PCRE2
            UDF.SetResultInt32 modRegex.RegexMatch(UDF.GetText(2), UDF.GetText(1)).FoundMatch
            
          Case 2 'VBS-RegExp 55
            If stPat2 <> UDF.GetText(1) Then 'make sure to set the vbs-RegEx-pattern only when the pattern changes
               stPat2 = UDF.GetText(1): RX.Pattern = stPat2
            End If
            UDF.SetResultInt32 RX.Test(UDF.GetText(2))
        End Select
      End If
    End Sub
    Form-TestCode:
    Code:
    Option Explicit
    
    Private Cnn As cConnection, Rs As cRecordset
    
    Private Sub Form_Load()
      Set Cnn = New_c.Connection(, DBCreateInMemory)
          Cnn.AddUserDefinedFunction New CRegExpUDF2
      
      Cnn.Execute "Create Table T(ID Integer Primary Key, Fld1 Text)"
      Dim i As Long
      For i = 1 To 50000
        If i Mod 2 = 0 Then
          Cnn.ExecCmd "Insert Into T(Fld1) Values(?)", "abc123"
        Else
          Cnn.ExecCmd "Insert Into T(Fld1) Values(?)", "xyzabc"
        End If
      Next
      Cnn.ExecCmd "Insert Into T(Fld1) Values(?)", "abcxyz" 'make it 50001 records
    End Sub
     
    Private Sub Form_Click()
      Me.Cls
      
      New_c.Timing True
        Set Rs = Cnn.GetRs("Select Count(*) From T Where RegExpVBS(?, Fld1)", "^(?=xyz).*")
      Me.Print "RegVBS FoundMatches: " & Rs(0).Value & New_c.Timing
     
      New_c.Timing True
        Set Rs = Cnn.GetRs("Select Count(*) From T Where RegExpPcre2(?, Fld1)", "^(?=xyz).*")
      Me.Print "PCRE2 FoundMatches: " & Rs(0).Value & New_c.Timing
    
      New_c.Timing True
        Set Rs = Cnn.GetRs("Select Count(*) From T Where RegExpJScript9(?, Fld1)", "^(?=xyz).*")
      Me.Print "JScript9 FoundMatches: " & Rs(0).Value & New_c.Timing
    End Sub
    Olaf

  24. #64
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    234

    Re: VB6 Automated Source Code Processing Helper

    //
    If stPat2 <> UDF.GetText(1) Then 'make sure to set the vbs-RegEx-pattern only when the pattern changes
    stPat2 = UDF.GetText(1): RX.Pattern = stPat2
    End If
    UDF.SetResultInt32 RX.Test(UDF.GetText(2))
    //

    Yes, olaf. The above (as in your code) is exactly the kind of code I have used also, in generating the test-results provided by me in my post #60. I did not mention this explicitly in my aforesaid post. So, I thought I will take this opportunity to mention it now.

    Following your init/reinit suggestion, when I optimised pcrTest, I did the change in vbr code also (as you have done in your above code). Thanks once again.

    Always indebted to all of you experts' kind and helpful guidance. God Bless all.

    Kind Regards.

  25. #65
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: VB6 Automated Source Code Processing Helper

    Quote Originally Posted by softv View Post
    Then I understood that the RegExSearch method was probably returning "every second one". But, strangely, all this was happening when I was using the 'Search' method, as in your code.
    There's something else wrong then... (did you use the correct naming/mapping of UDF-functions and the JS-defined ones)?

    Here's a simple test - directly at js9-level (without SQLite) - which gives proof to the phenomenon, as I observed it:

    Code:
    Private Sub Form_Load()
      Dim SC As cActiveScript, CO As Object, i As Long
      Set SC = New_c.ActiveScript("JScript9", False, False)
          SC.AddCode "var oRegEx=null;" & vbCrLf & _
                     "function RegExInit(sPat){" & vbCrLf & _
                     "    oRegEx = new RegExp(sPat, 'gmi')" & vbCrLf & _
                     "}" & vbCrLf & _
                     "function RegExSearch1(s){" & vbCrLf & _
                     "    return (s.search(oRegEx) != -1)" & vbCrLf & _
                     "}" & vbCrLf & _
                     "function RegExSearch2(s){" & vbCrLf & _
                     "    return oRegEx.test(s)" & vbCrLf & _
                     "}"
      Set CO = SC.CodeObject 'func-calls work fastest, when we use the CodeObject
      
      CO.RegExInit "^(?=xyz).*" 'init (and compile) the pattern in a new oRegEx-js-Obj-instance
      Debug.Print vbLf; "correct behaviour with RegExSearch1; (s.search(oRegEx) != -1):"
      For i = 1 To 4: Debug.Print , CO.RegExSearch1("xyzabc"): Next
     
      CO.RegExInit "^(?=xyz).*" 'init (and compile) the pattern in a new oRegEx-js-Obj-instance
      Debug.Print vbLf; "incorrect behaviour with RegExSearch2; oRegEx.test(s):"
      For i = 1 To 4: Debug.Print , CO.RegExSearch2("xyzabc"): Next
    End Sub
    HTH
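
    If test() is to be kept for some reason, one possible workaround is to reset lastIndex before every call. The sketch below is plain JavaScript for a standard ECMAScript engine; regExTestStateless is a hypothetical name and not one of the functions defined in the test above.

```javascript
// Hypothetical variant of a test()-based search function.
var oRegEx = new RegExp("^(?=xyz).*", "gmi");

function regExTestStateless(s) {
  oRegEx.lastIndex = 0;   // discard the position left behind by the previous call
  return oRegEx.test(s);
}

// Same four identical inputs as in the Debug.Print loops above.
var results = [];
for (var i = 0; i < 4; i++) {
  results.push(regExTestStateless("xyzabc"));
}
console.log(results.join(",")); // true,true,true,true
```

    Compiling the pattern without the 'g' flag would have the same effect, since test() only uses lastIndex for global (or sticky) regexes.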

    Olaf

  26. #66
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    234

    Re: VB6 Automated Source Code Processing Helper

    // Here's a simple test - directly at js9-level (without SQLite) - which gives proof to the phenomenon, as I observed it: //

    Sorry for the delay, Olaf. I could get time to check out your code just a while ago only. Yes, I could observe the phenomenon at my end. THANK you.

    For SQL queries also, 'Search' is now returning 50K records (as UDF in SqlQuery). What made it return 25K records earlier, I don't know.
    As far as 'Test' is concerned, it is also returning 50K records (as UDF in SqlQuery). It always returns correct records for non-English databases too.
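
    One possible explanation for 'Test' still giving correct counts, sketched in plain JavaScript: if matching and non-matching rows alternate (as in the Form-TestCode of post #63; whether the real table looks like that is an assumption), every failed test() resets lastIndex to 0, so the next matching row is scanned from the start again. Only consecutive matching rows are affected. The function name countWithTest is made up for this sketch.

```javascript
// Hypothetical helper: counts rows for which a fresh global regex reports a match via test().
function countWithTest(pattern, rows) {
  var re = new RegExp(pattern, "gmi");
  var n = 0;
  for (var i = 0; i < rows.length; i++) {
    if (re.test(rows[i])) n++;
  }
  return n;
}

// Rows alternating between a matching and a non-matching value.
var alternating = [];
for (var i = 1; i <= 10; i++) {
  alternating.push(i % 2 === 0 ? "abc123" : "xyzabc");
}

// Rows that all contain the same matching value.
var identical = [];
for (var j = 1; j <= 10; j++) {
  identical.push("xyzabc");
}

var altCount = countWithTest("^(?=xyz).*", alternating); // 5, which is the correct count
var sameCount = countWithTest("^(?=xyz).*", identical);  // 5, although all 10 rows match
console.log(altCount, sameCount);
```

    So a table without two matching rows in direct succession would never show the halved count, even with test() in global mode.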

    I shall have options to test both ways. If ever I encounter 25K records in my future tests at any time, I shall let you know.

    Thanks thanks thanks again, Olaf, for taking the time to educate me, coming down to my level (as you always do). Ever thankful.

    Kind Regards.
