-
1 Attachment(s)
VB6: Best way to find all permutations of a given string in a file - Contest
Aloha,
This is a spin-off from another thread where this all began. What we want to do here is take the file given here and search it for every permutation of the string asdf (this file will be updated occasionally). For this test, all 8000 known entries in the 5 MB file are in lower case. That does not mean there are no others: the file was created by a random method, so more may have occurred by chance. The file format is true binary, meaning any byte value from 0-255 can appear in it. The permutations are represented in ASCII. NO UNICODE or any other format is represented in the file.
I have also included in the zip file a file that contains the index of every known permutation in the file and what the permutation is. This file is supplied so that you can verify your work. Remember there may be more permutations in the file than the known ones... and the unknown ones may span a 64k byte boundary.
Once all the entries are in I will create a different type of file that will contain some variations of the permutations and test the entries on that file just to keep everything honest. I will see about getting a prize of some sort for the winner.
The winners will be determined by several methods:
1 - Speed
2 - Understandability (ease of verification)
3 - Readability (how the code was written and commented)
4 - Method used
5 - Originality
Notes:
File must be read and processed in 64kb chunks to allow any size file to be scanned.
Permutation Variations:
Variable length.
Fixed length.
No repeating characters.
Repeating characters.
Your program need not be able to handle all of the given variations but it is a list that you could work from.
I would like to have several categories of winners because speed does not always determine the needed solution (some methods won't be able to generate the exact log file but will still be correct). But you still need to generate some type of log file to verify your claim.
I would like to have this contest (challenge) run for about three months to give everyone a chance to work the problem. It would be nice if your code was self-verifying (some methods won't allow this). This means that it prints out a report of where it found each permutation, which permutation it was, and a count of each permutation, in the format File Offset, Permutation, in sorted order. It would be really nice to have the program check against the provided list to verify itself for the known permutations. I have also supplied the elements in the index file in csv format for this purpose.
Code should be sent ready to execute (please check your submission and make sure that all forms, modules and classes load from the same folder as the project). Code should have timing output using the GetTickCount API so that all timings are consistent and MUST use Option Explicit in all forms/modules/classes. All output should be in the format of the file FindStrSearch included in the zip file. Screen output is also appreciated.
The test file for download is too big for VBForums so you will need to download it from here Permutations File (Updated 4/18/2007).
This file is a simple test with all known permutations totally within each 64k boundary.
After 2 months are up I will deliver a more complex and final file that has more challenges. All code MUST be VB6 code but can use Windows APIs. No external code or calls other than VB6 and Windows APIs will be accepted. This means no custom DLLs written in other languages. You can use any original references and components supplied by Microsoft to support VB6.
You have until July 14, 2007
Code well. If you have some suggestions please let me know...
DO NOT POST YOUR ENTRIES!!!!
WE WANT ALL ENTRIES TO BE ORIGINAL WORK.
All entrants must let me know if they want to enter prior to submitting project...
You will need to email your entries to contest001 at randem dot com.
Source code only.
All code submissions that qualify will be compiled into a project and released on the *********** website as well as in the CodeBank Forum of VBForums for others to download, view and use. No rights to any code will be retained by the submitter. Submitters will be given complete credit for their submission in all postings.
Note: Please use your VBForums name in the submission. I have no idea of who you are without the cross reference.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Does our solution need to be in InStr() form? As in, send it the following four parameters in order?
Starting position
Text to search
Text to find (in any order)
Options (Either a Case Sensitive flag or vbText/vbBinary/vbDatabase)
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
No, the solution can and should be in any workable form that works well and is easy to use to accomplish the task.
I hope to see many variations on how this can be done.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
He said... "Do not post your entries"
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Ellis Dee,
Do you have something to submit?
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Let me get this straight. We have to create a program that searches the given text file for every "asdf" found, then have it write a file (or print a page, or display in textbox?) containing the offset of every "asdf" found?
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
That is correct... But I never said the given file was text.... It's Binary.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Quote:
Originally Posted by randem
That is correct... But I never said the given file was text.... It's Binary.
When you say "permutations" of "asdf," does that include "sdfa," "fads" etc ?
Never mind. I read the index and got the answer.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Shouldn't this go in the Contests Forum?
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
I guess these "contests" should be called "challenges", because each time I tried to pull up a contest in the last two years I was told there will be no contests on these forums (until some other somewhat official contest will come up that still isn't coming). That's why I started off a challenge on this forum instead of a contest, to avoid getting told the same thing all over again.
I guess contests are now ok again...
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
I read over the explanation a couple times and I'm still not 100% sure what we're supposed to do. Maybe because I'm tired.
Are we supposed to scan the binary file and find all permutations of asdf?
What should the function return? The number of permutations found?
Is it supposed to write a log of each permutation (and its index?) found?
What if the user wants to find permutations of something other than asdf?
:confused:
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
You're just writing a program (not a function) that counts the number of permutations. The one I wrote counts them, orders them, and saves them with the permutation and the location. It also found 8136 permutations in the list.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
You are supposed to scan the binary formatted file for any permutation of "asdf". Permutations of "asdf" are the strings that are planted in the file to find.
A permutation of "asdf" is ALL four characters, in any order, appearing consecutively.
The program is supposed to be able to self-verify, meaning that it will match against the supplied list to verify that it found all the "KNOWN" permutations from the list. There may be "UNKNOWN" permutations in the file also. If your program cannot pass this basic test, it will not handle the next, more complex test file.
Write a log of all permutations and the offset into the 64k byte boundary of where it was found along with the count of the times each permutation was found in the file.
If your code is coded correctly it will not matter what the permutation is or the size of the permutation that it needs to find (within reason).
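To illustrate the definition above (all characters of the search string, in any order, consecutively), here is a minimal VB6 sketch of one possible approach, not anyone's contest entry. It slides a window of the pattern's length over a byte array and keeps per-byte counts, so it does not depend on what the pattern is or whether it has repeated characters. The names CountPermutations and Demo are hypothetical, and the Data() array is assumed to be already dimensioned by the caller.

```vb
Option Explicit

' Hypothetical helper: counts the windows in Data() that are a permutation
' of Pattern (each pattern byte used exactly as often as it occurs in Pattern).
' Assumes Data() is dimensioned and Pattern is plain ASCII.
Public Function CountPermutations(Data() As Byte, ByVal Pattern As String) As Long
    Dim Need(0 To 255) As Long   ' how many of each byte the pattern contains
    Dim Have(0 To 255) As Long   ' how many of each byte the window contains
    Dim Pat() As Byte
    Dim N As Long, Lo As Long, i As Long
    Dim b As Long, Matched As Long, Found As Long

    If Len(Pattern) = 0 Then Exit Function
    Pat = StrConv(Pattern, vbFromUnicode)      ' one byte per character
    N = UBound(Pat) + 1
    For i = 0 To N - 1
        Need(Pat(i)) = Need(Pat(i)) + 1
    Next i

    Lo = LBound(Data)
    For i = Lo To UBound(Data)
        ' Byte entering the window on the right
        b = Data(i)
        Have(b) = Have(b) + 1
        If Have(b) <= Need(b) Then Matched = Matched + 1

        ' Byte leaving the window on the left (once the window is full)
        If i - Lo >= N Then
            b = Data(i - N)
            If Have(b) <= Need(b) Then Matched = Matched - 1
            Have(b) = Have(b) - 1
        End If

        ' All N bytes accounted for: the window starting at i - N + 1 is a match
        If Matched = N Then Found = Found + 1
    Next i

    CountPermutations = Found
End Function

Public Sub Demo()
    Dim Bytes() As Byte
    Bytes = StrConv("xxasdfaxx", vbFromUnicode)
    ' "asdf" and "sdfa" overlap here, so this reports 2
    Debug.Print CountPermutations(Bytes, "asdf")
End Sub
```

Because each byte is added and removed once, the work per byte stays constant regardless of the pattern length; recording the offset i - N + 1 instead of only counting would give the positions for a log file.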
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
randem, In the code I sent you, I left a ReDim in there that doesn't need to be there... sorry.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Quote:
Originally Posted by randem
That is correct... But I never said the given file was text.... It's Binary.
And the distinction is? ;)
Can we presume that "asdf" will be ascii encoded?
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
The file will be a true binary formatted file; it will have characters 0-255. No other encoding will be present. No Unicode or any other format will be present. The characters "asdf" will be the ASCII representation of these characters.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Quote:
Originally Posted by randem
Ellis Dee,
Do you have something to submit?
Yes, but what I sent you isn't what you're looking for. I'll submit a real solution in the next couple weeks.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
You ask for each permutation within each 64k block.
Should permutations found on a boundary (ie partially in two blocks) not be counted?
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
They ABSOLUTELY should be counted. As I stated the KNOWN permutations are totally within each 64k boundary. The UNKNOWN permutations can be anywhere, even spanning 64k byte boundaries.
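One common way to catch matches that straddle a 64k boundary, sketched here in VB6 under stated assumptions, is to carry the last N-1 bytes of each chunk over to the front of the next one (N being the pattern length). No window fits entirely inside the carried tail, so nothing is counted twice. CountInFile, IsPermAt and CHUNK_SIZE are hypothetical names, the window check is a simple brute-force one chosen for brevity rather than speed, and LOF limits this sketch to files under 2 GB.

```vb
Option Explicit

Private Const CHUNK_SIZE As Long = 65536   ' 64 KB, as the rules require

' True if the N bytes starting at Buf(Start) are a permutation of the pattern
' whose per-byte counts are in Need(). Returns False on the first surplus byte.
Private Function IsPermAt(Buf() As Byte, ByVal Start As Long, _
                          ByVal N As Long, Need() As Long) As Boolean
    Dim Seen(0 To 255) As Long
    Dim i As Long, b As Long
    For i = Start To Start + N - 1
        b = Buf(i)
        Seen(b) = Seen(b) + 1
        If Seen(b) > Need(b) Then Exit Function
    Next i
    IsPermAt = True
End Function

' Hypothetical sketch: counts permutations of Pattern in FileName, reading
' 64 KB at a time and carrying N-1 bytes across each chunk boundary.
Public Function CountInFile(ByVal FileName As String, ByVal Pattern As String) As Long
    Dim Need(0 To 255) As Long
    Dim Pat() As Byte, Chunk() As Byte, Tail() As Byte, Buf() As Byte
    Dim N As Long, Carry As Long, ThisChunk As Long
    Dim Remaining As Long, i As Long, Found As Long
    Dim f As Integer

    If Len(Pattern) = 0 Then Exit Function
    Pat = StrConv(Pattern, vbFromUnicode)
    N = UBound(Pat) + 1
    For i = 0 To N - 1
        Need(Pat(i)) = Need(Pat(i)) + 1
    Next i

    f = FreeFile
    Open FileName For Binary Access Read As #f
    Remaining = LOF(f)

    Do While Remaining > 0
        ThisChunk = CHUNK_SIZE
        If Remaining < ThisChunk Then ThisChunk = Remaining
        ReDim Chunk(0 To ThisChunk - 1)
        Get #f, , Chunk

        ' Buf = carried tail of the previous chunk + the new chunk
        ReDim Buf(0 To Carry + ThisChunk - 1)
        For i = 0 To Carry - 1
            Buf(i) = Tail(i)
        Next i
        For i = 0 To ThisChunk - 1
            Buf(Carry + i) = Chunk(i)
        Next i

        ' Check every window that fits in Buf
        For i = 0 To UBound(Buf) - N + 1
            If IsPermAt(Buf, i, N, Need) Then Found = Found + 1
        Next i

        ' Keep the last N-1 bytes (or all of Buf if it is shorter than that)
        Carry = N - 1
        If Carry > UBound(Buf) + 1 Then Carry = UBound(Buf) + 1
        If Carry > 0 Then
            ReDim Tail(0 To Carry - 1)
            For i = 0 To Carry - 1
                Tail(i) = Buf(UBound(Buf) - Carry + 1 + i)
            Next i
        End If

        Remaining = Remaining - ThisChunk
    Loop

    Close #f
    CountInFile = Found
End Function
```

To report positions rather than a count, the file offset of Buf(i) can be derived as the number of bytes read before the current chunk, minus the Carry in effect when Buf was built, plus i (zero-based; whether the expected log uses zero- or one-based offsets is not assumed here).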
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Are overlapping matches that share characters with an existing find also counted? I.e. if there is asdfa in the file, will this count as two permutations? And does asdfasdf count as five or only two?
And no, I haven't had time to check the files and probably won't have that time in the nearest future.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
asdfa is two permutations.
-
1 Attachment(s)
Re: VB6: Best way to find all permutations of a given string in a file - Contest
In case anyone is wondering, the output from your submission should look something like this file. It would help in the verification process.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Quote:
Originally Posted by randem
No external code or calls other than VB6 and API's will be accepted.
Quick question...Does this mean we can't use something like, the FileSystemObject??
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
That is a VB call... The statement refers to custom C subroutines and dll and things of that nature. Any native VB or OS supported calls can be used.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Why are you looking for permutations in files? :ehh: Isn't that two contests: one for the fastest file searching and a second for the fastest way to calculate permutations? You really just want the fastest permutations, e.g. the permutations calculated per second by an application.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Quote:
Originally Posted by learning c
Why are you looking for permutations in files? :ehh: Isn't that two contests: one for the fastest file searching and a second for the fastest way to calculate permutations? You really just want the fastest permutations, e.g. the permutations calculated per second by an application.
The contest is the contest. Why is not the question. It's not for the fastest way to calculate permutations, nor is it for the fastest way to load a file. Those are personal options.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Quick Question - What sort of times are you looking for? Average I mean. I have some code going for around 7mins atm. Just wondering how severe I need to be to get closer to the speedy times.
Edit:
Doh! Spotted at the bottom of your text file - 672ms (bloody wow!) since mine is 420000ms... I have a bit of work to do :)
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
That is what your submission is all about...
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Another question:
I've used this challenge as a first project in .net express.
What files do I need to send to you?
Also: The search file results text you posted has 8070 combinations - I've matched that here, but I could've sworn the VBA code I wrote came back with more... Is there a definitive amount to find or do we go by your count?
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Guys, don't compare your times to his time. It is on a different computer (possibly a multi-core workstation). Compare them to your own previous times.
Also, I don't see how anyone could time disk access when you could just read the entire file into a string buffer (no matter what the contents), which is presumably what he did if he did an InStr search.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Well, I wouldn't say... I'm running at 120 ms and this is a single core laptop. However I don't output anything.
randem: Do we have to know what is the found permutation for output? I can output the position, but due to optimization I don't know the combination itself unless I make things more complicated. Couldn't we just output the positions?
Also, how about harddisk caching? The file is small enough to remain in cache for a short while, which means that the first time you run a code you get ~60 ms slower results and the next run coming shortly after is faster as the data comes from cache.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Ecniv,
You need to send everything that makes the project run so that I can see exactly what you did.
Merri,
The found permutation would be helpful, but if you found a way to definitely identify a permutation without knowing which one it is, that could be good and I would like to see it.
As Lord Orwell states, do not compare your times with mine, for ultimately the times will change when I run all the code on a standardized computer. That is where all the timing and such counts. The permutation search should be where your timing is. How you read the file is totally up to the individual and not included in the timing.
The count shown in the example I posted is the minimum number of permutations that are in the file. It was given as an example: if you do not find at least that many permutations, your code has flaws and should be reworked.
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Quote:
Originally Posted by randem
Code should have timing output using the GetTickCount API so that all timings are consistent.
So, I've decided to fiddle with this after all. I would like to suggest using the QueryPerformanceCounter API function for more accurate timing, because GetTickCount is only accurate to about 16 milliseconds.
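For comparison, here is a minimal sketch of the GetTickCount timing the rules ask for. TimeSearch is a hypothetical name and the search itself is left as a placeholder. Since the tick value advances in coarse steps, a run that finishes in a few tens of milliseconds can be off by a large fraction.

```vb
Option Explicit

Private Declare Function GetTickCount Lib "kernel32" () As Long

' Hypothetical wrapper: times only the permutation search, in milliseconds.
Public Sub TimeSearch()
    Dim t0 As Long, Elapsed As Long

    t0 = GetTickCount

    ' ... run the permutation search here (placeholder) ...

    ' Note: the tick count wraps after long uptimes, in which case this
    ' subtraction can overflow; a real program may want to guard against that.
    Elapsed = GetTickCount - t0
    Debug.Print "Search took " & Elapsed & " ms"
End Sub
```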
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
I just wrote a prog that does what's required in about 1/2 hour (I mean it took me 1/2 hour to write :-P) and takes a little under 2 seconds to find all permutations (in IDE, *not* compiled yet)...and Randem, your "counts" of each value in the file (at the end of FindStrIndex.txt) are wrong...what's the difference between FindStrIndex and FindStrSearch?
Edit: My code doesn't save the results in sorted order...it does per permutation...shouldn't take much longer to make it sort the data before writing it though :-P
Edit 2: Got it down to 1.68 seconds (1688 ticks, compiled) with 141 ticks (0.15 secs) to sort and save afterwards...although my output seems to be different to the output you gave as an example...and that's with me running other programs, I then set program priority to realtime and got 1547 with +125 for saving and it'd probably be better if I closed all the other progs running :-P
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Oops, I noticed I had my computer in power saving mode, which meant I was running at 800 MHz... changes things a little, it wasn't hard disk cache after all, just the processor jumping up to 1800 MHz when running the code quickly enough.
randem: I can figure out the actual permutation, but the way I store the data while processing is suboptimal for converting the information into what is currently required in the output file. On the other hand, if we know the positions and we have the file, we don't really need that information: it is useful for your validation, but it doesn't help us at this point.
So outputting that data is not strictly necessary: only positions matter. This is why I'd suggest a simple file where only the start position of each permutation is output, separated by CRLF. No special formatting, and I'd suggest the order also be free if file output is included within the timing (it'd be unfair to include sorting).
I have a new question as well: should we output the text file while looking for permutations or should we output it once timing is finished? At the moment I have included the file output in my time. Which is... about the time it takes my sudoku solver to solve 25000 very easy sudokus. Anyone interested can have a merry time finding out how fast that is.
I can tell my time under IDE as it is pretty poor: roughly 1300 ms.
Oh, and GetTickCount would be far too inaccurate for my code...
Code:
'clsTime.cls
Option Explicit

Private m_Freq As Double      ' counter ticks per second
Private m_Start As Currency   ' counter value when Start was called
Private m_Stop As Currency    ' counter value at the latest reading

' Currency is a 64-bit type, so it can receive the 64-bit counter values.
' Its fixed scaling cancels out when a difference is divided by the frequency.
Private Declare Function QueryPerformanceFrequency Lib "kernel32" (lpFrequency As Currency) As Long
Private Declare Function QueryPerformanceCounter Lib "kernel32" (lpPerformanceCount As Currency) As Long

' Estimated seconds left, given progress CurrentValue out of EndVal.
Public Function GetRemainingSeconds(ByVal CurrentValue As Long, ByVal EndVal As Long) As Currency
    Dim dblCurrent As Double
    If CurrentValue < 1 Then CurrentValue = 1
    If EndVal > CurrentValue Then
        QueryPerformanceCounter m_Stop
        dblCurrent = (CDbl(m_Stop - m_Start) / m_Freq)
        GetRemainingSeconds = CCur((dblCurrent / CurrentValue * EndVal) - dblCurrent)
    End If
End Function

' Seconds elapsed since Start was called.
Public Function GetTime() As Double
    QueryPerformanceCounter m_Stop
    GetTime = CDbl(m_Stop - m_Start) / m_Freq
End Function

Public Sub Start()
    QueryPerformanceCounter m_Start
End Sub

Private Sub Class_Initialize()
    Dim curFreq As Currency
    QueryPerformanceFrequency curFreq
    m_Freq = CDbl(curFreq)
End Sub
Code:
Dim Q As clsTime
' initialize class
Set Q = New clsTime
' start timing
Q.Start
' return time in milliseconds
MsgBox Fix(Q.GetTime * 1000)
' once done...
Set Q = Nothing
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
It seems everyone is only considering speed. As I stated in the original post speed is only one area. Speed means NOTHING if your output is INCORRECT!!! Fast and wrong NEVER beats slow and correct!!!
The only timing is finding the permutations. Anything else you do outside of that is up to you. I repeat SPEED IS ONLY ONE PART OF THE TOTAL CONTEST!
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
However, correct results are nothing if your program takes 2 weeks to output them :-P
Speed is important, but it takes a backseat to accuracy :-)
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
Well, when it all comes down to it I would rather have something that takes two weeks and is 100% correct all the time than something that takes only a few seconds and is only 50% correct! Sometimes 99% is not good enough.
One can take something that is 100% correct and speed it up. Taking something that operates fast and wrong... Making it faster is of no use!
The output file is your validation... Anyone can make a claim!
-
Re: VB6: Best way to find all permutations of a given string in a file - Contest
smUX,
If you say the counts are wrong, where is the output file to prove this claim? This is what I am after: proof, not just claims. Speed is secondary.