-
Jan 27th, 2009, 02:37 PM
#1
Thread Starter
Frenzied Member
[RESOLVED] Random string
Hi, I've created a small tool to upload binary files to Usenet newsgroups. Every binary file is cut into smaller pieces (about 400kB) and then uploaded to the newsserver. Every piece has its own code (Message-ID) that I put in the header.
I create Message-ID's of about 45 characters. When creating random strings of about 45 characters there should be billions of combinations, but many of my users get incomplete uploads, because the application creates Message-ID's already created/used by other users.
Why do so many people create the same strings? Is it because of the Randomize in the function instead of the form_load event?
This is the code I use to create the Message-ID.
vb Code:
Private Const CHARS = "123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Private Function RandomString() As String
    Dim i As Long
    Dim tmp As String
    Randomize
    For i = 1 To 28
        tmp = tmp & Mid(CHARS, Int(52 * Rnd) + 1, 1)
    Next i
    RandomString = tmp & "@application.local"
End Function
I also tried this function, but the same problem occurs.
vb Code:
Public Function RandomString() As String
    Dim i As Long
    Dim btByte As Byte
    Randomize
    For i = 1 To 28
        btByte = Int(Rnd() * 127)
        Select Case btByte
            Case 48 To 57
                RandomString = RandomString & Chr(btByte)
            Case 65 To 90
                RandomString = RandomString & Chr(btByte)
            Case 97 To 122
                RandomString = RandomString & Chr(btByte)
            Case Else
                i = i - 1
        End Select
    Next i
    RandomString = RandomString & "@application.local"
End Function
Last edited by Chris001; Jan 27th, 2009 at 03:34 PM.
-
Jan 27th, 2009, 03:34 PM
#2
Re: Random string
 Originally Posted by Chris001
but many of my users get incomplete uploads, because the application creates Message-ID's already created/used by other users.
Are you SURE that's what is happening?
How many users are we talking about here? I mean, if you had hundreds of thousands of users, then maybe there is a chance that you would get 1 or 2 identical strings.
But let's say the chances of getting the same string are high; then you should prefix the string with their user ID (user name), because that is unique to each of them, right?
You can also try another way to create random strings, like GUID for example:
http://www.devx.com/vb2themax/Tip/18261
Code:
Private Declare Function CoCreateGuid_Alt Lib "OLE32.DLL" Alias "CoCreateGuid" (pGuid As Any) As Long
Private Declare Function StringFromGUID2_Alt Lib "OLE32.DLL" Alias "StringFromGUID2" (pGuid As Any, ByVal address As Long, ByVal Max As Long) As Long
Function CreateGUID() As String
    Dim res As String, resLen As Long, guid(15) As Byte
    res = Space$(128)
    CoCreateGuid_Alt guid(0)
    resLen = StringFromGUID2_Alt(guid(0), ByVal StrPtr(res), 128)
    CreateGUID = Left$(res, resLen - 1)
End Function
Private Sub Form_Load()
    MsgBox CreateGUID
End Sub
Just remove the {} and - characters
-
Jan 27th, 2009, 03:37 PM
#3
Re: Random string
You are using Randomize without a seed. When not provided, the system timer is used as the seed per MSDN.
If the same seed is used then it can generate the same random numbers.
Maybe consider prefixing your random string with a string that should never be duplicated, such as the serial number of a hard drive or other hardware, or a combination of hardware serial numbers and software information/serial numbers. By prefixing it with something that should be completely unique, the remaining random characters are basically filler or can be used for additional header info.
Edited: I see CVMichael and I were on the same wavelength
-
Jan 27th, 2009, 03:41 PM
#4
Hyperactive Member
Re: Random string
Hmm... I'm really surprised. I don't think you need to keep calling Randomize in the function; as you say, you could put it in the form's Load event. But I don't think that explains your issue.
You're getting the same numbers as others are generating? Everything you're doing looks correct to me. Well it does look like you're not using the full set of your chars (you have 52 + 9 there not just 52) but that doesn't explain why you're getting the same strings as others. Are you certain it's happening? Also, what about tagging your files with something personal like a nickname or even a time stamp. That should ensure uniqueness.
Edit: whoa that was some quick responses. sorry for any redundancy
-
Jan 27th, 2009, 03:45 PM
#5
Re: Random string
Calling Randomize repeatedly actually makes it less random, there is a thread in the CodeBank (or maybe UtilityBank) that explains in detail.
You should normally only call it once, when your program starts.
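A minimal sketch of what the two suggestions above could look like when applied to the first function from post #1: Randomize is called once in Form_Load, and the index is computed from Len(CHARS) so that all 61 characters can be picked instead of only the first 52. This is only an illustration of those two changes; as later replies point out, it does not by itself guarantee that two computers never produce the same string.

```vb
Private Const CHARS As String = "123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

Private Sub Form_Load()
    ' Seed the generator once per run (from the system timer)
    Randomize
End Sub

Private Function RandomString() As String
    Dim i As Long
    Dim tmp As String
    For i = 1 To 28
        ' Int(Len(CHARS) * Rnd) is 0..60, so every character of CHARS can be selected
        tmp = tmp & Mid$(CHARS, Int(Len(CHARS) * Rnd) + 1, 1)
    Next i
    RandomString = tmp & "@application.local"
End Function
```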
-
Jan 27th, 2009, 03:51 PM
#6
Re: Random string
Isn't it weird how we all post at the same time (almost), yet the thread was started an hour ago... hmmm...
-
Jan 27th, 2009, 04:25 PM
#7
Junior Member
Re: Random string
Try the function below. It randomizes your selection in two steps. First it picks at random between digits, capital letters and small letters. Then it picks at random within that category.
Code:
Const MAXLEN As Integer = 45
Private Function RandomString() As String
    Dim i As Integer
    Dim iChar As Integer
    Randomize 'Note: this reseeds from the system timer on every call
    For i = 1 To MAXLEN
        'Step 1: pick a category (1 = digit, 2 = capital letter, 3 = small letter)
        Select Case Int((3 * Rnd) + 1)
            Case 1
                iChar = Int((Asc("9") - Asc("0") + 1) * Rnd) + Asc("0") 'This to return 0 - 9
            Case 2
                iChar = Int((Asc("Z") - Asc("A") + 1) * Rnd) + Asc("A") 'This to return A - Z
            Case 3
                iChar = Int((Asc("z") - Asc("a") + 1) * Rnd) + Asc("a") 'This to return a - z
        End Select
        'Step 2: append the character picked within that category
        RandomString = RandomString & Chr$(iChar)
    Next i
End Function
Guaranteed against DOA
-
Jan 27th, 2009, 05:27 PM
#8
Thread Starter
Frenzied Member
Re: Random string
Thanks for the replies.
The application is quite new, but so far it has been downloaded about 1000 times.
I'm quite sure that here and there the same Message-ID's are created. I did a test upload this afternoon and one piece wasn't uploaded to the server. When I tried to download the missing piece by using its Message-ID, I found out that there already was an article on the newsserver using that exact same Message-ID, but it belonged to a totally different file uploaded by somebody else a few days ago.
One of the users told me he had uploaded files with a total of about 2100 pieces (about 800MB) and when the upload was finished several pieces were missing. He gave me the Message-ID's of the missing pieces (created by my app) and I found out that those Message-ID's were already used as well by other users, in the past few weeks, for totally different files.
I'm going to try GUID. It seems to be a lot faster than both of the functions I posted above. I'll try tfbasta's function too. Instead of "application.local" I'll use the VolumeSerialNumber of the hard drive (the real serial number of the hard drive requires the user to be logged in as Administrator).
-
Jan 27th, 2009, 05:31 PM
#9
Re: Random string
Chris, you mentioned "past few weeks"? How long are you going to make the message IDs remain unique? This could be a potential logic killer. Even using GUID and serial numbers, I think it could be possible that the same GUID be created on the same computer if it was run enough times. Maybe think about ways to prevent having to go back and check several weeks of data against today's data.
-
Jan 27th, 2009, 06:12 PM
#10
Thread Starter
Frenzied Member
Re: Random string
My application does not keep track of all Message-ID's used. It simply creates them and that's it.
The problem is not that the same Message-ID is created on one computer, but the same Message-ID is created on two (or even more) computers a few hundred miles away from each other.
It's really not a problem if once in a while there's a piece missing from the newsservers, because most of the time Par2 repair files (google QuickPar) are posted as well, in order to repair corrupted data. I can understand that on rare occasions it might be possible that one user in America creates a same Message-ID as somebody else did a month ago in Italy while uploading a file, but some people have several pieces missing almost every time they upload files.
-
Jan 27th, 2009, 07:29 PM
#11
Re: Random string
Problem with random is there could be bias which increases the chance that values are regenerated (you won't consume all possible combinations just once, most will be duplicated instead). Pure randomization won't work, there has to be a form of organization/partition info embedded in the key like encryption. Rather than focusing on random string generator per se, step back and assess the bigger picture... What kind of "uniqueness scheme" would you want and can be supported by the existing system and tech know-how?
- Unique hash value supplied by server (static per user? based on other criteria?) that is joined to another unique value generated at client (again, what is used to seed the hash value?)?
- Hash based on the binary of file? Binary of filename? UNC path (prepend with IP)?
- Control file + file slices implementation? Unique key for control file and part of key for file slice refers to the control file?
- MAC address (if you can retrieve this without admin logged in)?
-
Jan 27th, 2009, 07:42 PM
#12
Fanatic Member
Re: Random string
You're dealing with probabilities, and people win the lottery every day. Try increasing the string length to improve the odds. Perhaps this will help:
http://en.wikipedia.org/wiki/Infinit...opular_culture
-
Jan 27th, 2009, 07:49 PM
#13
Hyperactive Member
Re: Random string
I couldn't find any detailed documentation about how the Randomize function is implemented. All that is said is that it seeds the random number generator based upon the system timer. So the question is how likely it is that the seeding will be the same. Anyway, I wrote a quick test to see. It generates sequences of 10 numbers, 1 through 52 (to simulate your alphabet). Sequences repeat about 1 out of 35 times. So having the Randomize statement in your function, so that it is called repeatedly, is definitely a bad idea.
It does look like you will still have problems even if you call randomize once so you will need to do something else to ensure uniqueness.
This might help: http://nayyeri.net/blog/generating-r...trings-in-net/
Edit: I initially wrote my test a bit differently than chris programmed. I rewrote it to match how he had it set up and the sequences repeat only about 1 out of 2000 times.
Last edited by wy125; Jan 27th, 2009 at 08:18 PM.
-
Jan 27th, 2009, 07:53 PM
#14
Re: Random string
 Originally Posted by technorobbo
Given the assumption that the number of files uploaded and their sizes (more parts) will increase, sooner or later pure randomization will still hit a performance ceiling, and you would need to increase the length once again, raising backward-compatibility design issues. A key in which some parts are extensible and processing stays backward compatible (regardless of the length of that part of the key, with the other parts left as they are), or a key with an embedded version (divergent processing for different versions), would scale better.
-
Jan 27th, 2009, 07:58 PM
#15
Hyperactive Member
Re: Random string
Longer keys won't matter at all. A certain percentage will be identical if he keeps his current strategy of relying on the random number generator.
-
Jan 27th, 2009, 08:19 PM
#16
Thread Starter
Frenzied Member
Re: Random string
I'm using GUID to create a 32 character string and I'm using the Unix Timestamp (number of seconds since 1970) as Unique Domain-ID.
When the user starts the application I get the current Unix Timestamp, for example 1233108812.
So unless two users start the application at the same second it will never be possible for them to get the same Message-ID. If that's not enough, then I'll add the Volume Serial Number of the hard drive as well.
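The scheme described in this post could be sketched roughly as below. It assumes the CreateGUID function from post #2 is present in the same project; the names mDomainId and NewMessageId are made up for the illustration. Note that DateDiff against Now uses the local clock, so the value is only the true Unix timestamp on a machine set to UTC, which should not matter when it is used purely as a distinguishing number.

```vb
Private mDomainId As String   ' hypothetical module-level variable, set once at startup

Private Sub Form_Load()
    ' Seconds elapsed since 1 Jan 1970 according to the local clock
    mDomainId = CStr(DateDiff("s", #1/1/1970#, Now))
End Sub

Private Function NewMessageId() As String
    Dim sGuid As String
    sGuid = CreateGUID()                  ' e.g. {xxxxxxxx-xxxx-...}, see post #2
    ' Strip the braces and hyphens, leaving 32 hex characters
    sGuid = Replace(sGuid, "{", "")
    sGuid = Replace(sGuid, "}", "")
    sGuid = Replace(sGuid, "-", "")
    NewMessageId = sGuid & "@" & mDomainId
End Function
```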
-
Jan 27th, 2009, 08:28 PM
#17
Re: Random string
Chris, if I were to write a routine today to generate random strings that were 45 characters in length using ASCII characters ranging from 1 to 255 and you let the program run continuously, it might generate a matched pair of 45-character strings sometime after you and I were both dead.
BUT, it could generate a matched pair in a microsecond.
-
Jan 27th, 2009, 08:36 PM
#18
Hyperactive Member
Re: Random string
Code Doc, if you reseed the random number generator each time you call the function, you will find a match relatively quickly. Remember that his application is used by many different users. Each time they launch the application the random number generator is seeded. The probability that two instances of the application create the same sequences is independent of the length of the generated strings and depends only on the probability that the applications initialized the generator with the same seed value. I estimate that this happens about 1 in 1000 times or so.
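The point about seeds can be illustrated with a small sketch. SeededString is a hypothetical helper that takes an explicit seed instead of letting Randomize read the timer; calling Rnd with a negative argument right before Randomize is the documented way to make a seeded sequence repeatable. With the same seed, both comparisons are expected to print True, for 28 characters as well as for 45.

```vb
Private Const CHARS As String = "123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

' Hypothetical helper: builds a string from a fixed seed instead of the timer
Private Function SeededString(ByVal lSeed As Long, ByVal lLength As Long) As String
    Dim i As Long
    Call Rnd(-1)          ' reset so that the following Randomize is repeatable
    Randomize lSeed
    For i = 1 To lLength
        SeededString = SeededString & Mid$(CHARS, Int(Len(CHARS) * Rnd) + 1, 1)
    Next i
End Function

Private Sub Form_Load()
    ' Two "instances" that happen to start from the same seed
    Debug.Print SeededString(12345, 28) = SeededString(12345, 28)
    Debug.Print SeededString(12345, 45) = SeededString(12345, 45)
End Sub
```

So the number of distinct strings is bounded by the number of distinct seeds, not by the length of the string.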
-
Jan 27th, 2009, 08:41 PM
#19
Re: Random string
 Originally Posted by Chris001
I'm using GUID to create a 32 character string and I'm using the Unix Timestamp (number of seconds since 1970) as Unique Domain-ID.
When the user starts the application I get the current Unix Timestamp, for example 1233108812.
So unless two users start the application at the same second it will never be possible for them to get the same Message-ID. If that's not enough, then I'll add the Volume Serial Number of the hard drive as well.
Actually it's harder to get unique records/files based on time, especially when multiple servers are involved.
What if time was accidentally reset to an earlier value (e.g. mboard change, hardware problems do occur... or invalid time was initially used, then corrected and now duplicates occur)... also that will work only for a single server implementation, if you want to scale to a server farm then their time (at each server) will have to be synchronized.
-
Jan 27th, 2009, 11:00 PM
#20
Re: Random string
 Originally Posted by Chris001
I create Message-ID's of about 45 characters. When creating random strings of about 45 characters there should be billions of combinations, but many of my users get incomplete uploads, because the application creates Message-ID's already created/used by other users.
Chris
As a latecomer, I don't have much to add regarding randomizing. So,
an observation and then a question..
Observation: Billions?? Try gazillions. Just doing 26 lower-case letters .. 26 ^ 26
equals 6.15 e 36 (whereas a billion is only 1.00 e 9)
Question: What is the need to have the different Message-IDs in the first place?
(it does not seem to be a security-related matter). Could the "fix" be as
simple as a textfile on each user's computer that gets updated with each
message sent (to prevent duplicates)? What am I missing?
Spoo
-
Jan 28th, 2009, 06:53 AM
#21
Fanatic Member
Re: Random string
How about adding a checksum at the end of the file name so the software can confirm that it downloaded the correct file? This will at least flag an error, but if a duplicate filename does exist I'm not sure how you could ensure that the right one will ever get downloaded. With a checksum you may be able to keep retrying different posts until you find the right one. This would imply that you create your own downloader but, hey, why not?
Last edited by technorobbo; Jan 28th, 2009 at 07:03 AM.
-
Jan 28th, 2009, 08:06 AM
#22
Re: Random string
 Originally Posted by wy125
Code Doc, if you reseed the random number generator each time you call the function, you will find a match relatively quickly. Remember that his application is used by many different users. Each time they launch the application the random number generator is seeded. The probability that two instances of the application create the same sequences is independent of the length of the generated strings and depends only on the probability that the applications initialized the generator with the same seed value. I estimate that this happens about 1 in 1000 times or so.
I agree, so here is how I would write the code to generate the unique strings:
(1) Generate several million or so random strings of 45-character length.
(2) Sort the strings and check for a match.
(3) Discard the strings that matched so that all are unique.
(4) If random order is important, scramble the unique strings using the Knuth shuffle.
(5) Store the strings in a file and assign them one at a time.
As the strings are used up, you can assign a byte flag to each so that it does not get used twice.
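Step (4) mentions the Knuth shuffle. A minimal sketch of it for a string array is shown below; ShuffleStrings is a name chosen for the illustration, and it assumes Randomize has already been called once at startup. It still relies on Rnd, so it only reorders strings that were already checked for uniqueness in steps (2) and (3).

```vb
' Knuth (Fisher-Yates) shuffle of a string array, in place
Private Sub ShuffleStrings(ByRef items() As String)
    Dim i As Long
    Dim j As Long
    Dim tmp As String
    For i = UBound(items) To LBound(items) + 1 Step -1
        ' Pick j at random from LBound(items)..i
        j = LBound(items) + Int((i - LBound(items) + 1) * Rnd)
        ' Swap items(i) and items(j)
        tmp = items(i)
        items(i) = items(j)
        items(j) = tmp
    Next i
End Sub
```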
Last edited by Code Doc; Jan 28th, 2009 at 09:20 AM.
Doctor Ed
-
Jan 28th, 2009, 08:19 AM
#23
Re: Random string
Without checking to eliminate duplicates, there is no guarantee that 2 random strings are not the same. You can only reduce the chance of a duplicate by appending something such as a date-time value.
With a single user in one run, and one Randomize at startup, the native VB Rnd() function can provide 16,777,216 different Single values before it repeats the sequence. Calling Randomize repeatedly might make duplicates happen sooner.
One of my threads in the CodeBank discusses the Wichmann-Hill Pseudo Random Number Generator.
The idea of appending or combining a date-time value with a random key is used by MS Access in replicated databases. However, although it's rare, it may still cause conflicts when synchronizing.
Perhaps appending the UserID and the Date+Timer of the user's PC (in Hex form) will minimize the chance of duplicates.
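To put rough numbers on this, the expected number of colliding pairs among n IDs drawn uniformly and independently from a pool of N values is n(n-1)/(2N). The sketch below compares the full string space of the first function in post #1 (61 characters, 28 positions) with a pool of only 16,777,216 values, taking the figure above at face value and assuming, for the sake of the estimate, that each string is determined by where in that sequence it starts. ExpectedCollidingPairs is a hypothetical helper name.

```vb
' Expected number of colliding pairs among n IDs drawn uniformly and
' independently from a pool of poolSize distinct values
Private Function ExpectedCollidingPairs(ByVal n As Double, ByVal poolSize As Double) As Double
    ExpectedCollidingPairs = n * (n - 1) / (2# * poolSize)
End Function

Private Sub Form_Load()
    ' 2100 pieces against the full 61 ^ 28 string space: on the order of 1E-44
    Debug.Print ExpectedCollidingPairs(2100, 61# ^ 28)
    ' 2100 pieces against a pool of 16,777,216 values: roughly 0.13
    Debug.Print ExpectedCollidingPairs(2100, 16777216#)
End Sub
```

Under those assumptions the length of the string hardly matters; the size of the pool the generator can actually reach does.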
-
Jan 28th, 2009, 01:30 PM
#24
Thread Starter
Frenzied Member
Re: Random string
@ leinad31
Maybe I understand you wrong, but I'm talking about the Unix Timestamp of the user's computer and not of the newsserver.
@ Spoo
Thanks for the explanation. I didn't know that.
It doesn't have anything to do with security. Every piece of data on the newsserver has a unique code in order to distinguish them from each other. It's like a national identification number to distinguish people from each other.
@ technorobbo
NNTP does not pay attention to file names. The same file can be uploaded millions of times as long as it has a different Message-ID. Users download the files (with other apps not written by me) by simply sending a "BODY" command with the Message-ID to the newsserver and the server sends the data back (whatever that data may be) that belongs to the Message-ID. The download program decodes all pieces of data with the yEnc algorithm, reads the yEnc header/footer and joins the pieces in the correct order, which results in the original file.
@ Code Doc
I can store Message-ID's in a file, but the problem is/was not that the same Message-ID's are created by the same person, but by two or more people living hundreds or even a few thousand miles away from each other. And some of the users only upload a small file that gets divided into 2 smaller pieces and then won't use the app for weeks/months, so it's a good idea, but in those cases creating several million Message-ID's is unnecessary work.
@ anhn
Thank you. I'll have a look at that code and do some tests with it.
Last edited by Chris001; Jan 28th, 2009 at 01:39 PM.
-
Jan 28th, 2009, 01:54 PM
#25
Re: Random string
Chris
Given your reply, may I respectfully suggest that you might be barking
up the wrong tree -- randomizing-wise, that is. Why do you even need
to do anything with random numbers?
If each user has a unique user-id (not heretofore mentioned by you, but
something you probably could do), then it seems to me that all you'd
need to dwell on is keeping track of his/her various messages.
This (UserMessageID) could be done with ..
> a simple counter, growing from 1 to whatever (and stored in a text file)
> a string based on a time-stamp of said user's computer.
Does that make sense?
Would that do the job?
EDIT:
BTW, the ultimate MessageID would be a concatenation of
> UserID
> UserMessageID
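A rough sketch of the counter variant, assuming the unique UserID is available as a string and the counter lives in a small text file. NextCounter, BuildMessageId and the file path argument are hypothetical names, and the "@application.local" suffix is simply carried over from post #1. A separator is placed between the two parts so that, for example, user "ab1" with counter 2 cannot produce the same ID as user "ab" with counter 12.

```vb
' Reads the last counter value from sFile (if it exists), increments it,
' writes it back and returns the new value
Private Function NextCounter(ByVal sFile As String) As Long
    Dim f As Integer
    Dim n As Long
    If Len(Dir$(sFile)) > 0 Then
        f = FreeFile
        Open sFile For Input As #f
        Input #f, n          ' assumes the file holds one number written by Write #
        Close #f
    End If
    n = n + 1
    f = FreeFile
    Open sFile For Output As #f
    Write #f, n
    Close #f
    NextCounter = n
End Function

' Concatenates UserID and UserMessageID into the final Message-ID
Private Function BuildMessageId(ByVal sUserId As String, ByVal sFile As String) As String
    BuildMessageId = sUserId & "." & CStr(NextCounter(sFile)) & "@application.local"
End Function
```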
Spoo
Last edited by Spoo; Jan 28th, 2009 at 02:10 PM.
-
Jan 28th, 2009, 02:21 PM
#26
Thread Starter
Frenzied Member
Re: Random string
That's a good idea as well. As long as every user has a unique domain-id (perhaps motherboard serial number) a simple counter and a timestamp would suffice.
Thanks Spoo.
I'll mark this thread as resolved. I have enough ideas now to create unique Message-ID's.
Thank you everybody
-
Jan 28th, 2009, 04:22 PM
#27
Re: [RESOLVED] Random string
Actually, I like Spoo's idea also--a simple counter that increments integers, perhaps picking up letters as well along the way. Sweet and simple and hundreds of millions could be generated with no match. I imagine Microsoft and Adobe did something similar with serial numbers. Then they separated the 4-byte chunks with hyphens.
Good luck with your project.
-
Jan 28th, 2009, 04:43 PM
#28
Re: [RESOLVED] Random string
Thanks, Doc. What do I win?
Spoo