Thread: [RESOLVED] Random string

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    [RESOLVED] Random string

    Hi, I've created a small tool to upload binary files to Usenet newsgroups. Every binary file is cut into smaller pieces (about 400kB) and then uploaded to the newsserver. Every piece has its own code (Message-ID) that I put in the header.

    I create Message-ID's of about 45 characters. When creating random strings of about 45 characters there should be billions of combinations, but many of my users get incomplete uploads, because the application creates Message-ID's already created/used by other users.

    Why do so many people create the same strings? Is it because of the Randomize in the function instead of the form_load event?

    This is the code I use to create the Message-ID.
    vb Code:
    Private Const CHARS = "123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    Private Function RandomString() As String
        Dim i As Long
        Dim tmp As String

        Randomize
        For i = 1 To 28
            tmp = tmp & Mid(CHARS, Int(52 * Rnd) + 1, 1)
        Next i

        RandomString = tmp & "@application.local"
    End Function

    I also tried this function, but the same problem occurs.
    vb Code:
    Public Function RandomString() As String
        Dim i       As Long
        Dim btByte  As Byte

        Randomize
        For i = 1 To 28
            btByte = Int(Rnd() * 127)
            Select Case btByte
                Case 48 To 57
                    RandomString = RandomString & Chr(btByte)
                Case 65 To 90
                    RandomString = RandomString & Chr(btByte)
                Case 97 To 122
                    RandomString = RandomString & Chr(btByte)
                Case Else
                    i = i - 1
            End Select
        Next i
        RandomString = RandomString & "@application.local"
    End Function
    Last edited by Chris001; Jan 27th, 2009 at 03:34 PM.

  2. #2
    PowerPoster
    Join Date
    Feb 2002
    Location
    Canada, Toronto
    Posts
    5,803

    Re: Random string

    Quote Originally Posted by Chris001
    but many of my users get incomplete uploads, because the application creates Message-ID's already created/used by other users.
    Are you SURE that's what is happening ?

    How many users are we talking about here ? I mean, if you had hundreds of thousands of users, then maybe there is a chance that you would get 1 or 2 identical strings.

    But let's say the chances are high to get the same string, then you should prefix the string with their user ID (user name) because that is unique to all of them, right ?

    You can also try another way to create random strings, like GUID for example:
    http://www.devx.com/vb2themax/Tip/18261

    Code:
    Private Declare Function CoCreateGuid_Alt Lib "OLE32.DLL" Alias "CoCreateGuid" (pGuid As Any) As Long
    Private Declare Function StringFromGUID2_Alt Lib "OLE32.DLL" Alias "StringFromGUID2" (pGuid As Any, ByVal address As Long, ByVal Max As Long) As Long
    
    Function CreateGUID() As String
        Dim res As String, resLen As Long, guid(15) As Byte
        res = Space$(128)
        CoCreateGuid_Alt guid(0)
        resLen = StringFromGUID2_Alt(guid(0), ByVal StrPtr(res), 128)
        CreateGUID = Left$(res, resLen - 1)
    End Function
    
    Private Sub Form_Load()
        MsgBox CreateGUID
    End Sub
    Just remove the {} and - characters
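
    As a rough illustration of that last step, VB6's built-in Replace function can strip the punctuation. StripGuidPunctuation is a hypothetical helper name, not part of the linked tip:

    vb Code:
    ' Hypothetical helper: removes the braces and hyphens from a GUID string
    Public Function StripGuidPunctuation(ByVal guidText As String) As String
        Dim s As String
        s = Replace(guidText, "{", "")
        s = Replace(s, "}", "")
        s = Replace(s, "-", "")
        StripGuidPunctuation = s
    End Function

    A GUID in the usual {8-4-4-4-12} form has 32 hex digits, so the result is a 32-character string.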

  3. #3
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Random string

    You are using Randomize without a seed. When not provided, the system timer is used as the seed per MSDN.

    If the same seed is used then it can generate the same random numbers.

    Maybe consider prefixing your random string with a string that should never be duplicated. Maybe prefix it with the serial number of a hard drive or other hardware, or use combinations of hardware serial numbers and software information/serial numbers. By prefixing it with something that should be completely unique, the remaining random characters are basically filler or can be used for additional header info.

    Edited: I see CVMichael and I were on the same wavelength
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  4. #4
    Hyperactive Member
    Join Date
    Mar 2002
    Location
    Boston, MA
    Posts
    391

    Re: Random string

    Hmm... I'm really surprised. I don't think you need to keep calling Randomize in the function; as you say, you could put it in the form's Load event, but I don't think that explains your issue.

    You're getting the same strings as others are generating? Everything you're doing looks correct to me. Well, it does look like you're not using the full set of your chars (you have 52 + 9 there, not just 52), but that doesn't explain why you're getting the same strings as others. Are you certain it's happening? Also, what about tagging your files with something personal like a nickname or even a time stamp? That should ensure uniqueness.

    Edit: whoa that was some quick responses. sorry for any redundancy

  5. #5
    Super Moderator si_the_geek's Avatar
    Join Date
    Jul 2002
    Location
    Bristol, UK
    Posts
    41,974

    Re: Random string

    Calling Randomize repeatedly actually makes it less random; there is a thread in the CodeBank (or maybe UtilityBank) that explains this in detail.

    You should normally only call it once, when your program starts.
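
    A minimal sketch of that advice applied to the first function from post #1, assuming the code lives in a form module: Randomize moves to Form_Load so it runs once per program start, and the index is scaled by Len(CHARS) instead of a hard-coded 52 so the whole character set is used. This only removes the repeated reseeding; it does not by itself guarantee that two computers never produce the same string.

    vb Code:
    Private Const CHARS As String = "123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    Private Sub Form_Load()
        Randomize   ' seed the generator once per run
    End Sub

    Private Function RandomString() As String
        Dim i As Long
        Dim tmp As String

        For i = 1 To 28
            ' Int(Len(CHARS) * Rnd) is 0 .. Len(CHARS) - 1, so the index is 1 .. Len(CHARS)
            tmp = tmp & Mid$(CHARS, Int(Len(CHARS) * Rnd) + 1, 1)
        Next i

        RandomString = tmp & "@application.local"
    End Function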

  6. #6
    PowerPoster
    Join Date
    Feb 2002
    Location
    Canada, Toronto
    Posts
    5,803

    Re: Random string

    Isn't it weird how we all post at the same time (almost), yet the thread was started an hour ago... hmmm...

  7. #7
    Junior Member
    Join Date
    Dec 2008
    Posts
    17

    Re: Random string

    Try the function below. It randomizes your selection in two steps. First it randomly picks a category: digits, small letters, or capital letters. Then it randomly picks a character within that category.

    Code:
    Const MAXLEN As Integer = 45

    Private Function RandomString() As String
        Dim i As Integer
        Dim iChar As Integer
        Randomize

        For i = 1 To MAXLEN
            ' Step 1: pick a category; step 2: pick a character inside it.
            ' Int() wraps the whole product so every character in a range is equally likely.
            Select Case Int((3 * Rnd) + 1)
            Case 1
                iChar = Int((Asc("9") - Asc("0") + 1) * Rnd) + Asc("0")     'This to return 0 - 9
            Case 2
                iChar = Int((Asc("Z") - Asc("A") + 1) * Rnd) + Asc("A")     'This to return A - Z
            Case 3
                iChar = Int((Asc("z") - Asc("a") + 1) * Rnd) + Asc("a")     'This to return a - z
            End Select
            RandomString = RandomString & Chr$(iChar)
        Next i
    End Function
    Guaranteed against DOA

  8. #8

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Random string

    Thanks for the replies.

    The application is quite new, but so far it has been downloaded about 1000 times.

    I'm quite sure that here and there the same Message-ID's are created. I did a test upload this afternoon and one piece wasn't uploaded to the server. When I tried to download the missing piece by using its Message-ID, I found out that there already was an article on the newsserver using that exact same Message-ID, but it belonged to a totally different file uploaded by somebody else a few days ago.

    One of the users told me he had uploaded files with a total of about 2100 pieces (about 800MB) and when the upload was finished several pieces were missing. He gave me the Message-ID's of the missing pieces (created by my app) and I found out that those Message-ID's were already used as well by other users, in the past few weeks, for totally different files.


    I'm going to try GUID. It seems to be a lot faster than both of the functions I posted above. I'll try tfbasta's function too. Instead of "application.local" I'll use the VolumeSerialNumber of the hard drive (the real serial number of the hard drive requires the user to be logged in as Administrator).

  9. #9
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Random string

    Chris, you mentioned "past few weeks"? How long are you going to make the message IDs remain unique? This could be a potential logic killer. Even using GUID and serial numbers, I think it could be possible that the same GUID be created on the same computer if it was run enough times. Maybe think about ways to prevent having to go back and check several weeks of data against today's data.

  10. #10

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Random string

    My application does not keep track of all Message-ID's used. It simply creates them and that's it.

    The problem is not that the same Message-ID is created on one computer, but the same Message-ID is created on two (or even more) computers a few hundred miles away from each other.

    It's really not a problem if once in a while there's a piece missing from the newsservers, because most of the time Par2 repair files (google QuickPar) are posted as well, in order to repair corrupted data. I can understand that on rare occasions it might be possible that one user in America creates a same Message-ID as somebody else did a month ago in Italy while uploading a file, but some people have several pieces missing almost every time they upload files.

  11. #11
    PowerPoster
    Join Date
    Nov 2002
    Location
    Manila
    Posts
    7,629

    Re: Random string

    The problem with random values is that there could be bias, which increases the chance that values are regenerated (you won't consume all possible combinations just once; most will be duplicated instead). Pure randomization won't work; there has to be some form of organization/partition info embedded in the key, as with encryption. Rather than focusing on the random string generator per se, step back and assess the bigger picture... What kind of "uniqueness scheme" do you want, and what can be supported by the existing system and technical know-how?

    - Unique hash value supplied by the server (static per user? based on other criteria?) that is joined to another unique value generated at the client (again, what is used to seed the hash value?)?
    - Hash based on the binary of file? Binary of filename? UNC path (prepend with IP)?
    - Control file + file slices implementation? Unique key for control file and part of key for file slice refers to the control file?
    - MAC address (if you can retrieve this without admin logged in)?

  12. #12
    Fanatic Member technorobbo's Avatar
    Join Date
    Dec 2008
    Location
    Chicago
    Posts
    864

    Re: Random string

    You're dealing with probabilities, and people win the lottery every day. Try increasing the string length to improve the odds. Perhaps this will help:

    http://en.wikipedia.org/wiki/Infinit...opular_culture
    Have Fun,

    TR
    _____________________________
    Check out my Alpha DogFighter2D Game Demo and Source code. Direct Download:http://home.comcast.net/~technorobbo/Alpha.zip or Read about it in the forum:http://www.vbforums.com/showthread.php?t=551700. Now in 3D!!! http://home.comcast.net/~technorobbo/AlPha3D.zip or read about it in the forum: http://www.vbforums.com/showthread.php?goto=newpost&t=552560 and IChessChat3D internet chess game

  13. #13
    Hyperactive Member
    Join Date
    Mar 2002
    Location
    Boston, MA
    Posts
    391

    Re: Random string

    I couldn't find any detailed documentation about how the Randomize statement is implemented. All that is said is that it seeds the random number generator based upon the system timer. So the question is how likely it is that the seeding will be the same. Anyway, I wrote a quick test to see. It generates sequences of 10 numbers from 1 through 52 (to simulate your alphabet). Sequences repeat about 1 out of 35 times. So having the Randomize statement in your function, so that it is called repeatedly, is definitely a bad idea.

    It does look like you will still have problems even if you call randomize once so you will need to do something else to ensure uniqueness.

    This might help: http://nayyeri.net/blog/generating-r...trings-in-net/

    Edit: I initially wrote my test a bit differently than chris programmed. I rewrote it to match how he had it set up and the sequences repeat only about 1 out of 2000 times.
    Last edited by wy125; Jan 27th, 2009 at 08:18 PM.

  14. #14
    PowerPoster
    Join Date
    Nov 2002
    Location
    Manila
    Posts
    7,629

    Re: Random string

    Quote Originally Posted by technorobbo
    You're dealing with probabilities, and people win the lottery every day. Try increasing the string length to improve the odds. Perhaps this will help:

    http://en.wikipedia.org/wiki/Infinit...opular_culture
    Given the assumption that the number of files uploaded and their sizes (more parts) will increase, sooner or later pure randomization will still hit a performance ceiling, and you would need to increase the length once again, raising backward-compatibility issues in the design. A key in which some parts are extensible and processing is backward compatible (regardless of the length of this part of the key, with the lengths of the other parts left as they are), or a key with an embedded version (divergent processing for different versions), would scale better.

  15. #15
    Hyperactive Member
    Join Date
    Mar 2002
    Location
    Boston, MA
    Posts
    391

    Re: Random string

    Longer keys won't matter at all. A certain percentage will be identical if he keeps his current strategy of relying on the random number generator.

  16. #16

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Random string

    I'm using GUID to create a 32 character string and I'm using the Unix Timestamp (number of seconds since 1970) as Unique Domain-ID.

    When the user starts the application I get the current Unix Timestamp, for example 1233108812.

    So unless two users start the application at the same second it will never be possible for them to get the same Message-ID. If that's not enough, then I'll add the Volume Serial Number of the hard drive as well.
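
    A rough sketch of how such an ID might be put together. UnixTimestamp and BuildMessageId are hypothetical names, the ".local" suffix simply mirrors the earlier "@application.local", and the exact layout is an assumption rather than the application's real format. Note that DateDiff against Now uses the PC's local clock, so the value is not converted to UTC.

    vb Code:
    ' Hypothetical helper: seconds since 1 Jan 1970 according to the local clock
    Public Function UnixTimestamp() As Long
        UnixTimestamp = DateDiff("s", #1/1/1970#, Now)
    End Function

    ' Hypothetical helper: guidText is the 32-character GUID string,
    ' startStamp is the timestamp taken once when the application starts
    Public Function BuildMessageId(ByVal guidText As String, ByVal startStamp As Long) As String
        BuildMessageId = guidText & "@" & CStr(startStamp) & ".local"
    End Function

    The timestamp would be read once at startup and reused for every piece, so only the GUID part changes from one Message-ID to the next.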

  17. #17
    PowerPoster Code Doc's Avatar
    Join Date
    Mar 2007
    Location
    Omaha, Nebraska
    Posts
    2,354

    Re: Random string

    Chris, if I were to write a routine today to generate random strings that were 45 characters in length using ASCII characters ranging from 1 to 255, and you let the program run continuously, it might generate a matched pair of 45-character strings sometime after you and I were both dead.

    BUT, it could generate a matched pair in a microsecond.
    Doctor Ed

  18. #18
    Hyperactive Member
    Join Date
    Mar 2002
    Location
    Boston, MA
    Posts
    391

    Re: Random string

    Code Doc, if you reseed the random number generator each time you call the function, you will find a match relatively quickly. Remember that his application is used by many different users. Each time they launch the application, the random number generator is seeded. The probability that two instances of the application create the same sequences is independent of the length of the generated strings and depends only on the probability that the applications initialized the generator with the same seed value. I estimate that this happens about 1 in 1000 times or so.
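
    The point about string length can be illustrated with a small sketch. SeededString and DemoSameSeed are hypothetical names, and the seed 12345 is arbitrary. The sketch relies on the documented VB idiom of calling Rnd with a negative argument immediately before Randomize with a number, which restarts a repeatable sequence.

    vb Code:
    ' Hypothetical helper: builds a string from a fixed seed
    Private Function SeededString(ByVal seed As Long, ByVal length As Long) As String
        Const CHARS As String = "123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
        Dim i As Long
        Dim tmp As String
        Dim ignored As Single

        ignored = Rnd(-1)   ' negative argument, then Randomize <number>: repeatable sequence
        Randomize seed
        For i = 1 To length
            tmp = tmp & Mid$(CHARS, Int(Len(CHARS) * Rnd) + 1, 1)
        Next i
        SeededString = tmp
    End Function

    Private Sub DemoSameSeed()
        ' Same seed, same string, whether it is 28 or 200 characters long
        Debug.Print SeededString(12345, 28) = SeededString(12345, 28)
        Debug.Print SeededString(12345, 200) = SeededString(12345, 200)
    End Sub

    Both comparisons are expected to come out True: once two generators start from the same seed, making the string longer just makes the identical part longer.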

  19. #19
    PowerPoster
    Join Date
    Nov 2002
    Location
    Manila
    Posts
    7,629

    Re: Random string

    Quote Originally Posted by Chris001
    I'm using GUID to create a 32 character string and I'm using the Unix Timestamp (number of seconds since 1970) as Unique Domain-ID.

    When the user starts the application I get the current Unix Timestamp, for example 1233108812.

    So unless two users start the application at the same second it will never be possible for them to get the same Message-ID. If that's not enough, then I'll add the Volume Serial Number of the hard drive as well.
    Actually, it's harder to get unique records/files based on time, especially when multiple servers are involved.

    What if the time was accidentally reset to an earlier value (e.g. a motherboard change; hardware problems do occur... or an invalid time was initially used, then corrected, and now duplicates occur)? Also, that will work only for a single-server implementation; if you want to scale to a server farm, then the time at each server will have to be synchronized.

  20. #20
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: Random string

    Quote Originally Posted by Chris001
    I create Message-ID's of about 45 characters. When creating random strings of about 45 characters there should be billions of combinations, but many of my users get incomplete uploads, because the application creates Message-ID's already created/used by other users.
    Chris

    As a latecomer, I don't have much to add regarding randomizing. So,
    an observation and then a question..

    Observation: Billions?? Try gazillions. Just doing 26 lower-case letters .. 26 ^ 26
    equals 6.15 e 36 (whereas a billion is only 1.00 e 9)

    Question: What is the need to have the different Message-IDs in the first place?
    (it does not seem to be a security-related matter). Could the "fix" be as
    simple as a text file on each user's computer that gets updated with each
    message sent (to prevent duplicates)? What am I missing?

    Spoo

  21. #21
    Fanatic Member technorobbo's Avatar
    Join Date
    Dec 2008
    Location
    Chicago
    Posts
    864

    Re: Random string

    How about adding a checksum at the end of the file name so the software can confirm that it downloaded the correct file? This will at least flag an error, but if a duplicate filename does exist I'm not sure how you could ensure that the right one will ever get downloaded. With a checksum you may be able to keep retrying different posts until you find the right one. This would imply that you create your own downloader but, hey, why not?
    Last edited by technorobbo; Jan 28th, 2009 at 07:03 AM.

  22. #22
    PowerPoster Code Doc's Avatar
    Join Date
    Mar 2007
    Location
    Omaha, Nebraska
    Posts
    2,354

    Re: Random string

    Quote Originally Posted by wy125
    Code Doc, if you reseed the random number generator each time you call the function, you will find a match relatively quickly. Remember that his application is used by many different users. Each time they launch the application, the random number generator is seeded. The probability that two instances of the application create the same sequences is independent of the length of the generated strings and depends only on the probability that the applications initialized the generator with the same seed value. I estimate that this happens about 1 in 1000 times or so.
    I agree, so here is how I would write the code to generate the unique strings:

    (1) Generate several million or so random strings of 45-character length.
    (2) Sort the strings and check for a match.
    (3) Discard the strings that matched so that all are unique.
    (4) If random order is important, scramble the unique strings using the Knuth shuffle.
    (5) Store the strings in a file and assign them one at a time.

    As the strings are used up, you can assign a byte flag to each so that it does not get used twice.
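
    Step (4) could be sketched as follows. This is a minimal Knuth (Fisher-Yates) shuffle over a string array; ShuffleStrings is a hypothetical name, and the sketch assumes Randomize has already been called once at startup. Because it draws from Rnd, it inherits that generator's limits.

    vb Code:
    ' Hypothetical helper: shuffles the array in place
    Public Sub ShuffleStrings(items() As String)
        Dim i As Long
        Dim j As Long
        Dim tmp As String

        For i = UBound(items) To LBound(items) + 1 Step -1
            ' Pick j uniformly from LBound(items) .. i (inclusive)
            j = LBound(items) + Int((i - LBound(items) + 1) * Rnd)
            ' Swap items(i) and items(j)
            tmp = items(i)
            items(i) = items(j)
            items(j) = tmp
        Next i
    End Sub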
    Last edited by Code Doc; Jan 28th, 2009 at 09:20 AM.
    Doctor Ed

  23. #23
    Head Hunted anhn's Avatar
    Join Date
    Aug 2007
    Location
    Australia
    Posts
    3,669

    Re: Random string

    Without checking to eliminate duplicates, there is no guarantee that two random strings are not the same. You can only reduce the chance of duplicates by appending something such as a date-time value.

    With a single user in one run, and one Randomize at startup, the native VB Rnd() function can provide 16,777,216 different Single values before it repeats the sequence. Calling Randomize repeatedly might make duplicates happen sooner.
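
    To put a rough number on this: if every generated string is fully determined by the generator's internal state, and Rnd has only 16,777,216 distinct states, then no more than 16,777,216 distinct strings can come out, however long they are. The sketch below applies the standard birthday approximation under the optimistic assumption that each ID starts from a uniformly random state. CollisionChance is a hypothetical helper name.

    vb Code:
    ' Hypothetical helper: approximate chance that at least two of idCount IDs
    ' collide when only stateCount distinct IDs are possible.
    ' Birthday approximation: 1 - e^(-n(n-1) / 2S)
    Public Function CollisionChance(ByVal idCount As Double, ByVal stateCount As Double) As Double
        CollisionChance = 1# - Exp(-idCount * (idCount - 1#) / (2# * stateCount))
    End Function

    For example, CollisionChance(4823, 16777216) comes out at roughly 0.5, so with only a few thousand IDs in circulation a clash somewhere is already about as likely as not. Timer-based seeding is unlikely to be uniform, so the real figure could well be worse.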

    One of my threads in the CodeBank discusses the Wichmann-Hill pseudo-random number generator.

    The idea of appending or combining a date-time value with a random key is used by MS Access in replicated databases. However, although it's rare, it may still cause conflicts when synchronizing.

    Perhaps appending the UserID and Date+Timer of the user's PC (in hex form) will minimize the chance of duplicates.
    • Don't forget to use [CODE]your code here[/CODE] when posting code
    • If your question was answered please use Thread Tools to mark your thread [RESOLVED]
    • Don't forget to RATE helpful posts

    • Baby Steps a guided tour
    • IsDigits() and IsNumber() functions • Wichmann-Hill Random() function • >> and << functions for VB • CopyFileByChunk

  24. #24

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Random string

    @ leinad31

    Maybe I misunderstand you, but I'm talking about the Unix Timestamp of the user's computer and not of the newsserver.


    @ Spoo

    Thanks for the explanation. I didn't know that.

    It doesn't have anything to do with security. Every piece of data on the newsserver has a unique code in order to distinguish them from each other. It's like a national identification number to distinguish people from each other.


    @ technorobbo

    NNTP does not pay attention to file names. The same file can be uploaded millions of times as long as it has a different Message-ID. Users download the files (with other apps not written by me) by simply sending a "BODY" command with the Message-ID to the newsserver and the server sends the data back (whatever that data may be) that belongs to the Message-ID. The download program decodes all pieces of data with the yEnc algorithm, reads the yEnc header/footer and joins the pieces in the correct order, which results in the original file.


    @ Code Doc

    I can store Message-ID's in a file, but the problem is/was not that the same Message-ID's are created by the same person, but by two or more people living hundreds or even a few thousand miles away from each other. And some of the users only upload a small file that gets divided into 2 smaller pieces and then won't use the app for weeks/months, so it's a good idea, but in those cases creating several million Message-ID's is unnecessary work.


    @ anhn

    Thank you. I'll have a look at that code and do some tests with it.
    Last edited by Chris001; Jan 28th, 2009 at 01:39 PM.

  25. #25
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: Random string

    Chris

    Given your reply, may I respectfully suggest that you might be barking
    up the wrong tree -- randomizing-wise, that is. Why do you even need
    to do anything with random numbers?

    If each user has a unique user-id (not heretofore mentioned by you, but
    something you probably could do), then it seems to me that all you'd
    need to dwell on is keeping track of his/her various messages.

    This (UserMessageID) could be done with ..

    > a simple counter, growing from 1 to whatever (and stored in a text file)
    > a string based on a time-stamp of said user's computer.

    Does that make sense?
    Would that do the job?

    EDIT:
    BTW, the ultimate MessageID would be a concatenation of

    > UserID
    > UserMessageID
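
    A minimal sketch of that scheme, assuming the counter lives in a small text file on the user's PC. NextUserMessageId, userId and counterPath are hypothetical names, and how the UserID is obtained is left open here:

    vb Code:
    ' Hypothetical helper: reads the last counter from a text file, increments it,
    ' writes it back, and returns UserID + "." + counter
    Public Function NextUserMessageId(ByVal userId As String, ByVal counterPath As String) As String
        Dim f As Integer
        Dim counter As Long
        Dim textLine As String

        ' Read the previous value if the file already exists
        If Len(Dir$(counterPath)) > 0 Then
            f = FreeFile
            Open counterPath For Input As #f
            If Not EOF(f) Then
                Line Input #f, textLine
                counter = CLng(Val(textLine))
            End If
            Close #f
        End If

        counter = counter + 1

        ' Store the new value so it is never handed out twice on this PC
        f = FreeFile
        Open counterPath For Output As #f
        Print #f, CStr(counter)
        Close #f

        NextUserMessageId = userId & "." & CStr(counter)
    End Function

    Uniqueness then rests entirely on the UserID being unique per installation; the counter only prevents repeats on one machine, and it starts over if the file is deleted.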

    Spoo
    Last edited by Spoo; Jan 28th, 2009 at 02:10 PM.

  26. #26

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Random string

    That's a good idea as well. As long as every user has a unique domain-id (perhaps motherboard serial number) a simple counter and a timestamp would suffice.

    Thanks Spoo.



    I'll mark this thread as resolved. I have enough ideas now to create unique Message-ID's.

    Thank you everybody

  27. #27
    PowerPoster Code Doc's Avatar
    Join Date
    Mar 2007
    Location
    Omaha, Nebraska
    Posts
    2,354

    Re: [RESOLVED] Random string

    Actually, I like Spoo's idea also--a simple counter that increments integers, perhaps picking up letters as well along the way. Sweet and simple and hundreds of millions could be generated with no match. I imagine Microsoft and Adobe did something similar with serial numbers. Then they separated the 4-byte chunks with hyphens.

    Good luck with your project.
    Doctor Ed

  28. #28
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: [RESOLVED] Random string

    Thanks, Doc. What do I win?

    Spoo
