PDA

Click to See Complete Forum and Search --> : Unique random number generation?


timeshifter
Aug 28th, 2007, 10:11 AM
So, the way I'm seeing this, there's two main methods. But first, the project scope... user enters lower and upper bounds, inclusive, and the number of values to be generated. Output is a read-only text box.

The first method, and currently in use:
A counter, starting at 0. A Do...Loop restricted to that counter. A random number is created within the bounds specified, and sent to a function, which loops through the current array. If the supplied value matches any value in the array, the function returns false, and the loop tries again. Don't knock me for it, I know it has limits. The closer you get to the max range, the longer it'll take... but it was quick and easy, and i was after 64 numbers from a list of 256. Not a big issue there.

The second method in my head is something as follows:
User enters the parameters. When user clicks Generate, a list is built of every legal value within the bounds. Using the counter mentality, probably a For...Next loop for this instance, the program would simply pick a random number between 0 and List.Count-1, and use that value, then promptly remove it. I can see this as being close to optimum, on the concept that it doesn't loop through someting constantly, and there'd be no risk of choosing duplicates, as a valid number would be removed from the list upon being chosen. My only concern is this: What if the user enters bounds that create a range of millions? That list would get enormous, and could possibly cause damage on an older computer.

What are your thoughts?

jemidiah
Aug 29th, 2007, 05:15 AM
It looks like you're trying to shuffle a list (the computer science terminology for it). Depending on the number of values that should be generated, you could take only a portion of the resulting, shuffled list--so you could use one of the shuffling algorithms that have already been invented and studied in-depth.

Your first method is basically an implementation of the brute force shuffling method (http://en.wikipedia.org/wiki/Random_permutation).

The second is cool, but to remove an element from the list would (at the least) require a "removed" flag for removed items, which would require you check the flag and (if needed) repeat a particular step. You could also remake the list, but that would be much less efficient.

From Wikipedia: (http://en.wikipedia.org/wiki/Shuffle#Shuffling_algorithms)

In a computer, shuffling is equivalent to generating a random permutation of the cards. There are two basic algorithms for doing this, both popularized by Donald Knuth. The first is simply to assign a random number to each card, and then to sort the cards in order of their random numbers. This will generate a random permutation, unless two of the random numbers generated are the same. This can be eliminated either by retrying these cases, or reduced to an arbitrarily low probability by choosing a sufficiently wide range of random number choices.

The second, generally known as the Knuth shuffle or Fisher-Yates shuffle[1], is a linear-time algorithm (as opposed to the previous O(n log n) algorithm if using efficient sorting such as mergesort or heapsort) which involves moving through the pack from top to bottom, swapping each card in turn with another card from a random position in the part of the pack that has not yet been passed through (including itself). Providing that the random numbers are unbiased, this will always generate a random permutation.


Take the number of values generated to be k. Using the 2nd algorithm, you could perform swaps only k times. However, you would still need to keep a list of length n, where n is the number of numbers (integers) in the given range.

From the Knuth Shuffle (http://en.wikipedia.org/wiki/Knuth_shuffle) page:

The Fisher-Yates shuffle is quite efficient; indeed, its asymptotic time and space complexity are optimal.


It would appear that the Knuth Shuffle (method 2) is the best possible (though I wish I could see the citation; source 7 from the Wiki page above doesn't seem to talk about the optimality of this method).


I keep wishing there were a way to use a shorter list but (without checking your list of length k) I can't think of any way to do this, and it doesn't sound like it's possible from the quote above.


Hope this helped

timeshifter
Aug 29th, 2007, 09:40 AM
Certainly an interesting perspective. I hadn't thought of that method. I've successfully implemented the second method in my first post... generating a list of all legal values, picking a random member of that list, adding it to the output list, and removing it from the selection list. Looping that, there will never be any errors, because it's only choosing from what's left... no errors, no duplicates... if only generating that list every time weren't so inefficient...

jemidiah
Aug 29th, 2007, 06:22 PM
The Knuth shuffle described shouldn't be tough to implement if you wanted to go that route, and it would use just as much memory as your current method does initially. It would also eliminate the need to resize your array.

Or, you could optimize your method a bit by using a "deleted" flag/marker (maybe setting the array element to 0) during the first half (or so) of the number generation, and then by resizing the array during the rest of the number generation. Since it would probably take the longest to resize at the start of the process when the array is incredibly long, this would seem to be much more efficient. Also, your chance of having to redo your random choice using the marker system (that is, you'd have to choose another index if the one you chose was marked "deleted") would increase substantially near the end of the number generation.

Cutting out those two effects would probably make the entire routine much faster, though it looks like the Knuth shuffle would be easier to implement than these optimizations.

Good luck!

Logophobic
Aug 29th, 2007, 09:53 PM
Knuth shuffle is indeed very simple. Here is a function that takes the parameters timeshifter described and returns an array containing the chosen numbers.

Public Function UniqueRandomNumbers(ByVal Low As Long, ByVal High As Long, ByVal Number As Long) As Long()
Dim i As Long ' Loop counter
Dim r As Long ' Random number
Dim t As Long ' Temporary
Dim n() As Long ' The list
Dim ub As Long ' Upper bound of list array (to avoid repeatedly calculating High - Low)

' High must not be less than Low
If Low > High Then
t = Low
Low = High
High = t
End If

' Create list of all values in range
ub = High - Low
ReDim n(ub)
For i = 0 To ub
n(i) = Low + i
Next i

' Number must be at least 1
If Number < 1 Then Number = 1

' Number must not be greater than count of values in range
If Number > ub Then
Number = ub + 1
End If

' ' Complete shuffle
' ' Chosen values are moved to end of list (it's easier that way)
' ' All values will be chosen exactly once
' For i = ub To 1 Step -1
' r = Int(Rnd * (i + 1)) ' Range: 0 to i
' t = n(r)
' n(r) = n(i)
' n(i) = t
' Next i

' Partial shuffle
' Chosen values are moved to begining of list (where they need to be)
' Only the specified Number of values will be chosen
For i = 0 To Number - 1
r = Int(Rnd * (ub - i + 1)) + i ' Range: i to ub
t = n(r)
n(r) = n(i)
n(i) = t
Next i

ReDim Preserve n(Number - 1)
UniqueRandomNumbers = n
End Function

timeshifter
Aug 30th, 2007, 10:18 AM
The Knuth shuffle described shouldn't be tough to implement if you wanted to go that route, and it would use just as much memory as your current method does initially. It would also eliminate the need to resize your array.

Or, you could optimize your method a bit by using a "deleted" flag/marker (maybe setting the array element to 0) during the first half (or so) of the number generation, and then by resizing the array during the rest of the number generation. Since it would probably take the longest to resize at the start of the process when the array is incredibly long, this would seem to be much more efficient. Also, your chance of having to redo your random choice using the marker system (that is, you'd have to choose another index if the one you chose was marked "deleted") would increase substantially near the end of the number generation.

Cutting out those two effects would probably make the entire routine much faster, though it looks like the Knuth shuffle would be easier to implement than these optimizations.

Good luck!

Out of a technicality, there's no real "array" going on... it's a list. I just loop through from the lower to upper bounds, adding every number to a list. I then pick a random member of the list, add that to the final list, and remove it from the list of available numbers, so each number can only be chosen once, and there's no looping to verify that a number is indeed unique. If it's in the first list, it hasn't been chosen yet... which seems to work very well indeed. I'll look into the other method for speed determinations, though.