I am trying to create a random byte from a string to use as a seed. What I have so far is this:
Code:
Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long
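Private Function Seed(sInput As String) As Byte
    Dim N%
    Dim bInput() As Byte
    Dim bResult As Byte
    If Len(sInput) = 0 Then Exit Function ' StrToUtf8 returns an unallocated array for "", and UBound would fail
    bInput = StrToUtf8(sInput)
    ' XOR all the bytes together (this part ignores byte order)
    For N% = 0 To UBound(bInput)
        bResult = bResult Xor bInput(N%)
    Next N%
    ' XOR byte 8 back out so that a different byte in position 8 changes the result;
    ' guarded because inputs shorter than 9 bytes would raise "Subscript out of range"
    If UBound(bInput) >= 8 Then bResult = bResult Xor bInput(8)
    Debug.Print bResult
    Seed = bResult
End Function
Private Function StrToUtf8(strInput As String) As Byte()
    Const CP_UTF8 = 65001
    Dim nBytes As Long
    Dim bBuffer() As Byte
    If Len(strInput) < 1 Then Exit Function
    'Get length in bytes; passing Len() instead of -1 converts only the characters, so no terminating null is counted
    nBytes = WideCharToMultiByte(CP_UTF8, 0&, ByVal StrPtr(strInput), Len(strInput), 0&, 0&, 0&, 0&)
    ReDim bBuffer(nBytes - 1)
    'The buffer is exactly nBytes long, so the call does not fail for lack of room for a null
    nBytes = WideCharToMultiByte(CP_UTF8, 0&, ByVal StrPtr(strInput), Len(strInput), ByVal VarPtr(bBuffer(0)), nBytes, 0&, 0&)
    StrToUtf8 = bBuffer
End Function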
The problem is that because the string I used was all lower case ASCII, the Seed routine does not make full use of the byte (8 to 30 instead of 1 to 255).
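Why the range is so narrow: every lowercase ASCII letter lies between &H61 and &H7A, so its top three bits are always 011. XOR-ing an even number of such bytes cancels those bits (results 0 to 31), while an odd number leaves 011 (results 96 to 127). Either way only the low five bits vary, which fits the 8 to 30 observed above. The hypothetical XorFold below (a plain XOR of all bytes, without the bInput(8) step, and assuming plain ASCII input) makes this easy to check:

```vb
Option Explicit
'
' Hypothetical demo: XOR-fold the bytes of a string. For all-lowercase input
' the top 3 bits of the result are fixed by how many bytes were folded.
Public Function XorFold(s As String) As Long
    Dim b() As Byte
    Dim i As Long
    Dim h As Long
    If Len(s) = 0 Then Exit Function
    b = StrConv(s, vbFromUnicode)   ' plain ASCII, so one byte per character
    For i = 0 To UBound(b)
        h = h Xor b(i)
    Next
    XorFold = h
End Function
```

`(XorFold(s) And &HE0)` is 0 for any even-length, all-lowercase `s`, and &H60 for any odd length, so at most 32 distinct values can come out.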
Of course there is, and you've already used one in your TLS implementation. It's called a "key derivation function": a short key (like a password or a "master key") is *expanded* into a variable-sized random sequence of bytes, e.g. two sets of 32 bytes each for traffic keys, two sets of 12 bytes each for IVs and 16 bytes for a MAC key (no MAC with AEAD ciphers, but you get the point). TLS 1.3 uses the HKDF algorithm, based on HMAC, for this purpose.
You can use the BCryptDeriveKeyPBKDF2 API function (it implements the PBKDF2 algorithm, which again uses HMAC, as described in RFC 2898) to expand a password into a random byte array of any length (which does not increase the password's entropy, btw), like in this sample code: Simple AES 256-bit password protected encryption
Code:
'--- generate RFC 2898 based derived key
On Error GoTo EH_Unsupported '--- CNG API missing on XP
hResult = BCryptOpenAlgorithmProvider(uCrypto.hPbkdf2Alg, StrPtr("SHA1"), StrPtr(MS_PRIMITIVE_PROVIDER), BCRYPT_ALG_HANDLE_HMAC_FLAG)
On Error GoTo 0
ReDim baDerivedKey(0 To 2 * lKeyLen + 1) As Byte
On Error GoTo EH_Unsupported '--- PBKDF2 API missing on Vista
'--- the iteration count is a 64-bit ULONGLONG, passed here as two Longs: low = 1000, high = 0
hResult = BCryptDeriveKeyPBKDF2(uCrypto.hPbkdf2Alg, baPass(0), UBound(baPass) + 1, baSalt(0), UBound(baSalt) + 1, 1000, 0, baDerivedKey(0), UBound(baDerivedKey) + 1, 0)
On Error GoTo 0
...where baPass is the password and the output baDerivedKey receives the key expanded to whatever size you want (still with no more entropy than the original password, though).
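For reference, here is a self-contained version of the same call, packaged as a function. This is a sketch, not code from the linked sample: DeriveKeyPbkdf2 is a hypothetical name, it assumes Windows 7 or later (where BCryptDeriveKeyPBKDF2 exists), it uses HMAC-SHA256 instead of the SHA1 shown above, and it passes the 64-bit iteration count as two Longs exactly as the snippet does with `1000, 0`.

```vb
Option Explicit
'
Private Declare Function BCryptOpenAlgorithmProvider Lib "bcrypt" (phAlgorithm As Long, ByVal pszAlgId As Long, ByVal pszImplementation As Long, ByVal dwFlags As Long) As Long
Private Declare Function BCryptCloseAlgorithmProvider Lib "bcrypt" (ByVal hAlgorithm As Long, ByVal dwFlags As Long) As Long
' cIterations is a ULONGLONG, so it is declared as two Longs (low dword, high dword).
Private Declare Function BCryptDeriveKeyPBKDF2 Lib "bcrypt" (ByVal hPrf As Long, pbPassword As Any, ByVal cbPassword As Long, pbSalt As Any, ByVal cbSalt As Long, ByVal cIterationsLo As Long, ByVal cIterationsHi As Long, pbDerivedKey As Any, ByVal cbDerivedKey As Long, ByVal dwFlags As Long) As Long
Private Const BCRYPT_ALG_HANDLE_HMAC_FLAG As Long = 8
'
' Hypothetical helper: PBKDF2-HMAC-SHA256. baPass and baSalt must be non-empty.
Public Function DeriveKeyPbkdf2(baPass() As Byte, baSalt() As Byte, ByVal lIterations As Long, ByVal lOutLen As Long) As Byte()
    Dim hAlg As Long
    Dim hResult As Long
    Dim baOut() As Byte
    ReDim baOut(0 To lOutLen - 1) As Byte
    ' 0& for pszImplementation selects the default provider.
    hResult = BCryptOpenAlgorithmProvider(hAlg, StrPtr("SHA256"), 0&, BCRYPT_ALG_HANDLE_HMAC_FLAG)
    If hResult <> 0 Then Err.Raise vbObjectError + 1, , "BCryptOpenAlgorithmProvider failed: &H" & Hex$(hResult)
    hResult = BCryptDeriveKeyPBKDF2(hAlg, baPass(0), UBound(baPass) + 1, baSalt(0), UBound(baSalt) + 1, _
                                    lIterations, 0&, baOut(0), lOutLen, 0&)
    Call BCryptCloseAlgorithmProvider(hAlg, 0&)
    If hResult <> 0 Then Err.Raise vbObjectError + 2, , "BCryptDeriveKeyPBKDF2 failed: &H" & Hex$(hResult)
    DeriveKeyPbkdf2 = baOut
End Function
```

With a fixed salt and `lOutLen = 1`, the single output byte could serve as a seed: the same string always yields the same byte, spread over the full 0 to 255 range.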
To generate a truly random string (ANSI), I'd probably do something like the following:
Code:
Option Explicit
'
Private Declare Function CryptAcquireContextW Lib "advapi32.dll" (hProv As Long, ByVal pszContainer As Long, ByVal pszProvider As Long, ByVal dwProvType As Long, ByVal dwFlags As Long) As Long
Private Declare Function CryptGenRandom Lib "advapi32.dll" (ByVal hProv As Long, ByVal dwlen As Long, pbBuffer As Any) As Long
Private Declare Function CryptReleaseContext Lib "advapi32.dll" (ByVal hProv As Long, ByVal dwFlags As Long) As Long
'
Public Function RandomAnsiString(iLen As Long) As String
    ' Generates random ANSI strings with characters in the full range of &h00 to &hff.
    ' (The API functions return a 4-byte BOOL, so they are declared As Long.)
    Dim hCrypt As Long
    Const PROV_RSA_FULL As Long = 1&
    Const CRYPT_VERIFYCONTEXT As Long = &HF0000000
    '
    If iLen < 1& Then Exit Function ' Nothing to generate, and ReDim bb(-1) would fail.
    Dim bb() As Byte
    ReDim bb(iLen - 1&)
    '
    Call CryptAcquireContextW(hCrypt, 0&, 0&, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT) ' Initialize advapi32.
    Call CryptGenRandom(hCrypt, iLen, bb(0)) ' Get our random bytes.
    Call CryptReleaseContext(hCrypt, 0&) ' Turn off advapi32.
    '
    RandomAnsiString = StrConv(bb, vbUnicode) ' Put ANSI bytes into Unicode VB6 string.
End Function
I was curious, so I generated a "few" random characters and then saw what the frequency distribution looked like. Here's the code I used to generate the characters:
Code:
Private Sub Form_Load()
    ' Test "flatness" of single ANSI characters from above function.
    Dim i As Long
    Const TestCount As Long = 50000
    ' Generate some random ANSI values.
    Dim sa(TestCount) As String
    For i = 1& To TestCount
        sa(i) = RandomAnsiString(1&)
    Next
    ' Count how many we got of each.
    Dim bb(255&) As Long
    Dim j As Long
    For i = 1& To TestCount
        j = Asc(sa(i))
        bb(j) = bb(j) + 1&
    Next
    ' Dump frequencies. (In two passes, so we don't overflow the Immediate window.)
    For i = 0& To 127&
        Debug.Print i, bb(i)
    Next
    Stop
    For i = 128& To 255&
        Debug.Print i, bb(i)
    Next
    Stop
End Sub
And here are the frequencies it generated after one pass: [frequency chart not preserved]
I didn't perform any statistical test, but that looks like a fairly "flat" (uniform) distribution to me.
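A chi-square goodness-of-fit statistic is one way to put a number on "flat". With 256 byte values there are 255 degrees of freedom, so a uniform source should give values around 255, and values above roughly 293 would be suspicious at the 5% level. ChiSquareUniform is a hypothetical helper, not from the thread; it takes a Variant so it accepts the bb() counts array from the test above.

```vb
Option Explicit
'
' Hypothetical helper: Pearson chi-square statistic of observed counts
' against a uniform distribution. counts may be a Long or Variant array.
Public Function ChiSquareUniform(counts As Variant, ByVal total As Long) As Double
    Dim nBins As Long
    Dim expected As Double
    Dim d As Double
    Dim chi As Double
    Dim i As Long
    nBins = UBound(counts) - LBound(counts) + 1
    expected = total / nBins               ' same expected count in every bin
    For i = LBound(counts) To UBound(counts)
        d = counts(i) - expected           ' Empty Variant elements count as 0
        chi = chi + d * d / expected
    Next
    ChiSquareUniform = chi
End Function
```

Usage, after the counting loop in Form_Load: `Debug.Print ChiSquareUniform(bb, TestCount)`.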
p.s. If you wanted Unicode characters, it wouldn't take much adjusting of that function to get those. Basically, double the size of the bb() array, double the size of the random data you request, and then just directly assign the bb() array to the returned string value.
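Following that description, a minimal sketch of the Unicode variant could look like this. RandomUnicodeString is a hypothetical name; each VB6 character gets two random bytes, so the result can contain unpaired surrogates and unassigned code points. If it goes in the same module as RandomAnsiString, drop the duplicate Declare lines.

```vb
Option Explicit
'
Private Declare Function CryptAcquireContextW Lib "advapi32.dll" (hProv As Long, ByVal pszContainer As Long, ByVal pszProvider As Long, ByVal dwProvType As Long, ByVal dwFlags As Long) As Long
Private Declare Function CryptGenRandom Lib "advapi32.dll" (ByVal hProv As Long, ByVal dwLen As Long, pbBuffer As Any) As Long
Private Declare Function CryptReleaseContext Lib "advapi32.dll" (ByVal hProv As Long, ByVal dwFlags As Long) As Long
'
' Hypothetical helper: iLen random UTF-16 code units (&H0000 to &HFFFF each).
Public Function RandomUnicodeString(ByVal iLen As Long) As String
    Const PROV_RSA_FULL As Long = 1&
    Const CRYPT_VERIFYCONTEXT As Long = &HF0000000
    Dim hCrypt As Long
    Dim bb() As Byte
    If iLen < 1& Then Exit Function
    ReDim bb(2& * iLen - 1&)                          ' two bytes per VB6 character
    If CryptAcquireContextW(hCrypt, 0&, 0&, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT) = 0 Then _
        Err.Raise vbObjectError + 1, , "CryptAcquireContextW failed"
    Call CryptGenRandom(hCrypt, 2& * iLen, bb(0))     ' request twice as many bytes
    Call CryptReleaseContext(hCrypt, 0&)
    RandomUnicodeString = bb                          ' direct assignment, no ANSI conversion
End Function
```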
Last edited by Elroy; Oct 3rd, 2021 at 09:05 AM.
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
I didn't follow all the details in the thread, but I wanted to comment on a couple of things:
1) By definition, if the OP wants a "random value" from a String, the values returned must be tied to the given String, so they won't be "random".
I mean, the same String must produce the same "random value".
2) Does "truly random" really exist, or only values produced by complex processes that we don't know how to predict? (I mean, in the Universe).
Well, don't worry about this second one.
Yeah, I wasn't clear on what the OP wanted either.
I initially read it as wanting a random string that could be used as a seed. I thought about commenting that that's typically not how a seed is used, but maybe I misunderstood.
Yes, if we're just wanting a seed "from a string", and we want the same string to generate the same seed each time, then yeah, a hash sounds like the correct answer (as Trick noted).
Hashing is when you take a long input (e.g. a file) and produce a short output (e.g. a 32-byte hash).
"Expanding" is the opposite of hashing, i.e. deriving a long key (100 to 1000 bytes) from a short key (a password or a master secret). Of course this cannot produce more randomness than the initial short key already has (e.g. a password made only of lowercase Latin letters has a lot less), but the point is not to lose any randomness in the process.
This is what PBKDF2 can be used for (there are many other algorithms). It also has a parameter for the number of iterations, which deliberately slows down the whole process, so the same algorithm can be used to store password hashes as well.
Ok, just for grins, I made a "hashing" algorithm. Couttsj, if you don't need the strong one-way protection between your string and the hash that something like SHA-3 gives you, then this will do:
Code:
Option Explicit
'
Private Declare Function GetMem4 Lib "msvbvm60" (ByRef Source As Any, ByRef Dest As Any) As Long ' Always ignore the returned value, it's useless.
Public Function SeedFromAnsiString(sAnsi As String) As Single
    '
    ' Convert the string to bytes in the system ANSI code page (one byte per character in single-byte code pages).
    Dim bb() As Byte
    bb = StrConv(sAnsi, vbFromUnicode)
    '
    ' Pad with zero bytes so the length is a multiple of 4.
    Dim iLen As Long
    If Len(sAnsi) = 0& Then iLen = 3& Else iLen = Len(sAnsi) + 2&
    While (iLen + 1&) Mod 4&: iLen = iLen - 1&: Wend ' Zero based, multiple of 4.
    If UBound(bb) <> iLen Then ReDim Preserve bb(iLen) ' Make adjustment, if necessary.
    '
    ' Read the bytes four at a time as Longs and XOR them together.
    Dim iHash As Long
    Dim iHold As Long
    Dim i As Long
    For i = 0& To iLen Step 4&
        GetMem4 bb(i), iHold
        iHash = iHash Xor iHold
    Next
    '
    ' Move iHash into a Single, and make sure it's a valid IEEE Single.
    ' We check the Inf and NaN possibility first, as it's easier as a Long.
    ' Basically, if the &H7F800000 bits are all on, it's either Inf or NaN.
    If (iHash And &H7F800000) = &H7F800000 Then iHash = iHash And Not &H7F800000
    GetMem4 iHash, SeedFromAnsiString ' And now we can move it, creating a hash (non Inf, non NaN) Single.
End Function
I did make a few assumptions:
1) That we're dealing with ANSI strings.
2) That the VB6 Randomize seed is a Single. I researched and couldn't find a definitive answer for this, but I did find some highly suggestive comments. So, that's the hash that I generated (a Single).
Just as some further notes, I tested this "seed as Single" assumption a bit. If you do the following in the Immediate window, you get the same starting random number:
So, I'm thinking that a "Single" for the seed is correct.
It appears that I did not describe the purpose adequately. The seed is used to shuffle the string bytes, and the shuffle must be reversible. That is to say, the old string must produce a seed that is used to create a new shuffled string, and can be used to unshuffle the new string. That is the reason for the extra Xor function, as the same bytes in a different order produce the same seed.
In addition, I have to duplicate this process in JavaScript, and some browsers restrict the use of crypto functions to HTTPS only (e.g. Google Chrome and others). I actually had it working, but I wanted to test the function to see how extendable it was. Testing was much easier to do in the VB6 version, and what I found was that sometimes the seed would get stuck on the same result, and sometimes it would actually go to zero.
This is hard for me to explain, so I have created a sample (attached). I did notice that I got a wider seed calculation when at least one capital character is used.
J.A. Coutts
Last edited by couttsj; Oct 3rd, 2021 at 03:30 PM.
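The reversible shuffle described above can be sketched as a seeded Fisher-Yates shuffle whose swaps are replayed backwards to undo it. This is an assumption about the approach, not couttsj's attached sample: ShuffleBytes, UnshuffleBytes and MakeSwaps are hypothetical names, and the seed is fed to VB6's Rnd with the `Rnd -1: Randomize seed` pattern used in the tests in this thread. A JavaScript port would need its own seedable generator, because the swap sequence must be identical on both sides.

```vb
Option Explicit
'
' Hypothetical helper: precompute the Fisher-Yates swap partner for each position.
' Rnd -1 followed by Randomize makes the sequence depend only on lSeed.
Private Sub MakeSwaps(ByVal lSeed As Long, ByVal lCount As Long, lSwap() As Long)
    Dim i As Long
    ReDim lSwap(0 To lCount - 1) As Long
    Rnd -1
    Randomize lSeed
    For i = lCount - 1 To 1 Step -1
        lSwap(i) = Int(Rnd * (i + 1))            ' partner j with 0 <= j <= i
    Next
End Sub

Public Function ShuffleBytes(bData() As Byte, ByVal lSeed As Long) As Byte()
    Dim b() As Byte, lSwap() As Long, i As Long, t As Byte
    b = bData
    If UBound(b) >= 1 Then
        MakeSwaps lSeed, UBound(b) + 1, lSwap
        For i = UBound(b) To 1 Step -1           ' apply the swaps from the top down
            t = b(i): b(i) = b(lSwap(i)): b(lSwap(i)) = t
        Next
    End If
    ShuffleBytes = b
End Function

Public Function UnshuffleBytes(bData() As Byte, ByVal lSeed As Long) As Byte()
    Dim b() As Byte, lSwap() As Long, i As Long, t As Byte
    b = bData
    If UBound(b) >= 1 Then
        MakeSwaps lSeed, UBound(b) + 1, lSwap
        For i = 1 To UBound(b)                   ' same swaps, undone in reverse order
            t = b(i): b(i) = b(lSwap(i)): b(lSwap(i)) = t
        Next
    End If
    UnshuffleBytes = b
End Function
```

Each swap is its own inverse, so running the identical list of swaps in the opposite order restores the original bytes; all the seed has to do is reproduce that list.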
So, you want to create a seed that can both "shuffle" and "unshuffle".
Instead of a seed we could call it a "key": we generate a key that shuffles a string, and the same key can unshuffle it.
The easiest way would be to turn the string into bytes, and the same with the "key".
Using the Xor method, you shuffle the bytes together with the key, like
byte(x) = byte(x) Xor byte(y) Xor key(z)
and of course x/y/z can be chosen in any way, as long as it can be mirrored, like an encoding/decoding function. That should shuffle it quite well.
The encoding and decoding are mirrors: the function I use to encode the string is not the same as the one that decodes it, even though they both use the same key.
But it all depends on the complexity of the encoding, of course.
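One way to make that concrete (the choice of y and z here is an assumption, not the poster's): take y as the next position, wrapping to 0, and z as x Mod the key length. XorMixEncode and XorMixDecode are hypothetical names. Note that this changes byte values rather than reordering them, and decoding repeats the same statement with the loop running backwards, which is what makes the two functions mirrors.

```vb
Option Explicit
'
' Hypothetical sketch of byte(x) = byte(x) Xor byte(y) Xor key(z), with
' y = x + 1 (wrapping to 0) and z = x Mod key length. bKey must not be empty.
Public Function XorMixEncode(bData() As Byte, bKey() As Byte) As Byte()
    Dim b() As Byte, n As Long, x As Long, kLen As Long
    b = bData
    n = UBound(b) + 1
    kLen = UBound(bKey) + 1
    If n = 1 Then
        b(0) = b(0) Xor bKey(0)                   ' x = y here, so only mix in the key
    ElseIf n > 1 Then
        For x = 0 To n - 1                        ' forward pass
            b(x) = b(x) Xor b((x + 1) Mod n) Xor bKey(x Mod kLen)
        Next
    End If
    XorMixEncode = b
End Function

Public Function XorMixDecode(bData() As Byte, bKey() As Byte) As Byte()
    Dim b() As Byte, n As Long, x As Long, kLen As Long
    b = bData
    n = UBound(b) + 1
    kLen = UBound(bKey) + 1
    If n = 1 Then
        b(0) = b(0) Xor bKey(0)
    ElseIf n > 1 Then
        For x = n - 1 To 0 Step -1                ' mirror: same steps, reverse order
            b(x) = b(x) Xor b((x + 1) Mod n) Xor bKey(x Mod kLen)
        Next
    End If
    XorMixDecode = b
End Function
```

Decoding works because, when step x is undone, b(x + 1) has already been restored to the value it had when step x was applied (and b(0) still holds its encoded value when the last step is undone first).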
I ran several text strings with varying lengths using seed values 1 through 254, and there were no duplications in the shuffled strings. Then I ran the string "UserIDpassword" through 1270 loops with no duplications. The occasional repeat is not a problem, but with 2.631308369 E+35 possible permutations, that should be a rarity. It all depends on getting a diverse seed calculation, and the current routine I am using does not provide that.
Just FYI, there are nowhere near 2.631308369 E+35 possible permutations for a seed. At most, there are 2^32 - 2^24 permutations (4,278,190,080).
Where did I get that number? From everything I can tell, the seed is a Single, which is four bytes. The possible permutations for four bytes is 2^32. However, when all of the 8 exponent bits are on, it's either NaN or Inf (which I assumed to be invalid seeds), and that's the minus 2^24. Admittedly, I didn't check to see what happens with different NaN (or Inf) seeds. Also, I didn't check any sub-normal values to make sure they're acceptable as seeds either (with different sub-normals resulting in a different seed, different starting random number).
It appears that Inf (infinity) does provide a valid seed. However, NaN values cause an overflow error when we attempt to use them as a seed.
So, with this, I guess the answer to the number of valid seeds (resulting in different starting points) is 2^32 - 2^24 + 1
Testing Code:
Code:
Option Explicit
Private Declare Function GetMem4 Lib "msvbvm60" (ByRef Source As Any, ByRef Dest As Any) As Long ' Always ignore the returned value, it's useless.
Private Sub Form_Load()
    Const NanOrInf As Long = &H7F800000
    Dim f As Single
    Dim i As Long
    i = NanOrInf: GetMem4 i, f ' Creates an Inf Single.
    Rnd -1: Randomize f
    Debug.Print Rnd ' Reports 0.5835753
    i = NanOrInf Or &H1&: GetMem4 i, f ' Creates a NaN Single.
    Rnd -1: Randomize f
    Debug.Print Rnd ' Overflow error.
    Unload Me
End Sub
And sub-normals seem to work fine.
Test Code:
Code:
Option Explicit
Private Declare Function GetMem4 Lib "msvbvm60" (ByRef Source As Any, ByRef Dest As Any) As Long ' Always ignore the returned value, it's useless.
Private Sub Form_Load()
    Dim f As Single
    Dim i As Long
    i = &H1&: GetMem4 i, f ' Creates a Sub-Normal Single.
    Rnd -1: Randomize f
    Debug.Print Rnd ' Reports 0.1927062
    i = &H2&: GetMem4 i, f ' Creates a Sub-Normal Single.
    Rnd -1: Randomize f
    Debug.Print Rnd ' Reports 0.4419737
    i = &H1001&: GetMem4 i, f ' Creates a Sub-Normal Single.
    Rnd -1: Randomize f
    Debug.Print Rnd ' Reports 0.1956359
    Unload Me
End Sub
Personally, I'm guessing that the Rnd algorithm just does everything as an IEEE Single. The Rnd function certainly returns a Single, so this seems reasonable.
Also, it'd be easy to modify my code in post #8 to be more random. The first thought that comes to my mind is a bit-shift-and-wrap based on the character position of each character in the input string.
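Along the lines of that suggestion, but for the single seed byte couttsj actually uses: rotate each byte left by its position (mod 8) before XOR-ing it in. Rotation moves the fixed 011 prefix of lowercase letters into different bit positions, so the result is no longer confined to 0 to 31 or 96 to 127, and the order of the bytes now matters. SeedByteRotXor and Rotl8 are hypothetical names. One known limitation: positions eight apart get the same rotation, so swapping the bytes at positions i and i + 8 goes undetected. The same bit operations translate directly to JavaScript's `<<`, `>>`, `|`, `&` and `^`.

```vb
Option Explicit
'
' Hypothetical helper: rotate an 8-bit value left by k bits, wrapping the high bits around.
Private Function Rotl8(ByVal b As Long, ByVal k As Long) As Long
    k = k And 7&
    Rotl8 = ((b * (2 ^ k)) Or (b \ (2 ^ (8 - k)))) And &HFF&
End Function

' Hypothetical seed: XOR of each byte rotated by its position in the string.
Public Function SeedByteRotXor(s As String) As Byte
    Dim b() As Byte
    Dim i As Long
    Dim h As Long
    If Len(s) = 0 Then Exit Function
    b = StrConv(s, vbFromUnicode)
    For i = 0 To UBound(b)
        h = h Xor Rotl8(b(i), i)       ' the position decides how far each byte is rotated
    Next
    SeedByteRotXor = CByte(h)
End Function
```

For example, "ab" and "ba" give different seeds (165 and 160), where a plain XOR gives 3 for both.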
Also, regarding JavaScript, I'm sure you can find a way to copy four bytes from one variable type to another. But a line-for-line translation might be difficult (if not impossible). For one, JavaScript variables are very loosely typed; basically, they're all like Variants (with specific typing options). Also, JavaScript's Math.random returns an IEEE Double (in one of those loosely typed variables) and, unlike VB6's Rnd, cannot be seeded at all. Therefore, it's essentially going to be a completely different algorithm, with seeds behaving differently and the returned sequence being different.
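If the VB6 and JavaScript sides must produce the same sequence, one option (an assumption, not something proposed in the thread) is to skip Rnd and Math.random entirely and implement a small generator yourself on both sides. The sketch below is a 24-bit linear congruential generator, x = (a * x + c) Mod 2^24. The constants 1140671485 and 12820163 are reportedly the ones Microsoft documented for VB's Rnd, but I have not verified that this reproduces Rnd's output; LcgSeed and LcgNext are hypothetical names. Every intermediate value stays below 2^48, so a JavaScript Number (an IEEE Double) can run the identical arithmetic.

```vb
Option Explicit
'
' Hypothetical portable PRNG: 24-bit LCG. State is always an integer in 0 .. 2^24 - 1.
Private mState As Double

Public Sub LcgSeed(ByVal lSeed As Long)
    mState = lSeed And &HFFFFFF        ' keep the low 24 bits of the seed
End Sub

Public Function LcgNext() As Double
    Const TWO24 As Double = 16777216#
    Const A_MOD As Double = 16598013#  ' 1140671485 Mod 2^24: same result mod 2^24, smaller products
    Const C_INC As Double = 12820163#
    Dim t As Double
    t = mState * A_MOD + C_INC             ' < 2^48, exact in a Double
    mState = t - Int(t / TWO24) * TWO24    ' Mod 2^24 without Long overflow
    LcgNext = mState / TWO24               ' in [0, 1)
End Function
```

Usage: `LcgSeed seedByte`, then call `LcgNext` wherever the shuffle would have called Rnd.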
You are right. That permutation count is wrong. I used an online calculator for 32 characters when I was using an SHA-256 hash. For the 14 character string I am now using, the number is 87,178,291,200. The seed is a single byte, and I was able to get a range of 4 to 254 by adding weight to each character in the string.
Code:
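bResult = bResult Xor ((bInput(N%) + N%) And &HFF) ' Byte + Integer is computed as Integer, so this cannot overflow the way Byte + CByte(N%) can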
It will take more testing, but hopefully this will provide the results I am looking for.