-
Jul 31st, 2015, 03:39 PM
#1
Thread Starter
Member
Generating a unique fixed-length hash in VB6
Hello, all!
I've been researching this for a while now, and I'm still not coming up with exactly what I'm looking for. Perhaps I'm just being dense, but I can't seem to translate any of the answers I've found into an actual, workable solution for my current situation.
I need to generate a unique ID that can be stored in a database to retrieve a specific record with the following requirements:
- Unique
- Non-Sequential
- Fixed-length of 10 characters (The specific length could potentially change before final implementation of the solution, depending on what the "higher-ups" decide they want. Of course, if it were to change, it would probably be to decrease the length unfortunately)
- Alphanumeric (preferably upper-case only)
- VB6 (I know, I know...)
The "input" available for generating this code is comprised of at least three separate elements:
- Customer number (fixed-length)
- Account number (variable-length)
- A specific date
These three elements should ensure the uniqueness of the record to which the generated ID is tied.
The overall objective is to provide a user with this code on a printed and mailed notification letter. The user will be able to enter this code on our company's Web site for submitting information back to us. The unique aspect is important to ensure that no two users receive the same code, thereby submitting information which is tied to the wrong data record.
I could potentially exclude the date from the above "input" values, but I'm including it at this point to ensure that, even if we send multiple notification letters to the same individual (customer number & account number), we will always pull the correct database record (our company will never send two notification letters to the same individual on the same day). If necessary, I could potentially include additional data elements for input, but I fear that this could cause even more problems than it solves.
I'm sure I could bang my head on my desk for a while longer and eventually work something out, but my brain is starting to hurt with all of the search results I've reviewed so far. I was wondering if someone here could help point me in the right direction for finding a viable solution. Thank you in advance for all of your assistance.
Last edited by G_Hosa_Phat; Jul 31st, 2015 at 04:09 PM.
-
Jul 31st, 2015, 04:35 PM
#2
Re: Generating a unique fixed-length hash in VB6
Normally a hash has far fewer bits of information than its inputs. For this reason hashes are not normally unique but merely "approximately unique" and when used for retrieval you have to expect multiple hits and then walk through until you find the one you really want.
Normally you want "approximately unique" to be as unique as possible, to limit the size of the resulting hit list to be walked.
The only way to make a unique hash is to strip out noise bits from each of your input data items and then concatenate the remaining bits.
For example an input consisting of digits 0 to 9 only needs 4 bits per digit instead a whole byte for each digit. But using a numeric Byte, Integer, or Long can be even briefer (a Byte can handle 0 to 255, Integer -32,768 to 32,767, Long -2,147,483,648 to 2,147,483,647). Same goes for currency and real numeric types.
Strings that only use the 7-bit ASCII subset could be packed together saving 1 bit per character compared to ANSI. You could even pack things tighter if you exclude control characters, but not trivially.
However trying to shoehorn your data (you didn't give sizes) into 10 uppercase alphanumeric characters... well uniqueness doesn't seem practical: 36^10 = &HCFD41B9100000 or somewhat under 52 bits of uniqueness.
Last edited by dilettante; Jul 31st, 2015 at 04:38 PM.
-
Jul 31st, 2015, 05:10 PM
#3
Thread Starter
Member
Re: Generating a unique fixed-length hash in VB6
@delettante - Thank you for your response. Perhaps a full-fledged hash of the data is going a bit "too far" in this situation, as the only real requirement is to have a unique, non-sequential, alphanumeric, 10-character code generated for each of the notification letters our company sends out (I might be able to talk them into expanding the size a bit, but not much more than 10). Based on my estimations, we probably produce less than 100,000 of these letters per year, so the "limitations" of 36^10 unique codes should be acceptable for the foreseeable future.
Maybe I could/should just try to use a random generator of some sort to produce the code, then put it on a loop that checks the database until I get one that hasn't already been used. I just thought that finding a way to hash data that should be unique to begin with would help to guarantee that the generated code would end up unique. Any suggestions would be appreciated.
-
Jul 31st, 2015, 05:35 PM
#4
Re: Generating a unique fixed-length hash in VB6
No, your problem will be trying to invent a hash function that can map your inputs to those 36^10 values with no duplication. It just isn't possible, you have under 52 bits of uniqueness. Hashing is not a magic trick.
The only solution I can think of is a database table or equivalent that contains all of your input fields and indexes them together as a key along with a conventional IDENTITY type field to serve as your "hash." Every time you have a record you look it up in the table, if you get a hit you have your "hash value." If not you store a new record in the table, auto-generating the next IDENTITY feld value.
To keep these from being altered by database compaction, etc. you might have to use a regular data field and increment the highest value to get a new value. This means even more chruning away on the database, and probably indexing that field too.
At that point you are far worse off than before you began this unless storage is cheap and new records are rare. Then it is merely sluggish.
If somebody has a better idea I'd love to hear it.
-
Jul 31st, 2015, 05:39 PM
#5
Re: Generating a unique fixed-length hash in VB6
Another way to look at hashing is as lossy compression.
Since you want lossless compression there are extreme limits to what can be accomplished. At best you'll probably only be able to squeeze a 50% reduction, so with 51 and change bits available you are limited to 102 bits or so of possible input data.
Even that assumes a lossless compression algorithm that can make guarantees that most do not.
-
Jul 31st, 2015, 06:13 PM
#6
Re: Generating a unique fixed-length hash in VB6
Hi G_Hosa_Phat. Dilettante has done a great job describing the difficulty of what you're trying to do. The technical name of this task is perfect hashing. That link takes you to a Wikipedia article that describes the difficulty of the problem, and various solutions. One of the linked books in that article goes into a lot more detail.
Generally speaking, an excessive amount of work is required to construct a perfect hash function for arbitrary-length inputs. It's much easier to assume that collisions will happen, and to just use a (much simpler) imperfect hash function.
If you have no choice but to go down the perfect hashing road, those links will get you started, but in all seriousness, I'd avoid that requirement if at all possible. Your additional constraints of specialized input and output data formats make this even more difficult than it would otherwise be.
(This is my $0.02... unless of course you're very confident in your math skills! )
-
Jul 31st, 2015, 07:23 PM
#7
Re: Generating a unique fixed-length hash in VB6
Originally Posted by Tanner_H
... unless of course you're very confident in your math skills! )
Well, the underlying math is not really all that difficult (in case the Hash-Function
in question is "near ideal" - and many CRC-algos are considered to coming close,
and are well-researched regarding their distribution (for different polynomials).
Here's a link which shows a nice overview for collision-probabilities of different Hash-Lengths:
http://stackoverflow.com/questions/1...ollision-crc32
According to the link above, the formula(s) for the OP would be:
Code:
Public Function GetAmountOfCollisions(Samples, HashBits)
GetAmountOfCollisions = (Samples ^ 2 - Samples) / (2 * (2 ^ HashBits))
End Function
Public Function GetProbForAtLeastOneCollision(Samples, HashBits)
GetProbForAtLeastOneCollision = 1 - Exp(-(Samples ^ 2 - Samples) / (2 * (2 ^ HashBits)))
End Function
E.g. when using a CRC-32 - along with a yearly amount of 100000 drawn samples,
the results would come out as (e.g. ran in the Immediate-Window):
? GetAmountOfCollisions(10^5, 32)
1.16414157673717
? GetProbForAtLeastOneCollision(10^5, 32)
0.687809461338744
So, CRC-32 would have a roughly 70% chance to collide already within the first year - over
10 years (amount then 10^6) the collisions would increase to about 116 (among these 1Mio draws).
With a near ideal 52Bit-Hash, the probability to generate at least one collision would be:
- in one year (10^5 samples) roughly "one in a million"
? GetProbForAtLeastOneCollision(10^5, 52)
1.11021130611011E-06
- and after 10 years (10^6 drawn samples) roughly "one in ten-thousand"
? GetProbForAtLeastOneCollision(10^6, 52)
1.1101602870478E-04
With a 64Bit Hash, the Probabilities would be:
2.71047850830541E-10 ("2.7 in 10^10" for 10^5 samples)
2.71050268896289E-08 ("2.7 in 10^8" for 10^6 samples)
The OP will have to present these Probs to the deciders - and let them ponder,
if they could live with a "one in 10000 chance over 10 years" for a duplicate Hash -
or if adding 3 Chars, to get about 64Bit would worth it, to lower the collision-
probability to about 2.7 in 100Mio (over 10 years).
Olaf
-
Jul 31st, 2015, 08:57 PM
#8
Re: Generating a unique fixed-length hash in VB6
FWIW, maybe the following could be considered a compromise...
It is using 30Bits from two different 32Bit-Hashes, then combined and
resulting in a 12Char-AlphaNumeric Output.
Each of these 12Chars representing 5Bit = 60Bit-hash-resolution in total.
(since we use only 32Chars of the AlphaNumeric ASCII-range, I removed
the Chars which could be "ambigous reads" - as e.g. the 0, the O and the Q -
This is the 32Char-String against which the 5Bits per Char are mapped:
Const CharMap$ = "ABCDEFGHJKLMNPRSTUVWXYZ123456789"
The following module I wrote some years ago, and it contains different CRC-
implementations which (when native compiled) will perform quite well:
*Into a Module - e.g. modCRC.bas
Code:
Option Explicit
Public Function Crc6(B() As Byte) As Long
Dim i As Long, crc As Long: Static crcTab(0 To 255) As Long
If crcTab(1) = 0 Then CreateLookupTable crcTab, 6, True, &H6B&
crc = 63
For i = LBound(B) To UBound(B)
crc = crcTab((crc Xor B(i)) And &HFF&) Xor (crc \ 256)
Next i
Crc6 = crc Xor 63
End Function
Public Function Crc16(B() As Byte) As Long
Dim i As Long, crc As Long: Static crcTab(0 To 255) As Long
If crcTab(1) = 0 Then CreateLookupTable crcTab, 16, True, &H8005&
crc = 0
For i = LBound(B) To UBound(B)
crc = crcTab((crc Xor B(i)) And &HFF&) Xor (crc \ 256)
Next i
Crc16 = crc
End Function
Public Function Crc16_ModBus(B() As Byte) As Long
Dim i As Long, crc As Long: Static crcTab(0 To 255) As Long
If crcTab(1) = 0 Then CreateLookupTable crcTab, 16, True, &H8005&
crc = &HFFFF&
For i = LBound(B) To UBound(B)
crc = crcTab((crc Xor B(i)) And &HFF&) Xor (crc \ 256)
Next i
Crc16_ModBus = crc
End Function
Public Function Crc16_USB(B() As Byte) As Long
Dim i As Long, crc As Long: Static crcTab(0 To 255) As Long
If crcTab(1) = 0 Then CreateLookupTable crcTab, 16, True, &H8005&
crc = &HFFFF&
For i = LBound(B) To UBound(B)
crc = crcTab((crc Xor B(i)) And &HFF&) Xor (crc \ 256)
Next i
Crc16_USB = crc Xor &HFFFF&
End Function
Public Function Crc16_CCITT(B() As Byte) As Long
Dim i As Long, crc As Long: Static crcTab(0 To 255) As Long
If crcTab(1) = 0 Then CreateLookupTable crcTab, 16, False, &H1021&
crc = &HFFFF&
For i = LBound(B) To UBound(B)
crc = (crcTab(((crc \ 256) Xor B(i)) And &HFF&) Xor (crc * 256)) And &HFFFF&
Next i
Crc16_CCITT = crc
End Function
Public Function Crc16_CCITT_XModem(B() As Byte) As Long
Dim i As Long, crc As Long: Static crcTab(0 To 255) As Long
If crcTab(1) = 0 Then CreateLookupTable crcTab, 16, False, &H1021&
crc = 0
For i = LBound(B) To UBound(B)
crc = (crcTab(((crc \ 256) Xor B(i)) And &HFF&) Xor (crc * 256)) And &HFFFF&
Next i
Crc16_CCITT_XModem = crc
End Function
Public Function Crc16_CCITT_Kermit(B() As Byte) As Long
Dim i As Long, crc As Long: Static crcTab(0 To 255) As Long
If crcTab(1) = 0 Then CreateLookupTable crcTab, 16, True, &H1021&
crc = 0
For i = LBound(B) To UBound(B)
crc = crcTab((crc Xor B(i)) And &HFF&) Xor (crc \ 256)
Next i
Crc16_CCITT_Kermit = crc \ 256 + 256 * (crc And &HFF&) 'Byte-Swap
End Function
Public Function Crc32(B() As Byte) As Long
Dim i As Long, crc As Long: Static crcTab(0 To 255) As Long
If crcTab(1) = 0 Then CreateLookupTable crcTab, 32, True, &H4C11DB7
crc = &HFFFFFFFF
For i = LBound(B) To UBound(B)
crc = crcTab((crc Xor B(i)) And &HFF&) Xor (((crc And &HFFFFFF00) \ &H100) And &HFFFFFF)
Next i
Crc32 = crc Xor &HFFFFFFFF
End Function
Public Function Crc32_C(B() As Byte) As Long
Dim i As Long, crc As Long: Static crcTab(0 To 255) As Long
If crcTab(1) = 0 Then CreateLookupTable crcTab, 32, True, &H1EDC6F41
crc = &HFFFFFFFF
For i = LBound(B) To UBound(B)
crc = crcTab((crc Xor B(i)) And &HFF&) Xor (((crc And &HFFFFFF00) \ &H100) And &HFFFFFF)
Next i
Crc32_C = crc Xor &HFFFFFFFF
End Function
'-------------- Helper-Functions for Lookup-Table-Generation ----------------------
Private Sub CreateLookupTable(crcTab() As Long, ByVal BitLen As Long, ByVal Reflected As Boolean, ByVal Poly As Long)
Dim r As Long, i As Long, V As Long, BM As Long
If BitLen = 32 Then BM = &H80000000 Else BM = 2 ^ (BitLen - 1)
For V = 0 To UBound(crcTab)
r = V
If Reflected Then r = Reflect(V, IIf(BitLen < 8, BitLen, 8))
If BitLen > 8 Then r = SHL(r, BitLen - 8)
For i = 0 To IIf(BitLen < 8, BitLen, 8) - 1
r = SHL(r, 1) Xor IIf(r And BM, Poly, 0)
Next
If Reflected Then r = Reflect(r, BitLen)
If BitLen = 32 Then crcTab(V) = r Else crcTab(V) = r And CLng(2 ^ BitLen - 1)
Next V
End Sub
Private Function Reflect(ByVal v As Long, ByVal Bits As Long) As Long
Dim i As Long, M As Long
Reflect = v
For i = 0 To Bits - 1
If (Bits - i - 1) = 31 Then M = &H80000000 Else M = 2 ^ (Bits - i - 1)
Reflect = IIf(v And 1, Reflect Or M, Reflect And Not M)
v = SHR(v, 1)
Next i
End Function
Private Function SHL(ByVal V As Long, ByVal Bits As Long) As Long
If Bits = 0 Then SHL = V: Exit Function
SHL = (V And (2 ^ (31 - Bits) - 1)) * 2 ^ Bits Or IIf(V And 2 ^ (31 - Bits), &H80000000, 0)
End Function
Private Function SHR(ByVal V As Long, ByVal Bits As Long) As Long
If V < 0 Then SHR = ((V And &H7FFFFFFF) \ 2 ^ Bits) Or 2 ^ (31 - Bits) Else SHR = V \ 2 ^ Bits
End Function
Public Function GetAmountOfCollisions(Samples, HashBits)
GetAmountOfCollisions = (Samples ^ 2 - Samples) / (2 * (2 ^ HashBits))
End Function
Public Function GetProbForAtLeastOneCollision(Samples, HashBits)
GetProbForAtLeastOneCollision = 1 - Exp(-(Samples ^ 2 - Samples) / (2 * (2 ^ HashBits)))
End Function
And here the applied-usage of the above CRC-Module in a small routine,
combining the results of two different CRC32-implementations
(the "classic CRC32", which is e.g. used in Ethernet-scenarios - and
the CRC32-C version which is using a different polynomial (&H1EDC6F41)
and is then also fed with the reversed Input-String...
Form-Code for testing:
Code:
Option Explicit
Private Sub Form_Load()
Debug.Print GetAlphaNumeric60BitHash("abc")
End Sub
Public Function GetAlphaNumeric60BitHash(S As String) As String
Const CharMap$ = "ABCDEFGHJKLMNPRSTUVWXYZ123456789" '<- 32Chars (5Bit-Mapping)
Dim B() As Byte, CharLUT() As Byte, H(0 To 1) As Long, i&, j&
B = S
H(0) = Crc32(B) And &H7FFFFFFF
B = StrReverse(S)
H(1) = Crc32_C(B) And &H7FFFFFFF
GetAlphaNumeric60BitHash = Space$(12)
For i = 0 To 1: For j = 0 To 5
Mid$(GetAlphaNumeric60BitHash, i * 6 + j + 1, 1) = Mid$(CharMap, (H(i) And 31&) + 1, 1)
H(i) = H(i) \ 32 'shift 5 Bits to the right
Next j, i
End Function
HTH
Olaf
Last edited by Schmidt; Jul 31st, 2015 at 10:29 PM.
-
Jul 31st, 2015, 09:02 PM
#9
Fanatic Member
Re: Generating a unique fixed-length hash in VB6
Originally Posted by G_Hosa_Phat
I need to generate a unique ID that can be stored in a database to retrieve a specific record with the following requirements:
- Unique
- Non-Sequential
- Fixed-length of 10 characters (The specific length could potentially change before final implementation of the solution, depending on what the "higher-ups" decide they want. Of course, if it were to change, it would probably be to decrease the length unfortunately)
- Alphanumeric (preferably upper-case only)
- VB6 (I know, I know...)
The "input" available for generating this code is comprised of at least three separate elements:
- Customer number (fixed-length)
- Account number (variable-length)
- A specific date
These three elements should ensure the uniqueness of the record to which the generated ID is tied.
Most database systems has globally unique ID datatype 'as standard', which you can use as a primary key and also as a replication key in failsafe/failover databases. However if you want to create unique value, then there is CoCreateGUID api at your disposal.
Code:
Option Explicit
Private Type GUID
Data1 As Long
Data2 As Integer
Data3 As Integer
Data4(7) As Byte
End Type
Private Declare Function CoCreateGuid Lib "OLE32.DLL" (pGuid As GUID) As Long
Public Function CreateGUID() As String
Dim udtGUID As GUID
If (CoCreateGuid(udtGUID) = 0) Then
CreateGUID = _
String(8 - Len(Hex$(udtGUID.Data1)), "0") & Hex$(udtGUID.Data1) & _
String(4 - Len(Hex$(udtGUID.Data2)), "0") & Hex$(udtGUID.Data2) & _
String(4 - Len(Hex$(udtGUID.Data3)), "0") & Hex$(udtGUID.Data3) & _
IIf((udtGUID.Data4(0) < &H10), "0", "") & Hex$(udtGUID.Data4(0)) & _
IIf((udtGUID.Data4(1) < &H10), "0", "") & Hex$(udtGUID.Data4(1)) & _
IIf((udtGUID.Data4(2) < &H10), "0", "") & Hex$(udtGUID.Data4(2)) & _
IIf((udtGUID.Data4(3) < &H10), "0", "") & Hex$(udtGUID.Data4(3)) & _
IIf((udtGUID.Data4(4) < &H10), "0", "") & Hex$(udtGUID.Data4(4)) & _
IIf((udtGUID.Data4(5) < &H10), "0", "") & Hex$(udtGUID.Data4(5)) & _
IIf((udtGUID.Data4(6) < &H10), "0", "") & Hex$(udtGUID.Data4(6)) & _
IIf((udtGUID.Data4(7) < &H10), "0", "") & Hex$(udtGUID.Data4(7))
End If
End Function
Simply call by;
Code:
Dim sGUID as string
sGUID = CreateGUID
-
Aug 1st, 2015, 12:32 AM
#10
Re: Generating a unique fixed-length hash in VB6
Forgive if I am wrong, but both the Customer Number and the Account Number should already be unique. I suppose that one customer could have multiple accounts, so that would make the Account Number the obvious choice to use. How some companies address this issue is to use a temporary "Secret" that is associated with the account. A "Secret" would basically be a key to the account. For example, my email provider sends me daily email regarding spam that has been quarantined or bounced. It contains a link to my account combined with an 8 character secret. That secret is replaced by a new secret when the next notification is issued. The account is public information, but the secret is not. You could make the secret as short as you want, as the uniqueness is already provided by the Account Number.
J.A. Coutts
-
Aug 1st, 2015, 07:15 AM
#11
Re: Generating a unique fixed-length hash in VB6
Originally Posted by couttsj
Forgive if I am wrong, but both the Customer Number and the Account Number should already be unique. I suppose that one customer could have multiple accounts, so that would make the Account Number the obvious choice to use. How some companies address this issue is to use a temporary "Secret" that is associated with the account. A "Secret" would basically be a key to the account. For example, my email provider sends me daily email regarding spam that has been quarantined or bounced. It contains a link to my account combined with an 8 character secret. That secret is replaced by a new secret when the next notification is issued. The account is public information, but the secret is not. You could make the secret as short as you want, as the uniqueness is already provided by the Account Number.
I think he wants to identify a "record" (which probably corresponds to some business event, for example a sale) and not customer or account. Even then cust + acct + timestamp could produce collisions but even at one second resolution that should be good enough.
GUID/UUID values are never guaranteed to be unique. They are only as unique as a 128 bit value chosen pseudo-randomly can be, and many of the bits are "weighting" based on the generating machine's MAC address. This restricts their use to scenarios where they get used sparsely or temporarily (replication for example) and even then collisions may occur. Plus they are not hashes and you can't plug inputs in and get the same 128-bit value out each time.
The Perfect Hash link was interesting and puts a name to the creature, so that helps even if the article there doesn't seem of much practical use. Dynamic Perfect Hash sounds like the subset of that which is needed here.
I think what we really have here is a case of trying to put 10 pounds of stuff into a 5 pound bag.
-
Aug 1st, 2015, 08:39 AM
#12
Re: Generating a unique fixed-length hash in VB6
Originally Posted by Schmidt
Well, the underlying math is not really all that difficult (in case the Hash-Function
in question is "near ideal" - and many CRC-algos are considered to coming close,
and are well-researched regarding their distribution (for different polynomials).
In probability, the difference between "guaranteed 0%" and "very near 0%" is astronomical.
If "very low risk of collision" ends up being acceptable to the OP, then pretty much any hashing function would work. He'd just need to write code that covers the case of rare-but-not-impossible collisions.
But assuming no collisions and then "playing the odds" is a dangerous proposition, even when the odds are in your favor.
Originally Posted by dilettante
The Perfect Hash link was interesting and puts a name to the creature, so that helps even if the article there doesn't seem of much practical use. Dynamic Perfect Hash sounds like the subset of that which is needed here.
Dynamic perfect hashing is an ugly solution, but it would technically solve the OP's problem. Basically, dynamic perfect hashing just embraces the fact that you can't easily prove 0% risk of collisions, so you construct a hash function with high variability and some kind of random-seed-like secondary input (like salting).
If you experience a collision, you change the "seed" value to some new random value and rebuild the entire hash table, and keep doing this until you find a seed with zero collisions for the current input set. (A more technical implementation uses layers of hash tables, and you only rebuild the last layer as necessary; the wiki page explains this better.)
As long as the variability of the hash results are high and the relative size of the dataset is low, dynamic perfect hashing gives you O(1) performance... But if collisions are more frequent than you predicted, it becomes a slow, ugly solution. (Which may be what the doctor ordered here? idk)
Last edited by Tanner_H; Aug 1st, 2015 at 08:40 AM.
Reason: clarified "dynamic perfect hashing" vs just "dynamic hashing"
-
Aug 1st, 2015, 09:38 AM
#13
Fanatic Member
Re: Generating a unique fixed-length hash in VB6
Originally Posted by dilettante
GUID/UUID values are never guaranteed to be unique. They are only as unique as a 128 bit value chosen pseudo-randomly can be, and many of the bits are "weighting" based on the generating machine's MAC address. This restricts their use to scenarios where they get used sparsely or temporarily (replication for example) and even then collisions may occur. Plus they are not hashes and you can't plug inputs in and get the same 128-bit value out each time.
True - but one can always run generated GUID value, through MD5 128-bit hashing algorithm.
-
Aug 1st, 2015, 10:39 AM
#14
Re: Generating a unique fixed-length hash in VB6
Originally Posted by Tech99
True - but one can always run generated GUID value, through MD5 128-bit hashing algorithm.
Gosh, I don't know what to say. You sure got me there.
-
Aug 1st, 2015, 12:27 PM
#15
Fanatic Member
Re: Generating a unique fixed-length hash in VB6
Originally Posted by dilettante
Gosh, I don't know what to say. You sure got me there.
Yeah, altought meant to say bit more salted, but been trivial method, left this out.
-
Aug 1st, 2015, 04:29 PM
#16
Re: Generating a unique fixed-length hash in VB6
Originally Posted by Tanner_H
In probability, the difference between "guaranteed 0%" and "very near 0%" is astronomical.
Well, whilst the above being (mathematically) true -
I assume that in reality you're not such a "glass-half-empty"-person...
Real-life is full of risks and absolute certainties are nearly nowhere to be found (aside
from a few exceptions, as e.g. that men will *certainly* never understand women <g>).
Originally Posted by Tanner_H
... assuming no collisions and then "playing the odds" is a dangerous proposition,
even when the odds are in your favor.
I only offered VB-versions of the formulas - as well as a possible (60Bit in 12Chars) solution
for the OP (along with some probabilities, which he could present to the decision-makers).
As for the (unlikely) case of a collision (along with your wording of "unchecked Hashing"
being a "dangerous proposition")...
In the case of the OP, the danger can certainly be quantified
(assuming the worst-case of having generated the same hash for two different
records in the "send-out"-scenario the OP has described, the worst case will be,
that a certain notification will *not* be send to a certain customer)...
That's the price which will have to be paid, when the (unchecked) Hash-Approach
will be used in that scenario (in case of the unlikely event of a collision, when hash-
length >= 60Bits will be used).
That price will have to be set into relation with the price for the implementation
of a "water-tight-solution" - in *conjunction* with the probabilities:
1) the "water-tight-solution" will certainly cost x Extra "-ManHours", "-Storage-Space", "-CPU-Cycles" and "-Inconvenience"
2) the "unchecked Hash-solution" might cost a not send customer-notification (+ the side-effects of that)
I can only underline (since the OPs scenario is certainly not a "mission-critical-NASA-project")
how extremely low the likelihood is, that going with the above point #2 "might cost something".
The 60Bit-solution (in 12 well-readable ASCII-Chars) has a likelihood of not sending
a certain notification in 100,000 messages/year (over 10 full years) of: 0.000043%
Well, let's say that this quite low percentage is "still scary enough" - and the number of
ASCII-Chars which represent the Hash shall therefore be increased to, say, 20 (=100Hash-Bits):
Then the likelihood to encounter a collision within 10 years will be: ~ 4*10^-19
For comparison, the likelihood, that you (as a single person) will be hit by a Meteorite
tomorrow - (or any other day, for the rest of your life) - will be roughly: 3*10^-15
That this event happens, is about 7500 times more likely than the not send notification
(now seriously wondering, why those of us not having decided to live in a bunker permanently,
aren't running from cover to cover - or discard wearing a helmet all day long... )
I'm not saying that a watertight solution shall be discarded (in the OPs case dilettante
already outlined an AutoID-approach which is relatively "cheap" to implement - only
costing a few man-hours, some Disk-Space and a bit of Query-Time) ... I guess I just
try to defend software-solutions or approaches, which rely on such probabilities
(e.g. Git relying on a 160Bit-SHA1-Hash, to describe and identify certain content-snippets).
The math is given - and the Hashing-Algos "sound enough" (operating near the ideal hash).
For those in doubt of the above sentence - one can check that out oneself, to see
how the hash-algos we use today "meet the reality":
E.g. using the Code-Block for the CRC-Module I've posted in #8, pasting it into a *.bas,
one can then use the following in a Form:
Code:
Option Explicit
Private Sub Form_Load()
Const SampleCountCrc16 = 5000
Debug.Print "CRC-16 ( SampleCount:"; SampleCountCrc16; ")"
Debug.Print "Prediction:"; CLng(GetAmountOfCollisions(SampleCountCrc16, 16));
Debug.Print " --> Reality:"; GetRealWorldCollisions(SampleCountCrc16, False); vbLf
Const SampleCountCrc32 = 500000 'only half a Million, to not wait too long
Debug.Print "CRC-32 ( SampleCount:"; SampleCountCrc32; ")"
Debug.Print "Prediction:"; CLng(GetAmountOfCollisions(SampleCountCrc32, 32));
Debug.Print " --> Reality:"; GetRealWorldCollisions(SampleCountCrc32, True); vbLf
End Sub
Private Function GetRealWorldCollisions(AmountOfUniqueRandomInputs, ByVal UseCrc32 As Boolean) As Long
Dim i&, S$, B() As Byte, Hash As String, Col As New Collection
Rnd -1234 'ensure we create always the same random sequence
On Error Resume Next 'to detect failed Adds to the Collection
For i = 1 To AmountOfUniqueRandomInputs
S = Chr$(Rnd * 127) & Chr$(Rnd * 127) & CStr(i) '<- last part, to make it unique and ensure some length-variation
B = StrConv(S, vbFromUnicode)
If UseCrc32 Then
Hash = Hex(Crc32(B))
Else
Hash = Hex(Crc16(B))
End If
Err.Clear
Col.Add i, Hash
If Err Then GetRealWorldCollisions = GetRealWorldCollisions + 1
Next
End Function
The above code then printing out (just wait a few secs for the 0.5Mio Crc32 calculations):
Code:
CRC-16 ( SampleCount: 5000 )
Prediction: 191 --> Reality: 187
CRC-32 ( SampleCount: 500000 )
Prediction: 29 --> Reality: 26
Olaf
Last edited by Schmidt; Aug 1st, 2015 at 05:08 PM.
-
Aug 2nd, 2015, 10:28 AM
#17
Re: Generating a unique fixed-length hash in VB6
Originally Posted by Schmidt
As for the (unlikely) case of a collision (along with your wording of "unchecked Hashing"
being a "dangerous proposition")...
In the case of the OP, the danger can certainly be quantified
(assuming the worst-case of having generated the same hash for two different
records in the "send-out"-scenario the OP has described, the worst case will be,
that a certain notification will *not* be send to a certain customer)...
It's difficult to discuss the risks without the OP's clarification. The OP says: "The overall objective is to provide a user with this code on a printed and mailed notification letter. The user will be able to enter this code on our company's Web site for submitting information back to us. The unique aspect is important to ensure that no two users receive the same code, thereby submitting information which is tied to the wrong data record."
What data is the user submitting? SSNs? Credit cards? If they are, would a collision allow some other user to pull the submitted data?
Hashing personal data is serious business, full-stop. IMO, if hash collisions are a possibility (and given the OP's constraints re: 10-upper-case-character hash, large potential input space), the OP should code for them.
Collision attacks are a very real thing (they brought down the venerable MD5 hash function, after all). If it were my personal information being stored and processed by this company, I would want the coder to make sure the hash function was well-constructed and that all possibilities were accounted for.
If the OP can increase the variability of the hash (e.g. lowercase letters, symbols, more chars, etc) then this is a different discussion. Given the exponential nature of bit-representation, 60 bits would certainly be preferable to ~50. A more reasonable 128 would be even better.
But given the original constraints, and dilettante's correctly calculated 52-bits of variation, collision risk is high enough to warrant implementation considerations. My $0.02.
Originally Posted by Schmidt
I'm not saying that a watertight solution shall be discarded (in the OPs case dilettante
already outlined an AutoID-approach which is relatively "cheap" to implement - only
costing a few man-hours, some Disk-Space and a bit of Query-Time) ... I guess I just
try to defend software-solutions or approaches, which rely on such probabilities
(e.g. Git relying on a 160Bit-SHA1-Hash, to describe and identify certain content-snippets).
This is false equivocation, and is the kind of statement I worry about with new coders who don't understand the risks involved.
160 bits != 52 bits.
-
Aug 2nd, 2015, 12:06 PM
#18
Re: Generating a unique fixed-length hash in VB6
Sideshow:
Originally Posted by Schmidt
... e.g. Git relying on a 160Bit-SHA1-Hash, to describe and identify certain content-snippets...
Surely they don't actually do that?
Are you sure that isn't just a hash they are using as a check on file transfer corruption and tampering?
-
Aug 2nd, 2015, 02:54 PM
#19
Re: Generating a unique fixed-length hash in VB6
Originally Posted by Tanner_H
It's difficult to discuss the risks without the OP's clarification. The OP says: "The overall objective is to provide a user with this code on a printed and mailed notification letter.
... let's stop right here ...
As I understood it, a notification will be prepared, printed (and mailed) only when there's
currently no hash-entry already to be found ... e.g. in a DB-Table, called [NotificationsSent] ...
(having HashStr as the PK - and CustomerNr, AccountNr, SomeDate and perhaps SendState as additional Columns)
Such a Table is normal in such "E-Mail-Pumping"-scenarios, for logging-purposes and to avoid sending a
Notification twice to the same customer (for the same "Notification-Job") - because E-Mail-Pumps are a
fickle thing, which can fail easily for a variety of reasons (e.g. Mail-Server down), and so the record-entries in
that [NotificationsSent]-table are therefore treated like "Jobs in a Queue" - often requiring more than just one
send-attempt (usually handled aside from the HashValue to "peek-in", over the above mentioned SendState-Field).
So, in case of a collision (same Hash for a different set of "CustomerNr, AccountNr, SomeDate") -
the lookup into that table [NotificationsSent] over the produced HashValue will just tell:
"it's already in the works" or "it was already sent" (for another Customer) - hence no E-Mailing
to that new "colliding, different customer" will happen at all.
Originally Posted by Tanner_H
What data is the user submitting? SSNs? Credit cards?
If they are, would a collision allow some other user to pull the submitted data?
I guess not (when the notification-system at the OPs side is working roughly as outlined above,
not sending an E-Mail with a WebSite-Link to that "colliding customer" at all).
Originally Posted by Tanner_H
Hashing personal data is serious business, full-stop. IMO, if hash collisions are a possibility
(and given the OP's constraints re: 10-upper-case-character hash, large potential input space),
the OP should code for them.
*If* it's a "serious business-scenario" (money involved, expensive law-suits possible), then perhaps yes.
As for the "10 upper case chars" - here I thought the OP made it clear, that it doesn't need to be 10 Chars exactly
(50Bits is a bit sparse for a Hash-Value with regards to collision-probabililties in a 100000 Mails/Year scenario).
That's why I suggested to go with at least 12 Chars (60Bit) or better 16 or 20 Chars (80 and 100Bit respectively).
Originally Posted by Tanner_H
Collision attacks are a very real thing (they brought down the venerable MD5 hash function, after all).
That's a completely different realm, since we don't talk about "Secure Hashes" in the given scenario.
The requirement we have here, is only about Hash-Algos, which ensure:
"collision-probabilities as close as possible to the theoretically achievable"
The fact that Hash-Algos "designed for security" usually fulfill that requirement as well,
is IMO beside the point.
Aside from that, even when the OP would go with a "collision-free AutoID-approach"
(as the "Key-number to be entered" which then refers to a certain record in a DB-Table
at the vendors site), the very same "impersonation-attempts" could be performed by a hacker
(or any "man in the middle"), trying to deduce which ID maps to a certain customer, to make
use of that ID in "other potentially possible attacks").
Originally Posted by Tanner_H
If it were my personal information being stored and processed by this company, I would want the coder
to make sure the hash function was well-constructed and that all possibilities were accounted for.
Thought I tried just that (with my suggestions to the OP, to increase the HashLength
to at least 60Bit, because only then the collision-probabilities come into a reasonable range,
getting increasingly better with every added "5-Bit-Char").
Originally Posted by Tanner_H
If the OP can increase the variability of the hash (e.g. lowercase letters, symbols, more chars, etc) then this is a different discussion.
No, the only thing that counts in the end is the OverAll-Bit-Length of the String-Representation -
and if you want to stay in the realm of "readable String-representations" which work
in the ASCII-Range, you can either use:
- a 4Bit(16Char) alphabet (a wellknown one of this sort is what we know as: "HexString-representations")
- a 5Bit(32Char) alphabet (e.g. the one I was using in my Demo-Snippets)
- a 6Bit(64Char) alphabet (a wellknown one of this sort is what we know as: "Base64")
So, increasing the "amount of allowed, easy to enter chars" would (IMO) max out at 6Bit per Char,
which then would allow a 60Bit-Hash to be represented in only 10 Chars, but that's not all that
much different from the 12Chars I needed for the 60Bit-Hash-Length in my Demo-routines.
Originally Posted by Tanner_H
Given the exponential nature of bit-representation, 60 bits would certainly be preferable to ~50. A more reasonable 128 would be even better.
Ack, absolutely!
(the more "allowed Chars" the OP can squeeze out of the deciders, the better)...
Originally Posted by Tanner_H
[my mentioning of the 160Bit SHA-1 Hash, as being the workhorse for identifiying
small Content-Snippets within GIT (a worldwide used, distributed SCM-System)]...
This is false equivocation, and is the kind of statement I worry about
with new coders who don't understand the risks involved.
160 bits != 52 bits.
That's a bit unfair, after all the efforts, explanations and code-snippets I've put into this thread,
to make it easy for new coders to calculate and understand Hash-collision-probabilities better -
and how those can be derived from the Bit-Length (and the amount of "hashed Input-Samples").
Here again, what I wrote in post #7:
With a near ideal 52Bit-Hash, the probability to generate at least one collision would be:
- in one year (10^5 samples) roughly "one in a million"
? GetProbForAtLeastOneCollision(10^5, 52)
1.11021130611011E-06
- and after 10 years (10^6 drawn samples) roughly "one in ten-thousand"
? GetProbForAtLeastOneCollision(10^6, 52)
1.1101602870478E-04
So, "it's all there" for the "new coders" in question who read this thread in the future.
BTW, the argument the both of us have here about Hashing (or in more common terms,
about the topic: "what to use as a Unique-ID"), is a quite wellknown one...
Discussion then often heating up with proponents of the two realms:
1) "let's play it safe" (no matter how large the extra-efforts or the inconveniences)
2) "let's use applied math in form of long enough Hashes which are 'unique enough'"
1) has the main-disadvantage that, to produce a new Unique-ID safely, you will have to "contact some storage first"
("storage" usually being a DB, producing a new ID in a new transaction - often over the built-in "@identity-mechanisms").
That "needed contacting" makes this approach especially difficult, when we talk about truly distributed
scenarios (as e.g. with GIT) - since online-connections are not always available.
2) whilst easy to produce "without contacting anybody", has the main-disadvantage that:
"Uniqueness cannot be truly guaranteed".
The last sentence as it stands above is usually enough for most proponents of 1),
to discard *any* Hash-based solutions outright.
What most of them fail to recognize is the extremely low probability to produce
a non-unique-ID, which is (when we talk about HashBit-Length above 100) lower
than the already mentioned "hit by a meteorite"-scenario.
Here's another probability-scenario, the proponents of 1) should consider
(whilst "feeling safe" with their solutions).
The already mentioned storage which is needed when following 1), is usually
hosted on Hard-Disks these days - but usage of HardDisks involves a risk -
and with regards to what we talk about here - what if you stored a nice Auto-
generated-ID in a DB-Identity-Column residing on that disk, with the value of:
11111
but when you try to find this value, you get the result, that no record was
found - because the HardDisk had a Read-Error - and passed the read-out of:
11110
instead to the search-routine of the DB-Engine.
What now?
Can't happen you might say - HardDisks have error-correction-routines built in,
which can restore the original value which was written to the disk, even when
a few Bits were not read correctly.
Well, that's only one kind of read-error (known as "recoverable-read-errors", 'RRE').
But there's another kind of read-errors in the realm of HardDisks, known as 'URE' -
the "Unrecoverable ones").
And there's probabilities for that kind of error in the vendor-specifications of
most hard-disks - which are usually in the realm of 10^-15 or 10^-16.
And then there's field-studies of "high-scale-harddisk-users" (from MS, Amazon,
Google, CERN) which basically all state, that these URE's are likely to happen far
more often than specified by the HD-vendors - here a link to a review at the CERN:
http://indico.cern.ch/event/13797/se.../1?contribId=3
Executive Summary
We have established that low level data corruptions exist and that they have several origins.
The error rates are at the 10^-7 level, ...
For comparison, here again the probability of a "duplicate hash"-collision for
a 100Bit-Hash-Algo, producing 1Mio Hashes (from different Input-Values):
4*10^-19
That's even lower than what the vendors (optimistically) provide for their
unrecoverable Read-Errors.
So, as a conclusion (given a scenario of 1Mio AutoID'ed records, accumulated in a DB over 10 years):
It is far more likely that your DB will link you to the wrong AutoID (due to an URE on the harddisk), than
it is to produce a Hash-Collision for the same amount of Entries with an algo that's at least using 100HashBits).
(and the above not even considering risks due to Read-Errors, cause by faulty memory,
or a hickup in an overheated CPU etc.).
Olaf
-
Aug 2nd, 2015, 03:05 PM
#20
Re: Generating a unique fixed-length hash in VB6
Good Grief Things are getting out of hand here
Anything I post is an example only and is not intended to be the only solution, the total solution nor the final solution to your request nor do I claim that it is. If you find it useful then it is entirely up to you to make whatever changes necessary you feel are adequate for your purposes.
-
Aug 2nd, 2015, 03:32 PM
#21
Re: Generating a unique fixed-length hash in VB6
Originally Posted by dilettante
[GIT using SHA-1 Hashing extensively, to map to Content-Values]
Surely they don't actually do that?
Are you sure that isn't just a hash they are using as a check on file transfer corruption and tampering?
They do:
https://git-scm.com/book/en/v2/Git-I...ls-Git-Objects
As does the (SQLite-based - and quite nice and tidy) distributed SCM, called Fossil:
http://fossil-scm.org/xfer/doc/trunk..._overview.wiki
Here's the math again:
Let's be extreme and say that there's a certain GIT-managed Source-Project, which has:
- 1Mio contributors world-wide
- each of them producing 1Mio check-in-snippets per year
- over a time-period of 100years
This would amount to a a total of check-in-snippets in that repo (after 100 years) of:
1Mio * 1Mio * 100 = 10^14 content-snippets
With the already provided formula, the probability of a collision for this amount is:
? GetAmountOfCollisions(10 ^ 14, 160)
3.4 * 10^-21
As already said in my last reply to Tanner, this is (by magnitudes) lower than the probability
to encounter a "GIT-problem" due to reasons of e.g. an unrecoverable HD-ReadError.
Olaf
-
Aug 2nd, 2015, 03:50 PM
#22
Re: Generating a unique fixed-length hash in VB6
Originally Posted by jmsrickland
Good Grief Things are getting out of hand here
Nah - the discussion is as OnTopic as can be...
What's your question?
Olaf
-
Aug 2nd, 2015, 04:24 PM
#23
Re: Generating a unique fixed-length hash in VB6
I think we're down to two camps.
One says "the data won't fit into the available bits."
The other says "who cares, most of the time no data will get lost or corrupted, after all it is rare that you can flip a coin 10 times and get heads 10 times in a row." Never mind the cost of unraveling the mess if it ever happens I guess.
See Gambler's fallacy.
-
Aug 2nd, 2015, 05:41 PM
#24
Re: Generating a unique fixed-length hash in VB6
Originally Posted by dilettante
I think we're down to two camps.
One says "the data won't fit into the available bits."
And here I thought, that this was already obvious to the OP himself -
hence his idea with hashing the data, which didn't fit into 10 (5Bit or 6Bit) chars.
Originally Posted by dilettante
The other says "who cares, most of the time no data will get lost or corrupted,
after all it is rare that you can flip a coin 10 times and get heads 10 times in a row."
You obviously didn't read (or understand) properly then...
<shrug>
Originally Posted by dilettante
Never mind the cost of unraveling the mess if it ever happens I guess.
Sigh - and also regarding the "costs" I already wrote quite some text.
Sadly the OP doesn't reply anymore - but (IMO) the "costs" due to
a Hash-collision will involve only a "not send mail-notification".
Note, that I didn't *recommend* using Hashes at all to the OP.
All my replies were just meant, to show different results for truly calculatable risks -
depending on the Hash-Length in Bits and the amount of hashed-inputsamples
(should the OP "go forward" with his Hashing-idea - hopefully with a bit more Chars than 10).
As for "fitting a unique identifier into as few well-typable Chars as possible" -
the OP could restrict the needed Chars easily to only 5 instead of 10, when an AutoID-
based approach would be used (e.g. when using them as the PK in the [NotificationsSent]-table
I mentioned earlier) - since 100000 Mail-Notifications per year (over 100 years) amounts
only to 10Mio - and 10Mio can be expressed in 5Chars (a 5Bit each).
Originally Posted by dilettante
Bringing that into the discussion, clearly shows that you didn't understand
much of what was said, sorry, since the probabilities which were mentioned,
are clearly by leaps and bounds below those of "getting tails after tossing a coin".
There's currently not a single known input-pair which would cause the
same SHA-1-HashValue (despite a whole lot of specialized SHA1-generating
attempts on specialized hardware, and despite Millions of Git-Users).
http://stackoverflow.com/questions/3...n-demo-example
As for Meteorite-Hits (which are far more likely than an SHA1-collision), there's
so far apparently two known (verifyable and witnessed) examples already:
http://www.telegraph.co.uk/news/scie...meteorite.html
So, don't be a victim to the "Gamblers fallacy" dile, follow through on your recommendation
and do the safe thing - (making sure you put your helmet on, the next day you go out...)
Olaf
Last edited by Schmidt; Aug 2nd, 2015 at 05:50 PM.
-
Aug 2nd, 2015, 06:22 PM
#25
Re: Generating a unique fixed-length hash in VB6
This is my Hash.
? hash("91110","6224-32412-12361","2016/10/17")
CTRRQKTRVI
? hash("91111","6224-32412-12361","2016/10/17")
CTRRRKTRVI
? hash("91110","6224-32412-12362","2016/10/18")
BVTORLVTWH
? hash("91111","6224-32412-12362","2016/10/18")
BVTOTLVTWH
Code:
Function hash(custID$, AcountVrId$, dt As Date) As String
Dim a() As Byte, b() As Byte, i As Integer, bb$, p As Integer
a() = custID$ + Right$(AcountVrId$, 6) + "1234567890"
b() = Left$(AcountVrId$, 2) + Format$(dt, "ddmmyyyy")
For i = 0 To 19 Step 2
p = a(i) + b(i) + Day(dt)
If p > 90 Then p = p - 90 + p / 2
If p > 90 Then p = p - 90 + p / 2
If p > 90 Then p = p - 90 + p / 2
If p < 65 Then p = 2 * 65 - p
If p > 90 Then p = p - 90 + p / 2
If p > 90 Then p = p - 90 + p / 2
If p > 90 Then p = p - 90 + p / 2
bb$ = bb$ + Chr(p)
Next i
hash = bb$
End Function
-
Aug 2nd, 2015, 07:10 PM
#26
Re: Generating a unique fixed-length hash in VB6
Originally Posted by georgekar
This is my Hash.
Sorry, George - but this is definitely not a useful Hashing.
E.g. you don't even cover the full input-range (leaving out
large parts of the Account-ID entirely)...
? hash("91110","6224-32412-12361","2016/10/17")
I've marked the input-parts you're leaving un-handled
(not influencing the result at all) in magenta.
Stuff like that is usually done by concatenating all the different
input-Parameters into a single string first (and in case of variable-
length input-params - usually with proper delimiters between
the different arguments, to avoid concatenating-surprises).
After that, the concatenated result-string is fed (usually as a Byte-Array)
into a *well-tested* (well-researched) Hash-function with known and
proper distribution-behaviour.
Self-developed Hash-algos are perhaps useful when you develop
your own Hash-Collection-Class or something (where Hash-creation-
speed is of greater interest than proper behaviour with regards to
collision-probabilities).
If the OP decides to use Hashing at all, I'd recommend going with the
established algos, which have proven themselves in the wild for some years
and were properly reviewed by more than one expert in that field.
Olaf
-
Aug 3rd, 2015, 12:06 AM
#27
Re: Generating a unique fixed-length hash in VB6
Is better from nothing...
My hash has to proved if works. In case that I was OP I do a test, for all customId. Maybe the acountId have other significant bits. But we have something to start...
-
Aug 3rd, 2015, 09:43 AM
#28
Thread Starter
Member
Re: Generating a unique fixed-length hash in VB6
I apologize for not posting over the weekend. I honestly didn't expect or intend to stir up such a hornet's nest of ideas and opinions. Honestly, with my admittedly very limited knowledge of the ins and outs of how hashing actually works behind the scenes, I had no idea that I'd be getting myself into such a mess, but I truly appreciate the discussion and all of the examples provided. This is turning out to be much more of a headache than I had hoped.
To try to respond to some of the speculation from above, perhaps a more detailed explanation of the business rules are in order. Our company monitors current insurance status on collateralized loans for lending institutions. While we do not deal with SSN's or credit card information, the information we deal with is considered "sensitive" to the point that we have to be very careful with the specific data that we transmit electronically.
One purpose behind the proposed "hash" code is to help the end user (the individual receiving the printed notification letter) to submit valid insurance information. The intent is that the user will be required to enter this code, which the site will use to pull a record from the database that contains the details for the specific loan & collateral for which the notification letter was generated. Only a subset of the data will actually be displayed on the Web site, but much more detail will be stored in the database record.
The Account Numbers are strings of alphanumeric values (generally only numeric, but sometimes alpha) of varying length (anywhere from 3-4 characters up to 10-20 characters), but the Customer Numbers will be a fixed, four-character string of numeric characters. The Account Number will be unique to a specific Customer Number, but may be duplicated across different Customer Numbers (i.e., Account Number 123456 may be used by both Customer Number 9876 and Customer Number 7890). Additionally, over the term of a given loan, it is possible that multiple notification letters may be sent to the same Customer/Account Number combination. This is why I considered including a date (the date the notification letter was sent) as a part of any "hash" developed.
As I stated above, based on the apparent difficulty of trying to develop an actual hash of the data to meet my required constraints, it may be more effective to instead attempt to simply generate a random series of characters for each notification letter and include some logic that will check the database to see if that specific combination has already been used before writing the record to the database.
Last edited by G_Hosa_Phat; Aug 3rd, 2015 at 09:55 AM.
-
Aug 3rd, 2015, 10:03 AM
#29
Re: Generating a unique fixed-length hash in VB6
The sizes and values of the inputs matters in terms of simply jamming the data into a UniqueID value. If that could be done by removing redundant bits you would have a perfect mapping and your troubles would be over. But the definitions you gave above rules that out: you have too many important bits.
Originally Posted by G_Hosa_Phat
As I stated above, based on the apparent difficulty of trying to develop an actual hash of the data to meet my required constraints, it may be more effective to instead attempt to simply generate a random series of characters for each notification letter and include some logic that will check the database to see if that specific combination has already been used before writing the record to the database.
Randomness is a tricky thing though since you'd want a uniform distribution. But either that or a conventional hash would work as long as you can produce the desired bit-width with a good distribution to minimize collisions. There aren't any common hashes that create a value that fits well into 10 base-36 digits and you can't just trucate a 64-bit hash and expect uniform distribution. But there are techniques for doing this.
For example see D.W.'s answer here: Are there hash algorithms with variable length output?
That gets pretty ugly. Partially because he is striving for cryptographic purity, and party because that's just how hashes work.
So say you come up with a useful enough random value generator. Now you need database probes to detect non-unique values and then tie-breaker logic (e.g. adding 1 or 3 or 7 and re-probing). You need to lock the entire table in order to resolve concurrency collisions!
That gets ugly fast and will have severe performance impacts. So much so that you are probably going to be far better off using a second table of "in-use UniqueID values" in order to limit full-table locking to just that data. That will allow other readers, and writers of other fields, to operate without waiting on this big lock.
-
Aug 3rd, 2015, 11:27 AM
#30
Thread Starter
Member
Re: Generating a unique fixed-length hash in VB6
Again, perhaps I'm being dense, but I'm not sure I understand the purpose or necessity of building a separate "in-use" table for this operation. The idea is that, when a notification letter is produced, it writes a record to a completely separate table in the database with the details of that specific letter. That record needs to be uniquely identified, but once it's written, there should be no further write action taken on that record. From that point forward, the only reason that record should be accessed would be to retrieve its details when a user supplies the identification code on the Web site.
Since my primary concern is simply ensuring a unique and non-sequential value, Perhaps I need to "ease up" on the restrictions and expand the code to 12 characters (although I would like to exclude "ambiguous" characters, as suggested by @Schmidt above, to limit the potential for data entry issues) and just let the "higher-ups" know that this is a programming requirement so that we are able to reliably retain a historical record of the data.
Regardless, I imagine that I could simply do a
Code:
SELECT COUNT(code) FROM notifications WHERE code = <generated code>
and loop the random code generation function until the return value is not greater than 0 immediately before the INSERT statement is executed. Is there a reason that this would not be a workable solution?
EDIT: Here's a simple VB6 random code generator that I could use in the proposed loop:
VB Code:
Public Function GenerateCode() As String Dim intLoop As Integer Dim intChar As Integer Dim blnContinue As Boolean Dim strResult As String Randomize For intLoop = 1 To 12 ' --- Expanding the code to 12 characters blnContinue = False Do While blnContinue = False intChar = Int((90 - 48 + 1) * Rnd + 48) Select Case intChar Case 50 To 57 ' --- 2-9 (Exclude "0" and "1") blnContinue = True Case 65 To 72 ' --- Upper-Case A - H (Exclude "I") blnContinue = True Case 74 To 78 ' --- Upper-Case J - N (Exclude "O") blnContinue = True Case 79 To 90 ' --- Upper-Case P - Z blnContinue = True Case Else blnContinue = False End Select Loop strResult = strResult & Chr(intChar) Next intLoop GenerateCode = strResult End Function
Again, if I loop this until I find a code that is not currently "used" in the database, I believe it could satisfy the specific requirements without causing any substantial performance degradation. Of course, I could be very wrong in that regard, and I'll have to wait until I've done some serious testing to verify that last statement.
Even so, I don't know that this would be any more intensive than attempting to develop and deliver a true and working hash function.
Last edited by G_Hosa_Phat; Aug 3rd, 2015 at 12:08 PM.
-
Aug 3rd, 2015, 11:52 AM
#31
Re: Generating a unique fixed-length hash in VB6
What's wrong with sequential numbers
Anything I post is an example only and is not intended to be the only solution, the total solution nor the final solution to your request nor do I claim that it is. If you find it useful then it is entirely up to you to make whatever changes necessary you feel are adequate for your purposes.
-
Aug 3rd, 2015, 12:06 PM
#32
Thread Starter
Member
Re: Generating a unique fixed-length hash in VB6
Sequential numbers greatly increases the potential for someone to enter another user's "code" and retrieve information that is not associated with them. While, at some point, the same can be true for a randomly generated code (if all of the possible combinations within the scope have been used and none of the data has ever been purged from the system), the likelihood of someone "guessing" at such a code is much less of a concern.
This brings up one other thing that I've been considering as well. I believe the collision potential could be mitigated to some extent if I implement a method for purging these notification messages from the database at some point in the future (say 5-7 years, or so). Honestly, I had hoped I wouldn't need to include this element, but I think it might be best for the overall solution that it seems I'm coming to here.
-
Aug 3rd, 2015, 12:21 PM
#33
Re: Generating a unique fixed-length hash in VB6
If the table in question never requires updating (aside from perhaps deletes, i.e. your purging old records) then things get a bit simpler.
However you still need to lock the entire table for any "write" operation. The issue is that when trying to do collision detection one writer could find a slot/value "free" and then before he uses it another writer could find the very same slot free as well and use it before the "first" writer actually does. The he writes his record and now you have a duplicate "unique ID."
That goes away if there is some singleton thread that does this in a middle logic tier I suppose, but otherwise you need to consider this concurrency issue. The problem is very similar to the one Microsoft describes in How To Implement Multiuser Custom Counters in Jet 4.0 and ADO 2.1 where there is just a one-row table but the same idea applies here.
-
Aug 3rd, 2015, 01:33 PM
#34
Thread Starter
Member
Re: Generating a unique fixed-length hash in VB6
Okay, so it looks like my best solution is to forego attempting to hash the specific data and instead try to work with randomly generated values. You've all given me quite a bit to think about with regards to this implementation. Perhaps I need to seriously consider trying to find a way of off-loading the logic for this code generation to the RDBMS if possible.
Thank you all for the tremendous amount of information and all of the suggestions and ideas. I truly appreciate all of the feedback.
-
Aug 3rd, 2015, 09:47 PM
#35
Re: Generating a unique fixed-length hash in VB6
Originally Posted by G_Hosa_Phat
Okay, so it looks like my best solution is to forego attempting to hash the specific data and instead try to work with randomly generated values.
Nah - since you apparently already have (or plan) something like I guessed
(a DB-Table which already stores all the send Notifications, to allow a "re-mapping" later),
you can really use a simple AutoID as the PK for this thing.
Your concerns about the sequential behaviour of this AutoID can easily be
remedied by a simple Encryption (which e.g. scrambles the 3 lower Bytes of
an Int32-AutoID-DB-Field - and then maps this at maximum 24Bit-Chunk
to the 5 Chars (5 * 5Bit) I've already mentioned in post #24.
You might try out what I mean, by taking a look at the following code, which makes
use of a relative strong encryption-algo, derived from the RC4 (aka arcfour), also
applying a bit of salt - then followed by a mapping of the encrypted 24Bits to 5Chars.
The whole thing is reversible of course and implemented in two main-functions -
please put the following into a *.bas Module:
Code:
Option Explicit
Public Function IdToFiveChars(ByVal ID As Long, Key As String, Salt As String) As String
Const CharMap$ = "ABCDEFGHJKLMNPRSTUVWXYZ123456789" '<- 32Chars (5Bit-Mapping)
Dim i As Long, BDat() As Byte, BKey() As Byte, EncrID As Long
If ID <= 0 Or ID >= 2 ^ 24 Then Err.Raise vbObjectError, , "We need an ID in the range of [1 to 2^24-1]"
BKey = Key 'make a ByteArray out of our Key
BDat = Salt 'apply the salt and init the Data-ByteArray
'make space for the 24Bits (3Bytes) of our passed ID in BDat ...
ReDim Preserve BDat(UBound(BDat) + 3)
For i = 0 To 2 '...and copy the last 3 Bytes of ID over at the end
BDat(UBound(BDat) - 2 + i) = (ID And 255&): ID = ID \ 256
Next
SimpleStreamCipher BDat, BKey, True 'and apply the encryption on BDat
For i = 0 To 2 'make a Long-value out of the 3 now encrypted Bytes at the end
EncrID = 256 * EncrID + BDat(UBound(BDat) - i)
Next
IdToFiveChars = Space$(5) 'make space for the FiveChar-Mapping of EncrID
For i = 0 To 4 'and convert EncrID appropriately in a Loop
Mid$(IdToFiveChars, i + 1, 1) = Mid$(CharMap, (EncrID And 31&) + 1, 1)
EncrID = EncrID \ 32 'shift 5 Bits to the right
Next
End Function
Public Function FiveCharsToId(Chars As String, Key As String, Salt As String) As Long
Const CharMap$ = "ABCDEFGHJKLMNPRSTUVWXYZ123456789" '<- 32Chars (5Bit-Mapping)
Dim i As Long, BDat() As Byte, BKey() As Byte, EncrID As Long
If Len(Chars) <> 5 Then Err.Raise vbObjectError, , "We need 5 Chars here"
For i = 5 To 1 Step -1 're-map from the 5-Chars into the encrypted Long-Value
EncrID = EncrID * 32 + InStr(1, CharMap, Mid$(Chars, i, 1)) - 1
Next
BKey = Key 'make a ByteArray out of our Key
BDat = Salt 'apply the salt and init the Data-ByteArray
SimpleStreamCipher BDat, BKey, True 'encrypt only the Salt first in BDat
'make space for the 24Bits (3Bytes) of our EncrID at the end of salt-encr. BDat ...
ReDim Preserve BDat(UBound(BDat) + 3)
For i = 0 To 2 '...and copy the last 3 Bytes of ID over
BDat(UBound(BDat) - 2 + i) = (EncrID And 255&): EncrID = EncrID \ 256
Next
SimpleStreamCipher BDat, BKey, False 'now apply the decryption on the whole of BDat
For i = 0 To 2 'finally a copy-over of the 3 bytes at the end into our decrypted result-ID
FiveCharsToId = 256 * FiveCharsToId + BDat(UBound(BDat) - i)
Next
End Function
'this cipher is based on the RC4-algorithm (quite strong, no "play-thing")
Public Sub SimpleStreamCipher(B() As Byte, BK() As Byte, ByVal Enc As Boolean)
Dim i&, j&, k&, UB&, CK&, Tmp As Byte, C(255) As Byte, T(255) As Byte
UB = UBound(B): CK = UBound(BK) + 1
'Init Key-Arrays
For i = 0 To 255: C(i) = i: T(i) = BK(i Mod CK): Next
For i = 0 To 255
j = (j + C(i) + T(i)) Mod 256
Tmp = C(i): C(i) = C(j): C(j) = Tmp
Next i
'Crypt
i = 0: j = 0: CK = 0
For k = 0 To UB
i = (i + 1) Mod 256
j = (j + C(i)) Mod 256
Tmp = C(i): C(i) = C(j): C(j) = Tmp 'swap C(i) and C(j)
If Enc And k > 0 Then CK = B(k - 1) Else Tmp = B(k)
B(k) = B(k) Xor C((CLng(C(i)) + C(j) + CK) Mod 256)
If Not Enc Then CK = Tmp
Next k
End Sub
Usage of the two Functions (IdToFiveChars and FiveCharsToId) of the above module
then e.g. like this (into a Form without any Controls):
Code:
Option Explicit
Const Key As String = "Key", Salt As String = "Salt"
Private Sub Form_Load()
Dim FiveChars As String, i As Long
For i = 1 To 3 'test 3 IDs at the start of the allowed range
FiveChars = IdToFiveChars(i, Key, Salt)
Debug.Print i, FiveChars, FiveCharsToId(FiveChars, Key, Salt)
Next
For i = 2 ^ 24 - 3 To 2 ^ 24 - 1 'and 3 IDs at the end of the allowed range
FiveChars = IdToFiveChars(i, Key, Salt)
Debug.Print i, FiveChars, FiveCharsToId(FiveChars, Key, Salt)
Next
End Sub
The above shows the possible ID-Range and prints the following into the Immediate-Window:
Code:
1 R5YBL 1
2 PWLKF 2
3 NMNTK 3
16777213 VN59J 16777213
16777214 UNA8P 16777214
16777215 T6P2A 16777215
BTW, I've already tested the whole range (1 to 2^24-1) of possible ID-Values
successfully in a Loop - getting back exactly what I've put in (no duplicates, no errors) -
when you want to try that yourself, I suggest compiling native (because it takes a bit, roughly 1 Minute).
And just for completeness - the creation of a new record, then retrieving its AutoID
can easily (and safely) be accomplished by wrapping the task in a transaction.
Below is an extended Demo, which requires the same *.bas-Module with the two ID-Mapper-Functions -
and in addition a ListBox-Control (List1) on the Form, as well as a checked in Project-Reference to ADO.
Code:
Option Explicit
Const JetPrefix = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
Private DBFileName As String, Cnn As ADODB.Connection
Const Key As String = "Key", Salt As String = "Salt"
Private Sub Form_Load()
'simple, DB-independent Test beforehand
Dim FiveChars As String, i As Long
For i = 1 To 3 'test 3 IDs at the start of the allowed range
FiveChars = IdToFiveChars(i, Key, Salt)
Debug.Print i, FiveChars, FiveCharsToId(FiveChars, Key, Salt)
Next
For i = 2 ^ 24 - 3 To 2 ^ 24 - 1 'and 3 IDs at the end of the allowed range
FiveChars = IdToFiveChars(i, Key, Salt)
Debug.Print i, FiveChars, FiveCharsToId(FiveChars, Key, Salt)
Next
Caption = "Click the ListBox-Entries"
DBFileName = Environ("temp") & "\Test.mdb" 'define a FileName
On Error Resume Next
Kill DBFileName 'ensure we kill an already existing File in case it exists
On Error GoTo 0
CreateObject("ADOX.Catalog").Create JetPrefix & DBFileName '...create a new *.mdb
Set Cnn = New ADODB.Connection 'create and open the Connection
Cnn.CursorLocation = adUseClient
Cnn.Open JetPrefix & DBFileName
'create an appropriate Table in the Test-DB (the SQL-DDL could be used in both MS-SQLServer and JET)
Cnn.Execute "Create Table Notifications(ID Int Identity, CustNr nVarChar(5), AccNr nVarChar(25), NotifyDate DateTime)"
Dim NotifyDate As Date
NotifyDate = Now 'same Date for all of the 3 new records we add below
List1.AddItem AddNotification("1234", "123-456", NotifyDate)
List1.AddItem AddNotification("2345", "234-567", NotifyDate)
List1.AddItem AddNotification("3456", "345-678", NotifyDate)
End Sub
Private Sub List1_Click()
With GetRs("Select * From Notifications Where ID=" & FiveCharsToId(List1.Text, Key, Salt))
MsgBox !ID & vbCrLf & !CustNr & vbCrLf & !AccNr & vbCrLf & !NotifyDate
End With
End Sub
Function AddNotification(CustNr$, AccNr$, NotifyDate As Date, Optional ID As Long) As String
On Error GoTo RollBack
Cnn.BeginTrans 'wrap the whole Operation in a Transaction
With GetRs("Select * From Notifications Where 1=0")
.AddNew 'apply the 3 Values into a new Record
!CustNr = CustNr
!AccNr = AccNr
!NotifyDate = NotifyDate
.Update 'and update it
End With
'now ask the DB about the last Auto-ID it has given in this Transaction
ID = Cnn.Execute("Select @@identity")(0)
Cnn.CommitTrans 'no failures so far, so we commit it
AddNotification = IdToFiveChars(ID, Key, Salt) 'return the 5-Chars as a result
Exit Function
RollBack:
Cnn.RollbackTrans
End Function
Function GetRs(SQL As String) As ADODB.Recordset
Set GetRs = New ADODB.Recordset
GetRs.Open SQL, Cnn, adOpenStatic, adLockOptimistic
End Function
HTH
Olaf
-
Aug 4th, 2015, 09:51 AM
#36
Thread Starter
Member
Re: Generating a unique fixed-length hash in VB6
@Schmidt Thank you for your suggestion. I believe I've found a workable solution by creating a custom ID field that generates a random alphanumeric code in the RDBMS so that, when a new record is inserted, the database will automatically take care of this part of it for me (rather than having the application logic generate something and hope that I don't run into a concurrency issue). The function that's written in the database to generate the unique ID will actually loop until it finds one that isn't in use before committing the INSERT.
Thank you all again for the tremendous discussion. It has really shed some light on the whole issue and helped me to work through a number of things rather than banging my head against a wall. I truly appreciate all of the assistance.
-
Aug 4th, 2015, 10:15 AM
#37
Re: Generating a unique fixed-length hash in VB6
Sure you do...and this is a more clever system..
You have a simple autonumber and you have to change positions by a standard way to make number to a fake random id.
so if we have 4 chars say 1435 we have to change positions like this 5314. You may have a random increment by one or two or three so the next to 1435 maybe can be 1437 so you get 7314. No customer can see the others "Serial number", so if he try to paste a random one to your automatic system maybe he put a 5315. You see now you go to other side, one small code to be a bigger one (as serial number or something like that), and now you can be creative. How many mails you send a day? 1000, 2000, 3000.. With a random "increment" of 3 you need just 4 chars, and you can put 3 or 4 random chars between original. So you have a function MailCode$(X) to give you each time different code( with random inside chars) and you can read and find in one step without search the X to test with name and address of the customer.
Last edited by georgekar; Aug 4th, 2015 at 10:20 AM.
-
Aug 4th, 2015, 11:46 AM
#38
Re: Generating a unique fixed-length hash in VB6
Originally Posted by G_Hosa_Phat
The function that's written in the database to generate the unique ID will actually loop until it finds one that isn't in use before committing the INSERT.
What happens if it doesn't find one that isn't in use (if that is possible at all)
Anything I post is an example only and is not intended to be the only solution, the total solution nor the final solution to your request nor do I claim that it is. If you find it useful then it is entirely up to you to make whatever changes necessary you feel are adequate for your purposes.
-
Aug 4th, 2015, 04:12 PM
#39
Thread Starter
Member
Re: Generating a unique fixed-length hash in VB6
Originally Posted by jmsrickland
What happens if it doesn't find one that isn't in use (if that is possible at all)
Actually, the way that it's currently written, it will continue to loop until it finds an available ID. There, of course, is the potential for it to get completely hung up in that loop if it keeps generating the same (random) values, and I may have to do some additional "tweaking" to prevent that from happening, but this is a good starting point. This is still a "work in progress", and I have a bit of time to finalize everything before any of this goes into production.
-
Aug 4th, 2015, 08:23 PM
#40
Re: Generating a unique fixed-length hash in VB6
Normally when you find a collision you'd increment the value by 1 or a random small value > 0 and probe again.
Tags for this Thread
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|