The encryption protocol I have chosen to use in this sample program is RC4. It is fast and its limitations are overcome by using a 256 bit key and relatively large record sizes. The current sample uses TLS 1.3 to establish the network connection. The Agreed Secret calculated by each party is then used as the key for the file transfer.
The server program (FileServer.exe) listens on port 1159. Modify the SharePath to reflect the location you wish to share files from. The client program (GetFile.exe) currently defaults to the loopback address. Change the server location to an IP address or a domain name. A domain name must be DNS hosted or configured in the "HOSTS" file. The Path reflects where you want to download files to. Click the "Connect" button, and if the connection succeeds, the files available on the server will be listed in the ListBox. Clicking on a file in that list will download and store the file, with a MsgBox reflecting that information.
This version uses the TLS 1.3 protocol to establish the connection. By default ECDH_P256 is used, but ECDH_P384 or ECDH_P521 can also be used. Normally, the server program would run as a service. If there is sufficient interest, I can add that feature. Authentication via UserID/Password can also be added.
> The encryption protocol I have chosen to use in this sample program is RC4. It is fast and its limitations are overcome by using a 256 bit key and relatively large record sizes.
Btw, RC4 is a completely broken cipher nowadays and there is no fixing it by using large keys/records.
cheers,
</wqw>
I disagree. RC4 is a stream cipher traditionally using a 40 or 128 bit stored key. That left it vulnerable to figuring out the key from short encryptions or many encryption samples. That has made it unsuitable for TLS connections. This application uses a different 256 bit key for every connection and is not used on short encryptions. Even if a hacker figures out the key for one episode, it is useless for the next.
Take a closer look at each of those vulnerabilities. They are all based on the fact that the key stream produced by RC4 and a fixed key are pseudo random. If the key used to produce the key stream is truly random, then the key stream itself will also be truly random. The Agreed Secret produced by ECC is considered to be truly random.
> If the key used to produce the key stream is truly random, then the key stream itself will also be truly random.
Not for RC4. The cipher is biased, i.e. its output is not "truly random" no matter how random the key is. Please, stop using it for any meaningful purpose :-))
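The bias claim can be checked with a few lines of code. The following is a minimal sketch, not code from the sample program: it counts how often the second keystream byte is zero over many random 32-byte keys. Mantin and Shamir showed that this byte is zero about twice as often as a uniform byte would be, roughly 1/128 instead of 1/256. The names Rc4SecondByte and ZeroSecondByteRate are made up for this example, and Rnd is not a cryptographic generator; it is only used here to vary the key.

```vb
Option Explicit

' Returns the second keystream byte that RC4 produces for the given key.
Private Function Rc4SecondByte(bKey() As Byte) As Byte
    Dim s(0 To 255) As Byte
    Dim I As Long, J As Long, N As Long
    Dim kLen As Long
    Dim bTmp As Byte

    kLen = UBound(bKey) + 1
    For I = 0 To 255
        s(I) = I
    Next I
    ' Key scheduling, same as the first loop in RunRC4
    For I = 0 To 255
        J = (J + s(I) + bKey(I Mod kLen)) Mod 256
        bTmp = s(I): s(I) = s(J): s(J) = bTmp
    Next I
    ' Generate two keystream bytes; the last one assigned is the second byte
    I = 0: J = 0
    For N = 1 To 2
        I = (I + 1) Mod 256
        J = (J + s(I)) Mod 256
        bTmp = s(I): s(I) = s(J): s(J) = bTmp
        Rc4SecondByte = s((CLng(s(I)) + s(J)) Mod 256)
    Next N
End Function

' Fraction of random 32-byte keys whose second keystream byte is zero.
Public Function ZeroSecondByteRate(ByVal lTrials As Long) As Double
    Dim bKey(0 To 31) As Byte
    Dim lIdx As Long, lTrial As Long, lZero As Long

    Randomize
    For lTrial = 1 To lTrials
        For lIdx = 0 To 31
            bKey(lIdx) = Int(Rnd * 256)   ' demo randomness only, not cryptographic
        Next lIdx
        If Rc4SecondByte(bKey) = 0 Then lZero = lZero + 1
    Next lTrial
    ZeroSecondByteRate = lZero / lTrials
End Function
```

Calling `Debug.Print ZeroSecondByteRate(200000)` from the Immediate window should typically print a value near 0.0078 (1/128) rather than 0.0039 (1/256), even though every trial uses a fresh key.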
I mean, using it here for GetFile is OK I guess, though a bit strange, given that you have all the power of the BCrypt API to implement whatever cipher you want.
We will have to agree to disagree. But in the original VBA code for RC4 that I ran across, the author had added the commented-out loop shown in the code below. He had suggested that it improved the security of the cipher, which I am having a hard time understanding. If the hacker knows that I added those extra cycles, how would it improve the security?
J.A. Coutts
Code:
Private Function RunRC4(bText() As Byte, bKey() As Byte) As Byte()
    Static allBytes() As Byte
    Dim s() As Byte
    Dim kLen As Long
    Dim bTmp As Byte
    Dim I As Long
    Dim J As Long
    Dim lPtr As Long
    Dim sLen As Long
    Dim bResult() As Byte
    On Error GoTo RunErr
    'Build the identity table 0..255 once and reuse it on later calls
    If GetbSize(allBytes) = 0 Then
        ReDim allBytes(255)
        For lPtr = 0 To 255
            allBytes(lPtr) = lPtr
        Next
    End If
    s = allBytes
    kLen = GetbSize(bKey)
    'Key scheduling: shuffle the table under control of the key
    For I = 0 To 255
        J = (J + s(I) + bKey(I Mod kLen)) Mod 256
        bTmp = s(I)
        s(I) = s(J)
        s(J) = bTmp
    Next I
    I = 0
    J = 0
    'Disabled in the original: cycle the state 3072 more times before use
    'DebugPrintByte "Initial", S
    'For lPtr = 0 To 3071
    '    I = (I + 1) Mod 256
    '    J = (J + S(I)) Mod 256
    '    bTmp = S(I)
    '    S(I) = S(J)
    '    S(J) = bTmp
    'Next lPtr
    'DebugPrintByte "Cycled", S
    sLen = GetbSize(bText)
    ReDim bResult(sLen - 1)
    'Generate the key stream and XOR it with the input
    For lPtr = 0 To sLen - 1
        I = (I + 1) Mod 256
        J = (J + s(I)) Mod 256
        bTmp = s(I)
        s(I) = s(J)
        s(J) = bTmp
        bResult(lPtr) = s((CLng(s(I)) + s(J)) Mod 256) Xor bText(lPtr)
    Next lPtr
    RunRC4 = bResult
    Exit Function
RunErr:
    Erase bResult
    RunRC4 = bResult
End Function
> He had suggested that it improved the security of the cipher. . .
This improves nothing; it is just some kind of security through obscurity.
In every cryptanalysis scenario the attacker has the algorithm, has the source code, has everything with comments and explanations if you wish. The only thing they don't have is the key, this is the only secret in the whole scenario.
Everyone can get FileServer.exe from your zip above and take a look at the disassembly and reverse the modified algorithm. So the code is believed to be available to the attacker, it's not a secret and only the keys are secret.
Quoting from the introduction:
"We explore the use of the Mantin biases (Mantin, Eurocrypt 2005) to recover plaintexts from RC4-encrypted traffic. We provide a more fine-grained analysis of these biases than in Mantin’s original work. We show that, in fact, the original analysis was incorrect in certain cases: the Mantin biases are sometimes non-existent, and sometimes stronger than originally predicted. We then show how to use these biases in a plaintext recovery attack. Our attack targets two unknown bytes of plaintext that are located close to sequences of known plaintext bytes, a situation that arises in practice when RC4 is used in, for example, TLS. We provide a statistical framework that enables us to make predictions about the performance of this attack and its variants. We then extend the attack using standard dynamic programming techniques to tackle the problem of recovering longer plaintexts, a setting of practical interest in recovering HTTP session cookies and user passwords that are protected by RC4 in TLS. We perform experiments showing that we can successfully recover 16-byte plaintexts with 80% success rate using 2^31 ciphertexts, an improvement over previous attacks."
I presume that they chose 16 bytes, because that is the length of the secret key used to create the key stream. To arrive at this result, they fed 16 unknown text bytes, along with 65 known text bytes on each side of the unknown bytes, into the list Viterbi algorithm. After 2,147,483,648 inputs, they were able to recover the unknown 16 text bytes 80% of the time. To quote the study:
"This is a situation of practical interest in attacking session cookies [1] and passwords [4] that are protected by RC4 in TLS." Cookies and passwords are transmitted in known positions in TLS, and for this reason, RC4 is not permitted to be used in TLS.
Btw, here is a standard VB6 module which implements the ChaCha20 cipher (from the Salsa20 family of ciphers) in a compact 140 lines of code:
Code:
'--- mdChaCha20.bas
Option Explicit
DefObj A-Z

#Const HasPtrSafe = (VBA7 <> 0) Or (TWINBASIC <> 0)

#If HasPtrSafe Then
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As LongPtr)
#Else
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As Long)
#End If

Private LNG_POW2(0 To 31) As Long

Public Type CryptoChaCha20Context
    Constant(0 To 3) As Long
    Key(0 To 7) As Long
    Nonce(0 To 3) As Long
    Block(0 To 63) As Byte
    NBlock As Long
    NCounter As Long
End Type

Private Function ROTL32(ByVal lX As Long, ByVal lN As Long) As Long
    '--- ROTL32 = LShift(X, n) Or RShift(X, 32 - n)
    Debug.Assert lN <> 0
    ROTL32 = ((lX And (LNG_POW2(31 - lN) - 1)) * LNG_POW2(lN) Or -((lX And LNG_POW2(31 - lN)) <> 0) * LNG_POW2(31)) Or _
        ((lX And (LNG_POW2(31) Xor -1)) \ LNG_POW2(32 - lN) Or -(lX < 0) * LNG_POW2(lN - 1))
End Function

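Private Function UAdd(ByVal lX As Long, ByVal lY As Long) As Long
    '--- same sign bit (including lX = lY): flip bit 31 around the add so the signed sum cannot overflow
    If (lX Xor lY) >= 0 Then
        UAdd = ((lX Xor &H80000000) + lY) Xor &H80000000
    Else
        UAdd = lX + lY
    End If
End Function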

Private Sub pvInit()
    Dim lIdx As Long

    If LNG_POW2(0) = 0 Then
        LNG_POW2(0) = 1
        For lIdx = 1 To 30
            LNG_POW2(lIdx) = LNG_POW2(lIdx - 1) * 2
        Next
        LNG_POW2(31) = &H80000000
    End If
End Sub

Private Sub pvChaCha20Quarter(lA As Long, lB As Long, lC As Long, lD As Long)
    lA = UAdd(lA, lB): lD = ROTL32(lD Xor lA, 16)
    lC = UAdd(lC, lD): lB = ROTL32(lB Xor lC, 12)
    lA = UAdd(lA, lB): lD = ROTL32(lD Xor lA, 8)
    lC = UAdd(lC, lD): lB = ROTL32(lB Xor lC, 7)
End Sub

Private Sub pvChaCha20Core(uCtx As CryptoChaCha20Context, baOutput() As Byte)
    Static lZ(0 To 15) As Long
    Static lX(0 To 15) As Long
    Dim lIdx As Long

    Call CopyMemory(lZ(0), uCtx.Constant(0), 16 * 4)
    Call CopyMemory(lX(0), uCtx.Constant(0), 16 * 4)
    For lIdx = 0 To 9
        pvChaCha20Quarter lZ(0), lZ(4), lZ(8), lZ(12)
        pvChaCha20Quarter lZ(1), lZ(5), lZ(9), lZ(13)
        pvChaCha20Quarter lZ(2), lZ(6), lZ(10), lZ(14)
        pvChaCha20Quarter lZ(3), lZ(7), lZ(11), lZ(15)
        pvChaCha20Quarter lZ(0), lZ(5), lZ(10), lZ(15)
        pvChaCha20Quarter lZ(1), lZ(6), lZ(11), lZ(12)
        pvChaCha20Quarter lZ(2), lZ(7), lZ(8), lZ(13)
        pvChaCha20Quarter lZ(3), lZ(4), lZ(9), lZ(14)
    Next
    For lIdx = 0 To 15
        lX(lIdx) = UAdd(lX(lIdx), lZ(lIdx))
    Next
    Call CopyMemory(baOutput(0), lX(0), 16 * 4)
End Sub

Public Sub CryptoChaCha20Init(uCtx As CryptoChaCha20Context, baKey() As Byte, baNonce() As Byte, Optional ByVal NCounter As Long = 4)
    Dim sConstant As String
    Dim baFull(0 To 15) As Byte

    Debug.Assert UBound(baKey) + 1 = 16 Or UBound(baKey) + 1 = 32
    With uCtx
        pvInit
        If UBound(baKey) = 31 Then
            Call CopyMemory(.Key(0), baKey(0), 32)
            sConstant = "expand 32-byte k"
        Else
            Call CopyMemory(.Key(0), baKey(0), 16)
            Call CopyMemory(.Key(4), baKey(0), 16)
            sConstant = "expand 16-byte k"
        End If
        Call CopyMemory(.Constant(0), ByVal sConstant, Len(sConstant))
        If UBound(baNonce) >= UBound(baFull) Then
            Call CopyMemory(baFull(0), baNonce(0), UBound(baFull) + 1)
        ElseIf UBound(baNonce) >= 0 Then
            Call CopyMemory(baFull(15 - UBound(baNonce)), baNonce(0), UBound(baNonce) + 1)
        End If
        Call CopyMemory(.Nonce(0), baFull(0), 16)
        .NBlock = 0
        .NCounter = NCounter '--- part of Nonce that get incremented after pvChaCha20Core (in DWORDs)
    End With
End Sub

Public Sub CryptoChaCha20Cipher(uCtx As CryptoChaCha20Context, baInput() As Byte, Optional ByVal Pos As Long, Optional ByVal Size As Long = -1)
    Const BLOCKSZ As Long = 64
    Dim lOffset As Long
    Dim lTaken As Long
    Dim lIdx As Long

    With uCtx
        If Size < 0 Then
            Size = UBound(baInput) + 1 - Pos
        End If
        Do While Size > 0
            If .NBlock = 0 Then
                pvChaCha20Core uCtx, .Block
                For lIdx = 0 To .NCounter - 1
                    uCtx.Nonce(lIdx) = UAdd(uCtx.Nonce(lIdx), 1)
                    If uCtx.Nonce(lIdx) <> 0 Then
                        Exit For
                    End If
                Next
                .NBlock = BLOCKSZ
            End If
            lOffset = BLOCKSZ - .NBlock
            lTaken = .NBlock
            If Size < lTaken Then
                lTaken = Size
            End If
            For lIdx = 0 To lTaken - 1
                baInput(Pos) = baInput(Pos) Xor .Block(lOffset)
                Pos = Pos + 1
                lOffset = lOffset + 1
            Next
            .NBlock = .NBlock - lTaken
            Size = Size - lTaken
        Loop
    End With
End Sub
It might not be shorter than the RC4 implementation, but it is quite a strong stream cipher (based on a 64-byte block for internal state).
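For reference, a round trip through the module above might look like the sketch below. It assumes mdChaCha20.bas is already part of the project; the function name ChaCha20RoundTrip, the counting key and the 12-byte nonce are made up for the example. Because the cipher XORs the key stream into the buffer in place, running it a second time with a context initialised from the same key and nonce restores the original data.

```vb
' Hypothetical test routine; requires mdChaCha20.bas in the same project.
Public Function ChaCha20RoundTrip() As Boolean
    Dim uCtx As CryptoChaCha20Context
    Dim baKey(0 To 31) As Byte      ' 256-bit key, e.g. the Agreed Secret
    Dim baNonce(0 To 11) As Byte    ' 96-bit nonce, right-aligned into the 16-byte counter/nonce block by Init
    Dim baData() As Byte
    Dim baOrig() As Byte
    Dim lIdx As Long

    For lIdx = 0 To 31: baKey(lIdx) = lIdx: Next     ' demo values only
    For lIdx = 0 To 11: baNonce(lIdx) = lIdx: Next
    baData = StrConv("The quick brown fox", vbFromUnicode)
    baOrig = baData

    CryptoChaCha20Init uCtx, baKey, baNonce
    CryptoChaCha20Cipher uCtx, baData          ' encrypts the buffer in place

    CryptoChaCha20Init uCtx, baKey, baNonce    ' same key and nonce restart the key stream
    CryptoChaCha20Cipher uCtx, baData          ' XOR again to decrypt

    ChaCha20RoundTrip = (StrConv(baData, vbUnicode) = StrConv(baOrig, vbUnicode))
End Function
```

In real use the same key and nonce pair should never encrypt two different messages, since both would then be XORed with an identical key stream.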
I came up with this encryption routine several years ago and incorporated it into a DLL. Being a DLL, it was very easy to test in this application, and it worked like a charm.
Code:
Private Function Encrypt(bArray() As Byte, bKey() As Byte) As Byte()
    Dim lPtr1 As Long
    Dim lPtr2 As Long
    Dim bTmp() As Byte
    Dim Block() As Byte
    Dim iLen As Integer
    Dim M%, N%
    On Error GoTo EncryptErr
    iLen = (7 And (bKey(0) Xor bKey(1))) + 2
    Block = bKey
    N% = iLen
    For lPtr1 = 0 To UBound(bKey) Step N%
        Do Until N% = 0
            N% = N% - 1
            If lPtr1 + N% > UBound(bKey) Then Exit Do
            Block(lPtr1 + N%) = bKey(lPtr1 + M%)
            M% = M% + 1
        Loop
        M% = 0: N% = iLen
    Next lPtr1
    Block = HashData(HashAlg, Block)
    lPtr1 = UBound(bArray)
    ReDim bTmp(lPtr1)
    lPtr2 = iLen
    For lPtr1 = 0 To lPtr1
        bTmp(lPtr1) = bArray(lPtr1) Xor Block(lPtr2)
        If lPtr2 < UBound(Block) Then
            lPtr2 = lPtr2 + 1
        Else
            Block = HashData(HashAlg, Block)
            lPtr2 = (7 And (Block(0) Xor Block(1)))
        End If
    Next lPtr1
    Encrypt = bTmp
EncryptErr:
End Function
Using the Agreed Secret as the starting key (32 bytes), the starting point for the creation of the first block is calculated from the first 2 bytes of the Agreed Secret. After creation, the block is then hashed to form the first block to be used in the creation of the Key Stream. The Key Stream is then XORed with the Data using the same starting point. That is to say, not all of the Key Stream is used to encrypt the data. The starting point is (7 And x) + 2, i.e. 2 to 9, so with a 32-byte block the first encryption length could be anywhere from 23 bytes to 30 bytes. When it runs out of bytes in the Key Stream, a new block is created by hashing the previous block, with a new starting point (0 to 7) calculated from the first 2 bytes of the new block. The subsequent encryption lengths will be from 25 bytes to 32 bytes. Repetition of encryption lengths is thus avoided.
This looks like the key stream is some concatenation of successive recursive hashes, like this:
KS = H(K') || H(H(K')) || H(H(H(K'))) || ...
where H is the chosen hash and K' is some transmutation of the key (doesn't matter how weird the indexing is) and additionally some truncation of the H result up to key size.
> Btw, here is a standard VB6 module which implements the ChaCha20 cipher (from the Salsa20 family of ciphers) in a compact 140 lines of code
> It might not be shorter than the RC4 implementation, but it is quite a strong stream cipher (based on a 64-byte block for internal state).
cheers,
</wqw>
My own encryption routine turned out to be too slow. That is probably why I stopped working on it. So then I tried ChaCha20. In the IDE it was also relatively slow, but not when compiled. Using a 1.6 MB file on the local network:
In IDE
My own: 4.9 seconds
My own as DLL: 3.3 seconds
ChaCha20: 3.0 seconds
RC4: 1.6 seconds
Compiled
My own: 4.9 seconds
My own as DLL: 4.8 seconds
ChaCha20: 1.6 seconds
RC4: 1.6 seconds
I will work on ChaCha20 to see if I can speed it up. Is the Nonce actually used in this particular case?
I get much better results for ChaCha, when native compiled (not to "PCode").
In IDE (all with a 1.6Mio ByteArray as Input-Data):
about 1250msec
native compiled (all "extended options" unchecked):
about 120msec
native compiled (all "extended options" checked):
about 90msec
So, native compiled, that's roughly 13 to 18 MB per second throughput on an average modern CPU.
Olaf
Those times are the times required to send the file as records with a maximum size of 16,389 bytes over a local network, including the file read time. In the case of the compiled RC4 and ChaCha20 times, I am pretty sure that the limitation is network speed, not computational speed, and I don't know why the IDE time for ChaCha20 is so high. If I were to use the loopback address, I might get more accurate computational times.
Using a cut-down version of ChaCha20 (32-byte key only) configured as a Class, I ran a few more tests. Using the loopback address on the compiled programs of the RC4 and ChaCha20 versions made no difference. Both transferred a 1.6 MB file in about 1.6 seconds. Disabling the network transfer itself:
ChaCha20 in the IDE: 1,783 ms
ChaCha20 compiled: 156 ms
RC4 in the IDE: 141 ms
RC4 compiled: 15 ms
ChaCha Ratio = 1783/156 = 11.4
RC4 Ratio = 141/15 = 9.4
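To put the timings above on the same footing as the network figure, they can be converted to throughput. This is only a small sketch; the helper name ThroughputMBps is made up, and it assumes the 1.6 MB file size quoted earlier.

```vb
' Converts a payload size in MB and an elapsed time in ms to MB per second.
Public Function ThroughputMBps(ByVal dblMegabytes As Double, ByVal dblMillisec As Double) As Double
    ThroughputMBps = dblMegabytes / (dblMillisec / 1000#)
End Function

Public Sub PrintThroughput()
    Debug.Print "ChaCha20 in the IDE:", Format$(ThroughputMBps(1.6, 1783), "0.0"); " MB/s"
    Debug.Print "ChaCha20 compiled:", Format$(ThroughputMBps(1.6, 156), "0.0"); " MB/s"
    Debug.Print "RC4 in the IDE:", Format$(ThroughputMBps(1.6, 141), "0.0"); " MB/s"
    Debug.Print "RC4 compiled:", Format$(ThroughputMBps(1.6, 15), "0.0"); " MB/s"
End Sub
```

That works out to roughly 0.9, 10.3, 11.3 and 106.7 MB/s respectively, against about 1 MB/s for the full transfer (1.6 MB in 1.6 seconds), so once compiled neither cipher is the bottleneck.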
Although the RC4 times were faster, the ChaCha20 times are quite acceptable. Let me know if you think I should update the download.