Tuesday, March 31, 2020

Digest for comp.lang.c++@googlegroups.com - 5 updates in 2 topics

Frederick Gotham <cauldwell.thomas@gmail.com>: Mar 31 11:50AM -0700

I have triple-posted this to sci.crypt, comp.lang.c++, comp.lang.c. (I would have taken more flak for cross-posting).
 
I'm currently writing a paper on a cryptographic technique I have come up with. I hope to publish before the end of April.
 
When encrypting/decrypting a file of size from about 500 kB to 500 GB, I shall aim for my technique to be comparably fast to the likes of Rijndael, Twofish, Serpent.
 
When encrypting/decrypting a small input, e.g. from 1 byte to 1 kB, my technique will be incredibly slow, as significant computation is required to load in the key and transform the first block. (The first block gets special treatment). If you had a million small files to encrypt separately, it would take thousands of times longer than if it were one large file.
 
It is the time taken to load in the key and transform the first block that shall make my technique far less susceptible to brute-forcing. I hear that the Blowfish algorithm was admired for this reason because loading in the key is as expensive as encrypting 4 kB of data.
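
To put rough numbers on this trade-off, here is a minimal C++ sketch of a cost model. It is not the technique described above (which is unpublished); it only assumes that key setup plus the first block costs as much as encrypting some fixed number of bytes, and that every later byte costs one unit. The names `CostModel`, `total_cost`, `slowdown` and `per_guess_factor` are hypothetical, and the setup figures plugged in afterwards are illustrative, apart from the 4 kB Blowfish figure quoted in the post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model. All costs are in "byte-equivalents":
// one unit is the cost of encrypting one ordinary byte.
struct CostModel {
    std::uint64_t setup_cost;  // key setup + first block, in byte-equivalents
};

// Cost of encrypting `files` separate files of `bytes_per_file` bytes each.
// Every file pays the setup cost once, then one unit per byte.
constexpr std::uint64_t total_cost(CostModel m, std::uint64_t files,
                                   std::uint64_t bytes_per_file) {
    return files * (m.setup_cost + bytes_per_file);
}

// How many times slower it is to encrypt many small files separately
// than to encrypt one file holding the same number of bytes.
constexpr double slowdown(CostModel m, std::uint64_t files,
                          std::uint64_t bytes_per_file) {
    return static_cast<double>(total_cost(m, files, bytes_per_file)) /
           static_cast<double>(total_cost(m, 1, files * bytes_per_file));
}

// Cost of testing one key guess against a single block, relative to a
// cipher whose key setup is free. This is the brute-force penalty.
constexpr double per_guess_factor(CostModel m, std::uint64_t block_bytes) {
    assert(block_bytes > 0);
    return static_cast<double>(m.setup_cost + block_bytes) /
           static_cast<double>(block_bytes);
}
```

Under this model, a setup cost of 4096 byte-equivalents (the Blowfish figure mentioned above) makes a million 1000-byte files only about 5 times slower than a single 1 GB file. A slowdown in the thousands needs a setup cost in the millions of byte-equivalents: `slowdown(CostModel{4000000}, 1000000, 1000)` comes out a little under 4000. The same setup cost is what each brute-force guess has to pay, which is the intended benefit.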
 
Has anyone any thoughts on deliberately obfuscating the first step of an encryption algorithm, with the consequence of severe inefficiency when encrypting small files, but with the aim of making brute-forcing extremely time-consuming?
 
I'm coding my technique in C++ for multi-threaded high performance on modern PCs (and supercomputers). I've got an x86-64 machine with 20 cores to test this on.
 
I'm also coding a single-threaded version of my technique in C for minimal ROM and RAM usage on microcontrollers. I won't be able to get it small enough for 8-bit microcontrollers, but a 32-bit PIC should be more than powerful enough.
Melzzzzz <Melzzzzz@zzzzz.com>: Mar 31 09:28AM

>> since the beginning of time...
 
> Using undocumented features in software which needs to be
> reliable is stupid.
 
An undocumented CPU feature is reliable. No CPU in production is different
from another, if same...
 
 
--
press any key to continue or any other to quit...
I enjoy nothing as much as my status as an INVALID -- Zli Zec
We are all witnesses: about 3 years of intensive propaganda is enough to drive a nation mad -- Zli Zec
In the Wild West there was not really that much violence, precisely because
everyone was armed. -- Mladen Gogala
Bonita Montero <Bonita.Montero@gmail.com>: Mar 31 11:31AM +0200

> 4 128 bit units per core. 2 adds and 2 multiplies. You can freely check
> at Agner's site...
 
That's wrong. Ryzens before Zen2 have 2 * 64 bit add and 2 * 64 bit mul.
My benchmarks prove this. The asm mul-benchmark I've given in this
thread is unrolled such that, if you were correct, you would get twice
the throughput it actually gives. But you don't.
Bonita Montero <Bonita.Montero@gmail.com>: Mar 31 11:32AM +0200

> Undocumented CPU feature is reliable. ...
 
No, it's undocumented and therefore not reliable.
Bonita Montero <Bonita.Montero@gmail.com>: Mar 31 11:36AM +0200

> 4 128 bit units per core. 2 adds and 2 multiplies. You can freely check
> at Agner's site...
 
In this code ...
 
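?fMul@@YQ_KXZ PROC
    vpxor   xmm0, xmm0, xmm0        ; zero ymm0 (the VEX encoding also clears the upper half)
    vpxor   xmm1, xmm1, xmm1        ; zero ymm1
    mov     rcx, 1000000000 / 10    ; loop count: ten multiplies per iteration
avxMulLoop:
    vmulpd  ymm0, ymm0, ymm1        ; 256-bit packed-double multiply: ymm0 = ymm0 * ymm1
    vmulpd  ymm0, ymm0, ymm1
    vmulpd  ymm0, ymm0, ymm1
    vmulpd  ymm0, ymm0, ymm1
    vmulpd  ymm0, ymm0, ymm1
    vmulpd  ymm0, ymm0, ymm1
    vmulpd  ymm0, ymm0, ymm1
    vmulpd  ymm0, ymm0, ymm1
    vmulpd  ymm0, ymm0, ymm1
    vmulpd  ymm0, ymm0, ymm1
    dec     rcx
    jnz     avxMulLoop
    mov     rax, 1000000000         ; return the total number of multiplies executed
    ret
?fMul@@YQ_KXZ ENDP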
 
... the CPU would alternately dispatch the VMULPD instructions
to the alleged two 128 bit mul execution units. But the timings
say otherwise.
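
One thing that may be worth ruling out when reading such timings: every VMULPD in the loop above both reads and writes ymm0, so the ten multiplies form a single dependency chain, and a chain like that tends to be limited by multiply latency rather than by the number of execution units. The C++ sketch below is not the benchmark from this thread. It is only an illustration, with made-up function names, of how one dependent chain can be compared against four independent chains. It assumes the code is built without `-ffast-math`, so the compiler may not reassociate the floating-point multiplies.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// One dependency chain: each multiply needs the previous product,
// so at most one multiply is in flight at any time.
double dependent_chain(double x, double m, std::uint64_t n) {
    for (std::uint64_t i = 0; i < n; ++i) {
        x *= m;
    }
    return x;
}

// Four independent chains: the same total of n multiplies, but the
// CPU is free to overlap them across its multiply units.
double independent_chains(double x, double m, std::uint64_t n) {
    assert(n % 4 == 0);  // keep the total multiply count equal to n
    double a = x, b = x, c = x, d = x;
    for (std::uint64_t i = 0; i < n; i += 4) {
        a *= m;
        b *= m;
        c *= m;
        d *= m;
    }
    return a + b + c + d;
}

// Wall-clock seconds taken by one call to f(). The volatile sink keeps
// the compiler from discarding the computation as unused.
template <typename F>
double seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    volatile double sink = f();
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

A comparison could then look like `seconds([&] { return dependent_chain(1.0, m, n); })` against `seconds([&] { return independent_chains(1.0, m, n); })`, with `m` read at run time so that it cannot be constant-folded. If the two times differ by a large factor, the dependent version is measuring latency, and only the independent version says anything about how many multiply units there are.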