[[!meta title="Entropic hash…"]]
[[!meta author="Daniel Silverstone"]]
[[!meta date="20090220 16:03:17 +0000"]]
[[!tag tech]]
Apart from sounding like a strange ’70s progrock band, the title of
this posting actually has something to do with random numbers. As some
of you are aware, the company I work for has produced a small USB device
which we are aiming at producing random numbers from for use in
situations where a large number of IID random values are needed.
However, rather than aiming at producing the megabits per second which
some companies do for around £400 or so, our goal is to have a
device more than capable of around 16kbits per second for around
£30 or so.
My question is related to the mixing of streams of random numbers. Our
device has two hardware RNGs which produce bitstreams. I am already
using Üli M. Maurer’s “Universal Statistical Test for Random Bit
Generators” to estimate the entropy gathered by the RNGs. What I am
trying to determine is if it is reasonable and safe to use a hashing
function such as SHA256 in the following way:

Assume the two RNG streams have had their entropy estimated (and derated
slightly).

Now create a hash state and feed bytes worth of the data from each of
the two streams, summing the estimates of the entropy being fed into the
hash.

Now, once the sum of estimated entropy reaches some threshold (perhaps
1.5 times the bitsize of the hashing function) we finalise the state
and emit the hash.

Repeat from 2.
What I am trying to determine is whether or not it would be subsequently
reasonable to estimate the entropy in each emitted hash state as being
in the least bit close to the hash size of the function. E.g. assuming
that in the 256 bits coming out of the SHA256 hash, that there is,
perhaps 200 bits of entropy.
Ultimately I am trying to come up with a way to process the streams (in
a microcontroller, so nothing **too** onerous on CPU or RAM) such that
the data coming off the device can be treated confidently as having a
high number of shannons of entropy in every byte. As I stated above, my
goal would be 16kbits/second of entropy and if that came in the form of
20kbits/second of data (2500 bytes/second) where the assumption of at
least six bits of entropy per byte was sound, then I would be very very
pleased indeed.
Some of you familiar with the general area of randomness and
particularly of testing randomness might have noticed that 2500 bytes is
a rather convenient size for applying the FIPS 1401 and 1402 tests to.
My goal is to aim at as high a FIPS 140 rating as I can.
If you think you can help, please do email me.