206 sats \ 1 reply \ @BITC0IN 13 Oct 2024 \ on: Statistically significant evidence that Sha256 is somewhat predictable bitcoin
I'm not sure you can conclude much from running a classifier 420,000 times over a static set of 1,000 two-letter strings and their hashes, 800 of which are training data and 200 of which are test data. Such a limited test might be telling you more about the classifier than about the hash algorithm.
"Yeah, so I took another look, and it's doing something... weird, beyond just the odd static data. Rather than training and predicting on the hashes themselves, it trains and predicts on a 10-element boolean array recording whether the character at each position in the hex is a 0 (a 1/16 chance per position). If you actually use the hashes, the performance is (marginally) worse than random chance. So the claimed >50% performance is almost certainly an artifact of the model and/or the very specific, weird test set it's using.
To demonstrate: in each sample, there's a 52% chance of all 10 values being false, since (15/16)^10 ≈ 0.52. With only 1,000 total pieces of data, 800 of which are training and 200 of which are test data, 5% better-than-random performance is not at all significant.
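For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the flags come from the first 10 hex characters of the digest (the exact positions aren't stated here), and it reads "5% better than random" as 110 correct out of 200 test samples. `zero_flags` and `binomial_tail` are illustrative names, not taken from the original script.

```python
import hashlib
from math import comb


def zero_flags(data: bytes, width: int = 10) -> list[bool]:
    """Flag which of the first `width` hex characters of SHA-256(data) are '0'.

    Assumption: the original script looks at the leading hex characters.
    """
    digest = hashlib.sha256(data).hexdigest()
    return [ch == "0" for ch in digest[:width]]


# Chance that none of the 10 positions is '0', if hex digits are uniform.
p_all_false = (15 / 16) ** 10


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


print(f"P(all 10 flags false)            = {p_all_false:.4f}")
print(f"expected all-false in 800 / 200  = {800 * p_all_false:.0f} / {200 * p_all_false:.0f}")
print(f"P(>= 110 of 200 right by chance) = {binomial_tail(110, 200):.3f}")
```

Under those assumptions the tail probability lands above the usual 0.05 cutoff, so a single 200-sample test scoring 55% is within what plain guessing can produce.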
I also took the original test and repeated it 999 times, but rather than using the same two characters on each iteration, I used char(n) and char(n+1) as the prefixes. The performance was ~50%, i.e. random chance.
What's almost certainly happening here is that the all-false cases in the training data (roughly 420 expected among the 800 samples) are slightly skewed toward one prefix or the other, with a similar bias reflected in the test cases (roughly 105 expected among the 200). This is more than enough to explain better-than-random results without anything being wrong with SHA256.
By repeating with slightly different training/test sets, you reduce the odds of any natural coincidental bias toward the same prefix on the training and test sets, unless there is a real problem. Surprise, surprise, there isn't."
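A rough Python sketch of that repeated test, for anyone who wants to rerun it. It is not the original script: the inputs (prefix plus a counter), the single all-false feature, and the majority-vote classifier are all assumptions chosen to keep the example small, and `all_false`, `run_trial`, and `mean_accuracy` are hypothetical names.

```python
import hashlib
import random


def all_false(data: bytes, width: int = 10) -> bool:
    """True when none of the first `width` hex characters of SHA-256(data) is '0'."""
    return "0" not in hashlib.sha256(data).hexdigest()[:width]


def run_trial(prefix_a: str, prefix_b: str, rng: random.Random,
              n_samples: int = 1000, n_train: int = 800) -> float:
    """Train and score a majority-vote classifier on one pair of prefixes."""
    samples = []
    for i in range(n_samples):
        label = i % 2  # alternate prefixes so the two classes are balanced
        prefix = prefix_b if label else prefix_a
        # Assumed input format: prefix followed by a counter.
        samples.append((all_false(f"{prefix}{i}".encode("utf-8")), label))
    rng.shuffle(samples)
    train, test = samples[:n_train], samples[n_train:]

    # For each feature value, record which label was more common in training.
    votes = {True: 0, False: 0}
    for feature, label in train:
        votes[feature] += 1 if label else -1
    predict = {feature: int(total > 0) for feature, total in votes.items()}

    correct = sum(predict[feature] == label for feature, label in test)
    return correct / len(test)


def mean_accuracy(n_pairs: int = 999, seed: int = 0) -> float:
    """Average test accuracy over prefix pairs chr(n), chr(n + 1)."""
    rng = random.Random(seed)
    scores = [run_trial(chr(n), chr(n + 1), rng) for n in range(33, 33 + n_pairs)]
    return sum(scores) / len(scores)
```

Calling `mean_accuracy()` averages over many different prefix pairs, so any coincidental skew that helps on one pair should wash out across the rest; a single call to `run_trial("a", "b", random.Random(0))` shows how far one fixed 800/200 split can drift from 50% on its own.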