Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Basically, hash all your data items, keep track of the maximum number of leading 0 bits seen in a hash so far.

This is a decent proxy (but not without error) for how many distinct items you've seen so far since large numbers of leading zeros should be uncommon. Finding this means you probably saw a lot of other things to get to that point (intuitively, about N leading zeroes are expected after seeing 2^N items).

This is actually the same thing that a proof of work cryptocurrency does to control difficulty: change target number of leading zeros so miners have to do more/less work to find a match.

Of course, you could get "lucky" with a single counter and the resolution isn't great, so HLL first separates the values into buckets which are estimated separately and then combined to give a more robust estimate.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: