Hacker News
How Chrome uses a Bloom filter as a quick malicious-site check (alexyakunin.com)
74 points by derwiki on May 16, 2011 | 12 comments


>Bloom filter allows Chrome to use precise verification service practically only when the user actually goes to a malicious web site.

Not quite true. This ignores priors (http://en.wikipedia.org/wiki/Prior_probability). How many websites does an average Chrome user visit in a day? Let's pull a number out of the air -- 1000? If, as the article suggests, 1% of them are false positives, the Bloom filter will have 10 false positives per day.

How often does a user hit an actual malicious site? Once a day? Once a week? Let's say once a day. So that's 1 true positive and 10 false positives. Over 90% of positives are false! So most of the time when you need to precisely verify the maliciousness of a site, the site is safe.
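The arithmetic above can be sketched in a few lines of Python. The function name and parameters are made up for illustration, and the inputs are the guesses from this comment (1000 pages a day, a 1% false positive rate, one malicious page a day), not measured values. It also assumes every malicious page is on the blacklist, so the filter always flags it.

```python
def false_share_of_positives(pages_per_day, false_positive_rate, malicious_per_day):
    """Return (expected false positives, fraction of positives that are false).

    Hypothetical helper: all three inputs are guesses, not measurements.
    """
    benign = pages_per_day - malicious_per_day
    # A Bloom filter has no false negatives, so each blacklisted page is flagged.
    true_positives = malicious_per_day
    # Only benign pages can produce false positives.
    false_positives = benign * false_positive_rate
    share = false_positives / (true_positives + false_positives)
    return false_positives, share


fp, share = false_share_of_positives(1000, 0.01, 1)
print(f"{fp:.2f} false positives per day, {share:.1%} of positives are false")
# prints: 9.99 false positives per day, 90.9% of positives are false
```

Changing the guesses moves the result a lot, which is the point about priors: the share of false alarms depends on how rare malicious pages are, not only on the filter's error rate.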

Of course, given that you only need to do this check a handful of times per day, this seems like a valid tradeoff, but Bloom filters here are no panacea.


Your math is a bit topsy-turvy. You're ignoring the 990 sites it didn't have to check.

That's 10 sites to check out of 1000: 1% of the load of remotely checking all 1000 sites, and the wasted part (the false positives alone) is 0.9%.


I'm ignoring the 990 sites it didn't have to check because they weren't relevant to the quote I was discussing:

>Bloom filter allows Chrome to use precise verification service practically only when the user actually goes to a malicious web site.

This is saying that most of the time the Bloom filter returns a positive (and therefore Chrome needs to use precise verification), it's a true positive. That's clearly not true.


But you've missed the entire point, which is that normally the browser would have to run the slow check on the 1000 sites, now it only has to run the slow check on 10 of them. That's a huge improvement, even if 9 of them turn out to be false positives.
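A rough sketch of that two-stage lookup, for anyone who wants to see the effect in numbers. This is not Chrome's code: the `BloomFilter` class, the `count_slow_checks` helper, the example URLs, and the filter sizing are all assumptions for illustration, and an exact set lookup stands in for the slow remote verification.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: k bit positions per item, derived from salted SHA-256."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Salt the item with the hash index to get k different positions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def count_slow_checks(urls, bloom, slow_check):
    """Run slow_check only on URLs the filter flags.

    Returns (number of slow checks run, list of URLs confirmed bad).
    """
    slow_checks = 0
    confirmed = []
    for url in urls:
        if bloom.might_contain(url):
            slow_checks += 1
            if slow_check(url):
                confirmed.append(url)
    return slow_checks, confirmed


# Hypothetical data: 500 blacklisted URLs, 1000 visits, one of them malicious.
blacklist = {f"http://bad.example/{i}" for i in range(500)}
# About 10 bits per entry with 7 hashes gives roughly a 1% false positive rate.
bloom = BloomFilter(num_bits=5000, num_hashes=7)
for url in blacklist:
    bloom.add(url)

visited = [f"http://ok.example/page{i}" for i in range(999)]
visited.append("http://bad.example/42")

# The set lookup below is a stand-in for the slow, precise verification.
slow_checks, confirmed = count_slow_checks(visited, bloom, lambda u: u in blacklist)
print(f"{slow_checks} slow checks for {len(visited)} visits; confirmed: {confirmed}")
```

The exact number of slow checks depends on the hash values, but with this sizing it should land in the neighborhood of ten rather than a thousand, and the one blacklisted URL is always among them.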


I didn't emphasize that part of it, no; I was attempting to respond to only the part I quoted. I guess I should've tempered my original comment by saying that although the quote was incorrect, the Bloom filter overall is still a great help.


You're both right, and I'm pretty sure zck understands and appreciates the time savings that you're talking about. He's just pointing out that the author's statement that "only malicious sites needed to be tested" is blatantly wrong.


Except, that's not what the author said. He said "practically" that.


"Let's pull a number out of the air -- 1000"

That number seems extremely high; are you sure you didn't pull it out of something else?

I'd think something on the order of 100 distinct sites per day is reasonable for someone who's also supposed to be working, which gives one true positive and one false positive per day, which isn't very many.


But we're not talking about distinct sites -- if by "sites" you mean "domains", not "urls" -- we're talking about distinct webpages, or urls.

Even with 100 distinct webpages per day, my point is still valid -- fully half of positives are false positives. One could just as easily argue that the number of one malicious site per day is large -- it certainly is in my experience.

Also, Chrome isn't just for people at work; people use Chrome at home too. That's especially true of early adopters, who I imagine make up a greater proportion of Chrome's audience than of the population at large. Early adopters would view more pages than the average person anyway.


Anyone who's interested in Bloom filters should check out a great blog post by Adam Langley about their variants, with links to papers: http://www.imperialviolet.org/2011/04/29/filters.html


context: Adam Langley is a Chrome developer.


Here's a link to the relevant part of the Chromium source tree: http://src.chromium.org/viewvc/chrome/trunk/src/chrome/brows...



