How Chrome uses a Bloom filter as a quick malicious-site check

zck · on May 16, 2011

>Bloom filter allows Chrome to use precise verification service practically only when the user actually goes to a malicious web site.
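For readers new to the mechanism being quoted, here is a minimal sketch of a two-stage check: a local Bloom filter answers first, and only filter hits go on to a slower precise lookup. This is not Chrome's actual code. The class, `is_malicious`, and `remote_check` are hypothetical names, and storing full URL strings (rather than whatever Chrome really hashes) is a simplifying assumption. Sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sized for n items at false-positive rate p (illustrative)."""

    def __init__(self, n: int, p: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by double hashing with two slices of a SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def is_malicious(url, local_filter, remote_check):
    # Fast path: a Bloom filter has no false negatives, so a local "no" is final.
    if not local_filter.might_contain(url):
        return False
    # Slow path: only filter hits (true or false positives) pay for the precise lookup.
    return remote_check(url)
```

The design choice the thread argues about is visible in `is_malicious`: every malicious URL reaches `remote_check`, but so does any benign URL that happens to collide in the filter.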

Not quite true. This ignores priors (http://en.wikipedia.org/wiki/Prior_probability). How many websites does an average Chrome user visit in a day? Let's pull a number out of the air -- 1000? If, as the article suggests, 1% of them are false positives, the Bloom filter will produce 10 false positives per day.

How often does a user hit an actual malicious site? Once a day? Once a week? Let's say once a day. So that's 1 true positive and 10 false positives. Over 90% of positives are false! So most of the time when you need to precisely verify whether a site is malicious, the site is safe.
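The base-rate arithmetic above, as a small sketch using the comment's own assumptions (1000 pages a day, a 1% false-positive rate, one malicious page a day). The function name is hypothetical, and it applies the false-positive rate only to benign pages, which changes the numbers negligibly:

```python
def positive_breakdown(pages_per_day, fp_rate, malicious_per_day):
    """Expected Bloom-filter positives per day, split into true and false."""
    benign = pages_per_day - malicious_per_day
    false_pos = benign * fp_rate       # benign pages that collide in the filter
    true_pos = malicious_per_day       # no false negatives, so every bad page is flagged
    share_false = false_pos / (false_pos + true_pos)
    return true_pos, false_pos, share_false


tp, fp, share = positive_breakdown(1000, 0.01, 1)
print(f"{tp} true, {fp:.1f} false, {share:.1%} of positives false")
# prints "1 true, 10.0 false, 90.9% of positives false"
```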

Of course, given that you only need to do this check a handful of times per day, this seems like a valid tradeoff, but Bloom filters here are no panacea.

pudquick · on May 16, 2011

Your math is a bit topsy-turvy. You're ignoring the 990 sites it didn't have to check.

That's 10 sites to check out of 1000: 1% of the load of checking all 1000 sites remotely, and only 0.9% of it wasted if you count just the false positives.
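The load argument in numbers, using zck's figures (1000 pages a day, 1% false positives, one malicious page); the function name is hypothetical. This sketch counts the one true positive as well as the false ones, so it lands near 11 remote lookups rather than 10, which is still about 1% of the unfiltered load:

```python
def remote_lookups(pages, fp_rate, malicious):
    """Expected slow-path lookups per day without and with a Bloom filter."""
    without_filter = pages                          # every page goes to the remote check
    false_hits = (pages - malicious) * fp_rate      # benign pages that collide in the filter
    with_filter = malicious + false_hits            # only filter hits go to the remote check
    return without_filter, with_filter, false_hits


without_filter, with_filter, false_hits = remote_lookups(1000, 0.01, 1)
print(f"{with_filter:.0f} of {without_filter} lookups ({with_filter / without_filter:.1%})")
# prints "11 of 1000 lookups (1.1%)"
```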

zck · on May 16, 2011

I'm ignoring the 990 sites it didn't have to check because it wasn't relevant to the quote I was discussing:

>Bloom filter allows Chrome to use precise verification service practically only when the user actually goes to a malicious web site.

This is saying that most of the time the Bloom filter returns a positive (and therefore Chrome needs to use precise verification), it's a true positive. That's clearly not true.

swores · on May 16, 2011

But you've missed the entire point, which is that normally the browser would have to run the slow check on the 1000 sites, now it only has to run the slow check on 10 of them. That's a huge improvement, even if 9 of them turn out to be false positives.

zck · on May 17, 2011

I didn't emphasize that part of it, no; I was attempting to respond to only the part I quoted. I guess I should've tempered my original comment by saying that although the quote was incorrect, the Bloom filter overall is still a great help.

vecter · on May 17, 2011

You're both right, and I'm pretty sure zck understands and appreciates the time savings you're talking about. He's just pointing out that the author's statement that "only malicious sites needed to be tested" is blatantly wrong.

boucher · on May 17, 2011

Except that's not what the author said. He said "practically" that.

edoloughlin · on May 16, 2011

"Let's pull a number out of the air -- 1000"

That number seems extremely high; are you sure you didn't pull it out of something else?

I'd think something on the order of 100 distinct sites per day is reasonable for someone who's also supposed to be working, which gives one true positive and one false positive. That isn't very much per day.

zck · on May 16, 2011

But we're not talking about distinct sites -- if by "sites" you mean "domains" rather than "URLs" -- we're talking about distinct webpages, or URLs.

Even with 100 distinct webpages per day, my point is still valid: fully half of the positives are false positives. One could just as easily argue that one malicious site per day is a high estimate; it certainly is in my experience.
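The same point via Bayes' rule: the probability that a flagged page is really malicious depends heavily on how many pages are browsed. This sketch assumes the thread's 1% false-positive rate and one malicious page per day; the function name is hypothetical.

```python
def precision(pages_per_day, fp_rate=0.01, malicious_per_day=1.0):
    """P(page is malicious | Bloom filter says positive), by Bayes' rule."""
    prior = malicious_per_day / pages_per_day       # base rate of malicious pages
    p_pos = prior * 1.0 + (1 - prior) * fp_rate     # filter never misses a bad page
    return prior / p_pos


for pages in (100, 1000):
    print(f"{pages} pages/day: {precision(pages):.1%} of positives are truly malicious")
# prints 50.3% for 100 pages/day and 9.1% for 1000 pages/day
```

Fewer malicious pages (say one a week) lower the prior further, so the share of false positives only grows.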

Also, Chrome isn't just for people at work; people use Chrome at home too. That's especially true of early adopters, who I imagine make up a greater share of Chrome's audience than of the population at large. Early adopters would view more pages than the average person anyway.

vilda · on May 16, 2011

Anyone who's interested in Bloom filters should check out a great blog post by Adam Langley about their variants, with links to papers: http://www.imperialviolet.org/2011/04/29/filters.html

paulirish · on May 17, 2011

context: Adam Langley is a Chrome developer.

bdb · on May 17, 2011

Here's a link to the relevant part of the Chromium source tree: http://src.chromium.org/viewvc/chrome/trunk/src/chrome/brows...