In this week’s #SkepticalTuesday post, I discuss the latest changes that may be coming to Google’s search ranking algorithms: a litmus test for factuality.
Sometimes the truth is unpopular. That’s the problem currently plaguing the search engine Google, which ranks sites largely according to how many other parts of the web link back to them. Since the total readership of this blog is roughly 300 people in a given month, for instance, there’s not a lot you can do to make it show up on the first couple of pages of Google search results. (Yes, I know I could use SEO. Thank you, myriad bot commenters.) This method of ranking results isn’t actually a bad thing much of the time: if all voices are equal, the crowd very often gets things right. It’s an idea called “the wisdom of the crowd,” and for things like multiple-choice questions, anyway, it tends to hold up.
But when it comes to more complicated questions, or when the crowd’s responses aren’t equal to one another, the system fails.
The logic behind the wisdom of the crowd is that in any given group, some people will know what the answer is, others will know what the answer isn’t, and still others won’t know at all. The ones who don’t know at all will balance each other out, but the other two groups will give a response that elevates the correct answer above the others.
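If you want to see that logic play out, here’s a minimal sketch in Python. The proportions (10% who know the answer, 20% who can rule out one wrong answer, 70% guessing blindly) and the function name `crowd_vote` are my own assumptions for illustration, not figures from any study.

```python
import random
from collections import Counter


def crowd_vote(options, correct, n_voters, p_know=0.10, p_eliminate=0.20, seed=0):
    """Simulate an equal-voice crowd answering a multiple-choice question.

    All proportions here are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wrong = [o for o in options if o != correct]
    votes = Counter()
    for _ in range(n_voters):
        r = rng.random()
        if r < p_know:
            # This voter knows the answer.
            choice = correct
        elif r < p_know + p_eliminate:
            # This voter knows one thing the answer isn't, then guesses.
            ruled_out = rng.choice(wrong)
            choice = rng.choice([o for o in options if o != ruled_out])
        else:
            # This voter has no idea and guesses uniformly.
            choice = rng.choice(options)
        votes[choice] += 1
    return votes


votes = crowd_vote(["A", "B", "C", "D"], correct="B", n_voters=1000)
winner = votes.most_common(1)[0][0]
print(winner, dict(votes))
```

The blind guessers spread roughly evenly across all four options, so even a small informed minority is enough to push the correct answer to the top.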
But what if some voices speak louder?
That’s the problem Google’s having: thanks to its popularity-based rankings, the people who are either wrong or who are outright lying get a serious boost. Climate change deniers like Senator Inhofe and anti-vaccine spokespeople like Jenny McCarthy have louder voices, because they’re already public figures. And by god, do we love celebrities. Another part of it is that even well-meaning news sources like to provide “balance” on every issue, giving fringe views a lot more publicity than they might generally deserve. Our human propensities tend to skew the results.
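To make the “louder voices” problem concrete, here’s a toy tally. The numbers are invented: 70 ordinary voices say “true,” 30 say “false,” and in the second scenario one of those 30 gets a hypothetical megaphone worth 50 ordinary voices.

```python
from collections import defaultdict


def weighted_tally(votes):
    """Sum the weight behind each answer and return (winner, totals)."""
    totals = defaultdict(float)
    for answer, weight in votes:
        totals[answer] += weight
    return max(totals, key=totals.get), dict(totals)


# Everyone counts once: 70 voices for "true", 30 for "false".
equal_voices = [("true", 1)] * 70 + [("false", 1)] * 30

# Same crowd, but one "false" voice is amplified 50x (an invented weight).
one_loud_voice = [("true", 1)] * 70 + [("false", 1)] * 29 + [("false", 50)]

equal_winner, equal_totals = weighted_tally(equal_voices)
loud_winner, loud_totals = weighted_tally(one_loud_voice)
print(equal_winner, equal_totals)
print(loud_winner, loud_totals)
```

Nobody changed their mind between the two scenarios. The only difference is how much one voice counts, and that alone flips the result.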
So it’s interesting for me to see this study by a bunch of Google engineers, published this week on arXiv.org, in which they suggest a method that might be used to correct this: determine the trustworthiness of a site based on its propensity to be factually correct.
The system they’ve devised produces a value called KBT, or Knowledge-Based Trust. The process would be to extract facts from a given page (the example given, I’m not lying, is “Barack Obama’s nationality”), determine their accuracy, and then use the page’s track record of accurate statements to determine the general trustworthiness of the site. But how does it know which statements are facts?
“Inference is an iterative process, since we believe a source is accurate if its facts are correct, and we believe the facts are correct if they are extracted from an accurate source. We leverage the redundancy of information on the web to break the symmetry. Furthermore, we show how to initialize our estimate of the accuracy of sources based on authoritative information, in order to ensure that this iterative process converges to a good solution.”
Basically, they can start the ball rolling by feeding in a few hundred thousand statements (“triples,” they call them, like “Barack Obama, nationality, USA”) that they’ve manually verified as true — for example data pulled from the Knowledge Vault — and then use those to verify trustworthy sources, and use those trustworthy sources to verify other trustworthy sources, and so on. They’ve already human-tested a large subset of the algorithm’s responses, and found them to be pretty accurate so far.
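Here’s a heavily simplified sketch of that bootstrapping loop. To be clear, this is not the paper’s actual model, which is a more sophisticated probabilistic one. The site names, the single seed fact, the 0.5 starting score for unchecked sites, and the update rule are all assumptions I’ve made up to show the general shape of the idea.

```python
from collections import defaultdict

OBAMA = ("Barack Obama", "nationality")
CAPITAL = ("Australia", "capital")

# Hypothetical sites and the values they assert for each (subject, predicate).
claims = {
    "site_a": {OBAMA: "USA", CAPITAL: "Canberra"},
    "site_b": {OBAMA: "USA", CAPITAL: "Canberra"},
    "site_c": {OBAMA: "Kenya", CAPITAL: "Sydney"},
    "site_d": {CAPITAL: "Canberra"},  # says nothing about the seed fact
}
seed_facts = {OBAMA: "USA"}  # the "already verified" starting point


def estimate_trust(claims, seed_facts, n_iters=10, prior=0.5):
    # Step 1: score each site against the seed facts it happens to mention.
    accuracy = {}
    for site, facts in claims.items():
        checked = [facts[k] == v for k, v in seed_facts.items() if k in facts]
        accuracy[site] = sum(checked) / len(checked) if checked else prior

    belief = {}
    for _ in range(n_iters):
        # Step 2: each value is believed in proportion to the accuracy
        # of the sites asserting it (seed facts stay fixed).
        mass = defaultdict(lambda: defaultdict(float))
        for site, facts in claims.items():
            for item, value in facts.items():
                mass[item][value] += accuracy[site]
        belief = {}
        for item, values in mass.items():
            if item in seed_facts:
                belief[item] = {v: float(v == seed_facts[item]) for v in values}
            else:
                total = sum(values.values())
                belief[item] = {
                    v: (m / total if total else 1.0 / len(values))
                    for v, m in values.items()
                }
        # Step 3: a site is as accurate as the average belief in its claims.
        accuracy = {
            site: (sum(belief[i][v] for i, v in facts.items()) / len(facts)
                   if facts else prior)
            for site, facts in claims.items()
        }
    return accuracy, belief


accuracy, belief = estimate_trust(claims, seed_facts)
print(accuracy)
print(belief[CAPITAL])
```

Notice that `site_d` never mentions the seed fact at all. It earns its score only because it agrees with sites that were already found to be accurate, which is the “trustworthy sources verify other sources” step in miniature.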
So will this help with, say, the crazy that floats to the top every time you search “vaccines”?
The answer: maybe. But it won’t be a cure-all.
And it probably shouldn’t be.
While I have about as little sympathy for the upset Fox “news” pundits who make a living off denying science to the ignorant as I have for things like staph infections and the bacteria that cause stomach ulcers, I have to admit that Google having the power to change what you see is a little unnerving. It should be: after all, skeptics haven’t been treated well throughout history by the gatekeepers of knowledge.
But it’s not like we aren’t already subject to the same gatekeepers. Right now you’re probably seeing this article because Facebook decided you’d be interested in it. Or maybe you saw it on Reddit. But those are curated ways of getting your information, too.
The question is, am I more worried that Google will deliberately “downvote” things that are scientifically accurate because they’re unpopular, or that it’ll do so because it thinks they’re genuinely incorrect? Because what we have right now is the former, and it’s bad enough that I’m willing to take a chance on the latter.
Besides, as the paper notes: “source trustworthiness provides an additional signal for evaluating the quality of a website” (emphasis mine). Even under the new system, if enough people want something, it’ll still be in Google’s interest to serve it to them.
So let’s all stumble a little closer to the truth, shall we?