Nathan Robinson has a great counter to the kind of attitude prevalent among “data journalists.” (Although I think the argument applies to much of the “data science” attitude as well.)
The problem with any exercise involving “data” is that data is rarely, if ever, self-explanatory. Data needs to be placed in context, or in terms of “theory.” The data implies X, Y, or Z because we have linked the data points using a chain of logic, which requires assumptions, which, in turn, are ultimately predicated on “faith.” (For example, all of physics is predicated on “faith” that the universe as we know it obeys laws of mathematical logic, which, in a sense, is man-made, or, at least, man-justified.) Without explaining the assumptions, the premises, and the theory, the “explanation” is meaningless. Often, these theories and underlying assumptions themselves constitute the “ideology.” The assumption that people are greedy and self-serving undergirds most of the “economic theory” that students are taught in intro classes, for example, and making naive economics students unlearn the “untrue but convenient” assumptions that they were fed when they were too impressionable is a difficult chore.
In a sense, this is the conceit of Vox (and too much “data science”) that Robinson is getting at: by insisting that they are theory-free and totally “factual,” they become blind to the “theory” that they are placing the data into, and to the kinds of assumptions that lead to their conclusions. They are blind to the possibility that the same facts, the same data, can look different from an alternate perspective.
This is where the way “data science” views science, in a manner opposite to that of real science, becomes pertinent. During the much-ballyhooed debate with the creationists, Bill Nye “the Science Guy” did an astonishingly valuable service that is lost on many, by pointing out that he does not “believe in” evolution. He laid out the specific kinds of evidence that, if demonstrably true, would make him a creationist. His opponent, the creationist Ken Ham, insisted that his faith in creationism would withstand any evidence. Nye’s stand is the real science: science rests on conditional acceptance of theories, not on faith that the “scientific” theories of today are true. You accept the theories as long as they are consistent with the data, but you are willing to discard the theories as soon as there is contrary data that is indisputable. (At least, that is the principle.) The best data in real science is the data that shows the boundaries of the existing theories: we are pretty sure that, within these bounds, our theories are a pretty reliable guide to what will happen, but, outside these bounds, we are not sure. By discovering this contrary data, and refining the theories so that they are consistent with all the data we have, we progress in science.
This is not a very strong argumentative stance. You are not arguing that your position is true. Heck, you don’t even believe it yourself. But, if you are being a good scientist, you should also be able to point to the kind of evidence that would undermine the other side. So, what do you believe in? The answer to that, of course, is that you don’t “believe” in anything: anything you believe is not “science.” This, of course, means that you are not really “explaining” the “truth.” The “science,” the honest variety, would explain: 1) this is what we assume; 2) this is the evidence we have; 3) these are the assumptions that are borne out by the data; 4) and most importantly, if we see X, we need new theories. Often this is why starting with seemingly bad answers/explanations and reasoning one’s way through the data is the more scientific thing to do, as in this wonderful article in the Atlantic about why “flat earth” theorists make for good science education. The point is not to ridicule them as being wrong, but to use their premises to construct theories that can be matched up against the data, update the theory, rinse, and repeat. Unfortunately, this does not allow for “mansplanation,” or any sense of superiority for the “right” side, because, in the end, in the eyes of the “Truth,” all theories are oversimplifications that are “wrong” in some fashion.