The blitzkrieg approach to warfare, especially as practiced by mavericks like Rommel in the empty deserts of North Africa, came as a nasty shock to the British generals (and the British Army in general), who were accustomed to, trained for, and organized around more static set piece battles. To the first set of British generals, the columns of German armor whirling about the battlefield were a nuisance rather than the substance of the problem: for all its reputation, the bulk of Rommel’s forces, especially the Italian troops who made up a majority of his command, were traditional infantry. They took the view that the German tanks had to be nailed down so that they could move on to the real problem, namely setting up the set piece battle at which they excelled and defeating the Axis army. This did not work. Even if the armored forces made up a small fraction of Rommel’s command, they were the key to his approach to war. The panzers whirling about the battlefield were, from the German perspective, the substance of the war; the more numerous traditional infantry were simply the supporting element. Because the British generals were not fully engaging the panzers, in anticipation of the “real battle” in the traditional mold, Rommel could maintain the initiative and run around the main British forces, which were unable to react to the German thrusts because they were waiting for a “real battle” that was never going to take place. (I borrow this language from B. H. Liddell Hart’s accounts of the battles in North Africa.)
The next set of British generals realized this, and they felt that the British had to match the German approach: in other words, send their own armor (or should I say “armour”?) around the battlefield chasing after the Germans. This, unfortunately, did not work either. The British were not trained, equipped, or organized as well as the Germans for blitzkrieg, and they were, moreover, going against Rommel, who was especially ingenious in the ways of mobile warfare. The columns of British tanks trying to whirl around the battlefield were frequently caught flat-footed by the Germans, ran into ambushes, and otherwise got knocked around silly. If the British tried to mimic the Germans at something the Germans were much better at, they were going to have their collective ass handed to them, and that happened frequently.
The last set of British generals finally came to grips with the full depth of the reality: Rommel was not going to give in to a classic set piece battle as long as he could run around the British with his mobile forces, but the British could not match Rommel if they tried to engage in mobile warfare of their own. (The credit for this usually goes to Montgomery, but Auchinleck, the top general among the second set, was in fact coming around to this realization when he was replaced. It was Auchinleck who commanded the British during the First Battle of El Alamein, when Rommel’s advance was initially halted.) The consequence of this realization was the Battle (or rather, three separate battles) of El Alamein, all conducted essentially as classic set piece battles (especially so in the case of the last one) on terrain that was not hospitable to the kind of whirling mobile warfare favored by the Germans. The result is history, as the saying all too commonly goes. But the choice of this mode of battle involved another tradeoff: it was feasible at El Alamein, but NOT elsewhere between Libya and Egypt. More important, if the stand at El Alamein had failed, there would have been nothing to prevent Rommel, with his superior mobile tactics, from taking Alexandria and Cairo.
There is an interesting insight here about the different aptitudes that people bring to a project and the design of that project, especially the difference between the “science” and the “data” mentality. In the former, theory and hypotheses always precede the data. Some sense of what goes with what, what moves when, and how, needs to be in place, and from that, an idea of what to expect from the data needs to be formed. Only then does it make sense to start looking for data (in a manner that fits the hypotheses). But the project itself and the data availability do not always cooperate, much less so now than before. Experts are brought in to deal with stockpiles of data already assembled (or, at least, an architecture for data collection) and problems already identified. Or, put differently, the Afrika Korps is already whirling about the battlefield. All talk of forming proper theories and proceeding systematically makes as much sense as demanding that the German panzers be nailed down so that the British can get on with the real battle.
But can data analyses without much, or, indeed, any theorizing work? A lot of so-called data science revolves around so-called unsupervised learning, or pattern recognition without explicit theorizing. Many allegedly “supervised” learning models are also employed in a rather atheoretical fashion: variables are thrown in because they are there, without much nuance or careful thinking. These practices are, in turn, encouraged or even dictated by the need to offer simple, easily visualizable “explanations” for nontechnical audiences, but that comes at the cost of overly facile and often prejudiced (knowingly or not) conclusions of the kind that Vox seems to fall into. We just saw the bubble burst on this way of thinking, in the form of the simultaneously over- and under-analyzed election of 2016. Thousands of wonks were poring over the same data and coming to the same conclusion: that the data said Trump would lose. Those who thought the data looked a bit fishy hedged their bets, but they lacked a clear enough sense of what was wrong with the data to stick their necks out. And it turns out that the data was a bit fishy indeed. I should feel elated, in a sense, for expecting, if only halfheartedly, exactly what happened: that Trump would lose the popular vote by a couple of percent but win the Electoral College with thin majorities across the Midwest. But this was a result predicted by theory combined with hunch, with only minimal corroboration from the data; and in a universe where the only lingo that is respected, indeed understood, is one laced with data, it was not something one could explain without a lot of hesitation.
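The “variables thrown in because they are there” failure mode is easy to demonstrate. A minimal sketch (entirely made-up data, not any real election dataset): regress an outcome on one genuine signal plus forty pure-noise columns. With more predictors than observations, the model fits the training data essentially perfectly, and the in-sample fit says nothing about how it performs on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)

# 60 observations; the outcome is driven by a single real signal plus noise.
n = 60
signal = rng.normal(size=n)
y = 2.0 * signal + rng.normal(size=n)

# "Throw in variables because they are there": 40 pure-noise predictors
# alongside the one real one, for 41 columns in total.
X = np.column_stack([signal] + [rng.normal(size=n) for _ in range(40)])

train, test = slice(0, 30), slice(30, 60)

# Ordinary least squares on the kitchen-sink design matrix. With 41
# columns and only 30 training rows, it can interpolate the training set.
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual/total variance."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

r2_train = r2(y[train], X[train] @ beta)  # essentially perfect in-sample
r2_test = r2(y[test], X[test] @ beta)     # markedly worse out of sample
print(r2_train, r2_test)
```

The atheoretical model looks superb by its own in-sample metric while the coefficients on the noise columns wreck its out-of-sample predictions; a theory that told you which single column mattered would have done far better with far less.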
This is where we come to the El Alamein analogy. Social data, in particular, is lousy: inherently noisy, complex, and unpredictable. I don’t believe the methodology for analyzing it will ever become completely and universally reliable. Like the British tankers in North Africa, we will never catch up, in mobile warfare terms, to the Afrika Korps of reality. Some battles may be won, but on average, reality will beat us where it counts. We may be able to get more than half right, if you want to be technical, but we will be wrong, and badly too, often enough that we should not rely on atheoretical data analytics too much, unconditionally.
But overreliance on theorizing and “scientificalizing” how we deal with social science problems is also problematic. In most instances, we don’t have the wherewithal to engage in careful theorizing about the problem. One of the peculiar experiences I’ve had with the data science mentality came when working with some data science types who were incredulous that I hesitated to sic some preexisting analytical tool on the very big dataset they had, and instead kept asking about the moving parts in an attempt to wrap the problem in a reasonable theory. From their perspective, getting actionable results was more important than having a clear sense of what’s what. It is worth remembering that Google Translate succeeded where previous attempts at computerized translation had failed because it abandoned the attempt to subordinate translation to a theory of language. As it were, languages are too complex a thing to theorize about, while practical translation, in most contexts, follows simple enough patterns. Coming up with a complete theory of language around which practical translation could be built would make as much sense as waiting for the Afrika Korps to be knocked out so that the British could conduct a set piece battle to defeat the Axis forces.
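The contrast between theory-driven and pattern-driven translation can be caricatured in a few lines. This is a toy sketch, not Google Translate’s actual machinery (and the tiny phrase table here is hypothetical; real statistical systems learned millions of such pairs from parallel corpora): it translates by greedy longest-match lookup of memorized phrase patterns, with no grammar or theory of language anywhere in sight.

```python
# Hypothetical memorized phrase pairs, standing in for a learned phrase table.
PHRASES = {
    ("je", "ne", "sais", "pas"): ["i", "do", "not", "know"],
    ("je", "sais"): ["i", "know"],
    ("merci", "beaucoup"): ["thank", "you", "very", "much"],
    ("bonjour",): ["hello"],
}

def translate(sentence: str) -> str:
    """Greedy longest-match lookup: no parsing, no grammar, just patterns."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        # Try the longest remaining span first, shrinking until a match.
        for j in range(len(words), i, -1):
            chunk = tuple(words[i:j])
            if chunk in PHRASES:
                out.extend(PHRASES[chunk])
                i = j
                break
        else:
            out.append(words[i])  # pass unknown words through untouched
            i += 1
    return " ".join(out)

print(translate("bonjour je ne sais pas"))  # → hello i do not know
```

For routine, formulaic text this kind of pattern matching goes surprisingly far, which is exactly the essay’s point; what it cannot do is anything requiring the nuance a theory of language would capture.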
The Montgomery approach, of relying on “theorizing” to deal with problems when possible, might itself be a false choice in a sense, because, in most cases, this is not an option. El Alamein made for a good battlefield because the terrain negated the advantages that the Afrika Korps enjoyed in mobile warfare. The same battle could not have been conducted in most other settings: in order to fight at El Alamein, General Auchinleck abandoned Tobruk and Mersa Matruh and took the risk that, if his gamble failed, there was no way to defend Alexandria and Cairo beyond that point. Throughout World War II and beyond, politically and psychologically valuable targets that could not be easily defended were heavily reinforced anyway, only to become disasters when they were lost: the British losses of Hong Kong and Singapore and the German losses at Stalingrad and Tunis, among many other examples. A useful theory of language informing a computerized translation algorithm might translate Evgenii Onegin better than Google Translate. But nobody cares about poetry, right? I am being serious on this point: the value added, in economic terms, from being able to reliably translate treaties, contracts, newspaper articles, and technical papers, all written for ease and precision in conveying specific content, is enormous, while that of poetry and literature, whose value depends on nuance, ambiguity, and other complexities, is not. The old saying that the pun is the lowest form of humor is right: the value of translating puns is tiny while they are absurdly difficult to translate properly. Thus, puns cannot be very valuable: data says so! What this means is that just because the problem is important does not mean that there will be good solutions, regardless of methodology. You can win by data if you are fighting only the Italians and not the Afrika Korps (or translating only easy pieces rather than complex and subtle ones).
You can win by good theorizing if you are fighting at El Alamein, but not at Tobruk (or, translating Pushkin). If you are trying to defend Singapore, you might need something besides theorizing or data. If it is Hong Kong, then you might be a fool for even thinking it possible, even with a heavily reinforced garrison. (Just before World War II, the Hong Kong garrison was reinforced to more than a full division, despite the obvious difficulties in defense, and when the war came, Churchill ordered them to fight to the last “for the honor of the British Empire,” in light of the obvious symbolic value of the colony. While the garrison did fight on much longer than expected, they didn’t quite fight to the end.) That we think a problem is important does not mean that we are entitled to an answer.
The power of science or data to find good answers is limited, and even more so in the social sciences. We need to be cognizant of this, hedge our answers accordingly, and draw a clear dividing line between what we can do and what we can’t, with a reasonable account of why. Only snake oil salesmen sell unconditional cure-alls, and that is only because the only unconditional cure-all is snake oil. If “science” tries to be an unconditional cure-all, it too will be snake oil. Remember that all “science” is just theory, and that all theories are only conditionally true. Indeed, it is this conditionality that makes “science” science: we know what we know and, more important, we know what we don’t, and we are willing to say so.