By Camber
Whether data is or is not “the plural of anecdote” is all in the analysis. Numbers alone are not a narrative. Or worse, as the other data aphorism goes, if you torture the data long enough, it will tell you whatever you want to hear. But leaving the data alone with some machine learning algorithm doesn’t remove human error; it just lets the machine run off with its own biases. In many cases (traffic planning, doing math) this doesn’t matter or is easily managed. But when humans are the data points – and when lives depend on the resultant analysis – it matters a great deal.
The problem is mindset. Julian Jaynes once worried that science would become mere technology, “limping along on economic necessity.” To Jaynes, limping had nothing to do with speed or activity, but with motivation and vision. The pace and desires of the “market,” he feared, would set the pace and course of innovation. The result would be an environment that is usually frenetic, rarely revolutionary, and never unpopular.
This limping describes much of the mindset in tech today, East Coast and West. Speed and scale are the watchwords, automation is the process, and humans are in the way. Build fast, fail fast. Scientific expertise – in, say, epidemiology – can, the thinking goes, be engineered around with enough data and the right algorithm. Recently, we’ve seen this in the countless disease surveillance dashboards and contact tracing schemes that pair excellent engineering with no scientific input: extremely fast to market, extremely cool UX, extremely devoid of epidemiologically informed metrics. It’s the techno-solutionism and blitzscaling that make for billion-dollar companies, but perhaps not for good public health policy.
In contact tracing, the problem is straightforward. Far too many of the factors inherent to transmission are beyond the ability of an app or Bluetooth protocol to capture. This is why Singapore and San Francisco are combining technology with thousands of human contact tracers. It’s also why an automated system that assigns color codes – green for unexposed, yellow for possibly exposed, red for likely infected – runs a dangerous risk of gross civil liberties violations. To make such a system work, there would have to be legal penalties for noncompliance. But imagine being assigned red – and thus confined at home – without any understanding of how the machine came to that conclusion.
Less discussed, though equally important, is disease surveillance, which uses location data as a base layer for evaluating other clinical data. The analysis here concerns areas and groups, not 3-foot radii and individuals, and the goal is modeling that helps state and local health officials make decisions about social distancing measures and resource allocation.
Both require extensive human involvement and subject matter expertise to run. They also require strong frameworks to protect privacy. The best solution – as with the three branches of government – is a partnership that combines private infrastructure, academic expertise, and public oversight and implementation. Done right – which is to say, done hand in hand with epidemiologists, scientists, privacy advocates, policy and health experts, public leaders, and technologists – the result can provide essential insight into the efficacy of policy interventions and an enduring infrastructure for managing COVID-19 as we slowly restart economic and civil life. Done wrong, it can be useless at best and dangerous at worst. Given the long road ahead, it’s worth taking the time to build it right.