By Ian Allen
The comparison of our current emergency to 9/11 is imperfect, but instructive. Back then, I was in the Marines, and later served seven years at the CIA. Now I run a data analytics company. Looking back, it’s hard not to recall the dangers of making policy in a panic; the enormous potential for action when a nation focuses its mind; the enormous potential for catastrophe when a nation gets blinded in its focus. Looking ahead, it’s hard not to fear the economic damage, social strain, and loss of life. In the present, it feels again like we’re building the airplane while flying it through an in-flight emergency. Which is to say: decisions we make now to deal with what’s immediately in front of us could prove regrettable later.
Let’s think through it. First, as in any emergency, one must understand what’s most important right now. To get there, it’s helpful to work backwards: Long term we’ll have a vaccine and some level of herd immunity. Medium term we’ll have more effective testing and treatment protocols. Right now, we need to stop the spread and flatten the curve to buy time.
Public health officials tell us that social distancing is the best tool we have to do this. The question is how to provide those on the front lines with the insights they need to evaluate the efficacy of these interventions and manage limited resources.
One powerful answer is data. The analysis of refined and anonymized movement data – informed by epidemiologists and other relevant experts – would help policymakers minimize economic and social damage while maximizing efforts to contain the disease, assess likely flare-ups and cooldowns, and inform resource allocation.
However, there are questions. Take feasibility. To provide insights at scale, data must have sufficient density across the socioeconomic spectrum. For example, it was reported recently that smart thermometers had indicated decreasing atypical fevers across the country. However, given the limited penetration of smart thermometers (largely wealthier, more deliberately health-conscious households), these devices do not sufficiently represent the general population to provide statistically significant and inclusive insights. If policy decisions were made based on this data, the most vulnerable – those who are income or housing insecure, for example – would be invisible in the analysis that informs decisions about the resources they are the most likely to need.
Location data from smartphone apps, while not perfect, is the most inclusive data set available. Which raises the second question: privacy.
A comprehensive legal framework to govern how this data is used is long overdue. In its absence, as we use this data to help fight COVID-19 right now, it is incumbent on companies to build the technical and policy frameworks to manage what data is collected and how it is used, how they exclude data irrelevant to the given purpose, who has access to the raw data, how it is secured, how long it is held, how it will be prevented from being used to deny services of any kind, how de-identified data will be prevented from re-identification, how to protect marginalized groups, and so on.
Further, companies must work hand in glove with epidemiologists to ensure that we’re building to the right questions. Call this the expertise layer. Without this input from operational experts, any insights gleaned from location data analysis would be like relying solely on smart thermometer data: dangerously misleading.
Here, the oft-used analogy of data to oil or diamonds is useful: In raw form, these resources have limited utility and there are significant ethical and societal issues in how they are mined, refined, stored, and used. Just as an engine does not run on oil pumped straight from the ground, neither does an algorithm run on unrefined raw data. And careless exploitation can negatively impact the quality of life for millions of people.
But as dangerous as careless exploitation of data could be, not all location data and analytical processes are the same. Understanding disease surveillance and the impact of policy interventions requires analysis of areas and trends, not individuals. This requires specialized movement metrics – such as distances traveled, frequency of travel, the volume of activity in commercial areas, etc. It does not require exact knowledge of the path on the ground.*
As such, use of this data carries low risk for tremendous gain. The social and economic impact will be more severe for each day longer this disease spreads uncontrollably, and the cascading effects will be felt not only in our country, but around the world. We have a duty to use the tools we have at our disposal responsibly, ethically, and with care for everyone in society. The results of this work, through all we face ahead, should be not only a healthier world, but a more resilient and equitable one. Our nation’s mind is focused. Let us not be afraid to act, and let us not be blinded as we do.
* Contact tracing is a very different problem, and beyond the scope here. There are also questions about how impactful this solution would be at this point in the pandemic, and how technologically feasible it is to establish an effective contact tracing solution at scale in the short to medium term.