Last Updated: 2020-04-29 12:40:49 UTC
by Johannes Ullrich (Version: 1)
In recent weeks, you probably heard a lot about the "Covid19 Tracing Apps" that Google, Apple, and others. These news reports usually mention the privacy aspects of such an app, but of course, don't cover the protocols in sufficient depth to address how the privacy challenges are being solved.
The essential function of such an application is to alert you if you recently came in contact with a Covid19 infected person. It is the goal of the application to alert users who are asymptomatic so they can get tested and self-isolate to prevent spreading the virus.
There are a few problems with this generic definition of the function of such an application:
- What does "recent" mean: Usually this refers to 14 days, which is the time most reports suggest as the incubation time.
- "contact" is usually defined as being within about 2 meters (6feet) of an infected individual. Some applications also require that the contact lasted longer than a few seconds.
- The infected person may only realize that they are infected until they are tested, and they learn of the result. The data needs to be stored until that determination is made (again, for about 14 days).
But the application does not need to know where you are or where you have been. Geolocation is not required to fulfill this function.
A key privacy feature implemented by these applications is that the application broadcasts a random, rotating identifier. Some of the protocols send a new identifier with each "ping"; others rotate it after a given time (minutes). This ID rotation prevents the most obvious threat of tracking a user using a unique identifier (as it has been done with MAC addresses). One of the critical parameters of these protocols is how often the identifier rotates. Some protocols suggest rotating them with each ping. Others keep the same ID for minutes (or even a day, which is probably too long).
PACT Tracing Protocol Schematic (https://arxiv.org/pdf/2004.03544.pdf)
At the same time, the application receives "pings" sent by others and records them. Initially, there is no need to store these pings centrally. Only the receiving device stores the IDs it received.
Once a user is identified as positive, things become a bit interesting. They upload the IDs they recently sent (and in some protocols also the pings they received) to a database. Some standards are assuming a single central database. Others suggest a more decentralized data store. Instead of uploading each ID sent, some protocols suggest that the IDs are derived from a seed, and only the seed needs to be uploaded, significantly reducing the amount of data being sent and centrally stored.
In addition, malicious uploads need to be prevented. They could be used to overwhelm the data or to cause false positives. The authentication schemes vary between the protocols, but typically the infected user has to provide some form of authentication code from a healthcare provider. The user should be able to exclude some data from the upload (e.g., based on the time the event happened).
Your device downloads the entire database of reported IDs (or seed keys) to check if you have come into contact with an infected individual. This is a lot easier if only seeds are uploaded (one record for each infected person) instead of having to download all individual IDs sent by infected users (about 2000/user for two weeks if the ID changes every 10 minutes). The protocol proposed by Apple and Google suggests the use of "Temporary Exposure Keys" that rotate daily. These keys are used to derive a "Rolling Proximity Identifier" which is rotated every few minutes.
This protocol should solve most of the privacy issues that arise from such an application. It should not allow a third party to identify individuals, and users will not know who of their contacts was positive (unless they only had contact with one individual person).
To assist with the acceptance of the application, users will have control over when the application is active, and what data is uploaded to any data repository.
Of course, in the past, it has been shown that very large anonymized datasets can be used to track individuals. Probably the best protection, aside from robust cryptographic implementations, is the deletion of data as soon as it is no longer relevant for tracking SARS-Cov2 infections. The user interface of the application needs to be carefully designed to allow the user to make sensible choices as to what data to record and upload to the central database.
Ideally, the application would only share the sent IDs (or seeds to derive them) with the central database. But some applications found it useful to report IDs received, Bluetooth signal strength, and the phone model. Proximity tracing with Bluetooth is tricky. Different phone models use Bluetooth chipsets and antenna configurations with different sensitivity and signal strength. Just measuring the absolute signal strength received is a poor indicator of distance. The Apple/Google standard suggests including some encrypted metadata with each ping. The keys used to encrypt the metadata are derived from the same temporary exposure key as the IDs broadcast by the phone. The metadata can only be decrypted after the user uploaded these temporary exposure keys.
This is a classic example of how one has to weight privacy vs. the value of the information received. What makes this more complicated is that a less privacy-sensitive application may collect more valuable data, but may also find fewer volunteer users. The application is only useful if there are many users (some suggest at least 60% of the population needs to use the application). And of course, the application needs to be released "now" leaving little time for an extensive review period.
A quick summary of the proposed protocols:
Apple/Google Contact Tracing
DP3T (Decentralized Privacy-Preserving Proximity Tracing)
PEPP-PT (Pan European Privacy Preserving Proximity Tracing)
PACT (Privacy-Sensitive Protocols And Mechanisms for Mobile Contact Tracing)