COVID-19 contact tracing – technology panacea or privacy nightmare?

The UK Government recently announced that it was ceasing development of its current contact-tracing app; on the same day, the Canadian Government stated that it was developing one. All this in the same week that the Norwegian health authority had to delete all data gathered via its contact-tracing app and suspended further use due to a ruling by the Norwegian Data Protection Authority. And if these examples are not enough to demonstrate the utter confusion, the Australian app is reported to have a bug that stops iPhones from reporting possible close contacts.

It’s clear that there is no single or quick solution that is going to resolve the individual needs of the world's health and government agencies that are attempting to use technology to assist in reducing the infection rates of COVID-19.

According to Wikipedia, more than 30 countries have, or are planning to release, apps designed to contact trace or geo fence their users, for the purposes of limiting and managing the spread of COVID-19. The development cycle and distribution of these time-sensitive solutions is itself unprecedented. Ask the members of any app development team if they could develop an app and the infrastructure to support 100 million or more users in under three months and they would say no – and that’s after they stop laughing at the suggestion.

Coming to a phone near you

The concept of contact tracing is to inform people that they may have come into contact with another person who has contracted or is showing symptoms of an infectious ailment, in this case COVID-19. The recipient of the notification can then take precautionary measures, such as self-isolation. This has proven a successful tool to assist in eradicating other diseases such as smallpox and has been used to control others such as tuberculosis, measles and HIV. With large portions of the world population now carrying a smartphone, technology should be able to play an important role, which is why we are seeing a surge in the development of contact-tracing apps.

The majority of apps available are government sponsored and use a variety of different methods to fulfill their purpose, such as Bluetooth vs. GPS, centralized vs. decentralized, and not all are sensitive to maintaining the privacy of the user.

There are two main methods being used to glean the physical proximity of users. The first is the global positioning system (GPS): this uses satellite-based radio-navigation to approximate the individual’s location and the location of other app users. The second, more prominent, solution uses Bluetooth and signal strength to identify other app users’ proximity, allowing the devices to exchange handshakes rather than track actual location. There are some solutions that use a mix of both Bluetooth and GPS and some even use network-based location tracking, but these methods have significant location-tracking privacy issues and are fortunately limited to only a few developments. The primary technology in use by COVID-19 contact-tracing apps is Bluetooth, as it provides a higher level of privacy protection.

There is an underlying issue though: Bluetooth discovery is not enabled while a phone is locked and the app requesting it is not primary. Until now there has been no reason for this to be enabled. Early versions of apps such as BlueTrace, the Singapore government's solution, relied on its users keeping their phones unlocked. The UK NHS beta app had a unique solution to this, at least for Android, but it would appear the limits implemented by Apple in iOS have meant that this was unachievable and has required developers to work with the official Apple and Google Exposure Notifications API.

The joint Google and Apple solution, Exposure Notifications API, preserves privacy and provides a method of using Bluetooth Low Energy and cryptography to provide a contact-tracing infrastructure. Use of the API is limited to public health authorities and access is only granted when specific criteria around privacy, security and data are met. However, this API is only part of a solution that an app needs to deliver the functionality needed. If an app requests personal information, either directly or by other methods, it could render this privacy-friendly solution questionable. The perception of a potential user of a contact-tracing app using this solution may be that the app, due the Google and Apple solution, has been developed to preserve the privacy of the individual; this could give a false sense of security.

There is also speculation that the use of the Exposure Notification API and Bluetooth for proximity and distance measuring in iOS may not be accurate; this was alluded to by the UK Government when announcing the cessation of the development of its own solution. Some of the potential issues are detailed in an article published by MIT Technology Review: it claims that if a phone is standing up in your pocket in portrait rather than landscape, then this alone can adjust the received power and make it look like someone is across the room as opposed to being next to you. The research also mentions the issue of signals passing through bodies – for example, if two people are standing back to back, the signal may appear weak, and thus record an incorrect distance. The UK Government claims to have developed algorithms that alleviate some of these issues; let’s hope the tech giants at Apple are willing to at least explore the potential solution the NHS team claims to have.

Google and Apple’s solution joins eight other frameworks that have been created since the beginning of the pandemic. The frameworks have been created in parallel by a mix of technology companies, privacy organizations, academia and governments. If the world adopted one framework there would of course be standardization, but this also adds a single point of failure if the framework is compromised or fails to deliver the expected results. As frameworks have evolved, app development projects have changed course to switch to the most appropriate for their requirements at that moment in time; without doubt this will continue as the technology for contact tracing matures.

The terms decentralized and centralized are frequently used and give a visualization of where data collected by a contact-tracing app is processed and stored; it gives the perception that centralization creates a greater risk to privacy. This may not be true as there are designs that use a centralized approach yet potentially afford the user of the app the desired level of privacy. The underlying issue is whether the use of data can be misused and the identity of the user is known or can easily be worked out. There are several reasons to create a centralized system, for processing power and scientific research and health management to name just a few; they do not necessarily result in a lack of privacy.

In the right place at the right time?

When a contact-tracing app comes into contact with another device running the same app there is a handshake and an exchange of keys. These keys are typically continually changing and are generated on a time basis and unique to the device. When device A meets device B, they share keys based on a predetermined distance and time requirement, for example within 2 meters for 15 minutes. The device either holds on to the keys or passes them to a central server; when users confirm they may be positive for infection then all the keys they have generated are added to a cloud system. All other devices will collect these on a frequent basis to see if there is a match with keys that have been collected or alternatively this match will be processed in the cloud. If there is a match, then those users are warned that they have been in contact with another device that is now reporting positive; they have no clue which device.

If the user is identifiable and all data is held and processed centrally, then there is clearly a privacy issue; if, however, the user is not identifiable and the central cloud is only processing for matches, this could be more efficient than asking the local device to do this processing, especially if the end device is limited on resources … which could be the case in some areas of the world. This approach also gives the centralized system the ability to identify potential false positives, where some malicious users say they are infected, yet in reality they are not and are just attempting to create chaos for users, companies and society in general. Using complex algorithms to identify false positives in a decentralized approach is less realistic due to resource limitations.

A benefit of partial centralization is that the portion of centralized data that is being processed could be used to inform scientists how the population as a whole moves around and to quickly identify hotspots to enable medical resources to be allocated. If, for example, a ZIP code is requested at the time of installation, then data scientists may be able to predict disease spread. This is unlikely to enable the user to be identified, as a single ZIP code is used by hundreds or thousands of people; it does narrow the potential to identify an individual, but it may offer an acceptable compromise on privacy.

Each country has adopted either its own or one the nine frameworks that have been developed, each providing a different balance between efficiency and privacy. The use of the different frameworks can cause issues: for example, most European countries have adopted the Google and Apple Exposure Notification API, whereas France has not and is processing data centrally. When borders open between countries, then it is improbable that there will be any synchronization between an app from Germany and an app from France.

Even the solutions that claim to be the most privacy sensitive are open to abuse: take the extreme scenario where video surveillance is used in conjunction with capturing Bluetooth signals emitted from devices and capturing the keys that are being exchanged. Combined with facial recognition technology and the location of the device at a known time could mean the user is identifiable. While this may seem extreme, it demonstrates that no one system offers a privacy guarantee.

Can it work?

So many problems and issues, such little time! There is no perfect solution in the timeframe offered to bring a solution to market. The uncertainty of what data may be useful in the future, what data users may be willing to share, the emerging technology frameworks, challenges with approximating distance and the immense pressure for apps to be delivered demonstrate the challenges that face both developers and governments in bringing a solution to market that will work efficiently and be acceptable from a privacy perspective. As this has never been done before, we should expect to see mistakes made and projects change direction; it’s through the pain of trial and error the best solutions will be found.

There are key factors that an app developer and a government requesting the development of an app should consider. The protocols that have been developed with privacy in mind are only as good as the developers' willingness to adhere to only collecting and transmitting the very minimum amount of data; don’t hide behind a privacy-aware framework and state privacy awareness when in reality you are collecting other identifying data and storing it centrally. If there are elements of centralization, then clearly state the reasons for collection and what the use is and publish the limits on how long data will be kept and who has access. The concept that non-identifying data, such as a ZIP code, is used to divert medical resources to the right place is potentially acceptable to many people, but there should be clear guidance that this data is time-sensitive and will be removed once it becomes obsolete.

For the benefit of public trust: all governments, in my opinion, regardless of their approach to this issue, should legislate on the acceptable use of data and create criteria on when an app will reach its end of life and be removed from devices. No more infections, no more app or data.

Would I run a contact-tracing app? Yes, on the conditions that use is anonymous and that data is not being used for anything other than stopping this specific disease from spreading and continuing to wreak havoc. In the event generalized location information such as ZIP codes may help science defeat this disease and put medical resources in the right place at the right moment, then my desire for absolute privacy is outweighed by my willingness to play my part.