Did you know that National Data Privacy Day is Jan. 28? Congress first recognized this day in 2009 to raise awareness about the importance of protecting one’s personal data on the internet.
Ten years later, and thanks especially to last year’s Facebook-Cambridge Analytica scandal, it’s probably fair to say that awareness that one’s online information might be misappropriated is at an all-time high. But are we any better at protecting ourselves?
In the area of genetics, there is some reason to think not. We live in a time of intense interest in personal genetics that has fueled the proliferation of large databases populated with individual genetic information. Some of these databases are assembled and managed by researchers.
For example, the 100,000 Genomes Project in the United Kingdom and the All of Us initiative in the United States are two national research programs building databases from the genetic information of participants. The development of these databases is critical to the advancement of precision medicine, which requires large volumes of data to reach statistically significant conclusions.
Other genetic databases are owned and operated by direct-to-consumer (DTC) genetic testing companies, such as 23andMe. Many of these companies ask customers for permission to include their information in research databases and also possibly “share” it (sometimes for cash) with third parties. These third parties may in turn deposit individuals’ genetic information into their own research databases.
Genetic databases are not only built and probed for research purposes. They also support genealogical and other personal genetic services offered by companies like Ancestry.com and citizen science initiatives like GEDmatch. One popular service identifies genetic relatives for users by querying databases of genetic information. The larger the database, the more relatives will be identified.
As my colleagues and I have discussed elsewhere, researchers and genetic testing companies are increasingly giving individuals the opportunity to download their uninterpreted “raw” genetic data. A secondary market has cropped up to help individuals use their data for personal benefit: for example, to link to information about their disease risk, improve their diet and fitness, and even buy nutritional supplements tailored to their DNA profiles. Some of these secondary services are, you guessed it, also building collections of individual genetic information.
It has become standard practice among researchers and service providers (and sometimes also is legally required) to ask individuals for their consent before storing, probing, and sharing their genetic information with others. So what’s the problem from a privacy perspective?
For one thing, it isn’t clear that individuals are providing truly informed consent to secondary uses of their genetic data. Although there are federal rules in place to maximize research participants’ understanding, those rules don’t apply to research that isn’t federally funded or supported and that is conducted outside of institutions that have voluntarily agreed to comply with the rules for both covered and non-covered research. They also don’t apply to non-research activities. That leaves a large swath of personal genetic activities, especially in commercial and citizen science spaces, with a lot of communicative leeway.
In some cases, notification of potential uses of individual genetic information may be buried in privacy policies and terms-of-use agreements that customers never read. But for those who do, a recent empirical study of DTC genetic testing company policies found large variation in the kinds and quality of information that is provided. What’s worse: almost 40 percent of the 90 companies that were surveyed did not publish any readily accessible policy explaining how they collected, used, or shared their customers’ genetic data.
Another concern is that individuals may not fully appreciate the potential consequences of sharing their genetic data with researchers and service providers, including who might access those data. Research initiatives like the Personal Genome Project have long informed participants of the possibility that their data might be used for criminal investigation purposes. However, when police used GEDmatch in 2018 to identify a suspect in California’s notorious Golden State Killer cases, this possibility suddenly became far less hypothetical.
In a survey conducted last summer, my colleagues and I found that most people who responded (77-80 percent) supported police use of genetic genealogy databases to catch violent criminals and to identify missing persons. However, significantly fewer people (39 percent) supported police access to identify perpetrators of nonviolent crimes.
DTC genetic testing companies like 23andMe have been vocal about their resistance to law enforcement requests for customer data. However, it isn’t just genealogical databases that are vulnerable to police access. Law professor Natalie Ram warns that clinical and research databases also are subject to potential searches. She also suggests that, given the current state of the law, there may not be much we can do to protect ourselves against this.
The ability of researchers to amass large databases of genetic information is critical to scientific advancement. However, as Apple’s CEO has argued, more legal protections and transparency requirements ultimately may be necessary to give the public confidence in the security and use of their personal data. In a few weeks, a team of researchers from Baylor College of Medicine and other institutions will be reporting their findings on how to create participant-centric, trustworthy databases in a special issue of the Journal of Law, Medicine & Ethics.
In the meantime, and especially before sharing your genetic data in non-research contexts, let me suggest the following: read the fine print, ask questions, and if still in doubt, opt out.