Patricia Thaine is the founder of Private AI, a Toronto- and Berlin-based startup creating a suite of privacy tools that make it easy to comply with data protection regulations, mitigate cybersecurity threats, and maintain customer trust. I sat down with Patricia to discuss the state of privacy, how we got here, whether privacy as a default will become the norm, and how Private AI will enable a privacy mindset among businesses.
2020 and the start of this new year bore witness to events that have forced the global community to finally shed the complacency of the last decade and embrace the inevitable. The pandemic has been the accelerant for long-held predictions about the future of work, the rising gig economy, the disruption of higher education, and the move towards online learning.
What has also materialized are the concurrent technological opportunities to surveil a virus and help mitigate its spread. Within the education sector, Covid-19 has expedited the adoption of remote learning solutions, video conferencing platforms, and tools that serve as proxies for gauging student engagement and learning abilities. The rise in demand for data to drive critical decisions, especially during a time of uncertainty, has also driven a level of consumer concern we have not seen before.
Data privacy is becoming mainstream, and the public has become increasingly aware of the looming threats of surveillance in our remote lives and the stigmas associated with tracking location and behaviours in justified attempts to get ahead of a virus. This rising privacy concern has revealed an undercurrent of privacy practitioners, innovators, and technologists, all waiting for the other shoe to drop. In 2020, it did.
Thaine agrees that privacy has come front and centre in recent months and that the public is more concerned. Covid introduced contact tracing technology that, in many early cases, relied on location data to determine the movement of the virus. Governments such as India’s and the UK’s mandated downloads of apps that tracked location every 15 minutes, while absolving themselves of liability. Within months, newer versions of exposure notification applications were introduced; however, apprehension grew where the purpose of data collection was unclear beyond what was disclosed, and where there was a lack of clarity around data minimization, data sharing, and the risk of re-identification.
Thaine, a Computer Science Ph.D. candidate at the University of Toronto and a postgraduate affiliate at the Vector Institute whose research focuses on privacy-preserving natural language processing and applied cryptography, points out that this urgency to arm decision-makers with technology for swift decisions has split consumers into multiple groups:
“I think there are those 1) who are not aware of what kind of information can be collected and who can see it; 2) there are also the optimists who think that democracy is unshakable and that as a result there’s nothing to worry about, and finally, 3) the pessimists who think that there’s nothing we can do about it anyway so we might as well not even try.”
“There is so much noise out there and so much information. I think Ann Cavoukian does a fantastic job of getting the point across because she breaks down information into its essential pieces and repeats them until the points get across. And it works! Keeping things straightforward, accessible, and factual is the best thing we can do as a community to get otherwise busy people, who are inundated with information on more than one important subject, to pay attention. That addresses (1) and maybe (2). For (3), as privacy technologies advance and data protection regulations start making more and more of an impact, the hopelessness from (3) might turn into hope.”
Canada has some of the strictest privacy legislation, far more aligned with the European General Data Protection Regulation (GDPR). More recently, the Canadian federal government proposed an overhaul of its existing privacy regulations, now known as the Consumer Privacy Protection Act (CPPA), which makes businesses more financially accountable for non-compliance. Thaine points to both the California Consumer Privacy Act (CCPA) and the Canadian CPPA as wake-up calls making consumers more aware of the risks that different technologies on the market can pose to their privacy.
“The GDPR requires all businesses servicing EU citizens to be compliant. My biggest complaint against the California Consumer Privacy Act (CCPA) is that the only businesses who must comply with the regulation are the ones with an annual gross revenue larger than $25 million, that receive or disclose the personal information of 50,000 or more California residents, households, or devices each year, or whose annual revenue is 50 percent or more from selling California residents’ personal information. In my opinion, that does not encourage privacy by design from the companies that are just now creating their core products, and they will, in the end, be scrambling to retrofit their tech once they get big enough to have to comply with the CCPA. The California Privacy Rights Act (CPRA) makes the same mistake, with a few changes with respect to whom the law applies; it’s not a blanket law for any company doing business in California, thereby incentivizing companies to consider privacy an afterthought… after they’ve grown, received ample funding, and are now able to hire the right people. I can’t see how that won’t lead to many organizations scrambling to comply through inadequate patchwork. Canada’s Consumer Privacy Protection Act (CPPA), on the other hand, seems like it will apply to all businesses. There’s really not much I can say with certainty about the CPPA, since it’s still being drafted.”
What’s clear is that the proposed Canadian CPPA has the teeth it needs to make businesses sit up and take notice, but it may not go far enough. Technology needs to lead in the face of lagging legislation, so perhaps, in the interim, the intersection of legislation and privacy technology is the answer. Thaine points to the common knowledge that we, as a society, have exchanged privacy for convenience. Because of that, she maintains, laws now drive innovation in the privacy space, which will allow us to maintain the level of convenience consumers have grown to expect, but with much higher expectations of privacy rights. This can’t happen without continuous and significant technological innovation.
Will the ethical use of information by technology companies be curtailed if we have privacy as a foundation? Thaine referenced the GDPR definition of Privacy in her response:
“…it says: ‘data privacy means empowering your users to make their own decisions about who can process their data and for what purpose.’ If users are honestly told what their data are being used for in a comprehensive way, give positive consent, and have the option to use a service without providing consent for something they are not comfortable with — that’s a great foundational piece for ethical data use. But it’s not the only piece. Humans are not independent and identically distributed data points. If I grant you access to my personal data, that might inadvertently affect my son — e.g., here’s my DNA for you to store and use forever as you would like. Is my DNA also my son’s personal data? Would he get a chance to revoke consent?”
This raises the question of data proxies and the transfer of data rights, in the same way assets are bequeathed in wills or through power of attorney. Managing data as a personal asset, in which the individual continues to receive the convenience their data affords while remaining fully in control of it, has yet to be thought through.
The greater concern is that impending legislation will make it increasingly difficult to operate without embedding privacy into existing technologies, and that doing so will come with tremendous cost and difficulty. Thaine adds,
“You either need a background in machine learning and privacy, or one in homomorphic encryption, secure multiparty computation, differential privacy, etc. Talent in these areas is rare and expensive to hire. And it is difficult to come up with solutions that generalize to multiple problems. That’s why we set out to build Private AI. With our tech, businesses both small and large can redact or identify personal data within the most difficult types of data to analyze: unstructured text, images, and video. That means they don’t have to store personal data when they don’t need to, even if they are dealing with massive amounts of call transcripts, for example. It also means they can be more selective about what kinds of information different employees see. Just as importantly, they can keep track of where in their systems personal data are located. This makes it much easier to know where there might be security vulnerabilities and where more stringent protections need to be put in place.”
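To make the idea of redacting personal data in unstructured text concrete, here is a minimal sketch. This is not Private AI's implementation or API; it uses simple regular expressions, whereas production-grade PII detection relies on trained models to catch names, addresses, and context-dependent identifiers that patterns alone miss.

```python
import re

# Illustrative patterns only; real systems use ML models, not regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace each detected PII span with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call me at +1 416-555-0199 or email jane.doe@example.com"))
# → Call me at [PHONE] or email [EMAIL]
```

Because the placeholders are typed rather than blank, the redacted text stays useful for analytics (e.g., counting how often customers share contact details) while the identifying values themselves are never stored.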
Accessibility for small businesses was important to Thaine, so her team decided to offer a product suite, including de-identification of text, images, and video, that can be integrated into any software pipeline with only three lines of code.
The idea for De-identification as a Service started out with a few pain-point hypotheses that natural language processing could help solve,
“As we worked through the tech stack that solutions would require to be privacy-preserving we realized that there was no way that someone without our privacy backgrounds could integrate privacy into their apps, browser extensions, or on-premise or private cloud deployments. Since that was a part of the product we were going to build, we decided to commercialize that after speaking to software developers, managers, and VCs like David Dorsey to discover what problems they were seeing in the privacy space that needed solving.”
For Thaine, Private AI’s technology meant enabling businesses to detect where personally identifiable information (PII) and quasi-identifiers are located. It offers the option to redact that information so personal data remains uncompromised in the event of a data breach and for regulatory compliance.
“One of our customers is a mental health company, Animo AI, which redacts personal data from Slack messages before the data even hits their servers. The point is two-fold: (1) to prevent their employees from seeing identifiable information if they need to debug, and (2) to train their models in such a way that they don’t memorize personal information and become vulnerable to model inference attacks. Model inference attacks are essentially a way for a malicious party to gain knowledge about the information the model was trained on.”
Private AI also works with the Legal Innovation Data Institute (LIDI) to de-identify defendants’ data in court transcripts. These transcripts are publicly viewable if accessed one or a few at a time. However, a massive database has not been made available due to privacy concerns, in particular the concern that AI could automatically create profiles of defendants and witnesses. Says Thaine,
The lack of a large dataset of court documents makes doing research and calculating statistics on Canadian court cases very difficult, which means any systemic issues are difficult to unveil! LIDI is solving that problem.
We’re also being asked for transcript redaction, where very high accuracy is required. Call transcripts contain incredibly useful data and we help ensure they don’t contain information that can put whoever is storing the data at risk.
Private AI’s technology can also be integrated into network management systems to allow for more fine-grained access control. Not all tasks require sensitive information from a document to be viewed.
Increasing public awareness and mounting legislation are incentives that make businesses more likely to integrate privacy technology. Thaine agrees,
“Tech is finally at the level it needed to be for there to be excitement around it. No one gets excited about 70%+ redaction accuracy. But when you start reaching the numbers we can reach for in-domain PII detection — that’s when eyes pop, and it’s so satisfying to see!”
If Covid-19 is just the tip of the iceberg and more pandemics are on the horizon, we will be dealing with a new normal that integrates mobility solutions tied to disease status and the right of passage. It stands to reason that privacy will be embedded in these solutions by default; Thaine notes that this should not even be up for debate. The combination of highly sensitive health information and mobility demands solutions that are more private and secure while keeping the data informative. Not integrating privacy would create a “giant security vulnerability for the population.”
Thaine sees a reckoning among increasingly privacy-conscious consumers.
“We’ve seen it a lot during this pandemic — even to the extent that many are refusing to even use apps that are legitimately privacy-preserving (e.g., Canada’s COVID Alert) because they don’t trust the app. This is a great example of what happens once trust is breached: it’s so hard to regain. And that’s damaging to more than just the company or government that breached the trust in the first place. It darkens the world with a fog of distrust. That’s another reason why strong privacy regulations that are enforceable and enforced are so important: We need to recreate an ecosystem that brings back trust in technology. We’re making magic! It should be the magic that gets people excited, not anxious and afraid.”
This evolution from consumer-centric design, where we know everything about the consumer, towards human-centric design, where the individual is part of the process and dictates their experience with full control over and consent to the use of their information, has just begun. While the onus has always been on individuals to carry the burden of understanding lengthy legalese and consenting to data practices that include opaque data sharing and selling, a new market is emerging that lays accountability squarely on the shoulders of businesses.
The global privacy applications market is predicted to grow at a CAGR of 16.2% between 2018 and 2023. New privacy software solutions need to keep in step with the “ever-changing privacy regulation landscape.” Privacy technology as a foundation for all business may be the panacea, but Thaine proceeds with cautious optimism:
“It’s getting there, but it’s certainly not there yet. It can still be costly to create task-specific privacy-preserving tech. It usually requires in-house experts or a specialized company to come in and help. That’s why, at Private AI, we’re trying to solve this problem by focusing on three main tenets: ease of integration, efficiency, and accuracy. That’s how we intend to help privacy become ubiquitous.”
Whether this is a trend or whether Data Privacy has arrived is yet to be determined. The market predictions, however, indicate that Data Privacy is a viable business, and perhaps a movement that brings organizations one step closer to rebuilding consumer trust.