Exploiting Data Network Effect Securely through Data Clean Rooms
May 29, 2023
In a mall, all shops contribute to its overall footfall. This is called the network effect. Similarly, in the data context, every entity that generates data contributes more value to the network. To exploit the data network effects in an industry, we must:
- Upload data to the cloud
- Make this data available to others for analysis
- Ensure data privacy and protection of personal information
Snowflake Data Clean Room
Snowflake has created a Cloud Data Platform for data commerce. This platform gives users access to data within or outside accounts through:
- Role-based access control
- Row-level security
- Column data masking
Unleashing data network effects in advertising
Customers transact on the internet with multiple parties. They want those parties to know them so that their experience is personal. However, customers may object if details of their transactions are given to third parties, i.e., parties not involved in the transaction. Traditionally, customers were identified by placing cookies in the browser. But with the growing emphasis on customer privacy, Google has announced plans to phase out third-party cookies in the Chrome browser. Regulations on how personal information must be handled are also becoming stricter. Data clean room solutions are emerging as one of the most popular privacy-enhancing technologies for data sharing and collaboration.
Let's say I am a Disney customer. I will most likely associate with a particular advert. Disney determines this association based on my prior usage. I don't want third parties to know what I do on the Disney application. I may watch only adult content or only cartoons, and that is nobody else's business. That said, it is Disney's business to maximize revenue from adverts. In this example, Disney has data about every customer's:
- Favorite show
- Maximum association with the available adverts
| Name | Favorite Show | Max Association with Ad |
| --- | --- | --- |
| Sumukh (Me) | Baywatch | Nike Shoes |
The data set owner (Disney) can restrict the questions (queries) asked. Suppose the question is: how many customers have:
- Baywatch as their favorite show AND
- The maximum association with the Nike Shoes advert
Disney can:
- Allow this question
- Allow the question but attach conditions, such as revealing the answer only if the count is more than fifty. This reduces the risk of reidentification.
- The party asking the question may be willing to pay a premium for ads shown to these 1,000 customers
- Define success as the customer visiting the store within five days
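The count question and the "more than fifty" condition described above can be sketched in a few lines of Python. This is a minimal illustration rather than how Snowflake or any clean room product implements it: the `Customer` class, the `count_matching` function, the `MIN_COUNT` constant, and the sample rows (including the "Adidas Shoes" advert) are all hypothetical names and made-up data.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed threshold: the article's example reveals a count only above fifty.
MIN_COUNT = 50


@dataclass(frozen=True)
class Customer:
    name: str            # never returned to the party asking the question
    favorite_show: str
    top_ad: str          # advert with the maximum association


def count_matching(customers: List[Customer], favorite_show: str, top_ad: str) -> Optional[int]:
    """Return the number of matching customers, or None if the count is too small to reveal."""
    count = sum(
        1 for c in customers
        if c.favorite_show == favorite_show and c.top_ad == top_ad
    )
    # Suppress small groups so that individuals cannot be singled out.
    return count if count > MIN_COUNT else None


# Made-up sample data: 60 customers match the Nike question, 10 match another advert.
customers = [Customer(f"customer-{i}", "Baywatch", "Nike Shoes") for i in range(60)]
customers += [Customer(f"viewer-{i}", "Baywatch", "Adidas Shoes") for i in range(10)]

print(count_matching(customers, "Baywatch", "Nike Shoes"))    # 60
print(count_matching(customers, "Baywatch", "Adidas Shoes"))  # None
```

Only the aggregate leaves the function. The names stay inside the data owner's side, and a group of fifty or fewer yields no answer at all.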
Implementing the data clean room solutions
Forbes says that every company is a software company. What does this mean for Snowflake service partners like us? We create software to implement a specific solution at scale. We will provide services as software. You read that right: service as software, not software as a service. This will comprise the following:
- An application to configure Snowflake accounts (a distributed clean room, in the case above).
- A self-service, business-friendly user interface for setting up rules: which columns to show or hide, which columns to aggregate (count of, sum of, mean of, etc.), and which column to use as a common key between one data set and another. In our example, we cannot show the column "Name", but we can reveal the column "Favorite Show" and the aggregate count of "Name", and we can join the two datasets on the column "Name". These rules translate into query templates in Snowflake. This scales because rules can be changed or added without going to the IT department.
- Alerts or messages, sent through SMS or email, about shared datasets and their linked rules.
- Stored procedures to validate query requests against the rules: once before the request is sent and once when it is received.
- Snowflake's data masking to mask data when necessary, for example showing only two digits of a phone number and masking the rest.
- An audit dashboard for showing which queries ran, who ran them, and when.
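To make the list above more concrete, here is a rough Python sketch of three of its pieces: a rule that becomes a query template, a check of a request against that rule, and phone-number masking. Every name in it (`CleanRoomRule`, `render_template`, `validate_request`, `mask_phone`, the `customers` table, and the column identifiers) is hypothetical. The generated SQL is illustrative only, and keeping the last two digits of the phone number is an assumption, since the text does not say which two digits stay visible. In a real deployment the validation would live in Snowflake stored procedures and the masking in Snowflake masking policies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CleanRoomRule:
    table: str
    visible_columns: Tuple[str, ...]  # columns that may be shown and grouped by
    count_column: str                 # column exposed only as an aggregate count
    join_key: str                     # common key between the two data sets
    min_count: int                    # groups at or below this size are hidden


def render_template(rule: CleanRoomRule) -> str:
    """Turn a rule into a SQL query template (illustrative, not Snowflake-tested)."""
    cols = ", ".join(rule.visible_columns)
    return (
        f"SELECT {cols}, COUNT({rule.count_column}) AS customer_count "
        f"FROM {rule.table} GROUP BY {cols} "
        f"HAVING COUNT({rule.count_column}) > {rule.min_count}"
    )


def validate_request(rule: CleanRoomRule, select_columns: List[str], join_column: str) -> List[str]:
    """Return the list of rule violations; an empty list means the request is allowed."""
    problems = [f"column not allowed: {c}" for c in select_columns
                if c not in rule.visible_columns]
    if join_column != rule.join_key:
        problems.append(f"join not allowed on: {join_column}")
    return problems


def mask_phone(phone: str, visible_digits: int = 2) -> str:
    """Replace every digit except the last `visible_digits` with '*' (assumed choice)."""
    total = sum(ch.isdigit() for ch in phone)
    seen = 0
    out = []
    for ch in phone:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - visible_digits else "*")
        else:
            out.append(ch)  # keep separators such as '-' unchanged
    return "".join(out)


rule = CleanRoomRule(
    table="customers",
    visible_columns=("favorite_show",),
    count_column="name",
    join_key="name",
    min_count=50,
)

print(render_template(rule))
print(validate_request(rule, ["favorite_show"], "name"))  # []
print(validate_request(rule, ["name"], "name"))           # ['column not allowed: name']
print(mask_phone("555-123-4567"))                         # ***-***-**67
```

Running the same `validate_request` check on both sides, before a request is sent and again when it is received, mirrors the two validation points in the list. Because the rule is plain data, a business user could edit it through a form without anyone rewriting the query by hand.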
LTM's clean room solution
If you do not have data in Snowflake and want to upload it from your existing data warehouse to Snowflake, LTM has a solution for that as well.