In today’s big data world, many organizations could derive significant commercial benefits from a wider view of their customers’ behaviours and preferences. However, the data a company has access to is limited to the scope of its own activities, which restricts the analytical opportunities available to it. Combining its datasets with data from other sources can greatly enhance the quality and applicability of the outputs generated.
Combined analytics scenarios
Combining insights from different datasets can support business growth and help improve consumer services and experiences. The examples below illustrate just a handful of the real-world opportunities that combined analytics presents:
- Smart city projects use data from sensors positioned in many places, including traffic lights, roads, CCTV cameras and public service vehicles. Leveraging additional data from sources such as privately owned vehicles would greatly enhance the data-driven improvements that could be delivered to the community, such as traffic forecasting and capacity planning.
- Financial institutions analyze the amount spent with merchants but do not know what was purchased. Access to point-of-sale (POS) transaction information from merchants could help them improve their customer offerings.
- Airlines produce aggregated statistics on routes, passengers, flight costs and so on, but they have no visibility into what their passengers do at their destination. Access to this information would enable airlines to increase their revenue per seat with relevant offerings.
- Loyalty programs have data on member offers and redemptions but not on members’ other spending habits, which could help drive more relevant offers.
The limitless value in combined analytics via anonymization
The value associated with combining insights across datasets is limitless. Enhancing one dataset with segment-level insights from another avoids the need to match individuals directly across the datasets.
However, the only way to conduct combined analytics lawfully is to first anonymize both datasets separately, so that the combined analysis can be performed while ensuring that individuals cannot be re-identified.
Once the data is anonymized, requirements surrounding consent, data minimization and data retention no longer apply.
There are, of course, practical constraints on how combined analytics may be performed. For example:
- Associating an anonymized individual in one dataset with a group of similar individuals in another can enhance the individual’s data with aggregate statistics. In fact, enriching the information of individuals or groups with insights or learnings gleaned from the activities of similar users is a common analytical goal.
- The outputs of combined analytics cannot be at the level of individuals, as this could enable re-identification. An individual’s data, once enriched with insights from another source, could be linked back to their data in the original source, resulting in privacy harm. Combined analytics should therefore produce either aggregate statistics or trained models.
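The segment-level enrichment described above, followed by aggregate-only output, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete anonymization method: the segment keys (`age_band`, `region`), the field names, the sample records and the minimum group size of 3 are all hypothetical, and simply suppressing small groups stands in for the far more rigorous techniques a real deployment would need.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical suppression threshold: segments with fewer records than this
# are dropped, so no statistic describes a very small group of people.
MIN_GROUP_SIZE = 3


def aggregate(records, key_fields, value_field, min_size=MIN_GROUP_SIZE):
    """Group records by key_fields; return {key: {"count", "mean"}} for large-enough groups."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec[value_field])
    return {
        key: {"count": len(values), "mean": fmean(values)}
        for key, values in groups.items()
        if len(values) >= min_size
    }


def enrich(records, profiles, key_fields, new_field):
    """Attach the segment-level mean from the other dataset to each record.

    Records are matched to a segment, never to an individual in the other
    dataset; records whose segment was suppressed are left out.
    """
    enriched = []
    for rec in records:
        profile = profiles.get(tuple(rec[f] for f in key_fields))
        if profile is not None:
            enriched.append({**rec, new_field: profile["mean"]})
    return enriched


SEGMENT = ["age_band", "region"]  # hypothetical shared segment keys

# Hypothetical, already-anonymized partner records: spend at the destination.
partner_spend = [
    {"age_band": "25-34", "region": "north", "spend": 120},
    {"age_band": "25-34", "region": "north", "spend": 80},
    {"age_band": "25-34", "region": "north", "spend": 100},
    {"age_band": "35-44", "region": "south", "spend": 200},
    {"age_band": "35-44", "region": "south", "spend": 220},
    {"age_band": "35-44", "region": "south", "spend": 180},
    {"age_band": "45-54", "region": "east", "spend": 500},  # group too small
]

# Hypothetical, already-anonymized airline passenger records.
passengers = [
    {"age_band": "25-34", "region": "north", "route": "LHR-JFK"},
    {"age_band": "35-44", "region": "south", "route": "LHR-JFK"},
    {"age_band": "25-34", "region": "north", "route": "LHR-JFK"},
    {"age_band": "25-34", "region": "north", "route": "LHR-CDG"},
    {"age_band": "35-44", "region": "south", "route": "LHR-CDG"},
    {"age_band": "35-44", "region": "south", "route": "LHR-CDG"},
    {"age_band": "45-54", "region": "east", "route": "LHR-CDG"},
]

profiles = aggregate(partner_spend, SEGMENT, "spend")
enriched = enrich(passengers, profiles, SEGMENT, "expected_spend")

# Only aggregate statistics leave the analysis; the enriched rows do not.
report = aggregate(enriched, ["route"], "expected_spend")
for (route,), stats in sorted(report.items()):
    print(route, stats["count"], round(stats["mean"], 2))
# prints:
# LHR-CDG 3 166.67
# LHR-JFK 3 133.33
```

The passenger in the suppressed `45-54`/`east` segment receives no enrichment at all, and the final report is per route rather than per passenger, mirroring the two constraints listed above.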
Anonymization for combined analytics therefore requires a unique blend of data science and data privacy expertise, but the value it unlocks more than justifies the effort required.
When done right, the whole can be greater than the sum of the parts. Companies can realize immense value while acting ethically and without compromising the trust and respect of their customers.