Andreea Bodnari


Data Products: Strategy Blueprint


In today’s business climate, strategic moats are built with data. Long gone are the days when you could build a new business line on software without a data play. Data was originally compared to oil, suggesting that data fuels innovation engines. More recently, The Economist compared data to sunlight because, like solar rays, data will be everywhere and underlie everything. Data is also the new infrastructure on which savvy business people erect differentiated business models.

Designing data products is costly. Data scientists and machine learning engineers rank among the highest-paid professionals, next to surgeons and doctors. Needless to say, it takes financial prowess and aligned business incentives to graduate a data science project from an experiment into a production application. The blueprint for successful data products consists of three core elements: business workflows, data sources, and distribution channels.

Business workflows

Data products emerge as an application layer built on top of business workflows. Data products have a track record of success when deployed in operational settings such as admin process automation, customer support, and regulatory compliance. In other words, data products are currently assigned to the "safe" back office, where failures in performance are less costly.

Not every business workflow can enable a data product. I've prepared a scorecard to qualify business workflows for data product applications and vetted it with a number of enterprise companies. Check it out!

Data sources

Public data, or open data, is available for everyone to access, modify, reuse, and share. Open data organizations are the counterparts of organizations supporting open source software. Their work empowers citizens and can strengthen democracies and streamline processes and systems in society, government, and private businesses. A few awesome open data sources are World Bank Open Data, Global Health Observatory Data, Google Public Data Explorer, Registry of Open Data on AWS, and the US Census Bureau.

Private data sources are the backbone of well-differentiated companies like Google, Amazon, and Facebook. A first-mover strategy lets a company leapfrog competitors in the data aggregation game, and the data it accumulates creates data gravity. Search results, product and movie recommendations, and social networks all improve with data. That's why established players are here to stay unless we make it plain and simple for machine learning systems to share and learn from disparate data sources.

Licensing rights for private data get complex. A common problem across the board is that the owner of the data source cannot sub-license data externally. This means that private data can only be leveraged by products owned by the same organization that owns the data. Catch-22? Not always: if data was collected under a license with sub-licensing clauses, this opens up opportunities for commercializing private data outside the parent organization.

We have to address the elephant in the room. Across companies, data management practices fall on a broad spectrum. Leading companies set an example by following ethical, privacy, and security rules. Some industries took matters into their own hands and established data privacy standards and frameworks. In healthcare and financial services, data privacy is enforced by regulatory agencies. Consumer industries have to abide by consumer privacy acts. Rule of thumb for everyone and anyone: always de-identify data, and license silos of aggregated data as often as possible.
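To make the rule of thumb concrete, here is a minimal Python sketch of de-identifying records and then aggregating them before sharing. The field names (`user_id`, `name`, `email`, `region`), the keyed-hash approach, and the minimum group size of 5 are all assumptions made for illustration. Dropping direct identifiers and hashing an ID does not by itself satisfy any particular regulatory standard, so treat this as a starting point and not as a compliance recipe.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "email"}
ID_FIELD = "user_id"


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and swap the raw ID for a pseudonymous key."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != ID_FIELD
    }
    cleaned["user_key"] = pseudonymize(str(record[ID_FIELD]), secret_key)
    return cleaned


def aggregate_counts(records: list, group_field: str, min_group_size: int = 5) -> dict:
    """Count records per group and suppress groups that are too small to share."""
    counts = Counter(record[group_field] for record in records)
    return {group: n for group, n in counts.items() if n >= min_group_size}


if __name__ == "__main__":
    raw = [
        {"user_id": i, "name": "Person", "email": "person@example.com",
         "region": "north" if i < 6 else "south"}
        for i in range(8)
    ]
    cleaned = [de_identify(r, secret_key=b"replace-with-a-real-secret") for r in raw]
    print(aggregate_counts(cleaned, "region"))
```

Suppressing small groups is one common way to reduce the risk that an aggregate points back to an individual; the right threshold depends on the data and on the applicable rules.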

Synthetic data can be a saving grace, depending on the data product at hand. Computer algorithms have gotten really good at generating synthetic data: be it videos of celebrities or Nature articles, we can fake it all. Similar techniques can be used to generate synthetic data that trains the machine learning models behind a data product. To bootstrap such algorithms with relevant data seeds, companies can set up data donation programs, internal or external, with the proper data use agreement in place.

Distribution channels

A product well built is only half the story. Your product is signed and sealed; now it needs to be delivered. A few distribution channels are available for enterprise products. Each distribution channel has implications for the product pricing model and for the overall product strategy (build vs. buy vs. acquire).

On a final note, data-driven products require continuous monitoring of their performance and quality. You might ask why all this scrutiny is needed when humans doing the same task are not monitored 24/7. Let's just say that humans undergo quarterly training on ethics and are responsible for their actions. Machines act in silence, so we need to inquire about their behavior using monitoring scripts. It's good practice to monitor product performance and flag corner cases. Start by defining internal policies for failure management, product ethics, and human-in-the-loop review.
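A monitoring script of this kind could look roughly like the Python sketch below. It tracks accuracy over a rolling window of recent predictions and queues low-confidence predictions as corner cases for human review. The class name `PerformanceMonitor`, the window size, and both thresholds are hypothetical choices for illustration, and the sketch assumes the true outcome eventually becomes available for comparison.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PerformanceMonitor:
    """Rolling-window monitor; all default values are illustrative assumptions."""
    window_size: int = 100        # number of recent predictions to keep
    min_accuracy: float = 0.9     # below this, the product needs attention
    min_confidence: float = 0.6   # below this, a prediction goes to human review
    outcomes: deque = field(init=False)
    review_queue: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Old outcomes fall out of the window automatically.
        self.outcomes = deque(maxlen=self.window_size)

    def record(self, item_id: str, predicted, actual, confidence: float) -> None:
        """Log one prediction and flag it for review if confidence is low."""
        self.outcomes.append(predicted == actual)
        if confidence < self.min_confidence:
            self.review_queue.append(item_id)

    def accuracy(self) -> Optional[float]:
        """Accuracy over the current window, or None if nothing was recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self) -> bool:
        """True when rolling accuracy has dropped below the policy threshold."""
        current = self.accuracy()
        return current is not None and current < self.min_accuracy


if __name__ == "__main__":
    monitor = PerformanceMonitor(window_size=4, min_accuracy=0.75)
    monitor.record("ticket-1", "refund", "refund", confidence=0.9)
    monitor.record("ticket-2", "refund", "escalate", confidence=0.5)
    print(monitor.accuracy(), monitor.needs_attention(), monitor.review_queue)
```

The thresholds are where the internal policies come in: what counts as a failure and which cases a human must review are business decisions that the script only enforces.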