The Case Against Self-Service Data Preparation
Henry Ford is often credited with saying something like “If I had asked customers what they want, they would have said faster horses.” While the message holds a lot of merit for certain aspects of innovation, it turns out Ford probably didn’t say it. But no matter. What he DID talk about on the record was a multi-dimensional vision of what his “gasoline buggy” could be and could mean for American society:
"I will build a car for the great multitude. It will be large enough for the family, but small enough for the individual to run and care for. It will be constructed of the best materials, by the best men to be hired, after the simplest designs that modern engineering can devise. But it will be so low in price that no man making a good salary will be unable to own one – and enjoy with his family the blessing of hours of pleasure in God's great open spaces."
When I hear the renewed call for self-service capabilities in data and analytics, in particular the latest emphasis on “self-service data preparation,” it brings to mind Ford’s falsely attributed faster-horses quip. Certainly, there is value in faster data preparation and integration, but in the grand scheme of things, these kinds of incremental changes are small potatoes. The real game-changer is the capabilities that open up as a result—by Ford’s reasoning, the newfound time to enjoy “the blessing of hours of pleasure in God’s great open spaces” that common car ownership would allow. The underlying business needs that are giving rise to this movement require a step-change in how we work with data.
Learning the Hard Way
For decades, “self-service reporting” or “self-service BI” was the siren call of every business intelligence software vendor on the market. Vendors built semantic layers to encapsulate the complicated work of understanding data structures and coding relational logic in SQL, giving users the ability to write their own reports without having to worry about the messy technical work. In Ford’s vernacular, this would have been the manufacture of an automated machine for grooming and tacking up your horses. In 2016, Gartner recognized the limitations of this approach, changed the criteria for its Magic Quadrant, and controversially swept away two-thirds of the former leaders.
The “self-service reporting” movement rightfully signaled to technology companies that businesses needed more flexibility and responsiveness in how they could get value and ultimately take action from insights buried in their data. Finding the right way to address that took a couple of decades of experimenting and advances in underlying database and data visualization technologies.
Looking Back to the Future
The emergence of “self-service data preparation” tools is a similar signal in the data management and data integration space. In fact, it is the inevitable aftershock of the “self-service reporting” movement. The current state of reporting is that analytical users are moving toward tools (data visualization) and developing the skills (data science) to take business information and generate meaningful insights that drive value-producing actions. What these knowledge workers still lack is the ability to convert siloed, disconnected transactional data into an analytical data set.
In the current generation, self-service data preparation tools provide a convenient way for analysts to interactively load a data set, reformat and derive new fields, restructure the data, merge the data with other sources, and render a rectangular output that works as the input into their data visualization, exploration and analysis tools. As with the first generation of “self-service BI” tools, however, these tools only provide a new way of doing the same old data quality and data integration work that developers have been doing for years. It is a meaningful but incremental change, not a step-change to how we turn transactional data into business impact.
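The interactive workflow described above maps directly onto the scripted data-wrangling work that analysts and developers have long done by hand. As a minimal sketch (using pandas, with invented file contents, column names, and business logic purely for illustration), the load → derive → merge → restructure pipeline might look like:

```python
import io
import pandas as pd

# Hypothetical transactional extracts (invented data for illustration).
orders_csv = io.StringIO(
    "order_id,customer_id,order_date,amount\n"
    "1,C1,2016-01-05,100.00\n"
    "2,C1,2016-02-10,250.00\n"
    "3,C2,2016-02-11,75.50\n"
)
customers_csv = io.StringIO(
    "customer_id,region\n"
    "C1,Midwest\n"
    "C2,Northeast\n"
)

# 1. Load the data sets.
orders = pd.read_csv(orders_csv, parse_dates=["order_date"])
customers = pd.read_csv(customers_csv)

# 2. Reformat and derive a new field (order month, as a string key).
orders["order_month"] = orders["order_date"].dt.to_period("M").astype(str)

# 3. Merge with another source to pick up customer attributes.
merged = orders.merge(customers, on="customer_id", how="left")

# 4. Restructure into a rectangular analytical data set:
#    one row per region per month, with total sales.
analytical = (
    merged.groupby(["region", "order_month"], as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total_sales"})
)
print(analytical)
```

The point of self-service data preparation tools is to let an analyst perform exactly these steps interactively, without writing the script; the underlying data quality and integration work is the same.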
A Better Way Forward
A common characteristic of the new leaders in Gartner’s BI Magic Quadrant is that they all enter organizations through functional business teams rather than through technology departments. Their strength comes from their impact on business decision-making processes rather than their impact on technology effectiveness. These tools start with the data that users have in an analytical data set or information model for their business, and enable them to drive toward an understanding of that data.
Today’s “self-service data preparation” tools start from where the data is rather than where the users are. Learning from the early mistakes of the self-service reporting movement, data preparation should instead start from the business model the analyst has in mind and work backward toward the possible data sources – starting where the user is, not where the data is. This will require not only a shift toward identifying relationships within the data, rather than scripting the procedural steps that transform it from one format to another, but also a greater focus on the business processes that create the data, rather than whatever data happens to be available in transactional databases.
As with data analytics, achieving the needed step-change in data preparation will require the industry to try various approaches, make mistakes, and learn from them. If history is any guide, no one should be too surprised when Gartner steps in suddenly and redefines the data integration space, sweeping out leaders who have held their positions for 25 years.
In the meantime, stay ahead of the curve by keeping your focus on realizing business value through technology as opposed to focusing solely on technical optimization.
If you’re interested in learning more about this or other current trends in data and analytics, feel free to contact us directly or catch up with us this Friday at the TDWI St. Louis meeting. We hope to see you there!