In June, we saw two major acquisitions in the Business Intelligence domain: Looker by Google for $2.6 billion, and Tableau by Salesforce for $15.7 billion. There has been all sorts of speculation about what that means. Neither company is a typical start-up acquisition: Tableau has more than 4,000 employees, and Looker, while relatively small, has more than 800. As a former early employee at Looker, I have my own take on the events. But instead of offering yet another speculative viewpoint, I will offer what I think ought to happen for all parties to derive the most value from these acquisitions. I will also offer my take on how these acquisitions affect other major data vendors, namely Snowflake.
The high-level summary is that I would suggest the other major clouds (AWS, Azure, Oracle, and again Salesforce) follow suit and begin acquiring Databricks, Snowflake, MongoDB, and MemSQL. The motivation is that Salesforce can leverage Tableau to lock all business data within its native domain, leaving Google no option but to leverage Looker’s unique strength, a developer-centric modeling language, and push for Java-like or SQL-like adoption by the entire technical data analytics and data science community. With Looker’s LookML becoming the new standard for data modeling, other IaaS vendors would need to react by fixing their respective weaknesses: 1) AWS: support and professional services; 2) Azure: sales; 3) Oracle: a modern product stack; 4) Salesforce: a legitimate external interface for its rich internal data.
The two eras
First, let’s do a quick overview of how we got here. When Tableau was founded in the early 2000s, storage was expensive, compute was even more expensive, and cloud data warehousing was not yet a thing. Tableau’s value proposition became a unified package: a data extraction server (a kind of mini data warehouse) plus a client-facing BI tool. At the time, fast data warehousing was expensive (e.g. Teradata, HP Vertica, Oracle), so client-facing tools were forced to create their own data servers for faster extraction and transformation at run time.
Contrast this with Looker’s founding almost a decade later. Cheaper cloud storage and the advent of Hadoop had made cloud data solutions a de facto standard. Data warehousing kept getting faster and cheaper, and end users increasingly demanded a more developer-centric solution (think macOS and Linux). Hence the solution: a BI tool that sits separately "on top of" a data warehouse, with all query logic curated through a proprietary developer-centric language. The result is a much leaner server footprint, since there is no separate data extraction engine, and no more clunky Windows-era menus on the client-facing side.
Essentially, the product offerings reflected the difference between the two eras more than a difference in perspective: both companies call themselves “modern analytics” these days, and both do some amount of pre-extraction in the tool’s own server. The significance of each tool, however, cannot be overlooked. Each has become a catalyst in its own right, pushing forward a whole ecosystem of software.
With the rise of Tableau during that first decade, companies such as SnapLogic and Talend established themselves as leaders in enterprise ETL and as a backbone to Tableau’s data engine. Similarly, with the rise of Looker, a number of modern data solutions (e.g. Fivetran, Panoply, Stitch) attached themselves to many Looker deals, establishing their own claim to be the backbone of modern data analytics. For a while, if you were choosing a solution, you pretty much had to decide which camp you belonged to: Looker-Fivetran-Redshift, Tableau-SnapLogic-SQL Server, and so on. But these days, companies have diversified their partner relationships, and everyone partners with pretty much everyone, making the landscape one of the most competitive ecosystems in IT. This suggests that whatever mergers we have already seen (Stitch and Talend, Periscope and Sisense, Attunity and Qlik…), we have not seen the last of them.
One other aspect worth mentioning is the impact of the major cloud infrastructure companies: AWS, Azure, GCP, IBM, Oracle, and now Salesforce. These IaaS cloud vendors now offer a commodity-like counterpart to nearly every independent vendor product in the marketplace. While some of these commodity products (e.g. GCP’s Dataflow) are still relatively rudimentary compared with commercially available independent offerings, others compete one-for-one with independent vendors (e.g. Amazon’s DynamoDB vs. MongoDB, Google’s BigQuery vs. Snowflake). With that in mind, it does not take much imagination to foresee a scenario in which all major software categories are fulfilled by commodity services in one of several infrastructure clouds.
Given all of the above, you might now be asking yourself these vendor-specific questions:
Back in 2012-2014, neither SQL-on-Hadoop nor MPP data warehouses could compete with Snowflake on both price and technical capability. These days, not only can Amazon’s Redshift Spectrum and Google’s BigQuery compete with Snowflake head on, but the open source stack has also improved significantly. This makes it increasingly difficult for Snowflake to survive independently, and raises the likelihood of a Microsoft-Snowflake deal. With Snowflake’s leadership coming from Microsoft, it is hardly surprising that Microsoft would be a natural fit. Snowflake has an impressive sales force as well as technical pre-sales and post-sales teams, which is precisely where Microsoft lags with Azure, especially after the latest round of layoffs.
MemSQL, a much smaller Snowflake competitor, is also a prime acquisition target, but almost for the opposite reasons. It has a strong technical team, but less audacious product-market positioning and only a meager sales footprint. The company would do well in Salesforce land, borrowing some of Salesforce’s marketing and sales resources to build an external SQL-based database for all of Salesforce, a long overdue project in its own right, as any Salesforce API developer would tell you.
MongoDB, a public company, might have a longer independent lifespan, but at the end of the day, it too is increasingly competing with AWS DynamoDB and similar cloud commodity products. Since Mongo is a full package (it became open source, has a clear marketing and sales pipeline, and has a strong engineering team), it would benefit less from the likes of Microsoft, Salesforce, or Google. Instead, it would make a good extension to the existing product line of the Oracle behemoth. And there is precedent: MySQL.
With all of the acquisitions above, one question emerges: will AWS be able to keep up? The answer, I think, is twofold. Technology-wise, AWS has plenty of resources to continue doing what it does best: packaging deep tech into clearly positioned products. However, there is a lot more to why companies such as Snowflake and MongoDB are winning: services and support. And looking at the Google-Looker and Salesforce-Tableau deals, much of what both clouds gain is in exactly those two areas: on-boarding and supporting clients. While AWS will probably not change its bias for building products in-house overnight, it might start looking for consulting and professional service organizations to buy and, in the absence of a good culture match, buy another software vendor with a large services team instead. Databricks, I am looking at you…
So far I’ve covered Microsoft-Snowflake, Salesforce-MemSQL, Oracle-MongoDB, AWS-Databricks, but I have not explained what would be the catalyst for these acquisitions: Salesforce’s acquisition of Tableau.
Today, various ETL vendors offer to export Salesforce data into external data warehouses, allowing hundreds (if not thousands) of tools to leverage that data for their applications, be they insight apps or reporting software. Typically that data is hosted in one of the other IaaS clouds (e.g. AWS). This means that AWS, GCP, and Azure can each bundle a dozen other products on top of their existing data warehouse to leverage the business data through third-party applications developed on their cloud. Historically, what drove the initial conversations about exporting Salesforce data was the Tableau use case: bringing interdepartmental data into one view to avoid “data silos” within sales and marketing. With Salesforce taking that migration incentive out of the common external-vendor playbook and putting the tool into its own cloud, there is far less reason to migrate data elsewhere, and Salesforce becomes the de facto destination for all business analytics.
As Salesforce establishes itself as the majority owner of the business analytics space, Google will also be looking for answers. Historically, Google has catered well to technical types. Even Google’s Next conference predominantly caters to developers rather than business executives. Given that, the likely answer to Salesforce’s dominance is going to be Google’s attempt to leverage Looker’s BI modeling language, LookML, and make it an industry standard. While it made sense for a relatively small start-up to guard LookML as a proprietary language, it no longer makes sense to do so in the context of Google. Instead, GCP would be smart to make a Java-like standard out of LookML, and open source it for wider adoption across other BI platforms and software.
As a corporate decision maker, it is easy to dismiss all these changes. No doubt, you might have watched the whole data space evolve over the past 20 to 30 years, to the point of wondering what is fundamentally new and what is the same. After all, you might be reading about Looker’s approach to data governance and wondering how it differs from that of MicroStrategy, a product that has been around for 30 years. Or you might come across another BI product, Alteryx, and wonder how a product so similar to Tableau can co-exist alongside Tableau within your own enterprise. And you might be right. That said, data tools have never been about technology alone. At the core of every piece of data software is some intuition about how people work with data. At the end of the day, the fundamental technology concepts may or may not have evolved drastically since the days of the early pioneers, but how we rely on data definitely has.