Data Lake vs Data Warehouse | Avoiding the data lake vs warehouse myths

Choosing A Data Destination

The technologies fueling data lakes and cloud warehouses have matured over the past few years. This has given rise to hybrid use cases for the role each can play within an enterprise.

Data Lakes

Data lakes are not just a place that retains all data for data scientists, or storage repositories for large volumes of raw data. For example, you can use Amazon Athena and Tableau to create an efficient and cost-effective AWS serverless data lake analytics stack.

Cloud Warehouse

Cloud warehouses such as Amazon Redshift, Google BigQuery, and Snowflake deliver cost-effective and straightforward platforms for data analytics. Set up and deploy a new data warehouse in minutes, then scale it down to keep costs in check. For example, you can activate a serverless, on-demand Snowflake warehouse in about 5 minutes.

Hybrid data warehouse and data lake architecture

The technology for warehouses has extended to support data lakes (see Amazon Redshift Spectrum, Google BigQuery, and Snowflake). This "hybrid" model of pairing a lake and a warehouse takes advantage of optimized data formats, using compression, partitioning, and data catalogs.
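
As a rough illustration of why partitioning matters in this hybrid model, the sketch below lays out a table using Hive-style date partitions and then keeps only the partitions a date-filtered query would need to read. The bucket name, the `dt` partition key, and both helper functions are hypothetical and chosen only for this example; real query engines perform this pruning internally using the data catalog.

```python
from datetime import date, timedelta


def partition_prefix(table_root: str, day: date) -> str:
    """Build a Hive-style partition path such as <root>/dt=2020-01-15/."""
    return f"{table_root}/dt={day.isoformat()}/"


def prune_partitions(prefixes, start: date, end: date):
    """Keep only the prefixes whose dt value falls inside [start, end]."""
    kept = []
    for prefix in prefixes:
        # Pull the date string out of the trailing "dt=YYYY-MM-DD/" segment.
        value = prefix.rstrip("/").rsplit("dt=", 1)[1]
        if start <= date.fromisoformat(value) <= end:
            kept.append(prefix)
    return kept


# Hypothetical table location; one partition per day in January 2020.
root = "s3://example-bucket/events"
all_days = [date(2020, 1, 1) + timedelta(days=i) for i in range(31)]
prefixes = [partition_prefix(root, d) for d in all_days]

# A query filtered to the last week of the month touches 7 of 31 partitions.
last_week = prune_partitions(prefixes, date(2020, 1, 25), date(2020, 1, 31))
print(f"{len(last_week)} of {len(prefixes)} partitions need to be scanned")
```

Because scan-based pricing depends on how much data a query reads, skipping most partitions in this way is one reason the layout affects both speed and cost.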

Pairing a data lake and warehouse can ensure flexibility across business capabilities. The warehouse plus data lake service model is about delivering business value, not a data storage solution.

Understanding data lake or warehouse costs

AWS, Google, Azure, and Snowflake offer exceptional price value, including "on-demand" usage billing. For example, Snowflake automatically turns on and off, which means you only pay for what you use. BigQuery charges only for usage: about $0.020 per GB for storage and $5.00 per TB of data scanned in a query.
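
To make the usage-based model concrete, here is a minimal sketch that estimates a monthly BigQuery bill from the two rates quoted above. The function name and the sample workload (500 GB stored, 2 TB scanned) are assumptions made for illustration. The sketch ignores free tiers and any other pricing details, and the rates are the ones stated in this article, which may not match current pricing.

```python
# Rates as quoted in this article; check current pricing before relying on them.
STORAGE_PRICE_PER_GB = 0.020  # USD per GB stored per month
QUERY_PRICE_PER_TB = 5.00     # USD per TB scanned by queries


def bigquery_monthly_cost(storage_gb: float, tb_scanned: float) -> float:
    """Estimate a monthly bill as storage cost plus on-demand query cost."""
    storage_cost = storage_gb * STORAGE_PRICE_PER_GB
    query_cost = tb_scanned * QUERY_PRICE_PER_TB
    return round(storage_cost + query_cost, 2)


# Hypothetical workload: 500 GB stored, 2 TB scanned during the month.
estimate = bigquery_monthly_cost(storage_gb=500, tb_scanned=2)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

With no queries in a month, only the storage term remains, which is the "pay for what you use" behavior described above.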

Example: Amazon Athena Data Lake

Amazon Athena charges $5 per TB of data scanned. If you run 100 queries a day on average, each scanning 25 GB of data, your queries will cost about $370 per month (3,042 queries per month x 0.0244140625 TB x $5.00 USD = $371.34 USD). If you run no queries on a given day, your query costs for that day are $0.
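
The arithmetic above can be reproduced with a short sketch. The function name is hypothetical, and the figure of 30.42 days per month is an assumption inferred from the 3,042 queries used in the example. The $5 per TB rate is the one quoted in this article.

```python
def athena_monthly_cost(
    queries_per_day: float,
    gb_scanned_per_query: float,
    price_per_tb: float = 5.00,     # USD per TB scanned, as quoted above
    days_per_month: float = 30.42,  # assumed average month length
) -> float:
    """Estimate monthly Athena query cost from a daily query volume."""
    queries_per_month = round(queries_per_day * days_per_month)
    tb_per_query = gb_scanned_per_query / 1024  # 25 GB -> 0.0244140625 TB
    return round(queries_per_month * tb_per_query * price_per_tb, 2)


# The example from the text: 100 queries a day, 25 GB scanned per query.
print(athena_monthly_cost(100, 25))
```

Halving the data scanned per query, for instance through compression or partitioning, halves the estimate, since cost scales linearly with the data scanned.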

Being efficient with how you run queries can reduce costs. In our post Beginners Guide For Faster, Cost-Effective SQL Queries, we detailed how you can significantly reduce costs by following a few simple steps.

In a real-world example, we detailed how to create a low-cost, serverless analytics stack with Amazon Athena and Tableau. An AWS Athena S3 data lake used with tools like Tableau delivers a compelling cost/performance value proposition. Tableau supports "caching" data by using an "in-memory data engine technology, designed for fast data ingest and analytical query processing on large or complex data sets." As a result, the number of queries Tableau sends to Athena is reduced significantly.

Cost benefit

The key takeaway for costs is that efficient usage delivers lower costs. Modern data lakes or cloud warehouses deliver unparalleled value when paired with a common-sense usage model.

Freedom from vendor lock-in

An open, on-demand warehouse or data lake strategy means you can run queries directly against your raw structured and unstructured data from a wide variety of tools, such as Tableau, Microsoft Power BI, Looker, Amazon QuickSight, and many others.

Another consideration for a warehouse, cloud data lake, or on-premises data lake design is supporting not only a broad array of industry-leading analytics tools, but also others such as dbt, Tableau Prep, or Azure Machine Learning.

Pairing your solution with an open, standards-based architecture gives you the flexibility to undertake analysis with your preferred tools.

Which path? Data lake or cloud warehouse?

Given the accelerating rate of change in the data warehouse, data mart, query engine, and data analytics market, defining an overarching strategy can go a long way in building a strong foundation. Ask yourself these questions:

What cloud platform are you using today?

What data analytics tools do you use today?

Done right, a data lake or cloud warehouse can accelerate business consumption of data with your favorite data tools.

Getting Started: Start small, be agile

Lastly, start small and be agile. Narrowing the initial scope and demonstrating what is possible often reveals hidden opportunities. Keeping everything agile helps you avoid complexity and significant budget investments that do not deliver on expectations. Taking this approach will go a long way in supporting your data-driven insights aspirations.

To help you get started, take advantage of our 30-day, no-risk, no-charge free trial to see what we can do for your team.

Have a question?

Not sure which option is best for you? Don't go it alone solving the toughest data strategy, engineering, and infrastructure challenges. Contact us!

Documentation

docs.openbridge.com

Support

support@openbridge
