
Data warehouse vs data lake pros and cons

The technologies fueling lakes and warehouses have matured over the past few years. This has given rise to confusion about the role each can play within an enterprise. For example, data lakes are not just a place that retains all data for data scientists, nor are they only storage repositories for large volumes of raw data. The types of data, as well as curation and data preparation, have broadened beyond these use cases.

Likewise, the technology for warehouses has extended to support data lakes (see AWS Redshift Spectrum). This "hybrid" model of pairing a lake and a warehouse takes advantage of optimized data formats, using compression, partitioning, and data catalogs.

Given an accelerating rate of change in the data warehouse, data mart, query engine, and data analytics market, minimizing risk should be a core part of any strategy.

Done right, a lake used with a warehouse can minimize technical debt while accelerating a business's consumption of data.


Next generation data architectures

Hybrid data warehouse and data lake architecture

Emerging hybrid models show how a lake and warehouse can coexist. These new models can support a new class of analysts and business users who want to take advantage of what have traditionally been expensive, cumbersome big data technologies.

Pairing an optimized AWS or Azure Data Lake with a warehouse opens new possibilities for analytics. For example, you can use Amazon Athena and Tableau to create an efficient and cost-effective AWS serverless data lake analytics stack. Compared to an "always-on" traditional data warehouse, an AWS or Azure serverless analytics model can deliver value in the form of rapid data access to those who need it.

A data warehouse can benefit from a data lake as well. Specific jobs and tasks, traditionally done in a warehouse, can be offloaded to a data lake, which is more cost-efficient for large amounts of infrequently used data. Pairing a data lake and warehouse can ensure flexibility across business capabilities. The warehouse plus data lake service model is about delivering business value, not a data storage solution.


AWS data lake vs data warehouse

Exploring the use of a data lake is not uncommon for those currently using a cloud warehouse like Amazon Redshift. Amazon released Redshift Spectrum to give teams the ability to execute a hybrid strategy.

By taking a hybrid approach, data engineers can minimize the energy spent on a data warehouse vs. data lake vs. data mart bakeoff. Adding an AWS data lake to a warehouse like Redshift delivers a solution that is well aligned to the types of data models a business needs.

For example, specific jobs and tasks, traditionally done in a warehouse, can be offloaded to a data lake for cost efficiencies. Let’s say you have a 100 GB transactional table of infrequently accessed data in a warehouse. Why pay to store that data in Amazon Redshift when you can move it to external tables on AWS S3 and query it with AWS Redshift Spectrum? This approach can minimize the need to scale Redshift with a new node, which can be expensive!
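As a rough illustration of the offloading idea above, the sketch below builds the kind of `CREATE EXTERNAL TABLE` statement Redshift Spectrum uses to point at Parquet files on S3. The schema name, table name, columns, and bucket path are all hypothetical, and the statement assumes an external schema already exists in your cluster. Treat it as a starting point, not a definitive implementation.

```python
def spectrum_external_table_ddl(schema, table, columns, s3_location, partition_columns=None):
    """Build a CREATE EXTERNAL TABLE statement for Parquet data stored on S3.

    `columns` and `partition_columns` are lists of (name, sql_type) tuples.
    """
    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    statement = f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {column_sql}\n)"
    if partition_columns:
        partition_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in partition_columns)
        statement += f"\nPARTITIONED BY ({partition_sql})"
    statement += f"\nSTORED AS PARQUET\nLOCATION '{s3_location}';"
    return statement


# Hypothetical names: an archive of the infrequently accessed transactions table.
ddl = spectrum_external_table_ddl(
    schema="spectrum",
    table="transactions_archive",
    columns=[("transaction_id", "bigint"), ("amount", "decimal(12,2)")],
    s3_location="s3://example-bucket/transactions_archive/",
    partition_columns=[("transaction_date", "date")],
)
print(ddl)
```

Once a table like this is registered, the archived rows can be queried from Redshift alongside local tables, while the data itself stays on S3.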

Data lake best practices embrace a hybrid warehouse approach that optimizes for downstream consumption. Consumption might be within analytic tools like Looker, Tableau, and Power BI. In addition to analytics tools, ETL applications that handle loading data from a lake to a cloud data warehouse like Amazon Redshift or Google BigQuery can benefit as well.


Serverless analytics for data lakes or cloud data warehouses

As one of the top data lake vendors for Azure Data Lake, Amazon Athena and Redshift Spectrum, the Openbridge platform offers code-free, fully automated ELT data pipelines, and lake formation services. Our zero administration data lake technology stack allows you to get set up in less than sixty seconds.

Both Spectrum and Athena take advantage of a data catalog for data lake metadata management. The use of a data catalog is key to avoiding a characteristic data lake limitation: dumping everything into an unorganized data lake folder structure. The catalog affords a curated layer for a data lake or cloud warehouse that greatly simplifies access in tools like Tableau, Looker, Grow, Mode Analytics, or Amazon QuickSight.
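To make the contrast with an unorganized folder structure concrete, here is a minimal sketch of a Hive-style `key=value` object layout, a common convention that catalogs for engines like Athena can map to partition columns. The table name, file name, and choice of year/month/day partitions are assumptions for illustration only.

```python
from datetime import date


def partitioned_key(table, event_date, filename):
    """Return a Hive-style object key: table/year=YYYY/month=MM/day=DD/filename."""
    return (
        f"{table}/year={event_date.year:04d}/month={event_date.month:02d}/"
        f"day={event_date.day:02d}/{filename}"
    )


def partition_values(key):
    """Recover the partition columns encoded in a key, as a catalog might register them."""
    parts = key.split("/")[1:-1]  # drop the table prefix and the file name
    return dict(part.split("=", 1) for part in parts)


# Hypothetical table and file names.
key = partitioned_key("orders", date(2020, 3, 7), "part-0000.parquet")
print(key)
print(partition_values(key))
```

Because the partition values live in the path, a query filtered on a date only needs to read the matching folders instead of every object in the bucket.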

Our Redshift Spectrum target destination illustrates how a data lake and data warehouse together can deliver incredible value and efficiencies.

Going serverless with Azure, Amazon Athena or Spectrum provides the business benefits of a data lake for operations, engineering, and analysis use cases.


On-premise data lake, cloud data lake or data warehouse

On-premise data lakes require significant resources in both technology and people. Networks, storage, governance, and operations can be a significant investment for even deep-pocketed companies.

As one of the top data lake vendors for Amazon Athena and Redshift Spectrum, the Openbridge platform offers code-free, fully automated ELT data pipelines, and lake formation services. Openbridge also offers lake formation and automated data ingestion for Azure Data Lake Storage Gen2.

Get a free trial of the Openbridge zero administration data lake formation service which allows you to get set up in less than sixty seconds.


Openbridge data lake as a service

Delivering best practices for data lake and cloud warehouse architectures

It has never been easier to leverage a serverless query engine like Amazon Athena or Amazon Redshift Spectrum. With our zero administration AWS Athena or Redshift Spectrum data lake service, you simply push data from supported data sources and our service will automatically load it into your target destination:

  • Automatic partitioning of data — Allows you to optimize the amount of data scanned by each query, improving performance and reducing the cost for data stored in AWS S3 storage services as you run queries
  • Automatic conversion to Apache Parquet — Converts data into an efficient and optimized open-source columnar format, Apache Parquet
  • Automatic data compression — Compression is performed column by column using Google Snappy, which not only supports query optimizations but also reduces the size of the data stored in your Amazon S3 bucket, further reducing costs
  • Automated data catalog with database, view, and table creation — Data is analyzed and the system “trained” to infer schemas to automate the creation of a data catalog
  • No coding required — Using the Openbridge interface, users can create and configure data destinations for use with Athena, Spectrum, or Azure data lake
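The effect of partitioning, columnar storage, and compression on query cost can be sketched with some back-of-the-envelope arithmetic. The example below assumes a pay-per-byte-scanned model at 5 USD per TB, which you should check against current pricing for your region. The fractions and compression ratio are hypothetical placeholders, not measured results.

```python
def estimated_bytes_scanned(raw_bytes, partition_fraction=1.0, column_fraction=1.0,
                            compression_ratio=1.0):
    """Estimate bytes read by a query.

    partition_fraction: share of partitions the query touches (1.0 = all of them).
    column_fraction:    share of columns the query reads (columnar formats skip the rest).
    compression_ratio:  raw size divided by compressed size.
    """
    return raw_bytes * partition_fraction * column_fraction / compression_ratio


def scan_cost_usd(bytes_scanned, price_per_tb_usd=5.0):
    """Cost of one query when billed per TB scanned (the price is an assumption)."""
    return bytes_scanned / 1024**4 * price_per_tb_usd


raw = 100 * 1024**3  # a 100 GB table

# Unpartitioned, uncompressed, row-oriented data: every query reads everything.
unoptimized = scan_cost_usd(estimated_bytes_scanned(raw))

# Hypothetical query: one day out of 30, 20% of the columns, 3x compression.
optimized = scan_cost_usd(
    estimated_bytes_scanned(raw, partition_fraction=1 / 30, column_fraction=0.2,
                            compression_ratio=3.0)
)

print(f"unoptimized: ${unoptimized:.4f} per query")
print(f"optimized:   ${optimized:.4f} per query")
```

The individual savings multiply, which is why the three optimizations are usually applied together.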

What is AWS lake formation pricing? There are no additional charges for the service from Openbridge. You are only charged for the usage of underlying AWS services like Athena or Redshift Spectrum. If you are an Azure cloud customer, check out our Azure data lake service.

If you are looking for a solution focused on cost optimization and simplicity in managing data lakes, give our service a try with a 14-day free trial!

Fuel your favorite BI, reporting, and data tools

Automated data pipelines process, route, and load to a target data lake or cloud warehouse

Our ELT systems integration solution provides an automated data pipeline architecture to leading cloud data warehouses and data lakes like Amazon Redshift, Amazon Redshift Spectrum, Google BigQuery, Azure Data Lake, and Amazon Athena.

Coming Soon: Extending our platform data pipeline tools to support Snowflake.

Data Lake vs Data Warehouse Frequently Asked Questions

What are data lake advantages and disadvantages?

We have referenced some of them on this page as well as on our blog.

Are Data lakes just a storage repository?

No, they reflect an architecture, technology, and strategy for data. Describing a data lake as storage mischaracterizes the purpose and intent of the model. We address the idea of a lake only being storage in our post Data Lakes? Big Myths About Architecture, Strategy, and Analytics.

Are data lakes just for raw data?

No. They can support raw data, but data lakes deliver the most value when they incorporate a layer of curation. See How To Create A Serverless, Zero Infrastructure, Zero Administration Data Lake With Amazon S3, Amazon Athena and Apache Parquet

Do data lakes store all data?

Yes, but it depends on what you mean by "all". Generally, the value of a data lake is realized through a base layer of curation. Typically you will have a landing zone for data in your lake, followed by a layer of curation that gives your lake value to data consumers.

Does a data lake use ETL?

Yes, a data lake can use ETL. However, Extract, Load, and Transform (ELT) is also commonly employed by data lakes and cloud warehouses. For more details on how both can be used for your data lake or cloud data warehouse, see our post The ELT vs. ETL Process.
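The difference in ordering is easier to see in a small sketch. In the ELT flow below, raw records are loaded untouched into a staging table and only then transformed with SQL inside the database. SQLite stands in for a lake query engine or cloud warehouse here, and the table names and sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical raw records, exactly as they arrived from a source system.
raw_rows = [
    ("2020-01-01", " widget ", "19.99"),
    ("2020-01-01", "gadget", "5.00"),
    ("2020-01-02", "widget", "19.99"),
]

conn = sqlite3.connect(":memory:")

# Extract + Load: land the data unchanged, as text, in a staging table.
conn.execute("CREATE TABLE raw_orders (order_date TEXT, product TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: clean and aggregate inside the database, after loading.
conn.execute(
    """
    CREATE TABLE daily_sales AS
    SELECT order_date,
           TRIM(product) AS product,
           SUM(CAST(amount AS REAL)) AS revenue
    FROM raw_orders
    GROUP BY order_date, TRIM(product)
    """
)

daily_sales = conn.execute(
    "SELECT order_date, product, revenue FROM daily_sales ORDER BY order_date, product"
).fetchall()
print(daily_sales)
```

In an ETL flow, the trimming and type casting would instead happen in application code before the insert, so the raw form of the data would never reach the destination.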

Are data lakes dead?

LOL, who told you that? No, data lakes are not dead, but there are many myths. There are design and implementation approaches that are antiquated, so in some sense the old ways of executing and thinking about data lakes are dead. However, the architecture and strategy that drive adoption of lakes are very much alive. See Data Lakes? Big Myths About Architecture, Strategy, and Analytics.

How long does it take to create a data lake?

In many cases, it will be minutes for AWS or Azure. For more information on how quickly you can be up and running, check out our data lake formation service.

Do data lakes support all data types?

Generally, yes, data lakes support a broad array of data types. However, your choices can affect performance and costs. See How to be a hero with the open-source columnar data format on Google, Azure and Amazon cloud.

How do I compare a data lake vs blob storage?

Blob storage is just that, a storage system. However, a data lake is an architecture model. A data lake leverages blob storage as a component of its technical implementation.

How do I compare a data lake vs relational database?

We published How is AWS Redshift Spectrum different than AWS Athena? which covers the topic more broadly.

Is a data lake just for raw structured and unstructured data?

No, a data lake is not just for raw structured and unstructured data. Typically this type of data is only present in a landing zone within a lake. A curation and cataloging process would move data to an accessible state. We describe this state of accessibility as being "analytics ready" for data lakes.

What are the benefits of an AWS vs Azure data lake?

If you are a Microsoft Azure customer, leveraging Azure Data Lake Storage Gen2 is a logical choice. Likewise, if you are an AWS customer, then leveraging an AWS data lake makes sense. However, there may be cases where a hybrid approach is warranted or necessary. For example, we have a customer that leverages an AWS data lake to Oracle Cloud for Adobe data feeds. The benefit of one or another will almost always be a function of your overall data and technology strategies.

Openbridge Serverless Data Lake Platform

Go faster, be more flexible and deliver cost-efficiency

Looking at an on-premise data lake solution? Work faster with the leading cloud data lake provider. Join over 2,000 companies that trust us.



14-day free trial • Quick setup • No credit card, no charge, no risk