Delta Lake Blogs
How to use Delta Lake generated columns
April 12, 2023 by Matthew Powers
How to create Delta Lake tables with generated columns and the benefits of this feature
Introducing Support for Delta Lake Tables in AWS Lambda
April 6, 2023 by Nick Karpov
How to use deltalake in AWS Lambda with AWS SDK for pandas
How to create and append to Delta Lake tables with pandas
April 1, 2023 by Matthew Powers
This post explains how to create and append to Delta Lake tables with pandas.
Running ML Workflows with Delta Lake and Ray
March 23, 2023 by Jim Hibbard
This post explains how you can read Delta Lake tables with the Ray compute framework.
How to Convert from CSV to Delta Lake
March 22, 2023 by Matthew Powers
This post explains how to convert from a CSV data lake to Delta Lake, which offers much better features.
Getting started contributing to Delta Lake Spark
March 7, 2023 by Nick Karpov
This post explains the full development loop with the Delta Lake Spark connector. You'll learn how to retrieve and navigate the codebase, make changes, and package and debug custom builds.
New features in the Python deltalake 0.7.0 release of delta-rs
February 27, 2023 by Will Jones, Matthew Powers
This post explains the new features in the deltalake 0.7.0 release.
Delta Lake Schema Evolution
February 8, 2023 by Matthew Powers
This post shows how to enable schema evolution in Delta tables and when this is a good option.
Delta Lake Time Travel
February 1, 2023 by Matthew Powers
This post shows how to time travel between different versions of a Delta table.
Delta Lake Small File Compaction with OPTIMIZE
January 25, 2023 by Matthew Powers
This post shows how to compact small files in Delta tables with OPTIMIZE.
Adding and Deleting Partitions in Delta Lake tables
January 18, 2023 by Matthew Powers, Ryan Zhu
This post shows how to add and remove partitions from Delta Lake tables.
Delta Lake Vacuum Command
January 3, 2023 by Matthew Powers, Nick Karpov
This blog post explains how to vacuum files marked for deletion from storage with the Delta Lake Vacuum command.
Reading Delta Lake Tables into Polars DataFrames
December 22, 2022 by Matthew Powers, Chitral Verma
This post shows how to read Delta Lake tables into Polars DataFrames.
Building a more efficient data infrastructure for machine learning with Open Source using Delta Lake, Amazon SageMaker, and EMR
December 13, 2022 by Vedant Jain, Denny Lee
In this blog, we’ll explore how connecting Delta Lake, Amazon SageMaker Studio, and Amazon EMR can simplify the end-to-end workflow required to support data engineering and data science projects.
Data Sharing across Government Agencies using Delta Sharing
December 8, 2022 by Li Yu, Mubashir Kazia, Jon D. Ceanfaglione, Prabha Rajendran, Purushotam Shrestha, Shawn A. Benjamin
This post shows how government agencies are sharing data with Delta Sharing.
How to Delete Rows from a Delta Lake Table
December 7, 2022 by Matthew Powers
This post teaches you how to delete rows from a Delta Lake table and how the operation is implemented under the hood.
Delta Lake Constraints and Checks
November 21, 2022 by Matthew Powers
This post shows how to add constraints to your Delta table to prevent certain types of values from being appended.
Delta Lake Schema Enforcement
November 16, 2022 by Matthew Powers
This post teaches you about schema enforcement in Delta Lake and why it's better than what's offered by plain data lakes.
Why PySpark append and overwrite write operations are safer in Delta Lake than Parquet tables
November 1, 2022 by Matthew Powers
This post shows you why PySpark overwrite operations are safer with Delta Lake and how the different save mode operations are implemented under the hood.
How to Create Delta Lake tables
October 25, 2022 by Matthew Powers
This post shows you how to create Delta Lake tables with Python, SQL, and PySpark.
How to Version Your Data with pandas and Delta Lake
October 15, 2022 by Matthew Powers
This post shows you how to version your pandas datasets and the benefits you'll enjoy with versioned data.
Sharing a Delta Table’s Change Data Feed with Delta Sharing 0.5.0
October 10, 2022 by Will Girten
We are excited to announce the release of Delta Sharing 0.5.0.
How to Rollback a Delta Lake Table to a Previous Version with Restore
October 3, 2022 by Matthew Powers
This post shows you how to roll back Delta Lake tables to previous versions with restore.
Converting from Parquet to Delta Lake
September 23, 2022 by Matthew Powers
This post shows how to convert a Parquet table to a Delta Lake table.
Why we migrated to a Data Lakehouse on Delta Lake for T-Mobile Data Science and Analytics Team
September 14, 2022 by Robert Thompson, Geoff Freeman
In this post, we discuss how and why we migrated from databases and data lakes to a data lakehouse on Delta Lake. Our lakehouse architecture allows reading and writing of data without blocking and scales out linearly....
How to drop columns from a Delta Lake table
August 29, 2022 by Matthew Powers
This post shows you two ways to drop columns from Delta Lake tables.
Apache Flink Source Connector for Delta Lake tables
August 11, 2022 by Krzysztof Chmielewski, Scott Sandre, Denny Lee
We are excited to announce the release of Delta Connectors 0.5.0, which introduces the new Flink/Delta Source Connector on Apache Flink™ 1.13 that can read directly from Delta tables using Flink’s DataStream API.
Delta 2.0 - The Foundation of your Data Lakehouse is Open
August 2, 2022 by Tathagata Das, Denny Lee
We are happy to announce the release of Delta Lake 2.0 on Apache Spark™ 3.2! The significance of Delta Lake 2.0 is more than just a number, though it is timed quite nicely with Delta Lake’s 3rd birthday....
Multi-cluster writes to Delta Lake Storage in S3
May 18, 2022 by Scott Sandre, Denny Lee, Mariusz Kryński (Samba TV)
While Delta Lake has supported concurrent reads from multiple clusters since its inception, there were limitations for multi-cluster writes specifically to Amazon S3. This was not a limitation for Azure ADLS Gen2 or Google Cloud Storage (GCS), as S3 currently lacks...
Delta Lake 1.2 - More Speed, Efficiency and Extensibility Than Ever
May 5, 2022 by Venki Korukanti, Scott Sandre, Tathagata Das, Allison Portis, Denny Lee, Vini Jaiswal
Introducing performance optimizations that will supercharge your data pipelines at any scale.
Writing to Delta Lake from Apache Flink
April 27, 2022 by Fabian Paul, Pawel Kubit, Scott Sandre, Tathagata Das, Denny Lee
Learn more about how you can write from Apache Flink to Delta Lake.
Extending Delta Sharing to Google Cloud Storage
March 11, 2022 by Will Girten, Shixiong Zhu
Learn more about the latest release of the open-source project Delta Sharing and how it enables sharing on Google Cloud Storage, among other enhancements.
Delta Connectors 0.3.0 Released
January 6, 2022 by Allison Portis
We are excited to announce the release of Delta Connectors 0.3.0.
Delta Lake 1.1.0 Released
December 3, 2021 by Scott Sandre
We are excited to announce the release of Delta Lake 1.1.0.
Delta Sharing 0.3.0 Released
December 1, 2021 by Lin Zhou
We are excited to announce the release of Delta Sharing 0.3.0.
Power BI Delta Sharing Connector
November 16, 2021 by Denny Lee
We are excited about the recently announced preview of the Power BI Delta Sharing connector.
Delta Lake User Survey (2021 H2)
September 16, 2021 by Denny Lee
We would like to invite you to provide your feedback on Delta Lake OSS.
Delta Lake 1.0.0 Released
May 24, 2021 by Tathagata Das
We are excited to announce the release of Delta Lake 1.0.0 on Apache Spark 3.1.
AMA: Growing the Delta Lake ecosystem
March 2, 2021 by Denny Lee
On March 11th, 2021 9:00 am PT, join us for this fun Delta Lake AMA session where we discuss with QP Hou, Christian Williams, and Alexander Kushnir from Scribd on growing the Delta Lake open-source ecosystem.
Salesforce Engineering: Delta Lake Tech Talk Series
March 2, 2021 by Denny Lee
We are happy to announce the Salesforce Engineering Delta Lake Tech Talk Series for March and April 2021.
Delta Lake 0.8.0 Released
February 4, 2021 by Denny Lee
We are excited to announce the release of Delta Lake 0.8.0.
Salesforce Engineering: Delta Lake Blog Series
October 20, 2020 by Denny Lee
Salesforce Engineering has published a series of blogs on how they use Delta Lake.
Salesforce Engineering: Global Synchronousness and Ordering in Delta Lake
October 14, 2020 by Denny Lee
At Salesforce, we maintain a platform to capture customer activity — various kinds of sales events such as emails, meetings, and videos. These events are either consumed by downstream products in real time or stored in our data lake, which...
Salesforce Engineering: Engagement Activity Delta Lake, Redshift Spectrum supports Delta Lake
September 25, 2020 by Denny Lee
We have a couple of exciting callouts this week!
Getting Started with Delta Lake
September 16, 2020 by Denny Lee
Want to learn more about Delta Lake? Check out this series of Delta Lake videos.
Delta Lake Sessions at Spark+AI Summit North America 2020
June 22, 2020 by Denny Lee
We're really excited for the numerous Delta Lake training and conference sessions that will be showcased throughout Spark+AI Summit NA 2020.
Delta Lake 0.7.0 Released
June 18, 2020 by Denny Lee
We are excited to announce the release of Delta Lake 0.7.0 on Apache Spark 3.0. This is the first release on Spark 3.x and adds support for metastore-defined tables and SQL DDLs.
Delta Lake 0.6.1 Released
May 26, 2020 by Denny Lee
We are excited to announce the release of Delta Lake 0.6.1, which fixes a few critical bugs in the merge operation and operation metrics. If you are using version 0.6.0, we strongly recommend upgrading to version 0.6.1.
Delta Lake 0.6.0 Released
April 22, 2020 by Denny Lee
We are excited to announce the release of Delta Lake 0.6.0, which introduces schema evolution and performance improvements in merge, and operation metrics in table history.
Delta Lake Newsletter: 2020-03-20 Edition
March 20, 2020 by Denny Lee
For this edition of the Delta Lake Newsletter, find out more about the latest and upcoming tech talks and videos.
Diving into Delta Lake Online Tech Talk Series
March 13, 2020 by Denny Lee
For our next series of Delta Lake online tech talks, we're excited to dive into the internals with our Diving into Delta Lake series. This will be a fun set of tech talks with live demos and Q&A. Check them...
Delta Lake Online Tech Talks
February 21, 2020 by Denny Lee
We’re excited to announce the next series of Delta Lake online tech talks over the next few weeks. This will be a fun set of tech talks with live demos and Q&A. Check them out!
Delta Lake 0.5.0 Released
December 13, 2019 by Denny Lee
We are excited to announce the release of Delta Lake 0.5.0, which introduces Presto/Athena support and improved concurrency.
Delta Lake Newsletter: 2019-10-03 Edition (incl. SAIS EU 2019 Sessions)
October 3, 2019 by Denny Lee
In this edition of the Delta Lake Newsletter, find out more about the latest and upcoming webinars, meetups, and publications. We also focus on the many sessions at Spark+AI Summit EU 2019 in Amsterdam.
Delta Lake 0.4.0 Released
October 1, 2019 by Denny Lee
We are excited to announce the release of Delta Lake 0.4.0, which introduces Python APIs for manipulating and managing data in Delta tables.
Delta Lake 0.3.0 Released
August 1, 2019 by Denny Lee
We are happy to announce the availability of Delta Lake 0.3.0! Features include Scala/Java APIs for DML commands, Scala/Java APIs for querying commit history, and Scala/Java APIs for vacuuming old files.
Delta Lake 0.2.0 Released
July 19, 2019 by Denny Lee
We are happy to announce the availability of Delta Lake 0.2.0! It brings support for cloud storage (e.g. Amazon S3 and Azure Blob Storage) and improved concurrency.
Delta Lake 0.1.0 Released
April 22, 2019 by Denny Lee
We are happy to announce the availability of Delta Lake 0.1.0, the initial version of the open source Delta Lake!