dbt
What is it?
dbt is an open-source command line tool designed to apply data transformations in a data pipeline. It compiles Jinja-wrapped SQL code to pure SQL and executes the code against a configured data warehouse.
Related tools include Airflow, Liquibase, Stitch and Flyway.
Who is it for?
- Data engineering teams who need to transform raw data with models within elastic data warehouses like Redshift, BigQuery or Snowflake.
- SQL-savvy data analysts in the analytics world who wish to leverage software engineering practices.
When should you use it?
dbt is best-suited to transforming data after it has loaded into your data warehouse. This means it is well-suited for cloud-based data warehousing with an ELT (extract-load-transform) approach, but isn't ideal for ETL (extract-transform-load) workflows.
Why should you use it?
- dbt is easy to use with native SQL queries and only takes a few parameters to set up. No additional skills are required.
- It generates boilerplate code to create tables and views with a
create table as select
approach. You can use thecast
function in select queries to define data types of columns. - dbt looks after model references and dependencies automatically.
- It augments SQL with programmatic features such as control structures and reusable macros.
- It is easy to create multiple environments, such as for development and production.
- dbt has built-in tests for schema and data, although performance might be a factor to consider if too many tests are written.
Why shouldn't you use it?
- You cannot build infrastructure as code in dbt.
- It only covers data transforms — the T of ELT. You will need other tools to perform data extraction and load.
- dbt may not be the best choice for working with data streams as it is primarily designed for batch-based processes.
How do you use it?
dbt CLI can be installed for free using homebrew, pip or via source.
dbt cloud is a hosted service with integrated CI/CD, monitoring and IDE. It is available as a single-license Developer plan for free, or a multiple-license Team plan for enterprise customers with a price of \$50/developer/month.
Are there any common gotchas?
DBT does not support Snowflake's materialized view.