Skip to main content

dbt

What is it?

dbt is an open-source command line tool designed to apply data transformations in a data pipeline. It compiles Jinja-wrapped SQL code to pure SQL and executes the code against a configured data warehouse.

Related tools include Airflow, Liquibase, Stitch and Flyway.

Who is it for?

  • Data engineering teams who need to transform raw data with models within elastic data warehouses like Redshift, BigQuery or Snowflake.
  • SQL-savvy data analysts in the analytics world who wish to leverage software engineering practices.

When should you use it?

dbt is best-suited to transforming data after it has loaded into your data warehouse. This means it is well-suited for cloud-based data warehousing with an ELT (extract-load-transform) approach, but isn't ideal for ETL (extract-transform-load) workflows.

Why should you use it?

  • dbt is easy to use with native SQL queries and only takes a few parameters to set up. No additional skills are required.
  • It generates boilerplate code to create tables and views with a create table as select approach. You can use the cast function in select queries to define data types of columns.
  • dbt looks after model references and dependencies automatically.
  • It augments SQL with programmatic features such as control structures and reusable macros.
  • It is easy to create multiple environments, such as for development and production.
  • dbt has built-in tests for schema and data, although performance might be a factor to consider if too many tests are written.

Why shouldn't you use it?

  • You cannot build infrastructure as code in dbt.
  • It only covers data transforms the T of ELT. You will need other tools to perform data extraction and load.
  • dbt may not be the best choice for working with data streams as it is primarily designed for batch-based processes.

How do you use it?

dbt CLI can be installed for free using homebrew, pip or via source.

dbt cloud is a hosted service with integrated CI/CD, monitoring and IDE. It is available as a single-license Developer plan for free, or a multiple-license Team plan for enterprise customers with a price of \$50/developer/month.

Are there any common gotchas?

DBT does not support Snowflake's materialized view.