Jinja for DBT

Jinja

Jinja is a templeting framework built in python which goes beyond simple usage of fstring in python to generate dynamic strings.
Quite often than not, we need our string to be populate during runtime, for example a welcome message like, “Hello John” for John and “Hello Jen Doe” for Jen Doe. Programmatically it’s not a complicate operation, one can always concatenate two or more string (same or of different datatypes). In python you can write something like this:

python
name = get_name() #Part of application logic
message = "Hello " + name
print(message)

## Better Approach
message = f"Hello {name}"
print(message)

Using fstring, ensures that as a programmer you don’t have to be worried about concatenating wrong datatype, since in python concatenating a string with int will result into an error. This works well unless you are trying to concatenate large strings , which will be the case for SQL Statements.

DBT

SQL is a powerful way to abstract tabular data which has been saving tons of time and codes while trying to get insights from your data; dbt is going to save even more time since we write a lot of repeated SQL statements. Software Engineers like nothing more than spending their time to save other’s time. Dbt can use Jinja templeting easily to inserts it’s patterns in order to make reusable SQL statements and I am going to show you how.

Jinja with DBT

Referencing a Source or another Model

In SQL we often need to query one or more tables to create a view or another table and more often then not, a lot of complicated transformations are required, let me show you an example.

with base as 
(SELECT idv, timestampv, attr1, attr2 
FROM db.schema.customers),

base_with_rank as
(
SELECT idv, timestampv, attr1, attr2, ROW_NUMBER() OVER (PARTITION BY idv ORDER BY timestampv desc) as rn
)

SELECT idv, timestampv, attr1, attr2
FROM base_with_rank where rn = 1;

In this case we are interested in the latest idv for one to many join, as this could be an append only data model, but the final code would explode with more business transformation and joins.

In such cases, having one query with layers of CTEs is inefficient and harder to maintain. You can create create temp views though , in which case maintaining many objects and it’s references is going to be a challenge.
With dbt, we can create separate models for upstream logic, and refer that in your main model.



SELECT idv, timestampv, attr1, attr2
FROM {{ ref('base_with_rank') }} where rn = 1 

In this case, base_with_rank is the dbt model name, and we are referring it using ‘{{ }}’. Another useful pattern is referencing a source ({{ source(‘source_name’, ‘table_name’) }}) defined in your dbt project.

Conditional Query

While developing, imagine if you want to put a limit or top block, however you don’t want it to be included when the model runs in production, you can do something like below.

SELECT idv, timestampv, attr1, attr2
FROM {{ ref('base_with_rank') }} where rn = 1
{% if this.database != 'your_prod_db' %}
LIMIT 100
{% endif %}  

In the query, this.database is used to access the current database, other useful variables are this.schema and this for the current model. {% code here %}, lets you insert flow controls.

Declaring a variable and Looping

We can declare variables using set keyword, Jinja supports, numbers, strings, list and map.

{% set table_details = {"sys_log": "logv", "tmp_log": "tempv", "boot_log" : "blog"} %}

{% for k,v in table_details.items() %}
SELECT {{ v }} from {{ k }}
{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}


-- SQL Compilation of above
SELECT logv from sys_log

UNION ALL

SELECT tempv from tmp_log

UNION ALL

SELECT blog from boot_log

With looping, we can save yourselves from writing tons of repeated codes especially in case of union and case statements. loop.last pattern allows me to write a separate logic for the end (useful in case of comma or keywords like union).
Just like map, we can loop over a list.

Miscellaneous Patterns

-- To stop additional lines use minus sign
{%- your logic -%}

-- To put comment
{#
This is a comment and it won't be rendered.
#}

-- else with if
{% set env = 'dev' %}
{% set time = 10 %}
{%- if env == 'dev' and time < 10 -%}
A logic
{%- elif env == 'dev' and time >= 10 -%}
A different logic
{%- else -%}
This one if nothing else
{%- endif -%}

Conclusion

Use Jinja in your dbt code to make your query compact and modular. If you come from SQL background and new to dbt, these pattern will help you identify SQL from the dbt code. Remember, everything is compiled into SQL hence understanding Jinja will help you understand the final SQL that will be executed on your data platform.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *