Category: Technical Notes

  • All Hail Linux:2: Finding your Way

    Objective

    In this chapter, I will help you get reasonably comfortable with the Linux GUI and terminal. I will cover basic commands like ls, cp, mv and cd, so that you can draw parallels between GUI-based actions and command-based actions.

    GNOME Desktop Environment (Linux)

    Getting around in your GUI is very straightforward. You can click Show Apps to see the installed apps. You get a lot of utilities when installing Ubuntu, such as LibreOffice and a calculator, and if you need more apps, go to App Center and browse for them. For example, if you need Postman, a tool for API testing and verification, you get the option to download it. You can also explore apps under different sections. Steam, made by Valve, is also available for Ubuntu if you are a gaming fan. App Center itself comes from Canonical, the company behind Ubuntu.

    By clicking on the top-right corner, you get direct access to Wi-Fi, Bluetooth, screenshots and so on. Click on Settings to adjust your computer preferences, for example to change the default apps.

    The Terminal

    While the GUI is good for regular desktop usage, the power of a Linux/Unix system comes from its terminal, which offers extensive tools to take full advantage of the machine’s hardware. To open a terminal on your Ubuntu system, you can either search for it or press Ctrl+Alt+T, which opens GNOME Terminal.

    By default, you land in your home directory. You can verify that by running the pwd command (type pwd in the terminal and press Enter). Run the ls command to list the contents of the directory. To change directory, use cd target_directory_name (e.g. cd Documents). Keep in mind that whatever you type in the terminal is case sensitive. Additionally, you can press Tab for auto-completion, or Tab twice for suggestions. Running cd on its own always takes you back to your home directory.
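
    The commands above can be tried safely in a throwaway directory. The sketch below is only an illustration: mktemp and mkdir are two commands not covered yet (they create a temporary directory and new folders), and the variable name playground and the folder names are my own choices for the example.

```shell
# Make a throwaway playground so nothing in your real home directory is touched.
# The variable name "playground" and the folder names are made up for this example.
playground=$(mktemp -d)
mkdir "$playground/Documents" "$playground/Downloads"

cd "$playground"      # change into the playground
pwd                   # print the directory we are in now
ls                    # list its contents: Documents and Downloads

cd Documents          # names are case sensitive, so "cd documents" would fail
pwd                   # the path now ends in /Documents

# Typing cd with nothing after it would take you back to your home directory.
```

    Try pressing Tab after typing cd Doc inside the playground to see auto-completion fill in the rest of the name.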

    Relative vs Absolute Path

    Since we are just starting, I won’t burden you with the details of Linux file systems, but allow me to tell you just this much: everything starts from / (known as the root directory). On my machine, the output of the pwd command is /home/alok, which means that inside root (/) there is a home directory, and inside home there is an alok directory. If I run ls, I can see other directories like Downloads, Desktop, etc.
    If I am in my home directory (/home/alok) and I want to go to the subdirectory Desktop, I can do that in two ways:

    1. Using relative path: cd Desktop
    2. Using absolute path: cd /home/alok/Desktop

    Once I am inside the Desktop directory and decide to move to the Downloads directory, I can again do that in two ways:

    1. Using relative path: cd ../Downloads
    2. Using absolute path: cd /home/alok/Downloads

    .. (double dots) is the notation for the parent directory, while . (single dot) is the notation for the current directory.

    Absolute paths are easier to understand once you see that every directory can be traced from the root.
    To make sense of relative paths, try running ls ../.. and see what the output is.

    If you feel lost, don’t forget to run pwd to check your current directory. You can get the same information from your prompt text if it is set up correctly.
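
    To see that both kinds of path lead to the same place, here is a small sketch that builds a pretend copy of the tree (home/alok/Desktop and home/alok/Downloads) inside a temporary directory, so it works on any machine. The variable names root, via_relative and via_absolute are made up for this illustration, and mktemp and mkdir are helper commands not covered yet.

```shell
# Build a small pretend tree: <root>/home/alok/Desktop and <root>/home/alok/Downloads
root=$(mktemp -d)
mkdir -p "$root/home/alok/Desktop" "$root/home/alok/Downloads"

cd "$root/home/alok/Desktop"     # absolute path: spelled out from the top
cd ../Downloads                  # relative path: up one level, then into Downloads
via_relative=$(pwd)              # remember where the relative path took us

cd "$root/home/alok/Downloads"   # same destination, this time by absolute path
via_absolute=$(pwd)

echo "$via_relative"
echo "$via_absolute"             # both lines show the same directory

ls ../..                         # two levels up from Downloads is "home", so this lists alok
```

    The relative path only works because of where we were standing when we typed it; the absolute path works from anywhere.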

    Date with Who?

    If you are in a terminal and don’t have access to the GUI (maybe because you have connected to the machine over SSH or you are in multi-user.target), you can use the date command to keep track of the time. The output of the date command is a human-readable date and time along with the time zone.

    If you want to check the logged username, run whoami.
    The output will be a simple print of username.

    If you want to run something akin to a Hello World program, then run
    echo "Hello World"
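
    Here is a small sketch that puts the three commands together. The variable names now and greeting are my own, and the + format string passed to date is just one possible way to choose which parts of the date get printed; $( ) runs a command and captures its output.

```shell
# Capture the current date and time in a chosen format (year-month-day hour:minute)
now=$(date "+%Y-%m-%d %H:%M")

# Store the text in a variable, then print it with echo
greeting="Hello World"
echo "$greeting"

# Commands can be embedded inside the text that echo prints
echo "Logged in as: $(whoami)"
echo "Time now: $now"
```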

    It’s a wrap

    Alright then, in this blog I tried to make you familiar with your GNU/Linux laptop.
    We started our long journey of Linux shell commands (what is shell? We shall know soon.)
    Let me list the commands we have learned so far:

    1. pwd (prints the working directory)
    2. ls (lists the contents of the working directory)
    3. cd (changes to a directory)
    4. whoami (prints the username)
    5. echo "Hello World" (prints Hello World in the terminal)

    In the next blog, I will go deeper into the commands we now know, and I will teach you how to get help and documentation. See you then.

  • All Hail Linux:1: Getting Started

    The Linux System Administrator

    My first encounter with a Unix-like command environment was in my college computer lab. I was awestruck by a completely different way of using computers, but it was six years later that I became a fully fledged Unix/Linux user. My first laptop crashed due to overheating and, coincidentally, just two months later I was inducted into a project where the ETL jobs were scheduled on a proprietary Unix system. I could use my laptop only in Windows recovery mode, so I got the idea of running a lightweight 32-bit Ubuntu on it. This was a game changer for me: not only did I love Ubuntu, but I also got a free terminal to practice Linux commands, Vim and more, which directly benefited my work. Fast forward to today, I have a Dell G15 (32 GB RAM and a 20-thread processor) running Ubuntu, while my old HP laptop (8 GB RAM and a 4-thread processor) is running Arch Linux.

    To understand my laptop more deeply, I took and cleared the Red Hat Certified System Administrator certification exam, which removed any fear I had of an unstable system, though to be honest, Linux is the most stable operating system I have used. Much of the internet runs on Linux systems (remember that web links use forward slashes, e.g. amazon.in/gp/css/order-history), Android uses the Linux kernel, and supercomputers run Linux. All Hail Linux! Need I go any further?

    The Secure System

    I am running Arch Linux on my laptop, and I host it as a personal server that I can connect to from the office or while traveling, maybe to run Apache Spark or something else. I don’t have a static IP, so I just check my public IP to connect to my server, and I had set up port forwarding (port 22 on my network to port 22 on my Arch Linux machine). I didn’t know this at the time, but that is pretty insecure, since 22 is the well-known SSH port. As I found out recently, there were tens of attempts every day to break into my Arch Linux machine. I analyzed my logs, and evidently none of them succeeded. Despite thousands of attempts over six months, no one could break into my system, all thanks to the default configuration of the SSH server. I have since made it more secure and changed the port mapping, and there have been zero attempts since then. Still, I am proud that my system in its default state was able to fend off all the attacks.

    The Objective of this Series

    While I was preparing for the RHCSA exam, I realized that a lot of learning material for Linux is closely guarded, which directly contradicts the open principles Linux follows. Additionally, I felt bad that people teaching Linux are often not running Linux themselves. Hence, I not only want to teach Linux but also want to inspire folks to adopt Linux (unless they are using a Mac), and a by-product of these teachings can be a successful attempt at the Red Hat or other system administration examinations.

    The Course Outline

    • Basic Linux Commands.
    • Linux Utility Commands.
    • File Management.
    • Process Management.
    • Storage Management.
    • Software Management.

    Installing Linux

    While there are many flavors of Linux available, the top two options I would suggest for beginners are:

    1. Ubuntu

    Ubuntu by Canonical is designed for easy installation and desktop usage. Additionally, you can get a free Ubuntu Pro subscription for personal use, which provides extended security updates. Download the installable ISO from the "Download Ubuntu Desktop" page on the Ubuntu website.

    2. Red Hat Enterprise Linux

    Red Hat is widely used, and hence you get a lot of software through the Red Hat repos. You need to register your RHEL installation to download software through the dnf command. Get your installer from the Red Hat Developer site after registering.

    Installation General Steps

    • Create a bootable USB device using an app like Rufus. On a Linux machine, you can do the same yourself using the dd command:
      dd if=/path_to/osfile.iso of=/dev/sdx bs=4M
      Here /dev/sdx is the USB device name, which you can look up with the lsblk command (I am going to cover this later). Double-check the device name, because dd overwrites whatever is on the target. My point here is that you don’t need anything external: the tool is built into your Linux system, and the same goes for a lot of things.
    • Ensure that Secure Boot is off, then open the boot menu (pressing Esc, F9, F11 or F12 usually works; check online for your own computer) and boot from the USB device created in step one.
    • The bootable USB device starts a Linux system, which also installs Linux on the computer’s HDD/SSD. Installer programs are fairly straightforward, though the installation may take some time to finish (up to 20-30 minutes). Once installation is done, the installer will ask you to remove the boot device and restart. Once you do that, the Linux kernel will be loaded from your computer’s hard drive (or SSD).
    • If you don’t see Linux loading, check the boot sequence. It can be accessed through the same boot menu; make sure your Linux installation is at the top so that it loads automatically. It is worth noting, though it likely won’t be an issue.
    • Finally, there are tons of videos online; check them out to make sure you are following the installation instructions correctly. Don’t forget to back up your data, because the installation will wipe your disk clean.
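
    If you would like to see how dd behaves before pointing it at a real USB stick, the following is a harmless sketch in which ordinary files stand in for both the ISO and the device. The names workdir, iso and target, the 8 MiB size, and the file names are assumptions made for the illustration; /dev/zero is a standard source of zero bytes, and cmp compares two files byte by byte.

```shell
# Stand-ins for the real ISO and the USB device; both are ordinary files here.
workdir=$(mktemp -d)
iso="$workdir/osfile.iso"
target="$workdir/fake_usb.img"

# Create an 8 MiB dummy "ISO" filled with zeros (8 blocks of 1 MiB each).
dd if=/dev/zero of="$iso" bs=1M count=8 2>/dev/null

# Same shape as the real command: if= is the input, of= is the output,
# bs= is how many bytes dd reads and writes at a time.
dd if="$iso" of="$target" bs=4M 2>/dev/null

# cmp exits successfully only when the two files are byte-for-byte identical.
cmp "$iso" "$target" && echo "copy verified"
```

    With a real installer, the of= part would be the USB device (for example the /dev/sdx found with lsblk), and the command usually has to be run with root privileges.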

    Final Thoughts

    Stick with me, and I will do my best at teaching and recording the sessions. I will run the commands on a bare-metal Ubuntu instance or a virtual RHEL instance. I won’t cover any Unix-specific concepts, since my focus is on Linux.
    I am also not going to start a history session. However, if you feel like you need to know a bit about the Linux origin story, the following video is one of the best on YouTube:
    https://youtu.be/obJOwEy62bk?si=81S26Pn5kiVBKQRI

    I think we are all set for the next blog where I deep dive into the Linux Terminal.

  • Jinja for DBT

    Jinja

    Jinja is a templating engine written in Python, which goes beyond the simple use of f-strings in Python to generate dynamic strings.
    More often than not, we need our strings to be populated at runtime, for example a welcome message like “Hello John” for John and “Hello Jen Doe” for Jen Doe. Programmatically, it’s not a complicated operation: one can always concatenate two or more values (strings, or other datatypes converted to strings). In Python you can write something like this:

    python
    name = get_name()  # part of the application logic; assumed to return a string
    message = "Hello " + name
    print(message)
    
    # Better approach
    message = f"Hello {name}"
    print(message)

    Using f-strings means that, as a programmer, you don’t have to worry about concatenating the wrong datatype, since in Python concatenating a string with an int using + raises a TypeError. This works well until you need to build large strings, which is the case for SQL statements.

    DBT

    SQL is a powerful way to work with tabular data, and it has been saving tons of time and code when getting insights from data. dbt saves even more time, since we write a lot of repetitive SQL statements. Software engineers like nothing more than spending their time to save others’ time. dbt uses Jinja templating to insert its patterns and make SQL statements reusable, and I am going to show you how.

    Jinja with DBT

    Referencing a Source or another Model

    In SQL we often need to query one or more tables to create a view or another table and, more often than not, a lot of complicated transformations are required. Let me show you an example.

    with base as 
    (SELECT idv, timestampv, attr1, attr2 
    FROM db.schema.customers),
    
    base_with_rank as
    (
    SELECT idv, timestampv, attr1, attr2, ROW_NUMBER() OVER (PARTITION BY idv ORDER BY timestampv desc) as rn
    FROM base
    )
    
    SELECT idv, timestampv, attr1, attr2
    FROM base_with_rank where rn = 1;

    In this case we want the latest record for each idv (for example, before a one-to-many join), since this could be an append-only data model. The final code, however, would explode with more business transformations and joins.

    In such cases, having one query with layers of CTEs is inefficient and harder to maintain. You could create temp views instead, but then maintaining many objects and their references becomes a challenge.
    With dbt, we can create separate models for the upstream logic and reference them in the main model.

    
    
    SELECT idv, timestampv, attr1, attr2
    FROM {{ ref('base_with_rank') }} where rn = 1 

    In this case, base_with_rank is the dbt model name, and we reference it by calling ref inside {{ }}. Another useful pattern is referencing a source ({{ source('source_name', 'table_name') }}) defined in your dbt project.

    Conditional Query

    Imagine that, while developing, you want to add a LIMIT or TOP clause, but you don’t want it to be included when the model runs in production. You can do something like the following.

    SELECT idv, timestampv, attr1, attr2
    FROM {{ ref('base_with_rank') }} where rn = 1
    {% if this.database != 'your_prod_db' %}
    LIMIT 100
    {% endif %}  

    In the query, this.database gives the database of the current model. Other useful variables are this.schema and this itself, which refers to the current model. The {% ... %} block lets you insert flow control such as if and for.

    Declaring a variable and Looping

    We can declare variables using the set keyword. Jinja supports numbers, strings, lists and maps (dictionaries).

    {% set table_details = {"sys_log": "logv", "tmp_log": "tempv", "boot_log" : "blog"} %}
    
    {% for k,v in table_details.items() %}
    SELECT {{ v }} from {{ k }}
    {% if not loop.last %}
    UNION ALL
    {% endif %}
    {% endfor %}
    
    
    -- SQL Compilation of above
    SELECT logv from sys_log
    
    UNION ALL
    
    SELECT tempv from tmp_log
    
    UNION ALL
    
    SELECT blog from boot_log

    With looping, we can save ourselves from writing tons of repeated code, especially for UNION and CASE statements. The loop.last pattern lets us write separate logic for the final iteration (useful for commas or keywords like UNION).
    Just like a map, we can loop over a list.

    Miscellaneous Patterns

    -- To strip the extra whitespace and blank lines around a block, use a minus sign
    {%- your logic -%}
    
    -- To add a comment
    {#
    This is a comment and it won't be rendered.
    #}
    
    -- else with if
    {% set env = 'dev' %}
    {% set time = 10 %}
    {%- if env == 'dev' and time < 10 -%}
    A logic
    {%- elif env == 'dev' and time >= 10 -%}
    A different logic
    {%- else -%}
    This one if nothing else
    {%- endif -%}

    Conclusion

    Use Jinja in your dbt code to make your queries compact and modular. If you come from a SQL background and are new to dbt, these patterns will help you tell the SQL apart from the Jinja in dbt code. Remember, everything is compiled into SQL, so understanding Jinja will help you understand the final SQL that is executed on your data platform.