AIRFLOW

This course provides a practical introduction to Apache Airflow, an open-source platform used to programmatically author, schedule, and monitor data workflows. Students will learn how to build, manage, and optimize data pipelines using Directed Acyclic Graphs (DAGs), integrate Airflow with cloud services and databases, and apply best practices for automation and monitoring in modern data engineering.
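To make the DAG idea concrete, here is a minimal sketch in plain Python that does not use Airflow at all. It relies only on the standard-library `graphlib` module, and the task names (`extract`, `clean`, `load`, `report`) are hypothetical, chosen purely for illustration. It shows how a set of task dependencies determines a valid execution order, which is the core idea a workflow scheduler builds on.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "load": {"clean"},
    "report": {"load"},
}


def run_order(deps):
    """Return one valid execution order for the tasks in deps."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Because the graph must be acyclic, `TopologicalSorter` raises `graphlib.CycleError` if two tasks depend on each other, which mirrors why a workflow has to be a DAG in the first place.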

Why should I join?

Airflow helps you manage:

Cloud workflows

ETL/ELT processes

Data movement between systems

Machine learning pipelines

Please contact us if you have any questions about Vika Technologies courses or features.

Course Content

Airflow Introduction
3 Topics
What is Airflow?
Airflow vs other schedulers (Cron, Luigi)
Airflow architecture (Web Server, Scheduler, Worker, Metadata DB)
Installation and Setup
3 Topics
Installing Airflow with Pip, Docker, or in the Cloud
Understanding the Airflow UI
Airflow CLI basics
DAG Basics
4 Topics
What is a DAG?
Creating your first DAG in Python
DAG arguments: start_date, schedule_interval (schedule in Airflow 2.4+), default_args
Backfilling and catchup
Operators, Sensors, and Task Control
4 Topics
PythonOperator, BashOperator, DummyOperator
Sensors (e.g., FileSensor, ExternalTaskSensor)
Branching and conditional tasks
Task retries, SLA, and timeouts
Task Communication and Dependencies
3 Topics
Inter-task communication using XComs
Task dependencies with set_upstream() / >> / <<
Dynamic task mapping (Airflow 2.3+)
Scheduling and Monitoring
4 Topics
Scheduling vs Triggering
DAG runs and task instances
Monitoring and alerting with email or Slack
Logs and error handling
Deployment and Scaling
4 Topics
Running Airflow with Docker or Kubernetes
Airflow on AWS (MWAA), GCP (Cloud Composer), Azure
Managing connections and secrets
Scaling with CeleryExecutor or KubernetesExecutor
Integrations
4 Topics
S3, GCS, Azure Blob storage
Snowflake, BigQuery, Redshift
Databricks, Spark, Hadoop
SQL databases (PostgreSQL, MySQL)
Best Practices and Advanced Concepts
4 Topics
DAG versioning and modularization
CI/CD with Airflow
Airflow plugins and custom operators
Monitoring DAG health
Final Project Ideas
4 Topics
Daily ETL pipeline (CSV → Clean → Load to Database)
ML model retraining pipeline
Data warehouse pipeline with Snowflake or BigQuery
Real-time file ingestion using sensors
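As a rough starting point for the daily ETL project idea (CSV → Clean → Load to Database), the three steps can be sketched as plain Python functions before they are wrapped in Airflow tasks. This sketch uses only the standard library. The column names (`name`, `amount`), the `sales` table, and the in-memory SQLite database are assumptions made for illustration, not part of the course material.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean(rows):
    """Trim whitespace and drop rows with a missing name or amount."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        amount = row["amount"].strip()
        if not name or not amount:
            continue
        cleaned.append((name, float(amount)))
    return cleaned


def load(rows, conn):
    """Insert cleaned rows into the sales table and return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


# Hypothetical sample input: the second data row has no amount and is dropped.
sample = "name,amount\n alice ,10.5\nbob,\ncarol,3\n"
conn = sqlite3.connect(":memory:")
loaded = load(clean(extract(sample)), conn)
print(loaded)
```

Keeping each step a small function with explicit inputs and outputs makes it straightforward to test the logic on its own and later call each function from a separate task in a DAG.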
Includes
10 Lessons
37 Topics