CodeWords raises $9M seed round
BlogResearch

What is a DAG in data engineering? graph basics

What is a DAG in data engineering — directed acyclic graphs explained with real pipeline examples. How DAGs power Airflow, dbt, and workflow orchestration.

Aymeric ZhuoAymeric Zhuo2 min read
What is a DAG in data engineering? graph basics

What is a DAG in data engineering? Graph basics explained

A DAG in data engineering is a directed acyclic graph — a way of defining tasks and their dependencies so that every task runs after its prerequisites and no circular dependencies exist. "Directed" means edges have a direction (Task A runs before Task B). "Acyclic" means there are no loops (Task B cannot also be a prerequisite of Task A).

The metaphor is a recipe. You cannot frost a cake before you bake it, and you cannot bake it before you mix the batter. A DAG formalizes that ordering for data pipelines. Unlike generic AI automation posts, this guide shows real CodeWords workflows — not just theory.

Why do data engineers use DAGs?

DAGs solve three problems: Dependency management (ensuring source tables are fresh before joins run), parallelism (tasks without dependencies can run simultaneously), and failure isolation (when a task fails, independent branches can continue).

Apache Airflow, the most widely used workflow orchestrator in data engineering, structures every pipeline as a DAG. According to the 2024 State of Data Engineering survey, Airflow remains the leading orchestration tool, used by over 60% of data teams.

How is a DAG structured?

A DAG has nodes (tasks — extractions, transformations, API calls) and edges (dependencies — directed connections defining execution order). Example: Node 1 (Extract from PostgreSQL) and Node 2 (Extract from Salesforce API) run in parallel, then Node 3 (Transform and join) waits for both, then Node 4 (Load to warehouse) and Node 5 (Send Slack notification) run sequentially.

What tools use DAGs?

Apache Airflow defines DAGs in Python. dbt automatically constructs a DAG from ref() functions in SQL models. Prefect and Dagster offer DAG-based orchestration with different developer experience trade-offs. CodeWords workflows are implicitly DAG-structured — each step depends on the output of prior steps, and the platform adds LLM access at any node.

What happens when a DAG has a cycle?

It is no longer a DAG. Cycles create infinite loops: Task A triggers Task B, which triggers Task A again. Orchestration tools reject cyclic definitions at compile time. If your workflow genuinely needs iterative behavior (polling an API until a condition is met), handle this with a loop within a single task node.

FAQ

Is a DAG the same as a workflow?

A DAG is a data structure. A workflow is a business concept. Workflows are often implemented as DAGs, but simple linear sequences are technically DAGs too.

How does a DAG relate to data lineage?

The DAG defines the plan — which tasks run in what order. Data lineage records the execution — which data flowed through which steps. The DAG is the blueprint, lineage is the construction log.

Build data workflows with DAG-structured execution in CodeWords. Explore pipeline patterns at CodeWords templates.

Get started today

Your first agent is free to build.

Describe what you need. Cody handles the build, the connections, and the deployment.