Topological Sort

December 31, 2025

Overview

Topological sort orders the nodes of a directed graph so that for every edge u → v, node u appears before node v. It’s the canonical solution for prerequisite-driven problems: build systems, dependency graphs, task pipelines, and course schedules.

Topological sort only applies to Directed Acyclic Graphs (DAGs). If a cycle exists, no valid ordering is possible.

Table of Contents

When You Need Topological Sort
Key Nuances (That Often Trip People Up)
Two Standard Algorithms
Kahn vs DFS: Trade-offs
Complexity
Handling Non-Unique Orders (Determinism)
Common Pitfalls (and Fixes)
Practical Example: Course Schedule
Beyond Basics: Scheduling With Parallelism
Alternatives and Related Concepts
Conclusion

When You Need Topological Sort
^

  • Build / compilation order
  • Task scheduling and workflows
  • Course prerequisites
  • ETL pipelines and DAG schedulers
  • Import/module dependency resolution
  • Spreadsheet recalculation order

Key Nuances (That Often Trip People Up)
^

  • Order is not unique — many valid answers may exist
  • Cycle detection is mandatory in real systems
  • Disconnected graphs are valid — results interleave components
  • Edge direction mattersu → v means u must come first
  • Adjacency lists are almost always the right representation

Two Standard Algorithms
^

1) Kahn’s Algorithm (In-degree / BFS-style)

  1. Compute in-degree for every node
  2. Enqueue all nodes with in-degree 0
  3. Pop nodes, reduce neighbors’ in-degree
  4. If nodes remain unprocessed → cycle exists
from collections import deque, defaultdict

def topo_sort_kahn(nodes, edges):
    adj = defaultdict(list)
    indeg = {n: 0 for n in nodes}

    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1

    q = deque([n for n in nodes if indeg[n] == 0])
    order = []

    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)

    if len(order) != len(nodes):
        raise ValueError("Graph has a cycle")

    return order

Nuance: Use a priority queue for deterministic output.

2) DFS Postorder (Stack-style)

Idea

  • DFS the graph
  • Add nodes to output after exploring children
  • Reverse at the end
  • Detect cycles via recursion stack
from collections import defaultdict

def topo_sort_dfs(nodes, edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)

    visiting = set()
    visited = set()
    order = []

    def dfs(u):
        if u in visiting:
            raise ValueError("Cycle detected")
        if u in visited:
            return
        visiting.add(u)
        for v in adj[u]:
            dfs(v)
        visiting.remove(u)
        visited.add(u)
        order.append(u)

    for n in nodes:
        if n not in visited:
            dfs(n)

    order.reverse()
    return order

Kahn vs DFS: Trade-offs
^

AspectKahn’s AlgorithmDFS Postorder
Cycle detectionRemaining nodes after processingBack-edge detection
DeterminismEasy with heapDepends on traversal order
Scheduling modelNatural “ready queue”Less explicit

Complexity
^

  • Time: O(V + E)
  • Space: O(V + E)

Handling Non-Unique Orders (Determinism)
^

  • Use a min-heap with Kahn’s algorithm
  • Sort adjacency lists for DFS

Trade-off: determinism often costs log V.

Common Pitfalls (and Fixes)
^

PitfallFix
Isolated nodes missingInitialize all nodes explicitly
Wrong edge directionUse prereq → dependent
Silent cyclesValidate full traversal

Practical Example: Course Schedule
^

Valid outputs are often multiple, and that’s expected.

Beyond Basics: Scheduling With Parallelism
^

Nodes with in-degree 0 can run in parallel. Kahn’s algorithm directly exposes this.

Alternatives and Related Concepts
^

  • Cycle detection only → SCC algorithms
  • Scheduling with durations → longest path in DAG
  • Priority ordering → weighted topological sort

Conclusion
^

Topological sort is foundational for dependency-driven systems. Understanding cycle handling, non-uniqueness, and algorithm trade-offs is what turns it from an interview topic into a production-grade tool.

Key Takeaways

  • Only DAGs can be topologically sorted
  • Ordering is often not unique
  • Kahn’s algorithm fits scheduling pipelines
  • DFS fits graph-analysis workflows