Detecting deadlocks for Databases
August 12, 2025
Databases are so magical, and I wanted to learn more about the inner workings. I was reading up on locking, MVCC, and concurrency control, and fell down a deadlock rabbit hole. Once I had the problem statement, I wrote two approaches.
Problem statement
You’re given a set of locks and a transaction ID. Each lock has:
- holders: array of transaction IDs currently holding the lock
- waitQueue: array of transaction IDs waiting for the lock (FIFO)
A simple Python-ish struct:
class Lock:
def __init__(self, id, holders, waiters):
self.id = id
self.holders = holders
self.waiters = waiters
Each transaction may wait on at most one lock at a time. If a transaction is waiting on a lock, it's blocked by all transactions in that lock's holders.
Task: Given the set of locks and a target transaction ID, determine whether that transaction is currently in a deadlock.
Lock-based dependency graph
First pass: model locks as nodes and ask, “can the locks I need eventually become unlockable?” If the lock dependency chain has a cycle, I'm stuck.
How I do it:
- Get all the locks the target transaction is waiting on (roots).
- Build a map from txn to locks it's currently waiting on.
- Run DFS over locks; follow: lock |> its holders |> the locks those holders are waiting on.
Cycle check:
- Mark a lock as “visiting” (gray) when you enter it.
- If you see a gray lock again, that's a cycle |> deadlock.
- When done with a lock, mark it “visited” (black).
# We use a DFS with white/gray/black coloring encoded as:
# visited[lock_id] = False -> GRAY (currently exploring this lock)
# visited[lock_id] = True -> BLACK (finished exploring this lock)
# Unvisited locks are simply absent from 'visited'.
def detect_cycle_for_locks(lock, visited, graph) -> bool:
# Mark current lock as GRAY
visited[lock.id] = False
# For every transaction that currently HOLDS this lock...
for holder in lock.holders:
# ...look up every lock that holder is WAITING ON
# 'graph' maps txn_id -> List[Lock] it is waiting for
for lock_waiting in graph.get(holder, []):
# If we see a dependent lock that is already GRAY,
# we found a back edge |> cycle |> deadlock
if lock_waiting.id in visited and visited[lock_waiting.id] is False:
return True
# If not visited, recurse into that dependent lock
if lock_waiting.id not in visited:
if detect_cycle_for_locks(lock_waiting, visited, graph):
return True
# All dependencies explored; mark current as BLACK (done visiting)
visited[lock.id] = True
return False
def detect_deadlock_using_lock_graph(txn_id: str, locks: Dict[str, Lock]) -> bool:
# 1) Gather all locks that the target txn is CURRENTLY waiting on (roots)
locks_waiting_on = []
# 2) Build txn -> locks it is waiting on (the dependency graph we walk)
txn_id_to_wanted_locks = {}
for lock in locks.values():
# Root set: locks our txn is waiting for right now.
if txn_id in lock.waiters:
locks_waiting_on.append(lock)
# Populate wait map: each waiter txn -> list of locks it wants
for waiter in lock.waiters:
if waiter not in txn_id_to_wanted_locks:
txn_id_to_wanted_locks[waiter] = []
txn_id_to_wanted_locks[waiter].append(lock)
# 3) Run DFS from every root lock; if any DFS sees a cycle, it's a deadlock
for waiting_lock in locks_waiting_on:
visited = {}
if detect_cycle_for_locks(waiting_lock, visited, txn_id_to_wanted_locks):
return True
# No cycle found along any root path
return False
Transaction based dependency graph
The lock graph works, but it's more work than needed when I only care about one txn. A cleaner path is to build a wait-for graph: nodes are txns; edges go from a waiting txn to the txn(s) blocking it. Locks are just how we derive these edges.
Why this is nicer:
- Smaller graph (one node per active txn).
- Edges have direct meaning (“who blocks me”).
- Easy to enforce FIFO by adding edges to earlier waiters.
Basic idea:
- Build txn to blockers from the lock table.
- Add edges from later waiters to earlier waiters for FIFO.
- Run DFS from the target txn; back-edge |> deadlock.
# DFS over the transaction wait-for graph
def detect_cycle_for_transactions(txn_id, visited, blockers) -> bool:
visited[txn_id] = False # GRAY
for next_txn_id in blockers.get(txn_id, ()):
# back-edge to a GRAY node |> cycle (deadlock)
if next_txn_id in visited:
if visited[next_txn_id] is False:
return True
else:
if detect_cycle_for_transactions(next_txn_id, visited, blockers):
return True
visited[txn_id] = True # BLACK
return False
def detect_deadlock_using_transaction_graph(txn_id, locks) -> bool:
# Build wait-for graph: txn -> {txns blocking it}
transaction_blockers = {}
for lock in locks.values():
for waiter in lock.waiters:
if waiter not in transaction_blockers:
transaction_blockers[waiter] = []
transaction_blockers[waiter].extend(lock.holders)
# FIFO fairness: earlier waiters block later waiters
for i, w in enumerate(lock.waiters):
transaction_blockers[w].update(lock.waiters[:i])
# Run DFS from the given transaction
return detect_cycle_for_transactions(txn_id, {}, transaction_blockers)
Comparing the 2 algorithms:
- Lock graph: more literal (resources as nodes), but bigger; great for reasoning about resource chains.
- Transaction graph: smaller, faster, and directly encodes “who's waiting on whom.”