How Neo4j Stores Graphs on Disk
systems-internals Apr 2024 1 min
Why storage internals matter
Understanding how your database stores data tells you exactly why some queries are fast and others are not. Neo4j is no exception.
The native graph storage model
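Neo4j stores each node's relationships as a doubly linked list. Each node record holds a pointer to its first relationship; each relationship record holds pointers to the previous and next relationship in its start node's chain and in its end node's chain.
This is why a local traversal in Neo4j costs O(1) per hop: reaching the next relationship means following a pointer to a fixed-size record, not scanning a relationship table. The cost of a hop does not depend on the total size of the graph, although listing all of a node's relationships still takes time proportional to its degree.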
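The pointer layout described above can be sketched in memory. This is a minimal illustration, not Neo4j's implementation: the names GraphStore, NodeRecord, RelRecord and NO_REL are hypothetical, record IDs are list indexes, and new relationships are inserted at the head of each chain as an assumption made for simplicity.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

NO_REL = -1  # hypothetical sentinel meaning "no relationship"


@dataclass
class NodeRecord:
    first_rel: int = NO_REL  # head of this node's relationship chain


@dataclass
class RelRecord:
    start: int
    end: int
    rel_type: str
    # One prev/next pair per endpoint: the record sits in two chains at once.
    start_prev: int = NO_REL
    start_next: int = NO_REL
    end_prev: int = NO_REL
    end_next: int = NO_REL


class GraphStore:
    def __init__(self) -> None:
        self.nodes: List[NodeRecord] = []
        self.rels: List[RelRecord] = []

    def add_node(self) -> int:
        self.nodes.append(NodeRecord())
        return len(self.nodes) - 1

    def add_rel(self, start: int, end: int, rel_type: str) -> int:
        rel_id = len(self.rels)
        self.rels.append(RelRecord(start, end, rel_type))
        # A self-loop is linked once, through its start-side pointers.
        for node_id in ((start,) if start == end else (start, end)):
            self._link_at_head(node_id, rel_id)
        return rel_id

    def _link_at_head(self, node_id: int, rel_id: int) -> None:
        rel = self.rels[rel_id]
        old_head = self.nodes[node_id].first_rel
        if rel.start == node_id:
            rel.start_next = old_head
        else:
            rel.end_next = old_head
        if old_head != NO_REL:
            head = self.rels[old_head]
            if head.start == node_id:
                head.start_prev = rel_id
            else:
                head.end_prev = rel_id
        self.nodes[node_id].first_rel = rel_id

    def relationships(self, node_id: int) -> Iterator[Tuple[int, RelRecord]]:
        """Walk one node's chain; no other node's records are touched."""
        rel_id = self.nodes[node_id].first_rel
        while rel_id != NO_REL:
            rel = self.rels[rel_id]
            yield rel_id, rel
            # Follow the "next" pointer that belongs to this node's chain.
            rel_id = rel.start_next if rel.start == node_id else rel.end_next

    def neighbours(self, node_id: int, rel_type: str) -> Iterator[int]:
        """Targets of outgoing relationships of the given type."""
        for _, rel in self.relationships(node_id):
            if rel.rel_type == rel_type and rel.start == node_id:
                yield rel.end
```

Calling neighbours(a, "KNOWS") only ever reads the records in a's own chain, which is the property the O(1)-per-hop claim rests on. Because this sketch inserts at the head, the most recently added relationship is returned first.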
Property storage
Properties are stored in a separate file, not in the node or relationship records. Each node or relationship record holds only a pointer to the first property in a linked list. This keeps node and relationship records small and fixed-width, so the location of a record can be computed directly from its ID, which makes random seeks fast.
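To make the fixed-width argument concrete, here is a small sketch of two record files. The byte layouts, field names and the NO_ID sentinel are invented for illustration and are not Neo4j's actual on-disk format; the point is only that a record's offset is its ID times the record size, and that properties hang off a separate chain.

```python
import io
import struct

NO_ID = -1  # hypothetical sentinel meaning "no record"

# Illustrative node record: in-use flag, first relationship id, first property id.
NODE_RECORD = struct.Struct("<?ii")  # 9 bytes, no padding
# Illustrative property record: key id, integer value, next property id.
PROP_RECORD = struct.Struct("<iqi")  # 16 bytes, no padding


def write_node(f, node_id, first_rel, first_prop):
    f.seek(node_id * NODE_RECORD.size)  # offset computed from the id alone
    f.write(NODE_RECORD.pack(True, first_rel, first_prop))


def read_node(f, node_id):
    f.seek(node_id * NODE_RECORD.size)
    _in_use, first_rel, first_prop = NODE_RECORD.unpack(f.read(NODE_RECORD.size))
    return first_rel, first_prop


def write_prop(f, prop_id, key_id, value, next_prop):
    f.seek(prop_id * PROP_RECORD.size)
    f.write(PROP_RECORD.pack(key_id, value, next_prop))


def read_properties(node_file, prop_file, node_id):
    """Follow the node's property chain through the separate property file."""
    _first_rel, prop_id = read_node(node_file, node_id)
    props = {}
    while prop_id != NO_ID:
        prop_file.seek(prop_id * PROP_RECORD.size)
        key_id, value, next_prop = PROP_RECORD.unpack(
            prop_file.read(PROP_RECORD.size)
        )
        props[key_id] = value
        prop_id = next_prop
    return props


# Usage: in-memory buffers stand in for the two store files.
node_file, prop_file = io.BytesIO(), io.BytesIO()
write_prop(prop_file, 0, key_id=1, value=30, next_prop=1)
write_prop(prop_file, 1, key_id=2, value=1984, next_prop=NO_ID)
write_node(node_file, 0, first_rel=NO_ID, first_prop=NO_ID)
write_node(node_file, 1, first_rel=7, first_prop=0)
node_one_props = read_properties(node_file, prop_file, 1)
```

Reading a node never touches property bytes unless the query asks for them, which is the trade-off the separate file buys: small node records, at the price of an extra pointer chase per property access.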
What this means for queries
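- Pattern matching (MATCH (a)-[:KNOWS]->(b)) is fast once a starting node is known, because it walks pointers instead of performing joins.
- Full-graph scans are slow for the complementary reason: the pointer chains only help from a known starting node, and the record store itself keeps no graph-wide index on relationship type. Recent Neo4j versions maintain a separate token lookup index for relationship types; without it, finding every relationship of one type means reading every relationship record.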
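The contrast between the two cases can be shown by counting how many relationship records each strategy reads. This is a toy model under stated assumptions: relationships are plain (start, end, type) tuples, a dictionary of per-node ID lists stands in for the pointer chains, and the function names are hypothetical. It models a scan with no starting node and no index, not what Neo4j's planner actually does.

```python
from collections import defaultdict


def build_chains(rels):
    """Map each node to the ids of the relationships it takes part in."""
    chains = defaultdict(list)
    for rel_id, (start, end, _rel_type) in enumerate(rels):
        chains[start].append(rel_id)
        if end != start:
            chains[end].append(rel_id)
    return chains


def expand_from(rels, chains, node, rel_type):
    """Anchored expansion: reads only the records in one node's chain."""
    hits, touched = [], 0
    for rel_id in chains[node]:
        touched += 1
        start, end, t = rels[rel_id]
        if t == rel_type and start == node:
            hits.append(end)
    return hits, touched


def scan_by_type(rels, rel_type):
    """Unanchored, unindexed scan: reads every relationship record."""
    hits, touched = [], 0
    for start, end, t in rels:
        touched += 1
        if t == rel_type:
            hits.append((start, end))
    return hits, touched


# One KNOWS relationship hidden among 1000 unrelated LIKES relationships.
rels = [(0, 1, "KNOWS")] + [(i, i + 1, "LIKES") for i in range(10, 1010)]
chains = build_chains(rels)
anchored = expand_from(rels, chains, 0, "KNOWS")
scanned = scan_by_type(rels, "KNOWS")
```

Both calls find the same single relationship, but the anchored expansion reads one record while the scan reads all 1001. Growing the number of LIKES relationships changes only the second figure.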