Database Indexing Explained: B-Tree, Hash, Geospatial Indexes, and Inverted Indexes

If query performance is the difference between a product that feels instant and one that feels broken, then database indexing is one of the most important topics a software engineer, data engineer, or backend developer can master. An index is not just a database feature; it is a deliberate data-access strategy.

This guide explains what is indexing, how databases locate rows when no index exists, how indexes reduce query cost, and where different index structures fit in practice. It also goes beyond generic tutorials by separating classic row-oriented indexes from spatial indexing strategies and inverted indexing for search-heavy systems. The goal is accuracy first, but also practical clarity so you can make better design choices in production.

What Is Indexing in a Database?

A database index is a separate data structure that helps the engine find matching rows faster than scanning the entire table. PostgreSQL describes indexes as a way to find and retrieve specific rows much faster, while also noting that indexes add write and storage overhead and therefore must be used deliberately. In other words, an index is a performance optimization, not a free feature.

A useful mental model is this: a table is like the full content of a book, while an index is like the book’s back-of-book lookup structure. Without the index, the database may need to inspect row after row. With the right index, it can narrow the search to a much smaller set of locations before touching the actual data pages.

How to Find a Row in a Database Without an Index

Without an index, the database typically uses a sequential scan or full table scan, meaning it reads table data block by block and checks rows against the filter condition. PostgreSQL’s EXPLAIN documentation explicitly distinguishes sequential scans from index scans when showing how a query will be executed. This is why a query like WHERE email = 'a@b.com' can become expensive on a large table if email is not indexed.

That does not mean a sequential scan is always bad. If a query touches a large percentage of rows, databases may prefer scanning the table directly because jumping through an index and then back to table pages can cost more random I/O than reading data in order. PostgreSQL documentation notes this tradeoff explicitly in partitioning guidance.

How Do Indexes Speed Up Queries?

Indexes speed queries by organizing lookup keys in a structure that is cheaper to traverse than scanning every row. Instead of evaluating the predicate against the whole table, the engine first searches the index, gets matching row locations, and then fetches only those rows from the table. PostgreSQL’s index-scanning documentation explains that an index scan returns tuple IDs for matching rows, after which the engine fetches the corresponding tuples from the parent table.

The performance gain depends on three things: the index algorithm, the selectivity of the predicate, and whether the database can satisfy the query from the index alone or must visit the base table as well. PostgreSQL also supports index-only scans in some situations, which can avoid heap access when visibility requirements are satisfied.

Core Database Indexing Techniques You Should Actually Understand

B-Tree Index

The B-Tree index is the default and most common general-purpose index in major databases. PostgreSQL says B-Tree indexes support equality and range queries on data that can be sorted, and MySQL documentation similarly notes that the B-tree structure can quickly find a specific value, a set of values, or a range of values for predicates such as =, >, <=, BETWEEN, and IN. That is why B-Tree is usually the first choice for primary keys, unique keys, timestamps, prices, ordered IDs, and many common WHERE clauses.

Why it works well:

It preserves sorted order.
It supports equality and range filtering.
It can also help with ordered retrieval, such as ORDER BY created_at.

Typical use cases:

WHERE user_id = ?
WHERE created_at BETWEEN ...
ORDER BY created_at DESC
unique constraints and primary-key style access patterns.

Where it is not ideal:

highly specialized text search
pure equality lookups on some engines where hash indexing is more appropriate
complex spatial geometry predicates.

Hash Index

A Hash Index stores a hash code derived from the indexed value rather than maintaining sorted order. PostgreSQL documentation states that hash indexes support only simple equality comparisons, are single-column, and do not provide uniqueness checking. PostgreSQL also explains that hash indexes can be attractive for large equality-heavy workloads because they access bucket pages directly instead of descending a tree.

This leads to a clean rule: use a Hash Index only when your workload is dominated by exact-match predicates such as WHERE session_id = ?, and only after confirming your database engine actually implements and optimizes hash indexes well. For many production systems, B-Tree still wins by being more flexible, because it covers equality plus ranges and ordering.

What Hash Index does well:

exact equality lookups
unsorted data types that do not need range semantics.

What it does not do well:

range scans
sorted retrieval
prefix and interval-style queries.

Geospatial Indexing: The Three Models Engineers Commonly Confuse

Spatial indexing deserves its own discussion because latitude/longitude, polygons, and shapes do not behave like ordinary scalar values. A normal B-Tree on a geometry column is usually not what you want. PostGIS explicitly notes that when geometry is bound to GiST, the default spatial index behaves as an R-Tree, and that creating a standard PostgreSQL B-Tree on a geometry column will not help spatial queries.

Geohashing

Geohashing is a spatial encoding strategy that converts geographic coordinates into string-like cells representing progressively smaller bounding areas. In practice, it is widely used in search and distributed geo workloads for bucketing, partitioning, and proximity pre-filtering. Elastic’s geohash-grid aggregation shows this model directly: geo fields can be aggregated into geohash-based cells at different precision levels.

The important technical nuance is this: Geohashing is excellent for approximate spatial grouping, nearby search pre-bucketing, and partitioning by location, but it is not the same thing as an exact geometry index for arbitrary polygon intersection. It is best thought of as a grid-encoding strategy that narrows the search space before precise geometry checks.

Use Geohashing when:

you need fast nearby bucketing
you are clustering points geographically
you want location-based partition keys or coarse geospatial filtering.

Quadtrees

A Quadtree recursively divides a 2D space into four quadrants, then subdivides again where density or complexity requires it. Oracle documentation shows that spatial indexes can be implemented as quadtree indexes, although Oracle strongly encourages R-Tree for most spatial workloads today. GEOS documentation also describes quadtree as a structure for efficient querying of 2D rectangles, typically returning candidates that then need exact secondary filtering.

That is the key architectural point: Quadtrees are good for hierarchical 2D decomposition, especially point-heavy or region-based workloads, but they often act as a primary filter rather than the final truth for exact spatial predicates. In modern production databases, they are more important as an algorithmic concept than as the default index you create every day.

Use Quadtrees when:

your data is naturally 2D and benefits from recursive tiling
you are building custom geo engines, map services, or point-region lookup systems
you want adaptive subdivision rather than fixed buckets.

R-Trees

R-Trees are the classic answer for indexing rectangles, bounding boxes, and more general spatial objects. PostgreSQL’s GiST documentation says that R-trees and other indexing schemes can be implemented through GiST, while PostGIS states that a GiST geometry index behaves as an R-Tree. SQLite’s R*Tree module likewise documents its support for R-tree-family indexing for spatial searching and interval overlap patterns.

This is why R-Trees remain the most important spatial index to understand for practical database work. They are designed around minimum bounding rectangles, making them a strong fit for predicates such as intersects, overlaps, containment, and window queries. Oracle’s current spatial guidance also recommends spatial R-tree indexes for efficient access.

Use R-Trees when:

you store polygons, lines, bounding boxes, or mixed geometry
you need spatial predicates like intersects, within, overlap, or nearest-neighbor style workflows
you are using PostGIS, SQLite R*Tree, or Oracle Spatial-style geometry systems.

Inverted Index: The Right Model for Search, Arrays, and Multi-Value Fields

An Inverted Index flips the lookup direction. Instead of mapping one row to one value, it maps each term or component value to the rows or documents that contain it. PostgreSQL’s GIN documentation defines GIN as a generalized inverted index for composite values, such as documents containing words or arrays containing elements. PostgreSQL’s text-search docs further note that GIN indexes are preferred for text search because they keep an entry for each lexeme with a compressed list of matches.

This is fundamentally different from B-Tree and Hash Index. If your query is “find rows where this word appears,” or “find documents containing these terms,” or “find rows whose JSON/array field contains this value,” an Inverted Index is often the correct structure. In search engines, the same principle powers fast full-text retrieval at scale. Elastic documentation also references information added to the inverted index for searchable field types.

Use Inverted Index when:

you need full-text search
you search arrays, tags, JSON elements, or tokenized content
one row contains many searchable components.

When to Choose Which Index

If your column is scalar and sortable, start with B-Tree. If your workload is dominated by exact-match equality and your engine meaningfully supports hash indexing, evaluate Hash Index. If your problem is about geometry and location, think in terms of R-Trees, with Geohashing and Quadtrees as spatial partitioning or filtering strategies depending on the engine and use case. If your data is documents, tokens, tags, arrays, or lexemes, an Inverted Index is usually the right mental model.

A practical summary:

B-Tree: best default for equality, ranges, and ordering.
Hash Index: exact equality only.
Geohashing: coarse geo partitioning and bucketing.
Quadtrees: recursive 2D decomposition, often as a primary filter.
R-Trees: practical geometry indexing for spatial predicates.
Inverted Index: text, arrays, JSON, and document search.

What Most Engineers Get Wrong About Database Indexing

The most common mistake is treating indexing as a checklist item instead of a workload decision. Adding an index to every column is not optimization; it is often a write penalty waiting to happen. PostgreSQL explicitly warns that indexes add system overhead. The second mistake is picking a familiar index type for the wrong data shape, such as using scalar-thinking for geometry or using B-Tree for search-heavy text workloads that really need an Inverted Index.

The best indexing strategy starts with query patterns:

What predicates are used most often?
Are the lookups equality, range, containment, proximity, or keyword search?
Is the engine scanning a small subset or a large percentage of the table?
Are you optimizing reads only, or balancing reads with inserts and updates?

If you want help choosing the right index design for PostgreSQL, MySQL, search systems, or geospatial workloads, TheCodeWizard team can work with you in a 1:1 session or help implement it directly in your stack. Reach out through TheCodeWizard Services page if you want hands-on guidance, query-plan reviews, or production indexing support tailored to your workload.