Understanding How Database Indexing Works

As databases grow larger, efficient data retrieval becomes essential for performance. One of the critical mechanisms to enhance data access speed is indexing. In this blog post, we’ll explore the concept of database indexing, how it works, and its relevance in optimizing queries.

What is Database Indexing?

Indexing in databases is akin to having an index in a book. It helps to quickly locate information without the need to search through every entry individually.

  • Definition: An index is a sorted data structure that holds the values of a specific field in a table and points to the corresponding records.
  • Purpose: By creating an index, you allow the database engine to perform faster searches using efficient algorithms like binary search.
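
As a rough illustration of the definition above, the following Python sketch models an index as a sorted list of (value, record position) pairs and looks values up with binary search. The table contents and the helper names `build_index` and `lookup` are made up for this example; real database engines typically use more elaborate structures such as B-trees.

```python
from bisect import bisect_left

# A tiny "table": the list position stands in for the record's location on disk.
table = [
    {"id": 1, "firstName": "Maria"},
    {"id": 2, "firstName": "Chen"},
    {"id": 3, "firstName": "Aisha"},
    {"id": 4, "firstName": "Chen"},
]

def build_index(rows, field):
    """Return a sorted list of (field value, record position) pairs."""
    return sorted((row[field], pos) for pos, row in enumerate(rows))

def lookup(index, value):
    """Binary-search the index and return the positions of all matching records."""
    i = bisect_left(index, (value,))  # first entry whose key is >= value
    positions = []
    while i < len(index) and index[i][0] == value:
        positions.append(index[i][1])
        i += 1
    return positions

first_name_index = build_index(table, "firstName")
matches = [table[pos] for pos in lookup(first_name_index, "Chen")]
print(matches)
```

The table itself stays in insertion order; only the index is sorted, and each entry points back to the record it came from.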

Why is Indexing Needed?

When data is stored on disk, it is organized in blocks, and each block is read as a whole. The cost of a search therefore depends largely on how many blocks have to be read. Here’s a breakdown:

  • Data Structure: Disk blocks are structured much like the nodes of a linked list: each holds a section of data and a pointer to the next block.
  • Search Efficiency:
    • Without Indexing: Searching on an unsorted field requires a linear search, which needs an average of (N + 1) / 2 block accesses when the values are unique, where N is the number of blocks. If the values are not unique, every match has to be found, so all N blocks must be read.
    • With Indexing: A sorted field allows a binary search, which reduces the number of block accesses to about log2 N.
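
The two estimates above can be written out as a short calculation. This is a sketch of the formulas only; the function names are made up for illustration, and the binary-search figure is rounded up to a whole number of accesses.

```python
import math

def linear_search_accesses(n_blocks):
    """Average block accesses for a linear search on unique values: (N + 1) / 2."""
    return (n_blocks + 1) / 2

def binary_search_accesses(n_blocks):
    """Approximate block accesses for a binary search: log2(N), rounded up."""
    return math.ceil(math.log2(n_blocks))

for n in (1_000, 1_000_000):
    print(n, linear_search_accesses(n), binary_search_accesses(n))
```

Growing the table a thousandfold multiplies the linear cost by about a thousand, while the binary-search cost only doubles from 10 to 20 accesses.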

How Does Indexing Work?

To understand how indexing operates, let’s consider a simple database schema:

Sample Database Schema

Field name       Data type      Size on disk
id (Primary key) Unsigned INT   4 bytes
firstName        Char(50)       50 bytes
lastName         Char(50)       50 bytes
emailAddress     Char(100)      100 bytes

Example Scenarios

  1. Linear Search on an unsorted field (e.g., firstName):
    • Each record takes 204 bytes, so assuming a 1,024-byte block, 5 records fit per block and 5 million records occupy 1 million blocks. Because firstName is neither sorted nor unique, a search has to read all 1 million blocks, a significant drain on resources.
  2. Binary Search on a sorted field (e.g., id):
    • Here you need only about 20 block accesses (log2 1,000,000 ≈ 19.93), which shows how much a sorted field helps.
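
The figures in the two scenarios can be reproduced with a few lines of arithmetic. The record count and field sizes come from the schema above; the 1,024-byte block size is an assumption (the article does not state one), chosen because it is consistent with the 1 million blocks quoted.

```python
import math

RECORD_COUNT = 5_000_000
RECORD_SIZE = 4 + 50 + 50 + 100   # bytes per record, from the schema above
BLOCK_SIZE = 1024                 # bytes; an assumption, not stated in the article

records_per_block = BLOCK_SIZE // RECORD_SIZE                 # whole records per block
table_blocks = math.ceil(RECORD_COUNT / records_per_block)    # blocks the table occupies

# firstName is unsorted and not unique, so every block has to be read.
full_scan_accesses = table_blocks

# id is sorted, so a binary search over the blocks is enough.
sorted_lookup_accesses = math.ceil(math.log2(table_blocks))

print(records_per_block, table_blocks, full_scan_accesses, sorted_lookup_accesses)
```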

The Impact of Indexing on Query Performance

When an index is created for a specific field, the new structure consists of the field values and pointers to the original records:

Index Schema for firstName

Field name       Data type      Size on disk
firstName        Char(50)       50 bytes
(record pointer) Special        4 bytes
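
  • The index holds one entry per record, but each entry is only 54 bytes instead of 204, so the index fits into far fewer blocks than the table:
    • Assuming a 1,024-byte block, 18 index entries fit per block, so the index occupies about 277,778 blocks. A binary search of the index takes about 19 block accesses, plus 1 more to fetch the actual record: about 20 block accesses in total, versus 1 million without the index.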
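
The arithmetic behind the indexed search can be sketched the same way. The 54-byte index record comes from the index schema above; the 1,024-byte block size is again an assumption that the article does not state.

```python
import math

RECORD_COUNT = 5_000_000
INDEX_RECORD_SIZE = 50 + 4   # firstName plus record pointer, in bytes
BLOCK_SIZE = 1024            # bytes; an assumption, not stated in the article

index_records_per_block = BLOCK_SIZE // INDEX_RECORD_SIZE
index_blocks = math.ceil(RECORD_COUNT / index_records_per_block)

# Binary search over the index blocks, rounded up to whole accesses.
index_search_accesses = math.ceil(math.log2(index_blocks))

# One more access to fetch the record that the index entry points to.
total_accesses = index_search_accesses + 1

print(index_records_per_block, index_blocks, index_search_accesses, total_accesses)
```

Under these assumptions the index spans 277,778 blocks, and a lookup costs 19 accesses in the index plus 1 for the record.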

When Should Indexing Be Used?

While indexing improves search speeds, it also requires additional space and slows insert, update, and delete operations, because every index on the table has to be updated along with the data. Here are some key considerations:

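  • Disk Space: Each index is stored in addition to the table. In the example above, a firstName index entry takes 54 bytes against 204 bytes for a table record, so that single index adds roughly a quarter to the space used.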
  • Field Cardinality: High cardinality (uniqueness) of field values enhances indexing effectiveness. For low cardinality fields, the performance gains may not justify the additional resource usage.
  • Avoid Over-Indexing: Too many indices can lead to excessive disk usage and slow down write operations.
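
To try these considerations on a real engine, here is a small sketch using SQLite through Python's built-in sqlite3 module. The `users` table, the index name `idx_users_first_name`, the generated rows, and the helpers `cardinality_ratio` and `query_plan` are all invented for this example, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, emailAddress TEXT)"
)
# Hypothetical data: 5,000 rows, 500 distinct first names, unique email addresses.
rows = [(f"first{i % 500}", f"last{i}", f"user{i}@example.com") for i in range(5000)]
conn.executemany(
    "INSERT INTO users (firstName, lastName, emailAddress) VALUES (?, ?, ?)", rows
)

def cardinality_ratio(column):
    """Distinct values divided by total rows; values near 1.0 mean high cardinality."""
    # The column name is interpolated directly, so pass trusted names only.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM users"
    ).fetchone()
    return distinct / total

def query_plan(sql, params=()):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

query = "SELECT * FROM users WHERE firstName = ?"
plan_before = query_plan(query, ("first42",))

conn.execute("CREATE INDEX idx_users_first_name ON users (firstName)")
plan_after = query_plan(query, ("first42",))

print(cardinality_ratio("firstName"), cardinality_ratio("emailAddress"))
print(plan_before)
print(plan_after)
```

With this data the ratio is 0.1 for firstName and 1.0 for emailAddress. On a typical SQLite build, the plan changes from a scan of `users` to a search that uses `idx_users_first_name` once the index exists.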

Conclusion

Database indexing is an essential component for managing large datasets efficiently. Applied thoughtfully, it can cut a search from a scan of the whole table to a handful of block accesses, at the cost of extra disk space and slower writes.

By understanding these principles, database administrators and developers can decide which fields are worth indexing and which are not.