What is partitioning, and what are its types?

What is Partitioning?

Partitioning is the process of dividing a large database table, and its indexes, into smaller, more manageable pieces called partitions. To queries, the table still looks like a single object, but each partition is stored separately and can be managed on its own. The goal is to improve the query performance, manageability, and availability of very large tables and indexes: a query that needs only part of the data can work on the relevant partitions instead of the entire table.

Partitioning also simplifies maintenance, because operations such as loading, archiving, or rebuilding indexes can target individual partitions rather than the whole table. It is most useful when the data has natural divisions, such as rows organized by date. Queries that filter on those divisions run faster because they read only the relevant partitions.

Partitioning is commonly used in data warehousing environments, large OLTP applications, and systems that manage time series/historical data. It helps support very large tables and indexes by letting you add, drop, and operate upon partitions instead of the entire table. Partitioning large tables improves scalability and performance.

Types of Partitioning

There are three main types of partitioning:

Horizontal Partitioning

With horizontal partitioning, the rows of a table are spread across multiple partitions. Each partition has the same columns and holds a subset of the rows. When the partitions are placed on separate database servers, the technique is usually called sharding, and it distributes load and storage across machines. Horizontal partitioning can improve scalability by letting partitions be processed in parallel. However, queries that need data from several partitions (or several servers, in the sharded case) require extra coordination [1].

Vertical Partitioning

With vertical partitioning, the columns of a table are divided into multiple partitions. Each partition stores a subset of the columns for all rows, plus a shared key so the pieces of a row can be matched up again. This reduces I/O for queries that need only some of the columns. However, joins are required to reconstruct the original rows. Vertical partitioning can improve performance for specific workloads but may add overhead for mixed workloads [2].

Hybrid Partitioning

Hybrid partitioning combines both horizontal and vertical partitioning. A table is first vertically partitioned, then each vertical partition is horizontally partitioned. This provides the benefits of both approaches but increases complexity. Careful planning is required to maximize performance gains from the increased granularity. Hybrid partitioning can provide performance optimization when workloads and storage patterns are well understood.
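To make the "vertical first, then horizontal" idea concrete, here is a minimal Python sketch on an in-memory list of rows. The column groups ("hot" and "cold"), the shard count, and the rule of routing rows by `id % num_shards` are hypothetical choices for illustration, not a prescribed scheme.

```python
from collections import defaultdict

# Hypothetical column groups: frequently read columns vs. a large, rarely read one.
COLUMN_GROUPS = {
    "hot": ["name", "email"],
    "cold": ["bio"],
}
NUM_SHARDS = 2  # hypothetical number of horizontal partitions per column group


def hybrid_partition(rows, column_groups, num_shards):
    """Split each row vertically by column group, then horizontally by id.

    Every fragment keeps the row's id so the original row can be rebuilt.
    Returns a dict keyed by (group, shard) -> list of row fragments.
    """
    partitions = defaultdict(list)
    for row in rows:
        shard = row["id"] % num_shards  # horizontal step: route by id
        for group, columns in column_groups.items():  # vertical step: pick columns
            fragment = {"id": row["id"], **{c: row[c] for c in columns}}
            partitions[(group, shard)].append(fragment)
    return dict(partitions)


rows = [
    {"id": i, "name": f"user{i}", "email": f"user{i}@example.com", "bio": f"bio text {i}"}
    for i in range(6)
]
parts = hybrid_partition(rows, COLUMN_GROUPS, NUM_SHARDS)
for key in sorted(parts):
    print(key, [fragment["id"] for fragment in parts[key]])
```

Each of the four resulting partitions holds only some columns for only some rows, which is where the extra granularity (and the extra planning effort) comes from.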

Horizontal Partitioning

Horizontal partitioning is a database partitioning method in which each partition holds a portion of the rows of a table. The partitions can live in the same database, or they can be spread across different databases or database servers, in which case the approach is usually called sharding.

In horizontal partitioning, each partition contains the same columns but fewer rows compared to the original table. For example, a table with 1 million rows could be partitioned into 4 horizontal partitions, each containing around 250,000 rows.
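The arithmetic in this example can be checked with a short Python sketch. It assumes rows have sequential integer IDs and uses a simple `id % 4` routing rule as a stand-in for whatever partitioning function a real database would apply; `partition_for` is a hypothetical helper.

```python
NUM_PARTITIONS = 4


def partition_for(row_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Route a row to a partition by its id (a simple modulo rule, assumed here)."""
    return row_id % num_partitions


# Count how many of 1,000,000 rows (ids 0..999_999) land in each partition.
counts = [0] * NUM_PARTITIONS
for row_id in range(1_000_000):
    counts[partition_for(row_id)] += 1

print(counts)  # [250000, 250000, 250000, 250000]
```

Every partition keeps the same columns; only the number of rows per partition shrinks, here to exactly 250,000 because sequential IDs divide evenly by 4.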

Some key advantages of horizontal partitioning include:

  • Improved performance when queries can be limited to a few smaller partitions
  • Easier expansion by adding new partitions (or, when sharded, new servers)
  • Failure isolation when partitions live on separate servers, since a failure affects only a subset of the data

Some potential disadvantages include:

  • Overhead of partition management
  • More complex queries across partitions
  • Skewed partitions may create hotspots
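The hotspot risk comes from the partitioning key's value distribution. The Python sketch below uses hypothetical order data in which 900 of 1,000 orders fall in the latest year: splitting on year leaves one partition far larger than the others, while a modulo split on `order_id` (an assumed, simple hash-like rule) spreads rows evenly.

```python
from collections import Counter

# Hypothetical data: 1,000 orders, 900 of them in 2024 and the rest spread over 2021-2023.
orders = [
    {"order_id": i, "year": 2024 if i < 900 else 2021 + i % 3}
    for i in range(1000)
]

# Range/list-style split on year: most rows (and likely most queries) hit one partition.
by_year = Counter(order["year"] for order in orders)

# Modulo split on order_id: rows are spread evenly across 4 partitions.
by_hash = Counter(order["order_id"] % 4 for order in orders)

print(sorted(by_year.items()))  # [(2021, 34), (2022, 33), (2023, 33), (2024, 900)]
print(sorted(by_hash.items()))  # [(0, 250), (1, 250), (2, 250), (3, 250)]
```

An even split is not automatically better: a year-based split still enables pruning for date-range queries, so the choice depends on the workload.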

Overall, horizontal partitioning helps scale databases by spreading data across multiple servers. It works best when the tables have a large number of rows.

Vertical Partitioning

Vertical partitioning, also known as vertical splitting, divides a table into multiple tables based on its columns. Each new table contains a subset of the original columns plus the primary key, so every record is still represented in each of the new tables.

For example, a customer table with columns for name, address, and order details can be split vertically into two tables: one containing the name and address columns, the other containing the order-related columns. Both keep the customer ID so that rows can be matched up again. This groups related columns together.
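Here is a minimal sketch of that split using Python's built-in `sqlite3` module with an in-memory database. The table names (`customer_contact`, `customer_orders`) and the order columns (`order_count`, `last_order_date`) are hypothetical stand-ins for "order details".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical vertical split of a customer table; both halves keep customer_id.
cur.execute(
    "CREATE TABLE customer_contact ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
cur.execute(
    "CREATE TABLE customer_orders ("
    "customer_id INTEGER PRIMARY KEY REFERENCES customer_contact(customer_id), "
    "order_count INTEGER, last_order_date TEXT)"
)

cur.executemany("INSERT INTO customer_contact VALUES (?, ?, ?)",
                [(1, "Ada", "12 Elm St"), (2, "Lin", "4 Oak Ave")])
cur.executemany("INSERT INTO customer_orders VALUES (?, ?, ?)",
                [(1, 5, "2024-03-01"), (2, 2, "2024-02-14")])

# A query that needs only contact columns reads just one of the two tables.
contacts = cur.execute(
    "SELECT name, address FROM customer_contact ORDER BY customer_id"
).fetchall()

# Rebuilding the complete record requires a join on the shared key.
full = cur.execute("""
    SELECT c.customer_id, c.name, c.address, o.order_count, o.last_order_date
    FROM customer_contact AS c
    JOIN customer_orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(full)
conn.close()
```

The first query shows the benefit (narrower reads), and the join shows the cost listed among the disadvantages below.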

The key advantages of vertical partitioning are:

  • Improved query performance since queries only access necessary columns.
  • Smaller tables and indexes, since the columns are spread across multiple tables.
  • Simpler maintenance, since changes to one group of columns affect only the table that holds them.

The disadvantages are:

  • Joins are required between tables to reconstruct complete records.
  • Data duplication can occur if schemas are not designed properly.
  • Constraints and relationships are difficult to enforce across tables.

Hybrid Partitioning

Hybrid partitioning combines aspects of both horizontal and vertical partitioning. Oracle also uses the term in a narrower, product-specific sense: as explained in Hybrid Partitioned Tables – an introduction with examples, an Oracle hybrid partitioned table stores some partitions internally in the database while other partitions are stored externally, in files outside the database. This enables large partitions to be stored outside the database while smaller, more active partitions are kept inside.

The key advantage of hybrid partitioning is flexibility. You can optimize storage and performance by placing partitions appropriately. Large, infrequently accessed historical data can be stored cost-effectively in external storage like Hadoop or object storage. More active recent data remains in the database for fast query performance (Oracle, 2019).
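The placement idea can be modeled in a few lines of Python. This is a toy sketch, not Oracle's syntax or API: the yearly partition names, the two-year archiving threshold, and the "internal"/"external" labels are all hypothetical. It shows why queries over recent data stay inside the database while longer-range queries must also read external storage.

```python
from dataclasses import dataclass

CURRENT_YEAR = 2024       # hypothetical "now"
ARCHIVE_AFTER_YEARS = 2   # hypothetical policy: partitions this old move to external storage


@dataclass
class Partition:
    name: str
    year: int
    location: str  # "internal" (inside the database) or "external" (e.g. object storage)


def place(year: int) -> str:
    """Decide where a yearly partition lives under the hypothetical policy."""
    return "external" if CURRENT_YEAR - year >= ARCHIVE_AFTER_YEARS else "internal"


partitions = [Partition(f"sales_{y}", y, place(y)) for y in range(2019, 2025)]


def tiers_for_query(start_year: int, end_year: int) -> set[str]:
    """Return the storage tiers a query over [start_year, end_year] must read."""
    return {p.location for p in partitions if start_year <= p.year <= end_year}


print([(p.name, p.location) for p in partitions])
print(tiers_for_query(2023, 2024))
print(tiers_for_query(2021, 2024))
```

A query limited to recent years touches only internal partitions; a wider range spans both tiers, which is where the integration complexity described next comes in.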

A potential disadvantage is increased complexity in managing partitions across different storage systems. Database queries need to seamlessly access both internal and external partitions. This requires integration between the database and external storage. There is also a risk of creating silos if not managed properly.

Partitioning Criteria

Partitioning criteria determine how a table's rows are assigned to partitions. The two main aspects of partitioning criteria are the partitioning attributes and the partitioning method.

Partitioning attributes (together, the partitioning key) define which column or columns are used to partition the data. This is usually a column whose values split the data in a way that matches common query filters, such as a timestamp or ID column. The partitioning attribute should be chosen carefully based on query patterns and data distribution, because a key with very uneven values can leave a few partitions much larger than the rest.

Common partitioning methods include:

  • Range partitioning – Data is partitioned by ranges of values e.g. date ranges
  • List partitioning – Data is partitioned by a list of discrete values e.g. region or country codes
  • Hash partitioning – Data is partitioned using a hash function on the partitioning key
  • Round robin partitioning – Rows are assigned to partitions in turn, one after another, so they are spread evenly regardless of their values
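The four methods differ only in how a row's value is mapped to a partition. The Python sketch below implements one routing function per method; the date bounds, the country-to-partition map, and the partition counts are hypothetical. Range routing uses `bisect` over exclusive upper bounds (similar in spirit to Oracle's VALUES LESS THAN), and hash routing uses `zlib.crc32` because Python's built-in `hash()` for strings is randomized per process and would not give stable placement.

```python
import bisect
import itertools
import zlib

# Range: hypothetical exclusive upper bounds; ISO date strings compare correctly as text.
RANGE_BOUNDS = ["2023-01-01", "2024-01-01", "2025-01-01"]


def range_partition(date_str: str) -> int:
    """Index of the first bound greater than the value; len(RANGE_BOUNDS) is the overflow partition."""
    return bisect.bisect_right(RANGE_BOUNDS, date_str)


# List: hypothetical mapping of discrete values to named partitions.
LIST_MAP = {"US": "p_americas", "CA": "p_americas", "DE": "p_emea", "FR": "p_emea", "JP": "p_apac"}


def list_partition(country: str) -> str:
    """Look up the partition for a discrete value, with a catch-all default."""
    return LIST_MAP.get(country, "p_default")


def hash_partition(key: str, num_partitions: int = 4) -> int:
    """Stable CRC-32 hash of the key, modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def round_robin(rows, num_partitions: int = 3):
    """Ignore the values entirely and hand rows to partitions in turn."""
    buckets = [[] for _ in range(num_partitions)]
    for row, target in zip(rows, itertools.cycle(range(num_partitions))):
        buckets[target].append(row)
    return buckets


print(range_partition("2023-06-30"))   # 1
print(list_partition("FR"))            # p_emea
print(round_robin(list("abcdefg")))    # [['a', 'd', 'g'], ['b', 'e'], ['c', 'f']]
```

Only range and list routing preserve value locality (useful for pruning); hash and round robin trade that for an even spread.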

The partitioning method will determine how the data is actually separated into partitions. This should align with the chosen partitioning attribute and expected query patterns.

Partitioning in SQL

In Oracle SQL, partitions are defined when a table is created, using the PARTITION BY clause (Oracle, n.d.).

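The partition clause depends on the method. With range partitioning in Oracle, each partition is given an exclusive upper bound using VALUES LESS THAN. The table and partition names below are illustrative:

CREATE TABLE sales
(
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER(10,2)
)
PARTITION BY RANGE (sale_date)
(
  -- each partition holds rows whose sale_date is below its bound
  PARTITION sales_2023 VALUES LESS THAN (TO_DATE('2024-01-01', 'YYYY-MM-DD')),
  PARTITION sales_2024 VALUES LESS THAN (TO_DATE('2025-01-01', 'YYYY-MM-DD')),
  PARTITION sales_max  VALUES LESS THAN (MAXVALUE)
);

List partitioning uses a VALUES (value_list) clause instead, for example PARTITION p_emea VALUES ('DE', 'FR'). Hash partitioning needs no value clause at all: PARTITION BY HASH (customer_id) PARTITIONS 4 asks Oracle to spread rows across four partitions by hashing customer_id.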

Once created, partitions can be managed using statements like ALTER TABLE. For example, you can ADD, DROP, SPLIT, MERGE, MOVE, TRUNCATE, EXCHANGE or MODIFY existing partitions (Oracle, n.d.). These enable dynamically maintaining partitions as data volumes change.

Overall, partitioning provides a way to physically organize tables and indexes into smaller, more manageable pieces to enhance manageability and performance for large database applications.

Source:
Oracle. (n.d.). Partitioning in Oracle Database 11g. Oracle. https://docs.oracle.com/cd/E11882_01/server.112/e25789/chapter11.htm#AUTOSTB16

Partition Pruning

Partition pruning is an optimization technique in Oracle databases that eliminates unnecessary partitions when processing SQL statements. According to Partition Pruning with Example – ORACLE-HELP, partition pruning greatly improves query performance because Oracle only has to access the partitions that contain the relevant data.

Partition pruning works by analyzing the SQL statement’s selection criteria and accessing only the partitions that can contain matching rows. For example, if a query's WHERE clause filters on the date column used as the partitioning key, Oracle determines which partitions cover the requested date range and skips the rest. This can greatly reduce I/O and improve performance compared with scanning every partition of the table.
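The pruning decision itself is an interval-overlap test. Below is a minimal Python sketch, assuming hypothetical quarterly range partitions with an inclusive lower bound and an exclusive upper bound, and a query filter such as `WHERE sale_date BETWEEN :start AND :end`. It is an illustration of the idea, not Oracle's optimizer logic.

```python
from datetime import date

# Hypothetical quarterly partitions: (name, lower bound inclusive, upper bound exclusive).
PARTITIONS = [
    ("sales_q1", date(2024, 1, 1), date(2024, 4, 1)),
    ("sales_q2", date(2024, 4, 1), date(2024, 7, 1)),
    ("sales_q3", date(2024, 7, 1), date(2024, 10, 1)),
    ("sales_q4", date(2024, 10, 1), date(2025, 1, 1)),
]


def prune(query_start: date, query_end: date) -> list[str]:
    """Keep only partitions whose [lower, upper) range overlaps [query_start, query_end]."""
    return [
        name
        for name, lower, upper in PARTITIONS
        if lower <= query_end and query_start < upper
    ]


print(prune(date(2024, 5, 1), date(2024, 5, 31)))   # ['sales_q2']
print(prune(date(2024, 3, 15), date(2024, 7, 1)))   # ['sales_q1', 'sales_q2', 'sales_q3']
```

A one-month query reads one of four partitions; a filter on a column other than the partitioning key could not be pruned this way.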

Overall, partition pruning is an optimization that allows Oracle to only access the minimum number of partitions required to satisfy a query’s selection criteria. By eliminating unnecessary partition scans, it provides substantial performance gains, especially for large, partitioned tables. According to What is Partition Pruning in Oracle? – Stechies, partition pruning is a key reason why partitioning is so beneficial in Oracle databases.

Partitioning Use Cases

Partitioning can provide performance and manageability benefits in certain use cases. Some examples of when partitioning may be useful include:

Managing large tables – Partitioning can help break up a large table into more manageable chunks and avoid issues like index bloat on a giant table. This can improve query performance.

Purging old data – If old data can be discarded entirely, partitioning by date lets you drop whole old partitions instead of scanning the table and deleting rows one by one. This is very useful for managing rolling windows of data. MariaDB notes this as a key use case for partitioning.
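A rolling window can be sketched in Python with a dict of hypothetical monthly partitions. The point of the sketch is that the purge loop runs once per partition, not once per row; in a real database the equivalent step would be a partition-level operation such as dropping a partition rather than a row-level DELETE.

```python
from datetime import date

# Hypothetical monthly partitions for 2024, keyed by the first day of each month.
partitions = {
    date(2024, month, 1): [f"row-{month}-{i}" for i in range(3)]
    for month in range(1, 13)
}


def drop_partitions_before(parts: dict, cutoff: date) -> list:
    """Remove whole partitions whose key is older than the cutoff.

    The loop runs once per partition, not once per row; returns the dropped keys.
    """
    old_keys = sorted(key for key in parts if key < cutoff)
    for key in old_keys:
        del parts[key]
    return old_keys


dropped = drop_partitions_before(partitions, date(2024, 7, 1))
print(len(dropped), min(partitions))  # 6 2024-07-01
```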

Frequent queries on recent data – Partition pruning can dramatically improve query performance for queries that only touch a few newer partitions. This is applicable for time series or IoT sensor data that is often queried by date range. Anayrat covers partitioning for time series analysis.

Distributing data across nodes – In MPP databases like Amazon Redshift, rows can be distributed across nodes by a key (Redshift calls this the distribution key), a form of horizontal partitioning that improves parallelism. Dremio notes distributed query performance as a key driver for partitioning.
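The scatter-gather pattern behind this can be sketched in Python. Threads stand in for cluster nodes here, so the sketch shows the shape of the computation (distribute by a key, aggregate locally, combine centrally) rather than any real speedup. The node count, the `customer-N` keys, and the CRC-32 distribution rule are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 3  # hypothetical cluster size


def node_for(key: str) -> int:
    """A stable hash of the distribution key decides which node stores the row."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


# Hypothetical rows: (customer key, sale amount).
sales = [(f"customer-{i}", i * 10) for i in range(1, 101)]

# Scatter: place each row's amount on its node.
nodes = [[] for _ in range(NUM_NODES)]
for key, amount in sales:
    nodes[node_for(key)].append(amount)

# Each "node" computes a partial sum concurrently; a coordinator combines the results.
with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
    partial_sums = list(pool.map(sum, nodes))

total = sum(partial_sums)
print(partial_sums, total)
```

Aggregates like SUM combine cleanly from partial results; queries that must match rows living on different nodes (for example, joins on a non-distribution key) need data to move between nodes, which is the coordination cost mentioned earlier.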

Summary

In this article, we looked at partitioning and its different types in SQL databases. Here is a recap of some of the key points:

  • Partitioning refers to splitting a large database table into smaller, more manageable parts called partitions.
  • The three main types of partitioning are horizontal, vertical, and hybrid partitioning.
  • Horizontal partitioning splits a table by rows. Vertical partitioning splits a table by columns.
  • Partitioning provides benefits like easier data management, faster query performance, and higher availability.
  • Partition pruning allows only relevant partitions to be accessed when running a query.
  • Common partitioning methods include range, list, hash, and round-robin.

Overall, database partitioning is an effective technique for managing large tables and improving query performance. By splitting tables into smaller partitions, queries can be executed more efficiently by only accessing the relevant partitions. The flexibility of different partitioning types and criteria allows databases to be tuned for optimal performance based on access patterns.