What is the downside of partitioning?

Partitioning is a database design technique that divides a single logical database into multiple physical pieces, or partitions. It can improve performance by spreading data across multiple databases or servers, allowing queries to be processed in parallel. However, partitioning also has downsides that must be considered.

Increased Complexity

One of the main downsides of partitioning is increased complexity. With a partitioned database, you now have multiple physical databases to manage instead of just one. This adds overhead for the database administrator. Some specific sources of added complexity include:

  • Partition management – Tables and indexes must be split across multiple physical databases. This requires additional administrative tasks for managing the partitions.
  • Query optimization – The query optimizer must now account for partitions and route queries accordingly. More complexity is added to the optimization process.
  • Data skew – If the data is not distributed evenly, some partitions may have more data and load than others. This requires monitoring and potential reorganization.
  • Partition pruning – For optimal performance, queries should only hit the needed partitions. The partitioning scheme must support partition pruning.
  • Integrity enforcement – Primary key and foreign key relationships and other constraints may span partitions. Enforcing them requires coordination between the physical databases.

While partitioning software can help manage some of this complexity, the DBA takes on more responsibility as the number of partitions grows. A well-designed partitioning strategy is needed to minimize overhead.
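To make the routing and pruning ideas above concrete, here is a minimal sketch of hash partitioning in Python. The partition count and the `customer_id` key are illustrative assumptions, not features of any particular database product:

```python
# Minimal sketch of hash-based partition routing and partition pruning.
# NUM_PARTITIONS and the customer_id key are illustrative assumptions.

NUM_PARTITIONS = 4

def partition_for(customer_id: int) -> int:
    """Route a row to a partition by hashing the partition key."""
    return hash(customer_id) % NUM_PARTITIONS

def partitions_to_scan(customer_ids):
    """Partition pruning: a query filtered on the partition key only
    needs to touch partitions that can contain matching rows."""
    return sorted({partition_for(cid) for cid in customer_ids})

# A lookup on one key value hits a single partition; a broad filter
# may still have to scan every partition.
print(partitions_to_scan([42]))
print(partitions_to_scan(range(100)))
```

Note how pruning only works when queries filter on the partition key: a query on any other column must visit all four partitions, which is exactly the optimizer complexity described above.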

Increased Hardware Costs

Partitioning relies on spreading data across multiple physical servers and disks. This inherently comes with increased hardware costs. Some examples include:

  • Multiple servers – Data will need to reside on multiple physical servers, increasing capital and maintenance costs.
  • Increased storage – More disks will be needed to store data that is partitioned across servers.
  • Additional memory – Index structures and buffer caches must be maintained on each partitioned server, increasing total memory requirements.
  • More powerful servers – Servers may need faster CPUs and more memory to handle parallel queries.
  • Additional network costs – Faster networks may be required for inter-partition communication.

While commodity hardware can be used to reduce costs, high performance systems will generally require higher-end hardware. The increased costs must be weighed against performance benefits.

Maintenance Overhead

Ongoing database maintenance tasks become more complex and time-consuming with partitioning. Some examples include:

  • Backups – Backing up multiple physical databases instead of one adds overhead.
  • Restore – Restoring a partitioned database is more complex due to inter-dependencies.
  • Reorganization – Periodically reorganizing partitions on multiple servers takes more effort.
  • Index maintenance – Index maintenance involves more systems when indexes span partitions.
  • Statistics maintenance – Collecting optimizer statistics on partitioned database objects adds overhead.
  • Storage management – Managing storage for partitioned database files on multiple servers adds overhead.

While maintenance windows can be scheduled to minimize impact, reducing downtime for maintenance is still more difficult with a partitioned database.

More Difficult Troubleshooting

A partitioned database naturally adds more components that can fail or have problems. Troubleshooting performance issues and outages becomes more difficult due to the distributed nature of partitioning. Some examples include:

  • Finding root cause – An issue may be localized to a single partition or server, requiring investigation across systems.
  • Diagnosing network issues – More components communicating introduces more possibilities for network errors.
  • Identifying skew – Imbalances in data distribution must be diagnosed across partitions.
  • Coordinating failover – Recovering from a failed server requires coordinating failover across the remaining servers.
  • Tracing queries – Following a query as it spans multiple physical servers is challenging.
  • Replication lags – More systems increase the likelihood of replication lags between partitions.

Specialized monitoring and troubleshooting skills are required to address issues in complex partitioned environments.
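As a small illustration of the skew-diagnosis task mentioned above, the following sketch flags partitions whose row counts are far above the mean. The row counts and the 1.5x threshold are illustrative assumptions:

```python
# Sketch: detecting data skew by comparing each partition's row count
# to the mean. Counts and the threshold are illustrative assumptions.

def find_skewed_partitions(row_counts, threshold=1.5):
    """Return indices of partitions holding more than `threshold`
    times the mean number of rows."""
    mean = sum(row_counts) / len(row_counts)
    return [i for i, n in enumerate(row_counts) if n > threshold * mean]

counts = [1_000, 1_100, 5_000, 900]    # partition 2 is the hot spot
print(find_skewed_partitions(counts))  # flags partition 2
```

In practice these counts would come from per-partition monitoring, and a flagged hot partition would trigger the kind of reorganization discussed earlier.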

More Difficult Database Changes

Large database changes become more complicated with partitioning. Some examples include:

  • Schema changes – Schema changes may need to be coordinated across all database partitions.
  • Version upgrades – Rolling out major database version upgrades is slowed by having to upgrade individual partitions separately.
  • Hardware migrations – Migrating storage or servers means migrating each partition.
  • Reorganization – Changing the partitioning scheme requires reorganizing data across all partitions.
  • Altering partitions – Adding, removing, merging or splitting partitions requires changes across multiple systems.

Extensive planning and testing are required when making database changes to minimize downtime and issues.

Vendor Lock-in

Partitioning increases reliance on specific database vendors and makes migrating to alternatives more difficult. Some examples include:

  • Proprietary features – Relying on vendor-specific partitioning features can create lock-in.
  • Migration complexity – Migrating a partitioned database to a new vendor is substantially more complex than migrating a single database.
  • Staff expertise – Developing expertise in a vendor’s partitioning tools and APIs leads to inertia.
  • Tooling gaps – Alternatives may lack partitioning tools as sophisticated as the incumbent vendor.
  • Supplementary features – Additional integrated features like compression and replication may be lost.

Organizations that want to remain vendor neutral should be cautious about relying heavily on proprietary partitioning capabilities.

Data Movement Challenges

The distributed nature of partitioning can make moving data in and out more challenging. Some examples include:

  • ETL complexity – Extract, transform and load (ETL) processes must account for spreading data across partitions.
  • Slow data loads – Loading data may fall back on inefficient row-by-row INSERTs to coordinate writes across partitions, rather than fast bulk loads.
  • Query federation – Tools may have to implement query federation to gather results from across partitions.
  • Migrations – Bulk copying data out of a partitioned database will be constrained by network bandwidth.
  • Backup and restore – Similarly, backup and restore are slowed by transferring data across multiple physical servers.

Data integration and migration are not as straightforward when data is distributed across partitions. Careful planning is required.
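One common way ETL pipelines cope with this is to group incoming rows by destination partition, so each partition receives one bulk write instead of many single-row INSERTs. The sketch below illustrates the grouping step; the table layout and `customer_id` key are illustrative assumptions:

```python
# Sketch of a partition-aware loader: rows are grouped by destination
# partition before writing. Keys and partition count are assumptions.

from collections import defaultdict

NUM_PARTITIONS = 4

def group_rows_by_partition(rows, key="customer_id"):
    """Bucket rows by the partition their key hashes to."""
    batches = defaultdict(list)
    for row in rows:
        batches[hash(row[key]) % NUM_PARTITIONS].append(row)
    return dict(batches)

rows = [{"customer_id": i, "amount": i * 10} for i in range(8)]
for pid, batch in sorted(group_rows_by_partition(rows).items()):
    # In a real pipeline each batch would become one bulk COPY/LOAD.
    print(f"partition {pid}: {len(batch)} rows")
```

Grouping first keeps each write local to one partition, which is the main lever for avoiding the slow cross-partition loads described above.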

When Not to Partition

Given the downsides, partitioning is not recommended for every database. It makes most sense for very large databases under high load. Some cases when partitioning should be avoided include:

  • Smaller databases – Overhead may outweigh benefits for smaller datasets.
  • Applications with simpler queries – If queries do not require frequent full table scans or joins between very large tables, performance gains will be minimal.
  • OLTP workloads – Online transactional processing with frequent small writes often does not benefit from partitioning.
  • Unpredictable workloads – If the application’s queries and data volumes are unpredictable, maintaining partitions is difficult.
  • Limited hardware budget – Partitioning requires investing in more hardware.
  • Limited DBA resources – The complexity may be too much for smaller teams to manage.

Alternatives such as caching, indexing, and query tuning should be explored before taking on the overhead of database partitioning.

Key Considerations When Partitioning

If partitioning is implemented, some key considerations include:

  • Partitioning scheme – Choose the partitioning scheme (range, list, hash) carefully based on data and query patterns.
  • Indexing strategy – Plan how indexes will be managed across partitions.
  • Hardware selection – Choose hardware to support the level of parallelism required.
  • ETL processes – Design ETL/ELT processes to work with partitioning from the start.
  • Monitoring – Have visibility into data distribution, hardware utilization and performance across partitions.
  • Partition management – Automate and optimize partition management tasks like pruning and splitting where possible.
  • Backup and restore – Design an efficient partition-aware backup and recovery strategy.

Getting these factors right will help maximize the performance benefits while minimizing the downsides of database partitioning.
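The scheme choice in the first consideration can be sketched briefly. Each function below routes a key to a partition under one of the three common schemes; the date boundaries, region list, and partition count are illustrative assumptions:

```python
# Sketch contrasting range, list, and hash partitioning. Boundaries,
# categories, and partition count are illustrative assumptions.

import bisect

def range_partition(order_date, boundaries=("2022-01-01", "2023-01-01")):
    """Range: suits queries that filter on contiguous ranges (dates)."""
    return bisect.bisect_right(boundaries, order_date)

def list_partition(region):
    """List: suits keys with a small set of known categories."""
    return {"us": 0, "eu": 1, "apac": 2}[region]

def hash_partition(customer_id, n=4):
    """Hash: suits even distribution when there is no natural range."""
    return hash(customer_id) % n

print(range_partition("2022-06-15"))  # falls in the middle range
print(list_partition("eu"))
print(hash_partition(12345))
```

Range and list schemes enable pruning on their key but can skew if the data is uneven; hash schemes distribute evenly but prune only on exact-key lookups. Matching the scheme to the dominant query pattern is the central design decision.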

Conclusion

Partitioning can provide substantial performance improvements for large, complex databases by enabling parallel processing across commodity servers. However, it also introduces significant overhead that should not be underestimated. The increased complexity, hardware costs, maintenance, and loss of flexibility require careful evaluation. Partitioning is most likely to be beneficial when databases grow very large and complex. The downsides may outweigh the benefits for smaller databases. Organizations should carefully weigh these tradeoffs when considering database partitioning.