Sunday, February 6. 2011
Very Slow Data Repartitioning in SQL Server Replication with Precomputed Partitions
Background
I am using Merge Replication with SQL Server 2008 R2 (server and clients). I have setup a publication with a rather complex filtering hierarchy rooted in a table with Parameterized Row Filters extended many times through Join Filters. Making modifications to data in a table near the root of this hierarchy would trigger repartitioning of the replicated data which would never complete and would cause deadlock errors in the other connections to the database while it ran (I let it run overnight once in single user mode, but had to kill it after 13 hours...).
Investigation Technical Details
After a lot of watching in SQL Profiler and digging around in the triggers/procedures which perform the repartitioning I found the culprit. The replication DML trigger on the table (MSMerge_op_GUID
) called the repartitioning procedure (MSmerge_expand_sp_GUID
) which included several queries with the following subquery:
FROM dbo.MSmerge_current_partition_mappings cpm WITH (ROWLOCK)
INNER JOIN dbo.MSmerge_contents mc2 WITH (ROWLOCK) ON cpm.rowguid = mc2.rowguid AND mc2.marker = @marker
Looking at the execution plan for any of the queries showed that this subquery was responsible for at least 40% of the total query cost. Both of these tables are quite large (~800k and ~425k rows respectively in my DB) and neither had indexes to cover this (sub-)query.
Solution
So, of course, I added the following indexes (with naming conventions to match the existing indexes):
ON dbo.MSmerge_current_partition_mappings (rowguid, partition_id);
CREATE INDEX nc6MSmerge_contents
ON dbo.MSmerge_contents (marker, rowguid);
After adding these indexes, the repartitioning operations completed in under 20 minutes!
Caveat
Both of these tables are heavily used and often modified (depending on the workload), so adding more indexes may not be the best solution for databases with high-performance requirements where repartitioning is rare and non-repartitioning operations are the norm. If that is the case, I suggest creating the above indexes before major repartitioning operations and removing them once the repartition is complete. However, that being said, I have been able to leave these indexes on the tables with no noticeable performance impact and a significant reduction in execution time for data modifications which involve repartitioning data.