System Administration Programming News from Bozeman, MT

Saturday, October 22. 2011

Problems with NULL Large Object Data Types and 64-bit Replication

I recently encountered an interesting problem where constraint checks on VARBINARY(MAX) columns were failing during synchronization for merge-replicated clients. After a lot of investigation, I have found that when a 64-bit SQL Server 2008 R2 subscriber synchronizes a table which contains large object data types on an article published with @stream_blob_columns = N'true' any constraint checks which depend on the values being NULL will fail. Note that 32-bit clients work properly and that without the constraint checks the final value would be NULL, but apparently during synchronization it is in a temporary state of being non-NULL and fails before reaching that point.

Although the release notes in KB2463333 do not mention a fix for this problem, it appears that Service Pack 1 for SQL Server 2008 R2 at least contains a workaround. When testing with SP1, the insert fails due to the constraint violation but the retry succeeds. However, even with the workaround, performance problems related to the retry are a legitimate concern. I would suggest considering disabling @stream_blob_columns on the affected articles for anyone encountering this problem.

Posted by Kevin Locke in SQL Server at 13:13

Wednesday, July 13. 2011

String Concatenation with XML and Entity Encoding Problems

The Problem

Unfortunately, SQL Server does not provide a built-in string concatenation aggregate operator, a la GROUP_CONCAT from MySQL. Because of this deficiency, many workarounds, hacks, and custom code has been written to accomplish this goal. An excellent survey of some of the available methods was compiled by Anith Sen in his post Concatenating Row Values in Transact-SQL. For anyone using "The blackbox XML methods" to concatenate strings, there is at least one more important caveat to be aware of: XML Entity Encoding. Consider the following example:

DECLARE @Strings TABLE (Text NVARCHAR(4000) NOT NULL);
INSERT INTO @Strings VALUES ('x < y when: ');
INSERT INTO @Strings VALUES ('x = 1 &&');
INSERT INTO @Strings VALUES ('y = 2');
SELECT Text + ' '
FROM @Strings
FOR XML PATH('')

This results in the text x < y when: x = 1 && y = 2. But why? The answer is that when an xml value is cast to a string (as it is implicitly in the above query, because the TYPE directive in the FOR XML statement is omitted) it is entity-encoded for interpretation by an XML parser. Without entity encoding, the string would not parse into equivalent XML, if it was even valid XML at all.

A Solution

To avoid this problem, we can query the value of the resulting XML expression in a way that avoids entity encoding. This can be done using the value() method of the xml type as follows:

DECLARE @Strings TABLE (Text NVARCHAR(4000) NOT NULL);
INSERT INTO @Strings VALUES ('x < y when: ');
INSERT INTO @Strings VALUES ('x = 1 &&');
INSERT INTO @Strings VALUES ('y = 2');
SELECT (SELECT Text + ' '
FROM @Strings
FOR XML PATH(''), TYPE).VALUE('.', 'NVARCHAR(4000)')

Note the addition of the TYPE directive to the FOR XML statement, it is required. Also note that NVARCHAR(4000) can be replaced with NVARCHAR(MAX) or any other SQL data type, as appropriate. Since the value method does not escape its output, the text is returned unchanged.

For the curious, I have attempted to benchmark the differences between these two methods and encountered inconsistent and conflicting results. In most scenarios using the value method was slightly slower than the alternative, but the difference seems to be overwhelmed by other minor differences in test configuration. As always, your mileage may vary.

Posted by Kevin Locke in SQL Server at 12:06

Wednesday, June 8. 2011

IPv6 Not Forwarding Through Linux Box

To join the fun of World IPv6 Day, I decided to setup IPv6 connectivity to the Digital Engine office. This was accomplished using Hurricane Electric's awesome (free) tunnelbroker.net service. All went smoothly, except that the office gateway, running Debian Squeeze (Linux), would not forward IPv6 traffic from computers on the network. Traffic originating at the gateway worked fine, traffic originating on the local network was silently discarded. For others facing a similar problem, here are some things to check:

Check that IPv6 connectivity is working from the gateway computer (ping6 ipv6.google.com or similar).
Check that network computers receive appropriate IPv6 addresses (that radvd or dhcp6 are working properly) and that they can connect/ping to the gateway over IPv6.
Check that forwarding is enabled for IPv6 on all (or at least both local and public) interfaces in the kernel (sysctl net.ipv6.conf.all.forwarding). This can be set permanently in /etc/sysctl.conf.
Check that the gateway computer has an explicit default route. This is important! If an IPv6 gateway address is not specified, traffic will not be forwarded. (Make sure there is a gateway statement in /etc/network/interfaces and/or that route -6 lists a global default route with a gateway address.)
Check that there are no firewall rules blocking traffic and that the default FORWARD policy is ACCEPT (ip6tables -L).
Check that the internal network address of the gateway has a global IPv6 address assigned. A link-local address is not sufficient and Linux will not forward traffic from interfaces with only link-local addresses!

For the curious, it was the last point which was causing my problem. As a reference, here is a snippet from /etc/network/interfaces which I used to configure the tunnel:

iface eth1 inet static
        address 192.168.z.z
        netmask 255.255.255.0
iface eth1 inet6 static
        address 2001:470:y:xxx::1
        netmask 64

auto he-ipv6
iface he-ipv6 inet6 v4tunnel
        endpoint 216.218.226.238
        local z.z.z.z
        address 2001:470:x:xxx::2
        netmask 64
        gateway 2001:470:x:xxx::1

Posted by Kevin Locke in System Administration at 10:02

Friday, May 6. 2011

Corrupted MSmerge_contents90_forall.bcp in Dynamic Snapshot

Executive Summary

A problem with dynamic snapshot generation may cause some BCP files to contain stale data at the end of the file. Re-creation of the dynamic snapshot does not solve the problem. Removing the old dynamic snapshot before re-creating it may solve the problem temporarily.

Background

I am using Merge Replication with SQL Server 2008 R2 (server and clients) and recently encountered the following error message during synchronization on one of the subscribers:

The merge process was unable to deliver the snapshot to the Subscriber. If using Web synchronization, the merge process may have been unable to create or write to the message file. When troubleshooting, restart the synchronization with verbose history logging and specify an output file to which to write. (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147201001)
Get help: http://help/MSSQL_REPL-2147201001
The process could not bulk copy into table '"dbo"."MSmerge_contents"'. (Source: MSSQL_REPL, Error number: MSSQL_REPL20037)
Get help: http://help/MSSQL_REPL20037
Field size too large (Source: MSSQLServer, Error number: 0)
Get help: http://help/0
To obtain an error file with details on the errors encountered when initializing the subscribing table, execute the bcp command that appears below.  Consult the BOL for more information on the bcp utility and its supported options. (Source: MSSQLServer, Error number: 20253)
Get help: http://help/20253
bcp "dbname"."dbo"."MSmerge_contents" in "\\UNCPath\MSmerge_contents90_forall.bcp" -e "errorfile" -t"\n<x$3>\n" -r"\n<,@g>\n" -m10000 -SCOMPUTERNAME -T -w (Source: MSSQLServer, Error number: 20253)
Get help: http://help/20253

This message only appeared on one of the subscribers. It was completely reproducible and occurred every time the subscriber attempted to synchronize. Reinitializing the subscriber and recreating the snapshot had no effect. The error message would also appear on a test machine set to use the same partition (via setting the HOST_NAME property to match the subscriber with the errors).

Note that this problem appears to be the same as the one reported in Microsoft Connect #646157, that was closed as a duplicate of #646156... which appears to be inaccessible. How very frustrating!

Investigation Technical Details

One of the initial steps that I used to try to isolate the problem was to copy MSmerge_contents90_forall.bcp (which is listed in the bcp invocation in the error message) to a test machine and attempt to load it. This can be done using the SQL BULK INSERT statement, or using the bcp utility. I tried the following SQL:

BULK INSERT MSmerge_contents
FROM 'C:\File\Path\MSmerge_contents90_forall.bcp'
WITH (DATAFILETYPE = 'widenative')

Which produced the following output:

Msg 4866, Level 16, State 7, Line 1
The bulk load failed. The column is too long in the data file for row 1355, column 6. Verify that the field terminator and row terminator are specified correctly.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

It confirms that the file is invalid and we have a row number that may be close to where the error occurs in the file, but not enough to isolate the problem yet. So, next a try with bcp. Unfortunately, the bcp command in the synchronization error message is not applicable to replication configurations using snapshots in native format (the default). Instead, I used the following command:

bcp <span class="replaceable">dbname</span>.dbo.MSmerge_contents in msmerge_contents90_forall.bcp -N -T -e errfile.dat

A description of the meaning of these command options can be found on the bcp Utility page on MSDN. The command produced the following output:

Starting copy...
1000 rows sent to SQL Server. Total sent: 1000
SQLState = 22001, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]String data, right truncation
SQLState = 22001, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]String data, right truncation
SQLState = 22001, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]String data, right truncation
SQLState = 22001, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]String data, right truncation
SQLState = 22001, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]String data, right truncation
SQLState = 22001, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]String data, right truncation
SQLState = 22001, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]String data, right truncation
SQLState = 22001, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]String data, right truncation
SQLState = 22001, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]String data, right truncation
SQLState = 22001, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]String data, right truncation

1355 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total     : 93     Average : (14569.89 rows per sec.)

At least some of the rows were successfully loaded. The rows saved to errfile.dat, which could not be loaded, do not appear to be sane (e.g. negative values in the tablenick column), suggesting some sort of data corruption. But again, no real indication of what is happening.

At this point I was lost. I looked at SQL Profiler traces during snapshot creation and poked around in the data without success. I decided to write a bcp file parser to determine the exact source and nature of the corruption. What I found was 11 bytes which were invalid:

001c820: ffff ff10 48a2 984a 33cb c046 a44e 1d9d  ....H..J3..F.N..
001c830: b826 5368 ffff ff84 c1b3 d97a 56cd e5ff  .&Sh.......zV...
001c840: ffff d088 0b05 109b 6e58 9e16 c611 e0ba  ........nX......
001c850: 8800 145e 281b 87ed 0c00 0000 0000 0008  ...^(...........
001c860: 700c 0000 0000 0000 0b00 ece6 e53e de7e  p............>.~
001c870: 0400 0000 ffff ff10 7482 32a1 50b6 4d4a  ........t.2.P.MJ

If these bytes were removed, the file would parse completely without error. Now we are closer to a real cause, and perhaps a solution.

Next I recreated the dynamic snapshot for the partition while running Process Monitor (a tool I highly recommend) and looked for accesses to MSmerge_contents90_forall.bcp, particularly at the offset where the corruption begins. What I found is that data is written up to the byte before this offset, but not at this byte or after it (by any process). Looking back further in the log revealed that the file was opened with disposition OpenIf, rather than CreateIf, meaning that it was not truncated. Also, there is no call to SetEndOfFileInformationFile/SetAllocationInformationFile made to truncate the file or resize it after writing. Eureka! We've found the problem!

If the size of MSmerge_contents90_forall.bcp (or any other bcp file) shrinks between the previous snapshot and the current snapshot, stale data will be left at the end of the file and errors will occur (unless it happens to end on a row boundary, in which case the stale data will be loaded, potentially causing future errors). The workaround was simple: Delete the folder for the dynamic partition (or individual bcp files with errors) and recreate the snapshot.

Best of luck in solving your problems.

Posted by Kevin Locke in SQL Server at 11:27

Sunday, February 6. 2011

Very Slow Data Repartitioning in SQL Server Replication with Precomputed Partitions

Background

I am using Merge Replication with SQL Server 2008 R2 (server and clients). I have setup a publication with a rather complex filtering hierarchy rooted in a table with Parameterized Row Filters extended many times through Join Filters. Making modifications to data in a table near the root of this hierarchy would trigger repartitioning of the replicated data which would never complete and would cause deadlock errors in the other connections to the database while it ran (I let it run overnight once in single user mode, but had to kill it after 13 hours...).

Investigation Technical Details

After a lot of watching in SQL Profiler and digging around in the triggers/procedures which perform the repartitioning I found the culprit. The replication DML trigger on the table (MSMerge_op_GUID) called the repartitioning procedure (MSmerge_expand_sp_GUID) which included several queries with the following subquery:

SELECT partition_id
FROM dbo.MSmerge_current_partition_mappings cpm WITH (ROWLOCK)
INNER JOIN dbo.MSmerge_contents mc2 WITH (ROWLOCK) ON cpm.rowguid = mc2.rowguid AND mc2.marker = @marker

Looking at the execution plan for any of the queries showed that this subquery was responsible for at least 40% of the total query cost. Both of these tables are quite large (~800k and ~425k rows respectively in my DB) and neither had indexes to cover this (sub-)query.

Solution

So, of course, I added the following indexes (with naming conventions to match the existing indexes):

CREATE INDEX nc2MSmerge_current_partition_mappings
ON dbo.MSmerge_current_partition_mappings (rowguid, partition_id);
CREATE INDEX nc6MSmerge_contents
ON dbo.MSmerge_contents (marker, rowguid);

After adding these indexes, the repartitioning operations completed in under 20 minutes!

Caveat

Both of these tables are heavily used and often modified (depending on the workload), so adding more indexes may not be the best solution for databases with high-performance requirements where repartitioning is rare and non-repartitioning operations are the norm. If that is the case, I suggest creating the above indexes before major repartitioning operations and removing them once the repartition is complete. However, that being said, I have been able to leave these indexes on the tables with no noticeable performance impact and a significant reduction in execution time for data modifications which involve repartitioning data.

Posted by Kevin Locke in SQL Server at 12:55 | Comment (1)

Friday, February 4. 2011

1and1 Blocking Port 25

After a bit of frustration at Joomla! for discarding important error information, and phpMailer for not providing a method of retaining that error information, I have discovered that my client's web host, 1and1, blocks outbound connections on port 25 to all hosts except smtp.1and1.com. This wouldn't be a significant problem if I hadn't setup SPF on their domain.... Looks like it is time to open up an alternate port on the mail server....

Update: 1and1 is blocking ports 465 and 587 in addition to port 25. Great....

Posted by Kevin Locke in System Administration at 12:26 | Comments (2)

Friday, June 25. 2010

SQL Server Missing from Synchronization Manager

Symptoms

After (re-)creating a subscription to a pull merge replication publication from SQL Server Express 2005, the subscription fails to appear in Synchronization Manager. After further investigation, the symptom was determined to be restricted to non-Administrators.

Things To Check

Make sure sp_addmergepullsubscription_agent was run with @enabled_for_syncmgr = 'TRUE'. This requirement differs from previous SQL Server versions where this was the default. When this parameter is not set to 'TRUE', the subscription will not appear in Synchronization Manager
Make sure the subscription can be synchronized outside of Synchronization Manager (to confirm that it is a problem when run through replsync.dll - in Synchronization Manager). The easiest way to do this is using replmerg from the command-line.
Make sure the user has permissions to write to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\80\Replication\Subscriptions. Without write permission, SQL Server will silently fail to appear in Synchronization Manager. Also, by default in many configurations, non-Administrators do not have write access to this key so it must be adjusted manually.

The last item is particularly important and required quite a bit of my time to determine... which resulted in the need for this post. Hopefully one of the above suggestions will help you avoid spending the same amount of time that I did to solve this problem.

Posted by Kevin Locke in SQL Server at 11:18

Sunday, June 20. 2010

Triggers, NOT FOR REPLICATION Identity, and Merge Replication

Initial Note/Warning

I hope that there is a more elegant, simple, and robust method for dealing with the issues described in this post than the one that I present herein. My method is a rather ugly kludge that relies on undocumented features of SQL Server and is not nearly as parallelizable as I would like. If you are aware of a better method (which I hope exists), please don't hesitate to post it in the comments. Until a better method is posted, I invite you to use and/or learn from the method presented in this post.

Background

Consider an SQL Server 2005 database which is published for merge replication from a single publisher with multiple subscribers. The database contains 2 tables which we are considering: Forms and InboxForms. Forms contains some sort of form data and InboxForms references all forms which are present in a user's "inbox". Each of the tables contains an INT IDENTITY primary key column and several other columns of data that are not relevant to the problem at hand. The publication is filtered based on the Forms that each user is able to view (determined by some separate mechanism not described here). When a new row is inserted into Forms, a trigger is used to route the forms into the appropriate inbox(es).

The Problem

The routing can not occur (exclusively) at the subscribers, because the filter for the subscriber will not necessarily include user information for the recipient of the form and the form can not be placed into InboxForms if it recipient user does not exist on the subscriber. So, the trigger must run on the publisher during synchronization when inserts are performed by the merge agent (i.e. the trigger must not be marked NOT FOR REPLICATION). However, in this configuration, when the merge agent runs, the following error message is produced:

Msg 545, Level 16, State 1, Line 3
Explicit value must be specified for identity column in table 'InboxForms' either when IDENTITY_INSERT is set to ON or when a replication user is inserting into a NOT FOR REPLICATION identity column.

The problem is that the INT IDENTITY column in InboxForms was marked NOT FOR REPLICATION when the publication was created in order to facilitate automatic identity range management, which is described in the Replicating Identity Columns article in the SQL Server Books Online. NOT FOR REPLICATION behaves very similarly to IDENTITY_INSERT (as hinted at in the error message), such that when a row is inserted by the merge agent, the ideneity seed value is not incremented and the value of the identity column must be explicitly specified. Note, however, that it is not the same mechanism as IDENTITY_INSERT, so changing IDENTITY_INSERT in the trigger will not remove the requirement for explicitly specified identity values.

The Solution

One method to solve this problem is to disable NOT FOR REPLICATION, as suggested in KB908711 (which specifically addresses this issue). However, using this option will interfere with automatic identity range management, since the identity values generated on the subscribers can not be copied to the publisher, and other steps will need to be taken to manually manage identity values. For me, this was an unacceptably high price to pay and another solution was required.

A solution which does not interfere with automatic identity range management is to calculate values for the identity columns and explicitly specify those values when they are required. In order to accomplish this, understanding several features of T-SQL is required: In order to determine when the values are required, the trigger needs to test if it is being run from the merge replication agent. This can be done by testing for the 'replication_agent' SESSIONPROPERTY. In order to determine appropriate values for the identity column, use IDENT_CURRENT and IDENT_INCR. Note that using the maximum value for the identity column is not necessarily correct because the maximum identity range will not necessarily be allocated to the publisher. DBCC CHECKIDENT can be used to update the identity seed value (which is not affected by explicitly inserted identity values).

One other complicating factor in our implementation is that there is no way to atomically insert explicit identity values and update the identity seed value. Therefore, locking is required to prevent multiple connections from simultaneously updating the values and causing collisions. (Or collision errors must be caught and retried) In the following implementation, an exclusive table lock is acquired which prevents any inserts from occurring on the table when the trigger is running. This is a serious performance problem as it prevents any other operations on the locked table from completing while the trigger is executing. Keep this in mind when designing the queries that will run while the lock is held.

Now, without further ado, here's the trigger:

ALTER TRIGGER TR_RouteForms
ON dbo.Forms
AFTER INSERT
AS
BEGIN
-- Client isn't expecting routing counts from their insert
SET NOCOUNT ON;

IF SESSIONPROPERTY('replication_agent') <> 0
BEGIN
-- Running from the replication agent
-- Need explicit value for NOT FOR REPLICATION IDENTITY columns

-- Use transaction to limit lock scope
BEGIN TRAN
-- Variables for IDENT_CURRENT and IDENT_INCR, required
-- because DBCC CHECKIDENT syntax won't support nested parens
DECLARE @Ident INT, @Incr INT;

-- RowCnt used to preserve @@ROWCOUNT
DECLARE @RowCnt INT;

-- Must acquire exclusive lock on InboxForms to prevent other
-- inserts (which would invalidate the identity and cause
-- collisions in the identity column).
-- Select into variable to prevent resultset going to client
-- WHERE clause quickly evaluated, returns small (empty) result
DECLARE @Dummy INT;
SELECT @Dummy = InboxFormID
FROM InboxForms WITH (TABLOCK, XLOCK, HOLDLOCK)
WHERE InboxFormID = 0;

-- Perform the form routing (inserts into InboxForms)
SET @Ident = IDENT_CURRENT('InboxForms');
SET @Incr = IDENT_INCR('InboxForms');
INSERT INTO InboxForms (InboxFormID, FormID, ...)
SELECT @Ident + @Incr * ROW_NUMBER() OVER (ORDER BY FormID) AS InboxFormID, FormID, ...
FROM inserted
WHERE ...routing criteria...
SET @RowCnt = @@ROWCOUNT;
IF @RowCnt > 0
BEGIN
-- At least 1 form was routed, update the identity seed value
-- Note: Can't use MAX(InboxFormID) since publisher may not
-- have been allocated the maximum identity range
SET @Ident = @Ident + @Incr * @RowCnt;
DBCC CHECKIDENT (InboxForms, RESEED, @Ident)
WITH NO_INFOMSGS;
END
COMMIT TRAN
END
ELSE
BEGIN
-- NOT running from the replication agent
-- Can insert normally into NOT FOR REPLICATION IDENTITY columns

-- Perform the form routing (inserts into InboxForms)
INSERT INTO InboxForms (InboxFormID, FormID, ...)
SELECT @Ident + @Incr * ROW_NUMBER() OVER (ORDER BY FormID) AS InboxFormID, FormID, ...
FROM inserted
WHERE ...routing criteria...
END
END

Posted by Kevin Locke in SQL Server at 20:34

Tuesday, May 11. 2010

Making Custom Replication Resolvers Work in SQL Server 2005

Background

SQL Server provides a very convenient method for implementing custom business logic in coordination with the synchronization/merge process of replication. For tasks which need to be done as data is synchronized, or decisions about resolving conflicts which are business-specific, implementing a custom resolver is a surprisingly straight-forward way to go. For more information, check out the following resources:

Making It Work

Things are never quite as easy as they seem... Chances are, some sort of error message was generated once the DLL was deployed and the instructions in (1) were completed. For example, the following error is common:

Error loading custom assembly "assemblyname", Error : "Could not load file or assembly 'assemblyname' or one of its dependencies. The system cannot find the file specified.

Don't Panic. First, check that the DLL has been placed in the directory containing the merge agent on the subscriber (assuming this is being done with pull - place the DLL on the server for push) or registered in the GAC. This message can also indicate dependency problems for the DLL where dependent libraries can't be found/loaded. One way to test that the assembly can be loaded is to compile and run the following program in the same directory as the merge agent:

class Program
{
static void Main(string[] args)
{
TryLoadType(ASSEMBLY_NAME, CLASS_NAME);

// Leave window visible for non-CLI users
Console.ReadKey();
}

static void TryLoadType(string assemblyname, string typename)
{
try
{
Assembly asm = Assembly.Load(assemblyname);
if (asm == null)
{
Console.WriteLine("Failed to load assembly");
return;
}

Type type = asm.GetType(typename);
if (type == null)
{
Console.WriteLine("Failed to load type");
return;
}

ConstructorInfo constr = type.GetConstructor(new Type[0]);
if (constr == null)
{
Console.WriteLine("Failed to find 0-argument constructor");
return;
}

object instance = constr.Invoke(new object[0]);

Console.WriteLine("Successfully loaded " + type.Name);
}
catch (Exception ex)
{
Console.Error.WriteLine("Error loading type: " + ex.Message);
}
}
}

Note: It is very important to correctly determine where the merge agent executable is and where it is being run from when testing. The DLL search path includes both the directory in which the executable file exists and the directory from which it is run (for weak-named assemblies). replmerg.exe usually lives in C:\Program Files\Microsoft SQL Server\90\COM, but mobsync.exe (if you are using Synchronization Manager or Sync Center) is in C:\WINDOWS\system32, and this will have an effect on the assembly search path.

Make sure the names are exactly as they were specified in sp_registercustomresolver. If the problem was a misnamed assembly or class (because you are like me and fat-fingered the name...) here's how you fix it: sp_registercustomresolver can be re-run with the same @article_resolver parameter to overwrite the information for that resolver. This overwrites the information stored in the registry at HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\90\Replication\ArticleResolver (or a similar location for different versions/configurations). However, if the resolver has already been attached to an article, the information is also stored in sysmergearticles in the article_resolver (assembly name), resolver_clsid (CLSID), and resolver_info (.NET class name) columns. So, run an UPDATE on these columns to fix errors, as appropriate.

Good Luck!

Posted by Kevin Locke in Software Development, SQL Server at 10:40

Tuesday, April 27. 2010

identityrangemanagement and read-only tables

As a follow-up to yesterday's problems with exhausted identity ranges, I have come to a realization which I am now adopting as a best practice and I highly recommend for others:

For read-only articles (articles with @subscriber_upload_options = 2), always set @identityrangemanagementoption = 'none'.

The reasoning behind this advice is that when @identityrangemanagementoption = 'auto' (the default in Management Studio), identity ranges will be allocated for all subscribers, even though they will never use these ranges because the table article is read-only. Yet, if the table has a small datatype for a primary key (such as tinyint or smallint) it is quite likely that the identity range will be exhausted (with the default identity range being 1000, this doesn't take many subscribers) and when this happens synchronization will cease.

If there is a reason for allocating these ranges to read-only table articles or for leaving automatic identity range mangement enabled, please post it in the comments.

Posted by Kevin Locke in SQL Server at 08:39

Monday, April 26. 2010

Troubleshooting Failed to allocate new identity range

This morning I encountered a fun new error message during SQL Server synchronization. The error message (from the Replication Monitor) was:

Error messages:

The Publisher failed to allocate a new set of identity ranges for the subscription. This can occur when a Publisher or a republishing Subscriber has run out of identity ranges to allocate to its own Subscribers or when an identity column data type does not support an additional identity range allocation. If a republishing Subscriber has run out of identity ranges, synchronize the republishing Subscriber to obtain more identity ranges before restarting the synchronization. If a Publisher runs out of identit (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147199417) Get help: http://help/MSSQL_REPL-2147199417
Not enough range available to allocate a new range for a subscriber. (Source: MSSQLServer, Error number: 20668) Get help: http://help/20668
Failed to allocate new identity range. (Source: MSSQLServer, Error number: 21197) Get help: http://help/21197

First, some background: I am using a (very) simple replication topology with a single publisher running SQL Server 2005 and ~10 subscribers running SQL Server 2005 Express (all with the latest service packs and patches, of course). No re-publishing servers, multiple publications, etc. The error occurred not long after I had added 2 new subscribers (which turns out to be quite important).

Note: For replication topologies which do include a re-publishing server, the error may be explained by KB941989, which can be solved by installing Cumulative Update 4 (or later updates including the fixes).

To find the cause of this error message, some technical information about how identity values are managed during replication is needed. A key source of information for table articles with identity columns is the MSmerge_identity_range table. This table contains a row for each range of values assigned to each subscriber for each article as well as an additional row holding the entire range of identity values for the table article (this is the row with is_pub_range = 1). In order to make sense of this table, the artid column can be joined to the sysmergearticles table and subid can be joined to the sysmergesubscriptions table. Take note of the max_used column (which only has a value for pub_range rows); this column holds the end of the maximum range that has been allocated for the article (technically 1-past-the-last, since it holds the max of next_range_end, but conceptually it is the end). When this value is near the end of the pub_range, then the available identity ranges are almost exhausted.

One quick way to determine if the problem is due to exhaustion of the available identity range, is with a query similar to the following:

SELECT name, range_begin, range_end, max_used
FROM MSmerge_identity_range mir
INNER JOIN sysmergearticles sma ON mir.artid = sma.artid
WHERE is_pub_range = 1 AND range_end <= max_used + pub_range

Note: In the (very unusual) case that the subscribers are given larger ranges than the publisher(s), replace pub_range with sub_range in the above query. If this query returns any rows, then the problem has been identified. The table articles listed have exhausted their available identity ranges. If the range_end value is below the maximum value for the identity datatype, then it should be possible to increase this value and solve the problem quite easily (Note: I have not found a documented way to do this "safely" and modifying this table directly is probably dangerous, but it solved the problem for me. YMMV). If the range_end value is the maximum for the datatype, then the underlying datatype will likely need to be changed (or significant compaction of the values will need to be done, somehow) and the article dropped and re-added to the publication.

Other sources that I have come across suggested running the sp_adjustpublisheridentityrange stored procedure. It should be safe to run in any case, although I didn't find it particularly useful.

Another suggestion for similar errors that was suggested on the MSDN Forums is to check the identity value on the publisher and that it is within the range allocated to the publisher (both in MSmerge_identity_range and in the check constraint that replication adds to the table for checking the identity range value).

If all of the above suggestions fail, my best suggestion is to watch the synchronization in SQL Profiler (for server versions where SQL Profiler is not available, AnjLab SQLProfiler is a nice alternative) and to examine some of the replication stored procedures using Management Studio (right click->Modify to see the SQL of the SP) to get ideas for what might be going wrong. In particular, consider examining sp_adjustpublisheridentityrange, MSget_identity_range_info, MSpub_adjust_identity, and similar identity management stored procedures.

Good Luck!

Posted by Kevin Locke in SQL Server at 14:15

Saturday, February 6. 2010

Removing All Nonpresent Devices with DevCon

Background

Windows remembers all devices which have previously been connected to the system and retains the drivers for those devices such that if the device is reconnected the driver searching and loading process can be avoided. This is a very useful behavior for devices which are commonly connected and disconnected (e.g. USB devices), but it has the side-effect that devices which are used once, or used for a finite amount of time, will forever be remembered by Windows. It is unclear exactly what "remember" means in this context, as I am not exactly sure what sort of storage or pre-loading of the drivers is occurring, but I can confirm that drivers for these "non-present" devices can affect the system, either with compatibility problems introduced by the driver or with performance problems from many accumulated "non-present" devices (Note: I doubt the performance impact is noticeable on all but extreme cases - like replacing your motherboard more than once with different hardware...).

Managing "Non-Present" Devices

The process for viewing these "non-present" devices in Device Manager is outlined in KB315539. They can also be viewed using the Microsoft DevCon utility. In fact, both programs can be used to remove, as well as view, the "non-present" devices. In Device Manager, the devices can be uninstalled just like any other. In DevCon, you must prefix the Device ID with a "@" in order for it to remove the device.

However, neither of these solutions allows for a quick method of removing all "non-present" devices, and on a system with hundreds of such devices this can be a significant hassle. To overcome this difficulty, I have written a script to remove all "non-present" devices which are not legacy or SW devices (since these are often required for other reasons - even when "non-present") using DevCon. The script can also read and write a file listing all devices to be removed (allowing for manual editing of the device list). You can download the script from the Remove Devices project page on the forge.

Posted by Kevin Locke in System Administration at 15:58 | Comments (4)

Thursday, February 4. 2010

Serious SPF/SenderID Problems in Exchange 2003

I was recently bitten by KB910272 (again). For the unfamiliar, this issue causes Sender ID/SPF records with non-class full subnets to be processed incorrectly and mail from those domains to be rejected. The number of such domains is surprisingly large for this problem to remain unsolved in the normal patch cycle... or so it seems to me right now.

To solve the problem, apply the hotfix from KB910272 or, even better, from one of the superseding updates: KB927478 or KB951639.

Posted by Kevin Locke in System Administration, Troubleshooting at 16:20

Saturday, October 22. 2011

Wednesday, July 13. 2011

The Problem

A Solution

Wednesday, June 8. 2011

Friday, May 6. 2011

Executive Summary

Background

Investigation Technical Details

Sunday, February 6. 2011

Background

Investigation Technical Details

Solution

Caveat

Friday, February 4. 2011

Friday, June 25. 2010

Symptoms

Things To Check

Sunday, June 20. 2010

Initial Note/Warning

Background

The Problem

The Solution

Tuesday, May 11. 2010

Background

Making It Work

Tuesday, April 27. 2010

Monday, April 26. 2010

Saturday, February 6. 2010

Background

Managing "Non-Present" Devices

Thursday, February 4. 2010

Syndicate This Blog

Authors

Show tagged entries

Archives