Tuesday, April 27. 2010
identityrangemanagement and read-only tables
As a follow-up to yesterday's problems with exhausted identity ranges, I have come to a realization which I am now adopting as a best practice and I highly recommend for others:
@subscriber_upload_options = 2
), always set @identityrangemanagementoption = 'none'
.
The reasoning behind this advice is that when @identityrangemanagementoption = 'auto'
(the default in Management Studio), identity ranges will be allocated for all subscribers, even though they will never use these ranges because the table article is read-only. Yet, if the table has a small datatype for a primary key (such as tinyint or smallint) it is quite likely that the identity range will be exhausted (with the default identity range being 1000, this doesn't take many subscribers) and when this happens synchronization will cease.
If there is a reason for allocating these ranges to read-only table articles or for leaving automatic identity range mangement enabled, please post it in the comments.
Monday, April 26. 2010
Troubleshooting Failed to allocate new identity range
- The Publisher failed to allocate a new set of identity ranges for the subscription. This can occur when a Publisher or a republishing Subscriber has run out of identity ranges to allocate to its own Subscribers or when an identity column data type does not support an additional identity range allocation. If a republishing Subscriber has run out of identity ranges, synchronize the republishing Subscriber to obtain more identity ranges before restarting the synchronization. If a Publisher runs out of identit (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147199417) Get help: http://help/MSSQL_REPL-2147199417
- Not enough range available to allocate a new range for a subscriber. (Source: MSSQLServer, Error number: 20668) Get help: http://help/20668
- Failed to allocate new identity range. (Source: MSSQLServer, Error number: 21197) Get help: http://help/21197
First, some background: I am using a (very) simple replication topology with a single publisher running SQL Server 2005 and ~10 subscribers running SQL Server 2005 Express (all with the latest service packs and patches, of course). No re-publishing servers, multiple publications, etc. The error occurred not long after I had added 2 new subscribers (which turns out to be quite important).
Note: For replication topologies which do include a re-publishing server, the error may be explained by KB941989, which can be solved by installing Cumulative Update 4 (or later updates including the fixes).
To find the cause of this error message, some technical information about how identity values are managed during replication is needed. A key source of information for table articles with identity columns is the MSmerge_identity_range table. This table contains a row for each range of values assigned to each subscriber for each article as well as an additional row holding the entire range of identity values for the table article (this is the row with is_pub_range = 1). In order to make sense of this table, the artid column can be joined to the sysmergearticles table and subid can be joined to the sysmergesubscriptions table. Take note of the max_used column (which only has a value for pub_range rows); this column holds the end of the maximum range that has been allocated for the article (technically 1-past-the-last, since it holds the max of next_range_end, but conceptually it is the end). When this value is near the end of the pub_range, then the available identity ranges are almost exhausted.
One quick way to determine if the problem is due to exhaustion of the available identity range, is with a query similar to the following:
SELECT name, range_begin, range_end, max_used
FROM MSmerge_identity_range mir
INNER JOIN sysmergearticles sma ON mir.artid = sma.artid
WHERE is_pub_range = 1 AND range_end <= max_used + pub_range
Note: In the (very unusual) case that the subscribers are given larger ranges than the publisher(s), replace pub_range with sub_range in the above query. If this query returns any rows, then the problem has been identified. The table articles listed have exhausted their available identity ranges. If the range_end value is below the maximum value for the identity datatype, then it should be possible to increase this value and solve the problem quite easily (Note: I have not found a documented way to do this "safely" and modifying this table directly is probably dangerous, but it solved the problem for me. YMMV). If the range_end value is the maximum for the datatype, then the underlying datatype will likely need to be changed (or significant compaction of the values will need to be done, somehow) and the article dropped and re-added to the publication.
Other sources that I have come across suggested running the sp_adjustpublisheridentityrange stored procedure. It should be safe to run in any case, although I didn't find it particularly useful.
Another suggestion for similar errors that was suggested on the MSDN Forums is to check the identity value on the publisher and that it is within the range allocated to the publisher (both in MSmerge_identity_range and in the check constraint that replication adds to the table for checking the identity range value).
If all of the above suggestions fail, my best suggestion is to watch the synchronization in SQL Profiler (for server versions where SQL Profiler is not available, AnjLab SQLProfiler is a nice alternative) and to examine some of the replication stored procedures using Management Studio (right click->Modify to see the SQL of the SP) to get ideas for what might be going wrong. In particular, consider examining sp_adjustpublisheridentityrange, MSget_identity_range_info, MSpub_adjust_identity, and similar identity management stored procedures.
Good Luck!
Saturday, March 20. 2010
WshShell.Exec Considered Harmful Due To Blocking
For the unfamiliar, the Exec method of WshShell (used in Windows scripting) runs an application and provides access to that application's standard streams as TextStream objects. The problem with this method is that the blocking behavior of these streams is not defined (as noted in the comments on the StdOut property) and, more importantly, is impossible to use safely. The problem, familiar to anyone who has dealt with reading and writing to child programs, is that it is very easy to block while attempting to read from (or write to) the child process. What makes this problem worse is that TextStream provides no method for dealing with blocking; there is no way to set non-blocking mode or to check if input is ready to be read or to check if it would be safe to write (at least, none that I am aware of).
As a demonstration of the problem, consider the following applications. First, the child program (written in C):
#include <stdio.h>
int main(void)
{
for (int i=0; i<10; ++i) {
for (int j=0; j<8192; ++j)
fputc('x', stdout);
for (int j=0; j<8192; ++j)
fputc('x', stderr);
}
return 0;
}
var WShell = new ActiveXObject("WScript.Shell");
var wsexec = WShell.Exec(cmd);
var output = "";
var error = "";
// Keep looping until the program exits
while (wsexec.Status == 0) {
while (!wsexec.StdOut.AtEndOfStream) {
output += wsexec.StdOut.Read(1);
}
while (!wsexec.StdErr.AtEndOfStream) {
error += wsexec.StdErr.Read(1);
}
WScript.Sleep(100);
}
WScript.Echo("Output: " + output);
WScript.Echo("Error Output: " + error);
The solution that I came up with (which has its own drawbacks in terms of performance), is to write the program output to files, then read it back in the script once the program has finished. This solves the deadlock problem at the expense of decreasing the performance by requiring the output to be written to disk as an intermediate step (although the OS may not flush it to the physical disk). The code for this solution is presented below:
/** Run a command, in a separate process and retrieve its output.
*
* This is a safer, slower, alternative to WshShell.Exec that supports
* retrieving the output (to stdout and stderr) only after the command
* has completed execution. It does not support writing to the standard
* input of the command. It's only redeeming quality is that it will
* not cause deadlocks due to the blocking behavior of attempting to read
* from StdOut/StdErr.
*
* @param cmd The name/path of the command to run
* @param winstyle The window style (see WshShell.Run) of the command, or null
* @return An object with an exitcode property set to the exit code of the
* command, an output property set to the string of text written by the
* command to stdout, and an errors property with the string of text written
* by the command to stderr.
*/
function run(cmd) {
var tmpdir = FSO.GetSpecialFolder(2 /* TemporaryFolder */);
if (!/(\\|\/)$/.test(tmpdir))
tmpdir += "\\";
var outfile = tmpdir + FSO.GetTempName();
var errfile = tmpdir + FSO.GetTempName();
// Note: See KB278411 for this recipe
// Note2: See cmd.exe /? for interesting quoting behavior...
var runcmd = '%comspec% /c "' + cmd + ' > "' + outfile + '" 2> "' + errfile + '""';
var wshexec = WShell.Exec(runcmd);
// Write stuff to the standard input of the command (through cmd.exe)
// Note: This will block until the program exits if significant amounts
// of information are written and not read. But no deadlock will occur.
// Note2: This will error if the program has exited
try {
wshexec.StdIn.Write("stuff\n");
} catch (ex) {
WScript.Echo("Unable to write to program.");
}
// Do stuff, or write more stuff while cmd executes, or wait...
while (wshexec.Status == 0)
WScript.Sleep(100);
exitcode = wshexec.ExitCode;
var output = "";
try {
var outfs = FSO.OpenTextFile(outfile, 1 /* ForReading */);
output = outfs.ReadAll();
outfs.Close();
FSO.DeleteFile(outfile);
} catch (ex) { }
var errors = "";
try {
var errfs = FSO.OpenTextFile(errfile, 1 /* ForReading */);
errors = errfs.ReadAll();
errfs.Close();
FSO.DeleteFile(errfile);
} catch (ex) { }
return { exitcode: exitcode, output: output, errors: errors };
}
result = run("dir");
WScript.Echo("Exit Code: " + result.exitcode);
WScript.Echo("Output:\n" + result.output);
WScript.Echo("Error Output:\n" + result.errors);
Remember, Don't ever WshShell.Exec a command directly if you are not sure of its inputs and outputs and your script deadlocking would be a problem.
Monday, February 8. 2010
Configuring Tomcat6 with Eclipse in Debian
After some moderate difficulty (mostly due to bugs 507536 and 552480) I have finally managed to setup Eclipse as a build environment for Apache Tomcat. The process is as follows (Note that $DEVELSERVERDIR
can be anything/anywhere that you have access to as a normal user):
- aptitude install eclipse tomcat6-user
- Install Java EE Development Tools plugin from the Eclipse WPT Project
- tomcat6-instance-create
$DEVELSERVERDIR
- cd
$DEVELSERVERDIR
- ln -s /usr/share/tomcat6/lib
- ln -s /usr/share/tomcat6/bin/bootstrap.jar bin
- cp /var/cache/tomcat6/catalina.policy conf
- Configure tomcat and eclipse to your liking and add the Tomcat instance to the Eclipse servers list
At this point you should be able to import/create projects which can be run on your local server instance. (If not, you are in the same boat I was in this morning... and best of luck to you)
Saturday, February 6. 2010
Removing All Nonpresent Devices with DevCon
Background
Windows remembers all devices which have previously been connected to the system and retains the drivers for those devices such that if the device is reconnected the driver searching and loading process can be avoided. This is a very useful behavior for devices which are commonly connected and disconnected (e.g. USB devices), but it has the side-effect that devices which are used once, or used for a finite amount of time, will forever be remembered by Windows. It is unclear exactly what "remember" means in this context, as I am not exactly sure what sort of storage or pre-loading of the drivers is occurring, but I can confirm that drivers for these "non-present" devices can affect the system, either with compatibility problems introduced by the driver or with performance problems from many accumulated "non-present" devices (Note: I doubt the performance impact is noticeable on all but extreme cases - like replacing your motherboard more than once with different hardware...).
Managing "Non-Present" Devices
The process for viewing these "non-present" devices in Device Manager is outlined in KB315539. They can also be viewed using the Microsoft DevCon utility. In fact, both programs can be used to remove, as well as view, the "non-present" devices. In Device Manager, the devices can be uninstalled just like any other. In DevCon, you must prefix the Device ID with a "@" in order for it to remove the device.
However, neither of these solutions allows for a quick method of removing all "non-present" devices, and on a system with hundreds of such devices this can be a significant hassle. To overcome this difficulty, I have written a script to remove all "non-present" devices which are not legacy or SW devices (since these are often required for other reasons - even when "non-present") using DevCon. The script can also read and write a file listing all devices to be removed (allowing for manual editing of the device list). You can download the script from the Remove Devices project page on the forge.
Thursday, February 4. 2010
Serious SPF/SenderID Problems in Exchange 2003
I was recently bitten by KB910272 (again). For the unfamiliar, this issue causes Sender ID/SPF records with non-class full subnets to be processed incorrectly and mail from those domains to be rejected. The number of such domains is surprisingly large for this problem to remain unsolved in the normal patch cycle... or so it seems to me right now.
To solve the problem, apply the hotfix from KB910272 or, even better, from one of the superseding updates: KB927478 or KB951639.
Wednesday, December 30. 2009
HTML Multiple Select Without Ctrl
The HTML select element has received much criticism when used for multiple selection over its usability problems. Although it may be simple enough for experienced users, it must be explained in detail (repeatedly) to the less experienced users, who will always find its behavior to be unintuitive. Recently we have come up against this problem and are attempting to work around it.
For some useful background and some great alternate implementations of multiple-selection see Ryan Cramer's analysis and replacement ideas as well as a jQuery Simple Multi-Select plugin which offers an on-demand replacement for the select multiple elements.
The approach that I took was to attempt to leave the select element alone as much as possible, since it is widely recognized and understood by HTML veterans (and already integrated in huge amounts of legacy code), and make it a little more user-friendly by removing the requirement of holding Ctrl to select multiple options. With the modifications in place, clicking on an option will toggle that option between selected and un-selected, while holding Ctrl or Shift will result in the same behavior as before. The solution is written in JavaScript with jQuery (although there is no reason it couldn't be reworked to remove the dependency on jQuery) and was completed without any browser sniffing (only feature-detection).
The script can be found in the forge on the Improve Multiple Select project page. The script can be directly downloaded from the files section (I suggest against directly linking to it - since the location may change). As always, feedback, suggestions for improvement, bug reports, etc. are greatly appreciated.
Friday, October 16. 2009
HOWTO Modify Certificate Templates in SBS 2003
When using the Certificate Template MMC Snap-In to modify certificate templates in Windows Server 2003, any modifyable template is saved in the Version 2 certificate template format, which can only be used by Windows Server 2003 Enterprise Edition. This creates a significant annoyance when attempting to use certificate templates on a non-Enterprise Edition server (such as Small Business Server). However, when copying a Version 1 template (and making only minor modifications), there is no reason that the template can't be used as a Version 1 template, if you make some modifications by-hand.
Case in point: Setting up offline L2TP/IPSec Certificate Templates. If you have been following KB 555281, you are likely stuck at How to issue the custom L2TP/IPSec (Offline request) template. Here's the trick:
- Setup ADSI Edit, or any other LDAP editor
- Open adsiedit.msc (or your LDAP editor) and browse to CN=Configuration,CN=Services,CN=Public Key Services,CN=Certificate Templates
- Open the Properties for the template that you would like to use
- Change msPKI-Template-Schema-Version and msPKI-Template-Minor-Revision from 2 to 1 (not sure if msPKI-Template-Minor-Revision is really required...)
- Refresh the Certification Authority MMC Snap-In if it is open
That's it. If the template version is set to 1, you can issue the template in any edition of Server 2003.
Note: If you make substantial changes to the properties of the template this trick may not work. The differences in how the template versions are processed can be significant, but this process is likely to work for most simple changes to an existing Version 1 tepmlate.
A Warning to SonicWall Users about IP Fragmentation
Recently I discovered and corrected an obscure problem on a client's system relating to SMTP mail not being received from a single remote domain. The ultimate cause turned out to be the cause for an earlier (only partially solved) problem relating to POST data getting lost for the server hosting their website, and it is all the result of the default configuration on their SonicWall firewall.
By default, SonicWall will block/discard fragmented IP packets. This can lead to very difficult to diagnose problems as large packets (packets larger than the MTU of any link between the source and destination) will mysteriously fail to arrive. To solve the problem, follow the instructions to re-enable fragmented packets.
Note: The reason that fragmented packets are disabled by default is reasonable (at least for simple IP implementations). An IP implementation must keep track of fragments received but not yet reassembled so that when other fragments of the packet arrive (possibly much later and out of order) the original packet can be reassembled. Attackers can use this fact to contribute to a DoS attack by sending many packet fragments which do not contribute to complete packets. This will force the victim system to hold the fragments in memory and exhaust system resources. However, I would expect all practical IP implementations have a limited cache size for fragments to mitigate this scenario...
In summary, I find this default configuration completely unacceptable. A "break the Internet" default policy is ridiculous. I'm surprised that this hasn't bitten more people and wasted more time (or that the affected people haven't complained more loudly about their wasted time). Perhaps it is just Montana that is still using carrier pigeons and other forms of transport with small MTUs...
Monday, October 5. 2009
Broken Date.format in Microsoft ASP.NET 2.0 AJAX
The most recent version of the Microsoft ASP.NET 2.0 AJAX Extensions 1.0 (1.0.61025 according to the support info. in Add/Remove programs) has a bug that just bit me. This library adds a format function to the Date object which provides support for Date and Time Format Strings comparable to the .NET implementation. This is a very handy feature, given JavaScript's lack of a built-in date formatting function. But, if you are going to use it, be careful about the following bug: The time zone offset formats have incorrect sign. For example, if you are in the Pacific Time Zone at UTC-8, Date.format will return formats with UTC+8 (which matches the sign of JavaScript's Date.getTimezoneOffset method).
I created a workaround for the websites that I maintain which does not reimplement Date.format (and likely introduce new bugs) nor require any changes to existing code (with one exception - noted below). This code is a horrible hack and I am almost too embarrassed to post it... but not quite. Here it is:
(function() {
var dfmt;
function fixedDateFormat(format) {
var gto = Date.prototype.getTimezoneOffset;
try {
Date.prototype.getTimezoneOffset = function() { return -gto.call(this); }
return dfmt.call(this, format);
} finally {
Date.prototype.getTimezoneOffset = gto;
}
}
var fixfails = 0;
function fixIt() {
dfmt = Date.prototype.format;
if (typeof(dfmt) != "function") {
// If the library has not been loaded yet, defer
if (++fixfails < 3)
setTimeout(fixIt, 100);
return;
}
if (new Date().getTimezoneOffset() > 0 == new Date().format("zz") > 0)
Date.prototype.format = fixedDateFormat;
}
fixIt();
})();
Just include this code somewhere on the page and it will wrap the Date.format function and negate Date.getTimezoneOffset so that the formatted offset is correct. The significant limitation of the code is that if it is included before Date.format is defined, it will wait for the code to load and try again later. The problem with this approach is that code which is run after Date.format is defined before the fix is installed will still trigger the bug and it will be hard to figure out why that code is not working. Therefore, if possible, include this code immediately after the Microsoft AJAX extensions JavaScript is loaded on the page. Alternatively, use your own workaround or avoid using any of the "z" specifiers in your custom format strings.
Tuesday, September 29. 2009
CallerID for Windows Fax Service
Having recently configured fax sending/receiving on both a Windows server (using Windows Fax Services on SBS2003) and a Linux server (using Hylafax on Debian Lenny), I felt compelled to write down a few troubleshooting steps for fixing Caller ID problems. Both of these systems include Caller ID information with received faxes (separate from any TSID), but often this information is missing and it is difficult to determine why. Make sure that each of the following items are satisfied:
- Does the phone line to which the fax is connected have Caller ID service from the phone company? (easy to test using a Caller ID-capable phone)
- Does the modem support Caller ID? (check the specs, or skip to the next step)
- Is Caller ID enabled in the modem? Is it being received? To test this, use the following steps:
- Stop the fax service
- Connect to the modem using Hyper Terminal, telnet, or cu
- Query the status of Caller ID on the modem using
AT#CID?
(or a different modem-specific command). If the modem reports 0 (disabled), enable it withAT#CID=1
and add this to the modem initialization string so it will be enabled after reboot/reconnect. - While still connected to the modem, make a call to the phone line and watch for Caller ID information to be printed. If no Caller ID information is printed, continue troubleshooting the modem/phone until the information is printed.
- Disconnect from the modem when finished and restart the fax service
Once the items above are satisfied, the Caller ID information should be gathered by the fax service and included with received faxes. If not, check that the initialization commands include enabling Caller ID, if it is disabled by default (try restarting the fax service, then step 3 from above. If the Caller ID is disabled, something in the fax service is disabling it - or not enabling it). Otherwise, something else is broken and you should add a comment below on how to fix it.
Sunday, September 13. 2009
You can't not carry out this action at the present time (Error 2486)
I recently encountered the following error in Microsoft Access 2000:
Apparently this is a catch-all error for any condition where the database engine (or presumably one thread of the engine) is busy completing a task and something else (such as a VBA macro) attempts to perform a function which requires the engine to perform some task. It is horribly difficult to track down, due to the lack of internal debugging interfaces in Access, but check for macros which may be updating visual elements in a form and getting into a loop (for example, OnCurrent actions cascading in a loop or likewise). In my particular case, the error was only triggered when a filter was applied in a form which contained a subform with a field whose default value depended on a field in the parent form. Once I removed the default value from the field in the subform, the error disappeared. (Note also that this only occurred in Access 2000, not Access 2003)
Good luck in tracking down the source of this error if you encounter it.
An error occurred while communicating with scanning device
In the office we are using an HP OfficeJet 6310 All-In-One (donated by a friend, thanks mate!). I recently encountered a problem while attempting to scan from the device using the provided HP software bundle. The (very uninformative) error message that I received after pressing the Scan Document button was:
The communication problem is often solved by reinstalling the drivers (simply remove the printer from Printers and Faxes and click "Add Device" from the installation CD - no need to reinstall the included software) or by correcting common network misconfigurations (wrong IP, disconnected network cable, etc.). Steps for basic troubleshooting are discussed on HP's Website for USB and Network connections.
If/When the basic troubleshooting procedures don't solve the problem, there is a very involved solution for correcting the driver network connection settings posted by Ken Leon on the HP Forums (along with other useful suggestions/diagnostics in that thread).
However, in one (increasingly common) case, it may be possible to avoid the need to reconfigure the internals of the scanner drivers. If your ISP participates in DNS Hijacking, it may be prudent to rule this out as a cause of the error. It appears that the scan drivers attempt to address the device based on its hostname rather than its IP address (regardless of configuration) and when it resolves the address of the device, the DNS request gets hijacked and the scan drivers attempt to connect to the ISP server instead of the device. To diagnose this, simply disconnect the network from the Internet (and restart the router and computer after disconnecting to clear their DNS cache) and see if the error still appears. If it does, then this is not the problem, if it does not, you have found the problem. To remedy the problem, add an entry to the hosts file which associates the hostname of the device (both the name displayed in its web configuration as the hostname and its MAC address prepended with "HP" as this is often used by the drivers).
Thursday, July 23. 2009
Write Conflict Where None Exists
BIT
and was allowed to be NULL
. Changing the columns to BIT NOT NULL
and refreshing the linked table in Access solved the problem.
Update: KB 280730 covers this issue.