Thursday, December 13, 2007

Zoom multiple controls from the same slider in XAML

Recently we've been playing around with XAML and WPF at work, and a common situation we run across is wanting to zoom different Pages without giving each one its own zoom functionality. One slider should be enough to zoom any Page we hook up.

The basics start with an attached DependencyProperty called Zoom.
public static class AttachedProperties
{
    public static readonly DependencyProperty ZoomProperty
        = DependencyProperty.RegisterAttached("Zoom", typeof(double), typeof(UIElement),
            new FrameworkPropertyMetadata(1.0,
                FrameworkPropertyMetadataOptions.AffectsRender, null /* PropChangedCallback */,
                new CoerceValueCallback((obj, value) => (double)value < 0.0 ? 0.0 : (double)value)));

    public static double GetZoom(DependencyObject obj)
    {
        return (double)obj.GetValue(ZoomProperty);
    }

    public static void SetZoom(DependencyObject obj, object value)
    {
        double val = Double.Parse(value.ToString());
        obj.SetValue(ZoomProperty, val);
    }
}
Now we have an attached property we can connect to any UI element! Notice that I suffixed 'Zoom' with 'Property' for the static field, yet gave RegisterAttached just 'Zoom'. This is by convention, as is the Get/Set pair below the static field.
<Page x:Class="ZoomTest.Page1" x:Name="page" xmlns="...">
    <Grid />
</Page>
Now we have a Page that starts with a default zoom property of 80%. However, this does not do us much good. We don't actually tell the page how to zoom anywhere. This is where Binding comes into play.
<Page x:Class="ZoomTest.Page1" x:Name="page" xmlns="...">
    <Grid LayoutTransform="{Binding Zoom, ElementName=page}">
    </Grid>
</Page>
Now anything inside of the Grid will get a LayoutTransform based on the Zoom property of the Page! Unfortunately the LayoutTransform requires an actual Transform and not a double. Now we could have made the Zoom property a ScaleTransform, but that makes it less useful as an attached property. However, all is not lost:
public class ZoomConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        if (targetType != typeof(Transform))
            throw new InvalidOperationException();

        double val = Double.Parse(value.ToString()) / 100.0;
        return new ScaleTransform(val, val);
    }

    public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
    {
        throw new NotSupportedException();
    }
}
Now we just have to hook the converter up to the data binding, and voilà! We have a Page whose Grid content zooms in and out with respect to a given scale.
<Page x:Class="ZoomTest.Page1" x:Name="page" xmlns="...">
    <Page.Resources><local:ZoomConverter x:Key="zoomConverter" /></Page.Resources>
    <Grid LayoutTransform="{Binding Zoom, ElementName=page, Converter={StaticResource zoomConverter}}">
    </Grid>
</Page>
At this point all we have to do is wire up a Slider (two step process wraaa)!
<Window x:Class="GeometryTest.Window1" x:Name="window"
        xmlns="..." xmlns:local="clr-namespace:ZoomTest"
        Title="Zoom Test" Height="480" Width="640">
    <DockPanel LastChildFill="True">
        <ToolBarTray DockPanel.Dock="Top">
            <ToolBar>
                <Label>Zoom:</Label><Slider x:Name="zoomSlider" Minimum="1" Maximum="100" Value="50" />
            </ToolBar>
        </ToolBarTray>
        <Frame x:Name="ContentFrame" Source="Page1.xaml" LoadCompleted="ContentFrame_LoadCompleted" />
    </DockPanel>
</Window>
void ContentFrame_LoadCompleted(object sender, NavigationEventArgs e)
{
    Binding binding = new Binding();
    binding.Source = zoomSlider;
    binding.Path = new PropertyPath("Value");
    // SetBinding is defined on FrameworkElement, not UIElement
    (ContentFrame.Content as FrameworkElement).SetBinding(AttachedProperties.ZoomProperty, binding);
}
Now, changing the zoom value with the Slider will cause the Grid to zoom in and out! Bonus points for adding a ScrollViewer to the Page to let you actually see the change in size of your Grid.

Tuesday, November 13, 2007

Emergency Medical Technician

My absence from blogging lately can be attributed to a recent change in my life. I'm nearly finished with my Emergency Medical Technician - Basic training at a local community college. This class has been really challenging, and quite a lot of fun. However, studying has dominated my free time, and I have less time at work to blog (because I work fewer hours now).

Recap of fun things:
  • Hospital clinicals were great
  • Ambulance ride-alongs have been fun as well (one more left!)
  • The Jaws-of-Life are so indescribably cool (I got to cut the roof off of a car). Quite literally, the Jaws-of-Life will cut through anything on a car that you can fit them around.
In August I begin Paramedic school. To fill the gap from January (certification as an EMT-B) until August, I will be volunteering for a local Fire/Rescue squad as either an EMT-B first responder or an EMT-B in an ambulance. I will keep everyone (all seven of you) updated as this part of my life continues.

Nota bene: I will still be working as a software engineer, just volunteering as an EMT-B.

Tuesday, October 2, 2007

I Work in an Almost Award Winning Trailer

People often tell me they think I work in some glorious office building, or at least somewhere cool. This is pretty far from the truth. On site I've worked in "Green Acres", the e-Cave, shop floor mezzanines, and now a trailer.

"Green Acres" was the first place I worked on site. It was the green office area, which picked up its folksy nickname because you were deep inside a manufacturing building, bathed in constant fluorescent light, unable to hear anything from the outside. If you came in to work and the sun was shining, your brain expected the sun to be shining when you left. Unless lightning struck the building, we were completely oblivious to outside noise. Just the constant hum of fluorescent lighting.

Later I switched groups and was given the opportunity to work in the e-Cave. Imagine a room with open faced cubicles lining the walls. Seven interns and four salaried workers all crammed into a big open room with some conference room chairs, a projector, and a screen. My fondest memories of working on site were from the e-Cave. Come 11am, the whole crew would rally for lunch. I swear those were the most productive years; my old boss swears they were the least productive years.

This same group got bounced around to various locations, eventually settling down in a shop floor mezzanine. We stayed there for almost 3 years. It wasn't bad: the interns had their own 'row', and we spent a lot of time in each other's cubes. Pretty standard corporate experience... except the place smelled like feet (ed: as I am writing this my shoes are off and I'm just in idea why that place smelled like feet, nope).

Not too long ago we got the news that we would be moving to the 14-wide. This 14 trailer wide (yes 14) "modular office space" is actually the bigger of two trailers built on the far side of the site. Our sister trailer is only 13-wide, but it is hard to tell they are short one trailer. Our trailer is complete with a buried septic tank (that has hit the critical level alarm twice), offsite power (we've lost it no less than 3 times), and a gravel parking lot. I will say though that parking is a lot more fun when you can fishtail into a spot.

It is hard not to laugh at where I work. I work for a Fortune 10 company in a trailer. We do all sorts of crazy nuclear work in a trailer. But wait, this isn't just any trailer. This is an Almost Award Winning Trailer. Our 14-wide received honorable mention in the 2007 Modular Building Institute Awards of Distinction for a modular office space of <5,000sqft. I'm sure we were neck-and-neck with the Trenton Police Department, so congratulations to them for their hard work and modular working environment.

Let us remember that in the world of working in a trailer, everybody is a winner!

Wednesday, September 19, 2007

To my noname friend: atoi and atof as they should never be done

Sometimes, when I get bored, I spend an hour or so reinventing the wheel. Earlier tonight I was asked how code like atoi worked. A quick look at an ASCII table and a few multiply-accumulates later, we have something like:
/* atoi - */
#include <ctype.h>
#include <errno.h>
#include <limits.h>

long atoi(const char *value) {
  long ival = 0, c, n = 1, i = 0, oval;
  for( ; c = value[i]; ++i) /* chomp leading spaces */
    if(!isspace(c)) break;
  if(c=='-' || c=='+') { /* chomp sign */
    n = (c!='-' ? n : -1); i++;
  }
  while(c = value[i++]) { /* parse number */
    if(!isdigit(c)) return 0;
    oval = ival; /* save ival for overflow detection */
    ival = (ival * 10) + (c - '0'); /* mult/accum */
    if(ival < oval) { /* report overflow/underflow */
      errno = ERANGE;
      return (n>0 ? LONG_MAX : LONG_MIN);
    }
  }
  return (n>0 ? ival : -ival);
}
Yes, this is quite ugly, and if I ever catch you writing something this cryptic I'll force a 15 page code review on you. However, one can tell how much fun atoi actually is once you take into account error checking!
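One wrinkle worth spelling out: the overflow test in the listing compares the accumulator before and after the multiply-accumulate, which depends on signed wraparound. C leaves signed overflow undefined, so that comparison is not guaranteed to fire. Below is a minimal sketch of one alternative that checks the limit before multiplying. The name parse_long is hypothetical (chosen so it does not clash with the library atoi), and it keeps the listing's policy of returning 0 on any non-digit.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* parse_long: hypothetical name, not part of any library. */
long parse_long(const char *value) {
    size_t i = 0;
    int negative = 0;
    long result = 0;

    while (isspace((unsigned char)value[i])) i++;      /* skip leading spaces */

    if (value[i] == '-' || value[i] == '+') {           /* optional sign */
        negative = (value[i] == '-');
        i++;
    }

    for (; value[i] != '\0'; i++) {
        int digit;
        if (!isdigit((unsigned char)value[i])) return 0; /* same policy as the listing */
        digit = value[i] - '0';

        /* Test the limit BEFORE multiplying, so no signed overflow occurs.
         * Negative values accumulate downward so LONG_MIN is reachable. */
        if (negative) {
            if (result < (LONG_MIN + digit) / 10) { errno = ERANGE; return LONG_MIN; }
            result = result * 10 - digit;
        } else {
            if (result > (LONG_MAX - digit) / 10) { errno = ERANGE; return LONG_MAX; }
            result = result * 10 + digit;
        }
    }
    return result;
}
```

Calling parse_long("  -42") walks past the spaces, records the sign, and accumulates downward; an over-long digit string sets errno to ERANGE and saturates instead of wrapping.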

Naturally our conversation drifted to atof--well in his defense I drifted to atof--and I decided I should write a compliant implementation. As it turns out, implementing overflow and underflow checking for floating point numbers is much harder (and trickier too!) than for integers. Below are the pretty printed sources to both my atoi and atof implementations, along with links to download. They are in the public domain and you should use them at your own risk, because if I catch you using either of these I will hold a code review so harsh it would make a death row inmate cry.
View atoi source (download)
View atof source (download)

Monday, August 20, 2007

Fun with FORTRAN intrinsics and portability

When porting code from one compiler to the next, you run into all sorts of fun syntactical issues, gotchas with floating point handling, and how each maintainer semantically interpreted the standard.

When porting code from one runtime library to the next, you run into even more fun! Who says the API of your favorite function has not changed? Perhaps you no longer can reference some functions. What could you do to mitigate these risks?

I stumbled across some code that attempted to mitigate these risks associated with the Compaq FORTRAN non-standard (yet invaluable) intrinsic SLEEP by calling out to the C Runtime Library's sleep routine (compliant under ISO/IEC 9945-1:1990, "POSIX.1"). As the code progressed through the years, the original maintainers noted that on Windows, the sleep routine was renamed to Sleep. A simple change to the interface definition fixed the linking issues:
interface
  subroutine SLEEP(seconds)
    integer*4 :: seconds
  end subroutine
end interface
However, careful users would notice that this change (as an attempt to use the more stable C RTL version of sleep) has an unexpected side effect. MSDN states that the single parameter given to Sleep is actually, "The minimum time interval for which execution is to be suspended, in milliseconds."

So, code that once called SLEEP(1) or SLEEP(5) expecting to regain control in 1s and 5s respectively, now sleeps for 1ms and 5ms respectively. This is well beneath the timeslice/quantum given to a process (6-55ms on Windows), effectively making the call an inefficient Sleep(0) (which in and of itself is an inefficient thread yield!). The correct action is to consult the Intel FORTRAN Libraries reference and note that the portability library contains a SLEEP function that replicates the non-standard Compaq intrinsic and works on every platform where Intel's FORTRAN compiler is supported. This is not a great solution, but it is also not the worst solution (hacking a layer on top of the Windows Sleep function to multiply the parameter by 1000).
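To make the unit mismatch concrete, here is a small sketch in C. It assumes only the documented units of the two routines: POSIX sleep takes seconds and Win32 Sleep takes milliseconds. The names seconds_to_ms and sleep_seconds are hypothetical. This is exactly the "layer on top" hack described above, shown for illustration rather than as a recommendation.

```c
#include <limits.h>

#ifdef _WIN32
#include <windows.h>   /* Sleep(DWORD milliseconds) */
#else
#include <unistd.h>    /* sleep(unsigned int seconds) */
#endif

/* Convert whole seconds to milliseconds, saturating instead of wrapping.
 * The cap stays below ULONG_MAX because, as a 32-bit DWORD, that value
 * is INFINITE on Windows. */
unsigned long seconds_to_ms(unsigned int seconds) {
    if (seconds > (ULONG_MAX - 1UL) / 1000UL) return ULONG_MAX - 1UL;
    return (unsigned long)seconds * 1000UL;
}

/* sleep_seconds: hypothetical wrapper that always takes seconds. */
void sleep_seconds(unsigned int seconds) {
#ifdef _WIN32
    Sleep((DWORD)seconds_to_ms(seconds));   /* milliseconds */
#else
    sleep(seconds);                         /* seconds */
#endif
}
```

With the conversion in one place, a call such as sleep_seconds(5) asks for 5000ms on Windows and 5s elsewhere, instead of silently becoming a 5ms nap.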

As an aside, it is a bit of a programming error to rely on sleep for hard timing of any interval other than integer multiples of the timeslice/quantum (plus some amount of jitter). It is also a bit of a programming error to ignore changes to APIs when moving between compilers, libraries, and operating systems.

Too bad compilers cannot catch either of these...

Wednesday, July 25, 2007

MPICH.NT, MPICH2, and TCP Offloading (continued)

Well, we tried updated drivers for the NC373i, but to no avail. We were still seeing TOE errors which resulted in MPICH.NT and MPICH2 hanging. Luckily the machines have second NICs, NC380T PCIe DP Multifunc Gigabit Server Adapters. Unluckily for us, those network cards exhibited the same problem as the embedded NC373i.

This leads me to believe that there is a problem with the Windows 2003 R2 Scalable Networking Pack, specifically with the Chimney TCP Offloading portion. There may even be an issue in the interaction between MPICH.NT, MPICH2, and Windows 2003 R2's SNP. However, that seems highly unlikely, as MPICH.NT and MPICH2 are wildly different under the hood.

We've initiated a case with Microsoft to get to the bottom of this issue. Meanwhile our new servers have TOE disabled, which isn't bad, but it isn't good either.

Monday, July 23, 2007

MPICH.NT, MPICH2, and TCP Offloading

Recently we came across a strange problem where on some new machines MPICH.NT and MPICH2 both would fail to correctly operate across machines. On a single machine there were no problems, and actually the application partially worked under MPICH.NT.

My initial guess was MPICH.NT did not play well with Windows 2003 Server R2. Previously we have only had Windows 2000 Server for the MPI jobs, so it seemed logical that changing the server OS would cause some issues. I recompiled the application for MPICH2 (bigger, newer, better) and found the application "hung" in exactly the same fashion as under MPICH.NT.

So now I had a common failure mode across two versions of MPICH (which are wildly different under the hood) on the same OS. I started running MPICH in debug/verbose mode and spent a lot of time looking at the thousands of lines of output and noticed that both under MPICH.NT and MPICH2 they came to the same place and halted, only I couldn't tell where in the code this was.

You may think this is where I fired up the parallel debugger and did wild and crazy things, but that takes too much time. I went with good ole fashioned printf debugging. I ended up getting output like the following:
[0] calling MPI_Bcast ...
[1] calling MPI_Bcast ...
[2] calling MPI_Bcast ...
[3] calling MPI_Bcast ...
[4] calling MPI_Bcast ...
[N-2] calling MPI_Bcast ...
[N-1] calling MPI_Bcast ...
Here [X] is the process number. If a call had succeeded, its MPI_STATUS code would have been printed on the next line; however, none of these calls returned. An important thing to note is these MPI_Bcast calls were sending buffers of 150MiB+, which is quite large. This fact drove me to check on the network card settings.

While poking around in the network card settings, I noticed two things:
  1. Network card drivers were out of date
  2. Network card status tool counters showed some errors in TCP Offloading
Using the information that TOE had reported errors in the past, I reran the application and watched the TOE error counters. Sure enough, when that MPI_Bcast line was reached, the TOE error counters incremented by one. So I went and disabled TOE on my two test machines and reran the application.

Boom, problem solved. Well...not really.

Disabling TOE will kill performance for other applications that do not have an issue with offloading. However, I cannot upgrade the drivers for the network card (even as a test) without going through 10 miles of red tape. That is neither here nor there; the important part is the problem has been identified and can be solved.

So if you have an HP NC373i Multifunction Gigabit Ethernet Adapter and experience problems with MPICH or MPICH2 on Windows, it is probably the TCP Offloading Engine. Try updating your drivers or disabling TOE to solve the issue. I will post an update if the latest drivers indeed fix this issue.

Tuesday, July 17, 2007

CNN, Kashiwazaki NPP, and the 2007 Niigata Earthquake

(ed: I normally don't blog about non-technical issues, but this chaps my cheeks to no end)

If you haven't heard, yesterday an earthquake struck off the coast of Niigata prefecture in Japan. It had a magnitude of 6.8 and caused serious problems for the Kashiwazaki-Kariwa NPP. A transformer caught fire at unit 3 (there are 7 units) and radioactive liquids spilled into the ocean. This is obviously a serious event and should be treated as such; however, the news coverage was yellow journalism at best. All of the initial reports were sensationalist and biased, with few facts (partially due to the tight-lipped nature of TEPCO) to support any of their claims.

To see how bad the yellow journalism got just read the following headline blurb:
Radioactive leak, tremors follow Japan quake
A strong earthquake struck northwestern Japan today, causing a radioactive leak and fire at one of the world's most powerful nuclear power plants. Eight people were killed and hundreds injured. The plant leaked about 315 gallons of water, according to a Tokyo Electric official.
So, how many casualties were a result of the radioactive leak and fire:
  • 8 deaths and hundreds of injuries
  • 6 deaths and hundreds of injuries
  • 1 death and tens of injuries
  • 0 deaths and 0 injuries
If you guessed 0 and 0 you would be right!

But wait...

Didn't the blurb say that eight people died and hundreds were injured by the radioactive leak and fire?! Why yes, yes it did. This is far from ethical, yet CNN went right ahead and posted that online.

Don't believe me? Check out this screenshot of CNN's misrepresentation of the Kashiwazaki-Kariwa leak and fire.

Wednesday, July 11, 2007

Lessons Learned Flying Space Available

Flying space available is great. It is cheap, fun, and can be very flexible. However, when things get busy you can find yourself in trouble. Space available travel is prioritized and if you aren't an employee of an airline, you'll find yourself at the bottom of the pile.

You can mitigate your troubles by always arriving for the earliest flight possible. Quite often people oversleep and miss the early flights. Waking up at 3:30 AM to possibly catch the 5:00 AM flight might sound awful, but it may be the only way to make it to your destination.

If you are not able to make it onto a flight you are listed for, you are automatically rolled over to the next flight. However, twice I have not been rolled over for one reason or another and had to be manually moved. Always double check to make sure you have been added to the next flight.

When you miss a flight, it may seem tempting to use the 1-800 number most airlines have to change to a different option. However, I've found they tend to not get the change made properly. Always use the gate agents to make the change and quite often they will also suggest alternate routes (if only to get you off their back).

If you happen to be woefully unlucky and are bumped from one day to the next, save your boarding pass from the day before. You can use that to skip the lines at the ticketing counter and go straight through security. Once you get to the gate, double check that you are indeed listed on the flight and get yourself a new flight coupon (or if really lucky a boarding pass).

The last bit of advice is to be as sociable as possible. Hang out with the gate agents, any flight attendants who are stuck, and yuck it up with other space available passengers. I've met so many great people sitting around in airports waiting on flights. Some of these people can even help you out if you're really stuck, so it's worth being polite and friendly. Getting somebody a $3 airport coffee can go a long way towards making it onto a flight.

Sunday, June 24, 2007

Conflicting COM Interop DLL Names

Recently our job scheduling software got a version bump from 5 to 6 and I'm in charge of bringing our support libraries up to date. A quick look into the changelog showed that I would have to do some under the hood work, but I have a relatively abstracted class library to which it would be easy to add support for version 6.

Not so fast. The job scheduler uses COM objects as its method of interaction, and .Net has to build an interop assembly to talk to it. The easy way, Add Reference -> COM, should work. It should work, provided the COM DLLs have different names. Thanks to Murphy's Law, the two DLLs, while under different folders, have the same name!

.Net generates Interop.X.dll and Interop.X.dll, even though the two are in different folders and represent different versions, solely because our job scheduler's COM DLL is X.dll, in both folders. While this is not necessarily their fault, it certainly makes the lives of those of us who do integration harder (their COM object model is pretty bad to start with).

Thankfully, Microsoft provides the means to create your own Primary Interop Assembly from a DLL. Using TlbImp you can create your own COM Interop DLL, complete with a non-conflicting name and namespace.
TlbImp version5\X.dll /namespace:X /out:Interop.X.dll
TlbImp version6\X.dll /namespace:X6 /out:Interop.X6.dll
Now I can import these two conflict free and deal with more important issues, like the poor documentation included with the COM library.

Wednesday, June 20, 2007

KB925902: X is not a valid Win32 application

Recently we've had a number of new hires and interns come in as the business expands and schools get out. They've been given new computers, Core 2 Duos and Xeon 51XXs. Last week a few users reported no longer being able to run certain older applications. Windows XP SP2 stated that "X is not a valid Win32 application."

Yet earlier in the day or in the week they had been able to run the same executable without any issues. On my XPSP2 machine I could run the executables fine, and Win2ksp4 had no problems either. At this point a massive search for "what changed" began.

The executables were compiled on MSVS6 and Compaq Visual FORTRAN, so perhaps it was their newer processors coupled with the way the executables were compiled. Sure enough recompiling under MSVS8 or Intel Visual FORTRAN allowed them to run the executables. However, one of the executables still had a problem. Thankfully, we got a new error (I hate old errors):
XXXXX.EXE - Illegal System DLL Relocation
The system DLL user32.dll was relocated in memory. The application will not run properly. The relocation occurred because the DLL C:\WINNT\system32\HHCTRL.OCX occupied an address range reserved for Windows system DLLs. The vendor supplying the DLL should be contacted for a new DLL.
Lovely, a Windows OCX control bumped user32. A quick google search brought up KB925902 as the offending patch. I went to one of the machines to look for the patch, but it appeared from Add/Remove Programs that this patch was never installed!

Before giving up saying the patch is not installed, a useful thing to note is you can browse to %WINDIR% and take a look at all the NTUNINSTALL$KB* folders to see every patch that has been applied. This list is much more exhaustive than the Add/Remove Programs list.

Sure enough, there was %WINDIR%\ntuninstall$kb925902\, and after uninstalling the patch, everything was fine on these machines. I wonder what KB925902 could have possibly changed to cause such a colossal error.
MS07-017: Vulnerability in GDI could allow remote code execution
So, a security bug in the graphics subsystem gets a patch which affects the ability to run console applications? You should read the article on MS07-017 to get a feel for how many subsystems are affected by their patch. Thank you Windows for making my life so wonderful.

Wednesday, June 13, 2007

New Intel Visual FORTRAN 10.0.025

Intel Visual FORTRAN 10.0.025 was just released, and of course I got my hands on it. I'd had troubles in 9.1 that were "Targeted to Fix," but I needed the codes compiled as soon as possible. So I install IVF10, and get started with my first project. I press F7 and wait...


The new version of IVF has a static verification component. It turns out that my project (about 105KLOC) causes the verification tool to run into the per-process memory limit (2GiB). Talk about cool. I've now broken two versions of the Intel compiler right out of the box!

I can't really blame Intel; I would imagine full static verification of a project of that size would be hard to do in 2GiB of memory. Besides, it is only a nicety, so I disabled it and compiled again. Without static verification it worked; however, when I went to run, my program reported that it could not find a file.

This file is pulled from an environment variable and as soon as I stepped through the application it became obvious what happened. Visual Studio's Environment configuration option for debugging had delimited the key value pair lines with \r\n instead of just \n. A temporary solution for this problem is to bring up a C++ project and input the environment there, then copy and paste the string into the IVF10 project's Environment setting. Not sure if this is an IVF10 or VS2005 issue.

I am now known at work as the code killer. Put something in front of me and I'll break it. Whoops.

Tuesday, June 12, 2007

.Net Deployment Build Error HRESULT = '80004005'

I've got a Visual Studio 2005 Deployment project for an application I distribute internally, and came across this crazy error while rebuilding the MSI file.
------ Starting pre-build validation for project 'MyProjectInstall' ------
ERROR: An error occurred while validating. HRESULT = '80004005'
------ Pre-build validation for project 'MyProjectInstall' completed ------
No other clues as to the actual problem. Some googling revealed that this has to do with a project building with references it does not need (or in my case stale references). A quick fix is to go to the offending 'Primary Output' project, and remove all of its non-Microsoft references. Then add them one by one until the project compiles. At this point your deployment project should compile without any hassles.

Friday, June 8, 2007

Drive Letter Economics

On Windows there is a fun property of the command line (not quite DOS) where you cannot change directory to a UNC path. This effectively makes it impossible to set your working directory to a UNC path from a batch file. To address this issue Microsoft has two methods of switching to a UNC path.

You can NET USE the path as a drive letter. However, you have to be sure that the drive letter you chose is not in use. When running in a large multi-user environment, you can see how this would become troublesome. More importantly, NET USE is semi-permanent, living for as long as the computer is on. You must unuse the drive letter assignment to free this up for other people.

Your other option is pushd which pushes a path name onto a virtual stack, making the path you specify the current working directory. If you pushd a UNC path, it is assigned a drive letter from the pool of open drive letters. Now, this too is semi-permanent (i.e. outlives the cmd instance it was done in). This assignment lives on until you unuse it or use popd. The more vexing part is on Windows 2000 Server these drive letter assignments affect everyone who uses the machine.

Let's say user A has a script that calls pushd without popd. If his script gets run enough times, eventually Windows 2000 Server machines begin running out of drive letters. So when user B's script runs on a machine without free drive letters, they are greeted with this fun message:
C:\>PUSHD \\machine\unc\path\here\
' ' is an invalid current directory path. UNC paths are not supported.
Aren't you glad you get an error message which reflects the problem?

Now on Windows 2003 Server this problem is non-existent. Users can only muck up drive letter assignments for themselves, not for everyone logged in to a machine. However, upgrading production servers to another operating system is not always a valid fix. The problem does not go away; users are merely insulated from other users.

The correct solution is to follow the best practices and have a matching popd for every pushd call you make. Of course, it wouldn't be a best practice if nobody ignored it.

Friday, June 1, 2007

Premature Optimization is the Root of all Evil

Donald Knuth was indeed right when he said that, "premature optimization is the root of all evil." In a few FORTRAN codes I have, the original programmers made use of boolean short circuiting. This technique is extremely popular in languages which support it. If you are unfamiliar with short circuiting it goes a little something like this, given:
if (expression1 .and. expression2 ... .and. expressionN) then
  ! some code here
end if
Short circuiting relies on the fact that the language will evaluate boolean expressions in order of precedence, from left to right. So if and only if expression1 is .TRUE. then expression2 will be evaluated. If and only if expression2 is .TRUE. then expression3 is evaluated, and so on and so forth. If, from left to right, any expression is found to be .FALSE. then the entire If statement is considered to be .FALSE., which in boolean algebra makes sense.

A common use of boolean short circuiting would be to protect against out of bounds array access in loops which may not stop at the end of an array. For instance:
real, dimension(:), allocatable :: myArray
do i = 1,m
  if (i .lt. n .and. myArray(i) .op. someVal) then
    ! do something
  end if
end do
Many languages support short circuiting by design, and many support it by consensus; FORTRAN, however, does not make short circuiting part of the design, and there is no consensus on its adoption. The above example works fine under Compaq Visual FORTRAN, but if you enable bounds checking on Intel Visual FORTRAN you get a run-time error.
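For contrast, C is one of the languages that guarantees this by design: && evaluates its left operand first and skips the right operand entirely when the left is false. A small sketch of the same guarded array access follows. The names count_positive, noisy_check, and calls are hypothetical; the counter is there only to show how often the right-hand side actually runs.

```c
#include <stddef.h>

int calls = 0;   /* how many times the right-hand operand was evaluated */

/* Hypothetical right-hand operand with a visible side effect. */
int noisy_check(const double *data, size_t i) {
    calls++;
    return data[i] > 0.0;
}

/* Counts positive elements while deliberately looping past the end of
 * the array (m may exceed n). The bounds test protects the access only
 * because C guarantees that && short-circuits left to right. */
int count_positive(const double *data, size_t n, size_t m) {
    int count = 0;
    size_t i;
    for (i = 0; i < m; i++) {
        if (i < n && noisy_check(data, i)) {
            count++;
        }
    }
    return count;
}
```

Run over a three-element array with m = 10, noisy_check is called exactly three times and never sees an out-of-range index. The equivalent FORTRAN expression carries no such promise.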

Both CVF and IVF are following the standard with their interpretations; FORTRAN does not specify how a compiler should implement the above if statement. However, oftentimes people adopt the unofficial standards created by compilers which interpret the standard in a certain way. CVF evaluates the statement above left-to-right and applies boolean short circuiting. IVF evaluates all components of the expression before making a decision. Both of these interpretations are correct, but they have interesting implications.
if (b .op. k .and. somefunc() .op. someval) then
  ! CVF and IVF may not execute this in the same fashion
end if
The problem with the above statement is that if IVF were to evaluate somefunc() before the comparison between b and k, potential side effects inside somefunc() could alter b or k, fundamentally changing the meaning of the statement. Worse still if the code was originally defined for CVF, the side effects of somefunc() could depend on being ignored when the comparison between b and k is .FALSE..

As a programmer you should mind the relevant standards and strive to rely on as few platform- or compiler-specific behaviors as possible. The two above examples could be rewritten with their intentions preserved in only a few extra lines.
real, dimension(:), allocatable :: myArray
do i = 1,m
  if (i .lt. n) then
    if (myArray(i) .op. someVal) then
      ! all FORTRAN compilers will get here for the same
      ! reason
    end if
  end if
end do

if (b .op. k) then
  if (somefunc() .op. someval) then
    ! all FORTRAN compilers will get here for the same
    ! reason
  end if
end if
So pay attention to the fun problems you may create for the guy who inherits your code when you get all crazy. It has been said that 60% of programming is maintaining your code, however, I find in my job that number is closer to 80 or even 90%. Don't make your life any harder than it already is.

Thursday, May 31, 2007

.Net XmlSerializer and InvalidCastException

Many of our applications work via a plugin architecture, which allows us to be flexible in a lot of ways. A while back I ran into a problem with XML serialization and our plugin system. The error was confusing and the solution was non-obvious. The exception I received was the following:
System.InvalidOperationException: There was an error generating the XML document.
---> System.InvalidCastException: Unable to cast object
of type 'MyNamespace.Settings' to type 'MyNamespace.Settings'. at
XmlSerializationWriterSettings.Write3_Settings(Object o)
The confusing (and vexing!) part of the error is the InvalidCastException line. Apparently the XmlSerializer could not cast a type to itself? Worse still, the MSDN documentation does not list InvalidCastException as a common exception (the list that normally points out the boneheaded mistake your program made).

After a large amount of googling, I came across a snippet which, if you place it in the system.diagnostics switches section of App.Config, makes the error disappear (although it is not meant to remove any errors):
<add name="XmlSerialization.Compilation" value="4" />
What the "4" means, I could not tell you, but this magical block of code solved my problem. However, I am never satisfied with hacks like this, so I dug deeper. The root cause apparently is due to how I load my plugin and where the assembly is that called the XmlSerializer.

In .Net there are 3 assembly load contexts (plus assemblies can be loaded without context), each causes your types to be slightly different. If your plugin is loaded in the Load-From context (as mine was), the type MyNamespace.Settings is "branded" (so to speak) with the context it was resolved in. If your plugin uses an XmlSerializer, the temporary assemblies generated to speed (de)serialization are part of the Load context (or perhaps are without context, I haven't found out for sure). Therefore the type the XmlSerializer attempts to create is different in context from the type in your plugin.

I found the most effective strategy to combat this interesting error is to always use the Load context. This requires your plugin DLLs lie under the ApplicationBase or PrivateBinBase paths. All in all this is the best solution, considering Side-by-Side is the new Microsoft way of deploying applications and DLLs (to avoid DLL Hell).

Here is a short snippet of what the plugins may look like in your App.Config:
name="My Plugin"
assemblyName="MyPlugin, Version=,
Culture=neutral, PublicKeyToken=deadbeefbaadf00d" />
You could then load this plugin (after reading in the appropriate ConfigurationSection) like so, to ensure XmlSerializer works in your plugin:
PluginsSection pluginsSection =
    config.GetSection("plugins") as PluginsSection;
foreach (PluginElement elt in pluginsSection.Plugins)
{
    // Assembly.Load resolves the name in the Load context
    Assembly pluginAsm = Assembly.Load(elt.AssemblyName);
    /* Reflect across the assembly looking for types with
     * [MyAppPluginAttribute] or those that implement
     * IMyAppPlugin, so an assembly can contain more than
     * one plugin.
     */
}
The .Net world has many intricacies and most seem to stem from this notion of Assemblies and satellite assemblies and manifests and ligers and unicorns, so don't be discouraged if you have a hard time working it all out.

Wednesday, May 30, 2007

Tracking down network gremlins

I've been besieged as of late by gremlins somewhere in the ether. They have stolen our token rings and have set fire to my home. Actually, it appears our file server is crapping out (again with those technical terms) at random intervals.

Well, how do I know it is the file server?

I did not know at first. The errors returned from the FORTRAN applications were code 30, which basically means a file could not be opened, with no indication of why. Later, I received some errors during reading and writing, which confirmed an issue with the file server (and not the application).

However, there were no useful error codes being returned!

Instead of rewriting these older applications to return the system error codes (newer ones include said detail) I wrote a canary application (in C if you must know). This tester would attempt to open a few files thousands of times in random order. Then read, write, read+write to each of these files thousands of times. It would do all of this in a giant loop, sleeping for a set amount of time at the end. During this loop it would rigorously check the return values of the functions, and die immediately (and loudly!) with the corresponding error code.
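For the curious, a stripped-down sketch of one pass of such a canary might look like the following. This is not the original tool: the function names, the single-byte payload, and the choice of "w+b" mode are all assumptions made to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One pass of a hypothetical canary: pick a file at random, open it,
 * write a byte, read it back, and close it, checking every return value.
 * Returns 0 on success, or the errno of the first failure (-1 if the C
 * library did not set errno).
 */
int canary_pass(const char *const paths[], size_t count, int iterations)
{
    assert(paths != NULL && count > 0);

    for (int i = 0; i < iterations; i++) {
        const char *path = paths[(size_t)rand() % count];
        char byte = 'x', back = 0;
        int failed, err;

        errno = 0;
        FILE *fp = fopen(path, "w+b");      /* read+write, truncating */
        if (fp == NULL)
            return errno ? errno : -1;

        failed = fwrite(&byte, 1, 1, fp) != 1 || fflush(fp) != 0;
        if (!failed) {
            rewind(fp);
            failed = fread(&back, 1, 1, fp) != 1 || back != byte;
        }
        err = errno;                        /* save before fclose changes it */
        if (fclose(fp) != 0 && !failed) {
            failed = 1;
            err = errno;
        }
        if (failed)
            return err ? err : -1;
    }
    return 0;
}

/* Die immediately (and loudly!) with the corresponding error code. */
void canary_check(const char *const paths[], size_t count, int iterations)
{
    int err = canary_pass(paths, count, iterations);
    if (err != 0) {
        fprintf(stderr, "canary died: error %d (%s)\n",
                err, err > 0 ? strerror(err) : "unknown");
        exit(EXIT_FAILURE);
    }
}
```

The real thing would call canary_check() from an outer loop, sleep between passes, and point the paths at files on the suspect file server.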

Sure enough it caught the error!

Wait, now that we know what the error is, why are we getting this error?

Preliminary analysis had it that the file server was CPU bound during the "hiccup". How could we really know what was the cause? Sysinternals has a lovely suite called PsTools which provides everything you could ever need to monitor processes from the command line. A simple trigger for the canary job to run a PsExec job when it died with an error was implemented:
psexec \\machinename pslist -s 90 -r 5 -x
Now we could get some output from the file server as to what it was doing when the job had the "hiccup". This worked well and we were able to identify the offending process (and even the offending thread!), yet that did not solve our problem. It only identified a cause and most likely not even the root cause! Eventually we will drill down to the actual problem and solve that (only to move on to the next issue, phew).

VAX Floating Point Numbers

So in the world of old hardware you have the DEC VAX. Big ole honkin' machines from the days of yore. They were introduced a decade before I was born and support for them was withdrawn before I graduated high school. By the time I began interacting with them, they were the old gray mare having been largely replaced by hardware like the DEC Alpha (AXP).

The transition from VAX to AXP was pretty smooth on OpenVMS and many companies, including the one I work for, made the move. Modern AXP processors are impressive and for a long time held the record for the fastest supercomputers in the United States.

Part of the allure of the AXP was its support for data found on the VAX. VAXen came long before the IEEE 754 standard for floating point numbers, so it is not hard to see how they developed their own standard. IBM mainframes and Cray supercomputers both have (popular) floating point formats from around that time. Interestingly the VAX floating point format has some formatting dependencies on the PDP-11 (craaaazy) format, which can really make life hell.

So why would I bring this up?

When a company has been using computers for a long time, you end up with a need to store data somewhere. Now data that is a decade old is easy to interact with. Imagine going back another ten years. Imagine another ten. You're now knocking on the door of the advent of (roughly) modern computing. FORTRAN 66 (and later 77) is in its prime. VAXen and IBM mainframes rule the earth! Kidding, but at least VAXen ruled my company.

The amount of data which has been preserved is staggering. The only issue is, the number of machines which can natively read the data is diminishing rapidly. Compaq (the new DEC) is phasing out support for the AXP in 2004 and transitioning users to the Intel Itanium and Itanium 2 (cue up Itanic jokes). A certain nagging problem with this transition is the loss of native support for the VAX floating point format.

The two common formats I deal with are the VAX F_Float and G_Float, single and double precision respectively. The F_Float is bias-128 and the G_Float is bias-1024. Both the F and G representations have an implicitly defined hidden-bit normalized mantissa (m) like so:
0.1mmm...mmm
F_Float is held in 32 bits and G_Float is held in 64 bits. Both formats inherit the PDP-11 memory layout, so the actual bits stored on disk are not true little endian.
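As a rough illustration of what that layout means in practice, here is a sketch of an F_Float decoder in C. It assumes the usual F_floating arrangement: two little-endian 16-bit words, with the sign, the 8-bit exponent, and the top 7 fraction bits in the first word, and the low 16 fraction bits in the second. The value is the mantissa 0.1mmm...mmm (binary) scaled by 2^(exponent - 128). The function name and return convention are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode 4 bytes, in the order they appear on disk, as a VAX F_Float.
 * Returns 0 and stores the value on success; returns -1 for the reserved
 * operand (sign bit set with a zero exponent).
 */
int vax_f_to_double(const unsigned char bytes[4], double *out)
{
    assert(bytes != NULL && out != NULL);

    /* PDP-11 heritage: the word at the LOWER address carries the sign,
     * exponent, and high fraction bits. */
    uint32_t w0 = bytes[0] | (uint32_t)bytes[1] << 8;
    uint32_t w1 = bytes[2] | (uint32_t)bytes[3] << 8;

    int sign = (w0 >> 15) & 1;
    int exponent = (w0 >> 7) & 0xFF;                /* bias-128 */
    uint32_t fraction = ((w0 & 0x7F) << 16) | w1;   /* 23 stored bits */

    if (exponent == 0) {
        if (sign)
            return -1;                              /* reserved operand */
        *out = 0.0;                                 /* true zero */
        return 0;
    }

    /* Restore the hidden bit: 0.1mmm...mmm lies in [0.5, 1). */
    double value = (double)(fraction | 0x800000u) / 16777216.0; /* 2^24 */

    /* Scale by 2^(exponent - 128) without needing the math library. */
    for (int e = exponent - 128; e > 0; e--)
        value *= 2.0;
    for (int e = exponent - 128; e < 0; e++)
        value *= 0.5;

    *out = sign ? -value : value;
    return 0;
}
```

For example, the bytes 0x80 0x40 0x00 0x00 decode to 1.0. Decoding into a double sidesteps the precision questions, since a double has room to spare for a 24-bit mantissa; the trouble starts when the target is a 32-bit IEEE754 float.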

So why is this a problem?

There are no modern processors (read: with future support) with native support for the VAX format. All of our codes which read in floating point data from old data files must make the conversion from the VAX format to their host format (which in all cases is IEEE754). This conversion is not nice and is in fact lossy.

IEEE754 S_Float and T_Float, single and double precision respectively, cannot exactly represent all VAX floating point data. S_Float is bias-127 and T_Float is bias-1023 (note this is different than F and G). Both S and T have hidden-bit normalized mantissas, however IEEE754 supports "subnormal" or "denormal" forms, where the leading bit could be a 1 or a 0.
1.mmm...mmm (normal)
0.mmm...mmm (subnormal)
This does not bode well for direct conversion between the formats.
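Telling the two forms apart only takes a look at the exponent field. Below is a small C sketch that classifies an S_Float by its raw bits; the enum and function names are invented for the example, and it assumes the host float is a 32-bit IEEE754 single.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { S_ZERO, S_SUBNORMAL, S_NORMAL, S_SPECIAL } s_float_class;

/* Classify an IEEE754 single by looking at its exponent and mantissa. */
s_float_class classify_s_float(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);          /* reinterpret without aliasing */

    uint32_t exponent = (bits >> 23) & 0xFF; /* bias-127 */
    uint32_t mantissa = bits & 0x7FFFFF;     /* 23 stored bits */

    if (exponent == 0)                       /* leading bit is 0 */
        return mantissa == 0 ? S_ZERO : S_SUBNORMAL;
    if (exponent == 0xFF)                    /* Infinity or NaN */
        return S_SPECIAL;
    return S_NORMAL;                         /* leading bit is 1 */
}
```

If I have the arithmetic right, a nonzero F_Float with a stored exponent of 1 or 2 is smaller than the smallest normal S_Float, so a straight conversion lands it in the subnormal range, where fewer mantissa bits are available.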

Even if the byte layout was the same, we still have two different forms for floating point numbers. Every time we make the conversion we lose precision. What is even more insidious is that VAX and IEEE754 do not have the same rounding rules (I'm not even sure the VAX has defined rounding rules!). Floating point formats are inherently inexact and how these inexact representations are interpreted with respect to rounding is very important.

Moreover, even if we overlooked the problems in representation of floating point numbers, what about exceptional numbers like Infinity and the result of a divide by zero operation? The VAX format only defines positive and negative "excess," which while akin to Infinity, causes an exception and cannot be used in math. IEEE754 encodes both positive and negative Infinity and includes a special case for mathematical operations which have no defined result, Not A Number (NaN). IEEE754 supports both quiet NaNs, which propagate through operations and keep producing NaN, and signaling ("loud") NaNs, which raise floating point exceptions.

Ok, so if we ignore Infinity and NaN we still have a problem. IEEE754 supports positive and negative zero. VAX only supports positive zero. Why is this a problem? Not only is negative zero unrepresentable on the VAX, but many common mathematical operations on IEEE754 can result in a negative zero (say converging from the "left" of zero).
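Negative zero is easy to miss because it compares equal to positive zero; only the sign bit gives it away. Here is a small C sketch, assuming the host double is a 64-bit IEEE754 T_Float. The helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 64 bits of an IEEE754 double. */
uint64_t t_float_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* True only for -0.0: compares equal to zero, but the sign bit is set. */
int is_negative_zero(double d)
{
    return d == 0.0 && (t_float_bits(d) >> 63) != 0;
}
```

A product such as -1e-200 * 1e-200 is too small to represent and underflows to negative zero, which is exactly the kind of value that has no home in a VAX data file.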

Wow, so basically we're screwed.

Or not. The path to go down is one where the data gets converted to the new standard (new being in the last 15 years or so) which is (more-or-less) a universal standard on processors. This is a time-consuming task, and one that needs to be approached carefully to ensure a high degree of fidelity. However, the conversion needs to be done to ensure the longevity of both the software and the data.

Tuesday, May 29, 2007

Intel Visual FORTRAN oddity

So I come across some excellent FORTRAN77 code that I must convert to F90 and use Intel Visual FORTRAN with. Not a big deal, the code is well formed F77 and should convert to F90 in a straightforward manner.

Ha ha ha, I know, what was I thinking.

The conversion was easy going until mysteriously the compiler began crapping out (yes how very technical) with an abort code of 3. No error in my code, just the compiler was having internal issues. The specific error from the Intel FORTRAN 9.1 compiler was:
GEM_LO_GET_LOCATOR_INFO: zero locator value
This was truly vexing, because at the time I was in a rush to get this code ported over to IVF. Sure enough, there was an internal problem with the Intel compiler, confirmed by their support staff. A specific variable name (SNGL), coupled with some specific compiler flags (/iface:stdref /names:as_is) caused the abort.

A patch is in the works; meanwhile SNGL becomes singleVal in the converted code, and voilà, the problem vanishes. I'd love to see the root cause analysis on that bug!

Finally got one of these for work

I now have a blog for work related things, finally. I found my company's "Social Media & Blogging Guidelines" document and we're allowed to blog. We have to keep things appropriate, of course, but otherwise we are golden.

So I work for GE-Hitachi Nuclear Energy Americas (ed: name change as of 4 June 2007; formerly GE Energy, Nuclear) as a software engineer. I'm the responsible engineer for codes ranging from FORTRAN 77/90, K&R C, C++, VB, VB.Net, Java, and C# 2.0. Mostly I work on GUI's (C# and Java) and support libraries (C, C++, FORTRAN, C#, Java), however, being a jack of many trades I also get in on the technology codes in FORTRAN.

Our systems range from Windows 2000 and XP on the desktop, Windows 2000 and 2003 on the servers, OpenVMS 7.X and 8.X servers, and a few scattered Linux/HP-UX/Tru64 boxen. We're trying to consolidate all of these systems, but personally I would rather the effort be placed on ensuring interoperability across all of them (while least common denominator programming is at times frustrating, it keeps your code simple and most of the time easier to debug).

I spend a lot of time ensuring that our software remains well integrated, mainly utilizing API's which were set in stone before I was born. I get called upon to debug the crazy situations which happen when you bring together such an unholy trinity as FORTRAN, C, and C#. Yet the work is challenging and fun; my biggest grief being hard-to-find bugs and managing to break things which should not break. Ok, I lied, my biggest grief is procedures, but I think any engineer will tell you that.

I will be posting lots of technical issues that I come across and how I made it around them (or why I cannot seem to get around them). We'll see how this goes.