Monday, July 23, 2007

MPICH.NT, MPICH2, and TCP Offloading

Recently we came across a strange problem where, on some new machines, both MPICH.NT and MPICH2 would fail to operate correctly across machines. On a single machine there were no problems, and the application actually worked partially under MPICH.NT.

My initial guess was that MPICH.NT did not play well with Windows Server 2003 R2. Previously we had only used Windows 2000 Server for the MPI jobs, so it seemed logical that changing the server OS would cause some issues. I recompiled the application for MPICH2 (bigger, newer, better) and found that it "hung" in exactly the same fashion as under MPICH.NT.

So now I had a common failure mode across two versions of MPICH (which are wildly different under the hood) on the same OS. I started running MPICH in debug/verbose mode and spent a lot of time sifting through thousands of lines of output. Both MPICH.NT and MPICH2 halted at the same place, but I couldn't tell where in the code that was.

You may think this is where I fired up the parallel debugger and did wild and crazy things, but that takes too much time. I went with good old-fashioned printf debugging and ended up with output like the following:
[0] calling MPI_Bcast ...
[1] calling MPI_Bcast ...
[2] calling MPI_Bcast ...
[3] calling MPI_Bcast ...
[4] calling MPI_Bcast ...
...
[N-2] calling MPI_Bcast ...
[N-1] calling MPI_Bcast ...
Here [X] is the process rank; if a call completed, its MPI return code would be printed on the next line. None of these calls ever returned. An important thing to note is that these MPI_Bcast calls were broadcasting buffers of 150 MiB or more, which is quite large.
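The instrumentation itself was nothing fancy. A minimal, self-contained sketch of the pattern looks like the code below; this is not the actual application code, and the buffer size, datatype, and root rank are illustrative:

/* Minimal sketch of the printf-debugging harness around MPI_Bcast.
 * Illustrative only: buffer size, datatype, and root are assumptions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, rc;
    const int count = 150 * 1024 * 1024;   /* ~150 MiB, like the real app */
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = (char *)malloc(count);
    if (buf == NULL) {
        fprintf(stderr, "[%d] malloc failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    if (rank == 0)
        memset(buf, 42, count);             /* root fills the payload */

    printf("[%d] calling MPI_Bcast ...\n", rank);
    fflush(stdout);      /* flush now, in case the call never returns */

    rc = MPI_Bcast(buf, count, MPI_CHAR, 0, MPI_COMM_WORLD);

    /* On the affected machines, no rank ever reached this line. */
    printf("[%d] MPI_Bcast returned %d\n", rank, rc);
    fflush(stdout);

    free(buf);
    MPI_Finalize();
    return 0;
}

That 150 MiB figure is what drove me to check on the network card settings.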

While poking around in the network card settings, I noticed two things:
  1. The network card drivers were out of date
  2. The card's status tool showed error counts for TCP Offloading
Since the TCP Offload Engine (TOE) had reported errors in the past, I reran the application and watched the TOE error counters. Sure enough, when that MPI_Bcast line was reached, the error counter incremented by one. So I disabled TOE on my two test machines and reran the application.

Boom, problem solved. Well...not really.

Disabling TOE will hurt performance for other applications that have no issue with offloading. However, I cannot upgrade the drivers for the network card (even as a test) without going through ten miles of red tape. That is neither here nor there; the important part is that the problem has been identified and can be solved.

So if you have an HP NC373i Multifunction Gigabit Ethernet Adapter and experience problems with MPICH or MPICH2 on Windows, the culprit is probably the TCP Offload Engine. Try updating your drivers or disabling TOE to solve the issue. I will post an update if the latest drivers do indeed fix this.
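If you would rather script the OS-level side of the workaround than click through driver properties, something like the sketch below should work. To be clear about the assumptions: this targets the Windows Server 2003 SP2 Scalable Networking Pack, where setting the EnableTCPChimney registry value to 0 turns off TCP Chimney Offload after a reboot; the per-NIC TOE switch in the driver's advanced properties (which is what I actually used) is separate from this OS-wide setting.

/* Sketch: disable TCP Chimney Offload via the Scalable Networking Pack
 * registry value on Windows Server 2003 SP2. Assumptions: setting
 * EnableTCPChimney to 0 disables the feature after a reboot, and the
 * per-NIC TOE setting in the driver's advanced properties is separate.
 * Link against Advapi32.lib. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HKEY key;
    DWORD disabled = 0;
    LONG rc;

    rc = RegOpenKeyExA(HKEY_LOCAL_MACHINE,
                       "SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters",
                       0, KEY_SET_VALUE, &key);
    if (rc != ERROR_SUCCESS) {
        fprintf(stderr, "RegOpenKeyExA failed: %ld\n", rc);
        return 1;
    }

    rc = RegSetValueExA(key, "EnableTCPChimney", 0, REG_DWORD,
                        (const BYTE *)&disabled, sizeof(disabled));
    if (rc == ERROR_SUCCESS)
        printf("EnableTCPChimney = 0; reboot for the change to take effect.\n");
    else
        fprintf(stderr, "RegSetValueExA failed: %ld\n", rc);

    RegCloseKey(key);
    return (rc == ERROR_SUCCESS) ? 0 : 1;
}

The same value can of course be set with regedit or reg.exe; the point is only that the toggle lives in the registry, not in MPICH.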

3 comments:

Unknown said...

I just had the same issue with the same NIC, an NC373i, but with different applications. A driver update did not help, but disabling TCP Offloading did.

Abinaw said...

Chimney offload issues arise from outdated adapters or drivers. If you are having issues even after disabling RSS and TOE, then use this patch released by Microsoft to fix their NDIS:
http://support.microsoft.com/kb/936594
thanks
-abhinaw

Abinaw said...

If you are having issues even after updating your network adapters and drivers that support the task offload feature, then use this patch from Microsoft:
http://support.microsoft.com/kb/936594