Showing posts with label assembly. Show all posts
Showing posts with label assembly. Show all posts

Monday, December 26, 2011

Philips Healthcare's Sierra ECG format XLI Compression Scheme

One of the services I work for has recently acquired a Philips HeartStart MRx cardiac monitor. It came complete with Bluetooth transmission of 12-Lead and event data. At roughly the same time our service installed a computer, an MDT, into the cab of our unit to interface with our county's CAD software.

Naturally I linked our monitor and our MDT via Bluetooth, and transmitted a 12-Lead from a rhythm generator. When the file landed on the MDT, I looked for an application to view the 12-Leads and rhythm strips, however, none appeared to be able to use the file as-is.

For the non-technical, the ECGs are shipped compressed--somewhat like a ZIP file--which contains all of your monitored vital signs, printed rhythm strips, and your 12-Leads. The format of the 12-Leads is an Open Standard; Philips Healthcare provides most of the details needed to use the files. The 12-Lead data is also compressed to save space. Unfortunately, there is no documentation which tells you how to decompress the 12-Lead data.

The technically-faint-of-heart should skip these next bits.

For the technical, the ECGs are contained in a Gzip'd TAR archive. The 12-Leads are stored inside in an XML format known as the Sierra ECG format (currently at version 1.03 or 1.04, as far as I can tell). Inside this XML format is Base64 encoded, XLI compressed data comprising the acquired leads during a 12-Lead (up to 16 leads appear to be able to be stored).

I searched for a description of the XLI compression format, however, I was only able to find a reference implementation for Microsoft Windows which simply decoded the files. No code or description was provided, and the implementation itself is not portable. (ed: it appears this may be a reference to the HP PageWriter XLi which Philips acquired)

At this point I decided my only option was to reverse engineer the XLI Compression format, and began with simple guesses. I tried decompressing the data using Deflate, Zip, and RLE without any progress. I was able to determine that the first 8 bytes of the compressed data included a compressed length, some uncompressed data, and that each of the 12 to 16 leads were stored in a chunk with one of these headers:
offset   2        4        6        8  ...
+--------+--------+--------+--------+--------+--------+--------+
| Size            |  Unk.  | Delta? | Compressed data...       |
+--------+--------+--------+--------+                          |
| ...                                             [Size bytes] |
+--------+--------+--------+--------+--------+--------+--------+
| Next lead chunk ...                                          |
Once the simple guesses were ruled out, I began exploring the behavior of the reference implementation provided for the Sierra ECG format. Using OllyDbg I noticed certain code tells which made me believe the decompression algorithm read 10-bits at a time:
SHR   EAX, 16h   ; reduce EAX to the 10-bit code word
SHL   ECX, Ah    ; prepare to read 10 more bits from the input
The compressed data also did not appear to contain a compression dictionary referenced by the code. At this point I considered I was looking at a form of Lempel-Ziv-Welch, or LZW, compression. LZW is a popular, lossless compression scheme which creates its compression dictionary on the fly. It is used by the GIF and TIFF image formats, and was the subject of controversy when it was first introduced into the GIF format due to patent licensing requirements.

In my quest to quickly reach a conclusion I found an excellent LZW implementation from Mark Nelson in C and it successfully decompressed the data. In fact, the structure of the C code was so familiar, I realized the reference implementation from Philips used the exact same code!

If you've reached this step while following along at home, you'll notice the decompressed data seems front-loaded with 0's. This is a case of intelligently streaming the data to the compression algorithm to take advantage of data duplication.

The uncompressed data represents 16-bit delta codes, of which the majority include 0x00 or 0xFF in their most significant byte (MSB). This is because they are either small and positive or small and negative, and as ECG data is rhythmic the delta codes are likely to retain the same sign for numerous samples.

To take advantage of this fact during compression, the delta codes are first deinterleaved into two halves. The first half includes each MSB and the second half includes each LSB. The pseudo-code for interleaving the decompressed data looks like the following:
# input contains the decompressed data
# output will contain the interleaved 16-bit delta codes
fun unpack( input[], output[], nSamples )
    for i <- 1..nSamples
        output[i] <- (input[i] << 8) | input[nSamples + i]
    endfor
endfun
At this point the delta compression scheme will need to be decoded to produce the actual signal data for each of the leads. The delta compression scheme is a simple recurrence relation (a second order difference relation) using the prior two delta codes:
# output contains the 16-bit delta codes
# first is the 16-bit delta code from the chunk header
fun deltaDecompression( output[], nSamples, first )
    x <- output[1]
    y <- output[2]
    prev <- first
    for i <- 3..nSamples
        z <- (2 * y) - x - prev
        prev <- output[i] - 64   # is -64 to 64 the range?
        output[i] <- z
        x <- y
        y <- z
    endfor
endfun
Now that you have the actual, per signal data all you need to do is recreate leads III, aVR, aVL, and aVF. This is done using the data from leads I and II as on most ECG machines. I've omitted the actual formulas for brevity.

Using my reference implementation of the decompression algorithm I was able to feed the original acquired 12-Lead to the Philips ECG to SVG converter, with the following results:


If you'd like to start playing with my code I welcome you to join my Github Project: sierra-ecg-tools. I am also working on a C implementation, and likely an Android implementation. Stay tuned, and apologies for the technical post.

The author has no financial ties to Philips Healthcare and received no compensation for this work.

Monday, April 21, 2008

RE: NTDebugging Puzzler 0x00000003 (Matrix Edition) Some assembly required

So the Microsoft Advanced Windows Debugging and Troubleshooting blog put out a puzzler based on reverse engineering assembler into something meaningful, say C/C++. I took a look at the assembly listing and decided it would be a fun break from work to decompile that into something. About 30 minutes later I wound up with some source that compiles down to nearly the same listing as theirs.

Phew, I've still got it! It took me some time monkeying around with the order of the local variables to get the same space added for the locals as in their example. Other than that it was pretty straight forward.
puzzler3.cpp
#include <stdlib.h>
#include <string.h>

/** Sort an input string's characters in descending order
*
*/
// puzzler3!myfun [c:\source\puzzler\puzzler3\puzzler3\puzzler3.cpp @ 20]:
void myfun(char* arg0)
// 20 00cc1480 55 push ebp
// 20 00cc1481 8bec mov ebp,esp
// 20 00cc1483 81ecf0000000 sub esp,0F0h
// 20 00cc1489 53 push ebx
// 20 00cc148a 56 push esi
// 20 00cc148b 57 push edi
{
//union { // [ebp-0F0h]
// struct {
size_t arg0_len; // [ebp-08h]
// unsigned int pad0; // [ebp-0Ch]
// unsigned int pad1; // [ebp-10h]
size_t ii; // [ebp-14h]
// unsigned int pad2; // [ebp-18h]
// unsigned int pad3; // [ebp-1Ch]
size_t len_temp; // [ebp-20h]
// unsigned int pad4; // [ebp-24h]
// unsigned int pad5; // [ebp-28h]
char x; // [ebp-29h]
// };
// unsigned char local_variables[240];
//}; // local variables

// //set local variables to unusual values for buffer checking
//memset(local_variables,
// 0xCCCCCCCCU,
// sizeof(local0)/sizeof(unsigned int)
//); // 20 00cc148c 8dbd10ffffff lea edi,[ebp-0F0h]
// 20 00cc1492 b93c000000 mov ecx,3Ch
// 20 00cc1497 b8cccccccc mov eax,0CCCCCCCCh
// 20 00cc149c f3ab rep stos dword ptr es:[edi]

len_temp = strlen(arg0); // 26 00cc149e 8b4508 mov eax,dword ptr [ebp+8]
// 26 00cc14a1 50 push eax
// 26 00cc14a2 e803fcffff call puzzler3!ILT+165(_strlen) (00cc10aa)
// 26 00cc14a7 83c404 add esp,4
// 26 00cc14aa 8945e0 mov dword ptr [ebp-20h],eax
// 28 00cc14ad 8b45e0 mov eax,dword ptr [ebp-20h]
// 28 00cc14b0 8945f8 mov dword ptr [ebp-8],eax

for(arg0_len = len_temp; arg0_len > 0; arg0_len--)
// 28 00cc14b5 8b45f8 mov eax,dword ptr [ebp-8]
// 28 00cc14b8 83e801 sub eax,1
// 28 00cc14bb 8945f8 mov dword ptr [ebp-8],eax
// 28 00cc14b3 eb09 jmp puzzler3!myfun+0x3e (00cc14be)
// 28 00cc14be 837df800 cmp dword ptr [ebp-8],0
// 28 00cc14c2 7e60 jle puzzler3!myfun+0xa4 (00cc1524)
{
// 30 00cc14c4 c745ec00000000 mov dword ptr [ebp-14h],0
// 30 00cc14cb eb09 jmp puzzler3!myfun+0x56 (00cc14d6)
for(ii = 0; ii < (arg0_len - 1); ii++)
// 30 00cc14cd 8b45ec mov eax,dword ptr [ebp-14h]
// 30 00cc14d0 83c001 add eax,1
// 30 00cc14d3 8945ec mov dword ptr [ebp-14h],eax

// 30 00cc14d6 8b45f8 mov eax,dword ptr [ebp-8]
// 30 00cc14d9 83e801 sub eax,1
// 30 00cc14dc 3945ec cmp dword ptr [ebp-14h],eax
// 30 00cc14df 7d41 jge puzzler3!myfun+0xa2 (00cc1522)
{
//char temp0 = arg0[ii]; // 32 00cc14e1 8b4508 mov eax,dword ptr [ebp+8]
// 32 00cc14e4 0345ec add eax,dword ptr [ebp-14h]
// 32 00cc14e7 0fbe08 movsx ecx,byte ptr [eax]
//char temp1 = arg0[ii + 1]; // 32 00cc14ea 8b5508 mov edx,dword ptr [ebp+8]
// 32 00cc14ed 0355ec add edx,dword ptr [ebp-14h]
// 32 00cc14f0 0fbe4201 movsx eax,byte ptr [edx+1]


// 32 00cc14f4 3bc8 cmp ecx,eax
// 32 00cc14f6 7e28 jle puzzler3!myfun+0xa0 (00cc1520)
if(arg0[ii + 1] > arg0[ii])
{
x = arg0[ii]; // 34 00cc14f8 8b4508 mov eax,dword ptr [ebp+8]
// 34 00cc14fb 0345ec add eax,dword ptr [ebp-14h]
// 34 00cc14fe 8a08 mov cl,byte ptr [eax]
// 34 00cc1500 884dd7 mov byte ptr [ebp-29h],cl

arg0[ii] = arg0[ii+1]; // 35 00cc1503 8b4508 mov eax,dword ptr [ebp+8]
// 35 00cc1506 0345ec add eax,dword ptr [ebp-14h]
// 35 00cc1509 8b4d08 mov ecx,dword ptr [ebp+8]
// 35 00cc150c 034dec add ecx,dword ptr [ebp-14h]
// 35 00cc150f 8a5101 mov dl,byte ptr [ecx+1]
// 35 00cc1512 8810 mov byte ptr [eax],dl
arg0[ii+1] = x; // 36 00cc1514 8b4508 mov eax,dword ptr [ebp+8]
// 36 00cc1517 0345ec add eax,dword ptr [ebp-14h]
// 36 00cc151a 8a4dd7 mov cl,byte ptr [ebp-29h]
// 36 00cc151d 884801 mov byte ptr [eax+1],cl
}
} // 38 00cc1520 ebab jmp puzzler3!myfun+0x4d (00cc14cd)
} // 40 00cc1522 eb91 jmp puzzler3!myfun+0x35 (00cc14b5)

// return and ensure our stack is still correct
return; // 41 00cc1524 5f pop edi
// 41 00cc1525 5e pop esi
// 41 00cc1526 5b pop ebx
// 41 00cc1527 81c4f0000000 add esp,0F0h
// 41 00cc152d 3bec cmp ebp,esp
// 41 00cc152f e820fcffff call puzzler3!ILT+335(__RTC_CheckEsp) (00cc1154)
// 41 00cc1534 8be5 mov esp,ebp
// 41 00cc1536 5d pop ebp
// 41 00cc1537 c3 ret
}

puzzler3_driver.cpp
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void myfun(char *arg0);

int main(int argc, char* argv[])
{
char *test = NULL;

test = (char *)calloc(strlen(argv[0]) + 1, sizeof(char));
strcpy(test, argv[0]);

printf("%s\n", test);
myfun(test);
printf("%s\n", test);

return 0;
}

You'll need an MSVC compiler, probably circa VS2k5 or later, with /Od /GS /RTC1 /FA /Fa"bin\\" to get a listing that'll look like their listing.