Dirty OBS data

I hate the OBS table. No, actually I shouldn’t blame the table; I should blame the application that allows so much garbage data to be entered into it. If you’ve worked with the table at all (I prefer to go directly to the table rather than use the RPTOBS view, though the LASTLABS view is pretty useful), you probably know that you’re likely to find all kinds of things in there.

Want a list of patients who’ve had their Zostavax shot? Be prepared to filter the OBS table more or less thusly:

This is what you get when you allow free text entry. Obviously in some cases it can’t be helped, but it should be restricted to a limited set of options wherever possible. Since Centricity doesn’t limit those options in nearly as many places as it should, have fun collecting your distinct list of OBSVALUEs for that HDID and assembling that list.

This is especially problematic when you need to evaluate numeric values. Go run this query:

Chances are you’ll get a collection of numbers, representing 100 diastolic blood pressure observations. Okay, now add this…

If your database is anything like mine, I’m willing to bet what you just got back ain’t pretty. Here’s my top ten results:

90 left
80 L
120 L
80 ra
100 Left
82 Right
90 LEFT ARM
82 Left
78r
98R

Yeah, those aren’t numbers. So if you’re running a query where you need to evaluate the diastolic pressure as a number, what can you do? You’re going to hit an error if you try something like WHERE CAST(OBSVALUE AS float) > 80. I suppose we can resign ourselves to filtering those values out. Just slap WHERE ISNUMERIC(OBSVALUE) = 1on there and we’re all set.

Only one problem: ISNUMERIC() sucks. It’s just my humble opinion, but I consider ISNUMERIC() to be a broken function. Here, try this:

They all returned 1, didn’t they? According to ISNUMERIC(), those three values are numbers. Now try converting them to int… Hey, nice error message ya got there, pardner!

Now, obviously there’s more to ISNUMERIC() than, “Does this value represent what the average person would consider a number?” I encourage you to check out the Question of the Day I submitted on this topic at SQL Server Central, and the lengthy conversation that followed. And I suppose if you want to be all official about it you could read the BOL entry on the function, whatever. There do exist numeric data types to which those values will successfully convert, so to call it a broken function is a gross oversimplification, granted. My point is, as an indicator of whether a value will convert to a given numeric data type, ISNUMERIC() is useless if not dangerous.

Here’s an idea! Let’s write a scalar function to see if the value will convert to a float. Put a TRY...CATCH block in there, try to convert it to a float, and use the error handling to return a bit value indicating whether or not the conversion succeeded. Scalar UDFs aren’t very efficient in general, but it should be pretty light. Let’s give it a go! Oh… never mind. SQL Server doesn’t allow TRY...CATCH blocks in functions. Well that’s silly.

Considering it further, filtering those values out isn’t necessarily what you want, either. Each of the values I listed contains useful data. We just need to get to it. It’s not going to be pretty or efficient, but the results we get back may be worth it.

This function is a modification of code I found on Pinal Dave’s excellent Journey to SQL Authority blog.

We have two parameters: @strAlphaNumeric is the OBSVALUE, and @ElseValue is what the function will return if @strAlphaNumeric turns out not to be convertible. I’ve defined the return value as numeric(19,10) but you can make it whatever you need it to be.

First things first, if there are no digits in @strAlphaNumeric, we’re clearly wasting our time here. Similarly, if the value has two decimal places in it, or features a letter or space between any digits, we can automatically assume this will not convert. Even though it has digits in it, we’ll have no way of reliably knowing what the number is supposed to be, at least not programmatically. So we cut to the chase and return @ElseValue.

Now that we’ve established that the value is hypothetically convertible, we begin the process of stripping out any characters that are not digits or a decimal point. I’ll leave it to you to sort out that first WHILE loop, but please don’t hesitate to ask questions.

The next WHILE loop strips any trailing decimal places. We shouldn’t have more than one, but let’s play it safe.

Now for one final check. If there’s anything left in @strAlphaNumeric and it passes ISNUMERIC() (hey, we gotta give the old gal something to do here), we cross our fingers and convert it to our numeric data type. Otherwise we return @ElseValue.

From outside the function, we can be prepared to filter out any records that return our @ElseValue, and feel relatively safe that the numeric data we get back is reasonably sound. There are probably six other ways I’d rather do it, but given the limitations under which we work, this will suffice.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">