Top X Diagnoses, by Location

As ICD-10 approaches, I’ve been getting quite a few requests for the top 100 diagnoses for a facility, or a group of facilities. Judging from what I’ve seen (and responded to) on CHUG, it looks like a few of you have been receiving the same request. It’s easy-peasy!

I’m going to offer two versions. One is fairly simple, and is designed to report on a single facility. The second gets a bit more complex, grouping facilities into different practice and specialty types.

Let’s jump right in…

Okay, so what do we have here? Obviously we’re starting with some datetime variables; I like to use variables for my date ranges as it makes it easier to find and update them if I’m using them in multiple locations throughout the script. I’m not in this case, but it’s a useful convention. Here I’m setting my date range to FY2015.

Next we have a common table expression where we’re aggregating the number of incidences of each ICD9 code in the PatientVisitDiags table. We’ve added the LOCREG table into the mix so that we can filter on the facility’s ABBREVNAME value, which is usually easier to remember than a FacilityId or LOCID. But if you know those values and want to leave LOCREG out of it, just modify the second half of the WHERE clause to reflect that.

I find CTEs to be very useful in a situation like this, where you need an aggregate but you also want a bunch of related information as well, like code descriptions and the ICD10 equivalent. You can do those in an aggregate statement, either including them in the GROUP BY or wrapping them in a MAX(), but this just feels cleaner to me. One thing at a time often works best.

Now that we have our aggregates, we can easily select the top 100 diagnoses by ORDERing on DiagCount DESC. Don’t forget DESC, or else you’ll end up with the 100 least-used diagnoses! And we can link the ICD9 code to any other tables we need to. In this case I’m making my way through the MasterDiagnosis and ProblemCodeMap tables to look up the ICD10 equivalents, but you can tailor this to your needs.

If you only need to get this data on a single-facility basis, you can stop here. Because things are about to get tricky, folks. But by all means, follow along if you’re feeling bold!

In our next scenario, we want to report on all of our facilities. But we want to group them into Family Practice, Internal Medicine, Cardiology, Surgical Care, and Urgent Care. And just to throw a monkey wrench into the works, there’s one provider at a Family Practice facility who needs to be counted as Internal Medicine. Got all that?

Good! Okay, let’s start by assuming that we have the same @Start and @End variables. The first thing we’re going to do is expand upon our original CTE:

Zoinks! Deep breath; let’s start at the top. We’re not only aggregating by ICD9 code this time, but by “FacilityGroup” as well. We’ll see where that comes from in a moment.

We’ve replaced LOCREG with our old friend the Facilities function. Take a moment to read up on that if you’ve never seen it before. Basically, it lets us return a portion of the LOCREG table based on parameters. In this case we’re asking it to return our practices, and our cardiology and surgical care facilities. If you’re reporting on everything in your LOCREG table, just use LOCREG.

Then we’ve got a big honkin’ CASE statement in a CROSS APPLY. If you’re not familiar with CROSS APPLY, it’s a very useful tool. There are other ways of doing what it does, but not this cleanly or reusably. In this case, you could very easily take that CASE statement and move it to the SELECT list to give you the FacilityGroup value. But if you do that, you’re going to have to replicate that CASE statement in the GROUP BY clause. Should you need to make any changes, you’d have to remember to update both instances. By using a CROSS APPLY, you can establish that value in one place and reference it throughout the statement. This is just a tiny example of what CROSS APPLY can do, so read up on it if you haven’t used it before.

Now let’s have some fun with CASE logic. Remember, we’re grouping our facilities together into practices and specialties. Starting from the top, any facilities that start with “Surg” will be thrown into the “Surgical Care” bucket. (This is just an example, you’d obviously want to modify this according to your facilities list.) Anything starting with “Cardi” gets thrown into “Cardiology” – makes sense. The “UC” location is “Urgent Care” – we only have one of those, so no need for a LIKE here.

Whoa, hang on. What is this v.DoctorId business? I thought we were looking at facilities? Remember how we said there’s one provider at a Family Practice location who needs to be counted as Internal Medicine? That’s his DoctorFacility.DoctorFacilityId value. CASE statements are always evaluated from top to bottom, so by throwing him into that bucket before we divvy up the practices, we make sure all of his diagnoses end up in the right place. Then finally we toss a few select practices into the “Family Practice” bucket and everything else ends up in “Internal Medicine.”

Okay, so that gives us our diagnosis counts, by facility group and ICD9 code. If you didn’t need to group your facilities, you would simply eliminate that CROSS APPLY and group on ABBREVNAME instead. But we’re not done yet. Perhaps you noticed that comma at the end of the CTE? That means we’ve got another one on its way…

Most of this should look familiar, as it’s basically the SELECT statement from the “simple” version above. But what do we have here…

We have a windowed function that’s PARTITIONing by the FacilityGroup and ORDERing by DiagCount DESC. This is going to number your diagnosis lists so we can pull the top 100. By PARTITIONing, each FacilityGroup’s diagnosis list gets numbered individually. And of course we ORDER by DiagCount DESC so that the highest counts are listed first.

Finally, we grab the top 100 from each FacilityGroup like so…

But wouldn’t it be easier, rather than do that second SELECT as a CTE, to simply put the ROW_NUMBER() function in the WHERE clause?

Yeah, except you can’t use windowed functions in a WHERE clause. Too bad, so sad. There may be a simpler way of doing this, but that ain’t it. And really that’s not so bad anyway. It runs quickly (less than a second for FY2015 in my environment), and once you have a sense of how it works it’s not too difficult to maintain.

Let me know how this works out for you, and I’ll be happy to answer any questions!

2 thoughts on “Top X Diagnoses, by Location

Leave a Reply