Measuring Physicians' Quality and Performance
The edgiest use of comparative measurement is probably "pay for performance,"1 financial carrots and sticks attached to measured achievements in care. One of the clearest examples has been in England, where British general practitioners (GPs) engaged with the National Health Service beginning in 2004 under the Quality and Outcomes Framework (QOF). The terms of that contract in 2008-2009 define 129 quality indicators—the majority on clinical processes and outcomes—and award financial gains to physicians for following those standards. The budget for the first year of that program assumed that GPs would achieve 75% conformance to QOF standards. The budget was wrong; conformance was 97%. The National Health Service got exactly the performance it paid for, although the costs were higher than expected. Morale declined; GPs diverted attention from other needed care to the processes covered by the reward system; and some physicians clearly gamed the system by classifying high proportions of patients as exceptions.
This and other early experiences with pay for performance at the level of individual physicians have raised some orange flags. In this issue of JAMA, the report by Nyweide et al raises another flag. It is known that individual physicians often have too few patients in any specific disease group to support statistically valid comparative measurement—a sample size problem. Nyweide et al ask whether statistically meaningful differences can be measured more reliably for primary care groups than for individuals. Using fee-for-service Medicare data, the authors constructed an algorithm to assign physicians to groups and patients to physicians, and then determined whether the numbers of patients suitable to study in each physician group were sufficient to support comparison among groups on 3 process measures of quality and 2 outcomes.
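The sample size problem can be made concrete with a minimal Python sketch. It assumes that each quality measure is a simple yes/no conformance rate, and it uses a 95% Wilson score interval. That interval is my illustrative choice; neither Nyweide et al nor Berwick specify it. The panel sizes (25 patients for one physician, 400 for a group) and the 80% conformance rate are hypothetical.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical panels: the same observed 80% conformance, different denominators.
for label, hits, n in [("single physician", 20, 25), ("physician group", 320, 400)]:
    lo, hi = wilson_interval(hits, n)
    print(f"{label}: n={n}, 80% observed, 95% CI {lo:.0%} to {hi:.0%}")
```

With the same observed rate, the interval for 25 patients is several times wider than the interval for 400. Two individual physicians can therefore differ by many percentage points on a measure without that difference being statistically meaningful, which is why pooling patients at the group level is attractive.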
We should be glad that Berwick is beginning to understand the practical problem of pay for performance (P4P) from a purely statistical viewpoint. He finishes his editorial with a rant comparable to my best (or worst). I believe he rambles, throwing out ideas in the conviction that there must be an attractive solution to the problem of evaluating individual physician performance.
The goal is worthy. We should give physicians feedback on their performance.
The problem is enormous. Physicians have multidimensional tasks. I have written often about these dimensions.
Researchers, and therefore insurance companies, have focused on the "low-hanging fruit". They measure performance on indicators for diabetes, heart failure, and hypertension. These diseases (and several more) represent the common problems. One problem with performance indicators is that every physician also cares for uncommon problems. I can easily imagine a physician who does a great job with the common diseases and problems but does not do well with the uncommon ones.
No performance measure of which I am aware focuses on diagnosis. I value the title "diagnostician", as I believe it conveys great honor. Great physicians must make accurate diagnoses.
We have difficulty measuring bedside manner. We can ask patients about their physicians, but I would postulate that truly great bedside manner is harder to measure than patient satisfaction. We do many things at the bedside: comfort, educate, counsel, interrogate, and confront. Measuring satisfaction alone therefore does not fully measure bedside manner.
Defining physician quality, while desirable, is extremely challenging. We need to think outside the "low-hanging fruit" box and have expert physicians discuss the dimensions of excellence. Whether physician quality is measurable at all remains a question I cannot yet answer.


1 comment
Spot on. These ‘quality’ initiatives are scams on the public. Pay for Performance could more aptly be renamed Pay for Paperwork. I know because I've been there. The true determinants of quality in medicine cannot be easily counted. Sure, we can measure how often gastroenterologists like me reach the end of the colon, or how many polyps we harvest, but will data like this determine whether I’m any good?
If I were a patient, I would want to be assured that my doctor could:
Palpate an abdomen
Listen without interrupting
Know when not to order a CAT scan
Recruit outstanding consultants when necessary
Say, “I don’t know”, on occasion
Take an accurate and complete patient history
Be compassionate
How do you measure stuff like this?
http://www.MDWhistleblower.blogspot.com