Bad Science And Surgery
Doctors have an ethical imperative to ensure that their patients receive the best treatment available. From the earliest Hippocratic times we have been exhorted “to do no harm” (επι δηλησει δε και αδικιηι ειρξειν). This requirement to use treatments that work and to withhold treatments that may harm is particularly important in disciplines such as gynaecology, where most interventions are aimed at alleviating non-fatal symptoms rather than at heroic life-saving therapy. There is almost universal acceptance of such an approach. The problem we have is how to know which treatments work and which do not.
Almost all of us have been trained in the classical apprenticeship manner. We learned the most effective treatments and most efficient procedures by studying the works of our distinguished teachers, who we assumed had arrived at the most appropriate techniques through years of study coupled with diligent distillation of their trial-and-error experience. We, in our turn, applied and modified these approaches and taught them to our students. So our profession developed. But on what foundations is our speciality built, and how do we know that our treatments work?
The most common justification of a procedure, new or old, is that “in my experience it works”. Such a justification from the famous and powerful in our profession is difficult to refute, particularly if a number of our most eminent authorities come together in a society or college and reinforce each other’s beliefs. Unfortunately our collective history is littered with examples in which the most widely held beliefs of the highest in the profession were subsequently shown to be in error. From the earliest days of modern gynaecology the collective wisdom of the profession has been found wanting. Take the following text on the treatment of menorrhagia from a standard medical book published in 1842 by the distinguished physician of the Royal College of Surgeons of Edinburgh, Dr Thomas Andrew (1).
“It is therefore necessary to adopt preparatory measures, even when flooding is not of long standing, blood should be taken from the arm to the extent of from 10 to 14 ounces or even a pound, according to circumstances. Basquellon never omitted this precaution, even when the patient had pallid lips, small pulse, and appeared bloodless, and frequently he found the pulse and the strength of the patient re-invigorated under this treatment. This, however, is not a practice to be followed by an unprofessional attendant, as the patient might fall into a state of syncope or fainting from which she might never recover. In all circumstances of active flooding blood-letting must never be omitted”.
This passage, in a widely distributed textbook written by doctors of the highest probity, reflects the very best collective medical opinion of the time on how to manage heavy menstrual bleeding. It was very authoritative but completely wrong, and it reveals many of the weaknesses of our profession, now as well as then:
1) First build a treatment on a reasonable but ultimately erroneous hypothesis. The theory behind blood-letting was that the excess bleeding was consequent on pelvic congestion (and possibly the accumulation of evil humours). Removing the blood would reduce the congestion and thereby stop the bleeding.
2) Describe in great detail the methods of the treatment. The more complex the treatment, the higher the skill level required of the attendant. This reflects powerfully on the successful champions of the intervention and will ensure increased prestige and, more importantly to many, increased income.
3) Claim the treatment is effective in the originator’s hands. Once the effectiveness of the intervention is conceded, adverse outcomes can be attributed to the failings of those who attempt to copy the technique rather than to deficiencies in the concept.
There continue to be many examples of sensible-sounding procedures that are endorsed by the profession at large, only to be found later to be erroneous.
Of course many (perhaps most) developments in gynaecological surgery represent real, reproducible advances that improve the quality of life of our patients. We have lived, during the last 20 years, through an unprecedented explosion of novel methods, techniques and treatments that appear to be of benefit to our gynaecological patients. How do we know if each new intervention is worthwhile and without unacceptable side-effects? It is self-evident that we need to collect data and share these data with our colleagues. It is easier than it has ever been to transmit a new concept or idea rapidly to all corners of the globe. But much of the data seems to confuse and obscure rather than clarify the effectiveness of any new intervention. Presentation of data that obscures rather than illuminates the benefits can truly be called Bad Science.
Let us therefore look at the methods currently available to us for assessing new procedures, and particularly at where bad science may mislead or confound us. Perhaps the most misleading of all types of pseudo-scientific study in surgery is the retrospective single-centre study. In reality such studies are audits of a single unit’s performance over time. They give an indication of the effectiveness of a particular procedure in a particular unit. The results are seldom generalisable because of the impact of the types of equipment available, the experience of the operators, and the pathology and general health of the patients treated. Even hard data such as a complication rate are uncertain in this type of study because of uncertainty about the completeness of collection of the information. Misreporting or underestimation of all but the most obvious end-points (death or childbirth) is obvious bad science and is frequently unavoidable in this type of study.
It is well established that prospective studies allow more accurate data collection, and it is likely that items such as complication rates will be more fairly recorded. A major additional advantage of the prospective study is that the effect of the intervention on individual outcome measures can be reasonably measured. If weight, a pain score or any Quality of Life instrument is measured before and after the intervention, a more meaningful assessment of the effect of the intervention on such parameters may be made. The most common bad science associated with this type of trial is failure to acknowledge the impact of what we can collectively call “the placebo effect”. Any intervention, particularly a new one and especially one associated with impressive technological equipment such as lasers or robots, is known to be associated with a significant beneficial effect on outcome. This effect has long been known, and for much of the history of medicine, in the absence of really effective interventions, it was virtually the only tool our predecessors had. In the old text previously referred to (1), the supportive measures for excessive bleeding are described thus: “But should the girl be delicate state of health at the commencement of this epoch, it is necessary to allow very nourishing diet, give her slight tonics, use cold and shower baths, beginning at blood heat, and reducing it to the ordinary temperature, combined with exercise in the open air, and if competent, exercise especially on horseback. It is in these cases especially that local remedies may be employed with advantage such as frequently bathing the feet, allowing a quantity of water that the legs of the patient may be immersed as far as the knees for when the feet only are covered, they are more frequently injurious than useful. The bowels should be kept open with foetid and stimulating warm enemas.
Warm cataplasms, such as mustard sinapism, applied around the pelvis, dry cupping, frequent blisters, the application of a few leeches to the ankles, legs or superior parts of the thighs, small bleedings from the feet etc etc.”
It is clear that, even in the absence of a major therapeutic effect, improvements in symptoms may still be observed in patients who have ineffective therapies administered with much ceremony. The ritual aspects of the therapeutic process may have been rather neglected in these modern mechanistic times, but their effect is still powerful and well recognised. In a placebo-controlled trial of laparoscopic excision of endometriosis that we published (2), in which the patients were completely blinded to the nature of their treatment, 6 of the 19 patients in the placebo arm, who had no treatment other than diagnostic laparoscopy, reported improvement in their overall pain levels 6 months after the intervention. Phrased another way, almost one third of the patients who felt they might have had effective treatment, but who in fact had only a placebo intervention and had gone through the whole ritual of hospitalisation, surgery and careful follow-up, reported improvement in pain 6 months later. Such a placebo response to surgery has been very well documented in a large number of situations. Failure to recognise this placebo effect may attribute to an intervention specific beneficial effects that are in fact only a response to the ritual of caring and support surrounding it. This is another type of bad science associated with surgical assessment.
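The figure of 6 improved patients out of 19 can be put in context with a short calculation. The sketch below is illustrative only: it computes the observed proportion and a Wilson score interval, a standard method that is our choice for this illustration and is not taken from the trial report. The function and variable names are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Figures quoted in the text: 6 of 19 placebo-arm patients reported improvement.
improved, placebo_arm = 6, 19
rate = improved / placebo_arm
low, high = wilson_interval(improved, placebo_arm)
print(f"placebo response: {rate:.1%}, approximate 95% interval {low:.1%} to {high:.1%}")
```

With only 19 patients the interval is wide, which is itself a reminder that a single small arm gives an imprecise estimate of the size of the placebo response.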
To allow for the placebo effect that may be associated with any major health intervention, it is necessary to compare the particular intervention with either a sham intervention or a ‘state of the art’ procedure. When comparing two interventions, great care must be exercised to ensure that apples are compared with apples. Many examples of inappropriate comparison have led to misleading results. In any laboratory experiment, the good scientist takes extraordinary pains to ensure that every aspect of the two arms of an experiment is identical, so that any observed differences are actually due to the process being investigated. Random variations in the background conditions of the intervention may contribute to the observed effect and may therefore completely invalidate any observed differences. In clinical medicine the experimental conditions can never be standardised, for patients, their disorders and their surgeons all vary. It is possible, however, to reduce the effect of random background variations by the process of randomisation. The aim of this process is to ensure that each and every variable that might affect the measured outcomes is evenly distributed between the arms of the study in a randomised controlled trial (RCT). If patients of the same weight distribution, the same range of pelvic pathologies and the same smoking history, and surgeons with the same range of surgical expertise, are fairly distributed to each treatment group, there should be an equal number of oranges, apples and pears in each group, and so any differences in the overall outcome measures are more likely to be due to the intervention. Moreover, data collection in a correctly structured RCT should be conducted under a single standardised system, improving the reliability of the comparison. A very common form of bad science in surgical trials is an inappropriate comparison between, say, the same surgeon’s results at different stages of his career, i.e. a consecutive comparison.
There are many other examples where apparently good comparisons are invalidated because patients were pre-selected for inclusion in particular arms of the trial. This often results in poor-prospect patients being allocated to one arm, compromising the results independently of any effect of the intervention under investigation. Blind randomisation by an independent third party (often a computer programme) removes many of the risks of bad science in such a randomised comparative study.
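The way random allocation tends to even out background variables can be illustrated with a small simulation. This is a sketch on invented data, not an analysis of any real trial: the weight distribution, the proportion of smokers, the arm labels and the function name are all assumptions made for the example.

```python
import random
import statistics


def randomise_and_summarise(n_patients: int = 200, seed: int = 1) -> dict:
    """Allocate simulated patients to two arms at random and summarise each arm."""
    rng = random.Random(seed)
    arms = {"A": [], "B": []}
    for _ in range(n_patients):
        # Hypothetical background variables for one simulated patient.
        patient = {
            "weight_kg": rng.gauss(70, 12),
            "smoker": rng.random() < 0.25,
        }
        # Simple randomisation: each patient has an equal chance of either arm.
        arms[rng.choice("AB")].append(patient)

    summary = {}
    for name, group in arms.items():
        summary[name] = {
            "n": len(group),
            "mean_weight_kg": statistics.mean([p["weight_kg"] for p in group]),
            "smoker_fraction": sum(p["smoker"] for p in group) / len(group),
        }
    return summary


summary = randomise_and_summarise()
for name, stats in summary.items():
    print(name, stats)
```

Neither weight nor smoking status is used in the allocation, yet the two arms should end up with similar averages. The balance is approximate rather than exact, and it improves as the number of patients grows.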
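As an illustration of how a computer programme might generate an allocation schedule, the following minimal sketch uses permuted blocks. This is one common scheme chosen for the example and is not a method described in this article; the arm labels, block size and function name are hypothetical. A real trial would have the schedule produced and held by an independent party so that recruiting clinicians cannot foresee the next allocation.

```python
import random


def permuted_block_schedule(n_patients, arms=("intervention", "control"),
                            block_size=4, seed=None):
    """Return an allocation list built from randomly shuffled, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_patients:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_patients]


# Illustrative run: 20 patients, fixed seed so the schedule is reproducible.
schedule = permuted_block_schedule(20, seed=2024)
for patient_number, arm in enumerate(schedule, start=1):
    print(patient_number, arm)
```

Because every complete block is balanced, the arms stay close in size throughout recruitment, and the allocation of any individual patient does not depend on the surgeon's view of that patient's prospects.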
This review illustrates a number of the problems associated with trying to fulfil our Hippocratic duty and fairly assess the many procedures we are collectively trying to develop. There are many pitfalls, and fair assessment of competing interventions is more difficult and expensive than it would initially appear. Despite the difficulties, however, new surgical techniques and technologies can be, and are, well tested and validated. Gynaecological endoscopists have been in the very forefront of conducting good-quality trials of these new surgeries and we should be collectively proud of our leading role. Avoiding bad science requires humility on the part of surgeons in recognising that interventions that seem intuitively good may not be so. Bad science will also be avoided if suitable care is taken with trial design and similar diligence is exercised by all those who read and assess published results.
The ISGE is at the leading edge of organisations committed to continually developing better surgical procedures for women and to assessing them with the very best science we can achieve. The forthcoming meeting in Sydney will allow those fortunate enough to attend to see all types of science, good and bad, associated with minimal access gynaecology. It should be a wonderful occasion. We hope to see you there.
REFERENCES
1) Andrew T. A Cyclopedia of Domestic Medicine and Surgery. Glasgow: Blackie and Sons, Queen Street; 1842.
2) Abbott JA, Hawe J, Hunter D, Holmes M, Finn P, Garry R. Laparoscopic excision of endometriosis: a randomized, placebo-controlled trial. Fertil Steril 2004;82:878-884.