To cite this article: C. D. Stephens, N. Mackin & J. H. Sims-Williams (1996) The Development
and Validation of an Orthodontic Expert System, British Journal of Orthodontics, 23:1, 1-9, DOI:
10.1179/bjo.23.1.1
N. MACKIN, PH.D
Team Management Systems Ltd., Watlington, Oxfordshire, UK
Abstract. An expert system for providing advice on the selection and treatment of cases suitable for treatment by means of
removable appliances has been developed. Its assessment by peer review is described.
Index words: Expert System, Jeremiah.
logical functions intended by its creator. In the realm of expert systems the issues are rather more complicated;
there is an additional need to test the validity of the knowledge contained within the system (Gaschnig et al.,
1983). This is no easy task. A search of the Index Medicus for the years 1987-93 revealed that, although there were
458 papers describing expert systems, there were only three reports of clinical trials to assess their validity.

In order to evaluate a system's advice it is necessary to establish some form of objective standard which defines
the correct decision for any given problem. There are two prevalent viewpoints:

1. The correct decision may only be evaluated in retrospect.
2. The correct decision may be defined as that which a human expert selects as the correct answer based upon
the same information as the expert system.

Buchanan and Shortliffe, (1984) argue against the former viewpoint, suggesting that in many domains the question
either cannot be answered or is irrelevant. In diagnosis, for example, the eventual outcome for a patient is not
entirely correlated with the correctness of the therapy; many patients die even when they receive the best possible
treatment, and occasionally patients improve despite inappropriate treatment! In an orthodontic context we are all
dependent on patient co-operation and also at the mercy of unusual expressions of facial growth.

An evaluation of MYCIN (Yu et al., 1979) chose the second viewpoint and compared the advice of the expert system
with that given by human experts in a blind trial. The prescriptions, made by both the system and the clinicians,
were assessed by a panel of eight specialists.

In discussing the evaluation of an expert system to advise upon the management of anaemic patients, Quaglini et al.,
(1988) selected a comparable technique formulated as a modified Turing test. They sought to answer the question: can
'the system's diagnostic performance be distinguished from that provided by human haematologists?' The index of
performance they selected was, therefore, a measure of the inter-expert consensus. This form of trial was the basis
of the peer level review of Jeremiah which is the subject of this paper.

The Validation of Jeremiah

The rule base of Jeremiah was extensively verified during the development stage of the system. For example, at an
early stage it was confirmed that the system could produce recommendations which were acceptable to one of the
experts who had provided the advice from which the rules of the system were obtained (CDS). In order to determine
this, a clinical assistant who had been trained by the expert, and had been shown to record near-identical clinical
observations for the features of malocclusion, was used to enter data for 200 cases drawn at random from the GDS
sample used by Richmond et al., (1993). The advice which the system generated from these data demonstrated a high
degree of conformity with the opinion of the original expert (Fig. 1a,b). It also confirmed that the proportion of
cases which the system judged to be within the scope of a general dental practitioner with no more than a basic
dental degree was very close to the figure suggested by Stephens and Harradine, (1988), and more recently confirmed
by Evans, (1994) as reflecting contemporary consultant opinion.

However, it is also necessary to verify that the encoded versions of the clinical rules which have been used meet
with peer approval. An effective way of achieving both objectives is to have the output of the system assessed by
consultant orthodontists, just as they might assess one another. Such a trial, formulated in terms of anonymous peer
level review, has the advantages of independence and objectivity. The reviewers should be drawn from independent
centres so that they do not reflect the biases of the department within which the system was developed. It is also
important when setting up such a trial to take account of the fact that studies have revealed a high level of
inter-examiner and intra-examiner variability among experts in orthodontic treatment planning (Han et al., 1991;
Stephens et al., 1993).

Methods

A sample set of 20 cases was selected from past cases treated by dental practitioners in the General Dental Service
of England and Wales. These were drawn from a random sample of cases collected as part of a larger study (Richmond
et al., 1993). The sample represented a wide range of malocclusions and was chosen to include 10 cases for which the
expert system provided a treatment plan: i.e. it had advised that an acceptable result could be achieved by means of
removable appliances, with or without extractions. These were designated 'treatment cases'. In the remaining 10
cases the system had recommended that the case be referred for specialist advice or treatment ('referral cases').
(Subsequently, it was found that one of the treatment cases had been given to participants with the post-treatment
models and had to be excluded from the study, reducing the number of treatment cases to 9.)

Six consultants drawn from the South of England took part in the trial, which consisted of two parts. First, the
consultants were each presented with a selection of 13 out of the 19 sets of models. These were accompanied by
essential information such as the age, sex and skeletal class, and a note of any teeth which were missing on the
radiograph. For each case the consultant was required to fill out a proforma as if they were providing advice at the
request of a general practitioner who was competent with removable appliances (Fig. 2). If they felt that the
practitioner should refer the patient to a specialist or consultant orthodontist, then they filled out the referral
box on the form and specified the reason for the referral. The identity of the consultant was recorded on the form
using only a letter: A, B, C, D, F or G.

As the trial had to be undertaken in a single day, time did not allow each consultant to assess every case. The
trial was structured to allow each of the 6 consultants to provide opinions on all nine 'treatment cases', and on
either three or four of the 'referral cases'.

It should of course be noted that a consultant's recommendation for a 'treatment case' could be a referral rather
than a treatment plan, and vice versa. In addition, there were two treatment cases where one or more consultants
thought that no treatment at all was required.
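The one-day design above (every consultant sees all nine treatment cases plus three or four of the ten referral cases) can be generated mechanically. The sketch below is one possible round-robin allocation, not the authors' procedure: it assumes each referral case is reviewed by exactly two consultants, and the case labels `T1` to `T9` and `R1` to `R10` are hypothetical.

```python
from itertools import cycle

CONSULTANTS = ["A", "B", "C", "D", "F", "G"]          # letters used on the proformas
TREATMENT_CASES = [f"T{k}" for k in range(1, 10)]     # hypothetical labels, 9 cases
REFERRAL_CASES = [f"R{k}" for k in range(1, 11)]      # hypothetical labels, 10 cases
REVIEWS_PER_REFERRAL = 2  # assumption: each referral case is seen by two consultants


def allocate():
    """Return {consultant: [case labels]} for a single-day review session.

    Every consultant receives all treatment cases; referral cases are dealt
    out round-robin so that no consultant's load differs by more than one.
    """
    allocation = {c: list(TREATMENT_CASES) for c in CONSULTANTS}
    reviewer = cycle(CONSULTANTS)
    for case in REFERRAL_CASES:
        # consecutive draws from the cycle give two different consultants
        for _ in range(REVIEWS_PER_REFERRAL):
            allocation[next(reviewer)].append(case)
    return allocation


if __name__ == "__main__":
    for consultant, cases in allocate().items():
        print(consultant, len(cases))
```

With these assumptions the 20 referral reviews split 4, 4, 3, 3, 3, 3, so two consultants see 13 sets of models and four see 12, in line with the "three or four" referral cases described above.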
BJO February 1996 Orthodontic Expert System 3
FIG. 1 (a,b) The approval given by one of the experts upon whose advice the rules of the expert system were based (CDS) to the advice provided by the system
when used by a clinical assistant trained in case assessment by him. Note: the one case disapproved of in the second series proved to have had clinical data entered
incorrectly.
In the second part of the study each consultant was presented with a series of models, each with the appropriate
proforma filled out either by another consultant or with the output from the expert system, which had been obtained
previously and had been transcribed by one of the authors (NM) as 'Consultant E'.

For each plan (or recommendation to refer) consultants were required to assess the advice provided on the form by
their colleagues or the system. If treatment had been recommended, the assessing consultant indicated whether he or
she considered the plan to be 'ideal', 'good', 'acceptable' or 'unacceptable' (see Table 1). If referral had been
proposed the consultant was asked to say whether this was 'justified' or 'unjustified' and
4 C. D. Stephens et al. BJO Vol 23 No. 1
[FIG. 2 Consultant proforma: 2. Treatment objectives / Give reasons for referral; 3. Proposed occlusal changes (correction of incisor relationship, Yes/No; correction of buccal occlusion, antero-posterior, Yes/No); 4. Proposed extractions]

TABLE 1

Ideal           The best possible treatment for the case as it presented.
Good            Not necessarily the best treatment plan possible, but one standing an excellent chance of
                conferring benefit both aesthetically and functionally.
Acceptable      Standing a reasonable chance of conferring benefit, and little risk of causing harm
                aesthetically and functionally.
Unacceptable    Standing little chance of material benefit and/or the possibility of causing harm to the
                occlusion or appearance.

Justification
Justified       The referral is justified in terms of the complexity of the treatment required.
Unjustified     The referral is over-cautious.

Basis of referral
Good            An appropriate reason for referral was given.
Poor            An inappropriate reason was cited for referral.
TABLE 3 Consultant assessments of the nine cases where Jeremiah had provided a treatment plan. An entry of 'T'
indicates that the consultant provided a treatment plan, 'R' represents a decision that the case required a referral,
and 'N' represents the opinion that no treatment was required

Case                 A   B   C   D   F   G    T    R   E
136                  R   T   R   R   R   R    1    5   T
53                   R   T   R   R   R   R    1    5   TTR
143                  R   R   T   T   T   T    4    2   TTTR
99                   R   R   R   R   R   T    1    5   TR
355                  R   N   T   N   R   R    1    3   T
485                  R   N   R   R   R   R    0    5   TR
345                  T   T   R   R   R   T    3    3   TR
118                  T   T   T   T   T   T    6    0   TTTT
48                   T   R   R   R   R   T    2    4   TT
Decisions to treat   3   4   3   2   2   5   19        16
Decisions to refer   6   3   6   6   7   4        32    5
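The totals in the last two rows of Table 3 can be recomputed from the individual decisions, which is a useful check on the transcription. The sketch below covers only the six consultant columns (A to G); it is an illustrative tally, not part of the authors' analysis.

```python
from collections import Counter

CONSULTANTS = ["A", "B", "C", "D", "F", "G"]

# Consultant decisions transcribed from Table 3, one character per consultant
# in the order above: T = treat, R = refer, N = no treatment required.
TABLE_3 = {
    136: "RTRRRR",
    53: "RTRRRR",
    143: "RRTTTT",
    99: "RRRRRT",
    355: "RNTNRR",
    485: "RNRRRR",
    345: "TTRRRT",
    118: "TTTTTT",
    48: "TRRRRT",
}


def per_consultant(decision):
    """Count how often each consultant made the given decision (T, R or N)."""
    return {
        c: sum(row[i] == decision for row in TABLE_3.values())
        for i, c in enumerate(CONSULTANTS)
    }


def per_case():
    """Count the T/R/N decisions recorded for each case."""
    return {case: Counter(row) for case, row in TABLE_3.items()}


if __name__ == "__main__":
    print(per_consultant("T"))
    print(per_consultant("R"))
```

Summing the per-consultant counts gives 19 decisions to treat and 32 decisions to refer, matching the totals in the table; the remaining three of the 54 opinions are the 'N' entries for cases 355 and 485.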
TABLE 4 The consultants' assessments of the 10 cases where Jeremiah had recommended that the case be referred for
specialist advice or treatment
Coded examiner   A   B   D   F   G   E
Case
% R I{
(!() T R
2lJX R R R
42 R T R
459 R T I{
274 R R R
110 R R ]{
505 R R R
!XIl R R R
4X R R R
Decisions Ill treat 0 2 () (I
Decisions to refer 3 I :l 4
The categories 'ideal', 'good' and 'acceptable' can usefully be compounded into a single category, 'satisfactory'.
The recommendations for treatment can then be examined using the simplified categories of 'unacceptable' and
'satisfactory'. Table 10 gives the percentage of satisfactory recommendations made by each consultant.

The consultants' assessments have so far been considered to be directly comparable. There may well be some
consultants whose assessments are out of line with those of their colleagues; for this reason the assessments have
also been aggregated in terms of who made the assessment. It could be that the variation in the assessments from one
consultant to another is due wholly to differences in opinion between the consultants as to what constitutes a
satisfactory treatment plan. The assessments made by each consultant can be weighted to counter these differences in
opinion.

Weightings were generated so that the averaged weighted assessments made by each individual consultant are in
proportion to those made by the whole group of consultants (see Appendix). The weightings so derived are given in
Table 11. It should be noted that the average merely reflects the relative assessments between the consultants,
rather than being a measure on an absolute scale.

Discussion

The agreement between examiners on whether cases should be treated or referred by a practitioner ranged from 'poor'
to 'substantial' (Table 5). There were also considerable disagreements in the level of support given to the
treatment plans (Table 6a,b). Six plans were assessed as being 'good' by one consultant and yet 'unacceptable' by the
other. Given the precise definitions of the terms, it is surprising that there should have been such a degree of
variability in the examiners' assessments.

It might be thought that such disagreement arose because additional data such as cephalometric radiography and
facial photographs were not available. However, the work of Han et al., (1991) suggests otherwise. They found that
even with a standardized data-gathering protocol, five orthodontists achieved only 65 per cent intra-examiner
agreement over a 4-6-week period in a study involving the pretreatment records of 57 patients.
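The compounding used in the Results (ideal, good and acceptable merged into 'satisfactory') and the resulting percentage of satisfactory recommendations can be sketched as follows. This is a minimal illustration assuming assessments are recorded as the lower-case category names of Table 1; the function names are hypothetical.

```python
from typing import Sequence

# The first three categories of Table 1 compound into 'satisfactory'.
SATISFACTORY = {"ideal", "good", "acceptable"}


def compound(category: str) -> str:
    """Map a four-level assessment onto the two-level scale."""
    if category in SATISFACTORY:
        return "satisfactory"
    if category == "unacceptable":
        return "unacceptable"
    raise ValueError(f"unknown category: {category!r}")


def percent_satisfactory(assessments: Sequence[str]) -> float:
    """Percentage of assessments that compound to 'satisfactory'."""
    if not assessments:
        raise ValueError("no assessments to summarise")
    hits = sum(compound(a) == "satisfactory" for a in assessments)
    return 100.0 * hits / len(assessments)


if __name__ == "__main__":
    # e.g. three 'unacceptable' assessments and one 'acceptable'
    print(percent_satisfactory(["unacceptable"] * 3 + ["acceptable"]))
```

For instance, three 'unacceptable' assessments and one 'acceptable' one give 25 per cent satisfactory, since only the 'acceptable' assessment survives the compounding.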
Moreover, in a majority of cases the use of study models alone yielded treatment strategies that were equivalent to
those obtained when full records were employed. In other words, the availability of additional information in the
form of facial photographs, panoramic radiographs, cephalometric skull X-rays and tracings made only small
differences to the treatment decisions. This finding supports the earlier report of Naccache et al., (1989).

When the assessment categories used in the present study were compounded into 'unacceptable' and 'satisfactory'
(embracing ideal, good and acceptable), 50 per cent of the treatment plans were judged differently by the two
consultants and the overall agreement was only poor (Table 6b). The overall level of agreement on justification for,
and basis of, referral was only 'moderate' (Table 7a,b). This extent of examiner variability is well known (Han et
al., 1991; Stephens et al., 1993), but it is a serious problem which should be borne in mind when considering the
results presented here.

The averaged assessment values of each consultant's recommendations provide an overall impression of how the group
regarded that particular consultant's recommendations. The values for the expert system in Tables 10-12 indicate
that the treatment recommendations produced by Jeremiah are reasonably consistent with those of the group, but that
its referral recommendations are more poorly regarded. In particular, the reasons stated for referral were assessed
quite unfavourably.

TABLE 5 Inter-examiner agreement on their colleagues' decision to treat or refer (unweighted Kappa values)

TABLE 6(a) Inter-examiner agreement on the appropriateness of the 27 treatment plans of their colleagues

                                          2nd Assessment
Unacceptable                              9      13
Satisfactory (ideal, good, acceptable)           5
Kappa = 0·20 (SE = 0·1) P = 0·08

TABLE 7(a) Inter-examiner agreement on the justification of a decision by a colleague that a case should be
referred

                1st Assessor
2nd Assessor    Unjust   Just
Unjust          3
Just                     38
Kappa = 0·38 (SE = 0·25) P = 0·002

TABLE 7(b) Inter-examiner agreement on the soundness of the reason given as the basis for referral

                1st Assessor
2nd Assessor    Poor     Good
Poor            3        6
Good                     33
Kappa = 0·44 (SE = 0·18) P = 0·006
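The Kappa statistics in Tables 5 to 7 correct the raw proportion of agreement for the agreement expected by chance. The sketch below computes the unweighted (Cohen's) Kappa for a square agreement table. Reading the blank cell of Table 7(b) as zero, which is an assumption about the typeset table, reproduces its Kappa of 0·44; doing the same for the 2 x 2 table under Table 6(a) gives about 0·20. The standard errors and P values are not recomputed here.

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[int]]) -> float:
    """Unweighted Cohen's Kappa for a square agreement table.

    table[r][c] counts the cases placed in category r by one assessor and
    in category c by the other; the diagonal holds the agreements.
    """
    k = len(table)
    if k == 0 or any(len(row) != k for row in table):
        raise ValueError("need a non-empty square table")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table contains no cases")
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[r][c] for r in range(k)) for c in range(k)]
    # chance agreement from the product of the marginal proportions
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Table 7(b), blank cell assumed to be zero
    print(round(cohen_kappa([[3, 6], [0, 33]]), 2))
```

For Table 7(b) the observed agreement is 36/42 and the chance agreement 1314/1764, so Kappa = 198/450 = 0·44.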
TABLE 8 Averaged assessment values of recommendations made for the treatment cases

A  B  C  D  F  G  E
Treatment n = 4  6  0  5  3  7  31
Average  1·25  1·83  2·60  2·33  1·57  1·58
Referrals n = ?  2  9  3  2  ?  10
Justification average  2·0  2·0  1·89  2·0  2·0  1·8
Reason average  2·0  2·0  2·0  1·67  2·0  1·0  1·6

Consultant
A  B  C  D  F  G
Treatment n = 0  0  2  4  0  0  0
Average  1·0  1·75
Referrals n = 6  6  6  ?  4  5  20
Justification average  2·0  2·0  2·0  2·0  2·0  2·0  1·9
Basis average  1·67  2·0  2·0  2·0  2·0  2·0  1·65

An explanation for the expert system's low average value for the referral cases may arise from the nature of the
rulebase. This addresses the domain of removable appliance treatment alone. It therefore recognizes referrals as
'exceptions', on the basis that a course of removable appliance treatment is inappropriate. A consultant, on the other
hand, would be likely to consider the case within a broader context of treatment encompassing the full range of
possible mechanics. Hence, while the system reasons that a case is not amenable to treatment with a maxillary
removable appliance because such an approach would not correct a specific occlusal abnormality, a consultant is
likely to reason in the opposite way by considering what advantages could be gained by employing a more complex type
of treatment. The consequences of the different modes of reasoning are (i) that the system's stated reasons will not
readily match the consultant's way of thinking, and (ii) that the consultant's reasons for advocating referral would
be expected to be more broad-ranging.

Weightings developed to compensate for the differing tendencies in assessment (Table 11) can be applied to the
treatment assessments made by each consultant. The resulting Table 12 should more correctly reflect the group's
opinion of the treatment recommendations of the consultants and the expert system.

An absolute interpretation of the values in Table 12 would suggest that, as a whole, the group of consultants thought
the treatment planning of the expert system was more often 'unacceptable' than 'satisfactory'. However,
when the value for the expert system is compared with the values for the individual consultants, it will be seen
that the treatment planning is of a standard directly comparable with that of the consultants themselves.

It should be borne in mind that the sample size for the trial is very small and, therefore, the results are
questionable in terms of their accuracy. Nevertheless, it is highly significant that the consultants' ranking (Table
13) almost exactly matches their clinical experience. The probability that the ranking should be so close to the
experience-based ranking by chance is less than 0·0833 per cent. As it is entirely reasonable to suppose that
consultants with greater experience will make recommendations which are more acceptable to their peers, it can be
suggested that this method of approach for validation of an expert system is probably effective.

TABLE 10 Percentage of a consultant's recommendations for treatment which were considered satisfactory by peers

                 Consultant                          System
                 A     B     C     D     F     G     E
Unacceptable     3     2     2     3     1     4     19
Acceptable       1     4     0     6     2     3     12
% Satisfactory   25    67    0     67    67    43    39

TABLE 11 Weightings to normalize the consultants' assessments to the overall group's opinion

Consultant  A  B  C  D  F  G  System  E
Treatment  0·67  1·57  ?  1·39  0·90  0·58
Justification  2·15  0·73  0·70  0·76
1·15  4·78
Average  -0·98  0·50  -0·78  0·43  0·00  -0·35  -0·12

TABLE 13 The ranking of the six NHS consultants based upon peer review of their treatment plans, the expert system
being denoted as E, together with their relative experience expressed as their year of appointment as a consultant

Ranking                       1st    2nd    3rd    4th    5th    6th    7th
Assessor                      B      D      F      E      G      C      A
Average weighted assessment   0·50   0·43   0·00   -0·12  -0·35  -0·78  -0·98
Year of appointment           1971   1972   1986          1983   1987   199?

Conclusions

The evidence presented here, when considered in conjunction with the earlier work of Brown et al., (1991), suggests
that the use of Jeremiah by general practitioners could significantly reduce the proportion of inappropriate
treatment plans adopted for removable appliance treatment in the General Dental Service of England and Wales.

Acknowledgements

The authors are grateful to Mr A. O. Hughes, M.Sc., M.Phil., F.R.S.S., for statistical advice, and to all those
consultants who took part in the trials. Jeremiah was developed with the support of Medical Research Council Grant
G870719.

References

Brown, I. D., Erritt, S. J., Adams, S. R., Sims-Williams, J. H. and Stephens, C. D. (1991)
The initial use of a computer controlled expert system in the treatment planning of Class II division 1 malocclusion,
British Journal of Orthodontics, 18, 1-7.

Buchanan, B. G. and Shortliffe, E. H. (1984)
Rule-based Expert Systems: the MYCIN experiments of the Stanford heuristic programming project,
Addison Wesley, Reading, Massachusetts.

Clark, J. D. and Elderton, R. J. (1987)
Orthodontic treatment in the General Dental Services in Scotland,
British Dental Journal, 162, 57-62.

Evans, R. J. (1994)
Audit into orthodontic treatment plans for practitioners,
Unpublished material presented to the meeting of the Consultant Orthodontists Symposium, February 1994.

Gaschnig, J. P., Klahr, P., Pople, E., Shortliffe, E. and Terry, A. (1983)
Evaluation of expert systems: issues and case studies.
In: F. Hayes-Roth, D. A. Waterman and B. D. Lenat (Eds) Building Expert Systems,
Addison Wesley, Reading, Massachusetts.

General Dental Council (1993)
Professional Conduct and Fitness to Practise,
General Dental Council, London.

Gravely, J. (1989)
Who should practise orthodontics?
British Journal of Orthodontics, 16, 235-241.

Han, U. K., Vig, K. W. L., Weintraub, J. A., Vig, P. S. and Kowalski, C. J. (1991)
Consistency of orthodontic treatment decisions relative to diagnostic records,
American Journal of Orthodontics and Dentofacial Orthopedics, 100, 12-19.

Jacobsen, A. (1988)
The challenge of dental education today,
Journal of Clinical Orthodontics, 22, 576-584.

Landis, J. R. and Koch, G. G. (1977)
The measurement of observer agreement for categorical data,
Biometrics, 33, 159-174.

Lawrence, P. C., Clifford, P. C. and Taylor, I. F. (1987)
Acute abdominal pain - computer aided diagnosis by non-medically qualified staff,
Annals of the Royal College of Surgeons of England, 69, 233-237.

Lowe, A. A. (1987)
Undergraduate and continuing education in orthodontics: a view into the 1990s,
International Dental Journal, 37, 91-97.

Mackin, N. (1992)
The development of an expert system for planning orthodontic treatment,
PhD thesis, University of Bristol.

McAdam, W. A. F., Brock, B. M., Armitage, T., Davenport, P., Cahn, M. and de Dombal, F. T. (1990)
Twelve years experience of computer aided diagnosis in a district general hospital,
Annals of the Royal College of Surgeons of England, 72, 140-146.

Myrberg, N. E. A., Duterloo, R. S., Booy, C., Van der Linden, F. P. G. M., Boersma, H. and Prahl-Andersen, B. (1986)
Orthodontic services in the Netherlands: the standpoint of the Dutch professors of orthodontics,
European Journal of Orthodontics, 8, 65-66.

Naccache, H., Bernard, C., Brodeur, J. M. and Poumier, A. (1989)
Epidemiological evaluation of a computerised diagnosis in orthodontics,
Journal of Dental Research, 68, 776.

Quaglini, S., Stefanelli, M., Barosi, G. and Berzuini, A. (1988)
A performance evaluation of the expert system ANEMIA,
Computers and Biomedical Research, 21, 307-323.

Richmond, S., Shaw, W. C., Stephens, C. D., Webb, G. and Roberts, C. (1993)
Orthodontic treatment in the General Dental Services of England and Wales: a critical assessment of standards,
British Dental Journal, 174, 315-329.

Sims-Williams, J. H., Brown, I. D., Matthewman, A. and Stephens, C. D. (1987)
A computer-controlled expert system for orthodontic advice,
British Dental Journal, 163, 161-166.

Stephens, C. D. and Harradine, N. H. (1988)
An examination of the complexity of treatment judged in 1986 to be required for orthodontic patients referred to the
Bristol Dental Hospital in 1977 and 1985,
British Journal of Orthodontics, 15, 27-32.

Stephens, C. D., Drage, K. D., Richmond, S., Shaw, W. C., Roberts, C. T. and Andrews, M. (1993)
Consultant opinion on orthodontic treatment plans devised by dental practitioners: a pilot study,
Journal of Dentistry, 21, 355-359.

Stheeman, S. E., Van der Stelt, P. F. and Mileman, P. A. (1992)
Expert systems in dentistry: past performance and future prospects,
Journal of Dentistry, 20, 68-73.

Yeh, R. T. (1977)
Current trends in programming methodology: volume 2, program validation,
Prentice Hall, New Jersey.

Yu, V. L., Fagan, L. M., Bennett, S. W., Clancey, W. J., Scott, A. C., Hannigan, J. F., Blum, R. L., Buchanan, B. G.
and Cohen, S. N. (1979)
An evaluation of MYCIN's advice,
Journal of the American Medical Association, 242, 1279-1282.

Appendix

In order to calculate the weightings it is first necessary to assume that the set of treatment plans assessed by each
consultant is comparable in terms of the number of plans which are satisfactory; that is, to assume that the
variation in assessments is not due to the fact that the consultants assessed different sets of cases.

First, it is necessary to define a scale for the weightings. This can easily be achieved by providing numeric values
for the categories unacceptable and satisfactory.

Let $A_{ij}$ be the number of assessments made by consultant $i$ that are in category $j$. Let the likelihood of a
particular consultant $i$ assessing a case into a specific category $j$ be $R_{ij}$, where

$$R_{ij} = \frac{A_{ij}}{\sum_j A_{ij}}$$

Let $G_j$ represent the proportion of assessments made by the group that are in category $j$:

$$G_j = \frac{\sum_i A_{ij}}{\sum_i \sum_j A_{ij}}$$
Now we define the weighting $W_{ij}$ to normalize the likelihood of consultant $i$ assessing category $j$ so that it
is representative of the group of consultants:

$$W_{ij} = \frac{G_j}{R_{ij}} = \frac{\sum_i A_{ij}}{\sum_i \sum_j A_{ij}} \cdot \frac{\sum_j A_{ij}}{A_{ij}}$$

For example, the weighting for consultant D assessing the category unacceptable (U) is derived as follows:

H·35
1·393
3·67
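The Appendix builds the weightings from the table of counts $A_{ij}$. A minimal Python sketch follows, assuming the weighting is $W_{ij} = G_j / R_{ij}$, so that each consultant's weighted likelihood $W_{ij} R_{ij}$ equals the group proportion $G_j$; this is a reading of a partly garbled formula, not a confirmed transcription. The consultants 'P' and 'Q' and their counts are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from typing import Dict

# counts[consultant][category] = A_ij, the number of assessments by
# consultant i that fall in category j
Counts = Dict[str, Dict[str, int]]


def likelihoods(counts: Counts) -> Dict[str, Dict[str, float]]:
    """R_ij = A_ij / sum_j A_ij for each consultant i."""
    out = {}
    for i, row in counts.items():
        total = sum(row.values())
        out[i] = {j: a / total for j, a in row.items()}
    return out


def group_proportions(counts: Counts) -> Dict[str, float]:
    """G_j = sum_i A_ij / sum_i sum_j A_ij."""
    grand = sum(sum(row.values()) for row in counts.values())
    categories = {j for row in counts.values() for j in row}
    return {
        j: sum(row.get(j, 0) for row in counts.values()) / grand
        for j in categories
    }


def weightings(counts: Counts) -> Dict[str, Dict[str, float]]:
    """W_ij = G_j / R_ij; undefined (and skipped) where R_ij is zero."""
    R = likelihoods(counts)
    G = group_proportions(counts)
    return {i: {j: G[j] / r for j, r in row.items() if r > 0} for i, row in R.items()}


# Hypothetical counts: P is lenient, Q is severe.
EXAMPLE: Counts = {
    "P": {"unacceptable": 2, "satisfactory": 6},
    "Q": {"unacceptable": 6, "satisfactory": 2},
}

if __name__ == "__main__":
    print(weightings(EXAMPLE))
```

Here the group splits its 16 assessments evenly, so P's scarce 'unacceptable' verdicts are weighted up by 2 and Q's plentiful ones down by 2/3; multiplying each weighting by the consultant's likelihood recovers the group proportion of 0·5 in both categories.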