Measurement error of visual field tests in glaucoma (2003, Volume 87, Issue 1)

¹ Discoveries in Sight, Devers Eye Institute, Portland, OR, USA
² Bristol Eye Hospital, Bristol, UK
³ School of Psychology, University of Western Australia, Perth, Australia
⁴ School of Computing, Curtin University of Technology, Perth, Australia

Accepted for publication 23 July 2002

	ABSTRACT

Aim: Psychophysical strategies designed for clinical visual field testing produce rapid estimates of threshold with relatively few stimulus presentations and so represent a trade-off between test quality and efficiency. The aim of this study was to determine the measurement error of a staircase algorithm similar to full threshold with standard automated perimetry (SAP) and frequency doubling perimetry (FDP) in glaucoma patients.

Methods: Seven patients with early open angle glaucoma (OAG) were prospectively recruited. All were experienced in laboratory based psychophysics. Three matched test locations were examined with SAP (externally driven Humphrey field analyser) and FDP (CRT) in a single arbitrarily selected eye of each subject. Each location was tested twice with a 4-2-2 dB staircase strategy, similar to full threshold, and then with the method of constant stimuli (MOCS). Accuracy (threshold estimation error) was quantified by determination of differences between "true" threshold measurements made by MOCS and single staircase threshold estimates. Precision (repeatability) was quantified by the differences between repeated staircase threshold estimates.

Results: Precision was relatively high for both tests, although higher for FDP than SAP at depressed sensitivity levels. The staircase strategy significantly overestimated threshold sensitivity for both test types, with the mean difference (95% CI) between staircase and MOCS thresholds being 4.48 dB (2.35 to 7.32) and 1.35 dB (0.56 to 1.73) for SAP and FDP respectively. Agreement levels (weighted kappa) between MOCS and staircase thresholds were found to be 0.48 for SAP and 0.85 for FDP. Although this "bias" appeared constant for FDP across all sensitivity levels, this was not the case for SAP, where accuracy decreased at lower sensitivity levels.

Conclusion: Estimations of threshold sensitivity made using staircase strategies common to clinical visual field test instrumentation are associated with varying degrees of measurement error according to visual field test type and sensitivity. In particular, SAP significantly overestimates the "true" level of sensitivity, particularly in damaged areas of the visual field, suggesting that clinical data of this type should be interpreted with caution.

Clinical visual field tests are designed to provide information about both the spatial extent and the depth of visual deficits in a time interval that is sufficiently short to avoid tiring the patient and negatively impacting reliability. Although many well described psychophysical approaches can provide rigorous, high quality measurements of threshold sensitivity, these typically employ a considerable number of stimulus presentations at a single test location and therefore are unsuitable for clinical use.¹ Rapid thresholding strategies used in clinical visual field instrumentation therefore represent a trade-off between test measurement quality and temporal efficiency. Inevitably, use of rapid threshold estimation strategies may induce a degree of measurement error.

In order to understand and evaluate the performance of different visual field test types and thresholding strategies, it is necessary to quantify (1) the ability to produce precise, or repeatable, measurements and (2) the accuracy, or degree of error between the threshold estimation and the "true" threshold sensitivity. Conceptually, these attributes are equivalent to reliability and validity respectively.² While there are numerous reports describing the precision of clinical thresholding strategies for a variety of visual field test types, empirical data on accuracy are scarce. Accuracy data have been derived from simulation exercises³⁻⁶ or from theoretical calculations.⁷,⁸

The aim of this experiment was to evaluate visual field measurement error in patients with early glaucoma for two commercially available visual field test types, standard automated perimetry (SAP) and frequency doubling technology perimetry (FDP), using a staircase strategy similar to full threshold.

	METHODS

The institutional review board of Legacy Health System approved this study and all subjects gave informed consent before participating in the investigation.

Subjects
Seven patients (two male, five female) with early and moderate open angle glaucoma were recruited for this study from individuals under the care of the glaucoma service at Devers Eye Institute, Portland, OR, USA. The mean (SD) age of these individuals was 75.7 (8.2) years. For the purpose of this investigation, open angle glaucoma was defined at a previous clinical consultation on the basis of typical glaucomatous optic nerve head changes as determined by a US glaucoma fellowship trained ophthalmologist, characteristic glaucomatous visual field loss, and gonioscopically open anterior chamber angles. Characteristic glaucomatous visual field loss was defined on previous testing with program 24-2 full threshold SAP as an "abnormal" corrected pattern standard deviation (CPSD) and/or glaucoma hemifield test (GHT) (p<5% for CPSD, "outside normal limits" for GHT), in conjunction with a pattern of visual field loss consistent with glaucoma. In particular, early and moderate glaucoma was defined as no test locations with total deviations worse than -10 dB at the most recent clinical examination. Table 1 presents the visual field characteristics and other clinical information for the seven glaucoma patients.

Table 1 Visual field characteristics and other clinical information for subjects at the most recent clinical visit

All subjects had previously demonstrated reliable clinical visual field test results (false positives and false negatives <33% and fixation losses <25% on catch trials with full threshold SAP) and were experienced in laboratory based psychophysical tests with both achromatic and frequency doubling stimuli, having attended on at least five separate previous occasions for similar experiments.

Visual field testing
Three test locations were examined in one arbitrarily selected eye of each subject. Test locations were chosen individually for each subject based on the results of the most recent routine clinical examination in order to obtain measurements from a variety of sensitivity levels, from normal to moderate degrees of sensitivity loss. Locations were therefore not standardised among the sample. These same locations were tested with both SAP and FDP. Test order was randomised to minimise learning or fatigue effects. SAP testing was performed on an HFA model 610 (Humphrey Systems Inc, Dublin, CA, USA), which was externally driven by computer using custom software. Test conditions identical to routine testing were employed: size III test target, 200 ms stimulus duration and 31.5 asb (10 cd/m²) background illumination. Frequency doubling stimuli were presented on a 21" Sony Multiscan G500 video monitor driven by a Cambridge Research Systems VSG2/3 video board (Cambridge Research Systems Ltd, Kent, UK), using the same spatiotemporal properties employed by the commercially available FDT perimeter (0.25 c/deg spatial frequency sinusoidal waveforms and 25 Hz counterphase flicker). Mean luminance was 50 cd/m². Other properties of frequency doubling stimuli were also controlled to emulate the commercially available FDP instrumentation, including test target configuration (square 10° × 10°) and stimulus duration (720 ms total stimulus duration, with 160 ms linear on-ramp from 0% to tested contrast, 400 ms at test contrast, and 160 ms off-ramp returning to 0% contrast).

It is important to recognise that although both SAP and FDT perimetry make measurements of sensitivity in dB, their measurement scales are not the same, as they have different ranges and intervals. In this study, SAP sensitivity measurements use the proprietary logarithmic HFA scale of retinal sensitivity. The scale used for FDT perimetry in this study is also logarithmic, but is a dB scale of FDT stimulus contrast sensitivity (sensitivity in dB = 10 × log(1/contrast threshold)). Although 1 dB on the HFA measurement scale is therefore fundamentally different from 1 dB on the FDT measurement scale, this does not preclude comparison of the instruments. In this study it is critical that comparison between the measurement errors of the instruments is based upon the number of scale intervals that characterise measurement error, although regrettably both instruments use the same dB nomenclature.
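As an illustration of the FDT contrast sensitivity scale described above, the following minimal sketch converts between contrast threshold and dB. It assumes a base-10 logarithm and contrast expressed as a fraction between 0 and 1, neither of which is stated explicitly in the text; the function names are hypothetical.

```python
import math


def contrast_to_db(contrast_threshold: float) -> float:
    """Convert a contrast threshold (fraction, 0 < c <= 1) to sensitivity in dB.

    Assumption: the logarithm is base 10; the text only says "log".
    """
    if not 0.0 < contrast_threshold <= 1.0:
        raise ValueError("contrast threshold must lie in (0, 1]")
    return 10.0 * math.log10(1.0 / contrast_threshold)


def db_to_contrast(sensitivity_db: float) -> float:
    """Inverse of contrast_to_db under the same base-10 assumption."""
    return 10.0 ** (-sensitivity_db / 10.0)


# A threshold of 10% contrast corresponds to 10 dB on this scale,
# and 100% contrast corresponds to 0 dB.
print(contrast_to_db(0.10), contrast_to_db(1.0))
```

Under these assumptions a 1 dB interval on this scale is a fixed ratio of contrast, which is why the comparison in this study is framed in scale intervals rather than in physical units shared with the HFA.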

Psychophysical test procedures
Thresholds were quantified using two techniques, as shown in Figure 1, for each visual field test type. Firstly, an adaptive staircase, or bracketing strategy, was performed to produce a threshold estimate typical of clinical visual field testing scenarios whereby testing is performed rapidly using relatively few stimulus presentations. This strategy was performed twice in succession, with a rest period between the first and second sets of estimations. The dynamics of the staircase strategy used in this study were selected to reproduce the "full threshold" strategy of the HFA ("4-2-2") whereby 4 dB step sizes were used before the first reversal of response, followed by 2 dB steps until the second reversal was reached, which represented the end point of testing⁹ (see Fig 1A). Stimulus presentations were randomised and interleaved among test locations. For the purposes of this study, threshold was defined as the mean of the two reversals and it is important to note that this differs from the "last seen" threshold definition employed by the HFA. This alternative definition was adopted because a fair comparison between the staircase and MOCS requires a similar threshold end point from each method. A staircase where the reversals are averaged results in an estimate of the 50% correct point.¹⁰ In addition, use of the "last seen" end point has been shown to induce a systematic measurement bias, resulting in underestimation of sensitivity⁴ and increased test-retest variability.¹¹
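The staircase logic described above can be sketched as follows. This is an illustrative reconstruction, not the custom software used in the study. It assumes that a higher dB value means a weaker stimulus, that a reversal level is the stimulus level at which the response first differs from the preceding response, and that the scale runs from 0 to 40 dB; the function name and the `sees` callback are hypothetical.

```python
from typing import Callable, List, Tuple


def staircase_422(
    sees: Callable[[float], bool],
    start_db: float,
    first_step: float = 4.0,
    later_step: float = 2.0,
    min_db: float = 0.0,
    max_db: float = 40.0,
) -> Tuple[float, List[Tuple[float, bool]]]:
    """Run one staircase and return (threshold estimate, presentation history).

    Steps of `first_step` dB are used until the first reversal of response,
    then `later_step` dB until the second reversal. The estimate is the mean
    of the two reversal levels (not the "last seen" level).
    """
    level = float(start_db)
    step = first_step
    previous = None
    reversals: List[float] = []
    history: List[Tuple[float, bool]] = []
    while True:
        seen = sees(level)
        history.append((level, seen))
        if previous is not None and seen != previous:
            reversals.append(level)
            step = later_step
            if len(reversals) == 2:
                return sum(reversals) / 2.0, history
        previous = seen
        # Seen: present a weaker stimulus (higher dB); not seen: a stronger one.
        next_level = level + step if seen else level - step
        next_level = min(max(next_level, min_db), max_db)
        if next_level == level:
            # Pinned at a scale limit: stop and report the limit itself.
            return level, history
        level = next_level


# Idealised observer who sees every stimulus at or below 27 dB.
estimate, presentations = staircase_422(lambda db: db <= 27.0, start_db=25.0)
print(estimate, len(presentations))  # 28.0 after 3 presentations
```

With an idealised step-function observer the estimate lands within one 2 dB step of the true boundary. Replacing `sees` with a probabilistic observer (for example one driven by a frequency of seeing curve) is what introduces the estimation error examined in this study.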

Figure 1 (A) The staircase strategy. Solid circles represent "not seen" stimuli and open circles "seen" stimuli. This threshold estimation strategy uses 4 dB increments until the first reversal of response (X) and then 2 dB steps until the second reversal (Y). This staircase is similar to the full threshold strategy used by commercial instrumentation except that it uses the mean value of the two reversals to define the threshold estimate (broken line), rather than the "last seen" end point used with full threshold. (B) A frequency of seeing curve derived from fitting data obtained with the method of constant stimuli with a cumulative Gaussian function. The 50% detection level (horizontal arrow) is used to extrapolate the "real" or gold standard threshold measurement (vertical arrow).

On completion of testing with the staircase strategy, the MOCS was undertaken. This established psychophysical approach represents a relatively lengthy and intensive strategy that is designed to produce a rigorous, high "quality" threshold measurement rather than a rapid estimation and is therefore unsuitable for clinical use. For MOCS, seven stimulus levels (luminance increments for SAP and contrast increments for FDP) were examined with 20 presentations at each stimulus level. Step sizes between stimuli were adjusted in order to approach both 0% and 100% seen, and ranged from 1–3 dB. Stimulus presentations were randomised and interleaved among test locations. Frequency of seeing (FOS) curves were constructed from MOCS data by fitting with a cumulative Gaussian function (Tablecurve 2D, SPSS Inc, San Rafael, CA, USA). These FOS curves were used to quantify the reference, or "gold standard" threshold using the 50% detection level in dB (see Fig 1B). FOS curves also provided information on within test variability (interquartile range, dB).
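To make the frequency of seeing analysis concrete, the sketch below fits a cumulative Gaussian to MOCS counts and reads off the 50% detection level and the interquartile range. It is not the Tablecurve 2D procedure used in the study: it substitutes a simple grid-search maximum-likelihood fit using only the Python standard library, assumes that a higher dB value means a weaker stimulus, and uses invented counts for seven levels with 20 presentations each. All names are hypothetical.

```python
import math


def p_seen(level_db, mu, sigma):
    """Probability of seeing a stimulus at level_db.

    Falls from 1 to 0 as dB rises (weaker stimulus); mu is the 50% point.
    """
    return 0.5 * math.erfc((level_db - mu) / (sigma * math.sqrt(2.0)))


def fit_fos(levels_db, n_seen, n_trials):
    """Grid-search maximum-likelihood fit; returns (mu, sigma) in dB."""
    lo, hi = min(levels_db), max(levels_db)
    best_ll, best_mu, best_sigma = -math.inf, lo, 1.0
    for i in range(201):                 # candidate thresholds across the tested range
        mu = lo + (hi - lo) * i / 200
        for j in range(1, 101):          # candidate spreads from 0.1 to 10 dB
            sigma = 0.1 * j
            ll = 0.0
            for x, k in zip(levels_db, n_seen):
                # Clip so that 0% or 100% seen does not give log(0).
                p = min(max(p_seen(x, mu, sigma), 1e-6), 1.0 - 1e-6)
                ll += k * math.log(p) + (n_trials - k) * math.log(1.0 - p)
            if ll > best_ll:
                best_ll, best_mu, best_sigma = ll, mu, sigma
    return best_mu, best_sigma


def interquartile_range(sigma):
    """Distance between the 25% and 75% seen levels of a cumulative Gaussian."""
    return 2.0 * 0.6745 * sigma


# Invented example data: seven levels, 20 presentations per level.
levels = [18, 20, 22, 24, 26, 28, 30]
seen = [20, 20, 17, 10, 3, 0, 0]
mu, sigma = fit_fos(levels, seen, n_trials=20)
print(f"50% threshold = {mu:.2f} dB, IQR = {interquartile_range(sigma):.2f} dB")
```

A shallower curve (larger `sigma`) gives a wider interquartile range, which is the within test variability measure referred to above.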

Data analysis
Precision of the staircase strategy was assessed for both SAP and FDP by comparison of the first set of threshold estimations with those obtained from the second, repeated set. Accuracy, or threshold estimation error, was assessed for each visual field test type by comparison of the first set of staircase strategy threshold estimations with the "true," or gold standard threshold measurement obtained from MOCS.

Both precision and accuracy were quantified using two distinct approaches. Firstly, the technique described by Bland and Altman was used whereby attention was focused on differences in threshold at the level of individual paired measures, by plotting mean threshold against measurement difference.¹² Using this approach, paired threshold measurements were examined for evidence of systematic bias (mean difference and corresponding 95% confidence interval) and also to identify whether this varies with position along the measurement scale. The second technique involved calculation of the intraclass correlation coefficient (ICC). This coefficient is equivalent to a quadratic weighted kappa statistic, an agreement measure that weights discrepancies between paired measurements by the square of their difference.¹³ There is no universally applicable standard ICC value that represents adequate reliability, but to aid presentation the following convention is followed here: ICC <0.20 "slight agreement"; 0.21–0.40 "fair agreement"; 0.41–0.60 "moderate agreement"; 0.61–0.80 "substantial agreement"; and above 0.80 "almost perfect agreement."¹⁴ The ICC was used in preference to the usual (Pearson) correlation coefficient because the latter measures association rather than agreement. Unlike the Pearson correlation, the ICC only indicates perfect agreement if the two assessments are numerically equal, that is, if a plot of the two measurements has zero intercept and a slope of unity.
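The two agreement measures can be sketched as below. The text does not state which form of the ICC was computed or how the confidence intervals were derived, so this example makes two labelled assumptions: a one-way random effects ICC for two measurements per location, and a normal approximation (z = 1.96) for the 95% confidence interval of the mean difference. The function names and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def bland_altman(first, second, z=1.96):
    """Mean difference, its approximate 95% CI, and (mean, difference) pairs.

    Assumption: normal approximation for the CI of the mean difference.
    """
    diffs = [a - b for a, b in zip(first, second)]
    pair_means = [(a + b) / 2.0 for a, b in zip(first, second)]
    d = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    return {
        "mean_diff": d,
        "ci95": (d - z * se, d + z * se),
        "points": list(zip(pair_means, diffs)),  # x and y of the plot
    }


def icc_oneway(first, second):
    """One-way random effects ICC for two measurements per subject (assumed form)."""
    n = len(first)
    subject_means = [(a + b) / 2.0 for a, b in zip(first, second)]
    grand = mean(list(first) + list(second))
    # Between-subject and within-subject mean squares for k = 2 measurements.
    msb = 2.0 * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(first, second, subject_means)) / n
    return (msb - msw) / (msb + msw)


# Invented thresholds in dB: the second set is uniformly 2 dB higher.
staircase = [10.0, 20.0, 30.0]
reference = [12.0, 22.0, 32.0]
print(bland_altman(staircase, reference)["mean_diff"])  # -2.0
print(round(icc_oneway(staircase, reference), 4))       # 0.9802
```

In this invented example the Pearson correlation would be exactly 1 because the two sets differ by a constant offset, whereas the ICC falls below 1, which illustrates the distinction between association and agreement drawn above.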

Quantification of fatigue effect
It was anticipated that owing to the number of presentations used with MOCS testing (420 presentations per eye) change in threshold with test length (fatigue effect) may interfere with quantification of accuracy. This was considered to be of particular importance to threshold estimates and measures made with SAP, as a number of previous reports have suggested that lower threshold estimates found with lengthier strategies may be due to fatigue.¹⁵⁻¹⁷ An attempt was made to quantify change in threshold between the first and second halves of MOCS stimulus presentations by equal division of MOCS data for each patient test location. Frequency of seeing curves were constructed for each half and the difference in the resulting thresholds, which may be considered as change in threshold due to fatigue, was calculated.

	RESULTS

The average SAP mean deviation (MD) among the subjects was -5.77 dB. The average response (within test) variability (interquartile range of frequency of seeing curve) was 1.5 dB (± 0.42 dB) for FDP and 6.2 dB (± 5.03 dB) for SAP.

Precision
For FDP, "substantial" agreement (ICC = 0.79) was found between first and second threshold estimations made using the staircase strategy, with a group mean difference of 0.1 dB, which is less than one measurement scale interval. The 95% confidence interval (CI) for this mean difference included zero and extended from -0.80 to 1.01 dB, demonstrating that no systematic bias existed between first and second set threshold estimations for the sample examined in this study. Precision data for FDP are shown graphically in Figure 2A. This plot shows that spread of differences was not dependent upon the level of mean threshold sensitivity estimation and indicated that the variability of repeated threshold estimates is similar across the range of thresholds studied.

Figure 2 Precision data for FDP (A) and SAP (B). For both graphs, the solid horizontal lines represent the group mean difference of the first and second threshold estimations. Broken lines represent the upper and lower 95% confidence limits for the group mean difference.

In a similar manner to FDP, the group mean difference between repeated threshold estimations for SAP was small and less than one measurement scale interval at 0.62 dB, with no evidence of systematic bias between sequential threshold estimations (95% CI -1.06 to 2.30). However, unlike FDP, although precision from repeated SAP threshold estimations was found to be relatively high at "normal" levels of threshold sensitivity, this appeared to decrease at lower levels of threshold sensitivity (see Fig 2B), although the possibility of outlying data producing this effect should be entertained. This impacted upon the degree of agreement between the successive sets of threshold estimations, which was quantified as "moderate" (ICC = 0.50) and thus lower than that obtained for FDP.

Accuracy
Accuracy data for FDP and SAP are shown in Figure 3. For FDP, threshold sensitivity estimations made using the staircase strategy were significantly higher than the "true," or gold standard, threshold measurement by 1.35 dB (95% CI 0.56 to 1.73). Figure 3A shows that this degree of threshold estimation error appeared constant across the range of thresholds studied. This small systematic bias was also confirmed by the high ("almost perfect") level of agreement between the staircase threshold estimations and gold standard threshold measurement (ICC = 0.85). For SAP, estimations of threshold sensitivity made by the staircase strategy were on average 4.48 dB higher than the "true" threshold (95% CI 2.35 to 7.32). Furthermore, this average threshold estimation error was not constant: greater errors, specifically overestimates of threshold sensitivity, were observed at lower sensitivity levels (see Fig 3B). The greatest threshold error in this dataset was 19 dB at a mean sensitivity of 13 dB. Agreement between staircase strategy threshold estimation and true threshold for SAP was "moderate" (ICC = 0.48).

Figure 3 Accuracy data for FDP (A) and SAP (B). For both graphs, the solid horizontal lines represent the group mean difference of the gold standard threshold measurement (50% detection level of the frequency of seeing curve) and the first threshold estimation. Broken lines represent the upper and lower 95% confidence limits for the group mean difference.

Fatigue effect
The differences between threshold measurements obtained from the first and second halves of the SAP data collected during the longer MOCS strategy are shown in Figure 4. The mean (95% confidence interval) difference between these threshold measurements was 1.25 dB (0.65 to 1.86), denoting a significant drop in threshold between the first and second halves of testing with this strategy.

Figure 4 Mean versus difference plot for SAP threshold measurements made during first and second halves of data acquisition during MOCS testing, designed to extract information about any fatigue occurring during the course of testing with this threshold measurement strategy. The solid line represents the mean group difference and the broken lines denote upper and lower 95% confidence limits for the group mean difference.

	DISCUSSION

This investigation has demonstrated that varying degrees of measurement error are associated with threshold estimations made using a staircase algorithm similar to that used in clinical test situations. Levels of precision and accuracy for the full threshold staircase strategy appeared dependent upon both test type and threshold sensitivity. Overall, the precision (repeatability) of threshold estimates was relatively high for both SAP and FDP in normal or near normal areas of the visual field, although FDP demonstrated greater agreement between repeated estimations. FDP also maintained a similar degree of precision at all thresholds, although SAP precision was reduced at test locations with depressed threshold sensitivity. Data on precision for SAP in this experiment are in agreement with previous reports that have described higher degrees of scatter, or variability, among repeated threshold estimations at damaged locations in the visual field,¹⁸,¹⁹ and also that variability increased as sensitivity reduced.²⁰ Also, our data are consistent with other investigations that have shown FDP variability to be lower in terms of number of scale intervals than that of SAP in damaged areas of the visual field and thereby support suggestions that FDP may have potential for monitoring for progressive glaucomatous visual field loss.²¹,²² Also of importance is that, unlike SAP, FDP precision is consistent in areas of field loss and is therefore predictable.

Quantification of accuracy in this study demonstrated that the staircase strategy significantly overestimated the "true" threshold sensitivity for both visual field test types. The magnitude of this error was relatively small and systematic for FDP at between 1 and 2 dB throughout the range of sensitivities examined, although for SAP the average error was between 4 and 5 dB and was not constant: sensitivity overestimations could be considerable in areas of the visual field with moderate or advanced sensitivity loss.

Previous investigations of visual field test accuracy were performed using a variety of computer simulation techniques based on frequency of seeing curves obtained from normal individuals and patients with optic nerve diseases.³⁻⁶,²³⁻²⁵ Simulation offers advantages over empirical data collection as it permits iterative investigation of alternative thresholding estimation strategies, in addition to allowing assessment of different parameters and end points for the same strategy. Also, it provides a useful complement to empirical data collection as it can be performed in a controlled manner to assess the effects of variables that influence strategy performance, such as different levels of response errors and variability. Also, simulation exercises are not constrained by patient time, degree of experience and fatigue. However, it is important to recognise that the results of simulations should be validated clinically by testing in appropriate patient populations. To date, simulation has been used primarily to optimise threshold estimation strategies and for staircase strategies this process illustrated the inverse relation between accuracy and efficiency.³,⁵ Additionally, simulation suggested that SAP accuracy may be reduced in areas of glaucomatous visual field loss⁴ and this previous report suggested that for glaucomatous visual fields typical of those investigated in this experiment (average MD -6 dB), the mean error of threshold estimated with the full threshold strategy should be around 2.2 dB. Our empirical data suggest that this is a conservative estimate.

When interpreting visual field test results in clinical environments, it is convenient to assume that the measurements reliably represent visual function. It is therefore essential for visual field test "readers" to be aware that a considerable disparity may exist between "real" visual field sensitivity and that estimated in the course of a visual field test. In this experiment, the finding that the threshold estimation error results in sensitivity overestimation also means that those interpreting visual field test results may mistakenly underestimate the true degree of visual field loss. Awareness of test accuracy is therefore clinically important, and should also be recognised in the design and interpretation of research projects and clinical trials that employ visual field test instrumentation. Our data demonstrated that both SAP and FDP tests have a degree of inaccuracy, or threshold estimation error, and it is possible that this may be caused, at least in part, by fatigue of the patient or their visual system during the more lengthy MOCS strategy used to obtain the gold standard threshold measurement. Reduction in threshold due to fatigue was quantified at 1.25 dB for SAP, and so may account for some of the threshold estimation error. However, the finding that accuracy differed according to test type, with FDP exhibiting higher overall accuracy than SAP, cannot be explained by fatigue as the number of presentations was equal for each test and also because test order was randomised. Furthermore, the two different visual field test types studied exhibited different accuracy characteristics, with FDP threshold estimation error being systematic, while SAP error increased at lower threshold sensitivities. The reason for this finding is unclear; however, it is likely that it may be attributable to differences in response variability known to exist between FDP and SAP. As described in previous reports, testing with frequency doubling stimuli yielded steeper frequency of seeing curves than those obtained for SAP stimuli.²² Furthermore, while FDP frequency of seeing curve slopes have been shown to remain relatively consistent across the dynamic measurement range of clinical instrumentation,²²,²⁶ frequency of seeing curves for SAP become shallower in areas of visual loss.²⁷,²⁸ This difference in variability characteristics between the two test types means that the zone of uncertainty between "always seen" and "always missed" stimuli remains similar across the measurement range for FDP, in terms of number of measurement scale intervals, but for SAP will increase with greater defect depths, thereby exerting increasing impact upon accuracy as sensitivity becomes depressed. It is therefore important to note that the systematic threshold estimation error of FDP lends itself to simple mathematical correction, while the more complex nature of the inaccuracy in SAP threshold estimations is not straightforward.

Although the comparison of measurement error between the two types of visual field tests performed in this investigation demonstrates lower levels of measurement error with FDP, interpretation of the data presented should be made with full knowledge of the differences between the test types. Firstly, it should be emphasised that SAP uses a 40 increment differential light sensitivity scale while FDP uses a 20 increment contrast sensitivity scale and therefore it is obvious that such measurement scales are fundamentally different and cannot be directly compared. In spite of this difference, the comparisons made in the course of the study may be considered highly appropriate in the context of scale intervals as used in current clinical instrumentation. Secondly, the reader should also be reminded that FDP stimuli are considerably larger (10°) than SAP stimuli (Goldmann size III, 0.42°), which may influence measurement error. Indeed, previous reports have shown that the variability of SAP is reduced when larger stimuli are used.²⁹ However, a recent investigation demonstrated that reduction in FDP stimulus size to achieve 24-2 test pattern resolution did not significantly affect variability.²⁶ Finally, it is important to be aware that while the FDP stimulus parameters used in this experiment were designed to emulate those found in the commercially available FDT perimeter, the equipment was not identical to the clinical device. It should also be noted that the modified binary search thresholding strategy employed in the commercially available FDT perimeter was not evaluated.

In this study, care has been taken to investigate both the precision and accuracy attributes of visual field tests. Although both of these parameters are important, it is necessary to ask which is preferable. It may be argued that tests with higher precision may be of greater clinical value, especially in the context of monitoring glaucoma, as repeatability is desirable for detection of progressive loss.³⁰ Of course it is important to consider the sensitivity of any given test to progressive glaucomatous visual field loss. While SAP is well established as a clinical tool for monitoring progressive loss, the results of longitudinal studies evaluating the ability of FDP to perform the role are awaited with interest. Higher accuracy demonstrates that clinical visual field test results provide valid representations of patients' visual function and so is also desirable. However, provided accuracy remains constant (a "systematic" threshold estimation error) across the threshold measurement range, it is unlikely to impact upon interpretation of results unless it becomes necessary to alternate between different threshold estimation strategies with different levels of accuracy. If accuracy is not constant across the measurement range, this may also negatively affect sensitivity to true threshold changes, because a change in true threshold is not linearly related to a change in estimated threshold. Overall, it is reasonable to suggest that accuracy and precision are equally valuable.

In summary, accuracy and precision are important prerequisites of clinical test strategies. It has been demonstrated that measurements made using staircase strategies typical of clinical environments lack high degrees of accuracy, and have differing levels of precision. Both accuracy and precision appeared dependent on test type. Limited accuracy of SAP staircase threshold estimations at test locations with low sensitivity suggests that clinical data of this type should be interpreted with caution. Investigations of the accuracy and precision of other clinically used threshold estimation strategies are currently under way in our laboratories.

	ACKNOWLEDGEMENTS

This project was supported by NEI Grant No EY-03424 (CAJ) and was presented in part at the biannual North American Perimetric Society Meeting in Skaneateles, NY, USA, in September 2001. AMM is supported by a NHMRC Australian Clinical Research Fellowship (No 139150).

PGDS, AMM, and AT have no commercial interest in equipment used in this investigation. CAJ receives research support from and is a consultant for Welch-Allyn, Skaneateles, NY, and Humphrey Systems, Dublin, CA, USA.

	REFERENCES

1. Engen T. Psychophysics. 1 Discrimination and detection. In: Kling JK, Riggs LA, eds. Experimental psychology. New York: Holt, Rinehart and Winston, Inc, 1971.
2. Gordis L. Epidemiology. 1st ed. Philadelphia: WB Saunders, 1996.
3. Johnson CA, Chauhan BC, Shapiro LR. Properties of staircase procedures for estimating thresholds in automated perimetry. Invest Ophthalmol Vis Sci 1992;33:2966–74.
4. Chauhan BC, Johnson CA. Evaluating and optimizing test strategies in automated perimetry. J Glaucoma 1994;3:S73–81.
5. Glass E, Schaumberger M, Lachenmayr BJ. Simulations for FASTPAC and the standard 4–2 dB full-threshold strategy of the Humphrey Field Analyzer. Invest Ophthalmol Vis Sci 1995;36:1847–54.
6. Turpin A, McKendrick AM, Johnson CA, et al. Development of efficient threshold strategies for frequency doubling technology perimetry using computer simulation. Invest Ophthalmol Vis Sci 2002;43:322–31.
7. Spahr J. Optimization of the presentation pattern in automated static perimetry. Vis Res 1975;15:1275–81.
8. Bebie H, Fankhauser F, Spahr J. Static perimetry: strategies. Acta Ophthalmol (Copenh) 1976;54:325–38.
9. Heijl A. The Humphrey field analyzer, construction and concepts. Doc Ophthalmol Proc Ser 1985;42:77–84.
10. Wetherill GB, Levitt H. Sequential estimation of points on a psychometric function. Br J Math Stat Psychol 1965;18:1–10.
11. Johnson CA, Lewis RA. Staircase scoring procedures for automated perimetry. Doc Ophthalmol Proc Ser 1987;49:575–80.
12. Bland J, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307–10.
13. Streiner D, Norman G. Health measurement scales. Oxford: Oxford Medical Publications, 1995.
14. Landis J, Koch G. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
15. Bengtsson B, Heijl A, Olsson J. Evaluation of a new threshold visual field strategy, SITA, in normal subjects. Acta Ophthalmol Scand 1998;76:165–9.
16. Wild JM, Pacey IE, Hancock SA, et al. Between-algorithm, between-individual differences in normal perimetric sensitivity: full threshold, FASTPAC, and SITA. Swedish Interactive Threshold algorithm. Invest Ophthalmol Vis Sci 1999;40:1152–61.
17. Heijl A, Bengtsson B, Patella VM. Glaucoma follow-up when converting from long to short perimetric threshold tests. Arch Ophthalmol 2000;118:489–93.
18. Bebie H, Fankhauser F, Spahr J. Static perimetry: accuracy and fluctuations. Acta Ophthalmol (Copenh) 1976;54:339–48.
19. Flammer J, Drance SM, Zulauf M. Differential light threshold. Short- and long-term fluctuation in patients with glaucoma, normal controls, and patients with suspected glaucoma. Arch Ophthalmol 1984;102:704–6.
20. Heijl A, Lindgren A, Lindgren G. Test-retest variability in glaucomatous visual fields. Am J Ophthalmol 1989;108:130–5.
21. Chauhan BC, Johnson CA. Test-retest variability of frequency-doubling perimetry and conventional perimetry in glaucoma patients and normal subjects. Invest Ophthalmol Vis Sci 1999;40:648–56.
22. Spry PGD, Johnson CA, McKendrick AM, et al. Variability components of standard automated perimetry and frequency doubling technology perimetry. Invest Ophthalmol Vis Sci 2001;42:1404–10.
23. Shapiro LR, Johnson CA, Kennedy RL. KRAKEN. A computer simulation procedure for static, kinetic, suprathreshold and heuristic perimetry. In: Heijl A, ed. Perimetry Update 1988/9. Amsterdam: Kugler and Ghedini, 1989:431–8.
24. Wall M, Johnson CA, Kutzko KE, et al. Long- and short-term variability of automated perimetry results in patients with optic neuritis and healthy subjects. Arch Ophthalmol 1998;116:53–61.
25. Turpin A, Johnson CA, Spry PGD. Development of a maximum likelihood procedure for short-wavelength automated perimetry (SWAP). In: Wall M, Mills RP, eds. Perimetry update 2000/1. Amsterdam: Kugler, 2001:139–48.
26. Spry PGD, Johnson CA. Within-test variability of frequency doubling perimetry using a 24–2 test pattern. J Glaucoma 2002;(in press).
27. Chauhan BC, Tompkins JD, LeBlanc RP, et al. Characteristics of frequency-of-seeing curves in normal subjects, patients with suspected glaucoma, and patients with glaucoma. Invest Ophthalmol Vis Sci 1993;34:3534–40.
28. Henson DB, Chaudry S, Artes PH, et al. Response variability in the visual field: comparison of optic neuritis, glaucoma, ocular hypertension, and normal eyes. Invest Ophthalmol Vis Sci 2000;41:417–21.
29. Wall M, Kutzko KE, Chauhan BC. Variability in patients with glaucomatous visual field damage is reduced using size V stimuli. Invest Ophthalmol Vis Sci 1997;38:426–35.
30. Spry PGD, Johnson CA. Identification of progressive glaucomatous visual field loss. Surv Ophthalmol 2002;47:158–73.

Authors: P G D Spry, C A Johnson, A M McKendrick and A Tur 2007-5-10