The Case Against Judgmental Adjustments

Some of the earliest studies on judgmental adjustments come from the psychology literature, where authors described various methods of predicting human behavior. The main controversy in that literature was between the effectiveness of statistical methods based on objective observations and clinical methods based on human judgment. Two studies in this area of inquiry addressed the issue of combining statistical and judgmental prediction methods.

In 1946, Kelly and Fiske (1950) were commissioned by the Veterans Administration (VA) to develop a selection tool to predict the differential success of applicants for graduate study in clinical psychology, both in training and in clinical competence, and in subsequent employment with the VA. These applicants were already a highly select group of students training in many different institutions with widely varying standards of admission and evaluation. To this end, Kelly and Fiske used many objective tests, such as the Miller Analogies Test and the Kuder Preference Record, and subjective assessment tools such as face-to-face interviews and role-playing. Based on preliminary results, they suggested that a combination of a few of the objective measures, or even a single objective measure, was a better indicator of success than any of the subjective assessment tools. In fact, they found that combining subjective assessment results with the objective measures did not significantly increase, and in some cases decreased, the validity of the predictions. It is important to note that these findings were based on data obtained while the study was still in progress. Hence, the authors cautioned against generalizing these results, as the pattern of findings was subject to change with additional criteria and further analysis. However, a follow-up study was never published.

Although this study originated in the psychology literature, its findings are relevant to the forecasting literature because the task studied is similar to judgmental adjustment. In the study, the future success of an applicant was predicted using objective, subjective, and combined methods, much as future sales can be forecast by various methods. In fact, the use of objective methods as an input to subjective methods is analogous to judgmental adjustment, since the final forecast is a subjective revision of an objective prediction. Therefore, the results of this study can be viewed as analogous to forecasting practice in business and hence be used to discourage the use of judgmental adjustments.

In a later study, Harris (1963) compared the effectiveness of mathematical (objective) and judgmental (subjective) prediction methods. He compared the performance of a mathematical formula in predicting the score differential of college football games to the judgmental predictions of (1) non-experts and (2) experts. The formula, developed by Dr. Edward E. Litkenhous, used the past performance of the college football teams to predict their future performance. Consequently, there were potentially important environmental factors that the formula did not take into account, such as each team's weariness from travel and its determination to win a particular game. In the first part of the study, Dr. Litkenhous and a sportswriter, as non-expert subjects, made judgmental adjustments to the predictions of the mathematical formula using publicly available information. In the second part of the study, a group of coaches served as expert subjects who made judgmental adjustments using semi-confidential information about their own and the opposing teams. In both parts of the study, the human subjects were supplied with the results of the formula and then added knowledge and expertise that the formula did not consider. In both cases, the formula met or exceeded the accuracy of the human subjects. Interestingly, the first part of the study used a sample of only two non-experts. The second part elicited opinions from team coaches, who were experts only on their own teams and whose judgments could have been tainted by their positions within those teams; its sample size was eight. Even though Harris reported statistically significant results, these small sample sizes remain a major weakness of the study.

There are also researchers in other fields who caution against the use of judgmental adjustments. Carbone et al. (1983) investigated the effects of technical expertise, individualized analysis, and judgmental adjustments in the context of a number of quantitative forecasting methods. One of their research questions focused on the effect of judgmental adjustments on forecast accuracy. They found that judgmental adjustments had no effect or a negative effect on forecast accuracy. However, the results of this study were based on randomly selected time series from the M-competition2, and the subjects were MBA students who had neither expertise in the industries from which the data came nor practical experience in forecasting.

In a later study, Carbone and Gorr (1985) expanded on Carbone et al. (1983) by examining the effects of judgmental adjustments on statistical forecasts. In this experimental study, the subjects were graduate students who were asked first to provide a judgmental forecast and then a statistical forecast for published time series data from the M-competition. Finally, the subjects were asked to adjust their judgmental forecasts in view of the statistical forecast. The effects of the judgmental adjustments on forecast accuracy were mixed: judgmental adjustments improved accuracy for some statistical methods but not for others. Because the findings were not conclusive, the authors cautioned against excessive reliance on judgment. It should be noted that the subjects in this study had neither industry expertise nor practical experience in forecasting.

Furthermore, the time series analyzed and the statistical forecasting methods used were arbitrarily chosen. In other words, the compatibility between each time series and the statistical method applied to it was not considered, so the statistical forecasts were of questionable quality.

Willemain (1989) investigated judgmental adjustments to statistical forecasts when data were presented graphically. He generated artificial time series using known ARIMA (autoregressive integrated moving average) specifications and parameter values. This way, the optimal forecast was known for each of the series, and the excess error of any statistical forecast could be calculated. The statistical forecasts were generated using a statistical software package. The subjects were presented with graphical plots of the statistical forecasts and were asked to judgmentally adjust the forecasts to improve their accuracy. The results suggested that the effectiveness of the judgmental adjustments depended on the size of the "excess error": judgmental adjustments improved forecast accuracy when the statistical forecast had a relatively large forecast error. If the statistical forecasts were "good", judgmental adjustments had little effect; if the statistical forecasts were "poor", judgmental adjustments increased accuracy.

2 The M-competition is a comparative study of the accuracy of various forecasting methods. Competing forecasting methods are applied to a publicly available data set, and the results are subsequently published and evaluated. The data set contains a large number of time series collected from various industries. The competition has been held three times as new forecasting methods and technologies have developed, and the data set has been expanded over the years. For specific results of the M-competition, the interested reader is referred to Makridakis and Hibon (1979), Makridakis et al. (1982), Makridakis et al. (1993) and Makridakis and Hibon (2001).

There are a number of shortcomings to this study. First, the "excess error" cannot be accurately calculated in an actual business setting, since theoretically optimal forecasts are virtually impossible to know. Second, the data series were artificially generated, and the subjects were students or faculty members with no subject-matter expertise or forecasting experience.
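To make the notion of "excess error" concrete, the following minimal Python sketch simulates an AR(1) series (a special case of ARIMA) with known parameters, so that the optimal one-step-ahead forecast is known exactly, and a deliberately crude naive forecast stands in for the software-generated statistical forecast. The choice of AR(1), of mean absolute error as the accuracy measure, and of the naive stand-in are illustrative assumptions, not details taken from Willemain (1989).

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate an AR(1) process y_t = phi * y_{t-1} + e_t with known parameters,
# so the optimal one-step-ahead forecast (the conditional mean phi * y_{t-1})
# is known exactly, echoing the artificial-series design described above.
phi, sigma, n = 0.7, 1.0, 200
e = rng.normal(0.0, sigma, size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]

actual = y[1:]
optimal = phi * y[:-1]      # best possible one-step forecast
statistical = y[:-1]        # crude stand-in "statistical" forecast (naive)

mae_optimal = np.mean(np.abs(actual - optimal))
mae_statistical = np.mean(np.abs(actual - statistical))

# Excess error: how much worse the statistical forecast is than the optimum.
# Willemain found judgmental adjustment helped mainly when this was large.
print(f"MAE of optimal forecast:     {mae_optimal:.3f}")
print(f"MAE of statistical forecast: {mae_statistical:.3f}")
print(f"Excess error:                {mae_statistical - mae_optimal:.3f}")
```

In this setup a positive excess error quantifies exactly the "poor" statistical forecasts for which Willemain observed judgmental adjustment to be beneficial; when the statistical forecast approaches the optimum, the excess error approaches zero and there is little room for judgment to help.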

In a later study, Willemain (1991) used time series from the M-competition, which represented actual business data. Other than the nature of the time series data, this experimental study closely followed Willemain (1989). But since the theoretically optimal forecasts for the time series used in this study could not be determined, Willemain calculated the excess error as the difference between the errors generated by the naïve method and the forecasting method in use. The results of this experimental study were in line with Willemain (1989), suggesting that judgmental adjustments led to greater improvements in accuracy when excess error was high. The final conclusion of these two studies was that judgmental adjustments could improve forecast accuracy only in rare cases and should not be used routinely.
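Because the optimal forecast cannot be known for real data, the naïve method serves as the benchmark. A minimal sketch of such a proxy follows, again assuming mean absolute error (the accuracy measure is not specified above) and a sign convention in which positive values flag a poor statistical forecast, chosen to mirror the 1989 definition; both are our assumptions rather than details from Willemain (1991).

```python
import numpy as np

def excess_error_proxy(actual, method_forecast):
    """Proxy for excess error when the optimal forecast is unknown:
    compare the error of the forecasting method in use with the error
    of the naive 'no-change' benchmark. Positive values indicate the
    method underperformed the naive benchmark (sign convention is an
    assumption, chosen to parallel the 1989 definition).
    """
    actual = np.asarray(actual, dtype=float)
    method = np.asarray(method_forecast, dtype=float)
    naive = actual[:-1]                              # y_{t-1} forecasts y_t
    mae_naive = np.mean(np.abs(actual[1:] - naive))
    mae_method = np.mean(np.abs(actual[1:] - method[1:]))
    return mae_method - mae_naive

# Hypothetical example: a method that tracks the series closely yields a
# negative excess error relative to the naive benchmark.
series = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 13.5])
forecasts = np.array([np.nan, 11.5, 11.2, 12.5, 13.8, 13.6])
print(f"Excess error proxy: {excess_error_proxy(series, forecasts):.3f}")
```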
