Factors Affecting the Accuracy of Judgmental Adjustments


The previous section described how the conflicting results from studies on judgmental adjustments could be reconciled by considering the domain knowledge of the forecaster. In other words, judgmental adjustments result in an increase in accuracy when the forecaster possesses relevant domain knowledge. Otherwise, judgmental adjustments may deteriorate the accuracy of statistical forecasts. In addition to domain knowledge, there are other factors that may affect the accuracy of judgmental adjustments, including feedback, incentives, excess error and task structure. The following studies have identified some of these factors.

In an experimental study, Angus-Leppan and Fatseas (1986) used short-term interest rate data and had undergraduate accounting students perform several forecasting tasks using various forecasting methods. During the first five tasks, the subjects did not have any knowledge about the nature and source of the data, but these details were revealed to them in tasks 6 and 7. The subjects were given 48 monthly data points and were asked to perform the following tasks:

1. “Eyeballing”: The subjects visually examined the data in a tabular form and made judgmental forecasts for the next 12 months.

2. Graphical: The subjects plotted the data and extrapolated their forecasts for the next 12 months.

3. Eyeballing and Graphical: The subjects combined the forecasts obtained in tasks 1 and 2 to arrive at a new forecast.

4. Statistical: Using a variety of statistical forecasting methods available to them through a software package, the subjects produced forecasts for the next 12 months.

5. Combination: The subjects combined eyeballing, graphical and statistical forecasts to form a combination forecast for the next 12 months.

6. Best Estimate: The details of the data series were revealed to the subjects at this step. Subjects were told that the data represented the short-term interest rates for specified dates. Furthermore, the subjects’ familiarity with short-term interest rates and with the forecasting task itself was also recorded at this step. Task 6 consisted of subjects adding their knowledge about the nature of the time series to the statistical forecasts. Hence, this step corresponds to judgmental adjustments of statistical forecasts. Furthermore, information about the nature of the time series could be considered task properties feedback and would be expected to improve the performance of forecasters. As a result, the authors hypothesized that the “Best Estimate” would outperform any particular judgmental or statistical forecast generated in the preceding tasks.

7. Long-term Forecasts: The subjects were given actual interest rate figures for their forecast horizon and were able to evaluate the performance of the various methods that they had used. Then, they were asked to select the best

forecasting method to generate forecasts for a forecast horizon of nine months into the future.

The authors formulated two hypotheses related to task 6 and compared the performance of Best Estimate, which is the combination of contextual information and statistical forecasts, with the performance of (1) purely statistical forecasts and (2) purely judgmental forecasts. In the first case, purely statistical forecasts outperformed Best Estimate consistently over the 12-month forecasting horizon. However, the results were mixed for judgmental forecasts. For the first six months, judgmental forecasts, specifically the graphical method, outperformed Best Estimate, while Best Estimate outperformed all judgmental methods for the second six months. It is important to note that when subjects used the statistical forecasting package, they chose different statistical methods and combined the results with their contextual knowledge. Therefore, it is difficult to make a sweeping generalization from these findings.

Matthews and Diamantopoulos (1990) investigated the question whether or not managers who judgmentally adjusted forecasts could identify those forecasts that had excessive error. In their study, they compared the distribution of forecast errors of forecasts adjusted by managers with those of forecasts that were not adjusted by

managers. The tests showed that the two sets of errors came from different distributions and the judgmental adjustments led to an overall improvement. Hence, the authors concluded that the managers could detect forecasts with excessive errors and adjust them while keeping forecasts unchanged where statistical methods performed adequately.

Furthermore, Matthews and Diamantopoulos (1990) documented an optimism bias in managers’ judgmental adjustments. They argued that the managers who adjusted these forecasts were ultimately responsible for the performance of these products in terms of sales volume and were therefore more inclined to accept higher-end (higher than average) forecasts and revise lower-end (lower than average) forecasts. Yet, they did not provide conclusive evidence for this explanation.

Wolfe and Flores (1990) conducted experiments where forecasters made judgmental adjustments by following the Analytic Hierarchy Process3. Their experiment involved generating earnings forecasts for companies using statistical methods and then having these forecasts adjusted by expert (corporate loan officers) and non-expert (MBA students) subjects. As in previous studies, the judgmental adjustments led to an overall improvement in forecast accuracy. While the authors failed to find a significant difference between the improvement levels attained by expert and non-expert subjects, they cautioned against generalizing this finding. In line with Matthews and Diamantopoulos (1990), they observed that the improvement in forecasting accuracy occurred not in periods when the subjects had the most information, but in periods when the statistical forecasts were not accurate. In other words, judgmental adjustments were most effective in periods where the quality of statistical forecasts was low and the data variability was high. These results tie closely with two studies by Willemain (1989, 1991).

Willemain (1989, 1991) obtained similar results in studies where the subjects made judgmental adjustments using graphical data presentation. He defined “excess error” to distinguish between good and poor forecasts: excess error refers to the increase in error when the chosen forecasting method performs worse than the naïve forecasting method or, when one is known, the theoretically optimal forecast.
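Written out under illustrative notation (not Willemain’s own), with E(·) denoting the forecast error of a method under whichever accuracy measure is used:

$$\text{Excess error} = E(\text{chosen method}) - E(\text{benchmark})$$

where the benchmark is the naïve forecast or, when it is known, the theoretically optimal forecast.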

The results indicate that the judgmental adjustments were most useful when the excess error was the largest. In other words, judgmental adjustments could compensate for excess error when the statistical forecasting methods performed poorly.

3 The Analytic Hierarchy Process (AHP) was developed to reduce the complexity of multi-criterion decision making. AHP is a structured approach to complex decisions in which the alternatives are compared pairwise through a series of steps and the results are synthesized to reach a final decision. The structure of AHP allows decision makers not only to select the best alternative but also to see the rationale behind the decision (Saaty, 1980).
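As an aside on the mechanics of AHP, the following is a minimal sketch of its core weighting step (a pairwise comparison matrix, principal-eigenvector priorities and Saaty’s consistency ratio); the matrix values are hypothetical and this is not the specific procedure used by Wolfe and Flores (1990).

```python
import numpy as np

# Pairwise comparison matrix for three hypothetical adjustment criteria,
# using Saaty's 1-9 scale: A[i, j] states how strongly criterion i is
# preferred to criterion j (reciprocals on the lower triangle).
A = np.array([
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
])

# Priority weights are taken from the principal eigenvector of A.
eigenvalues, eigenvectors = np.linalg.eig(A)
principal = np.argmax(eigenvalues.real)
weights = np.abs(eigenvectors[:, principal].real)
weights /= weights.sum()

# Saaty's consistency check: consistency index relative to the random
# index for a 3x3 matrix (0.58); ratios below about 0.10 are acceptable.
consistency_index = (eigenvalues[principal].real - len(A)) / (len(A) - 1)
consistency_ratio = consistency_index / 0.58

print("priority weights:", weights)
print("consistency ratio:", round(consistency_ratio, 3))
```

In a forecasting context, weights derived this way can be used to combine the criteria behind an adjustment, which is what gives the approach its structure.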

Lim and O’Connor (1996) investigated the effects of contextual knowledge, which they referred to as “causal” information, the validity of causal information and learning in repetitive forecasting tasks. In this experimental study, they generated an artificial time series representing soft drink sales. The sales were a function of two “causal” variables: (1) weather temperature and (2) number of visiting tourists. The validity of causal information refers to the correlation between a causal variable and the variable to be forecast, that is, the soft drink sales. The subjects generated judgmental forecasts based on graphical plots. Furthermore, some experimental groups were given statistical forecasts, causal information or both. The experimental conditions are shown in Table 2. The subjects performed their respective tasks 30 times, in 3 sets of 10 repetitions.

Experimental Conditions       Time Series Information   Statistical Forecast   Causal Information
Judgmental Extrapolation      Yes                       No                     No
Judgmental Adjustment         Yes                       Yes                    No
Causal Adjustment             Yes                       No                     Yes
Mixed Adjustment              Yes                       Yes                    Yes

Source: Adopted from Lim and O’Connor (1996)

Table 2: Information provided to experimental groups

The results of this experiment suggest that adjusting a judgmental forecast with causal information, a statistical forecast or a combination of the two will improve forecast accuracy. The subjects tended to use both types of information but put more weight on statistical forecasts, even when causal information had high validity, that is, a high correlation with the variable to be forecast.

Moreover, both low-validity and high-validity causal information improved forecast accuracy. As expected, high-validity causal information was more effective than low-validity causal information. The subjects were able to distinguish between low-validity and high-validity information, as evidenced by their relying on the two to varying degrees.
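As an illustration of validity defined as correlation with the series to be forecast, the sketch below generates an artificial sales series from two hypothetical causal cues and measures each cue’s validity. The data-generating process is assumed for illustration only and is not the one used by Lim and O’Connor (1996).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30  # 30 forecasting periods, mirroring the 30 trials in the experiment

# Two hypothetical causal cues for an artificial soft drink sales series.
temperature = rng.normal(25.0, 5.0, n)      # daily temperature
tourists = rng.normal(1000.0, 200.0, n)     # number of visiting tourists

# Assumed data-generating process: sales depend strongly on temperature
# and only weakly on tourist numbers, plus random noise.
sales = 40.0 * temperature + 0.5 * tourists + rng.normal(0.0, 100.0, n)

# Cue validity measured as the correlation between a cue and the series
# to be forecast: temperature is the high-validity cue here.
validity_temperature = np.corrcoef(temperature, sales)[0, 1]
validity_tourists = np.corrcoef(tourists, sales)[0, 1]

print("validity of temperature cue:", round(validity_temperature, 2))
print("validity of tourists cue:", round(validity_tourists, 2))
```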

Finally, the authors analyzed the learning effects. The results show that the accuracy of adjustments decreased over time. Furthermore, the weights that the subjects placed on high and low validity cues changed little over time, indicating a slow learning process. Hence, the authors conclude that the subjects did not learn effectively by performing forecasting tasks repetitively.

This study has a number of limitations. First, the time series and causal information provided to the subjects were artificial and overly simplistic. That is, there was only one cue and two levels of validity. In practice, a forecaster is exposed to a multitude of cues with varying degrees of validity, which increases the mental load on the forecaster and may lead to different results. Second, the repetitions may not have been enough to simulate the long-term conditions in an actual business setting. Third, the judgmental forecast preceded the statistical forecasts. This research considers the opposite sequence, where statistical forecasts precede judgmental adjustments.

Some researchers have looked at the effect of different types of feedback on judgmental forecasting. Remus, O’Connor and Griggs (1996) conducted an experiment where the subjects were asked to make repetitive judgmental forecasts while they received different types of feedback. The authors used five types of feedback:

Simple Outcome Feedback: The subjects received the actual value after making a forecast.

Performance Outcome Feedback: The subjects were provided with an error measure, MAPE4 (defined below), after making a forecast. This feedback came in two versions: graphical and qualitative.

Task Feedback: Task feedback consisted of information on the structure of the time series to be forecast. For example, this feedback would indicate the upward or downward trend of the time series.

Task Feedback With Cognitive Information Feedback: This type of feedback included task feedback and cognitive information feedback which indicated behaviors necessary to improve forecast accuracy. For example, cognitive information feedback would tell the forecaster that the time series is flat but he or she is overreacting to random noise in the time series.
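For reference, MAPE, the measure used as performance outcome feedback above, is conventionally defined as

$$MAPE = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{O_i - F_i}{O_i}\right|$$

where $O_i$ is the actual value and $F_i$ the forecast for period $i$.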

The results of this experiment showed that task information feedback was the most effective type of feedback for improving the accuracy of judgmental forecasts.

Furthermore, combining task feedback with cognitive information feedback did not significantly improve performance. However, this study has a few limitations. First, the data were generated artificially. Second, the subjects were undergraduate students who had no expertise or domain knowledge. Third, the repetitions were constrained to a small number of forecasts. Hence, the long-term performance of the subjects was not clearly established.

Remus, O’Connor and Griggs (1998) also conducted an experiment on the effects of financial incentives on the accuracy of judgmental forecasting. In this experiment, subjects were given financial incentives and had to make judgmental forecasts based on artificially generated time series. The time series were either flat, upward trending or downward trending. The financial incentives were either fixed ($5 per subject) or variable ($20 for the best forecaster, $15 for the second and $10 for the third). The time series were free of any context (such as sales or prices) and the subjects were undergraduate students. The results of the experiment indicated that financial incentives did not have any significant effect on the performance of judgmental forecasting. However, the authors did not propose an explanation for this result and noted that it may not generalize to forecasting practitioners in actual business settings.

4 MAPE: Mean absolute percentage error

Goodwin and Fildes (1999) considered the effects of sporadic events on

judgmental forecasting. In an experimental study where students were used as subjects, they artificially generated sales figures with random variation and promotional periods.

The sales were relatively stable in normal periods, increased during promotional periods and decreased during post-promotion periods. Furthermore, the subjects were provided with statistical forecasts and feedback about the structure of promotional effects, that is, how the sales might be affected by promotions. The authors had two expectations. First, they expected that the subjects would use the statistical forecasts as a good basis for their judgmental forecasts. Second, they expected that the subjects would use the information about promotions as additional contextual information to improve forecast accuracy. However, the results indicated that the subjects used the statistical forecasts in a very inefficient way. First, they modified the statistical forecasts when these were fairly reliable. Second, they ignored the statistical forecasts when these would have provided a good basis for judgmental adjustments. Moreover, providing feedback about promotional effects did not improve forecast accuracy in a consistent manner. These results suggest that the subjects did not make efficient use of additional information (statistical forecasts and feedback) in improving the accuracy of judgmental forecasts. Although the authors do not provide a comprehensive explanation for these results, they mention the phenomenon of “deluded self-confidence in human judgment” (Kleinmuntz, 1990). Decision makers who feel that they have expertise in a domain tend to believe that they can improve the output of a decision aid that is already performing well. In this experiment, the subjects may have been inclined to adjust the “good” statistical forecasts provided to them, thereby deteriorating their accuracy.
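The following is a minimal sketch of the kind of artificial promotional series described above, assuming a simple uplift during promotions and a dip immediately afterwards; the period count, magnitudes and baseline forecast are illustrative and may differ from those used by Goodwin and Fildes (1999).

```python
import numpy as np

rng = np.random.default_rng(1)
n_periods = 48
sales = 200.0 + rng.normal(0.0, 10.0, n_periods)   # stable sales with random variation

# Flag a few promotional periods and the periods immediately after them.
promotion = np.zeros(n_periods, dtype=bool)
promotion[[10, 11, 25, 26, 40]] = True
post_promotion = np.zeros(n_periods, dtype=bool)
post_promotion[1:] = promotion[:-1] & ~promotion[1:]

sales[promotion] += 80.0       # sales increase during promotional periods
sales[post_promotion] -= 40.0  # sales dip in post-promotion periods

# A simple statistical forecast that ignores promotions, e.g. the previous
# period's sales (the first period just repeats its own value).
statistical_forecast = np.concatenate(([sales[0]], sales[:-1]))
```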

Again, this study comes with a number of limitations. The time series under study were artificially generated and students were used as subjects. It is not clear whether the subjects had a clear strategy for integrating the various pieces of information. In fact, since the students lacked forecasting experience and had little understanding of how the various variables affected each other, they may have failed to combine the various pieces of information effectively.

Goodwin (2000) analyzed the effects of various methods of eliciting judgmental adjustments from forecasters on the accuracy of the revised forecasts. In this experimental study, he generated time series data with trend, seasonality and sudden changes in pattern. The sudden changes in pattern were modeled after sales promotions, where sales spike during the periods when a promotion is in effect and decrease in the periods immediately afterwards. The judgmental adjustments were elicited from subjects in three different ways:

Change: The forecaster was asked for confirmation every time he or she wanted to adjust a forecast. In other words, the default was not to make any judgmental adjustments and the forecaster had to indicate explicit intent to adjust the forecasts judgmentally.

Adjustment: Instead of entering a new forecast, the forecasters were asked to enter the size of the adjustment for the new forecast.

Reason: The forecaster was asked to provide an explanation for the judgmental adjustment.

The results of the experiment indicated that these different ways of eliciting judgmental adjustments improve the accuracy of judgmental adjustments. However, the study also indicated that the statistical forecasts, if left unadjusted, would have been more accurate. It should be noted that this study was conducted with students as subjects and artificially generated data series as part of the forecasting task.

Furthermore, the finding that the statistical forecasts were more accurate than all of the judgmental adjustments limits the generalizability of the results. The very premise of judgmental adjustments is to improve forecast accuracy by incorporating information that the statistical forecasting models ignore. If judgmental adjustments decrease forecast accuracy, the question of which particular method of elicitation is best becomes irrelevant, because it is then best to use statistical forecasts only. The time series and the judgmental adjustments examined in the study are therefore an inappropriate basis for conclusions that can be generalized to an actual business setting.

In a quantitative analysis of accuracy in judgmental forecasting, Stewart (1994) expressed accuracy in terms of mean squared error (MSE) and decomposed it using Brunswik’s lens model5 (see Figure 6). The expanded lens model links the observed event to the forecaster’s judgment through true descriptors (T), cues (X) and subjective cues (U), and illustrates the many factors that can affect the accuracy of judgmental forecasts.

Source: Adapted from Stewart (1994)

Figure 6: An expanded lens model

He calculates the MSE for judgmental forecasts and a reference forecast, where the forecast for each period is the same and equal to the average of observed values for all periods. Thus, the MSE for judgmental forecasts is defined as:

$$MSE_Y = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - O_i\right)^2$$

where $n$ is the number of observations, $Y_i$ is the judgmental forecast and $O_i$ is the actual value observed in period $i$. Similarly, the MSE for the reference forecast is defined as:

5 For a description of the Brunswik’s lens model, refer to Chapter 3.


$$MSE_B = \frac{1}{n}\sum_{i=1}^{n}\left(\bar{O} - O_i\right)^2$$

where $\bar{O}$ is the average of the observed actuals. Then the skill score is defined as:



 

$$SS = 1 - \frac{MSE_Y}{MSE_B}$$
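A minimal numerical sketch of these MSE and skill score definitions, using illustrative values rather than data from Stewart’s study:

```python
import numpy as np

# Illustrative judgmental forecasts (Y) and observed actuals (O) for n periods.
Y = np.array([102.0, 98.0, 110.0, 105.0, 97.0])
O = np.array([100.0, 96.0, 108.0, 101.0, 99.0])

mse_y = np.mean((Y - O) ** 2)           # MSE of the judgmental forecasts
mse_b = np.mean((O.mean() - O) ** 2)    # MSE of the reference (mean) forecast
skill_score = 1.0 - mse_y / mse_b       # 1 = perfect, 0 = no better than reference

print("MSE_Y:", mse_y, "MSE_B:", mse_b, "SS:", round(skill_score, 3))
```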

If the forecast is perfect, the SS will be equal to one. If the forecast is as accurate as the reference forecast, the SS will be zero, and for forecasts less accurate than the reference forecast the SS will be negative. By further decomposing the skill score, Stewart obtained the following:

$$SS = \left(R_{O,T}\, V_{T,X}\, G\, V_{X,U}\, R_{Y,U}\right)^2 - \left(r_{Y,O} - \frac{s_Y}{s_O}\right)^2 - \left(\frac{\bar{Y} - \bar{O}}{s_O}\right)^2$$

Components of Skill:

1. Environmental predictability
2. Fidelity of the information system
3. Match between environment and forecaster
4. Reliability of the information acquisition
5. Reliability of information processing
6. Conditional/regression bias
7. Unconditional/base rate bias
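Assuming the mapping implied by the decomposition (the first five components correspond to the factors in the squared product, and components 6 and 7 to the two subtracted terms), the last two components can be computed from the forecasts and actuals alone, whereas the first five require the lens-model variables. A minimal sketch of the two bias terms, using the same illustrative values as above:

```python
import numpy as np

# Same illustrative forecasts (Y) and actuals (O) as in the skill score sketch.
Y = np.array([102.0, 98.0, 110.0, 105.0, 97.0])
O = np.array([100.0, 96.0, 108.0, 101.0, 99.0])

r_yo = np.corrcoef(Y, O)[0, 1]
s_y, s_o = Y.std(), O.std()

# Component 6: conditional (regression) bias.
conditional_bias = (r_yo - s_y / s_o) ** 2
# Component 7: unconditional (base rate) bias.
unconditional_bias = ((Y.mean() - O.mean()) / s_o) ** 2

print("conditional bias:", round(conditional_bias, 3))
print("unconditional bias:", round(unconditional_bias, 3))
```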

Based on the expanded lens model and the decomposition of skill score equation, Stewart suggests measures to improve judgmental forecasting performance, as

summarized in Table 3.

Method for Improving Forecasts                                 Components of Skill (1-7)

Identify new descriptors through research X

Develop better measures of true descriptors X

Train forecaster about the environmental system X

Experience with forecasting problem X X X

Cognitive feedback X

Train forecaster to ignore non-predictive cues X

Develop clear definition of cues X

Training to improve cue judgments X

Improve information displays X

Bootstrapping – replace forecaster with model X

Combine several forecasts X

Require justification of forecasts X X

Decompose forecasting task X

Mechanical combination of cues X

Statistical training X X

Feedback about the nature of biases in forecast X X

Search for discrepant information X

Statistical correction for bias X X

Source: Adapted from Stewart (1994)

Table 3: Components of skill addressed by selected methods for improving forecasts

Structuring the process of generating judgmental adjustments has been shown to be effective in improving the accuracy of judgmental adjustments (Sanders, 2001). The cognitive capacity of humans is limited and an overload of information can decrease the effectiveness of judgments. Therefore, structure facilitates the process by which judgmental adjustments are made (Lim and O’Connor, 1996). Edmundson (1990) developed a computer software package to aid judgment in forecasting by leading forecasters step by step through the forecasting process. This structured approach led to improvements in accuracy. Similarly, subjects who were asked to generate judgmental adjustments using the Analytic Hierarchy Process achieved higher accuracy due to this structured approach (Wolfe and Flores, 1990).
