Phase-aware Ensemble Forecasting

Phase-aware ensemble forecasting is an ensemble forecasting method that accounts for time lags among ensemble members. Associated with a phase-aware ensemble forecast is a phase-aware mean, which is the average waveform of the ensemble system. The traditional ensemble mean, by contrast, is a time series whose value at each time point is the average of the ensemble member values at that time point, without regard to the waveforms and time lags of the members. In other words, the phase-aware mean operates on a set of trajectories and treats each ensemble member as a single object, whereas the traditional ensemble mean operates on a set of points and treats each point as an object. Uncertainty around the phase-aware mean is captured by the phase-aware spread. The phase-aware ensemble forecast may comprise a larger set of ensemble members than the original ensemble forecast from which it is obtained. The larger set is acquired by computing all possible combinations of the phase and modulus spectra associated with the original ensemble members.
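One way to picture the enlarged ensemble is sketched below in Python. The sketch assumes that the phase and modulus spectra are those of a discrete Fourier transform of each member, which the description above does not specify. The function names (dft, idft_real, recombine) are hypothetical, and this is not the MATLAB implementation. With N original members, pairing every modulus spectrum with every phase spectrum gives N x N members, N of which are the originals.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real sequence (assumed spectrum type)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(X):
    """Inverse DFT, keeping the real part (the imaginary part is round-off here)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def recombine(members):
    """Pair the modulus spectrum of member i with the phase spectrum of member j."""
    spectra = [dft(m) for m in members]
    moduli = [[abs(c) for c in s] for s in spectra]
    phases = [[cmath.phase(c) for c in s] for s in spectra]
    combos = {}
    for i, mod in enumerate(moduli):
        for j, ph in enumerate(phases):
            spectrum = [cmath.rect(r, p) for r, p in zip(mod, ph)]
            combos[(i, j)] = idft_real(spectrum)
    return combos

# Illustrative data: two sinusoids with different amplitudes and phases.
n = 16
members = [
    [2.0 * math.sin(2 * math.pi * t / n) for t in range(n)],
    [1.0 * math.sin(2 * math.pi * t / n + math.pi / 2) for t in range(n)],
]
combos = recombine(members)  # keys (i, j): modulus from member i, phase from member j
```

In this toy case, combos[(0, 1)] carries the amplitude of the first sinusoid and the timing of the second, while combos[(0, 0)] and combos[(1, 1)] reproduce the original members.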

The difference between the ensemble and phase-aware means can be illustrated using a simple example. Suppose we have a set of N sinusoids A_1 sin(ft + φ_1), A_2 sin(ft + φ_2), ..., A_N sin(ft + φ_N) with equal frequencies. In this case, the phase-aware mean is

A_mean sin(ft + φ_mean),

where A_mean is the mean amplitude and φ_mean is the mean phase of the sinusoids. The ensemble mean, on the other hand, equals the phase-aware mean only when all the phases are equal. Thus, the ensemble mean is an average waveform of the ensemble system only if all the phases are equal. In other words, timing differences among ensemble members render the ensemble mean unrepresentative of the ensemble system as a whole.
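The sinusoid example can be checked numerically. The short Python sketch below compares the two means for a pair of sinusoids with equal amplitude and opposite phase shifts. The mean phase is taken as a plain arithmetic average, which assumes the phases do not wrap around; the function names and sample values are illustrative only.

```python
import math

def ensemble_mean(amps, phases, f, times):
    """Point-by-point average of the sinusoids at each time."""
    n = len(amps)
    return [sum(a * math.sin(f * t + p) for a, p in zip(amps, phases)) / n
            for t in times]

def phase_aware_mean(amps, phases, f, times):
    """Sinusoid built from the mean amplitude and the mean phase."""
    a_mean = sum(amps) / len(amps)
    p_mean = sum(phases) / len(phases)  # arithmetic mean; assumes no phase wrapping
    return [a_mean * math.sin(f * t + p_mean) for t in times]

f = 1.0
times = [0.1 * k for k in range(63)]       # roughly one full period
amps = [1.0, 1.0]
phases = [-math.pi / 2, math.pi / 2]       # members half a period apart

em = ensemble_mean(amps, phases, f, times)     # members cancel: essentially zero
pm = phase_aware_mean(amps, phases, f, times)  # sin(t): unit amplitude, zero phase
```

Here the ensemble mean is flat even though every member oscillates with unit amplitude, whereas the phase-aware mean keeps the amplitude of the members. When the phases are all equal, the two functions return the same waveform.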

The MATLAB software to implement the phase-aware methodology can be downloaded here

Sub-ensemble Forecasting

Another way that the ensemble mean can become unrepresentative of the ensemble system as a whole is through the presence of clusters. As shown in Schulte (2017), clusters are quite prevalent in total water level (i.e., tide + storm surge) forecasts. These clusters arise from the periodic nature of the tide. That is, ensemble members representing total water level predictions tend to have global maxima around local maxima in the tidal signal. As a result, the global maxima associated with all the ensemble members fall into well-defined clusters, which can be visualized with scatter plots of the global maximum values against the predicted timing of the global maxima.

The presence of clusters has important implications for the interpretation of global ensemble statistics. For example, computing the ensemble mean of the individual total water level ensemble members in the presence of clusters can produce a mean trajectory that differs greatly from that of any individual ensemble member. Thus, if one assumes that the ensemble members represent all possible outcomes, the mean trajectory has zero probability of occurring. As a concrete example, the ensemble mean could have two local maxima, whereas each individual ensemble member has only a single local maximum.
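The two-maxima effect is easy to reproduce with synthetic data. In the Python sketch below, each member is a single smooth bump, a stand-in for a total water level trajectory rather than real forecast output, and the peak times fall into two groups. The helper names and all numbers are made up for illustration.

```python
import math

def bump(center, height, times, width=3.0):
    """Synthetic single-peaked trajectory (illustrative, not real forecast data)."""
    return [height * math.exp(-((t - center) / width) ** 2) for t in times]

def count_local_maxima(series):
    """Count interior points that are strictly higher than both neighbours."""
    return sum(1 for i in range(1, len(series) - 1)
               if series[i - 1] < series[i] > series[i + 1])

times = list(range(36))
# Two members peak near t = 10-11 and two near t = 22-23.
members = [bump(c, h, times)
           for c, h in [(10, 2.0), (11, 2.2), (22, 2.1), (23, 1.9)]]

# Traditional ensemble mean: average across members at each time point.
mean = [sum(column) / len(members) for column in zip(*members)]

member_peaks = [count_local_maxima(m) for m in members]  # one peak per member
mean_peaks = count_local_maxima(mean)                    # two peaks in the mean
```

Every member has one peak, but the ensemble mean has two lower peaks, a shape that none of the members exhibits.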

The presence of clusters also influences the probability of exceedance curves, which depict how likely the total water level is to exceed a specified threshold. For instance, clusters can result in probability of exceedance curves depicting two possible floods, even though each individual ensemble member exceeds the flood threshold only once. In other words, the presence of clusters can render the probability of exceedance curves unrepresentative of the ensemble system. It is important to note, however, that clusters do not impact the interpretation of local ensemble statistics. For example, the estimated probability that a flood will occur at a single instant is not affected, because the fraction of ensemble members exceeding the threshold at that instant represents the estimated probability whether there are clusters or not. Only when one zooms out and considers the probability of a flood occurring at one time in relation to the probability of occurrence at all other time points (a global perspective) does knowledge of clusters become important. For example, a global perspective asks whether a flood could occur on Tuesday given that one has occurred on Monday. In contrast, a local perspective does not account for how a flood occurring on Monday could exclude the possibility of one occurring on Tuesday.
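The distinction between the local and global views can be made concrete with the same kind of synthetic members. The Python sketch below computes a probability of exceedance curve as the fraction of members above a threshold at each time, then counts separate exceedance episodes. The threshold, the bump-shaped members, and the function names are illustrative assumptions.

```python
import math

def bump(center, height, times, width=3.0):
    """Synthetic single-peaked trajectory (illustrative, not real forecast data)."""
    return [height * math.exp(-((t - center) / width) ** 2) for t in times]

def exceedance_probability(members, threshold):
    """Local statistic: fraction of members above the threshold at each time."""
    n = len(members)
    return [sum(1 for v in column if v > threshold) / n for column in zip(*members)]

def count_exceedance_episodes(series, threshold):
    """Number of separate runs during which the series stays above the threshold."""
    episodes = 0
    above = False
    for v in series:
        if v > threshold and not above:
            episodes += 1
        above = v > threshold
    return episodes

times = list(range(36))
members = [bump(c, h, times)
           for c, h in [(10, 2.0), (11, 2.2), (22, 2.1), (23, 1.9)]]
flood_threshold = 1.5  # hypothetical flood level

prob = exceedance_probability(members, flood_threshold)
episodes_per_member = [count_exceedance_episodes(m, flood_threshold) for m in members]
episodes_in_curve = count_exceedance_episodes(prob, 0.0)  # separate humps in the curve
```

Each value of prob is a valid local estimate, but the curve shows two separate humps while every member floods only once. Reading the curve globally as two successive floods would therefore misrepresent the ensemble.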

To make global ensemble statistics representative, it is necessary to cluster the ensemble members based on the timing of their total water level peaks and then compute ensemble statistics for each cluster individually. This process is called sub-ensemble forecasting, and the statistics based on the individual clusters are called sub-ensemble statistics (e.g., the sub-ensemble mean). As shown in Schulte (2017), the sub-ensemble means are representative of the individual ensemble member trajectories, with each sub-ensemble mean being the single best representation of its cluster.
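A minimal Python sketch of the idea follows. The clustering rule used here, starting a new cluster whenever consecutive peak times are more than max_gap steps apart, is an assumption chosen for brevity; the clustering procedure of Schulte (2017) is not described above and may differ. The members are synthetic bumps, and all names are hypothetical.

```python
import math

def bump(center, height, times, width=3.0):
    """Synthetic single-peaked trajectory (illustrative, not real forecast data)."""
    return [height * math.exp(-((t - center) / width) ** 2) for t in times]

def peak_time_index(series):
    """Index of the global maximum of one member."""
    return max(range(len(series)), key=lambda i: series[i])

def cluster_by_peak_time(members, max_gap):
    """Group member indices whose sorted peak times are within max_gap of the previous one."""
    order = sorted(range(len(members)), key=lambda i: peak_time_index(members[i]))
    clusters = []
    previous_peak = None
    for i in order:
        peak = peak_time_index(members[i])
        if previous_peak is None or peak - previous_peak > max_gap:
            clusters.append([])  # gap too large: start a new cluster
        clusters[-1].append(i)
        previous_peak = peak
    return clusters

def sub_ensemble_means(members, clusters):
    """Average the members of each cluster separately at every time point."""
    length = len(members[0])
    return [[sum(members[i][t] for i in cluster) / len(cluster) for t in range(length)]
            for cluster in clusters]

def count_local_maxima(series):
    """Count interior points that are strictly higher than both neighbours."""
    return sum(1 for i in range(1, len(series) - 1)
               if series[i - 1] < series[i] > series[i + 1])

times = list(range(36))
members = [bump(c, h, times)
           for c, h in [(10, 2.0), (22, 2.1), (11, 2.2), (23, 1.9)]]

clusters = cluster_by_peak_time(members, max_gap=6)  # member indices per cluster
sub_means = sub_ensemble_means(members, clusters)
```

With these inputs the members split into two clusters, and each sub-ensemble mean has a single peak like the members it summarizes.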