Phase-aware ensemble forecasting is an ensemble forecasting method that takes into
account time lags among ensemble members. Associated with a phase-aware ensemble
forecast is a phase-aware mean, which is the average waveform of the ensemble
system. The phase-aware mean contrasts with the traditional ensemble mean because the
ensemble mean is a time series whose values at each point in time
are calculated by averaging ensemble member values at each time point without regard
to the
waveforms and time lags of the ensemble members. In other words, the phase-aware mean
operates on a set of trajectories, whereas the traditional ensemble mean operates on
a set of points. The phase-aware mean treats the individual ensemble members as
single
objects and the ensemble mean treats individual points as objects. Uncertainty
around
the phase-aware
mean is captured by
phase-aware spread. The phase-aware ensemble forecast may comprise a larger set of
ensemble members than the original ensemble forecast from which the
phase-aware forecast is obtained. The larger set of ensemble members is acquired by
computing all possible combinations of phase and modulus spectra associated with the
original ensemble members.
The difference between the ensemble and phase-aware means can be illustrated using a simple example. Suppose we have a set of N sinusoids A1sin(ft +φ1), A2sin(ft +φ2),..., ANsin(ft +φN) with equal frequencies. In this case, the phase-aware mean is
Another way that the ensemble mean can become unrepresentative of the ensemble system as a whole is through the presence of clusters. As shown in Schulte (2017), clusters are quite prevalent in total water level (i. e tide + storm surge) forecasts. These clusters arise from the periodic nature of the tide. That is, ensemble members representing total water level predictions tend to have global maxima around local maxima in the tidal signal. As a result, global maxima associated with all the ensemble members fall into well-defined clusters, which can be visualized by making scatter plots depicting the joint representation of the global maxima values and the predicted timing of the global maxima.
The presence of the clusters has important implications to the interpretation of global ensemble statistics. For example, computing the ensemble mean of the individual total water level ensemble members in the presence of clusters can produce a mean trajectory that differs greatly from that of the individual ensemble members. Thus, if one assumes that the ensemble members represent all possible outcomes, the trajectory has a zero probability of occurring. As a concrete example, the ensemble mean could have two local maxima, whereas the individual ensemble members have only a single local maximum.
The presence of clusters also influences the probability of exceedance curves depicting how likely the total water level will exceed a specified threshold. For instance, clusters can result in probability exceedance curves depicting two possible floods, even though the individual ensemble members only exceed the flood threshold once. In other words, the presence of clusters can render the probability exceedance curves unrepresentative of the ensemble system. It is important to note, however, that clusters do no impact the interpretation of local ensemble statistics. For example, the estimated probability that a flood will occur at a single instant of time is not impacted because the number of ensemble members exceeding a threshold represents the estimated probability of a flood occurring at a single instant whether there are clusters or not. It is only when one zooms out and looks at the probability of a flood occurring at one time in relation to the probability of occurrence at all other time points (a global perspective) does knowledge of clusters becomes important. For example, a global perspective is determining if a flood could occur on Tuesday given that it has occurred on Monday. In contrast, a local perspective does not account for how a flood occurring on Monday could exclude the possibility of it occurring on Tuesday.
To remedy the representativeness of global ensemble statistics problem, it is necessary to cluster the ensemble members based on total water level peak timing and then compute ensemble statistics for each cluster individually. This process is called sub-ensemble forecasting, and the statistics based on the individual clusters are called sub-ensemble statistics (e.g. sub-ensemble mean). As shown in Schulte (2017), the sub-ensemble means are representative of the individual ensemble member trajectories, with the sub-ensemble means being the single best representation of each cluster.