Brief Overview

Wavelet analysis is a method that allows one to unfold a time series and detect features embedded in the time series such as periodicities and discontinuities. For example, wavelet analysis can be used to determine the dominant period of a sinusoid. In many applications, it is necessary to implement statistical hypothesis testing to help differentiate features that are noise from those that exceed background noise. Such methods could determine if periods of above-normal and below-normal precipitation occur in a predictable sequence, much like how a pure sinusoid fluctuates from relatively high values to relatively low values in regular intervals.

My research has focused on developing new statistical hypothesis tests in wavelet analysis that can be used to help more fully understand a variety of time series, ranging from financial time series to geophysical ones. This research has afforded me the opportunity to merge ideas from geometry, topology, and statistics. A description of my wavelet analysis research is provided below. Additional links to various software packages that I created are also provided.

Geometric Significance Testing

In a geometric significance test (Schulte et al., 2015), the areas of so-called point-wise significance patches are used to assess the statistical significance of wavelet quantities corresponding to points contained in the patches. Before the test can be implemented, the set of all points in the time-scale plane whose associated point-wise test p-values are less than the point-wise significance level must be identified. That is, the set

P_pw = {(a,b):ρ(a,b) < α_pw}

is computed, where ρ is the point-wise test p-value, a is wavelet scale, and b is time. A point-wise significance patch is an equivalence class of P_pw under the equivalence relation that makes points x and y in P_pw equivalent if they can be joined by a continuous path in P_pw, that is, if there is a continuous map f:[0,1] → P_pw with f(0) = x and f(1) = y. Thus, a patch is a path-component of P_pw. Intuitively, it is a maximal contiguous region of P_pw: a contiguous region that is not contained in any larger contiguous region of P_pw.
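On a discretized time-scale grid, the patches of P_pw can be found with a connected-component search. The following is a minimal sketch, not the published MATLAB or R implementation. It assumes the p-values are stored as a nested list indexed as pvals[scale_index][time_index], and it uses 4-connectivity (cells sharing an edge) as a stand-in for path-connectedness. The function name find_patches and the sample values are hypothetical.

```python
from collections import deque


def find_patches(pvals, alpha_pw):
    """Return the patches of {(a, b): pvals[a][b] < alpha_pw}.

    Each patch is a list of (scale_index, time_index) grid cells.
    """
    n_scales, n_times = len(pvals), len(pvals[0])
    seen = [[False] * n_times for _ in range(n_scales)]
    patches = []
    for i in range(n_scales):
        for j in range(n_times):
            if seen[i][j] or not pvals[i][j] < alpha_pw:
                continue
            # Breadth-first flood fill collects one whole patch.
            queue, patch = deque([(i, j)]), []
            seen[i][j] = True
            while queue:
                a, b = queue.popleft()
                patch.append((a, b))
                # Assumption: edge-sharing cells count as joined by a path.
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    na, nb = a + da, b + db
                    if (0 <= na < n_scales and 0 <= nb < n_times
                            and not seen[na][nb]
                            and pvals[na][nb] < alpha_pw):
                        seen[na][nb] = True
                        queue.append((na, nb))
            patches.append(patch)
    return patches


# Hypothetical p-values on a 3 x 4 grid (rows are scales, columns are times).
pvals = [
    [0.01, 0.02, 0.50, 0.03],
    [0.60, 0.04, 0.70, 0.01],
    [0.80, 0.90, 0.95, 0.85],
]
patches = find_patches(pvals, alpha_pw=0.05)
print(len(patches), sorted(len(p) for p in patches))
```

Choosing 8-connectivity instead would merge cells that touch only at a corner; which convention better matches the continuous definition depends on the grid resolution.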

After the computation of P_pw and the identification of patches, the actual geometric test can be performed. To perform the geometric test, first compute the area and centroid of every patch of P_pw. Using the scale-coordinate of the centroid, compute the normalized area of every patch, which is given as

A_norm = A/S²,

where S is the scale-coordinate of the centroid and A is the area of the patch. Dividing by the square of the scale-coordinate of the centroid accounts for how patches expand in both the time and scale directions. Normalizing allows patches located anywhere in the time-scale plane to be compared simultaneously. In other words, the critical value of the geometric test is independent of wavelet scale. To determine the geometric test p-value, compute a null distribution of normalized areas under some noise model (e.g. red noise for geophysical time series) and determine where on the distribution the normalized area of the patch in question falls. If α_g is the significance level of the geometric test, then the critical value of the geometric test is the 100(1 - α_g)-th percentile of the null distribution.
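The normalization and the comparison against the null distribution can be sketched as follows. This is an illustration only: the patch area and centroid are assumed to be already computed, the null distribution is a short made-up list rather than the output of a Monte Carlo experiment, and the function names are hypothetical. The p-value is taken here as the fraction of null normalized areas at least as large as the observed one, which is one common empirical convention.

```python
def normalized_area(area, centroid_scale):
    """Return A_norm = A / S**2 for one patch."""
    return area / centroid_scale ** 2


def geometric_p_value(a_norm, null_areas):
    """Fraction of null normalized areas at least as large as a_norm."""
    exceed = sum(1 for value in null_areas if value >= a_norm)
    return exceed / len(null_areas)


# Hypothetical patch: area 12 with its centroid at scale 2.
a_norm = normalized_area(area=12.0, centroid_scale=2.0)

# Hypothetical null distribution; in practice it would be built from
# many surrogate spectra generated under the chosen noise model.
null_areas = [0.5, 1.0, 1.5, 2.0, 2.5, 3.5, 0.8, 1.2, 0.3, 0.9]

p_value = geometric_p_value(a_norm, null_areas)
alpha_g = 0.05
is_significant = p_value < alpha_g
print(a_norm, p_value, is_significant)
```

Comparing the p-value with α_g is equivalent to comparing the normalized area with the 100(1 - α_g)-th percentile of the null distribution, up to how ties and finite samples are handled.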

The MATLAB software to implement the geometric significance test can be found at the MATLAB file exchange website by clicking here.

An R software package can be found on the Advanced Biwavelet Software Page.

Topological Significance Testing

Topological methods offer another approach for determining the significance of features found in wavelet spectra by counting the number of holes and path-components found in the set P_pw comprising all points in the time-scale plane that are point-wise significant (Schulte et al., 2015; Schulte, 2019). More generally, the topological significance test evaluates statistical significance by counting the number of holes and path-components at every point-wise significance level. This idea is made precise and formal using a method called persistent homology. Suppose that the point-wise test was performed at M point-wise significance levels such that the point-wise significance levels satisfy

α₁ ≤ α₂ ≤ ... ≤ α_M.

If α₁ is chosen small enough such that no wavelet quantities are statistically significant and if α_M is chosen large enough so that all wavelet quantities are statistically significant, then the sets

P_i^pw = {(a,b): ρ(a,b) < α_i}

will form a filtration of the time-scale plane. That is,

∅ = P₁^pw ⊆ P₂^pw ⊆ ... ⊆ P_M^pw = W,

where W is the time-scale plane. Corresponding to the filtration is a sequence of homology groups connected by homomorphisms, i.e.

0 = H_n(P₁^pw)→H_n(P₂^pw)→...→H_n(W)

The homomorphisms in homology are induced from the inclusion maps from the underlying space of P_i^pw to that of P_j^pw, where i ≤ j. Using these ideas, it can be assessed whether or not a wavelet spectrum is likely consistent with one of noise. To do so, compute the ranks of the homology groups in dimensions 0 and 1, which are the 0- and 1-dimensional Betti numbers (the numbers of path-components and holes, respectively). For each dimension n, compute the Betti numbers associated with each P_i^pw, resulting in 0- and 1-dimensional persistent homology profiles. Then compare the calculated Betti numbers to a null distribution estimated using Monte Carlo methods. The topological significance test then determines if the number of holes or path-components at a point-wise significance level is unusually large or small. Doing this procedure at each point-wise significance level accounts for how the result of the test could depend on the chosen point-wise significance level. The persistent homology profiles as a whole could deviate from those typically associated with noise even if, at a given point-wise significance level, the wavelet spectrum in question is topologically equivalent to that of noise.
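The Betti-number profiles described above can be sketched on a discrete grid. This is a simplified illustration rather than the published method: it counts path-components of each P_i^pw with 4-connectivity and counts holes as the bounded components of the complement with 8-connectivity (a standard dual pairing on square grids). The complement is padded with a one-cell border so that the unbounded region forms a single component. All function names and the sample p-values are hypothetical.

```python
from collections import deque

FOUR = [(1, 0), (-1, 0), (0, 1), (0, -1)]
EIGHT = FOUR + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def count_components(mask, neighbors):
    """Count connected components of the True cells of a 2-D mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j] or seen[i][j]:
                continue
            count += 1
            seen[i][j] = True
            queue = deque([(i, j)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in neighbors:
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return count


def betti_numbers(pvals, alpha):
    """Return (b0, b1) for the set of grid cells with p-value < alpha."""
    cols = len(pvals[0])
    sig = [[p < alpha for p in row] for row in pvals]
    b0 = count_components(sig, FOUR)
    # Complement, padded so the outside is one component; every other
    # complement component is enclosed by significant cells (a hole).
    padded = [[True] * (cols + 2)]
    padded += [[True] + [not s for s in row] + [True] for row in sig]
    padded += [[True] * (cols + 2)]
    b1 = count_components(padded, EIGHT) - 1
    return b0, b1


def persistence_profiles(pvals, alphas):
    """Betti numbers at each point-wise significance level."""
    return [betti_numbers(pvals, a) for a in alphas]


# Hypothetical p-values: a ring of small values around one large value.
pvals = [
    [0.01, 0.01, 0.01],
    [0.01, 0.90, 0.01],
    [0.01, 0.01, 0.01],
]
profiles = persistence_profiles(pvals, alphas=[0.001, 0.05, 1.0])
print(profiles)
```

In a full test, the same profiles would be computed for many noise surrogates to form the Monte Carlo null distribution against which the observed Betti numbers are compared.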

The MATLAB software to implement the geometric significance test can be found at the MATLAB file exchange website by clicking here.

Cumulative Areawise Testing

Cumulative areawise testing is similar to geometric significance testing except that the areas of patches are tracked as the pointwise significance level is changed (Schulte, 2016; Schulte, 2019). The procedure has more statistical power than existing methods. The method combines ideas from persistent homology in algebraic topology with ideas from geometric significance testing. More specifically, it was shown in Schulte (2019) that the output of the cumulative areawise testing procedure is the mean of individual estimates of statistical significance calculated from the geometric test applied at a set of pointwise significance levels. Thus, the cumulative areawise test is an ensemble method in which the cumulative area test statistic is used to filter out noise associated with the geometric test applied at a single pointwise significance level. In other words, the individual geometric test results can be identified with ensemble members and the cumulative areawise test result can be identified with the ensemble mean of the ensemble members.

The MATLAB software to implement the cumulative areawise significance test can be found at the MATLAB file exchange website by clicking here.

Higher-order Wavelet Analysis

For nonlinear time series, it is necessary to use higher-order spectral analysis to examine higher-order moments such as skewness and kurtosis in frequency space (Schulte, 2016). Higher-order wavelet analysis can be used to quantify cycle geometry such as skewness and asymmetry.

The MATLAB software to implement higher-order wavelet analysis can be found at the MATLAB file exchange website by clicking here.

Phase-aware Ensemble Forecasting

Phase-aware ensemble forecasting is an ensemble forecasting method that takes into account time lags among ensemble members. Associated with a phase-aware ensemble forecast is a phase-aware mean, which is the average waveform of the ensemble system. The phase-aware mean contrasts with the traditional ensemble mean because the ensemble mean is a time series whose values at each point in time are calculated by averaging ensemble member values at each time point without regard to the waveforms and time lags of the ensemble members. In other words, the phase-aware mean operates on a set of trajectories, whereas the traditional ensemble mean operates on a set of points. The phase-aware mean treats the individual ensemble members as single objects and the ensemble mean treats individual points as objects. Uncertainty around the phase-aware mean is captured by phase-aware spread. The phase-aware ensemble forecast may comprise a larger set of ensemble members than the original ensemble forecast from which the phase-aware forecast is obtained. The larger set of ensemble members is acquired by computing all possible combinations of phase and modulus spectra associated with the original ensemble members.

The difference between the ensemble and phase-aware means can be illustrated using a simple example. Suppose we have a set of N sinusoids A₁sin(ft +φ₁), A₂sin(ft +φ₂),..., A_Nsin(ft +φ_N) with equal frequencies. In this case, the phase-aware mean is

A_mean sin(ft + φ_mean),

where A_mean is the mean amplitude and φ_mean is the mean phase of the sinusoids. The ensemble mean, on the other hand, is only equal to the phase-aware mean when all the phases are equal. Thus, the ensemble mean is only an average waveform of the ensemble system if all the phases are equal. In other words, timing differences among ensemble members render the ensemble mean unrepresentative of the ensemble system as a whole.
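The sinusoid example can be checked numerically. The sketch below is only an illustration of that example, not the phase-aware forecasting software: it takes two unit-amplitude sinusoids with phases 0 and π, uses the arithmetic mean of the phases as φ_mean, and compares the two means on a grid of time points. The function names are hypothetical.

```python
import math


def ensemble_mean(amps, phases, f, t):
    """Point-by-point average of the sinusoids at time t."""
    total = sum(a * math.sin(f * t + p) for a, p in zip(amps, phases))
    return total / len(amps)


def phase_aware_mean(amps, phases, f, t):
    """A_mean * sin(f*t + phi_mean), using arithmetic means (assumption)."""
    a_mean = sum(amps) / len(amps)
    phi_mean = sum(phases) / len(phases)
    return a_mean * math.sin(f * t + phi_mean)


amps = [1.0, 1.0]
phases = [0.0, math.pi]  # the two members are exactly out of phase
f = 1.0
times = [0.1 * k for k in range(63)]  # roughly one full cycle

ens = [ensemble_mean(amps, phases, f, t) for t in times]
pam = [phase_aware_mean(amps, phases, f, t) for t in times]

# The ensemble mean cancels to (numerically) zero everywhere, while the
# phase-aware mean keeps unit amplitude.
print(max(abs(v) for v in ens))
print(max(abs(v) for v in pam))
```

With equal phases the two functions return the same values, which matches the statement that the ensemble mean equals the phase-aware mean only when all the phases are equal.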

The MATLAB software to implement the phase-aware methodology can be downloaded here.

Cumulative Arcwise Testing of Global Wavelet Spectra

Cumulative arcwise testing is the lower-dimensional version of cumulative areawise testing (Schulte et al., 2018). The cumulative arcwise test resolves two drawbacks of traditional pointwise testing of global wavelet spectra. The first is that adjacent wavelet quantities are correlated, so statistically significant results tend to cluster and form pointwise significance arcs or peaks. The cumulative arcwise test treats the pointwise significance arcs as single objects and evaluates the statistical significance of the arcs based on their arc length. The arc length is an integrated metric that takes into account the width of the pointwise significance arc in frequency space and the height of the peak relative to the critical level of the pointwise test. The second is that the result of an arcwise test could be sensitive to the choice of pointwise significance level; to address this, the cumulative arcwise test tracks the arc length of a pointwise significance arc as the pointwise significance level changes. The arc length of a pointwise significance arc will always decrease as the critical level of the pointwise test increases (that is, as the pointwise significance level decreases, say, from 0.1 to 0.05).
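The idea of tracking arc length across pointwise significance levels can be sketched as below. The exact definition and normalization of arc length in Schulte et al. (2018) are not reproduced here, so this rests on stated assumptions: the spectrum is sampled at points x (for example, log2 of period), a pointwise significance arc is a run of consecutive samples whose power exceeds the critical level, and arc length is the polyline length of the curve (x, power / critical level) over that run. The spectrum values, critical levels, and function names are all hypothetical.

```python
import math


def significance_arcs(ratio):
    """Index runs where the power-to-critical-level ratio exceeds 1."""
    arcs, current = [], []
    for i, r in enumerate(ratio):
        if r > 1.0:
            current.append(i)
        elif current:
            arcs.append(current)
            current = []
    if current:
        arcs.append(current)
    return arcs


def arc_length(x, y, indices):
    """Polyline length through the points (x[i], y[i]) for i in indices."""
    pairs = zip(indices, indices[1:])
    return sum(math.hypot(x[j] - x[i], y[j] - y[i]) for i, j in pairs)


def arc_lengths_at_level(x, power, critical):
    """Arc length of every pointwise significance arc at one critical level."""
    ratio = [p / critical for p in power]
    return [arc_length(x, ratio, arc) for arc in significance_arcs(ratio)]


# Hypothetical global wavelet spectrum with a single peak.
x = [0, 1, 2, 3, 4, 5, 6]
power = [0.5, 1.2, 3.0, 2.5, 1.1, 0.4, 0.6]

# Hypothetical critical levels for three pointwise significance levels.
critical_levels = {0.10: 1.0, 0.05: 2.0, 0.01: 2.8}

totals = {
    alpha: sum(arc_lengths_at_level(x, power, crit))
    for alpha, crit in critical_levels.items()
}
print(totals)
```

In this toy example the arc shrinks as the critical level rises, consistent with the behavior described above; a complete test would compare the tracked arc lengths with a null distribution from noise surrogates.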

The MATLAB software to implement the cumulative arcwise test can be found by clicking here.
