Multiscale Statistics for Evolving Complex Systems
An easy-read on multiscale mehodologies
relevance of multiscale modeling
It has for long been
efficient and accurate modeling
of various real world phenomena needs to incorporate the fact that
made on different scales each carry essential information. In most
terms, representing data on large scales by its mean is often useful
as as `average income' or an 'average number of clients per day') but
be inappropriate. Examples include the arrival of clients in a waiting
queues (Fig. 1), stock returns and indices, local connectivity in a
large scale heterogeneous ad hoc network, geophysical well-data,
measured velocity in fully developed turbulence and many more.
|Real WAN Traffic
Fig 1: The multiplicative Multifractal wavelet model outperforms the
additive Gaussian model
Classical models of time series such as
Poisson and Markov
rely heavily on the assumption of independence, or at least weak
So do classical limit theorems such as the Law of Large Numbers which
that at large scales a Poisson process can safely be approximated by
mean arrival rate. However, in real world situations one is confronted
with data traces which are `spiky' and `bursty' even at large scales.
a behavior is caused by strong dependence in the data: large
tend to come in clusters, and clusters of clusters, or by infinite variance effects. This is
for the classical data of the River Nile as for modern high speed
networks and has obvious and far-reaching consequences ranging from
and buffer design to bandwidth allocation to name but a few.
Fig 2: Nasdaq, S&P500 and Dow Jones
indices show rich detail at fine and coarse time granularity
Aimed at modeling at multiple scales,
fractional Brownian motion (fBm)
is able to capture and incorporate long range dependence (LRD) at least
in terms of second order correlations. In this attractive fractal
model, LRD is directly coupled to a rescaling property which makes fBm
look `statistically identical
' on all scales. Being a Gaussian
fBm is a natural candidate for an approximation of LRD-traces, at least
at large scales due to the Central Limit Theorem (CLT) which states
that sums of random variables (ie. highly aggregated or large-scale
tend to the Normal distribution.
Self-similarity: Statistical Scaling of FBM
The scaling of the Gaussian FBM-process can capture Long Range
Yet, it may provide a poor approximation to certain real world data.
Like fractal models, multifractals are
inherently multiscale objects
with strong rescaling properties, but with the essential difference of
being built on multiplicative schemes. Naturally, they are
non-Gaussian and are ruled by different limiting laws than the additive
CLT, more precisely by martingale theorems. Moreover, multifractals
exactly that `spiky' and `bursty' appearance which we encounter now in
many real world situations such as Internet traffic loads, web file
data of interest in turbulence at high Reynolds numbers and in
Geophysics, to name but a few.
The LRD built into fractal models helps
address strong correlations
and high variability, especially on large scales or aggregation levels.
Bursts, however, are much better understood in terms of a local
scaling analysis which is designed to capture not only an overall
but rather also rare events such as bursts. This is exactly
the multifractal spectrum f(a) provides in terms of a large
principle: given the strength of a burst measured with the
the value f(a) indicates how frequently this strength a
be encountered. The larger f(a) the more often one will see
Effectiveness and accuracy of
multiplicative, multifractal models
We make the case for multifractal
modeling with the example of network traffic modeling in Fig 4.
| Variance time plots indicate that both
and the fGn exactly match of the second-order correlation structure of
the Bellcore data of Fig 1.
Yet, there is a clear difference in the quality of approximation of the
two models which we address now.
|The Multifractal spectrum measures the
"burstiness" of a time trace through the scaling of its higher-order
moments. Note the close match between the spectra of the MWM and
Bellcore data, particularly in the left half of the curves. This is a
quantitative explanation for the close visual match we see above. In
contrast, the fGn match is poor. The MWM comes with its own toolbox
of powerful multifractal analysis techniques for characterizing
traffic burstiness and long-range-dependence.
|More important than either the visual
or multifractal spectrum match is the fact that the MWM synthesized
data has nearly the same queuing behavior as the real data,
again in contrast to the fGn model. Here we plot the queue overflow
probability as a function of queue size.
Exploiting the tree-structure of the MWM we have developed an
formula for the queuing behavior of MWM traffic, which will be
vital for online
traffic characterization and performance prediction, as well as an
algorithm to infer competing cross traffic along a path
through the internet.
Fig 4: Explaining the superiority of multiplicative traffic