Disclaimer: Not certain this is appropriate to EP. This is in the way of an introduction to some aspects of modeling and might be a bit arcane. It was inspired by a comment by RebelCapitalist and deals with econometric modeling. Enjoy.
Let's say we're interested in estimating the likelihood of some event x happening. x might be a loan default, upcoming regulations, getting hit by lightning, whatever. If we happen to know something about x, we can assign some probability P(x), play the numbers, and improve our chances of a good outcome. That's a big "if", and unless P = 1 we can still lose; still, that's the best we can do.
But what if x depends on some other event? Your chances of getting hit by lightning are different on a dry clear day in a cave vs standing on an aluminum stepladder holding up a golf club outside in a thunderstorm. That is, what is P(x|y), the probability P of x given some circumstance y? What we know about y informs what we know about x. This is the field of conditional probability, first formalized by Thomas Bayes, an 18th century English clergyman who gave his name to a whole field of study in Information Science.
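To make the conditioning concrete, here's a minimal sketch in Python. The numbers are pure invention, just to show the arithmetic of P(x|y) for the lightning example:

```python
# Hypothetical numbers, purely to illustrate conditioning:
# say 10% of days are stormy, lightning strikes on 20% of stormy days
# and on 0.1% of clear days (all invented).
p_storm = 0.10
p_strike_given_storm = 0.20
p_strike_given_clear = 0.001

# Law of total probability: unconditional chance of a strike
p_strike = (p_storm * p_strike_given_storm
            + (1 - p_storm) * p_strike_given_clear)

# Bayes' rule inverts the conditioning: given a strike, was it stormy?
p_storm_given_strike = p_storm * p_strike_given_storm / p_strike
```

Knowing y (the weather) changes P(x) by two orders of magnitude, which is the whole point of conditional probability.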
In this post I'm interested in a particular subset of that same question. What is the likelihood of x at some point t occurring given x at some previous point? If we extend the history of x all the way to the origin we need to assess the following conditional probability
P(x(tn) | x(tn-1), x(tn-2), ..., x(t1))
That is, what is the probability of finding x at tn given x at tn-1, which depends on x at tn-2, which depends on ... x at the origin, t1? It becomes helpful to classify four cases of this conditional probability.
Purely random process
For a purely random process the previous history is completely irrelevant. We can then write
P(x(tn) | x(tn-1), x(tn-2), ..., x(t1)) = P(x(tn))
An example of this is the coin toss. You have an equal likelihood of getting heads or tails on the next toss, and it doesn't matter that you just got 15 heads in the previous 15 tosses; the probability of heads on the next toss is still 0.5 (although at some point you might wonder whether the coin or the toss is rigged).
If there are two possible independent outcomes (like a coin toss or a true/false), the number of successes follows a binomial distribution, which in the limit of large n converges to a Gaussian. The chance of 15 straight heads -- 0.5^15, about 1 in 33,000 -- sits in the far wing of that distribution, but your chance of getting another head on the next toss is still 50%.
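A quick sketch of that arithmetic, using nothing beyond the standard binomial formula for 15 tosses of a fair coin:

```python
from math import comb

n, p = 15, 0.5  # 15 tosses of a fair coin, as in the text

def binom_pmf(k):
    """Probability of exactly k heads in n independent fair tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_all_heads = binom_pmf(15)  # the far wing: 1 in 32768
p_next_toss = 0.5            # independence: the run of 15 changes nothing
total = sum(binom_pmf(k) for k in range(n + 1))  # pmf sums to 1
```

The run of heads lives in the tail of the distribution, but the distribution says nothing about the next toss.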
Deterministic process
For a deterministic process, the likelihood of finding x at tn is exactly determined by x at tn-1. Following that back, we find that the initial condition x(t1) determines the entire trajectory of x. In other words
P(x(tn) | x(tn-1), x(tn-2), ..., x(t1)) = P(x(tn) | x(tn-1)) = 1
for the single value of x(tn) determined by x(tn-1), and 0 for every other value.
If we also know the initial conditions of everything else in the space that x inhabits, we know in principle the future history of everything in that space forever and ever amen. It might get complicated; in fact it might get chaotic, and if we go numerical, the finiteness of our computational word size limits how far into the system's future we can predict. So not all is roses here. And as far as we know nothing is truly, purely deterministic (although the various Copenhagen nonbelievers -- most famously Einstein and Schrödinger -- might argue otherwise). But to a really good approximation many things are: you can hit a battleship from a pitching, moving platform 15 miles away; if you drop a pencil it will fall; in 23,000 years the North Pole will point to the star Thuban in the constellation Draco; and so on.
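A toy illustration of both points, using the logistic map as a stand-in dynamical system (my choice of example, not anything from the post): the initial condition fixes the whole trajectory exactly, yet in the chaotic regime finite precision still wrecks long-range prediction.

```python
# A deterministic -- and, at r = 4, chaotic -- toy system: the logistic map.
# x(tn) is exactly determined by x(tn-1), so the initial condition fixes
# the entire trajectory. But tiny initial differences grow exponentially,
# which is why finite word size limits how far ahead we can predict.
def trajectory(x0, r=4.0, steps=50):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = trajectory(0.2)
b = trajectory(0.2)            # same initial condition -> identical path
c = trajectory(0.2 + 1e-10)    # nearby initial condition -> diverges
```

Runs a and b agree to the last bit; run c, started a hair away, decorrelates completely within a few dozen steps.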
Now it gets interesting. The first two cases are the extrema of stochastic processes in general -- processes characterized by one or more random variables. The purely random case has no memory at all, while in the deterministic case the "randomness" equals 0; that is, the stochastic variables have constant temporal autocorrelations.
The temporal autocorrelation of a stochastic process describes how it "decorrelates" with itself as time progresses, both in timescale and form. It turns out to be a powerful analytical tool: from it you can infer the dynamics of the underlying stochastic process, specifically timescales and relative timescales. It also characterizes the nature of the process, and there are four cases here: a delta function, for purely random processes; a constant, for deterministic processes; an exponential decay, for Markovian processes; and everything else, for non-Markovian processes. At least in science and engineering, this analysis is usually done in the frequency domain (spectral analysis).
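As a sketch of the first signature, here is a sample autocorrelation computed for plain Gaussian white noise; the estimator is the standard normalized lagged covariance. A purely random process gives the delta function: 1 at lag zero, essentially nothing everywhere else.

```python
import random

def autocorr(x, max_lag):
    """Normalized sample autocorrelation C(k) (mean removed, C(0) = 1)."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / (n * c0)
            for k in range(max_lag + 1)]

random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
c = autocorr(noise, 5)
# c[0] == 1 by construction; c[k > 0] hovers near 0 -- the "delta
# function" signature of a purely random process.
```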
Markov process
A Markov process is one step back from purely random:
P(x(tn) | x(tn-1), x(tn-2), ..., x(t1)) = P(x(tn) | x(tn-1))
This is also known as a Markov chain, after Andrey Markov, a Russian mathematician of the late 19th and early 20th centuries. Where x goes next depends only on the present. Note that this does not mean determined by the present -- if that were the case we would be able to follow its trajectory back to the beginning -- it means that the likelihood of finding x at tn depends only on where x is at tn-1. Because of this, Markov processes are said to be memoryless. A true random walk is an example of a Markov process; so is Brownian motion.
If the random variables characterizing x are both Markovian and Gaussian (that is, they are normally distributed) then the temporal autocorrelation of x is given by an exponential decay. This is the signature of a Markov process (if the work is in the frequency domain that will show up as a Lorentzian spectral density). Nature is full of exponential decays -- Markov processes are quite common.
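A minimal numerical check of that signature, using a discrete-time Gauss-Markov chain -- an AR(1) process, which is a discretized Ornstein-Uhlenbeck process (the coefficient 0.9 is an arbitrary illustrative choice). Its autocorrelation should decay as phi^k, i.e. exponentially:

```python
import random

random.seed(2)

# Gauss-Markov chain: x_n = phi * x_{n-1} + Gaussian noise.
# Its autocorrelation is phi**k = exp(-k/tau) with tau = -1/ln(phi):
# the exponential-decay signature of a Markov process.
phi = 0.9
x = [0.0]
for _ in range(100000):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))

def autocorr(xs, k):
    """Normalized sample autocorrelation at lag k (mean removed)."""
    n = len(xs)
    m = sum(xs) / n
    d = [v - m for v in xs]
    c0 = sum(v * v for v in d) / n
    return sum(d[t] * d[t + k] for t in range(n - k)) / (n * c0)

measured = [autocorr(x, k) for k in (1, 5, 10)]
predicted = [phi ** k for k in (1, 5, 10)]
```

The measured and predicted decays agree to within sampling noise, which is exactly the test you'd run on data to check a Markovian assumption.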
Note the constraints placed on the random variable(s) of x. Let's use Brownian motion as an example. Its equation of motion (describes behavior in time) is the Langevin equation:
F = ma = m d^2x/dt^2 = m dv/dt = -Df v + A(t)
Physically, as the particle moves through the fluid, it feels a drag force -Df v proportional to its velocity (it takes work to push through the molecules making up the liquid, hence the negative sign) and a stochastic "bumping" force A(t) from collisions with surrounding molecules (I'm guessing the analogue in finance is buying and selling of a bond, security, whatever). The stochastic term A(t) has to be rapid in time (more precisely, have a ~delta function autocorrelation on the timescale of observation) and small in amplitude. A big bump might suddenly send the particle into a very non-Brownian jump that is felt some time down the line, which violates the memoryless condition. A long slow bump is like a directed push rather than a bump, and if it's long enough and slow enough it too violates the memoryless condition (x(tn) comes to depend on x(tn-2) or even further back). Langevin equations (there are actually many forms) turn out to describe a huge number of phenomena, and Brownian motion is used to model all sorts of stochastic motion, from radioactive decay to laser lineshapes to thermal diffusion to finance and econometrics. Entire books have been written about them.
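Here's a rough numerical sketch of that picture: an Euler-Maruyama integration of the velocity Langevin equation, with the mass set to 1 and made-up values for the drag Df and the noise strength sigma. With A(t) idealized as white (delta-correlated) noise -- the "small and rapid" assumption above -- the stationary velocity variance should come out near sigma^2/(2 Df):

```python
import math
import random

random.seed(3)

# Euler-Maruyama integration of m dv/dt = -Df v + A(t), with m = 1.
# Df and sigma are invented illustrative values; A(t) is idealized as
# delta-correlated Gaussian noise of strength sigma.
Df, sigma, dt = 1.0, 1.0, 0.01
v = 0.0
vs = []
for _ in range(200000):
    v += (-Df * v) * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
    vs.append(v)

# For this process the stationary velocity variance is sigma**2 / (2 * Df)
var = sum(u * u for u in vs) / len(vs)
```

The drag term forgets, the noise term kicks, and the balance between them sets the equilibrium fluctuations -- which is the physics the Markovian assumption buys you.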
RebelCapitalist provided a paper (in the thread that inspired this post) which explicitly uses a Langevin equation to model bond yields. It's a little difficult to read because the author does not define his notation (! -- he leaves it to a reference), but the solution is a product of exponential decays -- as it had to be, because of the Markovian assumption. It is also difficult to assess how good a model this is because the author does not validate it against data (!!). But we can invert that: if an observed autocorrelation does NOT exhibit exponential decay, the underlying process is not Markovian, and we'll need a different model. It also means that if the fluctuations in your system are not small and rapid, you cannot use a Markov model; if you do use one, you've made a statement about those fluctuations. A quick web trawl brings up any number of Markovian finance and econometric models.
Take home messages about Markov processes: memoryless, exponential autocorrelation, rapid and small-amplitude stochastic term, well understood.
Non-Markovian process
Now it gets messy, because a non-Markovian process is defined by what it is not: its conditional probability is not Markovian (and, by convention, not purely random or deterministic). It is like saying a system is nonlinear. The conditional probability chain may be truncated at any point, and the weighting of events at the various ti may be quite different, and may itself be conditional. There are an infinite number of possibilities here, and you have to come up with the one that accurately describes the dynamics of your system. In certain fields of study, where the variables can be assumed to be Gaussian (at least to a good approximation) and where the process is stationary over the timescale of interest, you can ignore the mechanism and work directly with the observed autocorrelation; if it's more convenient you might instead use its Fourier transform, the spectral density. The alternative is to come up with a mechanism (a conditional probability), derive an observable -- the autocorrelation or spectral density -- and show it fits the data to at least the noise level. In general non-Markovian problems are Very Hard, and attempts to address them frequently devolve into speculative flailing (if this is different in finance and econometrics I'd like to know). The difficulty is not in coming up with a specific conditional probability -- that's easy; coming up with one that is realistic is the problem.
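To illustrate how easily memory breaks the exponential signature, here is a made-up non-Markovian toy: an AR(2) chain, where x depends on two steps back, with coefficients chosen so the autocorrelation oscillates and even goes negative -- something no single exponential decay can do.

```python
import random

random.seed(4)

# A non-Markovian toy: x_n depends on x_{n-2} as well as x_{n-1}.
# These (invented) coefficients give complex characteristic roots, so
# the autocorrelation oscillates through zero -- impossible for the
# exponential-decay signature of a Markov process.
a1, a2 = 1.5, -0.9
x = [0.0, 0.0]
for _ in range(50000):
    x.append(a1 * x[-1] + a2 * x[-2] + random.gauss(0.0, 1.0))

def autocorr(xs, k):
    """Normalized sample autocorrelation at lag k (mean removed)."""
    n = len(xs)
    m = sum(xs) / n
    d = [v - m for v in xs]
    c0 = sum(v * v for v in d) / n
    return sum(d[t] * d[t + k] for t in range(n - k)) / (n * c0)

r1, r4 = autocorr(x, 1), autocorr(x, 4)  # positive, then negative
```

If the measured autocorrelation of your data looks like this rather than a clean decay, a Markov model is the wrong tool, no matter how convenient.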
It may not be obvious, but if the system is stationary for long enough, it will start to look Markovian as you expand the timescale of observation. That's because fluctuations that are non-Markovian on one timescale look more and more delta-function-like as the timescale grows. Many stochastic processes converge to exponential autocorrelations if you can wait long enough (which is, admittedly, a very big if, but handy when modeling). As for the market, it's not stationary (for one thing, it keeps growing), but over certain time domains it may look stationary, though that might take some creative detrending.
There's another wrinkle here. The market is self-aware -- I think it is fair to say that it is conditionally autoregressive, sometimes in strange ways. Sometimes everybody's buying because, well, everybody's buying; sometimes it gets spooked for no very good reason. Obviously neither behavior is Markovian, at least on those timescales. It can be selectively amnesiac (leverage post-LTCM) or have a very long memory (the Great Depression) depending on whether times feel good or not. As an aside, the Great Depression was obviously a hugely non-Markovian event. The conditional autoregressiveness can tap events from well back in history, like the Great Depression, which makes for interesting modeling challenges.
Take home messages about non-Markovian systems: messy, memory effects, realistic, not a specific case so no general solutions, best to stick to observables (autocorrelation or spectral density), timescales important.
A couple of closing thoughts. When looking at a Markovian stochastic model (easily recognizable by its exponential autocorrelation, or if it says Langevin equation or random walk), ask whether the constraints on the fluctuations make sense within the limits of the model. If the model is non-Markovian, how realistic is the mechanism (the conditional probability), and -- critically -- how well do the predictions match the data? Also, comments and criticisms from an econometric perspective are welcome -- what may be important where I come from may be irrelevant here, and what may be important in this field I may have no clue about.