Bayes, Markov, and Conditional Probability in Finance Models

Disclaimer: Not certain this is appropriate to EP. This is in the way of an introduction to some aspects of modeling and might be a bit arcane. It was inspired by a comment by RebelCapitalist and deals with econometric modeling. Enjoy.

Let's say we're interested in estimating the likelihood of some event x happening. x might be a loan default, upcoming regulations, getting hit by lightning, whatever. If we happen to know something about x, we can assign some probability P(x), play the numbers, and improve our chances of a good outcome. That's a big "if", and unless P = 1 we can still lose; still, that's the best we can do.

But what if x depends on some other event? Your chances of getting hit by lightning are different on a dry clear day in a cave vs standing on an aluminum stepladder holding up a golf club outside in a thunderstorm. That is, what is P(x|y), the probability P of x given some circumstance y? What we know about y informs what we know about x. This is the field of conditional probability, first formalized by Thomas Bayes, an 18th century English clergyman who gave his name to a whole field of study in Information Science.

In this post I'm interested in a particular subset of that same question. What is the likelihood of x at some point t occurring given x at some previous point? If we extend the history of x all the way to the origin we need to assess the following conditional probability

P(x(tn) | x(tn-1), x(tn-2), ..., x(t1))

That is, what is the probability of finding x at t given x at t-1, which depends on x at t-2, which depends on ... x at the origin, t1. It becomes helpful to classify four cases of this conditional probability.

Purely random process
For a purely random process the previous history is completely irrelevant. We can then write

P(x(tn) | x(tn-1), x(tn-2), ..., x(t1)) = P(x(tn))

An example of this is the coin toss. You have an equal likelihood of getting heads or tails on the next toss, and it doesn't matter whether you just got 15 heads in the previous 15 tosses: the odds of getting heads on the next toss are still 0.5 (although after some point you might wonder if the coin or the toss is rigged). Now the chances of getting 15 straight heads are extremely small (0.5^15, or 1 in 32,768), but your chances of coming up heads on the next toss are 0.5.

If there are two possible independent outcomes (like a coin toss or a true/false) you get a binomial distribution, which in the limit of large n converges to a Gaussian. 15 heads in a row is in the far wing of that distribution. But your chances of getting another head then is still 50%.
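The coin-toss claim is easy to check numerically. The sketch below is illustrative only: the function names and the seed are made up for the example, and it conditions on a streak of 5 heads rather than 15, simply because 15-head streaks are too rare to collect in a short simulation.

```python
import random


def prob_all_heads(n: int, p_heads: float = 0.5) -> float:
    """Probability of n heads in a row when tosses are independent."""
    return p_heads ** n


def heads_after_streak(streak: int, n_tosses: int, seed: int = 0) -> float:
    """Estimate P(heads | the previous `streak` tosses were all heads)."""
    rng = random.Random(seed)
    run = 0            # length of the run of heads ending just before this toss
    hits = total = 0
    for _ in range(n_tosses):
        heads = rng.random() < 0.5
        if run >= streak:      # only count tosses that follow a long enough streak
            total += 1
            hits += heads
        run = run + 1 if heads else 0
    return hits / total


print(prob_all_heads(15))                              # 0.5**15, i.e. 1 in 32768
print(heads_after_streak(streak=5, n_tosses=200_000))  # should land close to 0.5
```

The streak itself is rare, but the toss that follows it behaves like any other toss, which is the purely random case: the history drops out of the conditional probability.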

Deterministic process
For a deterministic process, the likelihood of finding x at tn is exactly determined by x at tn-1. Following that back we find that the initial condition x(t1) determines the trajectory of x. In other words

P(x(tn) | x(tn-1), x(tn-2), ..., x(t1)) = P(x(tn) | x(t1))

with that probability equal to 1 for the single value of x(tn) that lies on the trajectory and 0 for every other value.

If we also know the initial conditions of everything else in the space that x inhabits, we know in principle the future history of everything in the space forever and ever amen. It might get complicated, in fact it might get chaotic and if we go numerical the finiteness of our computational word size might limit our ability to predict very far into the future of the system, so not all is roses here. And as far as we know nothing is truly purely deterministic (although the various Copenhagen nonbelievers -- most famously Einstein and Schrodinger -- might argue otherwise.) But to a really good approximation many things are: you can hit a battleship from a pitching, moving platform 15 miles away; if you drop a pencil it will fall; in 23000 years the North Pole will point to the star Thuban in the constellation Draco; etc.
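The point about chaos and finite word size can be illustrated with a standard toy system. The logistic map below is not mentioned in the post; it is swapped in here as an assumed example of a rule that is completely deterministic yet hard to predict far ahead. The starting value and the 1e-12 offset are arbitrary choices.

```python
def logistic_trajectory(x0: float, steps: int, r: float = 4.0) -> list[float]:
    """Iterate x -> r * x * (1 - x); each value is fixed exactly by the one before."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2 + 1e-12, 60)   # same rule, tiny error in the initial condition

for n in (0, 10, 20, 30, 40, 50, 60):
    print(n, abs(a[n] - b[n]))             # the gap grows until it is as large as x itself
```

Running the same starting value twice gives identical trajectories, so nothing here is random. The practical limit is that an error in the twelfth decimal place of x(t1) takes over the forecast after a few dozen steps.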

Markov process
Now it gets interesting. The first two cases are the extrema of stochastic processes in general (processes characterized by one or more random variables). In the second case the "randomness" is zero; that is, the stochastic variables have constant temporal autocorrelations.

The temporal autocorrelation of a stochastic process describes how it “decorrelates” with itself as time progresses, both in timescale and form. It turns out to be a powerful analytical tool: from it you can infer dynamics of the underlying stochastic process, specifically timescales and relative timescales. It also characterizes the nature of the process, and there are four cases here: a delta function, for purely random processes; a constant, for deterministic processes; an exponential decay, for Markovian processes; and everything else, for non-Markovian processes. At least in science and engineering, this analysis is usually done in the frequency domain (spectral analysis).

A Markov process is one step back from purely random:

P(x(tn) | x(tn-1), x(tn-2), ..., x(t1)) = P(x(tn) | x(tn-1))

This is also known as a Markov chain, after Andrey Markov, a Russian mathematician of the late 19th and early 20th centuries. Where x goes next depends only on the present. Note that that does not mean determined by the present -- if that were the case we would be able to follow its trajectory back to the beginning -- it means that the likelihood of finding x at t depends only on where x is at t-1. Because of this Markov processes are said to be memoryless. A true random walk is an example of a Markov process. Brownian motion is a Markov process.

If the random variables characterizing x are both Markovian and Gaussian (that is, they are normally distributed) then the temporal autocorrelation of x is given by an exponential decay. This is the signature of a Markov process (if the work is in the frequency domain that will show up as a Lorentzian spectral density). Nature is full of exponential decays -- Markov processes are quite common.
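A minimal sketch of that signature, assuming the simplest discrete-time Gaussian Markov process, a first-order autoregression x[k] = phi * x[k-1] + noise. The names, the value phi = 0.8 and the sample size are illustrative choices, not anything taken from the paper discussed here.

```python
import random


def simulate_ar1(phi: float, n: int, seed: int = 0) -> list[float]:
    """x[k] = phi * x[k-1] + Gaussian noise: the next value depends only on the present."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelation at lags 0..max_lag, normalized so that lag 0 equals 1."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    var = sum(v * v for v in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / var
            for k in range(max_lag + 1)]


phi = 0.8
acf = autocorrelation(simulate_ar1(phi, 100_000), max_lag=8)
for k, r in enumerate(acf):
    # phi**k is exp(-k / tau) with tau = -1 / ln(phi): an exponential decay in the lag
    print(k, round(r, 3), round(phi ** k, 3))
```

The two printed columns should track each other to within sampling noise, and the single decay time tau is the timescale one reads off the autocorrelation.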

Note the constraints placed on the random variable(s) of x. Let's use Brownian motion as an example. Its equation of motion (describes behavior in time) is the Langevin equation:

F = ma = m d^2x/dt^2 = m dv/dt = -Df v + A(t)

Physically, as a particle moves through the fluid it feels a drag force -Df v, where Df is the drag coefficient and v the velocity (it takes work to push through the molecules making up the liquid, hence the negative sign), and a stochastic "bumping" force A(t) from collisions with surrounding molecules (I'm guessing the analogue in finance is buying and selling of a bond, security, whatever). The stochastic term A(t) has to be rapid in time (more precisely, have ~delta function autocorrelation on the timescale of observation) and be small in amplitude. A big bump might suddenly send the particle into a very non-Brownian jump and be felt some time down the line, which violates the memoryless condition. A long slow bump is like a directed push rather than a bump, and if it's long enough and slow enough it too violates the memoryless condition (x(tn) depends on x(tn-2) or even further back). Langevin equations (there are actually many forms) turn out to describe a huge number of phenomena, and Brownian motion is used to model all sorts of stochastic motion, from radioactive decay to laser lineshapes to thermal diffusion to finance and econometrics. Entire books have been written about them.
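To make the equation of motion concrete, here is a rough numerical sketch that steps the velocity form m dv/dt = -Df v + A(t) forward with the Euler-Maruyama scheme. It assumes A(t) is Gaussian white noise; the parameter values, the step size and the function names are all illustrative assumptions.

```python
import math
import random


def simulate_langevin(drag: float, mass: float, noise: float, dt: float,
                      n_steps: int, seed: int = 0) -> list[float]:
    """Euler-Maruyama steps for m dv/dt = -drag * v + A(t), with A(t) white noise."""
    rng = random.Random(seed)
    v = 0.0
    vs = []
    for _ in range(n_steps):
        # a small, rapid bump, uncorrelated with every other bump
        kick = noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        v += (-drag * v * dt + kick) / mass
        vs.append(v)
    return vs


def autocorr_at(x: list[float], lag: int) -> float:
    """Sample autocorrelation of x at a single lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return sum(d[i] * d[i + lag] for i in range(n - lag)) / sum(v * v for v in d)


drag, mass, dt = 1.0, 1.0, 0.01
vs = simulate_langevin(drag, mass, noise=1.0, dt=dt, n_steps=400_000)
for t in (0.0, 0.5, 1.0, 2.0):
    lag = round(t / dt)
    # simulated velocity autocorrelation next to exp(-Df * t / m)
    print(t, round(autocorr_at(vs, lag), 3), round(math.exp(-drag * t / mass), 3))
```

With these assumptions the velocity autocorrelation should decay roughly as exp(-Df t / m), so the drag coefficient sets the memory time. Replacing the kicks with rare large ones or slowly varying ones is the kind of change that breaks that picture.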

RebelCapitalist provided a paper (in the thread that inspired this post) which explicitly uses a Langevin equation to model bond yields. It's a little difficult to read because the author does not define his notation (! - leaves it in a reference) but the solution is a product of exponential decays – as it had to be because of the Markovian assumption. It is also difficult to assess how good a model this is because the author does not validate it against data (!!). But we can invert that: if an observed autocorrelation does NOT exhibit exponential decay, the underlying process is not Markovian, and we'll need a different model. It also means that if the fluctuations in your system are not small and rapid, you cannot use a Markov model; if you do use one, you've made a statement about those fluctuations. A quick web trawl brings up any number of Markovian finance and econometric models.
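The inversion can be sketched as a crude check on data. For a purely exponential autocorrelation r(k), r(2) must equal r(1)^2, so a large gap between the two is evidence against a one-step Markov model. This is only a necessary condition, it assumes a stationary series, and the two test series below are synthetic stand-ins with made-up coefficients, not bond yields.

```python
import random


def simulate_ar(coeffs: list[float], n: int, seed: int = 0) -> list[float]:
    """x[k] = sum_j coeffs[j] * x[k-1-j] + Gaussian noise."""
    rng = random.Random(seed)
    p = len(coeffs)
    x = [0.0] * p
    for _ in range(n):
        past = sum(c * x[-1 - j] for j, c in enumerate(coeffs))
        x.append(past + rng.gauss(0.0, 1.0))
    return x[p:]


def autocorr_at(x: list[float], lag: int) -> float:
    """Sample autocorrelation of x at a single lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return sum(d[i] * d[i + lag] for i in range(n - lag)) / sum(v * v for v in d)


def exponential_gap(x: list[float]) -> float:
    """r(2) - r(1)**2, which is zero when the autocorrelation is a pure exponential."""
    return autocorr_at(x, 2) - autocorr_at(x, 1) ** 2


one_step = simulate_ar([2 / 3], 100_000)        # depends on x[k-1] only
two_step = simulate_ar([1.0, -0.5], 100_000)    # also depends on x[k-2]

print(round(exponential_gap(one_step), 3))      # should be close to 0
print(round(exponential_gap(two_step), 3))      # clearly negative (about -0.28 in theory)
```

Both series have the same theoretical lag-1 correlation of 2/3, so looking at a single lag would not tell them apart. It is the shape of the decay that gives the second one away.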

Take home messages about Markov processes: memoryless, exponential autocorrelation, rapid and small-amplitude stochastic term, well understood.

Non-Markov processes
Now it gets messy, because a non-Markovian process is characterized by a conditional probability that is not Markovian (and, by convention, not purely random or deterministic). It is like saying a system is nonlinear. The conditional probability chain may be truncated at any point, and the weighting of events at various ti may be quite different and may be conditionally different. There are an infinite number of possibilities here, and you have to come up with the one that accurately describes the dynamics of your system. In certain fields of study, where the variables can be assumed to be Gaussian (at least to a good approximation) and where the process is stationary over the timescale of interest, you can ignore the mechanism and use the observed autocorrelation. If it's more convenient you might rather use its Fourier transform, the spectral density. The alternative is to come up with a mechanism (conditional probability), find an observable -- the autocorrelation or spectral density -- and show it fits the data to at least the noise level. In general non-Markovian problems are Very Hard and attempts to address them frequently devolve into speculative flailing (if this is different in finance and econometrics I'd like to know). The difficulty is not in coming up with a specific conditional probability -- that's easy; coming up with one that is realistic is the problem.

It may not be obvious, but if the system is stationary for long enough, the system will become Markovian if you expand the timescale of observation. That's because fluctuations that are non-Markovian when looking on one timescale start looking more and more delta-function like as you expand your timescale. Stochastic processes converge to exponentials, if you can wait long enough (which is, admittedly, a very big if, but handy when modeling). As for the market, that's not stationary (for one thing it keeps growing), but over certain time domains it may look stationary, though that might take some creative detrending.
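A small sketch of the timescale argument, under assumed toy conditions: the fine-scale series is a 5-point moving average of white noise, so it remembers the last few steps and its autocorrelation is a triangle rather than an exponential. Averaging it over blocks much longer than that memory gives a coarse series whose neighbouring values are nearly uncorrelated. The window, block size and names are illustrative.

```python
import random


def moving_average_noise(n: int, window: int, seed: int = 0) -> list[float]:
    """White noise smoothed over `window` steps: short memory, but not one-step memory."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    return [sum(eps[i:i + window]) / window for i in range(n)]


def lag1(x: list[float]) -> float:
    """Sample autocorrelation between neighbouring points."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return sum(d[i] * d[i + 1] for i in range(n - 1)) / sum(v * v for v in d)


def block_means(x: list[float], block: int) -> list[float]:
    """Observe on a longer timescale: one averaged value per block of points."""
    return [sum(x[i:i + block]) / block
            for i in range(0, len(x) - block + 1, block)]


fine = moving_average_noise(400_000, window=5)
coarse = block_means(fine, block=50)

print(round(lag1(fine), 3))     # strong correlation between neighbours (0.8 in theory)
print(round(lag1(coarse), 3))   # nearly zero: the memory now looks like a delta function
```

The catch is the one already noted: the series has to stay stationary over the longer window, and each coarse point costs 50 fine ones.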

There's another wrinkle here. The market is self-aware – I think it is fair to say that it is conditionally autoregressive, sometimes in strange ways. Sometimes everybody's buying because, well, everybody's buying; sometimes it gets spooked for no very good reason. Obviously neither behavior is Markovian, at least on those timescales. It can be selectively amnesiac (leverage post-LTCM) or have a very long memory (Great Depression) depending on whether times feel good or not. As an aside, the Great Depression was obviously a hugely non-Markovian event. The conditional autoregressiveness can tap events from well back in history like the GD, which makes for interesting model challenges.

Take home messages about non-Markovian systems: messy, memory effects, realistic, not a specific case so no general solutions, best to stick to observables (autocorrelation or spectral density), timescales important.

A couple of closing thoughts. When looking at a Markovian stochastic model (easily recognizable by its autocorrelation, or if it says Langevin equation or random walk), ask if the constraints on fluctuations make sense within the limits of the model. If the model is non-Markovian, how realistic is the mechanism (conditional probability), and – critically – how well do the predictions match the data? Also, comments and criticisms from an econometric perspective welcome – what may be important where I come from may be irrelevant here, and what may be important in this field I may have no clue about.



I'm so bummed out

you didn't use the mimetex that is available on the site! ;(

Well, sure it's ok for EP, we can cover a variety of topics here and if they cannot understand your critique, oh well, they don't have to read it. Would be interesting if we got some structured finance people here.

What was that phrase? In 100 years who cares because now we are all dead? (in trying to obtain a stationary process via time window segmentation).

I don't know if this is econometric, but more structured finance where they are trying to use modeling where, as you point out, their assumptions are quite often not valid.

I don't know if anyone could follow you though. I'd say please define a delta function to explain the time dependencies and system memory issue...

but honestly if they got that far I think they know what one is.

I think we need to cross post this in "geeks wonder what are these structured finance dudes thinking" somewhere.

I can live with a "geeks wonder

what these dudes are thinking" section.

Sorry about the MimeTeX -- seemed like overkill for simple subscripts. But yeah, maybe this was one of those "if you know the material this is too simple; if you don't this won't be useful" things. I was trying to keep things in English instead of math, and to provide some intuitive insight rather than derivations.

Re: delta function. Actually I had a section on that, digressed into a long discussion about functions vs distributions and cut it out. I seem to have cut out the definition as well -- oops. Fortunately any web trawl will bring up definition and use.

your education is showing

Go down memory lane and think back to your sophomore undergraduate class introducing a delta function.....

how many got an "F" on that test? Don't even ask them to do a Laplace Transform on it....just the concept of delta t blew their minds.

I know it is beyond belief difficult to translate concepts into English which are built upon mathematical concepts...

which is precisely how these models were even used in finance...
most people just cannot grasp them, hence they said "fine, sure thing, all good" and used them. Supposedly they created black box modeling where some analyst just plugs in the numbers and gets out an answer...very few even know what is inside the box....and how those models are mathematically invalid, based on the characteristics of the data and system itself.

Lake Wobegon children

Surely all EP readers are above average!

Reading through the Li paper now. Interesting. It makes me even more skeptical of money managers -- didn't think that was possible.


One never knows who is reading EP. I just bet we have a lot of lurkers.

I agree, I never looked into any of these financial modeling papers or anything and the one straight out of the box...I was/am horrified!

I've said many times on here they should be regulating the mathematics and software, the technology itself, because it's clearly an increasing influence in finance yet it looks like the wild west...whatever someone can get away with...flies as "innovation"....reminds me of that "innovation" during the dot con era of Enron going to "manage" and "auction" Internet bandwidth.

So, if bond yields misbehave and have periods of high volatility

A Markov model will have problems modeling huge swings in bond yields. - Financial Information for the Rest of Us.

If bond yields

are the stochastic variable that's correct.

Here's the thing though. A model prediction with the wrong mechanism can match the data over a sufficiently limited (aka cherry-picked) domain. Attempting to extend the prediction beyond that domain is foolhardy.

There are two other wrinkles. First is that high (or more specifically variable) volatility means that the variance in the distribution is not constant -- it becomes "heteroskedastic" -- and if the volatility feeds on itself it becomes conditionally autoregressively heteroskedastic. Whether the distributions remain Gaussian under those circumstances is not clear to me.

Second thing is really a suspicion: that a model need not be accurate, but if the market is looking for an excuse to make money and the model provides it, it will be used until the failures become too great to overlook -- which may take a while since markets overshoot. I suspect -- but need to show -- that's what happened with Gaussian Copula.
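On the first wrinkle, a rough numerical sketch may help. It assumes the textbook GARCH(1,1) recursion with Gaussian shocks and arbitrary illustrative parameters (omega = 0.1, alpha = 0.2, beta = 0.7), none of which come from this thread. Each return is Gaussian given that day's variance, and the code compares the kurtosis of the pooled returns with that of a plain Gaussian sample, for which it is 3.

```python
import math
import random


def simulate_garch(omega: float, alpha: float, beta: float,
                   n: int, seed: int = 0) -> list[float]:
    """GARCH(1,1): var[k+1] = omega + alpha * r[k]**2 + beta * var[k]."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)               # start at the long-run variance
    out = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)     # Gaussian given today's variance
        out.append(r)
        var = omega + alpha * r * r + beta * var     # volatility feeds on itself
    return out


def kurtosis(x: list[float]) -> float:
    """Fourth moment over squared variance; 3 for a Gaussian."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)


rng = random.Random(1)
plain = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
garch = simulate_garch(omega=0.1, alpha=0.2, beta=0.7, n=200_000)

print(round(kurtosis(plain), 2))   # close to 3
print(round(kurtosis(garch), 2))   # noticeably above 3: fatter tails
```

Under these assumptions the pooled distribution is no longer Gaussian even though every individual shock is: the changing variance alone fattens the tails.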

more Gaussian Distribution is just plain damn wrong post

BiModality of Markets: Why Mean-Variance Doesn’t Work.

[Figure: Ritholtz, log distribution of projected and actual DOW returns. Source: The Big Picture]

I'm not sure what he means by people not understanding the difference between logs and percentages, except perhaps that people do not realize a logarithm acts like an average, or a low pass filter: it smooths the spikes that one would normally see in a % graph. So, he's implying those 2x inverse return ETFs have drift...
and on this score, I've been wondering about DUG, SKF (Proshares ultrashorts).


I've seen maybe 6 episodes of the show over maybe that many years and I swear in every single one what's her name tells what's his name "That's scientifically impossible!". And I always wanted to say "The basis of science is observation, and you've been observing the scientifically impossible every week for 5 YEARS. Are you thick or what?!?"

So even now we see non-Gaussian data being modeled as Gaussians. Never mind that this is real data, never mind that this has been going on decades, never mind that the assumption of independent events doesn't apply. Are these guys thick or what?!?

God bless all of us "Thick people"

Otherwise known as the Densa society....we discuss the magic secret recipe of special sauce on Big macs and which color really does go better with red, yellow or white!

To be honest, I must have appeared thick when I was in school for I can remember being presented with these models thinking, "that signal is simply not a bell curve"....bam, on test, oops, "not A". ;)

And they run Wall St

Yeah that'll work.

this is completely missing the point of derivatives

Read this New York Times piece trying to claim the derivatives models are flawed because they do not include "human behavior". That is so dead ass wrong and once again, these people do not understand mathematical models so they think the answer is outside of them...

when we have written over and over again, by the math, by the models themselves, they are wrong!

God. You cannot create some gambling chip based on an incorrect mathematical model and this has everything to do with incorrect mathematical assumptions, such as davet's post here on Gaussian distributions (mine is how CDSes violate the mathematical properties to even be used in a Copula).

This stuff drives me nuts. If they would realize it's the actual mathematics by itself that is incorrect we might get somewhere.

I'm actually a bit sympathetic

I suspect (but haven't shown) that a major problem is the assumption that the stochastic fluctuations are independent events. "Non-Gaussian" is rather like "non-Markovian" or "nonlinear" -- a statement of what it isn't rather than what it is, so there are no general solutions. It sounds like "behavioral economics" is an attempt at an end run around the problem, and whether it works any less well isn't clear to me.

The question above about volatility got me poking into GARCH models, specifically what is assumed about the distributions of the stochastic variables. Unfortunately I also have papers to write (completely unrelated) so it's been slow going...

the mathematics is nasty stuff

I do not believe PhD-level economists get into this level of advanced mathematics to parse some of these models and find the mathematical flaws.

The ind. event issue is somewhat violated but more the requirements of the correlation coefficient, i.e. invertible, 1:1 to the actual data they are using is invalid (as well as your points).

So, I guess EP has found another activity here since some of us do have the mathematical background to critique these models.

Very few are. I think zero hedge has Ritholtz and of course Taleb and Roubini.

I just get irritated when I at least can plainly see "bad math" yet those who cannot really dig through these claim it's all "behavior" as if mathematical modeling, just as a tool, a subject is somehow "all bad".

When I'm feeling cynical

I think the decision tree goes like this:

Makes money now: keeper
Doesn't make money now: toss it

Issues like accuracy, realism, etc are secondary.

I've done my share of modeling and therefore my share of huge assumptions. But people in my field 1) always validate against data, and 2) don't stake everybody else's money on the outcome.

Most damning internal emails of financial crisis

I think no matter how cynical you can "think" you won't hit the reality of what has been happening.

check this out

Remember Enron talking about taking money from grandma while manipulating energy blackouts in California?

where did you find this GARCH modeling from?

I think I saw something about this in seasonal adjustments...that mysterious black box for unemployment and other data coming from the BLS. I just scanned the wiki and wow, you're right, that's going to be slow going.

To phrase it another way,

recessions (or chained recessions which equals a depression) correlate all assets.