The Hamilton model assumes that there exist n states of nature, with the returns in each state being drawn from a different distribution. The model views $x_t$ as being drawn from a mixture of densities. Thus $x_t$ might be thought of as realisations of a process with, say, two 'states', each of which occurs randomly. For example, values of $x_t$ in the first state are drawn from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$, while in the second they come from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$. The state which occurs is determined by a third process, for which the probability of state one occurring is $\lambda$. Hence the density of $x_t$ will be:

$$f(x_t) = \lambda\, n_1(x_t) + (1-\lambda)\, n_2(x_t)$$

where $n_1(x)$ and $n_2(x)$ are the density functions of the normal distributions² $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$, respectively. If $\lambda$ is then allowed to vary with the past history of states, a certain level of dependence in returns may be modelled. A method of achieving this is to make the probability of being in state one during period t differ, conditional upon whether the process is in state one or state two in period t-1.
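The mixture density can be illustrated with a minimal sketch. The function names and the parameter values below are hypothetical and chosen only for illustration; they are not estimates from any data set.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, lam, mu1, var1, mu2, var2):
    """Two-state mixture density: lam * n1(x) + (1 - lam) * n2(x)."""
    return lam * normal_pdf(x, mu1, var1) + (1.0 - lam) * normal_pdf(x, mu2, var2)

# Hypothetical values: state one is calm, state two is volatile.
LAM, MU1, VAR1, MU2, VAR2 = 0.7, 0.02, 0.01, -0.05, 0.09

if __name__ == "__main__":
    for x in (-0.3, 0.0, 0.3):
        print(x, mixture_pdf(x, LAM, MU1, VAR1, MU2, VAR2))
```

With a fixed $\lambda$ the draws are independent over time; dependence only appears once the state probability is allowed to depend on the previous state.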

Each state of nature is assumed to follow a Markov process, with $p_i$ denoting the probability of being in state i at time t conditional upon the process having been in state i at time t-1. The model's strength lies in its flexibility: it is capable of capturing changes in the variance between state processes, as well as changes in the mean. It has been applied with some success to other markets; for example, Engel [1994] uses a two-state model to study the behaviour of exchange rates.

Specifying the model in this manner differs from the well-known ARIMA and ARCH (or GARCH) models; see Bollerslev, Chou & Kroner [1992] for a survey of the latter. In ARIMA models, the variance of the process is assumed to be constant, but the expected value of the series follows a memory process. As a result, ARIMA-based models are highly restrictive, and inappropriate if the disturbances are heteroscedastic. In ARCH-type models, the mean of the process is assumed to be constant, whereas under Markov regime-switching models this restriction is relaxed.

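Hamilton's basic model³ has an unobserved state variable $z_t$ that can assume either the value of zero or unity. This variable evolves according to the first-order Markov process:

$$\begin{aligned} P[z_t = 1 \mid z_{t-1} = 1] &= p, & P[z_t = 0 \mid z_{t-1} = 1] &= 1-p, \\ P[z_t = 0 \mid z_{t-1} = 0] &= q, & P[z_t = 1 \mid z_{t-1} = 0] &= 1-q. \end{aligned} \tag{6.1}$$

It can be shown by substitution that this scheme implies that $z_t$ evolves as an AR(1) process:

$$z_t = (1-q) + (p+q-1)\, z_{t-1} + \eta_t \tag{6.2}$$

where, conditional upon $z_{t-1} = 1$,

$$\eta_t = \begin{cases} 1-p & \text{with probability } p, \\ -p & \text{with probability } 1-p, \end{cases}$$

and, conditional upon $z_{t-1} = 0$,

$$\eta_t = \begin{cases} -(1-q) & \text{with probability } q, \\ q & \text{with probability } 1-q. \end{cases}$$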

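The intention is that observed returns $x_t$ evolve as:

$$x_t = \mu_0 + \mu_1 z_t + (\sigma^2 + \theta z_t)^{1/2}\, \varepsilon_t \tag{6.3}$$

where the $\varepsilon_t$ are n.i.d. (0, 1). Equation (6.3) shows that the expected values of $x_t$ in the two states are $(\mu_0, \mu_0 + \mu_1)$ respectively, while the variances are $(\sigma^2, \sigma^2 + \theta)$.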
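A short simulation can make the two-state structure concrete. The sketch below assumes that the state stays at one with probability p and stays at zero with probability q, and that returns are generated as $x_t = \mu_0 + \mu_1 z_t + (\sigma^2 + \theta z_t)^{1/2}\varepsilon_t$ with standard normal $\varepsilon_t$. The function names, the seed and any parameter values passed in are hypothetical.

```python
import random

def simulate_returns(n, p, q, mu0, mu1, sigma2, theta, seed=0):
    """Simulate n periods of the two-state chain z_t and the returns x_t."""
    rng = random.Random(seed)
    # Draw the starting state from the ergodic probability of state one.
    z = 1 if rng.random() < (1.0 - q) / (2.0 - p - q) else 0
    states, returns = [], []
    for _ in range(n):
        stay = p if z == 1 else q          # probability of remaining in the current state
        if rng.random() >= stay:
            z = 1 - z                      # switch state
        eps = rng.gauss(0.0, 1.0)          # standard normal shock
        x = mu0 + mu1 * z + (sigma2 + theta * z) ** 0.5 * eps
        states.append(z)
        returns.append(x)
    return states, returns

def state_moments(states, returns, state):
    """Sample mean and variance of the returns observed in the given state."""
    obs = [x for z, x in zip(states, returns) if z == state]
    mean = sum(obs) / len(obs)
    var = sum((x - mean) ** 2 for x in obs) / len(obs)
    return mean, var

if __name__ == "__main__":
    zs, xs = simulate_returns(20000, 0.9, 0.8, 0.0, 2.0, 1.0, 3.0)
    print(state_moments(zs, xs, 0), state_moments(zs, xs, 1))
```

In a long simulated sample the state-by-state moments should be close to $(\mu_0, \sigma^2)$ and $(\mu_0 + \mu_1, \sigma^2 + \theta)$. In practice the states are unobserved, so this split is only possible in a simulation.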

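Thus, substituting equation (6.2) into equation (6.3), the following can be derived:

$$x_t = \mu_0(2-p-q) + \mu_1(1-q) + (p+q-1)\, x_{t-1} + v_t$$

where $v_t = \mu_1 \eta_t + (\sigma^2 + \theta z_t)^{1/2}\varepsilon_t - (p+q-1)(\sigma^2 + \theta z_{t-1})^{1/2}\varepsilon_{t-1}$ will be an MA(1). It therefore follows that $x_t$ is an ARMA(1,1) process which is covariance stationary.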
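The ARMA(1,1) property can be checked through the autocovariances. The sketch below assumes a stationary two-state chain with staying probabilities p (state one) and q (state zero), and shocks that are independent of the states, so that for lags k ≥ 1 the autocovariance of $x_t$ is $\mu_1^2$ times that of $z_t$. The function names are hypothetical.

```python
def state_autocov(p, q, k):
    """Cov(z_t, z_{t-k}) for the stationary chain, from k-step transition probabilities."""
    pi1 = (1.0 - q) / (2.0 - p - q)        # ergodic probability of state one
    # to_one[i] = P[z_t = 1 | z_{t-k} = i]; at k = 0 this is simply i.
    to_one = [0.0, 1.0]
    for _ in range(k):
        to_one = [q * to_one[0] + (1.0 - q) * to_one[1],
                  (1.0 - p) * to_one[0] + p * to_one[1]]
    return pi1 * to_one[1] - pi1 ** 2

def return_autocov(p, q, mu1, k):
    """Cov(x_t, x_{t-k}) for k >= 1 when the shocks are independent of the states."""
    if k < 1:
        raise ValueError("lag k must be at least 1")
    return mu1 ** 2 * state_autocov(p, q, k)

if __name__ == "__main__":
    p, q, mu1 = 0.9, 0.8, 2.0
    for k in (1, 2, 3):
        print(k, return_autocov(p, q, mu1, k + 1) / return_autocov(p, q, mu1, k))
```

Successive autocovariances from lag one onwards decline by the constant factor p + q - 1, which is the autoregressive coefficient of the ARMA(1,1) representation.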

Returning to equation (6.3), express it as

$$x_t = \mu_0 + \mu_1 z_t + e_t, \qquad e_t = (\sigma^2 + \theta z_t)^{1/2}\, \varepsilon_t. \tag{6.4}$$

Due to the independence of $\eta_t$ and $\varepsilon_t$, the variance of $e_t$ conditional upon $z_{t-1}$ is a linear function of $z_{t-1}$.⁴ Combining (6.2) and (6.4) produces a two-equation procedure very similar to that used in generating the Kalman filter. There is an observation equation (6.4), a state dynamics equation (6.2), and the errors are jointly martingale differences, i.e.

$$E(e_t \mid z_{t-1}) = 0, \qquad E(\eta_t \mid z_{t-1}) = 0.$$

One difference, however, is that the error terms in both equations have time-varying conditional variances that depend on unobserved quantities, i.e. $E(e_t^2 \mid z_{t-1})$ and $E(\eta_t^2 \mid z_{t-1})$ both depend upon $z_{t-1}$; the Kalman filter allows for the conditional variances of the errors to vary in a known way with the past history of $x_t$, but does not allow them to depend on the past unobserved states.

In the Kalman filter case the likelihood of the data $x_1, x_2, \ldots, x_T$ is built up recursively by assuming that the errors in both equations are jointly normal. The aim is to characterise the density of $x_t$ given the past history $X_{t-1}$, i.e. $f(x_t \mid X_{t-1})$, where $X_{t-1} = \{x_1, x_2, \ldots, x_{t-1}\}$.

Suppose that $f(z_{t-1}, z_{t-2} \mid X_{t-1})$ is given. When t = 1, this would equal $f(z_0, z_{-1} \mid X_0)$, which can either be set to the unconditional density $f(z_0, z_{-1})$ or estimated. Hamilton [1989] uses the former, whereas in Hamilton [1990] he proceeds under the latter assumption.
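The unconditional starting density can be sketched as follows, assuming a two-state chain with staying probabilities p (state one) and q (state zero), whose ergodic probability of state one is (1 - q)/(2 - p - q). The function name and the dictionary layout are hypothetical.

```python
def unconditional_joint(p, q):
    """Return f(z_0, z_{-1}) as a dict keyed by (z_0, z_minus1)."""
    pi1 = (1.0 - q) / (2.0 - p - q)                 # ergodic P[z = 1]
    marginal = {0: 1.0 - pi1, 1: pi1}               # f(z_{-1})
    # Transition probabilities keyed by (z_t, z_{t-1}).
    trans = {(0, 0): q, (1, 0): 1.0 - q, (0, 1): 1.0 - p, (1, 1): p}
    return {(z0, zm1): trans[(z0, zm1)] * marginal[zm1]
            for z0 in (0, 1) for zm1 in (0, 1)}

if __name__ == "__main__":
    print(unconditional_joint(0.9, 0.8))
```

Summing the result over $z_{-1}$ returns the same ergodic probabilities for $z_0$, which is what makes this a natural starting point when the initial density is not estimated.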

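From the properties of conditional densities⁵

$$f(z_t, z_{t-1} \mid X_{t-1}) = f(z_t \mid z_{t-1}) \sum_{z_{t-2}=0}^{1} f(z_{t-1}, z_{t-2} \mid X_{t-1}) \tag{6.5}$$

so that, when t = 1, $f(z_1, z_0 \mid X_0) = f(z_1 \mid z_0) \sum_{z_{-1}} f(z_0, z_{-1} \mid X_0)$. Given $f(z_{t-1}, z_{t-2} \mid X_{t-1})$, and recognising that $f(z_t \mid z_{t-1})$ will be equation (6.1), the joint density of $(z_t, z_{t-1}, x_t)$ conditional upon $X_{t-1}$ is⁶

$$f(x_t, z_t, z_{t-1} \mid X_{t-1}) = f(x_t \mid z_t, z_{t-1}, X_{t-1})\, f(z_t, z_{t-1} \mid X_{t-1}). \tag{6.6}$$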

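Once the joint density $f(x_t \mid z_t, z_{t-1}, X_{t-1})\, f(z_t, z_{t-1} \mid X_{t-1})$ is determined, the density of $x_t$ conditional upon $X_{t-1}$ can be found by integrating out the states $z_t$ and $z_{t-1}$. In this case the integration simply involves summation, due to the discrete nature of the states, i.e.

$$f(x_t \mid X_{t-1}) = \sum_{z_t=0}^{1} \sum_{z_{t-1}=0}^{1} f(x_t \mid z_t, z_{t-1}, X_{t-1})\, f(z_t, z_{t-1} \mid X_{t-1}). \tag{6.7}$$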

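Of the two densities from equation (6.6), $f(x_t \mid z_t, z_{t-1}, X_{t-1})$ and $f(z_t, z_{t-1} \mid X_{t-1})$, the first is found directly from the fact that, conditional upon $z_t$,

$$\frac{x_t - \mu_0 - \mu_1 z_t}{(\sigma^2 + \theta z_t)^{1/2}}$$

is N(0,1); see equation (6.3). The second comes from equation (6.5). The difficulty is then to determine $f(z_{t-1}, z_{t-2} \mid X_{t-1})$ generally, that is, to obtain $f(z_t, z_{t-1} \mid X_t)$ for use in the following period. This is achieved by using the formula for a conditional density

$$f(z_t, z_{t-1} \mid X_t) = \frac{f(x_t, z_t, z_{t-1} \mid X_{t-1})}{f(x_t \mid X_{t-1})} \tag{6.8}$$

as all the densities on the right-hand side of equation (6.8) have been previously determined.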
Estimation of the unknown parameters can be undertaken by maximum likelihood, using $f(x_t \mid X_{t-1})$. Iteration of equations (6.5) to (6.8) for t = 1, ..., T produces $f(x_t \mid X_{t-1})$ for t = 1, ..., T.

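To determine the log likelihood, the joint density of returns is then written as the product of a conditional density, $f(x_t \mid F_{t-1})$, and a marginal density, $f(F_{t-1})$, where $F_{t-1}$ denotes the information available at time t-1 (here $X_{t-1}$):

$$f(x_t, F_{t-1}) = f(x_t \mid F_{t-1})\, f(F_{t-1}).$$

Building this up for all T observations gives the joint density

$$f(x_1, \ldots, x_T) = \prod_{t=1}^{T} f(x_t \mid X_{t-1})$$

making the log likelihood of $x_1, \ldots, x_T$ equal to

$$\log L = \sum_{t=1}^{T} \log f(x_t \mid X_{t-1})$$

and this may be maximised with respect to the unknown parameters q, p, $\mu_0$, $\mu_1$, $\sigma^2$ and $\theta$, using the EM algorithm described by Hamilton [1990].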
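The recursion and the resulting log likelihood can be sketched in a few lines. This is a minimal illustration and not Hamilton's EM procedure: it only evaluates the log likelihood at given parameter values. It assumes staying probabilities p (state one) and q (state zero), returns that are normal with mean $\mu_0 + \mu_1 z_t$ and variance $\sigma^2 + \theta z_t$ given the state, and the unconditional state probabilities as the starting density. Because the chain is first-order, the sketch carries only $f(z_{t-1} \mid X_{t-1})$ forward, with earlier states already summed out. All function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def hamilton_loglik(xs, p, q, mu0, mu1, sigma2, theta):
    """Return (log likelihood, filtered P[z_t = 1 | X_t]) for the two-state model."""
    # Transition probabilities keyed by (z_t, z_{t-1}).
    trans = {(0, 0): q, (1, 0): 1.0 - q, (0, 1): 1.0 - p, (1, 1): p}
    pi1 = (1.0 - q) / (2.0 - p - q)
    prev = {0: 1.0 - pi1, 1: pi1}          # f(z_{t-1} | X_{t-1}), unconditional start
    loglik, filtered = 0.0, []
    for x in xs:
        joint = {}
        for zt in (0, 1):
            dens = normal_pdf(x, mu0 + mu1 * zt, sigma2 + theta * zt)
            for zl in (0, 1):
                # f(x_t | z_t) * f(z_t | z_{t-1}) * f(z_{t-1} | X_{t-1})
                joint[(zt, zl)] = dens * trans[(zt, zl)] * prev[zl]
        fx = sum(joint.values())           # f(x_t | X_{t-1}): states summed out
        loglik += math.log(fx)
        # Condition on x_t, then sum out z_{t-1} to get f(z_t | X_t).
        prev = {zt: (joint[(zt, 0)] + joint[(zt, 1)]) / fx for zt in (0, 1)}
        filtered.append(prev[1])
    return loglik, filtered

if __name__ == "__main__":
    print(hamilton_loglik([0.1, -0.3, 4.8, 5.2], 0.9, 0.8, 0.0, 5.0, 1.0, 0.5))
```

A numerical optimiser or the EM algorithm would call such a function repeatedly with different parameter values. For long samples or extreme observations the predictive density can underflow, which this sketch does not guard against.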


1 In the UK the Investment Property Databank has provided a comprehensive performance measurement service for institutional investors since 1986.


3 For a comprehensive discussion of Markov regime-switching models, and an excellent review of those econometric issues pertinent to financial markets, see Pagan [1993].

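4 It can be shown that $E[(\sigma^2 + \theta z_t)\, \varepsilon_t^2 \mid z_{t-1}] = \sigma^2 + \theta[(1-q) + (p+q-1)\, z_{t-1}]$.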

5 $f(z_t \mid z_{t-1}, z_{t-2}, X_{t-1}) = f(z_t \mid z_{t-1})$ due to independence and the first-order Markov process assumed for the states. The notation f(.) is used to indicate both the density of the continuous variable $x_t$ and the probability function of the discrete random variable $z_t$.

6 Note that because the process is first-order Markov, information on states can be summarised by $z_t$ and $z_{t-1}$ alone.