How do the posterior mean and variance change if more data is received?

The short answer is that, in expectation, the posterior variance decreases as you receive more information, but for particular models and particular observations it can increase. The posterior mean, for conjugate models, is a weighted average of the prior mean and the data, so it moves toward the sample estimate as more data arrive. For some models, such as the normal with known observation variance, the posterior variance can only decrease; for others, such as the beta-binomial, a sufficiently surprising observation can make it grow.
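
Below is a minimal numerical sketch of this behaviour, assuming a Beta prior on a coin's head probability and a single Bernoulli observation; the prior parameters are illustrative choices, not values from the article.

```python
def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def update(a, b, heads, tails):
    """Conjugate Beta-Bernoulli update: add the observed counts to the prior parameters."""
    return a + heads, b + tails

# Symmetric prior: one observation tightens the posterior.
a, b = 1.0, 1.0
print(beta_var(a, b), beta_var(*update(a, b, heads=1, tails=0)))  # 0.083 -> 0.056

# Skewed prior: a "surprising" head makes the posterior wider than the prior.
a, b = 0.5, 10.0
print(beta_var(a, b), beta_var(*update(a, b, heads=1, tails=0)))  # 0.0039 -> 0.0091
```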

What do we mean when we say that a prior is a conjugate prior for a likelihood?

In Bayesian probability theory, if the posterior distribution p(θ | x) is in the same probability distribution family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function p(x | θ).
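
To make the definition concrete, here is a small numerical check of conjugacy, assuming a Beta(2, 3) prior and a binomial likelihood (7 heads in 10 flips); these numbers are illustrative, not taken from the article.

```python
import numpy as np
from scipy.stats import beta, binom

a, b = 2.0, 3.0   # prior Beta(a, b)
n, k = 10, 7      # data: k heads in n flips

theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]

# Unnormalized posterior on a grid: likelihood x prior.
unnorm = binom.pmf(k, n, theta) * beta.pdf(theta, a, b)
grid_posterior = unnorm / (unnorm.sum() * dtheta)

# Conjugacy: the posterior is again a Beta, with the data counts added on.
analytic = beta.pdf(theta, a + k, b + n - k)

print(np.max(np.abs(grid_posterior - analytic)))  # tiny: the grid posterior matches Beta(a+k, b+n-k)
```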

How do we replace the likelihood in MLE with the posterior?

In order to get MAP, we replace the likelihood in the MLE with the posterior: θ_MAP = argmax_θ p(θ | x) = argmax_θ p(x | θ) p(θ), whereas θ_MLE = argmax_θ p(x | θ). Comparing the equation of MAP with MLE, we can see that the only difference is that MAP includes the prior in the formula, which means that the likelihood is weighted by the prior in MAP.
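
As a quick sketch of that comparison, the snippet below performs both maximizations by grid search for a coin-flip example; the Beta(5, 5) prior and the 7-heads/3-tails data are assumed for illustration.

```python
import numpy as np
from scipy.stats import beta

heads, tails = 7, 3
theta = np.linspace(0.001, 0.999, 999)

log_likelihood = heads * np.log(theta) + tails * np.log(1 - theta)
log_prior = beta.logpdf(theta, 5, 5)

theta_mle = theta[np.argmax(log_likelihood)]               # argmax p(x | theta)
theta_map = theta[np.argmax(log_likelihood + log_prior)]   # argmax p(x | theta) p(theta)

print(theta_mle, theta_map)  # MLE = 0.7; MAP is pulled toward 0.5 by the prior
```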

When do people use MLE, such as the sample mean and variance?

MLE is so common and popular that people sometimes use it without even knowing much about it. For example, when fitting a Normal distribution to a dataset, people immediately calculate the sample mean and variance and take them as the parameters of the distribution; those are exactly the maximum likelihood estimates.
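
As a minimal sketch, assuming a synthetic dataset, the closed-form MLEs of a Normal are the sample mean and the sample variance computed with a 1/n divisor:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1_000)  # synthetic data; the true parameters are arbitrary

mu_hat = x.mean()            # MLE of the mean
sigma2_hat = x.var(ddof=0)   # MLE of the variance (divides by n, not n - 1)

print(mu_hat, sigma2_hat)
```

Note that the familiar unbiased sample variance divides by n - 1 (ddof=1); the MLE divides by n.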

When does the posterior reach its maximum?

In this case, even though the likelihood reaches its maximum when p(head) = 0.7, the posterior reaches its maximum when p(head) = 0.5, because the likelihood is now weighted by the prior. By using MAP, p(head) = 0.5. However, if the prior probability in column 2 is changed, we may get a different answer.
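
The article's table is not reproduced here, but a sketch with a few hypothetical candidate values of p(head) and made-up prior weights shows the same effect:

```python
import numpy as np

p_candidates = np.array([0.3, 0.5, 0.7, 0.9])   # candidate values of p(head)
prior = np.array([0.10, 0.60, 0.20, 0.10])      # hypothetical prior weights ("column 2")
heads, tails = 7, 3                              # observed flips

likelihood = p_candidates ** heads * (1 - p_candidates) ** tails
posterior = likelihood * prior                   # unnormalized posterior

print(p_candidates[np.argmax(likelihood)])  # 0.7: the likelihood peaks here
print(p_candidates[np.argmax(posterior)])   # 0.5: the strong prior moves the peak
```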

What’s the difference between MAP and maximum likelihood estimation (MLE)?

Comparing the equation of MAP with MLE, we can see that the only difference is that MAP includes the prior in the formula, which means that the likelihood is weighted by the prior in MAP. In the special case when the prior follows a uniform distribution, we assign equal weights to all possible values of θ, and the MAP estimate coincides with the MLE.
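
That special case can be checked directly: with a flat Beta(1, 1) prior the log-prior is constant, so adding it does not move the argmax. A minimal sketch, reusing the illustrative coin-flip data from above:

```python
import numpy as np
from scipy.stats import beta

heads, tails = 7, 3
theta = np.linspace(0.001, 0.999, 999)

log_likelihood = heads * np.log(theta) + tails * np.log(1 - theta)
flat_log_prior = beta.logpdf(theta, 1, 1)   # a constant (zero) on (0, 1)

theta_mle = theta[np.argmax(log_likelihood)]
theta_map = theta[np.argmax(log_likelihood + flat_log_prior)]

print(theta_mle, theta_map)  # both 0.7: a flat prior gives every theta the same weight
```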