Here is the correct statement about confidence intervals (taken from the paper):
"If we were to repeat the experiment over and over, then 95 % of the time the confidence intervals contain the true mean"
The true mean is the actual mean of the population that we try to measure. The sample mean is the average of the values that were measured.
I wrote a little Matlab to relate the sample mean (which we can measure) to the confidence interval.
My question was the following: if we perform an experiment once and compute the 95% confidence interval of the mean (e.g. [0.1, 0.4]), what is the probability that, if we repeat the experiment, the new sample mean will fall within the previously computed confidence interval? (See the Matlab code below.)
Yesterday, I would have said 95%. But I would have been wrong.
It turns out that there is only an 83% probability that the new sample mean (based on 100 samples) will fall within the computed 95% confidence interval, and this number does not depend on the number of samples taken (same result with n=1000).
For the (larger) 99% confidence interval, this probability rises to 93%.
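One way to see where these numbers come from is the rough derivation below. It is a sketch, not part of the original simulation, and it assumes normally distributed data and a sample SD close to the true SD (reasonable for n = 100). The symbols are my own notation: \bar{x}_1 is the first sample mean (the centre of the interval), \bar{x}_2 is the new sample mean, \sigma is the true SD, z is the z-value of the interval, and \Phi is the standard normal CDF. Because both sample means are noisy, their difference has twice the variance of a single sample mean:

```latex
\bar{x}_2 - \bar{x}_1 \sim \mathcal{N}\!\left(0,\ \frac{2\sigma^2}{n}\right)
\quad\Longrightarrow\quad
P\!\left(\left|\bar{x}_2 - \bar{x}_1\right| < z\,\frac{\sigma}{\sqrt{n}}\right)
= 2\,\Phi\!\left(\frac{z}{\sqrt{2}}\right) - 1
```

With z = 1.96 this gives about 0.834, and with z = 2.575 about 0.931. The factor \sigma/\sqrt{n} cancels, which would explain why the result does not change with n.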
So the confidence interval is a statement about the true mean. It is not a 95% prediction range for the sample mean of a repeated experiment.
Similar misunderstandings have been documented for p-values (Gigerenzer 2004).
Just found a similar result here:
Gigerenzer, G. Mindless statistics. J. Socio. Econ. 33, 587–606 (2004).
Matlab code
%%% Number of repetitions
Rep = 10000;
%%% Size of each sample
n = 100;
%%% The true mean is 0.25 and the true standard deviation is 0.75.
%%% DATA is an n-by-Rep matrix. Each column represents one sample of n values.
TrueM = 0.25;
TrueSD = 0.75;
DATA = TrueM + TrueSD*randn(n,Rep);
%%% Z-value for computing the confidence interval: 1.96 for the 95% CI and
%%% 2.575 for the 99% CI
Z = 1.96;
%%% Compute the sample mean, SD and CI for each sample (column)
M = mean(DATA);                 %% vector with Rep values
SD = std(DATA);                 %% vector with Rep values
CIlow = M - (Z*SD/sqrt(n));     %% vector with Rep values
CIhigh = M + (Z*SD/sqrt(n));    %% vector with Rep values
%%% Probability that the TRUE mean is contained in the
%%% computed 95% confidence interval (actual definition)
P = 1 - sum((CIlow>TrueM) + (CIhigh<TrueM))/Rep;
disp(['Prob that the true mean is contain in the CI: ' num2str(P)])
%%% Probability that the SAMPLE mean is contained in the
%%% computed 95% confidence interval (the common misinterpretation).
%%% Each CI is compared with all Rep sample means, including its own,
%%% which adds a negligible 1/Rep to the result.
N = zeros(1,Rep);
for i = 1:Rep
    N(i) = sum((CIlow(i)<M).*(CIhigh(i)>M));
end
P2 = sum(N)/Rep^2;
disp(['Prob that the sample mean is contain in the CI: ' num2str(P2)])
Prob that the true mean is contain in the CI: 0.9454
Prob that the sample mean is contain in the CI: 0.8293
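The simulated value can be cross-checked against a closed-form estimate. This is a minimal sketch and not part of the original script. It assumes normal data and treats the sample SD as equal to the true SD, in which case the difference of two independent sample means has variance 2*sigma^2/n and the probability reduces to erf(Z/2). The variable name Pnew is my own.

```matlab
% Analytic cross-check (assumption: normal data, sample SD ~ true SD).
% The difference of two independent sample means has variance 2*sigma^2/n,
% so P(new sample mean inside the CI) = 2*Phi(Z/sqrt(2)) - 1 = erf(Z/2).
Z = [1.96 2.575];        % z-values for the 95% and 99% CI
Pnew = erf(Z/2);         % predicted probability for each z-value
% fprintf consumes the matrix column by column: Z(1), Pnew(1), Z(2), Pnew(2)
fprintf('Z = %.3f -> predicted probability %.3f\n', [Z; Pnew]);
```

The two predicted values should land close to the simulated 83% and 93%. Note that n does not appear anywhere in the formula.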