Let's answer the following tweet:
So, I developed a Matlab program to test how often spurious correlations arise in two independent replications versus in one single experiment with twice as many participants.
Each experiment measured 20 variables, and I computed the correlation for all possible pairs (190 pairwise correlations). There were 20 participants in each of the two experiments for the replication simulations and 40 participants in the single-experiment case.
Without corrections for multiple comparisons
With replication, about 1 in 4 simulated experiments (25%) yields a significant correlation between the same two variables in the two independent experiments. Note that this takes into account the fact that, to count as replicated, the significant correlations must have the same sign.
When you choose a bigger group without replication, you get on average 9.5 significant correlations per experiment (190 × 0.05 = 9.5).
With corrections for multiple comparisons (Bonferroni)
With replication, 0.01% of the experiments allow you to replicate a significant correlation between two variables in the two independent experiments. Note that this takes into account the fact that, to count as replicated, the significant correlations must have the same sign.
When you choose a bigger group without replication, 5% of the experiments yield at least one significant correlation, which is the expected rate given our family-wise significance threshold of 0.05.
Response: go for replications!!!
Matlab code used for the simulations
Nsim = 10000; %%% number of simulations
Nsub = 20;    %%% number of participants in each group
Nvar = 20;    %%% number of variables that you measure

Sig2 = zeros(1,Nsim); %%% preallocate
for k = 1:Nsim
    % random variables - two studies A and B
    A = randn(Nsub,Nvar);
    B = randn(Nsub,Nvar);
    % correlation between each pair of variables for each of the two studies separately
    [Ra,pa] = corrcoef(A);
    [Rb,pb] = corrcoef(B);
    % extracting the relevant p-values and correlations (upper triangular matrix without diagonal)
    Ta = triu(pa,1);  Tb = triu(pb,1);
    Rpa = triu(Ra,1); Rpb = triu(Rb,1);
    % matrix to vectors for non-zero entries
    Ca = Ta(Ta~=0);   Cb = Tb(Tb~=0);
    Rca = Rpa(Ta~=0); Rcb = Rpb(Tb~=0);
    % detecting significant correlations
    PVa = Ca<0.05; PVb = Cb<0.05;
    % checking that the correlations have the same sign
    Sab = sign(Rca).*sign(Rcb)>0;
    % counting pairs that are significant, with the same sign, in both replications
    Sig2(k) = sum(PVa.*PVb.*Sab);
end
% average number of correlated pairs detected across the two replications
% Bear in mind, there should be none...
disp(['average number of significant correlations present in both replications: ' num2str(mean(Sig2))])
average number of significant correlations present in both replications: 0.2388
Sig1 = zeros(1,Nsim); %%% preallocate
for k = 1:Nsim
    % random variables - one single bigger study
    C = randn(2*Nsub,Nvar);
    % correlation between each pair of variables
    [Rc,pc] = corrcoef(C);
    % extracting the relevant p-values (upper triangular matrix without diagonal)
    Tc = triu(pc,1);
    % matrix to vector for non-zero entries
    Cc = Tc(Tc~=0);
    % detecting significant correlations
    PVc = Cc<0.05;
    % counting significant correlations in the single bigger study
    Sig1(k) = sum(PVc);
end
% average number of significant correlations detected
% Bear in mind, there should be none...
disp(['average number of significant correlations present in one bigger study: ' num2str(mean(Sig1))])
average number of significant correlations present in one bigger study: 9.4543
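The Bonferroni-corrected results quoted above were presumably obtained by dividing the threshold by the number of comparisons (0.05/190). A minimal sketch of the corrected replication test, written here in Python for illustration (the variable names and the vectorized p-value computation are my own, not taken from the original Matlab code):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nsub, nvar, nsim = 20, 20, 2000
npairs = nvar * (nvar - 1) // 2       # 190 variable pairs
alpha = 0.05 / npairs                 # Bonferroni-corrected threshold
iu = np.triu_indices(nvar, k=1)       # upper triangle, without the diagonal

def corr_p(x):
    """Pairwise correlations and two-sided p-values for one study."""
    r = np.corrcoef(x, rowvar=False)[iu]
    t = r * np.sqrt((nsub - 2) / (1 - r**2))
    p = 2 * stats.t.sf(np.abs(t), df=nsub - 2)
    return r, p

hits = 0
for _ in range(nsim):
    ra, pa = corr_p(rng.standard_normal((nsub, nvar)))
    rb, pb = corr_p(rng.standard_normal((nsub, nvar)))
    # replicated = significant in both studies AND same sign
    hits += np.any((pa < alpha) & (pb < alpha) & (ra * rb > 0))

frac = hits / nsim  # fraction of simulations with a "replicated" correlation
print(frac)         # expected to be essentially zero
```

Per the post, the expected fraction is on the order of 0.01%, so most runs will report exactly 0.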
This post reflects the preparation of a discussion on social media in science for the Neural Control of Movement Society meeting in Jamaica, 2016. It was prepared by @FinleyLabUSC, @TobiasHeed, @andpru, and myself.
The brain is specialized in merging information from different sources in order to improve sensory perception and motor control. Here, we will discuss how adding social networks to more conventional inputs can improve your career and your science. Take a drink from the nearest bar and join us for a lively and interactive discussion on the benefits (and pitfalls) of adding social networks to your scientific life.
Things you can learn from Twitter as a scientist
1. To follow science journalists and bloggers in order to learn about studies beyond your own specific problem and to learn about science journalism itself (Neuroskeptic).
2. To learn about the impending revolution in scientific publishing (preprints, Sci-Hub, open access) and thereby get informed about the publishing industry (Bjoern Brembs)
3. To improve critical scientific skills such as statistics (Lakens, Deevybee)
4. To network with your peers (important presence for the motor control community @blamlab @kording )
5. To get rapid feedback about your studies (if you post your study online)
6. To outsource literature searching. For instance, when you look for studies on a given topic (what influences movement reaction time?) or when someone tweets about a study that they find interesting
7. To learn about the problems of science (retraction watch, gender gap, p-hacking, pre-registration)
8. First steps towards science communication to the public
9. To advertise your own study to your colleagues (shameless plug)
10. To share interesting papers (not only your own)
11. To hear about lectureship/professorship positions in some universities
12. To discover interesting posters that you must go and see while at a conference (#SfN2015)
13. To get comfort when you see that all scientists face the same problems (same goal as PhD comics)
14. To replace eTOC (electronic table of contents) emails
15. To learn about the latest scientific tools (Pubpeer, Pubchase,...)
16. To learn about grantsmanship (@drugmonkey)
17. To participate in the scientific debate.
18. Potential forum for recruiting participants, students, postdocs
Imagine that you have two populations that you want to compare with a simple statistical test (e.g. a t-test). To interpret the results of such a test, scientists often compare the resulting p-value to a given threshold (usually 0.05) in order to judge the significance of the difference between the two populations.
There are many false beliefs about this number. This post aims at clarifying some aspects of p-values.
To do so, I simulated two populations.
The first one (P1) was normally distributed with mean =0 and standard deviation (SD) =1.
The second one (P2) was also normally distributed with SD=1, but its mean was varied from zero (P1 and P2 are essentially drawn from the same distribution) to one (P1 and P2 are drawn from two very different distributions) in steps of 0.1. The distance between the means of P1 and P2 corresponds to the effect size (since SD=1).
For each different level of effect size between zero and one, I simulated 10000 pairs of P1 and P2s and computed the p-value for each of these 10000 comparisons in order to obtain the distribution of p-values for different levels of effect size.
The top left graph shows the distribution of p-values when there is actually no difference between the two populations. Only 5% of the simulations had a p-value smaller than 0.05, as expected by definition of the threshold. It also shows that "trending towards significance" is not very meaningful in this case, as the probability that p falls between 0.05 and 0.10 is similar to the probability that it falls in any other interval of the same width.
As the actual difference between the two populations becomes larger, the distribution of p-values becomes more and more skewed and the percentage of p-values smaller than 0.05 increases.
For such a sample size (10 samples per population), the proportion of p-values smaller than 0.05 for an effect size of 1 (which would be considered large or very large) is only around 65%.
An effect size around 0.3 is pretty typical in science...
The following graph highlights the importance of the number of samples per population. Here, the number of samples for P1 and P2 was increased from 10 to 25. Now, for the largest effect size tested, the probability that p<0.05 is almost 95%.
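The proportions quoted above are empirical estimates of statistical power. A quick Monte Carlo check of the effect of sample size, written here in Python for illustration (the function name and simulation counts are my own choices, not from the original Matlab scripts):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def power_sim(n, effect, nsim=5000, alpha=0.05):
    """Fraction of simulated two-sample t-tests with p < alpha."""
    hits = 0
    for _ in range(nsim):
        p1 = rng.standard_normal(n)            # population 1: mean 0, SD 1
        p2 = effect + rng.standard_normal(n)   # population 2: mean = effect size
        hits += stats.ttest_ind(p1, p2).pvalue < alpha
    return hits / nsim

pow10 = power_sim(10, 1.0)   # 10 samples per population
pow25 = power_sim(25, 1.0)   # 25 samples per population
print(pow10, pow25)          # power grows markedly with sample size
```

With 25 samples per group and an effect size of 1, the estimated power comes out above 0.9, consistent with the "almost 95%" figure in the text.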
Take home message
%% distribution of p-values for different effect sizes
ES = 0:0.1:1;                  % effect sizes tested
Npop = 25;                     % number of samples per population
Nsamp = 10000;                 % number of simulated experiments per effect size
ESmat = repmat(ES,Npop,1);     % shifts each column of P2 by its effect size
Ppop = NaN(Nsamp,length(ES));  % p-values: one row per simulation
for k = 1:Nsamp
    P1 = randn(Npop,length(ES));
    P2 = ESmat + randn(Npop,length(ES));
    for ij = 1:length(ES)
        [~,Ppop(k,ij)] = ttest2(P1(:,ij),P2(:,ij));
    end
end
for ij = 1:length(ES)
    subplot(3,4,ij)
    hist(Ppop(:,ij),0.025:0.05:0.975)
    xlabel('bins of p-values')
    ylabel('# of observations')
    title(['effect size: ' num2str(ES(ij))])
end
For teaching purposes, I developed two Matlab scripts that illustrate two typical motor adaptation paradigms on a computer.
The first task is a gain scaling task where the mapping between the motion of the mouse and the cursor is altered by a given gain.
The second task is a visuomotor rotation task where the motion of the cursor is rotated by a given angle with respect to the motion of the mouse.
The instructions for each of the tasks appear in a pop-up window before each test.
The participants need to click on the starting red target before motion.
The evolution of the adaptation over the course of training is displayed once all the trials are completed.
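Both perturbations boil down to simple transformations of the mouse displacement before it is applied to the cursor. A minimal sketch of the two mappings, written here in Python for illustration (the function names are mine; the actual teaching scripts are in Matlab):

```python
import math

def gain_scaling(dx, dy, gain):
    """Cursor displacement = mouse displacement scaled by a gain."""
    return (gain * dx, gain * dy)

def visuomotor_rotation(dx, dy, angle_deg):
    """Cursor displacement = mouse displacement rotated by a fixed angle."""
    a = math.radians(angle_deg)
    return (dx * math.cos(a) - dy * math.sin(a),
            dx * math.sin(a) + dy * math.cos(a))

print(gain_scaling(1.0, 0.0, 1.5))          # (1.5, 0.0)
print(visuomotor_rotation(1.0, 0.0, 90.0))  # rightward mouse motion becomes upward cursor motion
```

Adaptation experiments then measure how movements gradually compensate for one of these mappings over trials.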
Parts of these functions are based on the freehanddraw function available here: http://www.mathworks.com/matlabcentral/fileexchange/7347-freehanddraw
Please send me feedback about these routines.
I invite all of you to read this wonderful short paper by Ioannidis, the invited 2013 commencement speech for the Department of Statistics at the University of California, Berkeley.
It is an excellent piece about all the mistakes we make all the time as scientists. I also enjoyed his sense of humor.
Here are the three key messages about errors:
EDIT, 13th of May: This series of tweets is a good illustration of the residual uncertainty linked to scientific studies and of how we should treat it.
Ioannidis JPA (2014) Errors (my very own) and the fearful uncertainty of numbers. Eur J Clin Invest.
There is a very interesting paper about the notion of confidence intervals. I realized that I did not have a clear idea of how they are defined and what they represent. So I read the paper and learned a lot.
Here is the true statement about confidence interval (taken from the paper):
"If we were to repeat the experiment over and over, then 95 % of the time the confidence intervals contain the true mean"
The true mean is the actual mean of the population that we try to measure. The sample mean is the average of the values that were measured.
I did a little bit of Matlab to try to relate sample mean (that we can measure) and confidence interval.
My question was the following: suppose we perform an experiment once and compute the 95% confidence interval of the mean (e.g. [0.1, 0.4]). What is the probability that, if we repeat the experiment, the new sample mean will fall within the previously computed confidence interval (see Matlab code below)?
Yesterday, I would have said 95%. But I would have been wrong.
It turns out that there is ONLY an 83% probability that the new sample mean (based on 100 samples) will fall within the computed 95% confidence interval, and this number does not depend on the number of samples taken (same result with n=1000).
For the (larger) 99% confidence interval, this probability rises to 93%.
So the confidence interval does not provide a lot of information about where future sample means will fall.
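The 83% and 93% figures have a simple analytic origin under a normal approximation: the difference between two independent sample means has a standard error √2 times that of a single mean, so the capture probability is P(|Z| < z/√2), independent of n. A quick check in Python (my own derivation, not taken from the paper):

```python
from math import erf

def capture_prob(z):
    """P that a new sample mean falls in a z-SE confidence interval.

    The difference of two independent sample means has SD sqrt(2)*SE,
    so the capture probability is P(|N(0,1)| < z/sqrt(2)) = erf(z/2).
    """
    return erf(z / 2)

p95 = capture_prob(1.96)   # 95% CI
p99 = capture_prob(2.576)  # 99% CI
print(round(p95, 3), round(p99, 3))
```

This gives about 0.834 and 0.931, matching the simulated 83% and 93% (the small gap with the Matlab simulation below comes from estimating the SD from the sample).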
Similar misunderstandings have been documented for p-values (Gigerenzer 2004).
Just found a similar result here:
Thanks to Dorothy Bishop for bringing me to this line of reasoning: http://deevybee.blogspot.co.uk/2011/10/joys-of-inventing-data.html
Hoekstra, R., Morey, R. D., Rouder, J. N. & Wagenmakers, E.-J. Robust misinterpretation of confidence intervals. Psychon. Bull. Rev. (2014). doi:10.3758/s13423-013-0572-3
Gigerenzer, G. Mindless statistics. J. Socio. Econ. 33, 587–606 (2004).
%%% Number of repetitions
Rep = 10000;
%%% Size of the sample population
n = 100;
%%% the true mean is 0.25 and the true standard deviation is 0.75
%%% DATA is an n by Rep matrix. Each column represents a population of samples
TrueM = 0.25; TrueSD = 0.75;
DATA = TrueM + TrueSD*randn(n,Rep);
%%% Z-value for computing the confidence interval: 1.96 for the 95% CI and
%%% 2.575 for the 99% CI
Z = 1.96;
%%% computing sample mean, SD and CI for each population
M = mean(DATA);              %% vector with Rep values
SD = std(DATA);              %% vector with Rep values
CIlow = M - (Z*SD/sqrt(n));  %% vector with Rep values
CIhigh = M + (Z*SD/sqrt(n)); %% vector with Rep values
%%% probability that the TRUE mean is contained in the
%%% computed 95% confidence interval (actual definition)
P = 1 - sum((CIlow>TrueM) + (CIhigh<TrueM))/Rep;
disp(['Prob that the true mean is contained in the CI: ' num2str(P)])
%%% probability that a SAMPLE mean is contained in the
%%% computed 95% confidence interval (the common misinterpretation)
N = zeros(1,Rep);
for i = 1:Rep
    N(i) = sum((CIlow(i)<M).*(CIhigh(i)>M));
end
P2 = sum(N)/Rep^2;
disp(['Prob that the sample mean is contained in the CI: ' num2str(P2)])
Prob that the true mean is contained in the CI: 0.9454
Prob that the sample mean is contained in the CI: 0.8293
It is widely believed that cerebellar plasticity is driven by climbing fiber inputs. For instance, David Marr (1969) suggested that the climbing fiber input served as a teacher for post-synaptic Purkinje cells, which has found some empirical support (Najafi & Medina, 2013).
However, the timing of complex spikes during saccade adaptation (Catz, Dicke, & Thier, 2008) suggests that climbing fiber input might not be the sole teacher in the cerebellum. In this study, Nguyen-Vu and colleagues tested the hypothesis that Purkinje cells themselves can drive adaptation of the vestibulo-ocular reflex (VOR), a form of motor learning.
They demonstrate that there exists more than one teacher for cerebellar learning and that changes in Purkinje cell activity can themselves drive motor learning.
Prior information about a stimulus can bias our perception of it. This influence of previous information on sensory processing is the basis of many illusions (Geisler and Kersten), such as the size-weight illusion, which stems from the expectation that larger objects are heavier.
This is true for the perception of motion direction as well. If one expects some specific stimulus motion direction, the perception of any other stimulus motion direction will be biased by the expectation. Using the ability to decode motion direction information from fMRI signals, Kok and colleagues (2013) demonstrate that this bias is already present in early visual areas.
The brain makes use of noisy sensory inputs to produce eye, head, or arm motion. In most instances, the brain combines this sensory information with predictions about future events. Here, we propose that Kalman filtering can account for the dynamics of both visually guided and predictive motor behaviors within one simple unifying mechanism. Our model relies on two Kalman filters: (1) one processing visual information about retinal input; and (2) one maintaining a dynamic internal memory of target motion. The outputs of both Kalman filters are then combined in a statistically optimal manner, i.e., weighted with respect to their reliability. The model was tested on data from several smooth pursuit experiments and reproduced all major characteristics of visually guided and predictive smooth pursuit. This contrasts with the common belief that anticipatory pursuit, pursuit maintenance during target blanking, and zero-lag pursuit of sinusoidally moving targets all result from different control systems. This is the first instance of a model integrating all aspects of pursuit dynamics within one coherent and simple model and without switching between different parallel mechanisms. Our model suggests that the brain circuitry generating a pursuit command might be simpler than previously believed and only implement the functional equivalents of two Kalman filters whose outputs are optimally combined. It provides a general framework of how the brain can combine continuous sensory information with a dynamic internal memory and transform it into motor commands.
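The "statistically optimal" combination described in the abstract is standard inverse-variance (reliability) weighting of the two filters' outputs. A minimal sketch of that fusion step, in Python for illustration (this is my own toy helper, not the paper's implementation):

```python
def fuse(x1, var1, x2, var2):
    """Combine two estimates, weighting each by its reliability (1/variance)."""
    w1 = (1 / var1) / (1 / var1 + 1 / var2)
    x = w1 * x1 + (1 - w1) * x2
    var = 1 / (1 / var1 + 1 / var2)   # fused estimate is more reliable than either
    return x, var

# Example: a noisy retinal estimate and a more reliable internal prediction
x, var = fuse(10.0, 4.0, 8.0, 1.0)
print(x, var)   # the more reliable prediction dominates: 8.4 0.8
```

With equal reliabilities this reduces to a simple average; as one source becomes noisier, its weight shrinks toward zero.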
written by Jean-Jacques Orban de Xivry
Scientist in the motor control field.