Le Neurone Moteur

If X and Y are different across groups, is it easier to get a significant correlation between X and Y with data pooled across groups?

7/20/2018

In our recent preprint (Vandevoorde and Orban de Xivry, bioRxiv, 2018), we report correlations between the explicit component of motor adaptation (variable Y) and a measure of working memory capacity (variable X), pooled across two age groups (Fig. 5g and 5h). Of course, there were between-group differences in X (effect size: d=0.93) and in Y (effect size: d=0.2). Reviewers suggested that these correlations were driven by the between-group differences.

Below, I outline why the reviewers were mostly wrong (there is an effect, but it is tiny) and how we can control for the effect of age group on the regression. These answers were obtained via simulations of the problem in Matlab.

Do between-group differences in X and Y affect the correlation?

In the simulations, we assume that there is no relationship between X and Y within either of the two groups, but that there is a mean difference in X and in Y between groups. These mean differences had the same effect sizes as in our paper (d=0.2 for Y and d=0.93 for X).
Because there is no correlation between X and Y, the false positive rate should be 5% if our significance threshold is 5%. Simulations showed that the percentage of false positives was 6.8%, which means that the mean differences in X and Y between groups lead to more significant correlations than they should.
Solution 1: Set the significance threshold to 0.0368 (0.05*(0.05/0.068)); the false positive rate is then again 5%.

However, the correction of the significance threshold is related to the estimated effect size of the mean differences for X and for Y. This is thus specific to each correlation. Therefore, we looked for a more general solution:
Solution 2: multiple regression.

Can multiple regression partial out the effect of age group?

The answer is Yes.

We used the following model: Y = A + B*X + C*G + D*X*G, where G is the categorical variable coding the group and X*G is the interaction term between X and G.

When using this model, we found that, in the absence of a relationship between Y and X, the coefficients B and D had a false positive rate of 5%, which matches our significance threshold.

The coefficient C was significant more often than 5% of the time, as expected, because of the actual difference in Y between groups (effect size: d=0.2). Therefore, using this regression model allows us to partial out the effect of age group on the relationship between X and Y.

Matlab code

correlationtwoagegroups.m
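The full script is in the downloadable file above; below is a minimal sketch of both simulations (the sample size per group and the use of fitlm are my assumptions, and the original script may differ):

% Pooled correlation of X and Y across two groups that differ in mean X
% and mean Y, with no X-Y relationship within either group.
Nsim = 10000;           % number of simulated experiments
Nsub = 30;              % participants per group (assumption)
dX = 0.93; dY = 0.2;    % between-group effect sizes, as in the paper
FP = 0;                 % counter of significant pooled correlations
for k = 1:Nsim
    X1 = randn(Nsub,1);       Y1 = randn(Nsub,1);       % group 1
    X2 = dX + randn(Nsub,1);  Y2 = dY + randn(Nsub,1);  % group 2: shifted means
    [~,p] = corrcoef([X1;X2],[Y1;Y2]);                  % pooled correlation
    FP = FP + (p(1,2) < 0.05);
end
disp(['false positive rate of the pooled correlation: ' num2str(FP/Nsim)])

% Regression that partials out group: Y = A + B*X + C*G + D*X*G
% (fitlm requires the Statistics Toolbox)
G = [zeros(Nsub,1); ones(Nsub,1)];   % group indicator
tbl = table([X1;X2], G, [Y1;Y2], 'VariableNames', {'X','G','Y'});
mdl = fitlm(tbl, 'Y ~ X + G + X:G');
disp(mdl.Coefficients)               % p-values for B (X), C (G) and D (X:G)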


Recruiting PhD students and post-docs

1/1/2018

If you are interested in doing a PhD or a post-doc in my lab (motor control, motor learning, aging and fMRI), do not hesitate to contact me to enquire about funding opportunities.

See the Positions page for more information.

How twitter made me a better scientist

9/26/2017

To be read on the KU Leuven blog

https://kuleuvenblogt.be/2017/04/28/how-twitter-made-me-a-better-scientist/

Replication or larger groups?

12/19/2016

Let's answer the following tweet:

can someone point me to a link showing that 2 n=20 studies showing the same thing (replication) is better than a single n=40 study?

— Gavin Buckingham (@DrGBuckingham) December 19, 2016
So I developed a Matlab program to test how often random correlations would arise in two independent replications versus in one single experiment with twice as many participants.
Each experiment tests 20 variables, and I computed the correlation for all possible pairs (190 independent correlations). There were 20 participants in each of the two experiments for the replication simulations and 40 participants in the single-experiment case.

Without corrections for multiple comparisons

With replication, 1 in 4 experiments (25%) allows you to replicate a significant correlation between two variables in the two independent experiments. Note that this takes into account the fact that, to count as replicated, the significant correlations must have the same sign.

When you choose a bigger group without replication, you get on average 9.5 significant correlations per experiment.

With corrections for multiple comparisons (Bonferroni)

With replication, 0.01% of the experiments allow you to replicate a significant correlation between two variables in the two independent experiments. Again, to count as replicated, the significant correlations must have the same sign.

When you choose a bigger group without replication, 5% of the experiments yield at least one significant correlation, which is the expected value given our significance threshold of 0.05.

Response: go for replication if false positive errors really matter to you.

Matlab code used for the simulations: internalreplication.m

Contents

  • two replications, only Nsub per group
  • one experiment with 2*Nsub per group

Nsim = 10000; %%% number of simulations
Nsub = 20;    %%% number of participants in each group
Nvar = 20;    %%% number of variables that you measure

two replications, only Nsub per group

for k=1:Nsim,
    % random variables - two studies A and B
    A= randn(Nsub,Nvar);
    B= randn(Nsub,Nvar);
    % correlation between each variable for each of the two studies
    % separately
    [Ra,pa]=corrcoef(A);
    [Rb,pb]=corrcoef(B);
    % extracting the relevant p-values and correlations (upper triangular matrix without diagonal)
    Ta = triu(pa,1);
    Tb = triu(pb,1);
    Rpa = triu(Ra,1);
    Rpb = triu(Rb,1);
    % matrix to vectors for non-zero entries
    Ca = Ta(Ta~=0);
    Cb = Tb(Tb~=0);
    Rca = Rpa(Ta~=0);
    Rcb = Rpb(Tb~=0);
    % detecting significant correlations
    PVa = Ca<0.05;
    PVb = Cb<0.05;
    % checking that the correlations have the same sign
    Sab = sign(Rca).*sign(Rcb)>0;
    % detecting when the two same variables correlate in the two
    % replications
    Sig2(k) = sum(PVa.*PVb.*Sab);
end
% average number of correlated pairs detected across the two replications
% Bear in mind, there should be none...
disp(['average number of significant correlations present in both replications: ' num2str(mean(Sig2))])
average number of significant correlations present in both replications: 0.2388

one experiment with 2*Nsub per group

for k=1:Nsim,
    % random variables
    C= randn(2*Nsub,Nvar);
    % correlation between each variable
    [Rc,pc]=corrcoef(C);
    % extracting the relevant p-values (upper triangular matrix without diagonal)
    Tc = triu(pc,1);
    % matrix to vectors for non-zero entries
    Cc = Tc(Tc~=0);
    % detecting significant correlations
    PVc = Cc<0.05;
    % counting significant correlations in the single, bigger experiment
    Sig1(k) = sum(PVc);
end
% average number of significant correlations detected
% Bear in mind, there should be none...
disp(['average number of significant correlations present in one bigger study: ' num2str(mean(Sig1))])
average number of significant correlations present in one bigger study: 9.4543
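The script above only uses the uncorrected 0.05 threshold; the Bonferroni numbers reported earlier come from the same simulation with a corrected threshold. A sketch of that variant (my reconstruction; the original file may differ):

% Bonferroni correction: divide the threshold by the number of
% independent correlations, Nvar*(Nvar-1)/2 = 190
alphaB = 0.05/(Nvar*(Nvar-1)/2);
% in the loops above, replace the 0.05 threshold accordingly:
PVa = Ca < alphaB;  PVb = Cb < alphaB;   % replication case
PVc = Cc < alphaB;                       % single-experiment case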




Social networking for science: a useful tool or a useless distraction?

10/27/2016

This post reflects the preparation of a discussion on social media in science for the Neural Control of Movement Society meeting in Jamaica, 2016. It was prepared by @FinleyLabUSC, @TobiasHeed, @andpru and myself.

The brain is specialized in merging information from different sources in order to improve sensory perception and motor control. Here, we will discuss how adding social networks to more conventional inputs can improve your career and your science. Take a drink from the nearest bar and join us for a lively and interactive discussion on the benefits (and pitfalls) of adding social networks to your scientific life.

Things you can learn from Twitter as a scientist

1. To follow science journalists and bloggers in order to learn about studies beyond your own specific problem and about science journalism itself (Neuroskeptic).
2. To learn about the impending revolution in scientific publishing (preprints, Sci-Hub, open access) and thereby get informed about the publishing industry (Bjoern Brembs).
3. To improve critical scientific skills such as statistics (Lakens, Deevybee).
4. To network with your peers (important presence for the motor control community: @blamlab, @kording).
5. To get rapid feedback about your studies (if you post your study online):

@jjodx @JNeurophysiol nice paper! It agrees with Greene's (1972) notion of preparing a "ballpark" response

— Rajiv Ranganathan (@rrangana1) October 12, 2016
6. To outsource literature searching, for instance when you are looking for studies on a given topic (what influences movement reaction time?) or when someone tweets about a study that they find interesting.
7. To learn about the problems of science (Retraction Watch, the gender gap, p-hacking, pre-registration):

Women need to be seen and heard at conferences https://t.co/CDYk0VjKRf #WomenInScience pic.twitter.com/VRxCVmFkQE

— Nature News&Comment (@NatureNews) October 24, 2016
8. To take your first steps towards science communication to the public.
9. To advertise your own study to your colleagues (shameless plug).
10. To share interesting papers (not only your own).
11. To hear about lectureship/professorship positions at some universities.
12. To discover interesting posters that you must go and see while at a conference (#SfN2015).
13. To get comfort when you see that all scientists face the same problems (same goal as PhD Comics).
14. To replace eTOC emails.
15. To learn about the latest scientific tools (PubPeer, PubChase, ...).
16. To learn about grantsmanship (@drugmonkey).
17. To participate in the scientific debate.
18. As a potential forum for recruiting participants, students and postdocs.

P-value distributions

9/6/2016

Imagine that you have two populations that you want to compare with a simple statistical test (e.g. a t-test). To interpret the result of such a test, scientists often compare the resulting p-value to a given threshold (usually 0.05) in order to judge the significance of the difference between the two populations.
There are many false beliefs about this number. This post aims at clarifying some aspects of p-values.

To do so, I simulated two populations.
The first one (P1) was normally distributed with mean = 0 and standard deviation (SD) = 1.
The second one (P2) was also normally distributed with SD = 1, but its mean was varied from zero (P1 and P2 are essentially drawn from the same distribution) to one (P1 and P2 are drawn from two very different distributions) in steps of 0.1. Because SD = 1, the distance between the means of P1 and P2 corresponds to the effect size.

For each level of effect size between zero and one, I simulated 10000 pairs of P1 and P2 and computed the p-value for each of these 10000 comparisons, in order to obtain the distribution of p-values for each level of effect size.
[Figure: distributions of p-values for effect sizes from 0 to 1, small sample size (N = 10)]
The top left graph shows the distribution of p-values when there is actually no difference between the two populations. By definition, only 5% of the simulations had a p-value smaller than 0.05. It also shows that "trending towards significance" is not very meaningful in this case, as the probability that p falls between 0.05 and 0.10 is similar to the probability that it falls in any other interval.

When the actual difference between the two populations becomes larger and larger, the distribution of p-values becomes more and more skewed and the percentage of p-values smaller than 0.05 grows.

For such a sample size (10 samples per population), the proportion of p-values smaller than 0.05 for an effect size of 1 (which would be considered large or very large) is only around 65%.

An effect size around 0.3 is pretty typical in science...

The following graph highlights the importance of the number of samples per population. Here, the number of samples for P1 and P2 was increased from 10 to 25. Now, for the largest effect size tested, the probability that p < 0.05 is almost 95%.

[Figure: distributions of p-values for effect sizes from 0 to 1, larger sample size (N = 25)]

Take home message

  • The distribution of p-values depends on the effect size and on the sample size.
  • If two samples are drawn from the same distribution (they are essentially similar), the distribution of p-values is uniform.


Matlab code

%% distribution of p-values for different effect sizes

ES = 0:0.1:1;           % effect sizes tested
Npop = 25;              % number of samples per population
Nsamp = 100000;         % number of simulated comparisons
ESmat = repmat(ES,Npop,1);
Ppop = NaN*ones(Nsamp,length(ES));
for k = 1:Nsamp
    P1 = randn(Npop,length(ES));            % population 1: mean 0, SD 1
    P2 = ESmat + randn(Npop,length(ES));    % population 2: mean = effect size
    [~,Ppop(k,:)] = ttest2(P1,P2);          % column-wise two-sample t-tests
end

for ij = 1:length(ES)
    subplot(3,4,ij)
    hist(Ppop(:,ij),0.025:.05:.975)         % 20 bins of width 0.05
    xlabel('bins of p-values')
    ylabel('# of observations')
    title(['effect size: ' num2str(ES(ij))])
end
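A quick way to read off the percentages quoted above from the simulated p-values (a small addition of mine, not in the original script):

% proportion of p-values below 0.05 for each effect size
propSig = mean(Ppop < 0.05);
for ij = 1:length(ES)
    fprintf('effect size %.1f: %.1f%% of p-values < 0.05\n', ES(ij), 100*propSig(ij))
end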

Demo of visuomotor adaptation tasks

9/29/2015

For teaching purposes, I developed two Matlab scripts that illustrate two typical motor adaptation paradigms on a computer.
The first task is a gain-scaling task, in which the mapping between the motion of the mouse and that of the cursor is altered by a given gain.
The second task is a visuomotor rotation task, in which the motion of the cursor is rotated by a given angle with respect to the motion of the mouse.
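At their core, both perturbations are simple transformations of the mouse displacement. A minimal sketch of the two mappings (the gain and rotation angle are arbitrary example values; the actual demo scripts also handle mouse tracking and display):

gain = 1.5;                         % gain-scaling perturbation (example value)
theta = 30*pi/180;                  % 30 deg visuomotor rotation (example value)
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];  % rotation matrix
mouseXY = [0.2; 0.1];               % example mouse displacement
cursorGain = gain*mouseXY;          % cursor motion in the gain-scaling task
cursorRot  = R*mouseXY;             % cursor motion in the rotation task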
Scripts: demogainadaptation.m and demovisuomotorrotation.m

The instructions for each task appear in a pop-up window before each test.
Participants need to click on the red starting target before moving.
The evolution of adaptation over the course of training is displayed once all the trials are completed.
Part of these functions is based on the freehanddraw function available here: http://www.mathworks.com/matlabcentral/fileexchange/7347-freehanddraw
Please send me feedback about these routines.

John P.A. Ioannidis on errors 

5/12/2014

I invite all of you to read this wonderful short paper by Ioannidis, his invited 2013 commencement speech for the Department of Statistics of the University of California at Berkeley.
It is an excellent piece about all the mistakes we make all the time as scientists. I also enjoyed his sense of humor.
Here are the three key messages about errors:

  1. When the results look weird, just recheck the equations.
  2. When the results are too good to be true, don't prepare your tuxedo for the award ceremony, and definitely don't buy a new house in Palo Alto with a $5 million mortgage; just recheck the data, they are likely to be wrong.
  3. Even if everything is perfectly fine, there is always some residual uncertainty in scientific inferences; replication is an excellent idea most of the time, and it may not pan out.
I think the last one is the most neglected. Residual uncertainty is inherently present in each and every one of our papers. Non-replication might be due to residual uncertainty and not necessarily to questionable research practices, p-hacking, ... (although these might definitely play some role). Non-replication does not mean that the paper was bad; it reflects the stochastic nature of the sample data that we analyse.
EDIT, 13th of May: this series of tweets is a good illustration of the residual uncertainty linked to scientific studies and of how we should treat it.

A new study on resveratrol makes me question the way we cover health stories. My latest @ngphenomena http://t.co/fWOCHxljER

— Virginia Hughes (@virginiahughes) May 12, 2014

@edyong209 @virginiahughes That & having enough time & knowledge to GIVE THE CONTEXT. A scientific study is a data point along a time series

— Emily Willingham (@ejwillingham) May 12, 2014

@edyong209 @virginiahughes What we need is an approach that writes about how new data changes what we know, not the new data.

— Matthew Herper (@matthewherper) May 12, 2014

Ioannidis JPA (2014). Errors (my very own) and the fearful uncertainty of numbers. Eur J Clin Invest.


On confidence intervals

1/22/2014

There is a very interesting paper about the notion of confidence intervals (Hoekstra et al., 2014, referenced below). I realized that I did not have a clear idea of how a confidence interval is defined and what it represents, so I read the paper and learned a lot.

Here is the true statement about confidence intervals (taken from the paper):
"If we were to repeat the experiment over and over, then 95 % of the time the confidence intervals contain the true mean"

The true mean is the actual mean of the population that we try to measure. The sample mean is the average of the values that were measured.

I wrote a little bit of Matlab to try to relate the sample mean (which we can measure) to the confidence interval.
My question was the following: if we perform an experiment once and compute the 95% confidence interval of the mean (e.g. [0.1, 0.4]), what is the probability that, if we repeat the experiment, the new sample mean will fall within the previously computed confidence interval? (See the Matlab code below.)

Yesterday, I would have said 95%. But I would have been wrong.

It turns out that there is ONLY an 83% probability that the new sample mean (based on 100 samples) will fall within the computed 95% confidence interval, and this number does not depend on the number of samples taken (same result with n = 1000).

For the (larger) 99% confidence interval, this probability rises to 93%.
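These numbers can be derived analytically (a step I am adding here; it was not in the original post). The difference between two independent sample means has standard deviation sqrt(2)*sigma/sqrt(n), so the probability that the new sample mean lands within z*sigma/sqrt(n) of the old one is P(|Z| < z/sqrt(2)):

Phi = @(x) 0.5*(1 + erf(x/sqrt(2)));   % standard normal CDF (base Matlab)
p95 = 2*Phi(1.96/sqrt(2)) - 1          % ~0.834 for the 95% CI
p99 = 2*Phi(2.575/sqrt(2)) - 1         % ~0.931 for the 99% CI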

So a confidence interval does not provide a lot of information about where future sample means will fall.

Similar misunderstandings have been documented for p-values (Gigerenzer, 2004).

Thanks to Dorothy Bishop for bringing me to this line of reasoning: http://deevybee.blogspot.co.uk/2011/10/joys-of-inventing-data.html

Hoekstra, R., Morey, R. D., Rouder, J. N. & Wagenmakers, E.-J. Robust misinterpretation of confidence intervals. Psychon. Bull. Rev. (2014). doi:10.3758/s13423-013-0572-3

Gigerenzer, G. Mindless statistics. J. Socio. Econ. 33, 587–606 (2004).

Matlab code

%%% Number of repetitions
Rep = 10000;
%%% Size of the sample population
n = 100;
%%% the true mean is 0.25 and the true standard deviation is 0.75
%%% DATA is a n by Rep matrix. Each column represents a population of samples
TrueM = 0.25;
TrueSD = 0.75;
DATA = TrueM + TrueSD*randn(n,Rep);
%%% Z-value for computing the confidence interval: 1.96 for a 95% CI and
%%% 2.575 for a 99% CI
Z = 1.96;
%%% computing sample mean, SD and CI for each population
M = mean(DATA);              %% vector with Rep values
SD = std(DATA);              %% vector with Rep values
CIlow = M - (Z*SD/sqrt(n));  %% vector with Rep values
CIhigh = M + (Z*SD/sqrt(n)); %% vector with Rep values
%%% probability that the TRUE mean is contained in the computed
%%% 95% confidence interval (actual definition)
P = 1 - sum((CIlow>TrueM) + (CIhigh<TrueM))/Rep;
disp(['Prob that the true mean is contained in the CI: ' num2str(P)])
%%% probability that a new SAMPLE mean is contained in the computed
%%% 95% confidence interval (the common misinterpretation)
N = NaN*ones(1,Rep);
for i = 1:Rep
    N(i) = sum((CIlow(i)<M).*(CIhigh(i)>M));
end
P2 = sum(N)/Rep^2;
disp(['Prob that the sample mean is contained in the CI: ' num2str(P2)])
Prob that the true mean is contained in the CI: 0.9454
Prob that the sample mean is contained in the CI: 0.8293




Two teachers for the cerebellum

12/2/2013

[Image source: http://bit.ly/1cOuMk8]
It is widely believed that cerebellar plasticity is driven by climbing fiber inputs. For instance, David Marr (1969) suggested that the climbing fiber input serves as a teacher for post-synaptic Purkinje cells, a view that has found some empirical support (Najafi & Medina, 2013).

However, the timing of complex spikes during saccade adaptation (Catz, Dicke, & Thier, 2008) suggests that climbing fiber input might not be the sole teacher in the cerebellum. Nguyen-Vu and colleagues therefore tested the hypothesis that Purkinje cells themselves can drive adaptation of the vestibulo-ocular reflex (VOR), a form of motor learning.

They demonstrate that there exists more than one teacher for cerebellar learning and that changes in Purkinje cell activity can drive motor learning.



Written by Jean-Jacques Orban de Xivry, scientist in the motor control field.


