Saturday, May 21, 2016

P Values: Everything You Know Is Wrong

The Gist:  P values are probably the most “understood” statistic amongst clinicians yet are widely misunderstood.  P values should not be used alone to accept or reject something as “truth” but they may be thought of as representing the strength of evidence against the null hypothesis [1].

Part of our 15 Minutes - 'Stats are the Dark Force?' residency lecture series

P: Everything You Know Is Wrong from Lauren Westafer on Vimeo.

At various times in my life I, like many others, have believed the p value to represent one of the following (none of which are true):
    • Significance
      • Problem:  Significance is a loaded term.  A value of 0.05 has become synonymous with “statistical significance.”  Yet, this value is not magical and was chosen predominantly for convenience [3].  Further, the term “significant” may be confused with clinical importance, something a statistic cannot answer.
    • The probability that the null hypothesis is true.
      • Problem: The calculation of the p value assumes that the null hypothesis is true.  A number computed under that assumption cannot also tell you the probability that the assumption itself is true or false.
    • The probability of getting a Type I Error 
      • Background: Type I Error is the incorrect rejection of a true null hypothesis (i.e. a false positive), and the probability of making a Type I Error is represented by alpha. Alpha is often set at 0.05, meaning that if the null hypothesis is true there is a 5% chance of incorrectly rejecting it. This is a PRE test calculation (set before the experiment).
      • Problem: Again, the calculation of the p value assumes the null hypothesis is true. The p value only tells us the probability of getting data at least as extreme as ours under that assumption; it does NOT speak to the underlying truth of whatever is being tested (i.e. efficacy). The p value is also a POST test calculation.
      • The false positive error rate associated with a given p value varies, depending on the assumptions in the calculations, particularly the prevalence of true effects. However, it's interesting to look at some of the estimates of false positive error rates often associated with various p values:  p=0.05 - false positive error rate of 23-50%; p=0.01 - false positive error rate of 7-15% [5].
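For the curious, here is a minimal sketch of one way to get numbers in this ballpark. It uses the calibration described in [5], where (as I understand it) the Bayes factor in favor of the null is bounded below by -e * p * ln(p). The function names and the 50/50 prior are my own assumptions for illustration, not anything prescribed by the reference.

```python
import math


def min_bayes_factor(p: float) -> float:
    """Lower bound on the Bayes factor for the null: -e * p * ln(p), for p < 1/e."""
    if not 0 < p < 1 / math.e:
        raise ValueError("calibration applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def min_false_positive_rate(p: float, prior_null: float = 0.5) -> float:
    """Lower bound on P(null is true | data), given a prior probability for the null.

    prior_null = 0.5 is an assumption (a coin-flip prior), not a fact about any study.
    """
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * min_bayes_factor(p)
    return posterior_odds / (1 + posterior_odds)


for p in (0.05, 0.01):
    rate = min_false_positive_rate(p)
    print(f"p = {p}: null still has at least a {rate:.0%} chance of being true")
```

With a 50/50 prior this gives roughly 29% for p=0.05 and 11% for p=0.01. Change prior_null (the "prevalence" assumption) and the numbers move, which is exactly why these error rates are quoted as ranges.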
The p value is the probability of getting results as extreme as, or more extreme than, those observed, assuming the null hypothesis is true. Originally, this statistic was intended to serve as a gauge for researchers to decide whether or not a result was worth investigating further [3].  
  • High P value - data are likely with a true null hypothesis [Weak evidence against the null hypothesis]
  • Low P value - data are UNlikely with a true null hypothesis [Stronger evidence against the null hypothesis]
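A toy example may help make the definition concrete. Below is a small sketch that computes a one-sided p value for a coin-flip experiment, where the null hypothesis is "the coin is fair." The coin scenario and the function name are hypothetical, chosen only because the arithmetic is easy to check by hand.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, null_prob: float = 0.5) -> float:
    """One-sided p value: P(X >= successes), assuming the null hypothesis is true."""
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 16 heads in 20 flips: unlikely if the coin is fair (low p value)
print(round(binomial_p_value(16, 20), 4))  # 0.0059

# 11 heads in 20 flips: entirely ordinary for a fair coin (high p value)
print(round(binomial_p_value(11, 20), 4))  # 0.4119
```

Note what the function takes as given: null_prob. The null hypothesis is an input to the calculation, so the output cannot be the probability that the null is true.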
Example:  A group is interested in evaluating needle decompression of tension pneumothorax and proposes the following:
    • Hypothesis - Longer angiocatheters are more effective than shorter catheters in decompression of tension pneumothorax.
    • Null hypothesis - There is no difference in effective decompression of tension pneumothorax using longer or shorter angiocatheters.
A group, Aho and colleagues, did this study and found a p value of 0.01 with 8 cm catheters compared with 5 cm catheters.  How do we interpret this p value?  
  • If the null hypothesis were true, we would expect a difference in effective decompressions this large or larger in about 1% of studies due to random sampling error alone.  
  • The data are UNLIKELY with a true null hypothesis and this is reasonably strong evidence against the null hypothesis.
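To see where a number like this could come from, here is a rough sketch of a one-sided Fisher exact test on a 2x2 table. The counts are invented purely for illustration and are NOT the data from Aho and colleagues; I am also not claiming this is the test that study used.

```python
from math import comb


def fisher_one_sided(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """P(group A has >= success_a successes) if group labels are irrelevant (the null).

    Under the null, the successes are just shuffled between the two groups,
    so group A's count follows a hypergeometric distribution.
    """
    total = n_a + n_b
    total_success = success_a + success_b
    upper = min(n_a, total_success)
    favorable = sum(
        comb(total_success, k) * comb(total - total_success, n_a - k)
        for k in range(success_a, upper + 1)
    )
    return favorable / comb(total, n_a)


# Hypothetical counts: 18/20 successes with the longer catheter, 11/20 with the shorter
p_value = fisher_one_sided(18, 20, 11, 20)
print(f"one-sided p = {p_value:.3f}")
```

The result answers one narrow question: if catheter length made no difference, how often would chance alone hand the longer catheter at least this many successes?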
Limitations of the Letter “P”
    • Reliability.  P values depend on the statistical power of a study. A small study with little statistical power may have a p value greater than 0.05 and a large study may reveal that a trivial effect has statistical significance [2,4].  Thus, even if we are testing the same question, the p value may be "significant" or "nonsignificant" depending on the sample size.
    • P-hacking.  Definition:  "Exploiting - perhaps unconsciously - researcher degrees of freedom until p<.05."  Alternatively:  "Manipulation of statistics such that the desired outcome assumes 'statistical significance', usually for the benefit of the study's sponsors" [7].
      • A recent study of abstracts from 1990 to 2015 showed that 96% of those reporting p values contained at least 1 p value < 0.05 [6].  Are we that wildly successful in research?  Or are statistically nonsignificant results published less frequently (probably)?  Or do we try to find something in the data to report as significant, i.e. p-hack (likely)?
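Both limitations are easy to demonstrate with a few lines of arithmetic. The sketch below is a back-of-the-envelope illustration under assumptions I picked myself: a pooled two-proportion z-test with equal group sizes for the sample size point, and independent tests of true null hypotheses for the p-hacking point. The effect sizes and function names are hypothetical.

```python
import math


def two_proportion_p_value(rate_a: float, rate_b: float, n_per_group: int) -> float:
    """Two-sided p value from a pooled two-proportion z-test (normal approximation)."""
    pooled = (rate_a + rate_b) / 2  # equal group sizes assumed
    std_err = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = abs(rate_a - rate_b) / std_err
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) across independent tests when every null is true."""
    return 1 - (1 - alpha) ** n_tests


# Reliability: the same 55% vs 50% difference at three sample sizes
for n in (100, 1000, 10000):
    print(f"n = {n:>5} per group: p = {two_proportion_p_value(0.55, 0.50, n):.3g}")

# P-hacking: keep testing outcomes and something will eventually "work"
for k in (1, 5, 20):
    print(f"{k:>2} tests: {chance_of_false_positive(k):.0%} chance of at least one p < 0.05")
```

The identical 5 percentage point difference is "nonsignificant" with 100 patients per group and "significant" with 1,000. Meanwhile, checking 20 unrelated outcomes where nothing is going on gives about a 64% chance of finding at least one p < 0.05.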
P values are neither good nor bad. They serve a role that we have distorted and, according to the American Statistical Association: The widespread use of “statistical significance” (generally interpreted as “p ≤ 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process [1].  In sum, acknowledge what the p value is and is not and, by all means, do not p alone.

  1. Wasserstein RL, Lazar NA. The ASA’s statement on p-values: context, process, and purpose. Am Stat. 2016;70(2):129–133. doi:10.1080/00031305.2016.1154108.
  2. Goodman S. A dirty dozen: twelve p-value misconceptions. Semin Hematol. 2008;45(3):135–40. PMID: 18582619.
  3. Fisher RA. Statistical Methods for Research Workers. Edinburgh, United Kingdom: Oliver & Boyd; 1925.
  4. Sainani KL. Putting P values in perspective. PM R. 2009;1(9):873–7. doi:10.1016/j.pmrj.2009.07.003.
  5. Sellke T, Bayarri MJ, Berger JO. Calibration of p values for testing precise null hypotheses. Am Stat. 2001;55(1).
  6. Chavalarias D, Wallach JD, Li AHT, Ioannidis JPA. Evolution of reporting P values in the biomedical literature, 1990-2015. JAMA. 2016;315(11):1141. doi:10.1001/jama.2016.1952.
  7. PProf.  "P-Hacking."  Urban Dictionary. Accessed May 1, 2016.  Available at:

