Data dredging/p-hacking

Image adapted from Randall Munroe, under Creative Commons License 2.5.

The predominant reason for data dredging is the widespread notion among academics that “statistically significant data is noteworthy, and one that is not statistically significant is not”. Ioannidis (2005) explains that ‘negative’ research is useful and that, in fact, it is a misnomer to classify it as negative. He further explains that “What is less well appreciated is that bias and the extent of repeated independent testing by different teams of investigators around the globe may further distort this picture and may lead to even smaller probabilities of the research findings being indeed true.” A closely associated development is the recent trend towards multicentre studies. It is therefore prudent to recognise that interpreting research, and the data obtained from it, is as important as classifying it as ‘positive’ or ‘negative’.

A significant part of any statistical estimate rests on the assumption that the correct statistical model has been specified. Statistical inference therefore only tells us about the range of the truth within the data that have been observed. Part of the reason this is misunderstood is that the line between inferential and descriptive uses of statistics is often blurred, and novice epidemiologists can easily confuse the two. We may use the terms ‘data dredging’ and ‘p-hacking’ interchangeably in the discussion below.
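The effect of repeated independent testing described in the quotation can be made concrete with the positive predictive value (PPV) arithmetic used in Ioannidis's framework. The sketch below is illustrative and assumes: pre-study odds R that a tested relationship is real, a per-team significance threshold alpha, a per-team type II error rate beta (power 1 - beta), and n independent teams testing the same relationship, with a finding claimed as soon as any one team reaches p < alpha. The function name and inputs are hypothetical choices for this example. With n = 1 it reduces to PPV = (1 - beta)R / (R - beta*R + alpha).

```python
def ppv(R: float, alpha: float = 0.05, beta: float = 0.2, n: int = 1) -> float:
    """Probability that a claimed ('positive') finding is actually true.

    Hypothetical inputs for illustration:
      R     : pre-study odds that the tested relationship is real
      alpha : significance threshold used by each team
      beta  : type II error rate of each team (power = 1 - beta)
      n     : number of independent teams testing the same relationship;
              a finding is claimed if at least one team gets p < alpha
    """
    # Real relationships (weight R): claimed if at least one team detects them.
    true_claims = R * (1 - beta ** n)
    # Null relationships (weight 1): claimed if at least one team gets a false positive.
    false_claims = 1 - (1 - alpha) ** n
    return true_claims / (true_claims + false_claims)


# With 1:1 pre-study odds and 80% power, more teams chasing the same question
# means more chances for a false positive, so the PPV of a claimed finding drops.
for teams in (1, 2, 5, 10):
    print(teams, round(ppv(R=1.0, n=teams), 3))
```

The common factor 1/(R + 1) that converts odds into probabilities cancels in the ratio, which is why the function works directly with the odds R.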


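Data dredging goes by several names, such as ‘fishing trip’, ‘data snooping’ and ‘p-hacking’. Because it involves testing many hypotheses until something reaches significance, it sharply increases the risk of including large numbers of false positive results: when every tested null hypothesis is true, the probability of avoiding all false positives shrinks exponentially with the number of independent tests. This corrupts the findings that were originally meant to be reported.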
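A minimal sketch of that arithmetic, assuming k independent tests in which every null hypothesis is true and each test uses the same threshold alpha. Under a true null, a valid p-value is uniformly distributed on (0, 1), so the chance of at least one false positive (the family-wise error rate) is 1 - (1 - alpha)^k. The function names below are hypothetical; the simulation simply checks the formula. For 20 tests at alpha = 0.05 the chance of at least one spurious ‘significant’ result is already about 64%.

```python
import random


def fwer_analytic(alpha: float, k: int) -> float:
    """Probability of at least one p < alpha among k independent true-null tests."""
    return 1 - (1 - alpha) ** k


def fwer_simulated(alpha: float, k: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check: draw k uniform null p-values per 'fishing trip'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The dredger 'finds' something if any of the k p-values falls below alpha.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


for k in (1, 5, 20, 100):
    print(k, round(fwer_analytic(0.05, k), 3), round(fwer_simulated(0.05, k), 3))
```

Corrections such as Bonferroni (testing each hypothesis at alpha / k) work by shrinking the per-test threshold so that this family-wise rate stays near alpha.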


  • Impact of data dredging on epidemiology

    Data dredging is defined as “cherry-picking of promising findings leading to a spurious excess of statistically significant results in published or unpublished literature”.

    The following discussion will attempt to answer three questions: What is data dredging? How does it affect the p-value? What is its impact on the world around us?

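The definition of data dredging as cherry-picking promising findings can be illustrated with a small simulation. This is a hypothetical sketch, not a model of any real literature: it assumes many two-arm studies in which the true effect is exactly zero, each arm drawn from a Normal(0, 1) population with known variance so a z-test is exact, and a ‘reporting’ step that keeps only studies with p < 0.05. All function names and parameters are illustrative.

```python
import random
from statistics import NormalDist, fmean


def null_study(rng: random.Random, n: int = 30) -> tuple[float, float]:
    """One study with zero true effect; returns (difference in means, two-sided p)."""
    arm_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    arm_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    diff = fmean(arm_a) - fmean(arm_b)
    se = (2.0 / n) ** 0.5  # standard error of the difference, known unit variance
    z = diff / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, p


def cherry_pick(num_studies: int = 2000, alpha: float = 0.05, seed: int = 1):
    """Run many null studies, then 'report' only the statistically significant ones."""
    rng = random.Random(seed)
    studies = [null_study(rng) for _ in range(num_studies)]
    reported = [(d, p) for d, p in studies if p < alpha]
    share_significant = len(reported) / num_studies
    mean_abs_all = fmean(abs(d) for d, _ in studies)
    mean_abs_reported = fmean(abs(d) for d, _ in reported)
    return share_significant, mean_abs_all, mean_abs_reported, len(reported)


share, mean_all, mean_rep, n_rep = cherry_pick()
print(f"{share:.1%} of all null studies reached p < 0.05 ({n_rep} 'reported')")
print(f"mean |effect|: all studies {mean_all:.3f}, reported only {mean_rep:.3f}")
```

Only about 5% of the null studies reach significance, yet every reported study is ‘significant’ and its apparent effect is noticeably larger than the typical estimate, even though no real effect exists. That is the spurious excess the definition describes.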