
Monday, March 2, 2020

No raw data, no science -- Another possible source of the scientific study reproducibility crisis, by Tsuyoshi Miyakawa

My edited version below uses only direct quotes from the full article, which is located here:

SUMMARY
( quotes selected by me, Ye Editor, from various places in the original article, to create a summary )

"This might be one of the most serious concerns in our research community in this era.

In the current system, where we assume that every researcher is honest, and where raw data are not required to be submitted, the consequence is that fabricated data escapes scrutiny and gets published. 

In academia ... we have a strong tendency, or custom, to suppose an honest mistake rather than to suspect a fabrication and start an official investigation following such a protocol. 

We really cannot know what percentage of those manuscripts have fabricated data. 

Without formal investigation in all suspected cases, I can only speculate.

The supposition that everyone is honest cannot be valid while, simultaneously, more than half of researchers guess that over 25% of all studies are based on non-existent data.

As part of the submission process, journals could require authors to confirm that the raw data are available for inspection (or to stipulate why data are not available). 

I believe that it is now time to design a system, based on such realistic reasoning of the majority of researchers, that not everyone is “honest,” replacing the “trust-me” system that is based on the traditional idealistic assumption that everyone is good.

In the past age of print publishing, it was technically impossible to publish all raw data due to the limitation of space. 

This limitation, however, has been virtually eliminated, thanks to the evolution of data storage devices and the internet.

I propose that all journals should, in principle, try their best to have authors and institutions make their raw data open in a public database or on a journal web site upon the publication of the paper, in order to increase the reproducibility of published results and to strengthen public trust in science." 



ABSTRACT
"A reproducibility crisis is a situation where many scientific studies cannot be reproduced. 

Inappropriate practices of science, such as HARKing, p-hacking, and selective reporting of positive results, have been suggested as causes of irreproducibility. 

In this editorial, I propose that a lack of raw data or data fabrication is another possible cause of irreproducibility.

As an Editor-in-Chief of Molecular Brain, I have handled 180 manuscripts since early 2017 and have made 41 editorial decisions categorized as “Revise before review,” requesting that the authors provide raw data. 

Note that I requested raw data only when I felt that the data were ‘too beautiful to be true’. 

In cases where the figures looked real, I did not ask the authors to provide raw data. This was not ideal, but it was practically unavoidable under the current data availability policy of the journal, which does not require but merely encourages data deposition.

Surprisingly, among those 41 manuscripts, 21 were withdrawn without providing raw data, indicating that requiring raw data drove away more than half of the manuscripts. 

I rejected 19 out of the remaining 20 manuscripts because of insufficient raw data. 

Thus, more than 97% of the 41 manuscripts did not present the raw data supporting their results when requested by an editor, suggesting a possibility that the raw data did not exist from the beginning, at least in some portions of these cases.

Considering that any scientific study should be based on raw data, and that data storage space should no longer be a challenge, journals, in principle, should try to have their authors publicize raw data in a public database or journal site upon the publication of the paper to increase reproducibility of the published results and to increase public trust in science."



INTRODUCTION
"The reproducibility or replicability crisis is a serious issue in which many scientific studies are difficult to reproduce or replicate. 

It is reported that, in the field of cancer research, only about 20–25%, or 11%, of published studies could be validated or reproduced, and that only about 36% were reproduced in the field of psychology. 

... I argue that making raw data openly available is not only important for reuse and data mining but also for simply confirming that the results presented in the paper are truly based on actual data." 

Flowchart of the manuscripts handled by Tsuyoshi Miyakawa in Molecular Brain from December 2017 to September 2019:
[ flowchart image from the original article, not reproduced here ]

DETAILS
"Among the 41 manuscripts, 21 were withdrawn without providing raw data ... out of the remaining 20 cases where the authors resubmitted the manuscripts with some raw data, 19 had insufficient data. 

Among these, nine presented partial or no data (e.g., only one sample for a condition). 

In these cases, the authors were willing to provide at least some but not all of the data. 

In seven cases, the raw data presented by the authors did not match the data that were presented in the results."

Among the 41 manuscripts, only one was sent out for review and this was accepted for publication. 

Thus, more than 97% of the 41 manuscripts did not or could not provide appropriate raw data supporting the results shown when requested by an editor. 
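
( Ye Editor's note: the arithmetic behind that "more than 97%" figure, using the counts quoted above: 21 withdrawn + 19 rejected = 40 of the 41 manuscripts, and 40 / 41 ≈ 97.6% )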

Note that the editorial policy of Molecular Brain states that submission of a manuscript implies that materials described in the manuscript, including all relevant raw data, will be freely available to any scientist wishing to use them for non-commercial purposes.

Among the 40 withdrawn or rejected manuscripts, 14 were later published in other journals. 

Twelve of the journals that published those 14 papers have policies that require or recommend that authors provide raw data upon request from readers. 

Therefore, we sent emails and printed letters to the authors of the 12 papers in those journals, requesting raw data for the results in a figure in each paper. 

Ten of the authors of the 12 papers did not respond to our request. 

The one who responded sent us raw data for only one sample per condition, while each condition was supposed to have six samples. 


Then, how about the 140 other manuscripts that were not considered “too beautiful to be true”?