Rice University statistician Genevera Allen knew she was raising an important issue when she spoke earlier this month at the American Association for the Advancement of Science (AAAS) annual meeting in Washington, but she was surprised by the magnitude of the response.
Allen, associate professor of statistics and founding director of Rice’s Center for Transforming Data to Knowledge (D2K Lab), used the forum to raise awareness about the potential lack of reproducibility of data-driven discoveries produced by machine learning (ML). She cautioned her audience not to assume that today’s scientific discoveries made via ML are accurate or reproducible. Many commonly used ML techniques, she said, are built to always make a prediction but not to report the uncertainty of that finding.
Her comments garnered worldwide media attention, with some commentators questioning the value of ML in data science. Allen recently met with Rice News to discuss ML reproducibility.
Q: Were you surprised by the response?
I was very surprised by the magnitude of the response. In my talk, I was simply trying to point out a general problem with using machine learning for scientific discovery, as motivation for my main purpose: highlighting new avenues of research that can help solve this problem.
Studying the reproducibility crisis is not my area of research or expertise. Instead, I develop new tools to help scientists make more reproducible, data-driven discoveries, as well as techniques to help them assess the uncertainty in their discoveries. In my talk, I was interested in sharing some of my own research in this area as well as highlighting the challenges and potential open areas of research to other data scientists.
Q: Do you have any findings about the number or percentage of ML-based scientific discoveries that are not reproducible?
No. Anecdotal evidence suggests that there is a problem, but the extent is unknown.
There is general research on the reproducibility crisis in science, and there are many potential problems that can lead to irreproducible science. Failing to validate discoveries made with machine learning techniques is just one potential cause of irreproducibility.
But there does seem to be growing recognition that machine learning may well be contributing to the reproducibility crisis in science, and the AAAS attention kicked off a lively debate about the extent to which that might be happening. The responses I have received from machine learners as a result of the media attention are largely supportive. Most agree that scientists should be careful not to overinterpret data-driven discoveries.
But the bottom line is that we need more research, both to determine the extent of the problem and to address it.
Q: What can be done to ensure that data-driven ML discoveries are reproducible and accurate?
Journals currently require careful reporting of uncertainty about a priori hypotheses using the language of statistical inference. P-values and confidence regions are examples. Journals should require the same rigorous reporting for data-driven discoveries. For instance, journals could require that all scientific discoveries made using machine learning techniques include either a separate validation study or a section that quantifies and communicates any potential uncertainty in the discovery.
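The validation-study idea above can be made concrete with a minimal sketch (not from the interview, and using synthetic data): make the "discovery" on one half of the data, then test that same discovery on a held-out half, where a permutation test gives an honest p-value of the kind journals already expect for a priori hypotheses.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

random.seed(0)

# Synthetic data: 200 samples, 20 candidate features; only feature 0
# is genuinely associated with the outcome.
n, p = 200, 20
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] + random.gauss(0, 1) for row in X]

# Discovery step: on the first half, pick the feature most correlated
# with the outcome -- a stand-in for any data-driven ML selection step.
half = n // 2
disc_X, disc_y = X[:half], y[:half]
best = max(range(p), key=lambda j: abs(pearson_r([r[j] for r in disc_X], disc_y)))

# Validation step: on the held-out half, test that same feature with a
# permutation test, yielding an honest p-value for the discovery.
val_x = [r[best] for r in X[half:]]
val_y = y[half:]
observed = abs(pearson_r(val_x, val_y))
perms = 1000
exceed = sum(
    abs(pearson_r(val_x, random.sample(val_y, len(val_y)))) >= observed
    for _ in range(perms)
)
p_value = (exceed + 1) / (perms + 1)
print(best, round(p_value, 4))
```

The key point of the split is that the p-value is computed on data the selection step never saw, so it is not inflated by the search over all 20 candidates.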
Data science researchers should also continue to develop and disseminate new techniques not only for making data-driven discoveries, but also for assessing and quantifying the probability that a discovery will be reproducible. This is an exciting, open area of research, with new lines of inquiry that include post-selection inference, the stability principle and more.
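The stability principle mentioned above can be illustrated with a small sketch (my own illustration on synthetic data, not a method from the interview): rerun the selection step on bootstrap resamples of the data and record how often each feature is chosen. A discovery that reappears in nearly every resample is far more trustworthy than one that appears sporadically.

```python
import random
from collections import Counter

random.seed(1)

# Synthetic data: feature 0 carries real signal, the rest are noise.
n, p = 150, 15
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] + random.gauss(0, 1) for row in X]

def top_feature(rows, outcome):
    """Selection step: index of the feature with the largest absolute
    covariance with the outcome (a simple stand-in for an ML selector)."""
    m = sum(outcome) / len(outcome)
    def score(j):
        return abs(sum(r[j] * (o - m) for r, o in zip(rows, outcome)))
    return max(range(p), key=score)

# Stability check: redo the selection on bootstrap resamples and count
# how often each feature is picked.
counts = Counter()
B = 200
for _ in range(B):
    idx = [random.randrange(n) for _ in range(n)]
    counts[top_feature([X[i] for i in idx], [y[i] for i in idx])] += 1

# Selection frequency per feature; the signal feature should be near 1.0.
stability = {j: c / B for j, c in counts.items()}
print(sorted(stability.items(), key=lambda kv: -kv[1])[:3])
```

Reporting these selection frequencies alongside the discovery itself is one simple way to communicate, rather than hide, the uncertainty in a data-driven finding.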
Q: Do you have a published paper on this topic?
Not yet. While several of my recent research papers discuss techniques to make reproducible data-driven discoveries, I have not published a paper that highlights the problem along with potential practical solutions. However, thanks to the media attention from AAAS, I’ve begun working on one with several collaborators who also work in this area. It’s exciting work, but it is too early to say when a preprint will be available.