According to Adarsh Kalikadien, PhD candidate at the Faculty of Applied Sciences, open access publishing is not enough if you really want to perform open science, especially when you work with large data sets or self-designed software. "Freely accessible or not, a PDF full of messy data is useless to me as a fellow researcher. Publishing data and code openly is only of value if someone else can work with it."
Out of frustration comes inspiration
The frustration arose when Kalikadien developed his own software ChemSpaX during his master's research, a tool that designs variations on catalysts in 3D. "I had to write the program completely from scratch. Open source tools from others were unusable or I had to modify them too much. Sometimes the code was so messy that its purpose was untraceable." The thesis project resulted in a 10 as a final grade and two publications.
Problem: organizing your data properly and writing your code in a useful way takes a lot of time and energy. "And you don't really have the time for that, because of all the other requirements that come with working on your PhD." Still, Kalikadien went for it. In fact, the lack of good documentation in academia was his main reason to start his current PhD project.
After Kalikadien graduated as a chemical engineer from TU Delft in 2021, his supervisor, Prof. Evgeny Pidko of the Inorganic Systems Engineering group, knew he was the right PhD candidate for a project that had been ready to start for a while. He could continue his work on ChemSpaX during the project. "During the project, we will develop a workflow, or roadmap, to digitally design catalysts and make predictions of properties at high speed. ChemSpaX will be a part of that."
Catalysts are molecules used by industry in all kinds of processes we encounter in our daily lives, from drug production to plastic recycling. Their function is to speed up a chemical reaction and make the process less energy-intensive. Traditionally, a new catalyst is developed through trial and error. Kalikadien and his colleagues want to make that process data-driven. "We are developing a workflow consisting of several software tools that draws molecules in 3D and predicts what properties they will have. You can make and test the most interesting suggestions in the lab. You then feed the observation data back to the program, improving the model continuously."
Flow with pharma
To give the project direction, Kalikadien is working with pharmaceutical company Janssen. "At Janssen, they test dozens of catalysts at a time in a high-throughput experimentation lab. We use their data to further develop our models." What Janssen does with the specific catalysts remains behind closed doors, but that makes no difference to Kalikadien and his colleagues. "What matters to us is the research process, the software and the automated workflow. Based on that, we (or others) can eventually design catalysts with the desired properties."
Still, the field of catalysis research is rather closed, partly because commercial parties often play an important role. "Even academic groups often don't just share their data. There are several research groups around the world working on digitizing catalysis, but there are hardly any software tools available or easily accessible to others. Some scientists do publish their data, but without molecular structures of the catalyst and a thorough manual, you can't do anything with it."
From the first line
Kalikadien is taking a different approach, along with the bachelor's and master's students who are helping out. They publish their software as open source on the online platform GitHub. In addition, they write the code in such a way that others can use and adapt it. "From the very first lines of code, we take this into account. It requires a different way of thinking, but once that works, it takes little extra time. Moreover, when a student finishes their research, we avoid having to figure out afterwards what their generated code and data mean. With this new approach, it is immediately clear and can be adopted by the next student."
Examples and standards
With good documentation and carefully written code, among other things, Kalikadien succeeds in making his results more accessible to colleagues in his own group, as well as to other interested researchers. He hopes to set an example. "I want to show that it is indeed possible to do scientific research as well as develop open source tools that are easy to use and adapt."
But just leading by example is not enough, the doctoral candidate believes. "The effort all of this takes is not yet rewarded in academia, so we need to create incentives that change that. For example, academic journals could more often set requirements for how researchers submit their data and code. These requirements would have to be met before they can publish your article."
Universities and research groups can also take more control, according to Kalikadien. "For example, at TU Delft you have to create a data management plan at the start of your PhD project. You can also participate in the Open Life Science program, where you can work on your own open science project and get access to mentors and connections. PhD candidates get graduate school credits for it. Thanks to this program, I was able to rewrite ChemSpaX in the Python programming language, so that the program works well with the other software tools."
When you publish well-organized data through the principles of Open Science, you help make academia more transparent and efficient. "Moreover, as a researcher, you don't have to keep reinventing the wheel. Instead, you can build on what your fellow researchers have designed much more efficiently," Kalikadien adds. "In addition, science is at the forefront of knowledge and development. If we work openly and transparently, the effects trickle down to the rest of society."