Describing Software So We Can Cite Software
Katherine Thornton and Finn Årup Nielsen
When researchers publish scientific articles, some include information about the software used in their research. In traditional publication models, on the rare occasions when research software is provided along with a publication, the links between those research outputs are not always easily maintained. Funders, publishers, scholars, librarians, and other information professionals need ways to describe software that is used in research just as they need to describe scientific publications. Connecting scientific articles and the software used in their creation is not currently well supported in scholarly communication. This is changing as more funders and publishers mandate that research data and research software be published alongside research papers. The Wellcome Trust, for example, recently announced a new policy on research data and research software. Several other similar initiatives are underway to promote awareness of the need for software citation. We will report on exploratory research into statements about software that can be mined from publications indexed by PubMed. We will describe the workflows we created to add statements to the Wikidata items for the scientific articles in our study, statements that describe the software used in the research those articles report. We will share findings about how to reuse existing ontologies to further refine the statements we can make about which aspect of the research process a particular piece of software was used for. This research will then be used to enrich or inform existing recommendations about software citation, software curation, research data citation, and research data curation. This will help us create data models within Wikidata that will support our work to describe complex research objects relevant to scholarly communication, including software citation involving dereferenceable Wikidata URIs.
Practices of description, curation, and structuring are emerging in the Wikidata community that will support precise software description according to data models that are currently being developed. We are working toward increasing the breadth of citable software (describing versions, describing configured software environments, etc.) and the breadth of aspects of the research process that can be described (data collection, data analysis, document typesetting, figure creation, data set used, etc.). Wikidata can have properties to describe as many parts, facets, processes, and perspectives on the research process as we choose to create. We have the option to describe complex research outputs collectively in an open collaboration system so that anyone can freely reuse this content.
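As a rough illustration of the kind of statement discussed above, the following Python sketch models one article-to-software link together with the aspect of the research process and an optional version. It is only a sketch under stated assumptions: the class name, field names, and item IDs are hypothetical placeholders and do not represent the data model or the items from our study. The only fixed convention it relies on is that a Wikidata item ID appended to http://www.wikidata.org/entity/ gives a dereferenceable URI for that item.

```python
from dataclasses import dataclass
from typing import Optional

# Base of Wikidata concept URIs; appending an item ID yields a dereferenceable URI.
WIKIDATA_ENTITY_BASE = "http://www.wikidata.org/entity/"


def entity_uri(qid: str) -> str:
    """Return the Wikidata concept URI for an item ID such as 'Q42'."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return WIKIDATA_ENTITY_BASE + qid


@dataclass(frozen=True)
class SoftwareUseStatement:
    """Hypothetical record: an article reports research that used a piece of software."""
    article_qid: str                # item for the scientific article
    software_qid: str               # item for the software
    research_aspect: str            # e.g. "data analysis", "figure creation"
    version: Optional[str] = None   # software version, when the article states one

    def as_citation(self) -> dict:
        """Flatten the statement into a dict whose references are dereferenceable URIs."""
        citation = {
            "article": entity_uri(self.article_qid),
            "software": entity_uri(self.software_qid),
            "aspect": self.research_aspect,
        }
        if self.version is not None:
            citation["version"] = self.version
        return citation


# Placeholder IDs for illustration only; they are not the items used in the study.
example = SoftwareUseStatement(
    article_qid="Q111111111",
    software_qid="Q222222222",
    research_aspect="data analysis",
    version="3.4.1",
)
print(example.as_citation())
```

Keeping the research aspect and the version as separate fields mirrors the goal of describing both what software was used and how it was used, without committing to any particular Wikidata property for either.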