Code Is Science: Treat It with the Respect It Deserves Through Training, Sharing, and Sustainable Archiving
Code is not an afterthought in science; it is a core part of the research process. From simulating climate systems to modeling protein folding to exploring the universe, software powers modern scientific discovery. Yet despite this critical role, code is often treated as secondary to data and results, an oversight that undermines reproducibility, transparency, and long-term scientific progress.

One major challenge is the evolving nature of software. Unlike static datasets, software is continuously updated, with new versions, patches, and features introduced regularly. Without a clear "version of record," past work becomes difficult to cite, reproduce, or build upon. This creates confusion, especially when different versions of the same project require different citations, dependencies, or licensing terms.

The FAIR principles, which call for making data findable, accessible, interoperable, and reusable, have been adapted for software, but the adaptation carries significant drawbacks. Applying FAIR to software demands fresh metadata, version tracking, and archiving for every release. For projects that release frequently, some daily, this imposes a heavy administrative burden that distracts from actual development and risks discouraging contributors who already work under tight time and resource constraints.

The current system also fails to recognize the real effort behind software development: researchers who maintain open-source tools often go uncredited, despite their vital role in advancing science.

To address these issues, we propose a new framework, CODE beyond FAIR. It shifts the focus from rigid, bureaucratic archiving to practical, sustainable practices that support both preservation and ongoing development. First, scientists must be trained to share code effectively.
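The "version of record" problem described above has a lightweight partial remedy: a machine-readable citation file kept in the repository itself, updated with each release. As one sketch in the Citation File Format (the title, author, version, and DOI below are all placeholders, not a real project):

```yaml
# CITATION.cff: machine-readable citation metadata in the Citation File Format.
# Every value here is a hypothetical placeholder.
cff-version: 1.2.0
message: "If you use this software, please cite it using the metadata below."
title: "Example Climate Simulation Toolkit"
version: 1.4.2
date-released: "2024-06-01"
doi: 10.5281/zenodo.0000000
license: MIT
repository-code: "https://example.org/example/climate-toolkit"
authors:
  - family-names: Doe
    given-names: Jane
```

Because the file travels with the code, each tagged release carries its own citation metadata, so readers cite the exact version they used rather than a moving target.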
Every researcher, regardless of discipline, should understand the basics of software engineering, including version control, documentation, licensing, and archiving. This does not mean every scientist must become a developer, but each should know how to properly share and deposit code. Integrating foundational computational skills into PhD programs across all disciplines, as institutions such as Stanford, Harvard, Oxford, and Cambridge already do, is essential. Organizations like The Carpentries and Neuromatch Academy have proven the value of such training, reaching thousands of researchers worldwide through accessible, hands-on workshops.

Second, institutions and publishers must make code sharing a standard requirement. Publishers should mandate that code be archived at submission, using platforms such as GitHub, Zenodo, or Software Heritage. These platforms are already widely used and provide reliable, persistent links to code and its history; in many cases, a single button click is enough to make code preserved and citable.

Third, systems should support seamless cross-referencing across platforms. Initiatives like the European Open Science Cloud help connect projects, versions, and repositories, enabling better traceability and reuse.

Ultimately, software should be treated not as a final product to be archived once, but as a living, evolving tool. By investing in training, simplifying archiving, and recognizing contributors, the scientific community can ensure that code is valued, preserved, and shared, not overlooked.
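The workflow the steps above describe, version control, an explicit license, and a fixed release that archiving services can pick up, can be sketched in a few commands. This is a minimal illustration, not a prescribed method: the project and file names are hypothetical, and Git, GitHub, and Zenodo stand in as common choices.

```shell
# Minimal sketch of a shareable, citable research-code workflow.
# All names here (climate-toolkit, simulate.py, Jane Doe) are hypothetical.
mkdir climate-toolkit && cd climate-toolkit
git init -q

# Track the analysis code and state a license up front.
echo 'print("running simulation")' > simulate.py
echo "MIT License" > LICENSE
git add simulate.py LICENSE
git -c user.name="Jane Doe" -c user.email="jane@example.org" \
    commit -qm "Add simulation script and license"

# Tag a fixed "version of record" that a paper can cite.
# Creating a GitHub release from this tag, with the Zenodo
# integration enabled, archives the snapshot and mints a DOI.
git tag -a v1.0.0 -m "Version of record for manuscript submission"
git tag --list
```

The tag is the key step: it turns an ever-moving repository into a precise, immutable reference that a publication, and an archive like Zenodo or Software Heritage, can point to.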
