== Early developments (1950–1990) ==
Scientific projects have been among the earliest use cases for digital infrastructure. The theorization of scientific knowledge infrastructure even predates the development of computing technologies. The knowledge networks envisioned by
Paul Otlet or
Vannevar Bush already incorporated numerous features of online scientific infrastructures. After the Second World War, the United States faced a "periodical crisis": existing journals could not keep up with the rapidly increasing scientific output. The issue became politically relevant after the successful launch of
Sputnik: "The Sputnik crisis turned the librarians' problem of bibliographic control into a national information crisis." The emerging computing technologies were immediately considered a potential solution to make a larger amount of scientific output readable and searchable. Access to foreign-language publications was also a key issue that was expected to be solved by
machine translation: in the 1950s, a significant share of scientific publications
was not available in English, especially those coming from the Soviet bloc. Influential members of the
National Science Foundation like
Joshua Lederberg advocated for the creation of a "centralized information system",
SCITEL, which would at first coexist with printed journals and gradually replace them altogether on account of its efficiency. In the plan laid out by Lederberg to Eugene Garfield in November 1961, the deposit would index as many as 1,000,000 scientific articles per year. Beyond full-text searching, the infrastructure would also ensure the indexing of citations and other metadata, as well as the automated translation of foreign-language articles. Although it anticipated key features of online scientific platforms, the SCITEL plan was technically unrealistic at the time. The first working prototype of an online retrieval system, developed in 1963 by Doug Engelbart and Charles Bourne at the Stanford Research Institute, was heavily constrained by memory issues: no more than 10,000 words from a few documents could be indexed. Instead of a general-purpose publishing platform, the early scientific computing infrastructures focused on specific research areas, such as
MEDLINE for medicine, NASA/RECON for space engineering or OCLC Worldcat for library search: "most of the earliest online retrieval system provided access to a bibliographic database and the rest used a file containing another sort of information—encyclopedia articles, inventory data, or chemical compounds." This early development of scientific computing affected a large variety of disciplines and communities, including the social sciences: "The 1960s and 1970s saw the establishment of over a dozen services and professional associations to coordinate quantitative data collection". Yet these infrastructures were mostly invisible to researchers, as most of the search work was done by professional librarians. Not only were the search systems complicated to use, but searches had to be performed very efficiently given the prohibitive cost of long-distance telecommunications. To remain technically feasible, scientific infrastructures could not be open and remained fundamentally hidden from their end users. The development of digital infrastructure for scientific publication was largely undertaken by private companies. In 1963, Eugene Garfield created the
Institute for Scientific Information that aimed to transform the projects initially envisioned with Lederberg into a profitable business. The
Science Citation Index relied on computational processing of citation data. It had a massive and lasting influence on the structure of global scientific publication in the last decades of the 20th century, as its most important metric, the Journal Impact Factor, "ultimately came to provide the metric tool needed to structure a competitive market among journals". Garfield also successfully launched
Current Contents, a periodic compilation of scientific abstracts that acted as a simplified commercial version of the central deposit envisioned within SCITEL. Rather than being replaced by a centralized information system, leading scientific publishers were able to develop their own information infrastructures, which ultimately reinforced their business position. By the end of the 1960s, the Dutch publisher
Elsevier and the German publisher
Springer had started to computerize their internal data, as well as the management of journal reviews. Until the advent of the web, the landscape of scientific infrastructures remained fragmented. Projects and communities relied on their own unconnected networks at a national or institutional level: "the Internet was nearly invisible in Europe because people there were pursuing a separate set of network protocols". The birthplace of the World Wide Web, CERN, had its own version of the Internet, CERN-Net, and also supported its own protocol for e-mail exchange. The European Space Agency used its own iteration of the RECON system also used by NASA engineers (ESRO/RECON). These insulated scientific infrastructures could hardly be connected before the advent of the web. Communication between scientific infrastructures was not only challenging across space, but also across time. Whenever a communication protocol was no longer maintained, the data and knowledge it disseminated were likely to disappear as well: "the relationship between historical research and computing has been durably affected by aborted projects, data loss and unrecoverable formats".
== The Web Revolution (1990–1995) ==
The
World Wide Web was originally framed as an open scientific infrastructure. The project was inspired by
ENQUIRE, an information management program commissioned from
Tim Berners-Lee by
CERN for the specific needs of high energy physics. The structure of ENQUIRE was closer to an internal web of data: it connected "nodes" that "could refer to a person, a software module, etc. and that could be interlinked with various relations such as made, include, describes and so forth". While it "facilitated some random linkage between information", Enquire was not able to "facilitate the collaboration that was desired for in the international high-energy physics research community". Like every significant scientific computing infrastructure before the 1990s, the development of ENQUIRE was ultimately impeded by the lack of interoperability and the complexity of managing network communications: "although Enquire provided a way to link documents and databases, and hypertext provided a common format in which to display them, there was still the problem of getting different computers with different operating systems to communicate with each other". Sharing of data and data documentation was a major focus in the initial communication on the World Wide Web when the project was first unveiled in August 1991: "The WWW project was started to allow high energy physicists to share data, news, and documentation. We are very interested in spreading the web to other areas, and having gateway servers for other data". The web rapidly superseded pre-existing online infrastructures, even when they included more advanced computing features. From 1991 to 1994, users of the
Worm Community System, a major biology database on worms, switched to the Web and Gopher. While the Web did not include many advanced functions for data retrieval and collaboration, it was easily accessible. Conversely, the
Worm Community System could only be browsed on specific terminals shared across scientific institutions: "To take on board the custom-designed, powerful WCS (with its convenient interface) is to suffer inconvenience at the intersection of work habits, computer use, and lab resources (…) The World-Wide Web, on the other hand, can be accessed from a broad variety of terminals and connections, and Internet computer support is readily available at most academic institutions and through relatively inexpensive commercial services." The Web and similar protocols developed at the time had a comparable impact on scientific publications. Early forms of open access publishing were developed not by large-scale institutional infrastructures but through small initiatives. Universal access, regardless of the operating system, made it possible to maintain and share community-driven electronic journals years before online commercial scientific publishing became viable. The first
open-access repositories were individual or community initiatives as well. In August 1991,
Paul Ginsparg created the first inception of the
arXiv project at the
Los Alamos National Laboratory in response to the recurring storage issues of academic mailboxes caused by the increasing sharing of scientific articles.
== Building scientific infrastructures for the web (1995–2015) ==
The development of the World Wide Web rendered numerous pre-existing scientific infrastructures obsolete. It also lifted numerous restrictions and obstacles to online contribution and network management, which made it possible to attempt more ambitious projects. By the end of the 1990s, the creation of public scientific computing infrastructures became a major policy issue. The first wave of web-based scientific projects in the 1990s and the early 2000s revealed critical issues of sustainability. As funding was allocated for a specific time period, critical databases, online tools, and publishing platforms could hardly be maintained, and project managers were faced with a
valley of death "between grant funding and ongoing operational funding". Several competing terms appeared to describe this new category of infrastructure. In the United States, the term
cyber-infrastructure was used in a scientific context by a US National Science Foundation (NSF) blue-ribbon committee in 2003: "The newer term cyberinfrastructure refers to infrastructure based upon distributed computer, information and communication technology. If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy." E-infrastructure and e-science were used with a similar meaning in the United Kingdom and other European countries. Thanks to "sizable investments", major national and international infrastructures were established between the initial policy discussions of the early 2000s and the economic crisis of 2007–2008, such as the
Open Science Grid,
BioGRID, the
JISC, or the
Project Bamboo. Specialized free software for scientific publishing like
Open Journal Systems became available after 2000. This development entailed a significant expansion of non-commercial open access journals by facilitating the creation and administration of journal websites and the digital conversion of existing journals. Among the non-commercial journals registered in the Directory of Open Access Journals, the number of journals created annually went from 100 by the end of the 1990s to 800 around 2010, and has not evolved significantly since then. By 2010, infrastructures were "no longer in infancy" and yet "they are also not yet fully mature". While the development of the web solved a large range of technical issues regarding network management, building scientific infrastructures remained challenging. Governance, communication across all involved stakeholders, and strategic divergences were major factors of success or failure. One of the first major infrastructures for the humanities and the social sciences, the
Project Bamboo was ultimately unable to achieve its ambitious aims: "From the early planning workshops to the
Mellon Foundation's rejection of the project's final proposal attempt, Bamboo was dogged by its reluctance and/or inability to concretely define itself". This lack of clarity was further aggravated by recurring communication missteps between the project initiators and the community it aimed to serve. "The community had spoken and made it clear that continuing to emphasize
Service-oriented architecture would alienate the very members of the community Bamboo was intended to benefit most: the scholars themselves". Budget cuts following the economic crisis of 2007–2008 underlined the fragility of ambitious infrastructure plans relying on significant recurring funds. Leading commercial publishers were initially outpaced by the unexpected rise of the Web for academic publication: the executive board of
Elsevier "had failed to grasp the significance of electronic publishing altogether, and therefore the deadly danger that it posed—the danger, namely, that scientists would be able to manage without the journal". The persistence of high revenues from subscriptions and the consolidation of the sector made it possible to fund the conversion of pre-existing online services to the web, as well as the digitization of past collections. By the 2010s, leading publishers had been "moving from a content-provision to a data analytics business" and had developed or acquired new key infrastructures for the management of scientific and pedagogic activities: "Elsevier has acquired and launched products that extend its influence and its ownership of the infrastructure to all stages of the academic knowledge production process". Having expanded beyond publishing, these vertically integrated, privately owned infrastructures have become extensively embedded in daily research activities.
== Toward open science infrastructures (2015–present) ==
The consolidation and expansion of commercial scientific infrastructures has prompted renewed calls to secure "community-controlled infrastructure". The acquisition of the open repositories
Digital Commons and
SSRN by Elsevier has highlighted the lack of reliability of critical scientific infrastructures for open science. The SPARC report on European Infrastructures underlines that "a number of important infrastructures [are] at risk and as a consequence, the products and services that comprise open infrastructure are increasingly being tempted by buyout offers from large commercial enterprises. This threat affects both not-for-profit open infrastructure as well as closed, and is evidenced by the buyout in recent years of commonly relied on tools and platforms such as SSRN, bepress, Mendeley, and Github." In contrast with the consolidation of privately owned infrastructures, the open science movement "has tended to overlook the importance of social structures and systemic constraints in the design of new forms of knowledge infrastructures". It remained mostly focused on the content of scientific research, with little integration of technical tools and few large community initiatives. "Common pool of resources is not governed or managed by the current scholarly commons initiative. There is no dedicated hard infrastructure and though there may be a nascent community, there is no formal membership." More precise concepts were needed to embed the ethical principles of openness, community service and autonomous governance in the building of infrastructures, and to ensure the transformation of small localized scholarly networks into large, "community-wide" structures. In 2013,
Cameron Neylon underlined that the lack of common infrastructure was one of the main weaknesses of the open science ecosystem: "in a world where it can be cheaper to re-do an analysis than to store the data, we need to consider seriously the social, physical, and material infrastructure that might support the sharing of the material outputs of research". Two years later, Neylon, Geoffrey Bilder and Jennifer Lin defined a series of
Principles for Open Scholarly Infrastructure, which reacted primarily to the discrepancy between the increasing openness of scientific publications and datasets and the closedness of the infrastructures that control their circulation. Since 2015, these principles have become the most influential definition of open science infrastructures; they have been endorsed by leading infrastructures such as Crossref, OpenCitations and Data Dryad, and have become a common basis for the institutional evaluation of existing open infrastructures. The main focus of the
Principles is to build "trustworthy institutions" with significant commitments in terms of governance, financial sustainability and technical efficiency, so that they can be durably relied on by scientific communities. By 2021, public services and infrastructures for research have largely endorsed open science as an integral part of their activity and identity: "open science is the dominant discourse to which new online services for research refer." According to the 2021 Roadmap of the European Strategy Forum on Research Infrastructures (ESFRI), major legacy infrastructures in Europe have embraced open science principles. "Most of the Research Infrastructures on the ESFRI Roadmap are at the forefront of Open Science movement and make important contributions to the digital transformation by transforming the whole research process according to the Open Science paradigm." Examples of extensive data sharing programs include the
European Social Survey (in social science),
ECRIN ERIC (for clinical data) or the
Cherenkov Telescope Array (in astronomy). In agreement with the original intent of the
Principles, open science infrastructures are "seen as an antidote to the increased market concentration observed in the scholarly communication space." In November 2021, the UNESCO Recommendation on Open Science acknowledged open science infrastructures as one of the four pillars of open science, along with open scientific knowledge, open engagement of societal actors and open dialogue with other knowledge systems, and called for sustained investment and funding: "open science infrastructures are often the result of community-building efforts, which are crucial for their long-term sustainability and therefore should be not-for-profit and guarantee permanent and unrestricted access to all public to the largest extent possible." The development of open scientific infrastructures has become a debated topic regarding the future of online scientific research. In January 2021, a collective of researchers called for a
Plan I or
Plan Infrastructure in reaction to perceived shortcomings of the international initiative for open science of the cOAlition S, the
Plan S. In contrast with the focus of Plan S on scientific publication, Plan I aims to integrate all research outputs on large interoperable infrastructures: "research and scholarship are crucially dependent on an information infrastructure that treats all scholarly output, text, data and code, equally and that is based on open standards and open markets."

== Organization of open infrastructures ==