A major change to our profession is afoot. Well, more than afoot – the “E-science” ship has sailed and has some major momentum behind it, but are we on board? If you’re one of the librarians still standing on the dock wondering what “E-science” is, you’re not alone. In simple terms, E-science is international, collaborative, technology-driven science that brings together data, research, and people around the world. The Joint Task Force on Library Support for E-Science describes it as an “inter- and multi-disciplinary” enterprise “with significant dependence on computation and computer science;” and as a data-intensive approach to scholarship that is focused on team-based research composed of scholars spread across the globe.
Some examples of team-based, cross-disciplinary research with people and computers connected within a “grid” of networks across the world: ClimatePrediction.net, which leverages the underused processor power of home computers to study climate change models; the Southern California Earthquake Center, which has over 600 collaborators from Tokyo to Woods Hole, Massachusetts, working on ways to understand earthquake behavior in order to minimize earthquake damage; and the Biomedical Informatics Network, which has pooled together biomedical researchers and computer scientists from sites spanning the UK and the US to share data and research insights to enhance diagnosis and treatment of diseases. This is science that rises above place, institution, and even country, science that shatters the boundaries upon which our libraries are traditionally built.
Stepping back a bit to take in a wider view, an even broader term, E-research, is defined as “the development of, and the support for, advanced information and computational technologies to enhance all phases of research processes” (Luce, 2008). What this all comes down to is supporting research on the broadest scale, with added layers of depth that include high performance computing, both human and non-human consumers of information, and an utterly complex world of data types and data quantities. Add in the diverse expectations of not only the scientists conducting the work, but also their funding sources and their network of existing and potential colleagues, and you start to get the picture (see Nature’s Big Data issue of September 3, 2008, for a nice sampling of where things are headed).
While scholarly communication and open access were the big issues of library conferences a few years ago, expect to see E-science take its place in prominence. As a case in point, I recently attended the ARL/CNI Fall Forum on “Reinventing Science Librarianship.” With E-science as the main spotlight, the conference speakers delved into themes surrounding data curation, transforming libraries to support the needs of researchers, support for virtual organizations, developing cyberinfrastructure, and training for librarians in the E-science landscape (see the Proceedings for more details).
The theme from this conference that I want to focus on in this post is which elements could be holding our profession back from becoming major players in the E-science landscape, and which elements are going to give us a leg up in becoming credible, respected participants in shaping the future of E-science/E-research. As a profession, we are at a point where the successes of what we have done traditionally act as both limitations and advantages to our ability to play a major role in E-science (aka “What one loses on the swings one gains on the roundabouts”) – currently, the balance is weighted more heavily toward our limitations. I’ll outline some of the limitations and counterbalance those with the aspects of our profession that, if they become more fully fleshed out, would shift the balance.
To begin with, E-science is global while libraries (for the most part) are not globally-oriented. In the E-science landscape, our users are no longer identified by institution nor are they even necessarily human – our E-science users are also networks of computers. However, the home institutions of libraries are our comfort zones and we are bound to them in many ways, but most importantly, we need them to be fiscally afloat. How do we break free of the mentality that we can only support our institutional users when, in an E-science landscape, our users cross all kinds of institutional boundaries? Even the licenses that we negotiate and sign reinforce the restrictive behaviors of libraries in terms of defining who and where our user communities are.
The other major issue is that we are dealing with a very fast event horizon when it comes to E-science. As James Mullins noted at the ARL/CNI Fall Forum, our profession has had over 100 years to develop best practices for managing, organizing, and curating print objects – books, journals, manuscripts, etc. But because of the rapid pace at which researchers are generating data that they need to share, re-use, and preserve, “what took us 100 years to do for print, we now have to do in ten years for digital data” (Mullins, 2008). While our profession’s goal for E-science does and will include traditional roles like collecting, storing, organizing, and making information useful, we need to be able to perform these roles with datasets that are diverse and multi-dimensional in the sense that data lends itself to constantly being built upon by students and scholars. We’re going to need to help researchers by connecting datasets with articles, scholars, computer programs, and networks that aren’t necessarily easily identified with or pigeonholed into a particular discipline or a single geographic area. Many disciplines are already embarking on their own collaborative research solutions (astrophysics, for example), but lack some of the standards and archival considerations that are characteristic of the library discipline; in essence, they are creating their own virtual research networks because libraries, for the most part, have not yet taken steps to meet these needs.
Despite our profession’s slowness to jump on board, many librarians are ready to reconceptualize and reposition our jobs to address the needs of E-science. Because E-science is institution-agnostic, this re-envisioning of the librarian process must involve crossing institutional boundaries, but we are so closely tied to our institutional identities and support structures that this is going to be a major hurdle. We will need to look to unique partnerships so that we can hybridize our organizations with other organizations that will enable us to build expertise and support beyond our institutional boundaries. These kinds of partnerships would need to be positioned to enable the active development of technologies for sharing, managing, and curating massive quantities of diverse datasets while growing a workforce of data-savvy librarians and information scientists. The Data Intensive Cyber Environments Research group (DICE), with a new arm at the University of North Carolina at Chapel Hill, and the San Diego Supercomputer Center + University of California, San Diego partnership are great examples of these kinds of partnerships for big scientific research agendas. What about not-so-big research? It has been noted that while “small science” (research not necessarily backed by lots of grant dollars) is most in need of an E-science solution, it is most overlooked in terms of funding and support. Libraries at small schools without research grant support won’t be able to get resources to support E-science even if they’d like to. To this end, library and other academic consortia, alongside professional organizations like the ARL, are likely going to have to take the lead to make any headway at all.
Some potential areas for progress, touched on in the following sections, include training for librarians, reconceptualizing the benchmarks for what makes our libraries successful, and building relationships with publishers and grant funding agencies that focus on defining standards and best practices for data sharing, re-use, and curation.
The lack of E-science training opportunities for librarians has been brought to the fore as a major limitation. Swan and Brown (2008) offer many recommendations and reflections on “skilling up” for E-science. Not only do MLS/MLIS programs need to develop courses in data curation, data management, and data infrastructure, but libraries need staff who are skilled enough to be involved at every stage of data generation, collection, analysis, interpretation, synthesis, preservation, storage, and re-use. Existing librarians will need to take part in practical, hands-on, career-long training for the whole data life cycle. Exemplars include data curation courses such as those offered by the Specialization in Data Curation and the Summer Institute in Data Curation at the University of Illinois. These courses are sought after by both bench scientists and librarians. Some have even postulated that the necessity of holding an MLS/MLIS degree is an antiquated notion in this new context. Libraries that are already dabbling at the cutting edge are positioning themselves to get in on the act by creating jobs to support E-science that don’t require an MLS/MLIS. They are turning library services on their head and hiring people who can collaborate with scientists at the lab bench, in the grant proposal process, and in the classroom.
The ways in which we’ve defined our libraries based on our collections and services raise several questions that our institutions will need to come to terms with: What is unique about our research library content and services? Think about things like the published and unpublished output of the researchers at your institution – how is the library showcasing that content to the global community? What percent of our budget resources support unique services? In our drive to be competitive, we find ourselves duplicating collections that are already available at flagship universities while neglecting the truly unique content on our campuses. Libraries could begin to build collections using scholarship generated “‘at the source’—that is, collect, organize, and host data sets generated by researchers at their own institutions. In doing so, libraries have the potential to exert influence over the emerging data sets market rather than waiting for commercial vendors to harvest and package the data for later re-sale” (Davis and Vickery, 2007). Some research communities are already taking the lead on connecting datasets to publications (something libraries have been partially successful at with institutional repositories) – examples include Dryad (a database of evolutionary biology and ecology research articles and datasets) and the Angiosperm Phylogeny Website (a compilation of all known research on the systematics of flowering plants). Why aren’t libraries more fully involved in these efforts?
There are increasing expectations for scientists to save their research data and document the research process. Beyond being ethically responsible researchers, they are increasingly becoming responsible for complying with federal and institutional regulations, protecting their intellectual property rights, maintaining a record-keeping plan and an audit trail, and managing data files so they can be accessed into the future. Funding sources are increasingly mandating that researchers make their data accessible (e.g., NIH) and more and more publishers require deposit of datasets as a prerequisite for publication. These are very complex issues for anyone to deal with, but many of these are issues that libraries have deep knowledge about. Within the E-science landscape, libraries are going to be expected to evolve to act “as a catalyst for an interdisciplinary community…The role of the library moves from manager of scholarly products to that of participant in the scholarly communication process” (Lougee, 2002). We have expertise in intellectual property and copyright and we’ve got a healthy respect for openness (Open Data/Open Science) balanced with ownership issues that impact promotion and tenure. We have expertise in standards and in developing and applying metadata in ways that support the management and curation that drive future reuse and repurposing of digital content. Educating researchers on these issues and even stepping in to help manage these issues is an important role for librarians to continue to build upon.
Information has dimension – it can exist in many different contexts and serve many different needs – as library professionals and lifelong students, we have an obligation to recognize and seize opportunities that enhance the dimensionality of information and help information seekers tap into, evaluate and fully exploit this dimensional quality of scholarship. We’ve planted our profession at the nexus of many different disciplines and organizationally we have broad knowledge across all of those disciplines. By making our depth within those disciplines go a little deeper with proper training for librarians, by helping researchers make useful connections across disciplines, by educating and collaborating with researchers on how to cultivate their data in such a way that it can be shared, re-used and preserved over space and time, we can have significant impact in shaping the future of E-science/E-research. Scientists are often hard to pin down and their research process is often hard to isolate into discrete, recognizable stages that librarians can develop relationships with and solutions for, but it’s our responsibility to become relevant within the process. If libraries can pull together, re-envision our roles, and build the sort of support networks required by the international collaborations inherent in E-science, the rewards will exceed all expectations. These opportunities for libraries to be key players in team-based, cross-disciplinary research are opportunities that our profession and the scientific enterprise cannot afford to miss.
Time to hear from you – a few questions to spur your comments:
1. What does E-science mean to you?
2. What does an E-science librarian look like?
3. Do you think our profession is ready to support researchers in an E-science landscape?
- E-science Talking Points for ARL Deans and Directors
- Data Audit Framework Development Project
- Agenda for Developing E-Science in Research Libraries
- The Institutional Challenges of Cyberinfrastructure and E-Research
Much appreciation to Kim and Derik from ITLWTLP for their invaluable editing skills, and to Annette Day, Honora Eskridge, and Marcus Helfrich for providing thoughtful feedback on drafts of this post.