    November 12, 2008
    “If where our scientists are and how they work is fundamentally changing, doesn’t that fundamentally change how we support them?” (Luce, 2008)

    A major change to our profession is afoot. Well, more than afoot – the “E-science” ship has sailed and has major momentum behind it, but are we on board? If you’re one of the librarians still standing on the dock wondering what “E-science” is, you’re not alone. In simple terms, E-science is international, collaborative, technology-driven science that brings together data, research, and people around the world. The Joint Task Force on Library Support for E-Science describes it as an “inter- and multi-disciplinary” enterprise “with significant dependence on computation and computer science” and as a data-intensive approach to scholarship centered on team-based research by scholars spread across the globe.

    Consider a few examples of team-based, cross-disciplinary research that connects people and computers within a “grid” of networks across the world: ClimatePrediction.net, which harnesses the unused processing power of home computers to run climate change models; the Southern California Earthquake Center, which has over 600 collaborators from Tokyo to Woods Hole, Massachusetts, working to understand earthquake behavior in order to minimize earthquake damage; and the Biomedical Informatics Network, which has brought together biomedical researchers and computer scientists from sites spanning the UK and the US to share data and research insights that improve the diagnosis and treatment of disease. This is science that rises above place, institution, and even country; science that shatters the boundaries upon which our libraries are traditionally built.

    Stepping back to take in a wider view, an even broader term, E-research, is defined as “the development of, and the support for, advanced information and computational technologies to enhance all phases of research processes” (Luce, 2008). What this all comes down to is supporting research on the broadest scale, with added layers of depth that include high-performance computing, both human and non-human consumers of information, and an utterly complex world of data types and data quantities. Add in the diverse expectations of not only the scientists conducting the work, but also their funding sources and their network of existing and potential colleagues, and you start to get the picture (see Nature’s “Big Data” issue of September 3, 2008, for a nice sampling of where things are headed).

    While scholarly communication and open access were the big issues at library conferences a few years ago, expect E-science to take their place in prominence. As a case in point, I recently attended the ARL/CNI Fall Forum on “Reinventing Science Librarianship.” With E-science in the spotlight, the speakers delved into data curation, transforming libraries to support the needs of researchers, support for virtual organizations, developing cyberinfrastructure, and training librarians for the E-science landscape (see the Proceedings for more details).

    The theme from this conference that I want to focus on in this post is which elements could be holding our profession back from becoming a major player in the E-science landscape, and which elements will give us a leg up in becoming credible, respected participants in shaping the future of E-science/E-research. As a profession, we are at a point where the successes of our traditional work act as both limitations and advantages in our ability to play a major role in E-science (aka, “What one loses on the swings one gains on the roundabouts”); currently, the balance tips toward our limitations. I’ll outline some of those limitations and counterbalance them with aspects of our profession that, if more fully fleshed out, would shift the balance.

    To begin with, E-science is global, while libraries (for the most part) are not globally oriented. In the E-science landscape, our users are no longer identified by institution, nor are they even necessarily human: our E-science users also include networks of computers. Yet our home institutions are our comfort zones, and we are bound to them in many ways; most importantly, we depend on them to stay fiscally afloat. How do we break free of the mentality that we can only support our institutional users when, in an E-science landscape, our users cross all kinds of institutional boundaries? Even the licenses that we negotiate and sign reinforce this restrictive behavior by defining who and where our user communities are.

    The other major issue is that we are working on a very short time horizon when it comes to E-science. As James Mullins noted at the ARL/CNI Fall Forum, our profession has had over 100 years to develop best practices for managing, organizing, and curating print objects such as books, journals, and manuscripts. But because of the rapid pace at which researchers are generating data that they need to share, re-use, and preserve, “what took us 100 years to do for print, we now have to do in ten years for digital data” (Mullins, 2008). While our profession’s role in E-science does and will include traditional functions like collecting, storing, organizing, and making information useful, we need to be able to perform these functions with datasets that are diverse and multi-dimensional, in the sense that data are constantly being built upon by students and scholars. We’re going to need to help researchers by connecting datasets with articles, scholars, computer programs, and networks that aren’t easily identified with, or pigeonholed into, a particular discipline or a single geographic area. Many disciplines (astrophysics, for example) are already embarking on their own collaborative research solutions, but these lack some of the standards and archival considerations that are distinctive of the library discipline; in essence, these disciplines are creating their own virtual research networks because libraries, for the most part, have not yet taken steps to meet these needs.

    Despite our profession’s slowness to get on board, many of us are ready to reconceptualize and reposition our jobs to address the needs of E-science. Because E-science is institution-agnostic, this re-envisioning of the librarian’s role must involve crossing institutional boundaries, but we are so closely tied to our institutional identities and support structures that this is going to be a major hurdle. We will need to seek out unique partnerships that let us hybridize our organizations with others, enabling us to build expertise and support beyond our institutional boundaries. These partnerships would need to be positioned to actively develop technologies for sharing, managing, and curating massive quantities of diverse datasets while growing a workforce of data-savvy librarians and information scientists. The Data Intensive Cyber Environments Research group (DICE), with a new arm at the University of North Carolina at Chapel Hill, and the partnership between the San Diego Supercomputer Center and the University of California, San Diego are great examples of such partnerships for big scientific research agendas. What about not-so-big research? It has been noted that while “small science” (research not necessarily backed by lots of grant dollars) is most in need of an E-science solution, it is the most overlooked in terms of funding and support. Libraries at small schools without research grant support won’t be able to get resources for E-science even if they’d like to. To this end, library and other academic consortia, alongside professional organizations like the ARL, will likely have to take the lead to make any headway at all.
    As touched on in the following paragraphs, some potential areas for progress include training for librarians, reconceptualizing the benchmarks for what makes our libraries successful, and building relationships with publishers and grant funding agencies centered on defining standards and best practices for data sharing, re-use, and curation.

    The lack of E-science training opportunities for librarians has been brought to the fore as a major limitation. Swan and Brown (2008) offer many recommendations and reflections on “skilling up” for E-science. Not only do MLS/MLIS programs need to develop courses in data curation, data management, and data infrastructure, but libraries also need staff skilled enough to be involved at every stage of data generation, collection, analysis, interpretation, synthesis, preservation, storage, and re-use. Existing librarians will need practical, hands-on, career-long training across the whole data life cycle. Exemplars include the Specialization in Data Curation and the Summer Institute in Data Curation at the University of Illinois, whose courses are sought after by bench scientists as well as librarians. Some have even suggested that requiring an MLS/MLIS degree is an antiquated notion in this new context. Libraries that are already experimenting at the cutting edge are positioning themselves to get in on the act by creating E-science support positions that don’t require an MLS/MLIS. They are turning library services on their head and hiring people who can collaborate with scientists at the lab bench, in the grant proposal process, and in the classroom.

    The ways in which we’ve defined our libraries based on our collections and services raise several questions that our institutions will need to come to terms with. What is unique about our research library content and services? Think about the published and unpublished output of the researchers at your institution: how is the library showcasing that content to the global community? What percentage of our budget supports unique services? In our drive to be competitive, we find ourselves duplicating collections that are already available at flagship universities while neglecting the truly unique content on our campuses. Libraries could begin to build collections using scholarship generated “‘at the source’—that is, collect, organize, and host data sets generated by researchers at their own institutions. In doing so, libraries have the potential to exert influence over the emerging data sets market rather than waiting for commercial vendors to harvest and package the data for later re-sale” (Davis and Vickery, 2007). Some research communities are already taking the lead on connecting datasets to publications (something libraries have been partially successful at with institutional repositories); examples include Dryad (a database of evolutionary biology and ecology research articles and datasets) and the Angiosperm Phylogeny Website (a compilation of all known research on the systematics of flowering plants). Why aren’t libraries more fully involved in these efforts?

    Scientists face increasing expectations to save their research data and document the research process. Beyond being ethically responsible researchers, they are increasingly responsible for complying with federal and institutional regulations, protecting their intellectual property rights, maintaining a record-keeping plan and an audit trail, and managing data files so they can be accessed in the future. Funders are increasingly mandating that researchers make their data accessible (e.g., NIH), and more and more publishers require deposit of datasets as a prerequisite for publication. These are very complex issues for anyone to deal with, but libraries have deep knowledge about many of them. Within the E-science landscape, libraries are going to be expected to evolve to act “as a catalyst for an interdisciplinary community…The role of the library moves from manager of scholarly products to that of participant in the scholarly communication process” (Lougee, 2002). We have expertise in intellectual property and copyright, and we have a healthy respect for openness (Open Data/Open Science) balanced with ownership issues that affect promotion and tenure. We have expertise in standards and in developing and applying metadata in ways that support the management and curation that drive future reuse and repurposing of digital content. Educating researchers on these issues, and even stepping in to help manage them, is an important role for librarians to continue to build upon.

    Information has dimension: it can exist in many different contexts and serve many different needs. As library professionals and lifelong students, we have an obligation to recognize and seize opportunities that enhance the dimensionality of information and help information seekers tap into, evaluate, and fully exploit this dimensional quality of scholarship. We’ve planted our profession at the nexus of many different disciplines, and organizationally we have broad knowledge across all of them. By deepening our expertise within those disciplines through proper training for librarians, by helping researchers make useful connections across disciplines, and by educating and collaborating with researchers on how to cultivate their data so that it can be shared, re-used, and preserved over space and time, we can have significant impact in shaping the future of E-science/E-research. Scientists are often hard to pin down, and their research process is often hard to isolate into discrete, recognizable stages that librarians can build relationships around and develop solutions for, but it’s our responsibility to become relevant within that process. If libraries can pull together, re-envision our roles, and build the sort of support networks required by the international collaborations inherent in E-science, the rewards will exceed all expectations. These opportunities for libraries to be key players in team-based, cross-disciplinary research are opportunities that neither our profession nor the scientific enterprise can afford to miss.

  • Nate says:

    Fascinating post. I’m trying to decide what the public library looks like in an e-science landscape, or would you say this is about academia?

  • Hilary Davis says:

    Hi Nate – thanks for your question and for reading the post. My sense is that E-science is more closely tied to academic libraries right now, since that is where most of those researchers are situated, but public libraries and school libraries come in at the point where members of the wider community become participants in research (in the sciences, social sciences, humanities, etc.), see the return on investment of their tax dollars in scientific progress that impacts our daily lives, and benefit from enhanced educational opportunities.

    Much of scientific research depends on federal grant dollars, which come from taxpayers. The costs of generating the data that leads to progress are enormous, so finding ways to preserve that data and make it accessible for others to learn from and build upon becomes paramount. The public education system is one area that benefits from these kinds of data. Hands-on learning, with students and members of the community taking an active role in contributing to research (“citizen science”), is a great instance where taxpayers can see a return on their investment.

    A couple of examples of where citizen science has been used to advance research: Galaxy Zoo, led by four major research universities, is a project where citizens can help classify galaxies; and eBird, a project initiated by Cornell University and the National Audubon Society, lets anyone help study changes in how birds are distributed in North and Central America.

    Just search your favorite search engine for “citizen science” for other examples.

  • @Nate: It would practically have to be academic, or at least outside the realm of public libraries. Public libraries by definition serve their communities and need to tailor themselves to those communities’ needs, most of which are not going to include high-level research projects.

  • Nate says:

    @Jenny yep, I hear you. Would be kind of neat if the public PCs could be used for some kind of distributed computing project at night while the public library is closed, though…

  • Ellie says:

    @Nate: I love that idea!

  • Nate says:

    @Ellie thanks! Probably near impossible in big urban library systems, but I bet quite doable in smaller libraries, right?

    @Hilary citizen science… this is fascinating stuff. I just wrote a post connecting a friend at the Exploratorium with some public librarians doing interesting programming. I wonder if citizen science programming might be a good way for the Exploratorium to reach the 18-35 demographic. It sounds sensible to me…

  • Here’s a question I ask myself almost daily: “If we started from scratch, but knowing what we know and with the amount of funding and the same workers we already have, would libraries look anything like they do now?”

    The answer is obvious. Personally, I think we would operate far more cooperatively. Every item would be cataloged only once, and we would catalog significantly more material in house, especially serials and data sets. We would collect far more information and make it available with fewer restrictions.

    The fact that we aren’t presently doing these things inhibits our ability to further E-science, which is a great example of why all libraries need to start asking tough questions.

    What’s clear from Hilary’s article is that we’re trapped by our assumptions about institutional boundaries and money, and by our understanding of the requests we hear from those we serve.

    What people want is accurate information they can readily assimilate, and they’re willing to pay a premium for it, though they prefer buying in bulk (currently through tax and tuition dollars). The thing is, they don’t care who provides their information, so long as they get it quickly and it’s sufficiently trustworthy.

    As for public libraries and E-science: of course we should support it. For one thing, it’s intrinsically interesting. As librarians we can help our neighbors find out about the studies that may affect their lives or could enrich their understanding of the world. Public libraries could also serve as an ideal gateway for scientists to reach a large, valuable, and diverse group of volunteers. Citizen science is a great example; I think positive deviance is another. Certainly, there are multiple possibilities.

    But that’s a small part of a larger point. The distance between where we are and where we should be is huge. To cross that divide, to support contemporary uses for information, we’re going to have to be willing to change.

  • Camila Alire says:

    Hilary —
    Enjoyed reading your article. As an academic library administrator, I always wanted us to make it easy for folks to use our resources, particularly “in their jammies” at 2:00 a.m. or whenever they needed the material. E-resources were and still are in demand. But with that came the commitment that our librarians be adequately trained to work with those resources and to help folks use them. Additionally, these e-resources are not cheap, but I maintain that they do allow us to be more effective and efficient in serving our users.

    E-science appears to be another excellent resource to help us serve those users and not just in academic libraries.

  • Nate says:

    Just ran into a post on the NYTimes Bits blog that pointed me to the public data sets Amazon now offers access to. Thought it seemed relevant to this article.

