Not just another pretty picture
I’m a slave to spreadsheets. Trying to decide between a stacked column bar chart and a 3-D area chart is par for the course in my work. Microsoft Excel© is great for many practical needs, but it doesn’t always support the need to create simple, compelling and interactive graphical data visualizations that are critical for libraries to best express value, communicate trends, and test assumptions about library services and collections. Data visualization is the study of strategies and methods for conveying information, as captured by data, in an efficient, functional way that leads to insights about a process or system. Good data visualization can drive home a point quickly and have lingering impact. Data visualizations can help you see something that you hadn’t noticed before. These days, libraries can’t afford to not be wise and impactful with the data that is collected and conveyed about patrons, services and collections. Many libraries are reporting declines in reference desk queries against the backdrop of massive surges in use of computers and other tech-related services. Most libraries are undergoing comprehensive reviews of journal and database usage (among other metrics) with the aim to cut collections to comply with shrinking budgets. To express these kinds of trends, to seek support, or to simply try to assess library collections and services, many libraries fall back on the use of tables with a few pie charts and bar graphs thrown in for added measure. When I started having conversations with my library colleagues about data visualization tools and techniques, I was humbled by what I didn’t know and embarrassed that I hadn’t heard about, much less tested, some of the data visualization tools that are surfacing. So, I decided to start exploring what I’ve been missing while hiding behind the ubiquity of Microsoft Excel© graphs and charts. 
In this post, I present some examples using a few popular data visualization tools and I give an overview of some inspirational guides for creating compelling data graphics that may help you better express your own library metrics. First, let’s explore a little further why data visualization matters for libraries.
Library data in context
Libraries serve users at the reference desk, circulation desk, and special collections centers. Library staff engage with constituents through committees and working groups, at the library security gate, and through online chat. Librarians attempt to expose valuable services and collections via library catalogs, carefully crafted subject guides, bibliographic instruction sessions, and long lists of databases and online journals. Libraries assess usage and patrons’ needs via web statistics, gate counts, circulation transactions, LibQual surveys, usage statistics, and feedback forums. Why do we measure these experiences? To show value for money or time and to understand the uptake of our collections and services. Library value has been a popular topic since at least the 1930s and libraries have gotten better at showing return on investment (ROI). We’re not completely there yet, as the recent $1,000,000 IMLS sponsorship of the “Lib-Value” grant suggests. Libraries are pretty adept at measuring lots of different kinds of interactions, so how can we be so bad at demonstrating our worth and making our point? What if part of our problem in demonstrating value lies in how we attempt to showcase library value? Libraries also want to make good, sound decisions in the context of their user communities. Libraries collect a lot of data that encompass complex networks about how users navigate through online resources, which subjects circulate the most or the least, which resources are requested via interlibrary loan, visitation patterns over periods of time, reference queries, and usage statistics of online journals and databases. Making sense of these complex networks of use and need isn’t easy. But the relationships between use and need patterns can help libraries make hard decisions (say, about which journals to cut) and creative decisions to improve user experiences and outreach, achieve efficiencies, and enhance alignment with organizational goals.
Not another library ROI article, please!
Relax, this isn’t another post about how to calculate library ROI, nor is it about how to collect data that show library worth. This post is an exploration of visualization techniques that can help libraries make a compelling case to stakeholders and make more informed decisions. Disclaimers: I’m not an expert on visualization techniques; I’m part of the slew of librarians who need to know how to better illustrate what we do and learn how to better allocate resources. Visualization strategies have made their debut at library conferences already (e.g., 2009 Computers in Libraries; 2008 Computers in Libraries; 2009 NASIG Conference). However, I haven’t seen a groundswell of examples indicating that libraries have taken these strategies and these conference presentations to heart. What I have experienced is a few really good ideas popping up in conversations with colleagues about how to make the case for libraries in simple, compelling, visual ways. I want to share what I’ve learned so far in my exploration and open the door to some more good ideas.
Data needs to be “humanized”
During various conversations about how to represent library collections and expenditures data, one of my very smart colleagues, Cory Lown, introduced me to the work of John Tukey and Edward Tufte. Cory explained that Tufte’s aim is to encourage the use of as much data as possible (“to clarify, add detail”) and to use visualization techniques that “fit” the data.
“Often bar charts and pie charts (which tend to have low data to ink density) obscure more nuanced and interesting data. It’s not just about new and interesting tools, but matching the data to the right visualization so we can make use of data we have.” (Lown, 2009, pers. comm.)
This isn’t a trivial process by any means: each dataset is unique because of variation in the methods used for collection, clean-up, analysis and so on. But, according to Tufte’s principles, devoting as much of a chart, graph or image as possible to the data itself (aka “maximizing the data-ink ratio”) while reducing the “fluff” (aka “chartjunk”; e.g., chart borders, text legends, background fill, decorations) can aid in getting the point across.
In the spirit of the work of Tukey and Tufte, a recent book, aptly named Beautiful Data (2009, edited by Toby Segaran and Jeff Hammerbacher) brings together a great compilation of data visualization, data handling and data sense-making strategies. In one chapter, Nathan Yau, also author of a terrific blog called FlowingData (to which I’ll refer a little later in this post), describes the development of a simple, user-friendly tool to track and measure what he calls “personal data” (e.g., eating, sleeping, travel habits). Yau is interested in creating tools for people to distill their personal data into stories that can help them understand patterns about their personal habits and eventually help relate people to the bigger picture about their impact on their environment and vice versa. This concept of creating a way for a person to relate to the bigger picture through data is an important lesson for libraries.
“Data has to be presented in a way that is relate-able; it has to be humanized. Oftentimes we get caught up in statistical charts and graphs, which are extremely useful, but at the same time we want to engage users so that they stay interested. . .Users should understand that the data is about them and reflect the choices they make in their daily lives.” (Yau, 2009)
All of those interactions with patrons that libraries collect and track – circulations, journal usage statistics, cost/use metrics, etc. – are about the patron. However, most of the metrics that libraries use to make their case to patrons aren’t presented in a way that relates the patron to the data. An example: Academic libraries spend a lot of money on journals. In fact, the NCSU Libraries spent around $6 million on journals during 2008-2009, but how many of our patrons know that when they download a journal article, it’s paid for by the NCSU Libraries? That $6 million doesn’t necessarily “translate” to a user when they download an article. We tell them how many articles were downloaded, but is there a better way to make the connection between the user and the cost of resources? For the most part, library metrics aren’t good at telling stories that keep our users interested and help inform the choices that they make. We are in need of some great ideas and examples from the field.
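One way to start making that connection is to restate the spend per download rather than as a single large total. The sketch below is only an illustration: the $6 million figure comes from the paragraph above, while the download count, the function names, and the wording of the message are hypothetical.

```python
# Only the $6 million spend comes from this post; the download count is an
# assumed, made-up number used purely to show the arithmetic.
ANNUAL_JOURNAL_SPEND = 6_000_000   # dollars spent on journals, 2008-2009
ANNUAL_DOWNLOADS = 3_000_000       # hypothetical article downloads in the same year


def cost_per_download(spend: float, downloads: int) -> float:
    """Average dollars the library paid for each article download."""
    if downloads <= 0:
        raise ValueError("downloads must be positive")
    return spend / downloads


def patron_message(spend: float, downloads: int) -> str:
    """Phrase the metric from the patron's point of view."""
    per_article = cost_per_download(spend, downloads)
    return f"Each article you download costs your library about ${per_article:,.2f}."


print(patron_message(ANNUAL_JOURNAL_SPEND, ANNUAL_DOWNLOADS))
```

A per-article figure like this could sit next to a download link or on a dashboard, where a patron meets it at the moment of use instead of in an annual report.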
Simple data visualization tools
While several other visualization tools exist, I want to focus on three of the most popular tools and demonstrate what is possible using a few datasets that I’ve created using the kind of library metrics that you might be dealing with in your own library. After trying a few different types of library metric datasets in Google Gadgets for spreadsheets, ManyEyes and Swivel, my favorites are Google Gadgets and ManyEyes because of their ease of use and diversity of visualization styles.
Google Gadgets: First, I have to give props to Cory Lown for making me aware of Google Gadgets for spreadsheets. It’s really quite simple to use. If you have a Google account (e.g., Gmail), then you can use Google Gadgets. Log into your Google account, choose Google Documents, Create New Spreadsheet, then add your data (just as you would in Excel). Once your data is ready, go to the Insert menu and choose Gadget. As of the time of this writing, there are over 35 different visualizations you can choose from: everything from the standard bar charts to motion graphs to piles of money. The upside: You can experiment with the different visualizations and pick one that fits the point that you’re trying to make or the audience that you’re trying to reach. You can share your visualizations with a simple URL that you plug into an email, or into your website or blog. The downside: You don’t have a lot of control over font size or positioning of elements on the charts.
Google Gadget Motion Charts are excellent for showing change in values over time. They are the primary visualization mode for sites like GapMinder to illustrate changes in global issues over time. Below is an example using data that I collected from the Association of Research Libraries on research library expenditures plotted against university expenditures from 1982 through 2006. Try the motion chart with the default variables, then try changing them. You’re welcome to access the dataset itself to create your own data visualization.
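If your own data starts out with one column per year, it will likely need reshaping before a motion chart can use it. The sketch below assumes the layout that Google's motion chart expects (entity name first, then a time value, then the numeric measures) and turns a "wide" table into that "long" form. The institution names, the numbers, and the function names are all hypothetical; none of this is ARL data.

```python
import csv
import io

# Hypothetical wide-format tables: one entry per institution, one value per year.
# All figures are made up (in millions of dollars) for illustration only.
library_spend = {
    "University A": {1982: 5.0, 2006: 20.0},
    "University B": {1982: 8.0, 2006: 30.0},
}
university_spend = {
    "University A": {1982: 200.0, 2006: 900.0},
    "University B": {1982: 350.0, 2006: 1500.0},
}


def to_motion_chart_rows(library, university):
    """Return (entity, year, library $M, university $M) rows, sorted by name then year."""
    rows = []
    for name in sorted(library):
        for year in sorted(library[name]):
            rows.append((name, year, library[name][year], university[name][year]))
    return rows


def to_csv(rows):
    """Serialize the rows with a header so they can be pasted into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Institution", "Year",
                     "Library expenditures ($M)", "University expenditures ($M)"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = to_motion_chart_rows(library_spend, university_spend)
print(to_csv(rows))
```

Each institution then gets one row per year, which is what lets the chart animate a bubble across time.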
ManyEyes: As part of the agreement to use ManyEyes with your own data, any data you upload is made publicly and freely available for others to use. After signing up with an email address, the process is easy and straightforward. You can be up and running with several visualizations of your data within a few minutes and you can share your visualizations with links in emails, or embed them in your website or blog. The upside: the choice of visualizations is pretty extensive: Word Tree, Phrase Net, Wordle, Tag Cloud, Bar Chart, Block Histogram, Bubble Chart, Network Diagram, Scatterplot, Matrix Chart, Treemap for Comparisons, Treemap, Pie Chart, Country Map, US County Map, World Map, Stock Graph, Line Graph, Stack Graph. The downside: if you want to compare more than two variables, you have limited options. The example that I’ve included here is data that I collected on the publication and citation patterns of NCSU scholars. Researchers at an academic university will almost always have more citation activity than publication activity in a journal. But just how much more? This visualization illustrates the scale of citations for journals in which NCSU scholars publish 0 times, 2 times, 3 times, on up to 41 times. Try the visualization below and experiment with the dataset to create other ManyEyes visualizations.
Swivel: With Swivel, you have a choice to let the data that you upload be freely available to others or to keep your data private. If you choose to keep your data private, be prepared to commit to a fee of $12/month. For most of us who use Excel to prepare data for upload into a tool like Swivel, an Excel toolbar is available from the Swivel Confectionary. The upside: You have a little more control over things like font size and font face (compared to Google Gadgets for spreadsheets); it’s just as easy to share data and visualizations (email or embedding in websites or blogs); and if you want your audience to be able to interact with your charts, Swivel makes that a trivial process. The downside: The choice of graphs is limited (Bar, Line, Area, Stacked Bar, Stacked Area, Scatter, and Pie) and the site isn’t very responsive with larger sets of data (e.g., I tested it with a dataset of over 1900 rows and it had trouble switching between different types of graphs). In this example, I’ve uploaded a small dataset of usage of the major types of digital collections provided by the NCSU Libraries. Try interacting with the pie chart and download the dataset if you want to use it to experiment with your own visualizations (you’ll need to create an account before you can do much with the data in Swivel).
There are some excellent resources that provide insight into what are considered good and bad data visualization practices. These sites are filled with examples of interesting data visualizations to inspire your own work and in some cases (e.g., GapMinder) also offer datasets with which to experiment.
FlowingData: The FlowingData blog is one of the most compelling, idea-filled blogs I’ve come across – ever. Authored by Nathan Yau (a UCLA PhD student in statistics focusing on data visualization), this blog highlights great examples of how to make a compelling point with data and visual creativity. FlowingData offers a great deal, but I want to point out 5 specific categories:
- Statistical Visualization – strategies for visualizing different types of statistics
- Infographics – examples of aesthetically pleasing and intellectually captivating modes for presenting data in graphical format
- Mapping – examples of data mapped to geographic representations
- Artistic Visualization – examples of data as art
- Network Visualization – examples of visualizations showing networks or relationships between entities
Not only does Yau collocate examples of how to display data to different audiences, but he also provides thoughtful analysis about why a visualization is effective (or not) and what could be improved about it.
Infosthetics: Authored by Andrew Vande Moere (faculty member of Architecture, Design and Planning at the University of Sydney in Australia), Infosthetics acts much like the FlowingData blog, but tends to focus more on data as art. There’s overlap between Infosthetics and FlowingData, but you’ll find a slightly different perspective in Infosthetics – one that deals with data visualization from the design and interaction approach.
Visual Complexity: Manuel Lima uses the Visual Complexity blog to bring together examples and ideas around the study of the visualization of complex networks such as data from library systems, the social web, biological systems, and transportation patterns. His aim is to analyze methods for conveying the adage, “the whole is always more than the sum of its parts.” Currently a Senior User Experience Designer at Nokia’s NextGen Software & Services, Lima provides an industry perspective on the utility of networks to display information.
GapMinder: GapMinder is an organization that runs a website for displaying trends in global issues such as poverty plotted against inequality indices or oil consumption plotted against oil production. Its main visualizations are based on Google Motion Charts, and have been featured in the famous TED Talks.
Information Dashboards: Information dashboards are user interfaces that serve the need of providing critical information at a glance. A book aptly named Information Dashboard Design (2006, by Stephen Few) promises to teach readers how to use graphs discriminately to enhance communication. Some excellent examples of information dashboards that might fit in library contexts are the Indianapolis Museum of Art (IMA) Dashboard (thanks to Adrienne Lai for sharing this site with me) and the Sprint Now Dashboard.
The IMA Dashboard presents simple, compelling data in a graphically aesthetic way. It tells a visitor things like how many plants are in the gardens, how many visitors are at the IMA, how much energy is being consumed by the IMA, and the number of active memberships. Each widget window leads to a little more information about the IMA, drawing the visitor in to learn more without overwhelming him/her with too many options or underwhelming with too few avenues to explore. The Sprint Now Dashboard, on the other hand, creates a slightly different experience. There’s a lot going on that isn’t necessarily relevant here – from the creepy voice-over to the number of eggs being produced or the number of people stuck in elevators – but the concept of surfacing this kind of real-time information is compelling.
The possibilities in libraries for these kinds of information dashboards are obvious. An external audience might find it helpful to know which books are being checked out (similar to the Seattle Public Library’s display of circulating materials), real-time locations of available computers, how many journal articles are being downloaded, how many e-books are being read, the number of devices (e.g., laptops, iPods, Kindles) that are checked out, the latest articles by your campus researchers, upcoming community events, and maybe even an ROI metric on the value of library services and collections per tuition dollar (or tax dollar) per hour. Tack on a catalog search box, real-time webcam views of the coffee shop wait line and the Info Commons, and you’ve got a mode for making a case for the value of library services and collections while providing real-time information all in one view.
An internal audience of library staff and decision makers might find it helpful to see in a dashboard view the “health” of the library budget, cost/use metrics based on circulation data or electronic journal or e-book usage statistics, hourly gate counts, keywords searched in the catalog, cataloging activity, and a current snapshot of the composition and use of the collection broken down by format or by material type plotted against community demographics (e.g., number of full-text journal downloads per graduate student in the Chemical Engineering Department) among other things. Other than the Seattle Public Library, I am not aware of any libraries presenting this kind of (more or less) real-time, dynamic information dashboard to the public, but I suspect that any data displayed for public consumption would require that personally-identifying information be excluded.
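To make a couple of those internal metrics concrete, here is a minimal sketch of cost per use by journal title and downloads per capita for a department. The titles, costs, download counts, and headcount are invented for illustration, and the class and function names are hypothetical rather than part of any library system.

```python
from dataclasses import dataclass


@dataclass
class JournalUsage:
    title: str
    annual_cost: float   # subscription cost in dollars
    downloads: int       # full-text downloads in the same year

    @property
    def cost_per_use(self) -> float:
        # A title with no recorded use has no meaningful ratio;
        # treat it as infinitely expensive so it sorts to the top.
        return self.annual_cost / self.downloads if self.downloads else float("inf")


def rank_by_cost_per_use(journals):
    """Most expensive per download first: candidates for a closer look."""
    return sorted(journals, key=lambda j: j.cost_per_use, reverse=True)


def downloads_per_capita(downloads: int, headcount: int) -> float:
    """E.g., full-text downloads per graduate student in one department."""
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    return downloads / headcount


# Made-up titles and numbers, purely for illustration.
journals = [
    JournalUsage("Journal of Hypothetical Chemistry", 12_000, 4_000),
    JournalUsage("Annals of Imaginary Engineering", 9_000, 150),
    JournalUsage("Unread Quarterly", 2_500, 0),
]

for journal in rank_by_cost_per_use(journals):
    print(f"{journal.title}: ${journal.cost_per_use:,.2f} per download")

print(downloads_per_capita(4_000, 80), "downloads per graduate student")
```

A ranking like this is a starting point for review rather than a decision rule; a low-use title may still matter a great deal to a small department.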
The ultimate goal of libraries is to help patrons make smart decisions about the information they use and create. As an extension of that goal, Jason Casden, one of the reviewers of this article, noted that data visualization techniques should be adopted as part of a library’s organizational culture for assessment and justification, not only to best serve patrons but also to help guide the allocation of limited resources. Investing in ways to leverage the data that libraries collect to show value, communicate trends, and test assumptions about library services and collections is part of the solution for making the library all about the patron. Try out some of the visualization tools and sample datasets used in this post or share your own data visualization creations via the Comments.
- Few, Stephen. 2006. Information Dashboard Design: the effective visual communication of data.
- Fichter, Darlene. 2008. “Data Visualizations.” Presented at Computers in Libraries 2008 Conference.
- Fichter, Darlene and Jeff Wisniewski. 2009. “Harnessing New Data Visualization Tools: Say It Visually.” Presented at Computers in Libraries 2009 Conference.
- Kurt, Lisa and Will Kurt. 2009. “Making Usage Data Understandable with Visual Representation.” Presented at the North American Serials Interest Group 2009 Conference.
- Legrady, George. 2005. “Making Visible the Invisible: Seattle Library Data Flow Visualization.” Presented at International Cultural Heritage Meeting 2005.
- Seattle Public Library Data Visualization: “Making Visible the Invisible.”
- Segaran, Toby and Jeff Hammerbacher. 2009. Beautiful Data: the stories behind elegant data solutions.
- Tenopir, Carol. 2009. “Value, Outcomes, and Return on Investment of Academic Libraries (Lib-Value).” IMLS Awards 3-Year Grant.
- Tufte, Edward. Statistician and author of data visualization books such as “Beautiful Evidence,” “Envisioning Information,” and “Visual Display of Quantitative Information.”
- Tukey, John. Statistician and author of Exploratory Data Analysis (1977).
Thanks to the people who’ve opened my eyes to the possibilities and who reviewed this post and offered valuable feedback: Cory Lown, Jason Casden, Brett Bonfield, and Kim Leeder.