As librarians, we invest a great deal of time and effort instructing researchers on how to use our materials. This is especially true for special collections librarians, as we attempt to familiarize researchers with our unique resources and intricate collection arrangements. At the end of that instruction investment, we often wonder if we have been effective and what our students have truly learned. Have we taught them lasting research skills? If so, how do we illustrate the value of this service to cost-cutting administrators? How do we quantify the skills gained from working with our materials? Most importantly—how do we know if our instruction is making a difference for the researcher?
Last year, we had the opportunity to collaborate with the Association of Research Libraries (ARL) in the production of SPEC Kit 317: Special Collections Engagement. SPEC Kits, produced annually, survey the 124 ARL member institutions and collect data on current practices and policies of libraries.
We surveyed member institutions about the ways special collections are engaging students, faculty, and researchers through exhibits, events, and curricular involvement, and found that over 95% of respondents are involved in these activities (Berenbak et al., 2010, 16). A core component of many of these outreach efforts was instructional engagement in the use of special collections materials.
As we began the work of analyzing the survey results, a recurrent theme surfaced: the inconsistency of instructional engagement assessment. We began to ask ourselves questions about the concepts of evaluation and assessment of instruction, and how those terms are articulated and understood in the context of special collections. For example, when conducting a one-time instruction session, should evaluation focus on the librarian’s presentation skills, the use of archival collections by participants after a session, or the number of participating students or classes?
Although special collections are attempting to assess their instruction in a variety of ways, these efforts are not consistent, not standardized, and often not driven by a “need for information that fosters targeted change” (Ariew & Lener, 2007, 508). Many special collections would like to move assessment beyond use counts and anecdotal feedback, but the majority of ARL special collections have no plan or policy for outreach or engagement, and few have dedicated outreach staff (Berenbak et al., 2010). Under these circumstances, how do special collections conceptualize what success looks like, or what measurements will convey when success has been achieved?
What IS being assessed?
Currently, most research libraries contribute annual statistics to government agencies and organizations such as the Association of Research Libraries (ARL). These statistics include the size of each library’s collections, circulation figures, and staffing levels. Additionally, ARL asks libraries to describe instructional engagement efforts, reporting on the number of presentations that are given to groups, the number of participants in those groups, and the number of reference transactions. ARL provides the following definitions for its categories:
Presentations to Groups. Report the total number of sessions during the year of presentations made as part of formal bibliographic instruction programs and through other planned class presentations, orientation sessions, and tours . . . Presentations to groups may be for either bibliographic instruction, cultural, recreational, or educational purposes . . . the purpose of this question is to capture information about the services the library provides for its clientele. (Kyrillidou & Bland, 2009, 100, emphasis added)
Participants in Group Presentations. Report the total number of participants in the presentations. For multi-session classes with a constant enrollment, count each person only once. Personal, one-to-one instruction in the use of sources should be counted as reference transactions (Kyrillidou & Bland, 2009, 100).
Reference Transactions. A reference transaction is an information contact that involves the knowledge, use, recommendations, interpretation, or instruction in the use of one or more information sources by a member of the library staff. The term includes information and referral service. Information sources include (a) printed and nonprinted material; (b) machine-readable databases (including computer-assisted instruction); (c) the library’s own catalogs and other holdings records; (d) other libraries and institutions through communication or referral; and (e) persons both inside and outside the library . . . . (Kyrillidou & Bland, 2009, 100)
Special collections departments are asked to contribute their numbers to their library’s general pool; ARL does not differentiate between general library instruction and the instructional efforts of special collections departments, a practice that makes the compiled statistics less useful for both ARL and the responding institutions.
It is clear from our SPEC Kit findings that few institutions are doing any assessment of instructional engagement beyond what is required by ARL. Most responding institutions do not have formal evaluative measures. Instead, they tend to rely heavily on feedback and conversations with students, faculty, and researchers (Berenbak et al., 2010, 78, 79, 91, 92). Special collections tend to gauge the usefulness of their instruction by noting when patrons mention they “learned” or “got something” from a session, or by counting how many items were checked out to patrons. And while counting items is arguably important for certain kinds of assessment, without measuring against a desired and stated outcome, what does a number like this really tell a special collections about its practices?
We know that very few special collections departments have any sort of formalized planning or policies guiding their instructional programming (Berenbak et al., 2010, 15). Different circumstances in each special collections contribute to this situation. In some cases, staff are short on the time and energy to devote to this activity (or, more commonly, staff tasked with this activity are a luxury most special collections cannot afford). In others, the responsibility of instruction is delegated at the time of need to the staff person whose background most closely aligns with the subject area of the instruction, limiting the consistency of the instruction. Sometimes the institution simply has not considered, or not yet formally developed, a plan for instructional engagement that fits into the overall activities of that special collections.
Whatever the circumstances, the results of our survey showed that most (80%) special collections are engaging in instructional sessions on a steady basis, and will likely continue to do so in the future, perhaps at an even greater frequency than their current rates (Berenbak et al., 2010, 13). If special collections are going to direct more focused efforts at planning their instructional engagement, they will need articulated and useful assessment metrics. After all, we cannot know if our engagement planning is a worthwhile investment if we are not assessing the outcomes of that engagement.
Though we recognize a need for better assessment, we are struggling to respond to it. Determining which metrics will provide useful information about instruction is a conundrum that leaves many special collections frustrated or hesitant to try assessment at all. A few institutions provide evidence that assessment is not daunting for everyone: one special collections, for example, looks for citations of materials from its holdings in student papers as an indication of the success of its instruction; some look for any citations of primary source materials; and some have undertaken short surveys and faculty interviews. But by and large, most special collections seem uncertain as to what to collect or how to collect it.
What are we teaching?
When we do an instruction session with patrons in special collections, what are our objectives? While specific objectives will vary from one session to another, shaped by the needs, topics, or other parameters that frame a session, there are still general objectives that we, as instructors, are always hoping to meet. Helping patrons find exactly what they need is possibly the most successful outcome we can achieve, but the many steps along the path to discovery are the components of instruction that perhaps most need to be measured in order to gauge our effectiveness. Before patrons can find exactly what they are looking for, they first have to learn how to find it. From our perspective as instructors, a successful journey is more indicative of our instructional impact than arrival at the destination.
Why is the journey so important in special collections? Elizabeth Yakel, in her article “Listening to Users,” describes archives as a tabula rasa for researchers (Yakel, 2002, 122). She makes the important point that, unlike libraries where the “paradigm for assistance, access tools, and rules” has been learned by users from childhood at their public and school libraries, archives are considered a great unknown (Yakel, 2002, 122). The intricacies of the different rules, different materials, and different access tools often stump even the most experienced library user or researcher. Some archivists have correctly compared a successful special collections instruction session to an “archaeological dig” (Schmiesing & Hollis, 2002). Since the majority of special collections materials are not reflected on an item-by-item basis in either the library catalog or a finding aid, researchers must “dig” through boxes of materials, digital images, or artifacts. Because of the nature of this type of research, and because materials are not individually pre-selected for consumption, users must constantly reformulate their queries as they discover new materials. This often necessitates close collaboration with the special collections staff throughout the research process.
Unfortunately, this type of instruction is not accurately reflected in our measurements. Certainly limited head counts and use statistics do not paint an accurate picture of this work, nor do brief reactionary evaluations. Such measures are important and necessary, especially when reporting to organizations outside the library, but they fail to assess whether or not learning objectives are being met.
What’s out there now?
Academic libraries recognize that the reactionary evaluation of instruction often falls short, and have developed tools to help libraries make sure students and users are meeting learning objectives. These include guidelines such as ACRL’s “Information Literacy Competency Standards for Higher Education,” “Standards for Proficiencies for Instruction Librarians and Coordinators: A Practical Guide,” and skills tests such as Project SAILS. These guidelines give a framework for conducting meaningful evaluation for instruction librarians. And although there is no shortage of literature to be found on the subject of library instruction and assessment, we are only beginning to see similar literature and tools dealing with evaluating instruction in archives and special collections. A good example of this emerging interest can be found in Michelle McCoy’s article, “The Manuscript as Question: Teaching Primary Sources in the Archives – The China Missions Project.” McCoy details methods for the planning, instruction, and innovative assessment of a collaborative effort between the special collections and archives department at DePaul University and Professor Warren Schultz’s undergraduate History 199 Historical Concepts and Methods class.
Arguably the most important current project appearing in the assessment literature for archives and special collections is the Mellon-funded Archival Metrics Project, which includes models for assessing instruction (discussed at length below). In addition to the products themselves, Archival Metrics investigators have produced papers detailing initial studies for the project, such as Duff and Cherry’s “Archival orientation for undergraduate students: An exploratory study of impact.”
Although we appear to be making progress, current assessment practices of efforts to instruct patrons on the use of special collections resources — both the materials themselves and the many discovery tools we’ve created (finding aids, databases, and subject guides) — would probably not receive a passing grade. Measuring and quantifying the journey is a daunting task.
While we should not stop collecting the statistics that are needed by ARL, the general library community — and especially special collections — should have a clear understanding of what these numbers actually represent. Any instruction or reference librarian will tell you that a headcount for their curricular sessions or a tally mark for a reference transaction does not adequately measure what they do or the instruction they provide, especially when tally sheets obscure the difference between a quick question lookup and an hour-long research consultation at the desk.
We face a number of difficulties in achieving the goal of both establishing and collecting useful assessment metrics. In addition to a lack of policies or plans regarding curricular outreach and engagement, special collections often do not have positions designated to conduct instructional outreach. As discussed earlier, these duties often fall to the person in the department with the greatest subject knowledge, or the most available time. It will be difficult to take on additional duties — especially when there are no easy answers and many special collections are short on staff and funding — but we offer some suggestions for ways that special collections might start.
Assessing the Journey
First, we must share. Some special collections are reaching students and evaluating their work with them in innovative ways, and the success of these efforts needs to be promoted.
Some of these innovations include:
- Making early contact with graduate student instructors so that they have experience working with special collections before they enter faculty positions;
- Working with subject librarians to incorporate relevant material into their teaching efforts;
- Giving awards to undergraduate research projects that make extensive use of the collections;
- Working with students to create virtual and physical exhibits highlighting materials used in special collections.
Assessment examples include:
- Monitoring use statistics of particular collections after an instruction session;
- Asking classes to donate copies of student papers to review the citations as a tool for better understanding the effectiveness of instruction;
- Using student focus groups to evaluate video tutorials;
- Monitoring books and articles published, performances given, and theses written;
- Tracking number and value of grants received;
- Examining web server statistics;
- Distributing feedback forms and surveys;
- Monitoring number of graduate and practicum students using the collections;
- Soliciting and compiling one-on-one feedback from professors and students. 
The assessment practices that generate the most useful results are multipronged in their approach. The China Missions Project, for example, was organized in such a way that the students included a self-assessment of their experience using archival materials and research methods as part of their class research papers (McCoy, 2010, 55). Copies of these papers were deposited with the University Archives, and then staff conducted a qualitative survey of the papers to assess their responses. Recognizing that self-assessment in a graded paper might encourage students to write positive responses regardless of actual understanding, staff further scrutinized the papers’ citations. “Students who used a total of four citations or fewer or relied heavily on Wikipedia or other Web sources whose reliability cannot be verified were moved to a neutral position and not included in the positive total” (McCoy, 2010, 55). This approach, like the other methods listed above, has pluses and minuses, but becoming aware of what other special collections are trying gives the rest of us a jumping-off point.
A variety of special collections have noted their relationships with outreach and subject liaison librarians. Developing these close relationships can be beneficial for everyone involved. Understanding the holdings in a special collections, and illustrating how those materials might be incorporated into the curriculum, creates a great opportunity for instructing students in the value of primary sources. Drawing on the skills and backgrounds of subject specialists and instruction librarians can help special collections staff (often untrained in these areas) to develop sound instruction techniques.
Additionally, our colleagues in outreach and instruction have done an extraordinary amount of work related to best practices for evaluating instruction. In a 2007 article, Ariew and Lener state that one of the main insights gained in their study “Evaluating instruction: developing a program that supports the teaching librarian” was that teaching evaluation forms should be “tailored to specific classes, objectives and learning outcomes.” Most importantly, the group learned that “effective assessment requires a variety of assessment procedures be used” (Ariew & Lener, 2007, 512). From teaching portfolios to 3-2-1 cards to surveys, the literature yields a great deal of information about what works and what doesn’t for each type of instruction. Although not all of these practices can be used to evaluate special collections instructional engagement practices, they provide guideposts to start from.
Fortunately, some people are starting to address the problem of how to assess the engagement work being done by special collections departments. The Archival Metrics Toolkits, for example, attempt to standardize evaluation in archives. This work recognizes that the “administration and use of primary sources are sufficiently different from libraries that they deserve tools that appropriately measure service to users” (Yakel & Tibbo, 2010, 221). The creation of a standardized survey tool for archives could relieve a large part of the assessment burden, which is particularly important for archives with small staffs. It also begins to answer the call for standardized evaluation that was so apparent in the results of our SPEC Kit survey.
The Archival Metrics Toolkit is particularly useful in laying out a set of standard questions about archives use, and it provides clear instructions on how to gather, compile, and analyze the survey data. This information provides a basis for making comparisons across institutions, and could give special collections a better chance of identifying best practices and trends.
However, even the best surveys have drawbacks, such as low completion rates (particularly difficult in archives due to small numbers), survey fatigue, and a focus on perceptions. Supplementing surveys by seeking evidence of skills mastered, such as citation analysis or testing, seems a more well-rounded method of determining “what students have learned as opposed to how they feel about what they have learned” (Barclay, 1993, 198).
Special collections must clearly state engagement goals in order for any type of evaluation to be meaningful. Good practice in evaluating instructional engagement starts “with the learning objectives of the instructor” (or the department), and uses those objectives to shape the tools being applied for evaluation (Ariew & Lener, 2007, 512). As the libraries at Virginia Tech discovered, evaluation, when possible, should be tailored to specific classes and desired student and faculty outcomes, and will likely require that a variety of assessment procedures be used (Ariew & Lener, 2007, 512).
Today more than ever, library administrators are being asked to describe in a quantifiable way the value of their academic libraries and their practices. Therefore, special collections must be able to articulate to administrators why current evaluation methods are insufficient. Simple forms, tally marks, and baseline ARL statistics will never be able to get at the information we need to improve our practices. Special collections need to make the case for developing more appropriate evaluation methods — even though this will require a commitment of valuable staff resources — and then make the commitment to using the results of these evaluations to enhance services. Ultimately, more meaningful data will help us provide better service to the students, faculty, and researchers who rely on special collections, and it will better equip us to tell their story and our own.
Huge thanks to our editors and advisors: Kathy Brown, Hyun-Duck Chung, and Brett Bonfield. Your thoughtful comments have made this a much better post, and sparked ideas for future avenues of exploration. And of course, thank you so much to all of our SPEC Kit co-authors, Adam Berenbak, Claire Ruswick, Danica Cullinan, and Judy Allen-Dodson.
Adam Berenbak et al., Special Collections Engagement, SPEC Kit 317 (Washington, DC: Association of Research Libraries, 2010), pp. 14-16. Of respondents to SPEC Survey 317, 87% have no formal plan or policy for outreach and engagement (p. 14), and approximately half of the institutions cite their primary engagement barrier as insufficient staffing, in particular a “lack of dedicated outreach staff” (p. 15). Also, most institutions “rely on patron or item counts and anecdotal feedback to assess the effectiveness of their outreach” (p. 16). At the same time, many special collections “clearly expressed a desire to move beyond this to a more systematic approach” (p. 16).
 The ARL states that these data “describe collections, staffing, expenditures, and service activities” of the 114 university libraries and 10 public, governmental, and nonprofit research libraries that collectively form ARL (Association of Research Libraries, 2008).
 This is not to imply that no one is attempting to assess instruction, but it is not standardized, and based on the survey responses, in general, it is fairly ad-hoc (Berenbak et al., 2010, 78, 79, 90, 91).
 Question 35 in the ARL Spec Kit did not specifically ask about the evaluation of instructional engagement but more broadly inquired, “What measure(s) have been used to evaluate special collections engagement with faculty/scholars/researchers who are affiliated with your institutions.” Many of the responses were similar to the more directed question 28 “What measure(s) are used to evaluate student use of unique materials in research projects.” The following types of statements made up the bulk of the responses: “no evaluation,” “much to few [sic],” “no particular measures have been used,” “nothing systematic,” “little evaluation has been done,” and “none to date.”
The number of respondents actively working to engage students for curricular purposes is even higher, at 99% (Berenbak et al., 2010, 62).
These examples are taken from the responses to the question “What measure(s) are used to evaluate student use of unique materials in research projects?” Responses include examining the “extent and breadth of primary resources and collections in any format,” a “learning outcomes survey,” and “discussion with faculty of results” (Berenbak et al., 2010, 78, 79, and 80).
Reactionary refers to a short survey given after a presentation that often focuses on a student’s perception of the presentation rather than on whether or not new skills have been developed.
When respondents were asked who had primary responsibility for coordinating curricular engagement, 15% had one individual who held primary responsibility, 15% said one individual leads a team or staff, 31% stated that all (or most) special collections staff shared the responsibility, and 39% noted that it varied depending on the project (Berenbak et al., 2010, 64).
These examples are drawn from the responses to questions 28 and 35 of ARL SPEC Kit 317: “What measure(s) are used to evaluate student use of unique materials in research projects” and “What measure(s) have been used to evaluate special collections engagement with faculty/scholars/researchers who are affiliated with your institution” (Berenbak et al., 2010, 78, 79, 80, 91). The respondents’ institutions are kept anonymous in SPEC Kit publications, so although these are specific examples, we are unable to point out specific schools for the purposes of this post.
The toolkit includes sections for “Researchers” (a user-based evaluation tool for on-site researchers to evaluate the quality of services, facilities, and finding aids in university archives and special collections), “Online Finding Aids” (a user-based evaluation tool for visitors to evaluate the quality and usability of online finding aids in university archives and special collections), “Websites” (a user-based evaluation tool for visitors to evaluate the quality and usability of websites in university archives and special collections), “Student Researchers” (a user-based evaluation tool for students who use the archives or special collections as part of a class and participate in archival orientations), and “Teaching Support” (a user-based evaluation tool for instructors who have used the university archives and special collections to evaluate its services).
 Small numbers can make it difficult to obtain an appropriate sample size.
Ariew, S., & Lener, E. (2007). Evaluating instruction: Developing a program that supports the teaching librarian. Research Strategies, 20. Retrieved from http://www.sciencedirect.com/science/article/B6W60-4MWXT97-2/2/e3d2a22ec51f17a15bc53a77240d49e7 doi: 10.1016/j.resstr.2006.12.020
Association of Research Libraries. (2010). Association of Research Libraries: SPEC Kits. Retrieved September 15, 2010, from http://www.arl.org/resources/pubs/spec/index.shtml
Association of Research Libraries. (2008, February 4). Association of Research Libraries: Annual surveys. Retrieved from http://www.arl.org/stats/annualsurveys/index.shtml
Barclay, D. (1993). Evaluating library instruction: Doing the best you can with what you have. RQ, 33(2), 195-202.
Berenbak, A., Putirskis, C., O’Gara, G., Ruswick, C., Cullinan, D., Dodson, J. A., Walters, E., & Brown, K. (2010). SPEC Kit 317: Special collections engagement. Washington, DC: Association of Research Libraries.
Knight, L. (2002). The role of assessment in library user education. Reference Services Review, 30(1). Retrieved from http://www.emeraldinsight.com/journals.htm?articleid=861677&show=html
Kyrillidou, M., & Bland, L. (2009). ARL statistics 2007-2008. Washington, DC: Association of Research Libraries. Retrieved from http://www.arl.org/stats/annualsurveys/arlstats/arlstats08.shtml
McCoy, M. (2010). The manuscript as question: Teaching primary sources in the archives – the China Missions Project. College and Research Libraries, 71(1). Retrieved from http://crl.acrl.org/content/71/1/49.full.pdf+html
Schmiesing, A., & Hollis, D. (2002). The role of special collections departments in humanities undergraduate and graduate teaching: A case study. portal: Libraries and the Academy, 2(3). Retrieved from http://muse.jhu.edu/journals/portal_libraries_and_the_academy/v002/2.3schmiesing.html
Yakel, E. (2002). Listening to users. Archival Issues, 26(2), 111-127.
Yakel, E., & Tibbo, H. (2010). Standardized survey tools for assessment in archives and special collections. Performance Measurement and Metrics, 11(2). Retrieved from http://www.emeraldinsight.com/journals.htm?articleid=1871188&show=abstract doi: 10.1108/14678041011064115