CENDI PRINCIPALS AND ALTERNATES MEETING
National Agricultural Library
June 6, 2006
Strategic Challenges and Opportunities
Dr. Walter Warnick, CENDI Chair, opened the meeting at 9:00 am. He thanked NAL for hosting the meeting.
Dr. Lorcan Dempsey, Vice President and Chief Strategist, OCLC
When thinking about libraries in a networked age, it is helpful to look at the total landscape of resources and services. One approach is to consider the e-portfolios being developed for managing the variety of digital assets a student collects over the course of his career – a personal learning landscape. This includes the social, personal and network identities that students create. A student may have a blogging environment; be registered for an online journal service; have several RSS feeds; participate in a social network (FOAF); use a variety of search engines, including specialized systems such as Flickr and delicious; and contribute to personal and institutional repositories and content management systems. Dr. Dempsey presented a slide showing the various digital assets, but noted that it was 18 months old. He commented that a revised personal learning landscape slide would now include Facebook and MySpace, which is the second most highly used site on the Internet. He noted that there are no library resources on the chart and asked how they, or the services of CENDI members, might be put on the chart.
Increasingly, people are creating social and digital places and using services to interact with the network. The trail from databases to web sites to workflow, or the “networkflow,” may be self-assembled or prefabricated. The only requirement is that the resources must be URL addressable. If resources are not addressable, they are relatively inaccessible and cannot be referenced or acted upon. Attention is important because time is scarce. Users don’t want to do too much work, which is also reflected in the “just good enough” syndrome. People are producers as well as consumers, and they are increasingly interested in making their work available on the Web.
What is the perception of libraries in this environment? In November 2005, OCLC commissioned a Harris Interactive poll to investigate the perception of libraries by the general public and then more specifically by college students and teenagers. The largest part of the random sample was in the US. (The full report is available on the Web at http://www.oclc.org/reports/2005perceptions.htm. The more recent report on the perception of college students is also available at http://www.oclc.org/reports/perceptionscollege.htm.)
Across the whole population, satisfaction with libraries is high and they are trusted resources. However, Internet search engines are considered as trustworthy as libraries. Among total respondents, satisfaction with the total experience with librarians is lower than with search engines. The library brand is still perceived to be closely linked to books and a physical place. There is low awareness of the digital resources that are available from libraries. Responses from college students and young people show a lack of awareness of libraries and a sense that the nature of libraries doesn’t fit with their lifestyle. College students have more awareness of libraries than do teenagers. Overall, the value placed on libraries doesn’t translate into financial and political support.
Based on the results of the report, Dr. Dempsey speculated that just putting more information on the web, doing more of what we have been doing, may not earn libraries the same levels of satisfaction that large collections did in the past. Libraries must determine how to add value and how to establish more gravitational pull toward libraries.
Being on the network is increasingly about being inside people’s behavior. Libraries need to synthesize the various resources and specialize them for the local environment, constituency and context. Mobilizing the information to distribute it where the users are will also be important. The information from the library must be in the users’ workflows in workplace applications. The synthesis must be hidden behind a single interface, but federated searching is turning out to be very difficult. People are exploring how to do an interface that provides valuable services across a variety of disparate resources, while synthesizing at a local level.
OCLC is working on some prototypes of these principles. For example, an OCLC service to expose the OCLC Union Catalog is a form of synthesis at the local level. A search box to WorldCat and eventually to a portal exposes library services in a network environment. Web services through OCLC can be targeted by virtue of the users’ IP address. In effect, this mobilizes a whole library system.
Functionality can also be added to the browser. For example, the audience level of an item could be determined by a script that computes a score based on which libraries in WorldCat hold the item. A search can be sent into Open WorldCat based on the user’s zip code, and libraries can use this service. An RSS feed could be set up for any search a user runs.
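The holdings-based audience-level score mentioned above can be sketched roughly as follows. This is a minimal illustration only, not OCLC's actual method: the library-type categories, the weights, and the function name audience_level are all assumptions made for the example. The idea is simply that an item held mostly by research libraries scores higher (more specialized) than one held mostly by school or public libraries.

```python
from collections import Counter

# Hypothetical weights per library type (assumed for illustration):
# a higher weight means a more specialized readership.
LIBRARY_TYPE_WEIGHTS = {
    "school": 0.1,
    "public": 0.3,
    "academic": 0.6,
    "research": 0.9,
}


def audience_level(holding_library_types):
    """Return a holdings-weighted average score in [0, 1].

    holding_library_types is a list with one entry per holding library,
    giving that library's type. Unknown types are ignored. Returns None
    when no recognized library holds the item.
    """
    counts = Counter(
        t for t in holding_library_types if t in LIBRARY_TYPE_WEIGHTS
    )
    total = sum(counts.values())
    if total == 0:
        return None
    weighted = sum(LIBRARY_TYPE_WEIGHTS[t] * n for t, n in counts.items())
    return weighted / total
```

A browser script could then call such a function with the holdings returned for an item and display the resulting score next to the search result.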
The Ann Arbor District Library (aadl.org) allows users to annotate an image of a card catalog record. A service has been developed to support the collection, storage and retrieval of digitized images submitted by the community, creating community memories. The undergraduate library at the University of Minnesota provides UThink, a blogging environment. The library also provides other tools, like a planning calculator, to help students track their deadlines. The goal is to pick services that meet the user’s needs and behavior.
The University of Minnesota developed a framework for analyzing user behaviors and identifying the appropriate resources (http://www.lib.umn.edu/about/mellon/docs.phtml). The approach begins with a series of primitives. At the most basic, users want to discover, gather, share and create. Working out from these primitives, the analysis identifies more specific behaviors such as organizing, annotating, data sharing, rights assessment, etc. These behaviors provide the basis for conducting surveys and analyzing available data to determine the current effectiveness of the users’ behaviors in each of these areas. For example, 60% of the faculty members share materials with colleagues in paper format. Over 70% would like assistance with organizing and storing materials. The last step of the process is to determine services that address the behaviors based on the data collected. This framework helps to ensure a better marriage of the services with the behaviors and workflows of the users.
In order to establish gravitational pull toward libraries, it is important to understand that no single resource is at the center of the user experience. At the center are the network and the user.
Those organizations that are effective have brought together supply and demand. The long tail works by aggregating supply through unified discovery and by aggregating demand through the mobilization of large populations. When aggregation takes place, transaction costs are lowered. For example, iTunes offers a single experience and low transaction costs, and it aggregates demand by having a large number of users. It is therefore more likely that a person will find an otherwise rare resource. Similarly, businesses that think it isn’t worthwhile to advertise in the local paper advertise with Google and other Internet services because of the large audience. Google, Yahoo and others have moved activities from the local to the network level.
Libraries tend to be active only at the local level. However, when libraries network aggressively, they aggregate supply and demand. They remove the problems that come from visiting individual web sites, so the chances of people finding what they need go up, as does the likelihood that a resource will be used. One example of this is OhioLINK, which has resulted in library collections in Ohio being more heavily used than elsewhere in the country. Australian libraries have also been consolidated through a single interface. This raises the question as to whether the library catalog should be maintained at the consortial level or at the local level.
The push of Google Scholar is one of aggregating supply and demand. Google has had to take steps to aggregate, including linking into Open WorldCat. It is very important to Google to help people find what they want by linking to resolution data and reducing the number of clicks to the information. This involves synthesizing the network of resolvers while setting scripts in an individual’s browser to resolve to his own local environment.
OCLC is working on thesaurus federation and on integrating this service into the Microsoft Research Pane. This approach will make the thesauri available to any Microsoft application within the workflow.
Libraries must add real value in a networked environment. They must create gravitational pull through aggregation and by moving to a higher place (the network). Fragmentation is the enemy. Dr. Dempsey’s concluding point was: Pay attention and seek the attention of your users; they are changing their information habits.
Clifford Lynch, Executive Director, Coalition for Networked Information
CNI is an organization dedicated to supporting the transformative promise of networked information technology for the advancement of scholarly communication. It is sponsored by Educause and the Association of Research Libraries. CNI has about 200 members including major research universities in the US, government organizations, professional societies, publishers and non-US members. CNI has strong ongoing collaborations with other organizations, particularly in the UK, around its agenda. CNI does its work by convening meetings, representing the members, and authoring. It does not perform operational services and is not a lobbying group. However, it is often involved in discussions that may lead to the development of legislation.
One example is the examination of the orphan works issue. The extension of copyright resulted in works with little commercial context or value, and users have difficulty finding out who owns the work in order to seek permission for use. How do people want to use these works and what frustrations do they encounter? The discussions, facilitated by CNI, resulted in the introduction of legislation which limited liability after a reasonably diligent search for the copyright owner and resulted in different remedies depending on the intended use. CNI was also involved in discussions of the safe harbor and take down provisions of the Digital Millennium Copyright Act (DMCA).
CNI has long been interested in the technical and policy issues surrounding the preservation of digital material. Real progress has been made on a series of reasonably plausible ways to deal with the preservation of scholarly journals. CNI continues to monitor this. Members are interested in how new kinds of information, from data (including observational and simulation data) to new genre works such as blogs, will be preserved.
At the same time that CNI is monitoring traditional scholarly publishing, it asks what is and can be done outside the traditional scholarly publishing system. Encyclopedias are undergoing a resurgence. Linking and search aids have made them more useful, multimedia can be included at an affordable cost, and it is possible to update the content at any time.
Specialty reference databases are also being developed outside the traditional publishing arena. These databases, many of which may be of great importance to a particular community, are particularly at risk. They may be on some old computer or in the hands of a retiree. An institution may not be aware that the resource even exists on the computer of one of its scholars and, when the person retires, the computer’s contents are simply deleted or archived off-line.
Institutional repositories (IRs) have a role to play in addressing these issues. Institutional repositories are intended to disseminate and manage what is produced by an institution. Dr. Lynch’s definition of an IR is broad and would handle both new and old genres. (There is a narrower view, popular in the UK, which equates IRs to e-print repositories and is often connected to issues of author self-archiving in the context of open access.)
CNI looks beyond the current environment to shifts in scholarship and scholarly communication as they relate to IRs. There are many technical and policy problems. CNI is tracking the rate of deployment and growth by working with UK and Dutch colleagues.
In recognition of the change in scholarly publishing, CNI is increasingly focused on data curation and preservation. To date, changes have focused more on the data which may be of interest to a broader community. Data set reuse and ultimately meta-analysis is true repurposing of data. In these contexts, what does curation mean? A more refined set of activities related to data curation must be identified.
There are two basic approaches. Disciplinary access at the national or international level is one approach. The UK established large discipline-oriented data archives for the Environment, Humanities, etc. In the US, institutional repositories can potentially take on this role, but there are problems of scale and economy of scale. The National Center for Biotechnology Information (NCBI) has built very specific tools to support the discipline, which individual institutions could not afford to do. It is easy to forget that half of the faculty do not get grants and many aren’t in the sciences. There is no big funding stream for curating data.
The issues in the humanities are similar to those in the sciences, but the current state of affairs is not as uniform. A draft report on Cyberinfrastructure in the Arts and Humanities is available. Dr. Lynch is not optimistic that there will be a national system of data centers, so the question is how to make the mixed environment work.
CNI is also interested in policy and practice issues. These issues are receiving the attention of provosts and Chief Resource Officers. Grants that saddle universities with IR requirements are of concern.
Bits are at risk even in the short term. For example, the 2005 hurricanes caused academics to think about business continuity and they are increasingly concerned about digital preservation in this context. The demand for networked university services doubled in the 2 months after Katrina as researchers realized that their work is at risk.
CNI is interested in the implications of large-scale digitization projects such as Google Books. We are still using the old models of search and retrieval, while these digitization projects open the door for computing on large literature bases and text mining, changing how we think about literature. Rather than a machine-user interface, machines will interface with machines. The results to users will be at a higher level. Most of our delivery systems and intellectual property regimes are inconsistent with this large-scale environment. User interfaces have been developed for individuals, not for machines. One small project that showcases this is Perseus, led by Greg Crane. It shows how to embed, geo-reference and otherwise improve the linking and value within large repositories of classical literature. Dr. Lynch suggested that large-scale digitization projects may have an early impact on public domain repositories, because of reduced intellectual property concerns.
The group discussed issues within the education environment including the preservation of course catalogs as records. One of the drivers for IRs at some universities is the desire to archive institutional records. Some institutions are doing web crawls of their own domain. This kind of material should be in scope along with ephemera such as special events and symposia.
The learning or course management systems being procured by institutions will have a major impact. Face-to-face material and ephemera are turned into syllabi and learning objects. What should be done with this material when the course is over? Is it a record? What is its usefulness? Who can see it? What rights do the students have if they have contributed to the content? What is the obligation for making notes available on an ongoing basis when students are encouraged to use online resources rather than taking notes? The consensus is that learning management systems are not adequate for archiving and the content must be exported to institutional repositories.
Dr. Greer suggested that data and text archiving are diverging, when it would be better to consider the text as good metadata for the data. Universities are in a unique position to discourage this divergence. Data and text could be linked through persistent cross referencing. Alternatively, a new kind of documentation method could be developed that more closely links data and commentary, with appropriate navigation, by linking the data into the stream of argument. Authoring tools are not currently available to produce such objects easily, and better navigation is required.
NAL Showcase (Maria Pisa)
Maria Pisa introduced the vision of the National Digital Library for Agriculture. The NDLA is envisioned to be a comprehensive digital collection, accessed by a robust search engine with links to analytical tools vital to finding solutions for the problems faced by the food and agriculture enterprise. The concept for the NDLA came from the 2001 Blue Ribbon Panel Report on the NAL and was endorsed by the Agriculture Research Service Advisory Board and the ad hoc Task Force on the NAL.
The challenges for the NDLA include the interdisciplinary nature of food and agriculture, the size and complexity of the relevant datasets, the increasing cost of licensing digital content, and the continuing importance of legacy information. Pre-1942 literature is highly relevant in solving today’s problems, such as identifying pre-pesticide solutions.
The NDLA concept is built on a network of partnerships collaborating on resources and services. NAL would provide coordination. The partners would share resources, plan redundancy in support of preservation, and extend services across time zones and audiences. Other key components include tools, including database systems and robust search engines; and knowledge assets, such as technical experts, indexing and metadata to support retrieval, and extensive digitized collections.
Many elements of the NDLA already exist. What is lacking is a blueprint for leveraging existing programs, collections and services; system-wide coordination; the funds to develop the infrastructure to store, preserve and provide access to the collections; and the funds to build advanced tools to effectively use the knowledge held in the collections.
The collection must be built in a cooperative way, but no plan yet exists to do this. The Agriculture Networked Information Center (AgNIC), which has been coordinated by NAL for several years, needs to be taken further. In order to gain interest in collaboration, NAL is beginning a community-wide visioning process, including presentations at SLA. The FY07 budget request includes seed money for the NDLA.
Following the NDLA presentation, Ms. Pisa introduced the various presentation stations available to attendees, which had been set up in the lobby. These included:
- the NAL Digital Repository, which provides public access to the full text of selected U.S. Department of Agriculture publications;
- the Voyager-Relais system, a system to receive and process requests for documents;
- NAL’s redesigned Web site, which acts as a gateway connecting users with NAL’s services and with billions of pages of agricultural information; and
- the Prestele Exhibition, entitled Inspiration and Translation: Botanical and Horticultural Lithographs of Joseph Prestele and Sons, which features original watercolors and lithographic prints that document the family’s work for botanists and horticulturalists in the late 1800s.
Products available from the Sales Desk are based on some of the Library’s images, and a portion of the proceeds funds the conservation treatment for special collections.