CENDI PRINCIPALS AND ALTERNATES MEETING
Government Printing Office
January 6, 2009
“Information Issues: A Conversation with CENDI” (Bruce McConnell and Jane Griffith, Obama-Biden Transition Project)
“Strategic Planning for Digital Data Policy in the US: IWGDD Status and Recommendations” (Dr. Chris Greer, NITRD, and Cita Furlani, NIST – Co-chairs, Interagency Working Group on Digital Data)
“Government Printing Office Showcase” Cindy Etkin, GPO
Ms. Herbst, CENDI Chair, opened the meeting at 9:15 am. She thanked the Government Printing Office for hosting the meeting.
2009: A YEAR OF TRANSITION
Mr. McConnell began by providing background on the New Administration transition. Transition Team members are charged to listen as part of the Obama-Biden Team’s principle of open inquiry. This type of conversation is close to the operational style of the team and the Administration. In that spirit, information about the meeting and the document prepared by CENDI will be posted on Change.gov so that people can comment on it.
The Transition Team is composed of two parts. The Agency Review Teams are preparing materials and recommendations for incoming Administration appointees. Mr. McConnell and Ms. Griffith are part of the team for NARA and GPO. For Cabinet agencies, the review teams have already transformed into briefing teams. There is a set of Policy Teams on Health, Energy and other key areas, and a TIGR (Technology Innovation and Government Reform) Team dealing with cross-cutting issues such as health, government and science that may impact multiple agencies. The team on Government is already looking at improving the Federal Register. It is also dealing with open government initiatives such as transparency and FOIA.
Ms. Herbst introduced the topics of interest by saying that CENDI members believe that STI provides information so that users can make informed decisions. The budgets of the information centers are a minor percentage of the total investment. Even so, they are important to the Return on Investment on the R&D. We believe that the collection and sharing of this information accelerates future discoveries and that STI touches and improves all areas of government.
CENDI members identified nine major challenges and opportunities related to Scientific and Technical Information (STI) in the federal government. The conversation around each point was led by a CENDI member.
#1 Build the information infrastructure and create high value jobs.
As the Administration thinks about rebuilding the nation’s infrastructures, we should make sure to include the kind of infrastructure needed in an information-based economy. CENDI defines “infrastructure” as “information infrastructure.” IT is necessary but should not be the sole focus. “NSF’s Cyberinfrastructure Vision for 21st Century Discovery” provides details on the value of investment in science and the need to preserve the results. A cross-cutting examination of STI infrastructure is needed. CENDI is an informal organization and has a role to play, but a more formal review is needed, perhaps under the auspices of OSTP.
Legacy collections are extremely important. They should be turned into “findable” electronic information. Much of this effort is currently voluntary. However, it needs a major influx of funds to coordinate and build on what has been done. For example, 75 percent of the NTIS repository is not yet digitized.
Equally important are the white-collar jobs that such an effort to digitize and preserve can generate. People can be trained to do this in economically depressed areas where jobs are needed. NLM has been supporting a tribal consortium performing digitization activities. The work with the tribe began with simple digitization efforts, and they are working up to higher levels of activity and complexity. This type of effort could be expanded. The WPA model could be used to improve people’s new job skills and to teach skills for the future.
There have been discussions about the role of the private sector versus the public sector. Some concerns have already been addressed. It is important to discuss how to get this job done in the public interest.
#2 Harness the potential of STI to support STEM Education and the implementation of the America COMPETES Act (ACA).
Federation is one example of how STI can help science education. STI is the content of STEM education and provides insights as to how an information technology infrastructure can support it as well.
The ACA, passed in 2007, addresses this too. For example, it re-charters the Department of Energy in the area of education and calls for the Department to develop an integrative tool in consultation with NSF and others.
Education is part of all agency missions in one way or another. The CENDI agencies deal with education in a variety of ways from supporting underrepresented minorities to experiments at NLM regarding public education on health and healthcare issues. New tools such as virtual spaces may be used in this regard. There is a role for STI organizations in this as well. We are beginning to see changes in the traditional journal literature, with increased interactivity. They address different learning styles in order to meet students where they are.
The New Board at the Academy and others are addressing the need to better train young scientists in the use of large datasets and the linking of them. “Data Issues 101” might focus on common content and introduce them to the research and information science programs. IMLS is working with library schools to build curricula in digital curation, including internships. The first cadre of PhD candidates will be graduating soon.
#3 Promote public access (PA) to Government-funded R&D results.
The current, mandatory NIH directive has improved publisher/author compliance from less than 25 percent to approaching 60 percent. However, this includes only NIH-funded literature. NIH has sought to address publisher and author issues and problems over the time that this legislation has been in place. In the last Congress, a bill was introduced that would have made the public access policy illegal. This bill is likely to be reintroduced. Perpetual language is in the appropriations language for both the House and Senate.
Public access requires an open repository. The PubMed Central repository infrastructure is freely available and is being used internationally. A lighter weight version was just installed in the United Kingdom, and a joint activity is underway with the Canadian Institute of Health Research and CISTI. The Archival DTD for journal articles used in PubMed Central has been endorsed by the LOC and the British Library as an archival standard for journal literature. NAL has also developed a repository based on DSpace for final published versions of intramural publications.
The value of public access to education, including K-12, is huge. Innovation and advancement come from this public access. The Human Genome Project proved that public access to scientific data produced by both publicly and privately funded research groups fuels scientific discovery and commercial innovation. There are barriers to sharing within the agencies and some federal libraries are buying back agency materials in journal issues because they can’t get them any other way. Getting the results from grants is actually more difficult than from contractors. However, activities are underway. NIST is doing interim and other project report formats. Environmental conservation data is being included in the Biodiversity Heritage Library, which is being created through a partnership including the Smithsonian.
#4 Strengthen the role of the Office of Science and Technology Policy (OSTP) in the STI infrastructure.
CENDI sees OSTP as key in the STI infrastructure. Is there a sense that there will be a change to the current structure? CENDI is involved in some working groups and subcommittees. The Interagency Working Group on Digital Data (IWGDD) report recommends support for a digital data infrastructure. The recommendation is to turn the IWGDD into a standing subcommittee. An STI portfolio in OSTP would help to coordinate information across agencies by establishing longevity and formality.
#5 Take a broad approach to improve the accessibility of Government Information through e-Government: focus on ends, not means.
Agencies have embraced e-government initiatives. However, e-government is often focused on the means and not on the ends. Focusing on the ends allows agencies to identify better possibilities and to keep the initiatives technology independent. The recent report language also mentioned that corresponding library support, including that provided by Federal Depository Libraries, is also important.
Making agency databases more accessible to search engines and more linkable is important, and agencies have been doing this. However, agency databases have other features that make them valuable in their own right. Integration of government information, such as that done through Science.gov, is beneficial. Science.gov celebrated its 5th Anniversary last year. The Department of Transportation Library has recently joined. Alerts allow people to sign up for updates on key areas of interest. Science.gov is the US contribution to Worldwidescience.org, a self-funded Alliance modeled on Science.gov, with over 50 member countries representing 80 percent of the world’s population. China and India have recently been added. Worldwidescience.org will be meeting at the end of February, sponsored by the International Council for Scientific and Technical Information.
These activities in which CENDI and others are involved have leveraged funding and broken down agency silos. They also serve to filter authentic information from the government.
#6 Develop an understanding of the new dynamic between the public and private information infrastructures.
With the rapid advance of information technology, there is a new dynamic in information. New technologies, such as hand helds, have changed the way we work and computing is going into the “cloud.” We need to develop a better understanding of this dynamic as we determine the resources that must go into building an information infrastructure for the future. Successful state models and other technology transfer processes should be examined for best practices. The focus should be on small businesses and start-ups.
#7 STI as a tool in international diplomacy.
STI is a powerful tool in international diplomacy. Programs such as Fulbright should be re-funded, and immigration policies should be reviewed. We need to encourage the development of international exchanges, including across businesses. Subject matter experts from the Department of State and other agencies should be linked. Policies related to intellectual property (IP) and patent laws should be developed with regard for their impact on scientific exchange – not solely their impact on the entertainment and other industries. More diplomatic relationships are needed with countries and regions where we have historically had fewer diplomatic connections. Science should be brought to bear on all relevant issues. The STI people should be at the table when the policies are being made, especially policies that relate to intellectual property.
#8 Support government accountability and transparency and effective implementation of the e-Government act.
Accounting for research in progress is very important. Previous systems, such as RADIUS, have been developed, but none has been fully successful and some have been very expensive. Through CENDI’s approach of a federated network of agency databases, a workable, more efficient system might be developed. However, there is a need for funding and agency buy-in. Many agencies have project summary databases that could be federated. This allows the Administration to capitalize on what exists. If the Administration is committed to transparency in the publicly financed R&D sector, a proposal could be made on developing such a federated system.
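The federated approach described above can be illustrated with a minimal sketch. It assumes each agency exposes some search function over its own project summary database; the agency names, record fields, and helper names below are hypothetical and are not drawn from any CENDI or agency system. The point is only that one query fans out to every source, results are tagged with their origin, and one unavailable source does not fail the whole search.

```python
from typing import Callable, Dict, List

# Hypothetical shapes: a record is a flat dict, a source is any callable
# that takes a query string and returns matching records.
ProjectRecord = Dict[str, str]
SearchFn = Callable[[str], List[ProjectRecord]]


def in_memory_source(records: List[ProjectRecord]) -> SearchFn:
    """Stand-in for an agency database: case-insensitive title match."""
    def search(query: str) -> List[ProjectRecord]:
        q = query.lower()
        return [r for r in records if q in r["title"].lower()]
    return search


def federated_search(query: str, sources: Dict[str, SearchFn]) -> List[ProjectRecord]:
    """Send one query to every source and merge the hits.

    Each hit is copied and tagged with the name of the source it came from.
    A source that raises an error is skipped so the others still answer.
    """
    merged: List[ProjectRecord] = []
    for agency, search in sources.items():
        try:
            hits = search(query)
        except Exception:
            continue
        for hit in hits:
            merged.append({**hit, "source": agency})
    return merged


def broken_source(query: str) -> List[ProjectRecord]:
    """Simulates an agency database that is temporarily unreachable."""
    raise ConnectionError("source unavailable")


sources = {
    "Agency A": in_memory_source([{"title": "Coastal sensor network"}]),
    "Agency B": in_memory_source([{"title": "Sensor calibration study"},
                                  {"title": "Soil survey"}]),
    "Agency C": broken_source,
}
results = federated_search("sensor", sources)
```

Because the databases stay with their owning agencies, this design adds only a thin query layer rather than a new central repository, which is the sense in which a federated system builds on what already exists.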
#9 Support improved health care and better disaster preparedness and response through the development of an interoperable health information technology infrastructure.
There are many federal agencies that deal with human health directly and indirectly. Health and health care are major Administration issues. Technical standards and interoperability are needed to take the best advantage of the work that has been done. Disaster preparedness is another cross-cutting issue. In both cases, point of service must be provided where it is needed. The public library can be used as a resource; it is worthwhile to explore roles and encourage involvement. Of interest may be the Computer Science and Health Communications Board report on health care that has just been released and is available from the NLM web site.
Mr. McConnell and Ms. Griffith indicated that this conversation with CENDI was very helpful. They will summarize the key points and share them with incoming Administration staff. There is some overlap with the interests of other Science teams being led by Tom Kalil. They expect that those teams may have heard some similar comments. They encouraged the CENDI members to reach out to the new team.
“Strategic Planning for Digital Data Policy in the US: IWGDD Status and Recommendations” (link to presentation, .pdf) (Dr. Chris Greer, NITRD, and Cita Furlani, NIST – Co-chairs, Interagency Working Group on Digital Data)
The Interagency Working Group on Digital Data (IWGDD) is one of several working groups under the Committee on Science of the National Science and Technology Council (NSTC) of the Office of Science and Technology Policy (OSTP). There are 29 members including a number of CENDI members. There has been good participation with some of the very best data people involved.
The charge to the group was to develop a strategic plan and promote its implementation. They are now turning to implementation issues. The scope and definition of digital scientific data was very broad including algorithms, software, numeric data, and text. The whole data life cycle was addressed from the proper and reliable management of the data with an appropriate policy framework to the publishing of the data.
Several guiding “first principles” were identified. Preservation is both a government and a private sector responsibility. This is a long-term investment for the public good. However, the private sector also has needs and resources. Communities of practice are needed because one size does not fit all. Communities will differ by culture, types of data, size and volume. The IDC estimates that the amount of data now exceeds the capacity to store it on a global basis. (NOTE: IDC is the International Data Corporation; its parent company is International Data Group, or IDG.) Both obvious and less obvious users must be considered, and all stakeholders should be involved in the decision process. A dynamic framework is needed because what people expect today is a lot less than what people will expect in the future.
There are three core recommendations in the IWGDD’s report. A Standing Subcommittee is essential to continue promotion and implementation of the recommendations. Appropriate departments and agencies should lay the foundation for agency digital scientific data policy and make the policy publicly available. Agency projects should be required to submit data management plans.
In advance of a decision about recommendation #1, the working group is moving forward on recommendations 2 and 3. They are looking at best practices, principles and templates that could be used more widely. The Committee on Science representative is the key point of contact for CENDI members into this group.
Cita Furlani emphasized the fact that no single approach will meet everyone’s needs. The approaches need to adapt by mission, recognizing the broad community goals. The working group has developed a list of policies and plans, but is looking for others. A bibliography has been developed that can be shared. OMB and CIO Council activities have focused on the scientific data portion; agency CIOs should be involved. Having agencies work together is the best way to achieve the goals.
It was suggested that the Public Access debates might provide some lessons learned that would be valuable for ensuring data generator compliance. The data management plan is really key. Each proposal should be required to state the broad impact expected by making the data from the research available. The Data Management Plan formalizes these issues.
There has been remarkable support for the need for repositories from some NSF communities, especially in the biology area. Private foundations are now making stricter requirements regarding repository deposition. A full spectrum of solutions would be best. Incentives were also mentioned. Credit is needed for the use of data. Provenance and attribution are needed. The shared instrument realm is a new paradigm that will require new approaches as well. Dr. Greer believes that the government can lead by example here. The European Union is increasingly turning to open data to encourage economic growth. This has always been the US approach.
Metadata Initiatives (link to presentation, .pdf) – Laurie Hall
GPO’s cataloging initiatives are based on Title 44, which is the statutory mandate for GPO’s indexing of Congressional and public documents, GPO’s production of a monthly catalog, and GPO’s position as the national cataloging authority. Of course, GPO is no longer doing the print catalog. The Catalog of Government Publications is now updated monthly online.
To address the metadata challenges, GPO is investigating flexible cataloging treatment. Different levels of metadata, abridged, brief or full, would be applied depending on the content. For example, a Shelf List Conversion Project is digitizing old catalog cards. Scanned pre-1976 records would be the basis for brief MARC records. The goal is to extract whatever they can from the records and create more metadata. The next goal would be to link up the metadata to the documents through the Federal Depository Libraries.
GPO has initiated a two-year R&D Project with Old Dominion University (ODU). ODU developed metadata creation software with funding from NASA and DTIC. GPO has provided ODU with test records from the EPA web harvest results and from Congressional monographic material. ODU’s approach uses templates. GPO is currently working on the testing and evaluation plan; a demonstration is expected later this year.
GPO is also planning to take advantage of connections between FDsys and its other systems. One initiative involves mapping between FDsys and the GPO Integrated Library System (ILS). The bibliographic metadata in AACR2 and MARC21 would be stored in the ILS, with preservation and technical metadata in FDsys. The two systems would exchange metadata through complete packages or a Z39.50 gateway.
Federal Digital System (FDsys) (link to presentation, .pdf)– Selene Dalecky
One of the challenges for GPO is that users increasingly expect information to be electronic. However, digital information needs to be authentic and verified as to its correct version. Digital information also needs to be available almost immediately. It is easier to do this with new information, but how do you know the information will remain available?
GPO has been working on the FDsys system for almost five years. A vision statement was written in October 2004. FDsys is a customer-driven digital information system that supports all three branches of government. It will automate the collection, management and dissemination of government information, moving GPO from a print-centric to a content-centric environment.
FDsys focuses on the submission of material to GPO from the agencies. GPO will continue to harvest content from agency sites and to convert content from previously printed publications.
The access philosophy is to provide a simple search with advanced result options. Advanced search features will also be provided that allow a user to efficiently find a specific document. In addition, the system will provide relevant results quickly.
Authentication is particularly important when there is no print equivalent to which people can refer. FDsys also provides preservation through a safeguarded repository. It will be possible to assess the condition of the repository and to ensure that preserved material can be used despite the change in technology.
FDsys is being rolled out in a series of releases. The first release, which will provide the basic infrastructure, lay the foundation, and replace GPO Access, is scheduled for the end of January. Beta testing is underway now, and agency staff members are invited to e-mail firstname.lastname@example.org to register as a beta tester.
The scope of FDsys is closely tied to Federal Depository Library Program (FDLP) requirements. The OAIS Archival Reference Model is being used by GPO and NARA to ensure interoperability. The FDLP libraries will provide assistance, legacy collections and subject matter expertise. API and web service capabilities will be important. A test of APIs is being planned for a later phase.
Eight of the 50 collections will be included in the first release. Over the next six to seven months, the remainder of the collections will be migrated. In the meantime, GPO Access will remain the system of record until the cutover is completed.
The second release will introduce the submission of Congressional content and add search enhancements. The third release will include agency and converted content submissions. Additional releases will include customization, collaboration, and authoring tools, alerts, and preservation processes.
Authentication of Online Federal Resources (link to presentation, .pdf) – Lisa Russell
GPO is engaged in a major authentication initiative designed to assure users that the information provided by GPO is official and authentic. The initiative employs Public Key Infrastructure (PKI) technology. It allows users to determine that the files remain unchanged since GPO authenticated them.
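The "unchanged since authentication" check can be illustrated with a minimal sketch. It shows only the integrity comparison: a fingerprint (cryptographic digest) of the file is recorded when the file is authenticated, and any later copy is compared against it. This is a simplification and not GPO's implementation; a real PKI signature additionally binds the digest to the signer's private key so that the recorded value itself cannot be forged. The function names and sample bytes are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def unchanged_since(data: bytes, recorded_fingerprint: str) -> bool:
    """True if the contents still match the fingerprint recorded earlier."""
    return fingerprint(data) == recorded_fingerprint


# Hypothetical file contents at the time of authentication.
original = b"Public Law text as authenticated"
recorded = fingerprint(original)

# A later copy that differs by even one byte no longer matches.
tampered = original + b" "
```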
Ms. Russell showed examples of the Seal of Authenticity and how it would be displayed in particular circumstances. GPO uses a digital certificate to apply digital signatures to PDF documents. The digital certificate is issued by a Certificate Authority (CA) when it receives proof of identity. In order to validate the signature, there must be a certification path between the certificate and the CA. An example of a certification path is a driver’s license that is provided to an individual by the department in the state which has been authorized to do so by the particular state. In the case of digital certificates, the certification is provided to the Superintendent of Documents through the GeoTrust CA that has been authorized by the Adobe Root CA.
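The certification path described above can be sketched as a walk from the signing certificate up to a trusted root. The sketch below models only the chain structure, using the three names mentioned in the presentation; it performs no cryptographic verification, and the `Certificate` class, the `certification_path` function, and the in-memory certificate store are hypothetical simplifications rather than how Adobe Reader or GPO's systems work.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Certificate:
    subject: str  # who the certificate identifies
    issuer: str   # subject name of the authority that issued it


def certification_path(leaf: str,
                       store: Dict[str, Certificate],
                       trusted_roots: Set[str]) -> Optional[List[str]]:
    """Follow issuer links from `leaf` until a trusted root is reached.

    Returns the list of subjects along the path, or None if the chain
    breaks (unknown issuer) or loops without reaching a trusted root.
    """
    path: List[str] = []
    seen: Set[str] = set()
    current = leaf
    while current in store and current not in seen:
        seen.add(current)
        path.append(current)
        if current in trusted_roots:
            return path
        current = store[current].issuer
    return None


# Toy certificate store keyed by subject name.
store = {
    "Superintendent of Documents": Certificate("Superintendent of Documents", "GeoTrust CA"),
    "GeoTrust CA": Certificate("GeoTrust CA", "Adobe Root CA"),
    "Adobe Root CA": Certificate("Adobe Root CA", "Adobe Root CA"),  # self-issued root
    "Unknown Signer": Certificate("Unknown Signer", "Unlisted CA"),
}
trusted = {"Adobe Root CA"}

path = certification_path("Superintendent of Documents", store, trusted)
```

A signer whose chain never reaches a trusted root, such as the hypothetical "Unknown Signer" entry, yields no path, which corresponds to a signature that cannot be validated.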
Ms. Russell showed screen shots of the validation process under different versions of Adobe Acrobat Reader. The process is supported by a series of Adobe Validation Icons. For example, a blue ribbon icon indicates that the certification is valid and a red X icon indicates that the certification is not valid.
Beta testing began in May 2007 with GPO Access, using the 110th Congress Authenticated Public and Private Laws. At the time when this beta was underway, the existing text files and unsigned PDF files remained available from GPO Access. This approach allowed for testing of the technology and analysis of user feedback before full release of the authentication process. There was also a link from the web page to a survey to collect feedback.
In January 2008, GPO deployed an Automated PDF Signing system. This allows GPO to automate the digital signing of the files more efficiently. The first application of this system, digitally signing the PDF files for the FY2009 E-Budget on GPO Access, was released in February 2008. GPO’s second use of the APS was to integrate it into the workflow for the beta release mentioned above. After successful integration with the beta application, it was integrated into the live application.
GPO is currently conducting a beta test with the House and Senate staff regarding authentication of the Congressional Bills on GPO Access. They plan to sign from the 110th Congress/2008 forward as new applications are authenticated on GPO Access. The Authentication Web Page can be found at http://www.gpoaccess.gov/authentication. This site contains links to the applications, presentations, and general information about authentication including a glossary.
Eventually GPO will launch the digitally signed Congressional Bills application. Additional applications will be added in coordination with agencies publishing in FDsys. The digital signature capabilities will be expanded to various file formats and levels of granularity over time, including inter-document authentication.
The technical morning program adjourned for lunch.