Agenda

May 12, 2011

US GEOLOGICAL SURVEY

USGS Visitor’s Center
12201 Sunrise Valley Drive
Reston, VA 20192


9:00 am - Welcome and Introductions
                     Lisa Weber, Director, Information Technology Policy and Administration, NARA, and CENDI Chair


9:15 am - From Science to Innovation to Jobs: The Role of Scientific and Technical Information

                  “Interaction of Big Data and Technology in NITRD Initiatives: Where Should R&D be Taking Us?” [presentation .pdf]
                  -Wendy Wigen, Human Computer Interaction and Information Management Coordinator and Large Scale
                   Networking Associate Coordinator, National Coordination Office for NITRD

                  “Technology Transfer from Science, to Innovation, to Jobs: The Federal Laboratory Consortium and Interagency Cooperation”
                  [presentation .pdf]
                  -Paul R. Zielinski, Director, Technology Partnerships Office, National Institute of Standards and Technology

                  “Citizen-Driven Innovation Through Open Data” [presentation .pdf]
                  -Jeanne Holm, Evangelist, Data.gov, General Services Administration (on loan from JPL)

 

11:15 am - Host Showcase - US Geological Survey
                  Mark Fornwall, Annie Simpson, and Ben Wheeler

                  » Jewels of the USGS Library (Richard Huffine) [website demo only]
                  » New Advances in the Integrated Taxonomic Information System (Gerald "Stinger" Guala) [presentation .pdf]
                  » USGS Raptor Search (Tim Woods) [presentation .pdf]

12:15 pm - Group Lunch and Special Presentation



Minutes

 

Members
Dr. Philip Bogden (NSF)
Dr. Blane Dessy (LOC/FLICC)
Dr. Mark Fornwall (DOI/USGS)
Eleanor Frierson (NAL)
Glenn Gardner (LOC)
Tina Gheen (NSF)
David Jones (DOT)
Sharon Jordan (DOE/OSTI)
John Martinez (NARA, via teleconference)
Jerry Sheehan (NLM)
Annie Simpson (DOI/USGS)
Gerald Steeman (NASA, via teleconference)
Wayne Strickland (NTIS)
Dr. Walter Warnick (DOE/OSTI)
Lisa Weber (NARA)

Speakers
Dr. Gerald “Stinger” Guala (DOI/USGS)
Jeanne Holm (GSA/Data.gov)
Richard Huffine (DOI/USGS)
Wendy Wigen (NSF/NITRD)
Tim Woods (DOI/USGS)
Paul Zielinski (NIST)

Observers
Michael Huerta (NLM, via teleconference)
Bohdan Kantor (LOC, via teleconference)
Dr. Neal Kaske (NOAA)
Peter Lincoln (DOE/OSTI, via teleconference)
Dr. Robert Shepanek (EPA, former alternate)

Working Group Chair
Vakare Valaitis (DTIC)

Secretariat
Bonnie C. Carroll
Gail Hodge
J. R. Candlish (via teleconference)

 

Welcome

Ms. Lisa Weber, CENDI Chair, opened the meeting at approximately 9:20 am EST. She thanked USGS for hosting the meeting. Dr. Mark Fornwall, USGS, introduced Mike McDermott, Acting Director for the Office of Biological Informatics in the Core Science Systems group.

Mr. McDermott welcomed the CENDI group and explained the changes that are underway at USGS. The traditional discipline-oriented breakdown that has existed was questioned in the 2007 USGS Science Strategy. This science strategy was a science-oriented plan on top of the strategic planning that was already underway. When Dr. McNutt came in as the USGS Director, she liked the report and decided to realign the agency along the social challenges identified in the plan. This is the first year in the formation of the new organization which focuses on moving across discipline boundaries and into integrated science.

There are six to seven mission areas. The Ecosystems mission reflects the biological resources disciplines, including fisheries, aquatics and a lot of the previous Biological Resources Discipline (BRD) perspectives. The Water mission focuses on water quality, streams, lakes, reservoirs, etc. Energy and Minerals also includes Environmental Health and covers toxic substances, hydrology, and contaminant biology. Climate Change and Land Use includes the Earth Resources Observation and Science (EROS) Center and sequestration issues. Natural Hazards covers earthquakes, volcanoes and landslides.

The Core Science Systems mission area brings forward the informatics emphasis and supports the other mission areas through information management, data management, and data integration. The previous biological informatics program is now part of Core Science Systems along with the two largest mapping programs, the National Map and the National Cooperative Geologic Map. It also includes the libraries and the Federal Geographic Data Committee.

Mr. McDermott emphasized the commitment to cooperative groups such as CENDI. They are still reorganizing and making decisions about who will participate.

From Science to Innovation to Jobs:
The Role of Scientific and Technical Information


“Interaction of Big Data and Technology in NITRD Initiatives: Where Should R&D be Taking Us?” - Wendy Wigen, HCI and IM Coordinator and Large Scale Networking Assoc. Coordinator, NCO/NITRD [presentation .pdf]


There is a dilemma around data. Data is valuable but we don’t know what to do with it. Therefore, OSTP has mandated the National Information Technology Research and Development (NITRD) Big Data Initiative. Ms. Wigen used the analogy of grocery stores. We have become so successful at gathering the food of science that we are totally overwhelmed. How do you get from the raw food to various stages of product? The delivery and organization of data is a global issue. It is a problem of abundance. What are we going to do to make it useful to our scientists?

NITRD is celebrating its 20th anniversary this spring. It was organized for agencies to discuss their IT R&D programs and has focused on IT and networking R&D. The goal is to collaborate and coordinate to achieve efficiencies. The National Coordination Office (NCO) is the coordination office for NITRD. It acts as the interface between NITRD and the Office of Management and Budget (OMB), Government Accountability Office (GAO), Congress, etc. Because the White House is tuned in to technology, this has resulted in a number of initiatives, including Big Data.

Ms. Wigen described the NITRD program components, including Human Computer Interaction and Information Management (HCI and IM), which is where the Big Data Initiative has been placed. Senior Steering Groups are now being formed to address key issues. The newest are Wireless Spectrum Efficiency and Big Data, also known as data-intensive science. However, the group decided that data-intensive science is only a part of Big Data.

Under the NITRD structure, HCI&IM will be doing a lot of the leg work to determine what agencies are currently doing with regard to the management and distribution of Big Data. This is all about R&D. What are the next sets of technologies needed? What are the agencies doing and what are the gaps? How can limited funds be used to get where we need/want to be?

She reviewed CENDI’s major technical challenges with data and confirmed that NITRD has also identified these same problems. These include multi-media and non-structured data. NCO is working on the Semantic Web. Traditional analytical methods won’t work any longer. New tools are needed that focus on having machines do the grunt work, producing summaries, identifying what is unusual, detecting relationships, and visualizing the data for human analysis.

In addition to the challenges jointly identified by CENDI and NITRD, Ms. Wigen highlighted the issue of interoperability. This comes to the forefront, particularly in moments of national disaster and emergency. Data is normally organized around the agency’s mission and vocabulary. One successful scenario is NASA working with the Department of the Interior on fighting fires. They worked together to make their databases more compatible, coordinated, and standardized. Their individual databases were built for a particular use case and they needed to think ahead about what might be needed by others, both nationally and internationally. How do we implement interoperability more systematically? As an issue for NITRD, this focuses on the R&D aspects only. Ms. Wigen believes that we don’t have a data problem; we have an analysis problem.

Big Data’s workplan is very aggressive, beginning with the identification of relevant FY11-12 solicitations, educational offerings, agency competitions, and public-private partnerships, such as the work at Google and Facebook, from which we could learn. This is due by June 1, 2011. By July 1, NITRD must describe what a national Big Data initiative would look like, including its vision, goals, and scope. This initiative has been tagged the National Information Initiative, combining NITRD’s previous networking and supercomputing initiatives. The national initiative would be fleshed out by September 1 for presentation to the White House and Congress.

Communication and collaboration with other organizations such as CENDI will be important. We need to jointly establish a consolidated message that is easily understandable in order to raise appreciation for the importance of R&D. We need to determine where we can start and then “spool up” to get beyond the current budget issues. This will require not only interagency cooperation but public-private partnerships.


“Technology Transfer from Science, to Innovation, to Jobs: The Federal Laboratory Consortium and Interagency Cooperation” - Paul Zielinski, Director, Technology Partnerships Office, NIST [presentation .pdf]

Innovation has been a theme of the Obama Administration from the beginning. The Innovation and Entrepreneurship Working Group is co-chaired by Ginger Lew of the National Economic Council and Aneesh Chopra of the Office of Science and Technology Policy (OSTP). Several subcommittees have been established including Small Business Innovation Research (SBIR), Proof of Concept, Access to Capital, and Federal Lab Commercialization. Big initiatives for economic innovation are coming from this working group. Old programs are being repackaged as new initiatives by using current authorities and bending them to work. For example, Access to Capital has established “Start Up America” which repurposes $2.5 billion using existing authorities and is having some success. The Federal Lab Commercialization was pushed to the back for a while but is now coming to the fore again.

The goal of technology transfer is to promote availability and use of technology. The goal of federal technology transfer is to get ideas out of the lab and into the economy, use, and practice. This is inherently an information issue. There are a lot of similarities with Scientific and Technical Information Management, but technology transfer takes some different approaches.

The federal laboratories control over 100 billion dollars in federal R&D. Technology transfer uses a number of approaches to disseminate the results, including technical publications that capture the output of contracts, grants, and other activities; hosting guest researchers and post-docs as well as informal collaborations; participation on standards committees; and more formal collaborations. Public-private partnerships are part of this as well, especially within the laboratory environment. About 2.5 percent of the license fees from public domain software is used to capture and grow innovations from small business. Another strategy is to use the intellectual property and patent holdings of the government by licensing them to help build businesses. This allows the government to pay on business terms while allowing a way to control the capital. US businesses are given preference in this licensing.

Authority is provided through several pieces of Technology Transfer Legislation beginning with the Stevenson-Wydler Technology Innovation Act of 1980, which establishes technology transfer as a role of every scientist and engineer within the federal government and laboratories. The Bayh-Dole Act establishes the rights to inventions developed with federal funds. This act is now being used as a model internationally. The Federal Technology Transfer Act enabled government labs to take money from the private sector and put up in-kind resources through Cooperative Research and Development Agreement (CRADA) mechanisms.

The Federal Laboratory Consortium for Technology Transfer was established under statute. It includes 300-700 transfer offices across the government. The consortium, with an elected Executive Board, aims to establish best practices and synchronize these practices across agencies.

The Interagency Working Group on Technology Transfer, established by OMB Circular A-11, includes 11 agencies. It is at the agency level rather than the laboratory level and focuses on policy. The Department of Commerce, National Institute of Standards and Technology (DOC/NIST) provides the coordinator.

A major issue is finding information on what the laboratories produce. Each laboratory and agency creates its own access to this information. There is a need for consolidated technology locator tools, including a repository that brings everything together. This is important, but has not been done, to date. The possibility of using Science.gov as a platform was discussed. It raised the question about coverage of the federal laboratories in Science.gov. Mr. Zielinski offered to be the initial contact for the Federal Laboratories in getting this information added to Science.gov.

ACTION: The Secretariat will review the coverage of the federal laboratories in Science.gov.

Mr. Zielinski believes that there are many areas in which CENDI and the Federal Laboratory Consortium could collaborate, including repository development, locator tools, and metrics and measures. Mr. Zielinski and Ms. Carroll will discuss how to continue the communication between the two groups.

ACTION: Ms. Carroll will contact Mr. Zielinski to determine how the two groups can continue to communicate on a regular basis.



“Data.gov 2.0: Next Generation What’s Now, What’s Next” - Jeanne Holm, Evangelist, Data.gov/General Services Administration [on loan from JPL] [presentation .pdf]

The initial goal of Data.gov was to make data accessible and interoperable across federal, state, local, tribal, and international boundaries. Data.gov is part of the Open Government Initiative to promote transparency and accountability. How do we get it in the hands of people and make sure the data is useful and improves the quality of life for the American people? Ms. Holm’s position was created to address what should come next. Having big datasets available can only go so far.

Data.gov sees itself as a conduit with links back to some information to provide context from the agencies. To date, over 380,000 datasets have been made available. These datasets are connected back to the FOIA groups and the Data.gov search is integrated with FOIA search. This encourages checking Data.gov before pursuing release through FOIA. Data.gov wants to link back to the agencies to establish pedigree and provide agency acknowledgement.

Data.gov is also an economic initiative. Ms. Holm provided several anecdotes where government data has helped to build significant business such as affordable GPS devices and the weather information business.

The data also supports responses to global events. Data.gov has 396 different points of contact within agencies providing significant reachback capabilities. The resources that might be useful to support certain events can be identified. A special Data.gov site was created called “Restore the Gulf: Deepwater Horizon Response” which included state information. EPA RADNET has sensors internationally that were used by news agencies to inform the public during the Japanese tsunami and nuclear reactor crises.

Ms. Holm presented the plan for Data.gov. Funding will drive the ability to reach these goals, but government is actually only a small piece of the final vision. A key component is the development of applications and publication of data beyond the government. They have launched a K-12 educational foundation and are featuring virtual internships to help develop applications and publish data. Applications help make the data more accessible and useful.

A new platform for Data.gov using Socrata is about to be announced. Socrata begins to solve the visualization and interoperability problem. The search is not just for metadata but to find specific instances inside the datasets. The software uses a categorization capability to filter out some of the false hits. Topic tag clouds can capture keywords and filtering can be done by view types such as datasets, maps, calendars, forms, etc. A variety of map formats (Google, Bing, ESRI) will be supported. Different chart views are also available, so the user can do a mash-up on the fly without having to bring the data into Excel. There is a large focus on geospatial data integration. Geospatial OneStop, an interagency geoportal managed by the US Geological Survey, is being brought into Data.gov.

Data.gov is also working on a platform for engagement to provide a rich social experience around data that promotes participation and gives people a voice. The concept of communities has been rolled out. There is a specific method for becoming a community, including up-front identification of how the community’s impacts will be measured.

There are currently five active communities. The “Health Community” includes eight agencies. Prizes are offered to encourage participation and the development of applications. Awards will be made soon. Thirty-nine new apps have been posted in the Health Community since February. The “Law Community” is interested in posting associated opinions as to why agencies have done something. They do not have discussion forums yet, but it is significant in that it provides information from the agency Offices of General Counsel. “America at Work” is underway and includes the Small Business Administration, the Department of Commerce, and the Office of Naval Research. The interest is in creating technologies and making these connections into American business. “America at Work” is a very US-focused community that serves as a brokerage service. NARA is interested in a history community and Data.gov is being taught in the classroom. “Learn at Data.gov” is a K-12 and University community.

Data.gov continues to work on semantic technologies to provide developers the tools and raw data formats to develop new capabilities. There has been a close partnership to date with Rensselaer Polytechnic Institute (RPI) and Jim Hendler’s group. They are also connected to other open data efforts around the world including W3C’s Open Linked Data in Government initiative.

The E-government budget was slashed 75 percent. However, Data.gov actually started saving funds last year and prepaid the infrastructure in anticipation of budget problems. There are five full- and part-time staff. The largest portion of the amount that E-government was budgeted went to Data.gov.

While metrics such as the number of downloads are important, it is difficult to determine the real impact. Social media such as Twitter and Google alerts are used to follow where the data are used. Data.gov looks for opportunities to engage directly with users through social media to ask how the data is being used and to collect anecdotes.

The path ahead is to get the data up and out of all the agencies and into the public eye by making data accessible in a variety of formats and through visualization and other tools. They are creating communities that understand and can apply this data, and are connecting and collaborating with others.

Ms. Holm also mentioned an expert system that was developed for NASA and is available for government-wide licensing. She will send the link to the Secretariat for distribution.

ACTION: Ms. Holm will provide the Secretariat with the URL for the NASA expert system.


Jewels of the USGS Library - Richard Huffine [web demo only]

The USGS Library, now part of Core Science, was established in 1882 and is the world’s largest earth and natural science library. There are four libraries in the system that serve more than 400 physical USGS locations throughout the US.

Mr. Huffine showed some of the rarest holdings of the library, with the oldest holding dating back to 1502.

The Library is working to digitize holdings. It has a large number of photographs online and more that are waiting to be added. More than 300,000 topographic maps will be added, with every edition, scale, and timeframe available in a single place. The Library now includes the Publications Group, providing a more direct conduit for the publications to be stored and accessed. A Fedora Commons repository is ready to be deployed in the next year.

There are more than 1,600 downloads per day by scientists. Because of the size and uniqueness of the collection, efforts are underway to name the library as a National Library of Earth Science.



New Advances in the Integrated Taxonomic Information System - Gerald “Stinger” Guala [presentation .pdf]

ITIS is an interagency activity to provide authoritative names for species. The USGS is the lead agency. The National Park Service and the Smithsonian are major content providers. ITIS contributes half of the international Catalog of Life entries and currently includes 538,000 scientific names and 110,000 common names. Even though the focus is on North America, many of the scientific names have global treatments. The preferred names are included along with synonyms.

A key component of ITIS is the Taxonomic Serial Number which serves as a unique identifier. ITIS also includes references to the literature, though they are looking for a better way to manage these citations in a database. Persistent identifiers such as the DOI/Handle have been discussed, but a large amount of the literature is not in journal or book publications.
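
As an illustration only, the relationship between a unique serial number, a preferred name, and its synonyms might be sketched as follows. This is not the ITIS data model or API; the class, field names, serial numbers, and species names are all hypothetical and exist only to show how a synonym can be resolved to its preferred name through an identifier.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TaxonName:
    """One name record, keyed by a unique serial number (hypothetical layout)."""
    tsn: int                            # made-up serial number, not a real ITIS value
    scientific_name: str
    accepted_tsn: Optional[int] = None  # None means this record is the preferred name


def resolve_preferred(names: Dict[int, TaxonName], tsn: int) -> TaxonName:
    """Follow synonym links until a record with no accepted_tsn is reached."""
    seen = set()
    record = names[tsn]
    while record.accepted_tsn is not None:
        if record.tsn in seen:
            raise ValueError(f"synonym cycle detected at serial number {record.tsn}")
        seen.add(record.tsn)
        record = names[record.accepted_tsn]
    return record


# Invented example data: one preferred name and one synonym pointing to it.
names = {
    1001: TaxonName(1001, "Examplea prima"),
    1002: TaxonName(1002, "Examplea antiqua", accepted_tsn=1001),
}

preferred = resolve_preferred(names, 1002)
print(preferred.tsn, preferred.scientific_name)
```

Keying every record by serial number rather than by name means that a name can change status (for example, become a synonym) without breaking references held by downstream users.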

Multilingual interfaces have been developed over the years. While there is an online look-up capability, it has not been the focus. Most users download the whole file. Web services, including customized ones, are also available.

The Taxonomic Workbench is an online system for taking submissions and then providing candidates to the data manager for approval. Recent activity has included a major update to plants, many invertebrate groups, and all vertebrates. Focus areas for 2011 were also described.


Raptor Search Engine and Data Visualization - Tim Woods [presentation .pdf]

The Raptor search engine received an Honorable Mention from Government Computer News. It is based on Vivisimo 7.0 and is used as a traditional search engine with a central index. In addition, multiple instances can be created for specific projects. The search results can be passed to other applications to enhance the functionality. Raptor currently includes over 50 content sources from USGS, other agencies, and not-for-profit organizations. The main audiences are conservation managers, biologists in the field, and educators.

The interface allows for search refinement, sorting, geospatial searching by a bounding box, dynamic clusters which integrate the Biocomplexity Thesaurus, the integration of the LIFE image search, and document preview features. Approximately six of the resources have adequate geospatial referencing to allow for geospatial searching.
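
For readers unfamiliar with bounding-box searching, a minimal sketch of the general idea follows. It does not represent how Raptor or Vivisimo implements the feature; the record layout, names, and coordinates are assumptions made for illustration, and the sketch ignores boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BoundingBox:
    """Rectangle in decimal degrees; assumes west <= east and south <= north."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        return self.west <= lon <= self.east and self.south <= lat <= self.north


@dataclass(frozen=True)
class Record:
    """Hypothetical search result carrying a single point location."""
    title: str
    lon: float
    lat: float


def filter_by_box(records: Iterable[Record], box: BoundingBox) -> List[Record]:
    """Keep only the records whose point falls inside the box."""
    return [r for r in records if box.contains(r.lon, r.lat)]


# Invented sample data for demonstration.
records = [
    Record("Record A", lon=-77.4, lat=38.9),
    Record("Record B", lon=-105.0, lat=39.7),
]
box = BoundingBox(west=-78.0, south=38.0, east=-77.0, north=39.0)

for hit in filter_by_box(records, box):
    print(hit.title)
```

A filter of this kind only works for records that carry coordinates, which is consistent with the observation above that only some of the resources have adequate geospatial referencing.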

Search results can be moved to data visualization tools including Data Explorer. There are several existing USGS data products such as GAP and two protected area sources. The development team is looking for other data layers to add. The results can be saved as XML so the user can bring them to a GIS application on the desktop.