NSF Funding & New Data Initiatives: Library Repositories on the Leading Edge Panel

Research Data Management Services at the MIT Libraries
Amy Stout

“Science changes the tools and the tools change science.”

“Our ability to create data has outpaced our ability to organize and store it.”

What can librarians do? Stout suggests we learn as much as possible about our departments and their data. We can respond to these changing environments and understand the fields we support, and we know how to organize, make accessible, and preserve data. You can understand how to deal with the data without really understanding the specific data.

A study group formed at MIT Libraries in 2006, and in 2008 the Libraries brought in a social sciences data librarian and a geosciences/GIS expert. For Stout, this work is 30% of her job. Services offered:

  • Web site:  Data Management and Publishing 
  • Education: Managing Research Data 101 (4-5 times per year), new presentation coming soon 
  • Bioinformatics for Beginners (team taught with bioinformatics librarian) – using NCBI resources, especially BLAST 
  • One-on-one consulting: format migration and DM plans; a template is in the works and will be on their site soon
  • Radish: dspace.mit.edu/handle/1721.1/62236 – data set collection example. A small pilot in which the libraries helped a faculty member bring data from another institution. It raised questions such as how to handle non-MIT contributors (still working on this issue!). It also brought up file type issues: multiple-file and zip-file deposits need software to unpack and repack them on the server, which is not yet integrated with the IR. The metadata is inconsistent and much of it is esoteric, so what is actually needed? They are working through this issue too.
  • Creating data profiles of individual researchers and data audits of entire departments.  
  • Developing service model for assisting researchers in the lab. 
  • Liaison librarian outreach: developing discipline-specific knowledgebase
Stout suggests librarians “try new things, just call them pilot.”

Active Data Curation in Libraries: Issues and Challenges
William H. Mischo & Mary C. Schlembach

At the University of Illinois at Urbana-Champaign, Mischo and others are working to embed data curation within the scientific workflow of researchers. Solutions that library IT and others on campus develop will have a great impact on librarians and libraries. At UIUC, librarians are focusing on connecting data to literature, determining their role within the knowledge creation process, and creating GrIPs (Group Information Profiles) on faculty centers. These profiles are online and linked to Scopus, Google News, specific faculty publications, and searches in the faculty's focused areas of research. The librarians also integrate their custom metasearch box within these search profiles.

What data should be curated? They suggest librarians check out federally funded projects such as DataNet, Data Conservancy, DataONE, and the Purdue Data Curation Profiles. What levels of data and which streams need to be saved? The candidates are raw data, calibrated data, image products that visualize data, and derived data: all of these, or only some? Instrumentation data and metadata must also be saved.

For NSF Data Management Plans (DMPs), be sure to check the varying requirements by directorate; the engineering directorate, for instance, does not require raw data to be archived. Check out the UIUC Grainger library website and its template for DMPs. They are strongly encouraging use of the institutional repository to deposit data. A recent grant was funded for the NSF Ethics CORE Digital Library, so stay tuned for more information on this. Mischo and others are working on a Responsible Conduct of Research requirement database and a wizard to help researchers.

Developing a Data Program at Stanford University
Bob Schwartzwalder pointed out that there has been a surge of interest in reusing data and that there is great economic value in doing so. Librarians’ jobs are changing as there is a packaged approach to information acquisition. At Stanford, there is a wonderful opportunity where librarians “can provide value and benefit not only to communities but society at large.” They are leveraging current work with the digital repository, partnering with faculty, and building on existing expertise. Recently, librarians have expanded their expertise in the geospatial area.

Establishing an integrated data service meets the needs of their organization. Metadata issues are critical, especially given the potential for data reuse. Metadata standards are a “mixed playing field”: some arenas have advanced metadata protocols, while others have none. Data is a “collection issue,” and Stanford is revamping its collection development policy to support storage and reuse of data. Context may be needed to translate and utilize these data.

For NSF DMPs, librarians at Stanford are currently offering one-on-one consultations to learn researchers' needs. Changes in staffing are underway: an Associate Director position for STEM data was created in 2010 and a Data Librarian position in 2011, and further staffing shifts are planned.

SUL’s technical infrastructure has three layers. The Stanford Digital Repository is the base; users and librarians get information in through the digital object registry (Hydra) and get information out through the digital delivery system (SUL uses Blacklight and SearchWorks).

Schwartzwalder’s crystal ball: he sees more changes in staffing and focus; a need to build a program to assess faculty practices and design technology; a need to develop pilot projects and new policies; and a need to develop tool sets to “use” data. Tool sets are still an unexplored area with much potential.

Conversations with scientific publishers are also needed, since there could be assumptions about whether or not the data included with an article are also peer reviewed.

Q & A
Role for librarians who don’t have institutional repositories?
Promote the inclusion of data in public repositories. Offer distributed data services and education rather than storing data. In the future, more collaborative portals will be available for researchers to archive data.

See ICPSR, for example, for social science data; these data are more homogeneous, so it’s easier.

Some confusion was expressed over the goals of, and commitment involved with, the ARL eScience Initiative, which kicks off in July with a webinar. The expected level of commitment, time, and outcomes for the program is uncertain, but it appears to require a lot of staff commitment.
