Data Management for Undergraduate Researchers

Slide 0

Data Management for Undergraduate Researchers
Office of Undergraduate Research Seminar and Workshop Series
Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah
June 18, 2015

Slide 1

Introductions
What are data?
Why manage data?
Data Management Plans
File Naming
Metadata
Storage and Archiving
Questions

Slide 2

Name
Major
Research Project

Slide 3

What are data?
“The recorded factual material commonly accepted in the research community as necessary to validate research findings.”
- U.S. OMB Circular A-110

Slide 4

Data are diverse

Slide 5

Data are messy

Slide 6

Why manage data?
Your best collaborator is yourself six months from now, and your past self doesn’t answer emails.

Slide 7

Why else manage data?
Save time and increase efficiency
Meet grant requirements
Promote reproducible research
Enable new discoveries from your data
Make the results of publicly funded research publicly available

Slide 8

We are trying to avoid this scenario…

Slide 9

Two bears’ data management problems
Didn’t know where he stored the data
Saved one copy of the data on a USB drive
Data were in a format that could only be read by outdated, proprietary software
No codebook to explain the variable names
Variable names were not descriptive
No contact information for the co-author Sam Lee

Slide 10

Data Management Plan
PLANNING
Courtesy of the UK Data Archive: http://www.data-archive.ac.uk/create-manage/life-cycle

Slide 11

Scenario
You develop a research project during your undergraduate experience.
You write up the results, which are accepted by a reputable journal.
People start citing your work!
Three years later, someone accuses you of falsifying your work.
Scenario adapted from the MANTRA training module

Slide 12

Would you be able to prove you did the work as you described in the article?
What would you need to prove you hadn’t falsified the data?
What should you have done throughout your research study to be able to prove you did the work as described?

Slide 13

Elements of a DMP
Types of data, including file formats
Data description
Data storage
Data sharing, including confidentiality or security restrictions
Data archiving and responsibility
Data management costs

Slide 14

File naming

Slide 15

File naming best practices
Be descriptive
Don’t be generic
Use an appropriate length
Be consistent

Slide 16

PLPP_EvaluationData_Workshop2_2014.xlsx
MyData.xlsx
publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx
Who filed better?

Slide 17

File naming best practices
File names should include only letters, numbers, and underscores.
No special characters (%@#*?!)
No spaces
Lowercase or camel case (LikeThis)
Not all systems are case sensitive. Assume this, THIS, and tHiS are the same.
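
The character rules on this slide can be checked mechanically. What follows is a minimal Python sketch, not part of the original workshop: the function names `is_safe_name` and `case_collisions` are hypothetical, and the sketch assumes that one plain file extension (such as .xlsx) is allowed after the name.

```python
import os
import re
from collections import defaultdict

# Letters, numbers, and underscores only, as the slide recommends.
SAFE_STEM = re.compile(r"[A-Za-z0-9_]+")
# Assumption: a single plain extension such as ".xlsx" is acceptable.
SAFE_EXT = re.compile(r"\.[A-Za-z0-9]+")


def is_safe_name(filename: str) -> bool:
    """Return True if the name has no spaces or special characters."""
    stem, ext = os.path.splitext(filename)
    if not SAFE_STEM.fullmatch(stem):
        return False
    return ext == "" or SAFE_EXT.fullmatch(ext) is not None


def case_collisions(filenames):
    """Group names that differ only by letter case (this, THIS, tHiS)."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name.lower()].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

A check like this only covers characters and case. A name such as MyData.xlsx passes it while still being too generic, so the "be descriptive" advice remains a human judgment.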

Slide 18

Dates and numbering…
1. Use leading zeros for scalability: 001, 002, 009, 019, 999
2. If using dates, use YYYYMMDD
June2015 = BAD!
06-18-2015 = BAD!
20150618 = GREAT!
2015-06-18 = This is fine too
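
The reason for both rules is that file listings sort names as text. The short Python sketch below illustrates this; the helper names (`padded`, `date_stamp`, `build_name`) and the name pattern are assumptions made for illustration, not something prescribed by the slides.

```python
from datetime import date


def padded(number: int, width: int = 3) -> str:
    """Format a sequence number with leading zeros, e.g. 9 becomes '009'."""
    return f"{number:0{width}d}"


def date_stamp(day: date) -> str:
    """Format a date as YYYYMMDD."""
    return day.strftime("%Y%m%d")


def build_name(day: date, project: str, description: str, number: int) -> str:
    """Join the parts with underscores (a hypothetical naming pattern)."""
    return f"{date_stamp(day)}_{project}_{description}_{padded(number)}"


# Without padding, text sorting puts "10" before "9".
unpadded = sorted(str(n) for n in (1, 9, 10))
# With padding, text order and numeric order agree.
zero_padded = sorted(padded(n) for n in (1, 9, 10))
```

Here `unpadded` comes out as 1, 10, 9, while `zero_padded` comes out as 001, 009, 010. The same logic applies to dates: YYYYMMDD stamps sort chronologically, whereas month-first formats such as 06-18-2015 group files by month across years.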

Slide 19

Who filed better?
July 24 2014_SoilSamples%_v6
20140724_NSF_SoilSamples_Cummings
SoilSamples_FINAL

Slide 20

File organization best practices
The top-level folder should include the project title and date.
The sub-structure should have a clear and consistent naming convention.
Document your structure in a README text file.
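
One way to apply these three practices is to create the folder skeleton and its README with a small script. The Python sketch below (Python 3.9 or later) is an illustration under stated assumptions: the function `create_project`, the README layout, and the subfolder names are all hypothetical, and any consistent convention would serve equally well.

```python
import tempfile
from datetime import date
from pathlib import Path


def create_project(root: Path, title: str, start: date, subfolders: list[str]) -> Path:
    """Create <title>_<YYYYMMDD> under root, with subfolders and a README."""
    project = root / f"{title}_{start:%Y%m%d}"
    project.mkdir(parents=True, exist_ok=True)
    for name in subfolders:
        (project / name).mkdir(exist_ok=True)
    # Record the structure so a later reader can see what lives where.
    lines = [f"Project: {title}", f"Start date: {start.isoformat()}", "", "Folders:"]
    lines += [f"  {name}/" for name in subfolders]
    (project / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return project


if __name__ == "__main__":
    # Demonstrate in a throwaway directory; folder names are illustrative only.
    with tempfile.TemporaryDirectory() as tmp:
        made = create_project(
            Path(tmp), "SoilSamples", date(2014, 7, 24),
            ["01_raw_data", "02_analysis", "03_documentation"],
        )
        print((made / "README.txt").read_text(encoding="utf-8"))
```

Numbering the subfolders with leading zeros is a design choice that keeps them in workflow order in a file listing.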

Slide 21

File organization exercise

Slide 22

Metadata
Unstructured Data vs. Structured Data
There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth.” It concerns the cytology of kidney cells.
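
To make the contrast concrete, the unstructured sentence above can be rewritten as a structured record. The Python sketch below is one possible rendering: the field names (`title`, `creator`, `affiliation`, `date`, `subject`) and the helper `find_by` are assumptions chosen for illustration, not a formal metadata standard or the structure shown in the original slide.

```python
import json

# The same facts as the sentence above, split into labeled fields.
# Field names are illustrative only.
record = {
    "title": ("Growth of Rodent Kidney Cells in Serum Media and the "
              "Effect of Viral Transformation On Growth"),
    "creator": "Gary Bradshaw",
    "affiliation": "University of Nebraska Medical Center",
    "date": "1982",
    "subject": "Cytology of kidney cells",
}


def find_by(records, field, value):
    """Return the records whose given field equals value exactly."""
    return [r for r in records if r.get(field) == value]


# Structured records can be saved in an open, machine-readable format.
as_json = json.dumps(record, indent=2)
```

Once the information is in fields, a question like "which studies are from 1982?" becomes a simple lookup, for example `find_by([record], "date", "1982")`, instead of a search through free text.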

Slide 23

Why create metadata?

Slide 24


Slide 25

Data documentation includes…
Questionnaires
Interview protocols
Lab notebooks
Code or scripts
Consent forms
Samples, weights, methods
README files

Slide 26

Data Storage

Slide 27

LOCKSS (Lots of Copies Keeps Stuff Safe)

Slide 28

Options for data storage
Personal computers or laptops
Networked drives
External storage devices
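
Keeping copies in more than one of these places only helps if the copies are intact. A common way to check is to compare checksums after copying. The Python sketch below (Python 3.9 or later) shows the idea; `sha256_of` and `copy_and_verify` are hypothetical helper names, and the destination folders stand in for whatever networked or external drives are actually available.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and confirm the copies match."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = Path(shutil.copy2(source, folder / source.name))
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies


if __name__ == "__main__":
    # Temporary folders stand in for a networked drive and an external drive.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "20150618_survey_001.csv"
        original.write_text("id,score\n1,4\n", encoding="utf-8")
        for copy in copy_and_verify(original, [root / "network", root / "external"]):
            print(copy.name, sha256_of(copy))
```

Re-running the checksum comparison periodically, not just at copy time, is what catches a copy that has silently degraded.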

Slide 29

Storing sensitive data
If possible, collect the necessary data without using direct identifiers
Otherwise, de-identify your data upon collection or immediately afterwards
Do not store or share sensitive data on unencrypted devices
Talk to your IRB
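
As a rough illustration of de-identifying upon collection, the Python sketch below replaces direct identifiers with study IDs and returns the linking key separately. Everything here is an assumption made for the example: the function `deidentify`, the `study_id` field, and the ID format are hypothetical, and removing direct identifiers alone does not guarantee that participants cannot be re-identified, so IRB guidance takes precedence.

```python
def deidentify(rows, direct_identifiers):
    """Replace direct identifiers with study IDs.

    Returns (clean_rows, key). The key maps each study ID back to the
    removed identifiers and should be stored separately from the data,
    on an encrypted device.
    """
    clean_rows, key = [], {}
    for index, row in enumerate(rows, start=1):
        study_id = f"P{index:03d}"  # zero-padded so IDs sort correctly
        key[study_id] = {
            field: row[field] for field in direct_identifiers if field in row
        }
        kept = {k: v for k, v in row.items() if k not in direct_identifiers}
        clean_rows.append({"study_id": study_id, **kept})
    return clean_rows, key


# Made-up example data, not from any real study.
raw = [
    {"name": "Example Person", "email": "person@example.org", "score": 4},
    {"name": "Another Person", "email": "another@example.org", "score": 2},
]
clean, key = deidentify(raw, ["name", "email"])
```

After this step, `clean` holds only study IDs and scores and is the version to analyze and share, while `key` is kept apart or destroyed as the study protocol requires.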

Slide 30

Thinking long-term

Slide 31

Archiving options
Public repository – FigShare
Domain-specific repository
Institutional repository

Slide 32

Major takeaways
Data management starts at the beginning of a project
Document your data so that someone else could understand it
Have more than one copy of your data
Consider archiving options when you are done with your project

Slide 33

Questions?
rebekah.cummings@utah.edu
(801) 581-7701
Marriott Library, 1705Y
…or ask now!