Data Guidelines

General Guidelines

  • Use consistent code(s) to indicate missing data values across all files (e.g., “NODATA”)
  • Always include units of measurement
  • Use consistent capitalization/punctuation/abbreviations throughout (e.g., don’t switch between “in”, “IN”, and “inches”)
  • Use descriptive filenames (see Filename Recommendations below)
  • Give the data set a descriptive title
    Examples:
    Bad: “Respiration Data”, “The Aerostar 100 Data Set”
    Good: “LBA Respiration Data for Broadleaf Evergreen Trees in Rondonia, Brazil, 1999-2000”
  • Include a README.txt file with the data set that describes the data: how it was collected, how the directory structure is organized, which file formats are being used, etc.
  • Always preserve the raw data as collected; if you use a script to clean the data, save the processed data as a new file
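The missing-value and raw-data guidelines above can be combined in a small cleaning script. The Python sketch below is one possible approach, not a prescribed tool. It assumes CSV input, a hypothetical agreed code “NODATA”, and a hypothetical set of ad hoc variants (“NA”, “N/A”, “-9999”, empty cells) to normalize. The raw file is only read; the result is written to a separate processed file.

```python
import csv
from pathlib import Path

# Hypothetical: the single missing-value code agreed on for the project,
# and ad hoc variants that may have crept into the raw files.
MISSING_CODE = "NODATA"
MISSING_VARIANTS = {"", "NA", "N/A", "-9999"}


def clean_csv(raw_path: Path, processed_path: Path) -> int:
    """Copy raw_path to processed_path, replacing each missing-value variant
    with MISSING_CODE. The raw file is opened for reading only and is never
    modified. Returns the number of cells that were replaced."""
    if raw_path.resolve() == processed_path.resolve():
        raise ValueError("processed data must not overwrite the raw file")
    replaced = 0
    with raw_path.open(newline="") as src, \
            processed_path.open("w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            new_row = []
            for cell in row:
                if cell.strip() in MISSING_VARIANTS:
                    new_row.append(MISSING_CODE)
                    replaced += 1
                else:
                    new_row.append(cell)
            writer.writerow(new_row)
    return replaced
```

For example, clean_csv(Path("raw_data.csv"), Path("processed_data.csv")) would leave raw_data.csv untouched (both filenames are hypothetical). Treating “-9999” or empty cells as missing is an assumption; adjust MISSING_VARIANTS to match how your own raw files actually record gaps.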

Spreadsheets

  • Use a non-proprietary format such as CSV (comma-separated values) or TXT (ASCII plain text) when possible
    Note: MS Excel files can be exported to CSV format using “Save As…” from the File menu
  • Store separate data tables in separate files rather than as multiple tabs in a single spreadsheet
  • Use text only to convey information: no color coding, special fonts, etc.
  • Make sure the software does not show more precision than the collected data have (e.g., Excel cells formatted to display more decimal places than were measured)
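The precision concern also applies when a script, rather than Excel, writes processed values. In the Python sketch below, temperatures are hypothetically assumed to have been read to 0.1 °F and converted to °C. The conversion produces many spurious digits, so each value is formatted back to one decimal place before being written as CSV. The function names and the one-decimal precision are illustrative assumptions.

```python
import csv
import io


def format_measurement(value: float, decimals: int) -> str:
    """Render value with exactly `decimals` decimal places, so the output
    carries no more precision than was actually measured."""
    return f"{value:.{decimals}f}"


def to_csv(rows, decimals: int) -> str:
    """Return (site, value) pairs as CSV text, formatting every value to the
    measurement precision instead of Python's full float representation."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["site", "temp_C"])
    for site, value in rows:
        writer.writerow([site, format_measurement(value, decimals)])
    return buffer.getvalue()


# Hypothetical readings taken to 0.1 °F, converted to °C.
readings_f = [("A", 70.3), ("B", 68.9)]
readings_c = [(site, (temp_f - 32) * 5 / 9) for site, temp_f in readings_f]

print(readings_c[0][1])  # unformatted float: more digits than were measured
print(to_csv(readings_c, decimals=1), end="")
```

The CSV output contains the rows A,21.3 and B,20.5 instead of the full floating-point values.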

Filename Recommendations

  • Use descriptive filenames built from elements such as:
    • Project acronym
    • Study title
    • Location
    • Investigator(s)
    • Year(s) of study
    • Data type
    • Version number
    • File type
      Examples:
      Bad: mydata.dat, 1998.csv
      Good: narsto_texas_pm2.5_study_1997-1998.csv
  • Dates: begin filenames with the date in Year-Month-Day format (e.g., YYYYMMDD) so that an alphabetical listing of the files is also chronological
  • Date and time: similarly, begin filenames with the date and time in Year-Month-Day-Hour-Minute-Second format (e.g., YYYYMMDD-HHMMSS)
    • Use a 24-hour clock
    • Include the time zone
  • Avoid spaces in filenames; use “_” or “-” as delimiters
  • Use file extensions to indicate data format (e.g., .txt, .csv, .png, .nc)
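The filename recommendations above can be applied automatically. The following Python sketch uses a hypothetical helper, build_filename, that joins lowercase descriptive parts with underscores instead of spaces and optionally prefixes either a YYYYMMDD date or a 24-hour YYYYMMDD-HHMMSS timestamp. It then appends the file extension. Converting timestamps to UTC and marking them with a “UTC” suffix is one assumed way to satisfy “include the time zone”; it is not required by these guidelines.

```python
import re
from datetime import date, datetime, timedelta, timezone


def build_filename(parts, extension, when=None):
    """Build a descriptive filename (hypothetical helper).

    parts: descriptive components (project, location, data type, ...)
    extension: file format, e.g. "csv" or ".nc"
    when: optional date (prefix YYYYMMDD) or time-zone-aware datetime
          (prefix YYYYMMDD-HHMMSS on a 24-hour clock, converted to UTC)
    """
    # Lowercase each part and replace runs of whitespace with underscores.
    tokens = [re.sub(r"\s+", "_", part.strip().lower()) for part in parts]
    name = "_".join(tokens)
    if isinstance(when, datetime):  # check datetime first: it subclasses date
        if when.tzinfo is None or when.utcoffset() is None:
            raise ValueError("timestamps must include a time zone")
        stamp = when.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")
        name = f"{stamp}UTC_{name}"
    elif isinstance(when, date):
        name = f"{when:%Y%m%d}_{name}"
    return f"{name}.{extension.lstrip('.')}"


# Usage: a file stamped at 09:30 local time in a UTC-6 zone.
local = timezone(timedelta(hours=-6))
print(build_filename(["narsto", "texas", "pm2.5", "study"], "csv",
                     datetime(1997, 7, 4, 9, 30, tzinfo=local)))
# → 19970704-153000UTC_narsto_texas_pm2.5_study.csv
```

Because the prefix runs from year down to second, sorting these names alphabetically also sorts them chronologically. Rejecting timestamps without a time zone is a deliberate design choice to enforce the guideline.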

Further Reading

DataONE Best Practices

Source: Strasser, C., Cook, R., Michener, W., & Budden, A. (2012). Primer on data management: What you always wanted to know. A DataONE publication. http://dx.doi.org/doi:10.5060/D2251G48 (available via CDL)