Data Guidelines
General Guidelines
- Use consistent code(s) throughout files to indicate missing data values (e.g., "NODATA")
- Always include units of measurements
- Use consistent capitalization/punctuation/abbreviations throughout (e.g., don't switch between "in", "IN", and "inches")
- Use descriptive file names (see section below)
- Give the data set a descriptive title
Examples:
Bad: "Respiration Data", "The Aerostar 100 Data Set"
Good: "LBA Respiration Data for Broadleaf Evergreen Trees in Rondonia, Brazil, 1999-2000"
- Include a README.txt file with the data set that describes the data: how it was collected, how the directory structure is organized, which file formats are being used, etc.
- Always preserve the raw data as collected; if you use a script to clean the data, save the processed data as a new file
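The last two habits in this list (one missing-value code, and never overwriting raw data) can be sketched in a short script. This is only an illustration: the function names, the file paths passed in, and the set of "variant" missing-value spellings are assumptions made for the example, not part of the guidelines.

```python
import csv
import os

# Single missing-value code used throughout (taken from the guideline's example).
MISSING = "NODATA"

# Hypothetical spellings of "missing" that a raw file might contain.
MISSING_VARIANTS = {"", "NA", "N/A", "-999", "nodata"}


def clean_rows(rows):
    """Return new rows with whitespace trimmed and missing-value variants unified."""
    cleaned = []
    for row in rows:
        new_row = []
        for cell in row:
            value = cell.strip()
            new_row.append(MISSING if value in MISSING_VARIANTS else value)
        cleaned.append(new_row)
    return cleaned


def clean_file(raw_path, processed_path):
    """Read the raw CSV and write the cleaned copy to a different file."""
    if os.path.abspath(raw_path) == os.path.abspath(processed_path):
        raise ValueError("refusing to overwrite the raw data file")
    with open(raw_path, newline="") as src:
        rows = list(csv.reader(src))
    with open(processed_path, "w", newline="") as dst:
        csv.writer(dst).writerows(clean_rows(rows))
```

Calling `clean_file("raw/site_a.csv", "processed/site_a_clean.csv")` (hypothetical paths) would leave the raw file untouched and write the cleaned rows to the second path.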
Spreadsheets
- Use a non-proprietary format such as csv (comma separated values) or txt (ASCII plain text) when possible
Note: MS Excel files can be exported to csv format using "Save As..." from File menu
- Use separate spreadsheets for separate data rather than a single tabbed spreadsheet
- Use text only to convey information - no color coding, special fonts, etc.
- Make sure that the software isn't reporting precision higher than that of the collected data (e.g., check how many decimal places Excel cells are set to display)
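As a rough sketch of the precision point, the snippet below writes a derived value to CSV text using the same number of decimal places as the original measurements. The column names, the one-decimal precision, and the function names are assumptions chosen for the example.

```python
import csv
import io


def format_to_precision(value, decimals):
    """Format a number with a fixed number of decimal places."""
    return f"{value:.{decimals}f}"


def to_csv_text(header, rows, decimals):
    """Build CSV text where numeric cells carry only `decimals` decimal places."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for label, value in rows:
        writer.writerow([label, format_to_precision(value, decimals)])
    return buffer.getvalue()


# Readings recorded to one decimal place (hypothetical values).
readings = [12.1, 12.4, 12.5]
mean = sum(readings) / len(readings)  # many more digits than were measured

# The header carries the unit, and the mean is written to one decimal place.
text = to_csv_text(["site", "mean_temp_degC"], [("A", mean)], decimals=1)
```

Writing the raw float instead would put a long run of digits in the file, implying a precision the instrument never had.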
Filename Recommendations
- Use descriptive filenames:
- Project acronym
- Study title
- Location
- Investigator(s)
- Year(s) of study
- Data type
- Version number
- File type
Examples:
Bad: mydata.dat, 1998.csv
Good: narsto_texas_pm2.5_study_1997-1998.csv
- Dates: begin the filenames with the date in Year-Month-Day format (e.g., YYYYMMDD) so that files are automatically listed in chronological order
- Date and time: similarly, use Year-Month-Day-Hour-Minute-Second format (e.g., YYYYMMDD-HHMMSS) at beginning of filename
- Use 24-hour clock
- Include time zone
- Avoid using spaces in filenames; use "_" or "-" as delimiters
- Use file extensions to indicate data format (e.g., .txt, .csv, .png, .nc)
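The recommendations above can be combined in a small helper that builds a filename from a timestamp and descriptive parts. This is a minimal sketch: the function name, the choice to convert to UTC and mark it with a trailing "Z", and the use of "_" between parts are assumptions for illustration, not requirements from the guidelines.

```python
import re
from datetime import datetime, timezone


def build_filename(when, parts, extension):
    """Build a filename like YYYYMMDD-HHMMSSZ_part_part.ext.

    when      -- timezone-aware datetime (converted to UTC here, an assumption)
    parts     -- descriptive components such as project, location, data type
    extension -- file type, with or without a leading dot
    """
    if when.tzinfo is None:
        raise ValueError("timestamp must include a time zone")
    # 24-hour clock; "Z" marks UTC as the time zone.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S") + "Z"
    # Replace any run of whitespace inside a part with "-".
    cleaned = [re.sub(r"\s+", "-", part.strip()) for part in parts]
    return f"{stamp}_{'_'.join(cleaned)}.{extension.lstrip('.')}"


name = build_filename(
    datetime(1998, 3, 5, 14, 30, 0, tzinfo=timezone.utc),
    ["narsto", "texas", "pm2.5", "study"],
    "csv",
)
# name == "19980305-143000Z_narsto_texas_pm2.5_study.csv"
```

Because the timestamp comes first and uses fixed-width fields, sorting such names alphabetically also sorts them chronologically.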
Further Reading
Source: Strasser, C., Cook, R., Michener, W., & Budden, A. (2012). Primer on data management: What you always wanted to know. A DataONE publication. http://dx.doi.org/doi:10.5060/D2251G48 (available via CDL)