Choosing suitable file formats helps ensure that your data remain accessible for future use, both your own and that of others. When selecting tools for working with your data, pay special attention to the formats in which they can save or export it.
Use open, non-proprietary formats
Open, non-proprietary formats are far more likely to remain usable even if the software that created them is unavailable or no longer works. Formats whose documentation is complete and freely available also have a higher likelihood of long-term preservation. If the program that created a file is the only option for reading or accessing the data, the file is likely in a proprietary, non-open format. As a general rule, plain text formats, such as comma- or tab-delimited files, are open formats and are typically better for re-use and long-term preservation.
Image file examples:
|proprietary format||open format|
|.psd file||.tiff image file|
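As an illustration of the plain text recommendation above, here is a minimal Python sketch that writes tabular data to comma-delimited text and reads it back using only the standard library. The function names, column names, and sample values are hypothetical and chosen only for this example.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to comma-delimited text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse comma-delimited text back into a list of dicts (values are strings)."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical sample data; values are kept as strings so the round trip is exact.
rows = [
    {"site": "A", "temp_c": "12.5"},
    {"site": "B", "temp_c": "13.1"},
]
text = rows_to_csv(rows, ["site", "temp_c"])
print(text)
```

The resulting text can be opened in any text editor or spreadsheet program, which is the property that makes delimited files a reasonable choice for preservation. Note that CSV stores every value as text, so data types and units need to be documented separately.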
Use “lossless” formats
Formats that compress the information in a file often produce smaller files, but some kinds of compression permanently remove data. Such formats are “lossy,” while formats that preserve all of the original information, so that it can be fully recovered when the file is decompressed, are “lossless.”
Audio and image file examples:
|lossy formats||lossless formats|
|.mp3 audio file, .jpeg image file||.wav audio file, .tiff image file|
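The difference can be sketched in a few lines of Python. The lossless case uses the standard library's zlib module; the lossy case is a toy quantization scheme invented for this example, not how .mp3 or .jpeg actually work. The function names and sample values are assumptions.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with zlib; the result equals the input exactly."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(samples, step=10):
    """Toy lossy scheme (illustrative only): round each sample down to a
    multiple of `step`. The discarded detail cannot be recovered."""
    return [(s // step) * step for s in samples]


original_bytes = b"temperature,12.5\n" * 100
restored_bytes = lossless_round_trip(original_bytes)

original_samples = [3, 14, 27, 41]
restored_samples = lossy_round_trip(original_samples)

print(restored_bytes == original_bytes)      # identical after the round trip
print(restored_samples == original_samples)  # detail was lost
```

Once a file has been saved in a lossy format, converting it to a lossless format later does not bring the removed data back, so it is safer to keep the lossless version as the master copy.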
Use unencrypted and uncompiled formats
If the encryption key, passphrase, or password to a file is lost, there may be no way to retrieve the data from the file later, rendering it unusable to you and to others.
Uncompiled source code is more readily re-usable by others and has a far greater likelihood of remaining usable over time, since it can be recompiled on different architectures and platforms.
For data management plans, or when you are sharing/publishing the data, make sure to describe:
- Software (and version) necessary to view the data (e.g. SPSS v.3; Microsoft Excel 97-2003)
- How you will manage versions of the files themselves (version control)
- Whether data will be stored in one format during collection and analysis and then converted to another format for preservation. List what may be lost in the conversion, such as embedded metadata.
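One possible way to capture the items above in machine-readable form is a small record saved alongside the data. The following Python sketch is only an illustration: the function name, field names, and metadata keys are hypothetical, and your repository or funder may prescribe a different structure.

```python
import json


def describe_dataset(working_format, preservation_format, software,
                     source_metadata, preserved_metadata):
    """Build a record of formats, required software, and metadata lost in conversion.

    All field names here are hypothetical, chosen for this example.
    """
    lost = sorted(set(source_metadata) - set(preserved_metadata))
    return {
        "working_format": working_format,
        "preservation_format": preservation_format,
        "software": software,
        "metadata_lost_in_conversion": lost,
    }


record = describe_dataset(
    working_format=".xls",
    preservation_format=".csv",
    software=[{"name": "Microsoft Excel", "version": "97-2003"}],
    # Hypothetical metadata embedded in the working file.
    source_metadata=["author", "created", "cell_formatting"],
    # A delimited text file carries no embedded metadata.
    preserved_metadata=[],
)
print(json.dumps(record, indent=2))
```

Keeping such a record in a plain text format like JSON means the description itself stays readable without special software.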
Recommended file formats
The UK Data Service provides a table listing recommended formats.
Data repositories in your discipline may have guidance or requirements for file formats as well.