Organising your data files
Good file and folder organisation will help you to locate, identify and retrieve your data quickly and accurately, thereby making it easier for you to manage your data. To achieve this, you need to do at least two things:
- Use folders to sort your files into a series of meaningful and useful groups;
- Use naming conventions to give your files and folders meaningful names according to a consistent pattern.
You should establish a file organisation scheme at the start of your project, to avoid having to apply one retrospectively:
- If you are new to a group, or are working with a research facility, check whether there is an established procedure to follow.
- If you are working within a research group, it is essential that the whole group agrees on a file organisation scheme so that everyone can find data within the group's shared storage area.
- If you are working alone it is still important for you to set up a scheme for yourself.
Once you have set up a file organisation scheme, you should document it: write down what should go in each folder and the naming convention you are using, along with any codes and abbreviations you are using. Save it in a 'readme' file, preferably in plain text, and store it in the top level folder for your project where you (or anyone in your group) will be able to access it easily.
Consider scheduling a regular review of your file organisation scheme:
- Make sure your files and folders conform to your scheme. It is easy to forget certain details, or to skip them when in haste, but if you tidy up your files regularly you can avoid problems.
- Make sure your scheme is working for you. If you find you need to change or refine it, do not be afraid to do so. Just make sure you apply the changes consistently and update your 'readme' file accordingly.
Although these principles are aimed at digital files and folders, it is just as important to organise physical files, folders and other materials in a meaningful, consistent and documented fashion.
Structuring files and folders
There are many "right" ways of organising your files, so think about what makes sense for your research.
If you are doing experimental work, for example, you might want to organise the results into folders by the date you did the experiment, or by a key experimental condition.
The following suggestions will help you to organise your data:
- Use folders to group files with common properties: Think about how you might want to browse for your files in future. Are you more likely to want files from a particular day or a particular instrument? You should avoid grouping files by the individual responsible as this can cause confusion when group members leave or join.
- Apply meaningful folder names: Ensure that you use clear and appropriate folder names that concisely convey the common property of the files inside.
- Keep group numbers manageable: If you end up with only one or two files in each folder, you may find your structure tedious to navigate, but if you have hundreds it can be time consuming to look through them all for the file you want.
- Structure folders hierarchically: Design a folder structure with broad topics at the highest level and specific folders within these. Try to avoid nesting folders too deeply, however, as long file paths can exceed the limits of some software and operating systems (Windows, for example, has traditionally limited paths to 260 characters).
- Separate current and completed work: You may find it helpful to move temporary drafts and completed work into separate folders. This will also make it easier to review what you need to keep as you go along.
- Control access at the highest level: It is easier to set access permissions near the top of your folder structure than to control permissions for multiple deeply nested folders. This is particularly important if you need to grant someone access to only a subset of your data, in which case you could move these data to a new, higher-level folder.
Unlike with physical records, it is possible for digital folders to appear in more than one place in the hierarchy by means of shortcut links. This can help if different members of a group need the files to be organised in different ways, but the technique should be used sparingly.
Further guidance on structuring files and folders
You may find the following external guidance useful.
- Guidance on organising data – UK Data Service
- Filing structures, in Managing digital records without an electronic record management system – National Archives (PDF, pp 14–22)
Naming files and folders
Below are eight basic rules for naming files and folders, which should be applied to all documents you create.
1. File and folder names should be short, consistent and meaningful
They should mean something to anyone who is looking for them, and not just the person who created them or is responsible for them.
2. Avoid unnecessary repetition in file names and paths
For instance, use:
Application Forms\FST\2016
Application Forms\FST\2017
NOT:
Application Forms\FST Application Forms\2016 FST Application Forms
Application Forms\FST Application Forms\2017 FST Application Forms
3. Use capital letters to delimit words, not spaces or underscores
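Avoid using spaces and underscores in file names; for example, write TenderDocument rather than Tender Document or Tender_Document. Some software packages have difficulty recognising file names that contain spaces, and spaces also cause problems when files are published on an intranet or website, because they have to be encoded in web addresses (usually as %20).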
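To illustrate rule 3, here is a minimal Python sketch that turns a plain description into a name delimited by capital letters. The function name to_camel_case is hypothetical, and the sketch assumes names use only unaccented letters and digits.

```python
import re


def to_camel_case(description: str) -> str:
    """Join the words of a description using capital letters as delimiters."""
    # Split on any run of characters that is not an unaccented letter or digit
    # (spaces, underscores, hyphens and so on), dropping empty pieces.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", description) if w]
    # Capitalise only the first letter of each word, so that existing
    # capitals such as the acronym "FST" are kept.
    return "".join(w[0].upper() + w[1:] for w in words)


print(to_camel_case("tender document draft"))  # TenderDocumentDraft
print(to_camel_case("FST application_forms"))  # FSTApplicationForms
```

Python's built-in str.capitalize is avoided here because it lower-cases the rest of each word, which would turn "FST" into "Fst".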
4. Dates should appear as YYYY-MM-DD
A date should be used at some point in the folder structure and/or in the file name, where appropriate. Writing the date from the largest unit to the smallest (year, then month, then day) means that records stay in chronological order when a file directory is listed by name.
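The effect of year-first dates can be checked with a short Python sketch: sorting date strings character by character, as a simple name-sorted file listing does, keeps YYYY-MM-DD dates in chronological order but not DD-MM-YYYY dates. The dates are invented for illustration.

```python
from datetime import date

dates = [date(2017, 1, 15), date(2016, 12, 1), date(2017, 3, 2)]

# The same dates written year-first (YYYY-MM-DD) and day-first (DD-MM-YYYY).
iso = [d.strftime("%Y-%m-%d") for d in dates]
day_first = [d.strftime("%d-%m-%Y") for d in dates]

# sorted() compares strings character by character, like a listing sorted by name.
print(sorted(iso))        # ['2016-12-01', '2017-01-15', '2017-03-02'] (chronological)
print(sorted(day_first))  # ['01-12-2016', '02-03-2017', '15-01-2017'] (not chronological)
```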
5. When including a number in a file name, always give it as a two-digit number
For instance, use 01, 02 and so on, unless it is a year or another number with more than two digits.
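A similar sketch shows why rule 5 matters in a plain alphabetical sort (some file browsers sort numbers "naturally", but many tools do not): without a leading zero, Interview10 is listed before Interview2. The Interview file names are hypothetical.

```python
numbers = (1, 2, 10, 11)

# Hypothetical file names for a numbered series of interview transcripts.
unpadded = [f"Interview{n}.txt" for n in numbers]
padded = [f"Interview{n:02d}.txt" for n in numbers]  # :02d pads to two digits

print(sorted(unpadded))  # ['Interview1.txt', 'Interview10.txt', 'Interview11.txt', 'Interview2.txt']
print(sorted(padded))    # ['Interview01.txt', 'Interview02.txt', 'Interview10.txt', 'Interview11.txt']
```

The same reasoning applies to longer series: the numbers only sort correctly if every number in the series has the same number of digits.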
6. Version control
Where applicable, indicate whether a document is ‘Draft’ or ‘Final’. You can also indicate the version of a document by including a ‘V’ followed by the version number.
e.g. TenderDocumentV01 to indicate an approved document
e.g. TenderDocumentV01.1 to indicate a document under review.
7. The most important element for finding a document should appear first
Elements of file/folder names should be ordered according to the way in which the record will be retrieved during the course of everyday business. For example, if records are retrieved according to their date, then the date element should appear first.
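As a sketch of rule 7, the Python below sorts the same hypothetical records once with the date first and once with the subject first; whichever element comes first determines how the listing is grouped.

```python
# Hypothetical records as (date, subject) pairs.
records = [("2017-02-10", "Budget"), ("2016-11-03", "Staffing"), ("2017-01-20", "Budget")]

# Retrieved mainly by date: put the date first.
date_first = sorted(f"{d}{s}" for d, s in records)
# Retrieved mainly by subject: put the subject first.
subject_first = sorted(f"{s}{d}" for d, s in records)

print(date_first)     # ['2016-11-03Staffing', '2017-01-20Budget', '2017-02-10Budget']
print(subject_first)  # ['Budget2017-01-20', 'Budget2017-02-10', 'Staffing2016-11-03']
```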
8. Avoid using non-alphanumeric characters in file names
Avoid: * : \ / < > | " ? [ ] ; = + & £ $ , .
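Many of these characters have special meanings for operating systems or software: \ and / separate folders in file paths, for example, and Windows does not allow * : < > | " ? in file names at all. Even if your operating system allows you to save the file, you may not be able to transfer it to another operating system; if you send it to someone externally, for example, they may not be able to open it.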
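A quick check against rules 3 and 8 can be scripted. In this minimal Python sketch the function naming_problems is hypothetical, and it deliberately leaves out the full stop, on the assumption that the one before a file extension (such as .docx) should still be allowed.

```python
# Characters listed in rule 8, minus the full stop (left out so that file
# extensions such as ".docx" still pass), plus the space and underscore from rule 3.
AVOID = set('*:\\/<>|"?[];=+&£$,') | {" ", "_"}


def naming_problems(file_name: str) -> list[str]:
    """Return, in sorted order, the characters in file_name that the rules say to avoid."""
    return sorted(set(file_name) & AVOID)


for name in ["TenderDocumentV01.docx", "Budget 2017: final?.xlsx"]:
    problems = naming_problems(name)
    print(name, "->", "OK" if not problems else f"avoid {problems}")
```

Running a check like this before sharing files is one way to catch names that would fail on another operating system.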
Keeping track of versions
As you work with your data it is important to distinguish between different versions or drafts of your files. Version control can help you to easily identify the current version of your data so that you avoid working on older or outdated copies. If you are working with others it can also help to link versions of the data to the time and author of the change.
There are a number of ways that different versions of data can be managed:
File naming: A simple method of version control is to create a duplicate copy and then update version information to create a unique file or folder name.
- Successive versions can be numbered sequentially using one to three integers:
  - If you only expect to generate a small number of versions, a single integer may suffice (e.g. 1, 2, 3).
  - If you expect a moderate number of revisions, a two-integer scheme will be more useful (e.g. 1-0, 1-1, 1-2, 2-0, 2-1). The first number is used for major revisions and the second for minor edits.
  - For more complex files, a three-integer scheme might be needed (e.g. 1-0-0, 1-0-1, 1-1-0, 2-0-0). The first number is increased if the change might 'break' references from other files. The second and third numbers are increased for additions and fixes, respectively, that would not 'break' any incoming references.
  - Major version number 0 (e.g. 0-1, 0-2-1) is sometimes used to indicate an incomplete initial draft.
- If you are working as part of a group it may help to include the initials of the person who made the change e.g. v1-0jm, v1-1ke, v2-0gb.
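The three-integer scheme can be sketched as a small Python function. The function name bump and the change names 'break', 'addition' and 'fix' are hypothetical labels for the three kinds of change described above; as in the examples, the lower numbers reset to 0 when a higher one is increased.

```python
def bump(version: str, part: str) -> str:
    """Increase one part of a hyphenated version number such as '1-0-1'."""
    major, minor, fix = (int(x) for x in version.split("-"))
    if part == "break":      # change might break references from other files
        return f"{major + 1}-0-0"
    if part == "addition":   # addition that does not break incoming references
        return f"{major}-{minor + 1}-0"
    if part == "fix":        # fix that does not break incoming references
        return f"{major}-{minor}-{fix + 1}"
    raise ValueError(f"unknown kind of change: {part!r}")


version = "1-0-0"
for change in ["fix", "addition", "break"]:
    version = bump(version, change)
    print(change, "->", version)
# fix -> 1-0-1
# addition -> 1-1-0
# break -> 2-0-0
```

If your group adds initials, they could simply be appended to the result (for example "v" + bump("1-0-0", "addition") + "ke" gives v1-1-0ke).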
Version control tables: These are included within documents and can capture more information than using file naming conventions. Version control tables typically include the new version number, date of the change, person who made the change and the nature or purpose of the change.
Version control systems: There are many automated systems available that can store a repository of files and monitor access to them, logging who made what change and when. Version control systems are particularly useful for collaborative development of code or software (e.g. Git, often used with hosting services such as GitHub).
Further guidance on version control
You may find the following external guidance useful.
- Version control and authenticity – UK Data Service
- Semantic version numbering is a three-integer scheme devised for software on which other software might depend
Example case of organising data files
The principal investigator (PI) of a large multi-institutional project was faced with the issue that each partner would hold an overlapping subset of the project data, with files shared with other partners as needed. In order to ensure consistency and coordination across the partner institutions, the PI drafted a folder structure and file naming convention.
The research workflow would involve taking detailed measurements of samples, and analysing the raw data in a variety of ways to generate different characterisations. The data might therefore need to be accessed by sample, characterisation technique, or characterisation purpose. The raw data would also need to be kept separate from derived data, to protect the raw data from inadvertent changes and permit timely sharing between partners.
The convention chosen was as follows. Within the project folder, subfolders were created for each work package ('WP1', 'WP2', etc.), for raw data ('Raw_Data'), and for characterisation templates ('Templates'):
- Within the work package folders, derived data files were organised into folders named after the sample number. These folders also contained filesystem shortcut links to corresponding raw data folders.
- Within the top raw data folder, files were organised in a two-level folder structure, with the first level folders named after the session identifier (consisting of a data type code, an underscore, and the date in YYYYMMDD[HHMM] format) and the second level folders after the sample number, for example SEM_20150609\SEM_20150609_AB123-4-5-6. The latter folder names had the session ID prepended so that the shortcut links mentioned above would have unique names even if the same sample was characterised across several sessions.
- Templates were stored directly within the templates folder, named after the characterisation type to which they referred.
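The raw data folder names in this example can be generated rather than typed by hand, which helps keep them consistent across partners. This Python sketch assumes a top-level project folder called Project (a hypothetical name) and uses PureWindowsPath only to reproduce the backslash-separated paths shown above; it builds path names without creating anything on disk.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def session_id(data_type: str, when: datetime, include_time: bool = False) -> str:
    """Data type code, an underscore, then the date as YYYYMMDD (or YYYYMMDDHHMM)."""
    fmt = "%Y%m%d%H%M" if include_time else "%Y%m%d"
    return f"{data_type}_{when.strftime(fmt)}"


def raw_data_folder(project: str, data_type: str, when: datetime,
                    sample: str, include_time: bool = False) -> PureWindowsPath:
    """Two-level raw data path: session folder, then session ID plus sample number."""
    sid = session_id(data_type, when, include_time)
    # The second-level name repeats the session ID so that shortcut links to it
    # stay unique even if the same sample is characterised in several sessions.
    return PureWindowsPath(project, "Raw_Data", sid, f"{sid}_{sample}")


print(raw_data_folder("Project", "SEM", datetime(2015, 6, 9), "AB123-4-5-6"))
# Project\Raw_Data\SEM_20150609\SEM_20150609_AB123-4-5-6
```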
We acknowledge the work of the UK Data Service, the University of Glasgow, the University of Leicester and the University of Southampton in the development of this guidance.