Open Data, Software and Code Guidelines (PSE)
These guidelines relate to the F1000Research policy on data availability, which requires all authors to share the underlying data which relates to their article. The policy text can be read here.
For more information on each of the requirements, please see Further Guidance.
Exceptions: We recognize that openly sharing data may not always be feasible, for example where third-party proprietary software has been used and no open-source alternative is available, or where the underlying data is subject to export control legislation. If your data must be restricted for legal, security, ethical, or other reasons, please see below for further information on what should be included in your data availability statement.
What is Open Data?
The data underpinning your article may consist of different quantitative and qualitative types of data. Depending on your study design, you may have reused existing data or generated new datasets.
Listed below are some common data examples from physical sciences and engineering (PSE) disciplines.
DISCIPLINE | DATA EXAMPLE |
Computer and information sciences | Source code, software, training and benchmarking data for machine learning models |
Chemical sciences | Crystallographic information files, 3D protein structures, chemical structures, spectroscopy data |
Earth and related environmental sciences | Georeferenced observational and experimental data, earth system data, geochemical data, species occurrences |
Engineering and technology | Energy modelling data, 3D printing models, life cycle assessments |
Mathematics | Computer-algebra systems, computational proofs, algorithms |
Physical sciences | Material properties, ab initio calculations, photometry data, experimental particle physics data |
F1000Research requires you to provide access to all of the data you have generated or reused in your research. This is a key step to ensure that your research and methods are transparent and that your results can be reproduced (where relevant).
- If you generated new datasets (for example computational results, mathematical models, spectroscopic data), you must deposit them into an appropriate data repository and describe in your Data Availability Statement how they can be accessed and reused by others.
- If you reused existing datasets (for example NASA’s Climate Data Services, Materials Cloud datasets, or data generated by another researcher), you must describe in your Data Availability Statement how they can be accessed and reused by others.
We understand researchers working in physical sciences and engineering disciplines face challenges when sharing their data, software and code. These can range from concerns around intellectual property to choosing the most appropriate file formats and licenses for your data. The Open Data, Software and Code Guidelines (PSE) aim to assist you in complying with the F1000Research data policy and to address any data issues you may have prior to submitting your article.
If you have any questions that are not answered in these guidelines, please contact our editorial team for assistance.
What is required when submitting an article
- Your dataset(s) must be deposited in an appropriate data repository.
- Your dataset(s) must have a license applied which allows reuse by others (CC0 or CC-BY).
- Your dataset(s) must have a persistent identifier (e.g. a DOI) allocated by a data repository.
- You must provide a data availability statement as a section at the end of your article, including elements 1-3.
- You must include a data citation and add a reference to data to your reference list.
- Your dataset(s) must not contain any sensitive information, for example in relation to human research participants.
- You should share any related software and code.
- Your dataset(s) must be useful and reusable by others, adhere to any relevant data sharing standards in your discipline and align with the FAIR Data Principles.
- Your dataset(s) should link back to your article, if possible.
If you fail to adhere to these guidelines when submitting, the publication of your article may be delayed, and your article may ultimately be rejected.
Further Guidance
1. Your dataset(s) must be deposited in an appropriate data repository
Before submission, you should deposit your data in an appropriate data repository and ensure that the dataset is published openly on the web. The data should be stored in an open file format. The repository you choose must supply you with a persistent identifier (for example a DOI or accession code) and allow you to apply an open license, which must be CC0, CC-BY 4.0 or equivalent. Please include descriptive legends and, where applicable, coding schemas alongside your datasets.
Most repositories do not charge a fee for deposit; however, a fee may apply if the repository provides data checking or curation services, or if you are storing very large datasets (for example over 100GB).
Discipline-specific repositories
F1000Research strongly encourages the use of community-recognized and discipline-specific repositories where they are available.
For some data types such as crystallographic data, depositing data into a specific data repository is mandatory. A list of appropriate data repositories for disciplinary data is available below.
Generalist repositories
If there is no appropriate discipline-specific repository available, please deposit your data in a generalist data repository, an institutional data repository (for example provided by your university), or a national data repository.
Controlled access repositories
If you cannot share your data openly, for example due to security considerations, you may choose to use a repository which restricts or controls who can access your data and for what purposes. Information on who may be eligible to access this data and how they should request access must be included in your Data Availability Statement.
2. Your dataset(s) must be openly licensed
To allow the maximum possible reuse, your dataset(s) should be published with a CC0 Public Domain Dedication, which does not retain any rights to the data. Alternatively, a CC-BY 4.0 Creative Commons Attribution Only license, which requires others to attribute you when using the data, is acceptable. Your chosen repository should allow you to apply a CC0 Public Domain Dedication, CC-BY 4.0 license or equivalent to your data.
For software and source code, we strongly advise you to use an OSI-approved license.
3. Your dataset(s) must have a persistent identifier
Persistent identifiers allow datasets to be uniquely identified on the web. Commonly used persistent identifiers include DOIs and accession numbers. Your chosen data repository should provide you with a persistent identifier for each dataset that you deposit.
We also recommend that you use an appropriate Research Resource Identifier (RRID) to unambiguously identify any tools such as software, databases or services which you used in your research. RRIDs can be found on the Resource Identification Portal and should be included in your Methods section.
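As a small illustration of working with persistent identifiers, the sketch below turns a DOI written in one of several common forms into a single resolvable link. The function name `doi_to_url` and the loose pattern it checks against are assumptions made for this example; they are not part of the F1000Research policy or an official DOI validation rule.

```python
import re

# Loose pattern for the common "10.<registrant>/<suffix>" DOI shape.
# This is an illustrative assumption, not an official validation rule.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that authors often leave in front of a DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "http://www.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return an https://doi.org/ link for a bare DOI or an existing DOI link."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            # Drop the prefix and any stray whitespace that followed it.
            cleaned = cleaned[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


# DOI taken from the Zenodo example in these guidelines.
print(doi_to_url("10.5281/zenodo.8388146"))
```

Writing every DOI in the same `https://doi.org/` form makes it easier to check that the identifier in your data availability statement and the one in your reference list point to the same record.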
4. You must provide a data availability statement
You must include a data availability statement at the end of your article, before the reference list, describing each dataset and including a link to the relevant repository and the dataset’s persistent identifier.
When drafting the statement, please include:
- The name of the repository used;
- A brief description of the contents of each dataset;
- A statement that the dataset has a CC0 Public Domain Dedication or CC-BY 4.0 license applied.
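The elements listed above can be assembled mechanically. The following is a minimal sketch, assuming a hypothetical `DepositedDataset` record and placeholder values (the repository name, title and DOI are invented for illustration); the sentence wording is one possible phrasing, not a template mandated by F1000Research.

```python
from dataclasses import dataclass

# Licenses named in these guidelines, mapped to the wording used in the statement.
ACCEPTED_LICENSES = {
    "CC0": "CC0 Public Domain Dedication",
    "CC-BY 4.0": "CC-BY 4.0 license",
}


@dataclass
class DepositedDataset:
    repository: str   # name of the repository used
    title: str        # dataset title as shown in the repository
    doi: str          # bare DOI (placeholder value in the example below)
    description: str  # brief description of the contents
    license: str      # one of the keys of ACCEPTED_LICENSES


def data_availability_statement(ds: DepositedDataset) -> str:
    """Build a one-paragraph statement covering repository, contents and license."""
    if ds.license not in ACCEPTED_LICENSES:
        raise ValueError(f"license must be one of {sorted(ACCEPTED_LICENSES)}")
    return (
        f"{ds.repository}: {ds.title}, https://doi.org/{ds.doi}. "
        f"{ds.description} "
        f"Data are available under the terms of the {ACCEPTED_LICENSES[ds.license]}."
    )


# Placeholder dataset; none of these values refer to a real deposit.
example = DepositedDataset(
    repository="Example Repository",
    title="example_dataset",
    doi="10.1234/example.5678",
    description="This project contains the raw measurements underlying Figures 1-3.",
    license="CC0",
)
print(data_availability_statement(example))
```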
If your data must be restricted for legal, ethical, or other reasons, please see below for further information on what should be included in your data availability statement.
Examples:
Data Type | Data Availability Statement Example | Data Citation Example |
Data deposited into a generalist repository | Figshare: Superradiant_laser_Figures, https://doi.org/10.6084/m9.figshare.15321819 (Bychek 2021). This project contains the following underlying data: | Bychek A: Superradiant_laser_Figures. Figshare. Dataset. 2021. Example taken from: Bychek A, Hotter C, Plankensteiner D and Ritsch H. Superradiant lasing in inhomogeneously broadened ensembles with spatially varying coupling. Open Res Europe 2021, 1:73 (https://doi.org/10.12688/openreseurope.13781.2) |
Data deposited into a repository with accession codes | The underlying data has been deposited in the ProteomeXchange Consortium via the PRIDE partner repository, accession number PXD027611: https://identifiers.org/pride.project:PXD027611. | Wright, J and Choudhary, J. Identifying and characterizing Thrap3, Bclaf1 and Erh direct interactions using cross-linking mass spectrometry. PRIDE. 2021. https://identifiers.org/pride.project:PXD027611. Example taken from: Shcherbakova L, Pardo M, Roumeliotis T and Choudhary J. Identifying and characterising Thrap3, Bclaf1 and Erh interactions using cross-linking mass spectrometry. Wellcome Open Res 2021, 6:260 (https://doi.org/10.12688/wellcomeopenres.17160.1) |
Data with access restrictions | Zenodo: robfairh/2023_nstor_sdr: Published version of the dataset: https://doi.org/10.5281/zenodo.8388146. The data not included in the repository were generated with export control software, and they should also be treated as export control. However, the repository includes all the input files necessary to reproduce this work. The export control data may be released to people holding the right licenses, and any release will be determined on a case-by-case basis. | Fairhurst R: robfairh/2023_nstor_sdr: Published version of the dataset. 2023. http://www.doi.org/10.5281/zenodo.8388146. Example taken from: Fairhurst-Agosta R and Kozlowski T. Shutdown dose rate calculations in high-temperature gas-cooled reactors using the MCNP-ORIGEN activation automation tool. Nucl Sci Technol Open Res 2023, 1:20 (https://doi.org/10.12688/nuclscitechnolopenres.17447.1) |
Articles without data | No data associated with this article | None required |
Articles where the data consists of bibliographic references | The data for this article consists of bibliographic references, which are included in the References section. | Standard bibliographic references |
5. You must include a data citation and add a reference to data to your reference list
Your dataset should be cited in the body of your article, and you should add the dataset to your reference list as you would any other bibliographic citation.
You may use your preferred referencing style but should include, at a minimum:
Dataset creator; Publication year; Dataset title; Name of repository where the data is located; Persistent Identifier (e.g. DOI).
Please add [Dataset] to the reference to denote its type.
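To show how the minimum elements fit together in a reference list entry, here is a short sketch. The ordering and punctuation are one possible style, chosen for illustration only (you may use your preferred referencing style), and the function name and all example values are hypothetical.

```python
def dataset_reference(creator: str, year: int, title: str,
                      repository: str, doi: str) -> str:
    """Format a dataset reference containing the minimum required elements.

    The "[Dataset]" tag marks the reference type, as requested in the guidelines.
    """
    return (
        f"{creator}: {title} [Dataset]. {repository}. {year}. "
        f"https://doi.org/{doi}"
    )


# Placeholder values; this is not a real dataset.
print(dataset_reference(
    creator="Doe J",
    year=2024,
    title="example_dataset",
    repository="Example Repository",
    doi="10.1234/example.5678",
))
```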
6. Your dataset(s) must not contain any sensitive information
It is your responsibility to share data ethically and, where relevant, protect the privacy of your research participants. You should ensure that your datasets have been de-identified in accordance with the Safe Harbor method before submission.
Data sensitivity is not only connected to human research participants, so please check your datasets for other sensitive elements, for example the locations of endangered species or data with national security implications.
7. You should share any related software and code
All articles should include details of any software and code that are required to view the datasets described or to replicate the analysis.
For software
For all software used, please state the version, details of where the software can be accessed, and any variable parameters that could impact the outcome of the results. If you have coded software in-house, the source code should be written in (or be compatible with) an Open Source programming language, and should be archived under an open license and shared. For code stored in GitHub, you should create a ‘public registration’ for your project to obtain a DOI.
Information about software should be included in a software availability statement, which you can add to the end of your article, before the references list.
When drafting the statement, please include:
- Software available from: URL for the website where software can be downloaded from, if applicable.
- Source code available from: URL for versioning control system (for example GitHub).
- Archived source code at time of publication: DOI and citation for project in Zenodo (please select the appropriate DOI for the version which underlies your article).
- License: Must be an open license and preferably an OSI-approved license.
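The four items above map naturally onto a small template. The sketch below is illustrative only: the function name, the URLs and the DOI are placeholders, and `SOME_OSI_APPROVED` is a deliberately short, non-exhaustive set of SPDX identifiers for licenses that are OSI-approved, not a complete list.

```python
from typing import Optional

# Small, non-exhaustive set of SPDX identifiers for OSI-approved licenses.
SOME_OSI_APPROVED = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only"}


def software_availability_statement(source_url: str, archive_doi: str,
                                    license_id: str,
                                    download_url: Optional[str] = None) -> str:
    """Render the labelled lines of a software availability statement."""
    lines = []
    if download_url:  # only applicable when the software has a download site
        lines.append(f"Software available from: {download_url}")
    lines.append(f"Source code available from: {source_url}")
    lines.append(
        f"Archived source code at time of publication: https://doi.org/{archive_doi}"
    )
    # Flag licenses outside the short set so the author double-checks them.
    note = "" if license_id in SOME_OSI_APPROVED else " (please confirm this is an open license)"
    lines.append(f"License: {license_id}{note}")
    return "\n".join(lines)


# Placeholder project; the URL and DOI below do not refer to real records.
print(software_availability_statement(
    source_url="https://github.com/example-lab/example-tool",
    archive_doi="10.5281/zenodo.0000000",
    license_id="MIT",
))
```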
Where third-party proprietary software has been used, a non-proprietary, Open Source alternative software should be suggested by the author to allow for the replication of the analysis or research by all readers. We recognize that there may be cases where this may not be feasible. Please see the limited exceptions to these guidelines for more information.
If there are ethical or privacy considerations as to why the source code may not be made available, please contact the editorial team.
For analysis code
If you have created custom analysis code, this should be archived under an open license and shared. For analysis code stored in GitHub, you should create a ‘public registration’ for your project to obtain a DOI. We recommend using an OSI-approved license, but CC-BY 4.0 is also acceptable.
Information about your archived analysis code should be included in your data availability statement, which you can add to the end of your article, before the references list.
When drafting the statement, please include, under the heading “Extended Data”:
- Analysis code available from: URL for versioning control system (for example GitHub)
- Archived analysis code as at time of publication: DOI and citation, e.g. from Zenodo (please select the appropriate DOI for the version which underlies your article).
- License: Must be an open license and preferably an OSI-approved license or CC-BY 4.0.
Code and software should be cited in the body of your article and added to your reference list as you would any other bibliographic citation.
You may use your preferred referencing style but should include, at a minimum:
Creator(s); Publication year; Title; Publication venue; Publication date; Persistent Identifier (e.g. DOI); Version.
Please add either [Software] or [Code] as part of the reference to denote its type.
8. Your dataset(s) must be useful and reusable by others, adhere to any relevant data sharing standards in your discipline and align with the FAIR Data Principles
The FAIR Data Principles: F1000Research endorses the FAIR Data Principles as a framework to promote the broadest reuse of research data. Datasets which are “FAIR” are Findable, Accessible, Interoperable and Reusable. More information on the FAIR Data Principles and how you can align your data sharing methods with them is available here.
Relevant data sharing standards: Data standards help you to align with commonly used data sharing practices in your field, for example how your data should be structured, formatted and annotated. Please check FAIRSharing.org for details of data standards specific to the topic of your research.
9. Your dataset(s) should link back to your article
Some data repositories provide functionality which allows you to add links to any published articles associated with your dataset. If possible, we recommend that you update your metadata record in the data repository to include a link to your published article. You can link to the article using your article DOI, which will be emailed to you when your article is published.
Limited exceptions to these guidelines
Ethical or security considerations
If data access is restricted for ethical or security reasons, such as being subject to export control regulations, please use your data availability statement to include a description of the restrictions on the data and all necessary information required for a reader or reviewer to apply for access to the data and the conditions under which access will be granted.
Data protection and participant privacy
Where human data cannot be sufficiently de-identified to protect participant privacy, we recommend depositing the data into a controlled access repository, if your ethical approval and participant consent permit you to do so.
If you cannot share the data in a repository, please include in your data availability statement: an explanation of the data protection concern; what, if anything, the relevant Institutional Review Board (IRB) or equivalent said about data sharing; and, where applicable, all necessary information required for a reader or reviewer to apply for access to the data and the conditions under which access will be granted.
Large data
Where data is too large to be feasibly hosted by an F1000Research-approved repository, please include all necessary information required for a reader or reviewer to access the data with a description of the access process as part of your data availability statement.
Data under license or provided by a third party
In cases where data has been obtained from a third party and restrictions apply to the availability of the data, the data availability statement must include all necessary information required for a reader or reviewer to access the data by the same means as the authors, in addition to details of any publicly available data that is representative of the analysed dataset, which can be used to apply the methodology described in the article.
Proprietary software
Where third party proprietary software has been used, an open source alternative must be provided in the article to allow for the replication of the analysis or research by all readers. Exceptions may be made if the chosen proprietary software performs specific functions and there is no open source alternative that can carry out these functions in the same manner.
If this applies to your article, your data availability statement should include a clear description of the third party proprietary software used, including the name and version number, and what it was used for in the research. The article must also include a detailed Methods section that allows for replication of, for example, the mathematics underpinning any of the simulations or calculations run using the proprietary software. You must also share any output data or analysis code generated during the research, openly and ideally in an open file format, and these must also be described in the data availability statement.
If you are unable to share your data, software or code for any reason not included here, or have additional questions about data sharing, please let our editorial team know and we will be happy to advise.
The FAIR Data Principles
F1000Research endorses the FAIR Data Principles as a framework to promote the broadest reuse of research data.
Additional, practical guidance can be found on the GoFAIR website.
For research software, please consult the FAIR4RS Principles.
Findable
Findable data should be easy for both humans and machines to find.
Findable data requires that:
- F1. (Meta)data are assigned a globally unique and persistent identifier.
- F2. Data are described with rich metadata (defined by R1 below).
- F3. Metadata clearly and explicitly include the identifier of the data they describe.
- F4. (Meta)data are registered or indexed in a searchable resource.
The best way to achieve Findable data is by:
- Depositing your dataset into a recognized data repository which assigns globally unique persistent identifiers (such as DOIs).
- Adding as much contextual information (metadata) as possible when depositing your dataset into the repository.
Accessible
Accessible data refers to data that can be accessed once found; this may involve authentication of the user and authorization of access.
Accessible data requires that:
- A1. (Meta)data are retrievable by their identifier using a standardized communications protocol
- A1.1 The protocol is open, free, and universally implementable
- A1.2 The protocol allows for an authentication and authorization procedure, where necessary
- A2. Metadata are accessible, even when the data are no longer available
The best way to achieve Accessible data is by:
- Depositing your dataset into a recognized data repository which uses standard communications protocols like http://.
- Ensuring that the data repository you choose gives continued access to metadata even when datasets are removed.
Interoperable
Interoperable data refers to data that can be compared and combined with data from different sources, by both humans and machines.
Interoperable data requires that:
- I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
- I2. (Meta)data use vocabularies that follow FAIR principles
- I3. (Meta)data include qualified references to other (meta)data
The best way to achieve Interoperable data is by:
- Checking FAIRsharing.org for the standards that apply to your data type and using them.
- Ensuring that the data repository you choose allows you to include links or references to other related data.
- Using open, non-proprietary file formats for your data.
Reusable
Sharing data which can be reused by others is the main goal of the FAIR Principles.
Reusable data requires that:
- R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
- R1.1. (Meta)data are released with a clear and accessible data usage license
- R1.2. (Meta)data are associated with detailed provenance
- R1.3. (Meta)data meet domain-relevant community standards
The best way to achieve Reusable data is by:
- Adding as much contextual information (metadata) as possible when depositing your dataset into a repository.
- Applying an open license to your data, preferably CC0 or CC-BY 4.0.
- Checking FAIRsharing.org for the standards that apply to your data type and using them.
F1000Research-approved repositories
Below is a list of repositories that have already been approved for hosting data alongside an F1000Research article.
If you are an author who wishes to use a repository not already on this list, including institutional data repositories, please contact us.
If you manage a repository and would like to be included on the list, please complete our Repository Evaluation form and return it to us.
In addition to your research data, you should ensure that your research materials and supporting documents are also deposited into an appropriate repository.
Some types of data benefit from visualization within the article. F1000Research welcomes the submission of articles featuring Plot.ly interactive figures and Code Ocean compute capsules. Videos and images can be displayed through a widget provided by Figshare. If you think your dataset would benefit from visualization, please contact us.
Datasets for which there is no discipline-specific repository; research materials and supporting documents
Data Type | Where To Submit* | What To Include In The Data Availability Section Of Your Article |
Any | Figshare$ | Title, DOI |
Any, but especially deposits with mixed data and code | Zenodo | Title, DOI |
Any | Dryad | Title, DOI |
Any, but especially data in SAV and POR formats | Dataverse | Title, DOI |
Any, but especially deposits with mixed data, materials and documents | Open Science Framework† | Title, DOI |
Deposits of mixed data and code | Code Ocean | Title, DOI, embed code for interactive reanalysis tool |
Any biological data, but especially data linked to studies in other databases | BioStudies | Title, accession number |
* Please note that many repositories have a limit on the size (usually 2 or 5 GB) of single file uploads and charge for larger data files.
$ If you think your data are suitable for visualization within your article through the Figshare viewer, please contact us.
† Deposits must be made public and your project must be registered to ensure that a record will remain persistent and unchangeable.
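Because single-file upload limits vary between repositories, it can help to check your files before you deposit. The sketch below lists files in a folder that exceed a given size; the default of 2 GiB is an assumption based on the typical limits mentioned in the note above, so please confirm the actual limit with your chosen repository. The function name `oversized_files` is hypothetical.

```python
from pathlib import Path

GIB = 1024 ** 3  # bytes in one gibibyte


def oversized_files(root, limit_bytes=2 * GIB):
    """Return (path, size_in_bytes) pairs for files under root that are
    larger than limit_bytes, sorted with the largest file first."""
    sizes = [(p, p.stat().st_size) for p in Path(root).rglob("*") if p.is_file()]
    too_big = [(p, size) for p, size in sizes if size > limit_bytes]
    return sorted(too_big, key=lambda item: item[1], reverse=True)
```

For example, `oversized_files("my_dataset", limit_bytes=5 * GIB)` would return the files in a hypothetical `my_dataset` folder that are larger than 5 GiB, which you could then split, compress, or discuss with the repository.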
Software & source code
Data Type | Where To Submit | What To Include In The Data Availability Section Of Your Article |
Latest source code | GitHub or BitBucket | URL |
Archived source code | Zenodo | Title, DOI and license* used |
Deposits of mixed data and code | Code Ocean | Title, DOI, embed code for interactive reanalysis tool |
Software | Authors may host software where they wish, though it is strongly recommended to use a stable URL | URL |
* An open license must be assigned and we strongly advise authors to use an OSI-approved license.
Chemical and macromolecular structures
Data Type | Where To Submit | What To Include In The Data Availability Section Of Your Article |
X-ray Crystallographic Information Files (CIFs), structure factors and checkCIF reports* | Cambridge Crystallographic Data Centre | Compound name, CCDC deposition number |
3D protein structures | Protein Data Bank | PDB number |
Crystallography* | Crystallography Open Database | COD ID |
X-ray images | Coherent X-ray Imaging Data Bank | Title, DOI |
Electron Microscopy | Electron Microscopy Data Resource (EMDB) | Accession number(s) |
NMR Spectroscopy | Biological Magnetic Resonance Data Bank (BMRB) | Accession number(s) |
Chemical structures, annotations and associated bioassay test results | PubChem | CID(s) |
Chemical structures, spectra and syntheses | ChemSpider | ChemSpider ID |
* X-ray crystallography validation reports should be submitted (as a PDF) directly to F1000Research via the submission system.
Physics
Data Type | Where To Submit | What To Include In The Data Availability Section Of Your Article |
High Energy Physics | HEPData | Title, DOI |
Materials Science
Data Type | Where To Submit | What To Include In The Data Availability Section Of Your Article |
Ab initio electronic structures | NOMAD Repository | Title, DOI |
Computational, but especially calculations with full provenance | Materials Cloud | Title, DOI |
Environmental and ecological data
Data Type | Where To Submit | What To Include In The Data Availability Section Of Your Article |
Complex environmental and ecological data | The Knowledge Network for Biocomplexity* | Title, DOI |
Environmental data collected by NERC-funded researchers | NERC data centres | Data centre name, title and DOI |
Geospatial | PANGAEA | Title, DOI |
Geochemical | EarthChem | Title, DOI |
Climate data | World Data Center for Climate (WDCC) | Title, DOI |
* Data entries must be made public.
3D-printable models
Data Type | Where To Submit | What To Include In The Data Availability Section Of Your Article |
All 3D-printable models (including molecular, cellular, medical/anatomical and labware models) | NIH 3D Print Exchange | Title, model ID, URL |
Transcript data
Qualitative data resulting from recordings of interviews or focus group discussions should be anonymised by redaction and uploaded to a general data repository (see above). If it is not possible to anonymise the data sufficiently by redaction, a restricted route of data access should be provided by the authors and a comprehensive statement must be added to the Data Availability section of the article (see above for data that cannot be shared). If the transcript data cannot be shared under any circumstances, please contact the editorial team, who will be able to advise you.
Sequence and omics data
Data Type | Where To Submit | What To Include In The Data Availability Section Of Your Article |
Expression and sequence data (including Nucleotide/protein sequence, microarray, SNP/SNV, GWAS, phenotype or sequence-based reagent data); Systems and chemical biology data (including chemical entities, chemical reactions, computational models, metabolic profiles, or molecular interactions) | Any appropriate INSDC member repository, e.g. DDBJ, ENA or NCBI repositories.* The GSA, which is working towards INSDC membership, is also acceptable. Researchers in China may alternatively use the CNGB Sequence Archive. | Accession number(s). For SNP/SNV data please provide HGVS name(s), local ID(s) and rs/ss number(s) |
Metabolomic data | Metabolomics Workbench$ | Project DOI, Study ID |
Proteomic data | Any appropriate ProteomeXchange member repository | Accession number(s) |
* Some higher-level repositories, such as BioProject and BioStudies, provide access to data deposited in various archival databases. In these cases, please cite the accession numbers that are assigned to the data submissions by the archival databases in addition to the higher-level identifier.
$ Or any appropriate INSDC member repository, see above.