Creating Data Archives

Archives are the basic unit of a DataFS filesystem. They are essentially files, metadata, history, versions, and dependencies wrapped into a single object.

You can create archives from within Python or using the command line interface.

View the source for the code samples on this page in Python API: Creating Archives.

Naming Archives

Archives can be named anything, as long as the data service you use can handle the name. If you create an archive with a name that is illegal for the corresponding data service, you will receive an error on write rather than on archive creation. Because this error is specific to the storage service, DataFS does not catch it on creation.

Create an archive using the datafs.DataAPI.create() method.

>>> archive = api.create('my_archive_name')
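
Because an illegal name only surfaces on write, it can help to check names yourself before creating an archive. The following is a minimal sketch, not part of DataFS: the helper is_safe_archive_name and its character rule are hypothetical, and the rule is deliberately conservative. Adjust it to whatever your storage service actually accepts.

```python
import re

# Hypothetical, conservative rule (an assumption, not a DataFS requirement):
# start with a letter or digit, then letters, digits, '.', '_', '/' or '-'.
_SAFE_NAME = re.compile(r'[A-Za-z0-9][A-Za-z0-9._/-]*\Z')


def is_safe_archive_name(name):
    """Return True if ``name`` passes the conservative check above."""
    return bool(_SAFE_NAME.match(name))
```

A name that passes this check can still be rejected by a stricter service, so treat it as an early warning rather than a guarantee.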

Specifying an Authority

If you have more than one authority, you must specify which one to use when creating an archive. Otherwise, creation fails:

>>> archive = api.create('my_archive_name')
Traceback (most recent call last):
...
ValueError: Authority ambiguous. Set authority or DefaultAuthorityName.

This can be done using the authority_name argument:

>>> archive = api.create(
...     'my_archive_name',
...     authority_name='my_authority')
...

Alternatively, you can set the DefaultAuthorityName attribute:

>>> api.DefaultAuthorityName = 'my_authority'
>>> archive = api.create('my_archive_name')

Adding Metadata

Arbitrary metadata can be added using the metadata dictionary argument:

>>> archive = api.create(
...     'my_archive_name',
...     metadata={
...         'description': 'my test archive',
...         'source': 'Burke et al (2015)',
...         'doi': '10.1038/nature15725'})
...

Required Metadata

Administrators can set up metadata requirements using the manager's Administrative Tools. If a required field is not provided, an error is raised on archive creation.

For example, when connected to a manager requiring the 'description' field:

>>> archive = api.create(
...     'my_archive_name',
...     metadata={
...         'source': 'Burke et al (2015)',
...         'doi': '10.1038/nature15725'})
...
Traceback (most recent call last):
...
AssertionError: Required value "description" not found. Use helper=True or
the --helper flag for assistance.

Repeating the call with a 'description' entry in the metadata dictionary will work as expected.
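
If you know which fields your manager requires, you can check a metadata dictionary before calling create. The sketch below is a plain-Python illustration, not a DataFS function: missing_required is a hypothetical helper, and the list of required fields is something you would supply yourself.

```python
def missing_required(metadata, required):
    """Return the names in ``required`` that are absent from ``metadata``.

    The order of ``required`` is preserved in the result.
    """
    metadata = metadata or {}
    return [field for field in required if field not in metadata]


metadata = {
    'source': 'Burke et al (2015)',
    'doi': '10.1038/nature15725'}

# 'description' is the only required field in this example.
missing = missing_required(metadata, ['description'])
```

Here missing contains 'description', so the create call shown above would be expected to fail until that field is added.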

Using the Helper

Instead of providing all required fields in the create call, you can use the helper argument. Setting helper=True starts an interactive prompt that requests each missing item of required metadata:

>>> archive = api.create(
...     'my_archive_name',
...     metadata={
...         'source': 'Burke et al (2015)',
...         'doi': '10.1038/nature15725'},
...     helper=True)
...

Enter a description:
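
To make the helper's behavior concrete, here is a minimal sketch of how such a prompt loop could work. It is an assumption-laden illustration, not the DataFS implementation: fill_required and its prompt parameter are hypothetical names, and the prompt text simply mirrors the example above.

```python
def fill_required(metadata, required, prompt=input):
    """Return a copy of ``metadata`` with missing required fields filled in.

    ``prompt`` is called once per missing field with the prompt text and
    must return the value to store. It defaults to the built-in ``input``.
    """
    completed = dict(metadata or {})
    for field in required:
        if field not in completed:
            completed[field] = prompt('Enter a {}: '.format(field))
    return completed
```

Passing the prompt function as an argument keeps the sketch usable in scripts and tests, where a canned answer can stand in for keyboard input.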