Managing Data Dependencies

DataFS can track dependency graphs explicitly, and each version of an archive can have its own dependencies.

You can specify dependencies from within Python or using the command line interface.

Note

Dependencies are not currently validated in any way, so entering a dependency that is not a valid archive name or version will not raise an error.

View the source for the code samples on this page in Python API: Dependencies.

Specifying Dependencies

On write

Dependencies can be set with the dependencies argument to DataArchive’s update(), open(), or get_local_path() methods.

dependencies must be a dictionary that maps archive names (keys) to version numbers (values). A value of None is also valid: the dependency is treated as unpinned and is always interpreted as the latest version of that archive.

For example:

>>> my_archive = api.create('my_archive')
>>> with my_archive.open('w+',
...     dependencies={'archive2': '1.1', 'archive3': None}) as f:
...
...     res = f.write(u'contents depend on archive 2 v1.1')
...
>>> my_archive.get_dependencies() 
{'archive2': '1.1', 'archive3': None}
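
The handling of None described above can be sketched in plain Python. This is only an illustration of the "unpinned means latest" rule, not code from DataFS: the resolve_dependencies helper and the latest_versions mapping are hypothetical names introduced here.

```python
def resolve_dependencies(dependencies, latest_versions):
    """Return a copy of ``dependencies`` with each None replaced by a version.

    ``latest_versions`` is a hypothetical mapping of archive name to its
    newest version string; it stands in for whatever lookup you use.
    """
    resolved = {}
    for name, version in dependencies.items():
        if version is None:
            # Unpinned: fall back to the latest known version (None if unknown).
            resolved[name] = latest_versions.get(name)
        else:
            # Pinned: keep the version exactly as specified.
            resolved[name] = version
    return resolved


deps = {'archive2': '1.1', 'archive3': None}
latest = {'archive2': '1.4', 'archive3': '0.2'}  # assumed values for illustration
resolved = resolve_dependencies(deps, latest)
print(resolved)
```

The pinned entry keeps its version even though a newer one exists, while the unpinned entry follows whatever the lookup reports as latest. The original dictionary is left unchanged.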

After write

Dependencies can also be added to the latest version of an archive using the set_dependencies() method:

>>> with my_archive.open('w+') as f:
...
...     res = f.write(u'contents depend on archive 2 v1.2')
...
>>> my_archive.set_dependencies({'archive2': '1.2'})
>>> my_archive.get_dependencies() 
{'archive2': '1.2'}

Using a requirements file

If a requirements file is present when the API object is created, all archives written with that API object will have the specified dependencies by default.

For example, with the following requirements file saved as requirements_data.txt:

dep1==1.0
dep2==0.4.1a3

Archives written while in this working directory will have these requirements:

>>> import datafs
>>> api = datafs.get_api(
...     requirements='requirements_data.txt')
>>>
>>> my_archive = api.get_archive('my_archive')
>>> with my_archive.open('w+') as f:
...     res = f.write(u'depends on dep1 and dep2')
...
>>> my_archive.get_dependencies() 
{'dep1': '1.0', 'dep2': '0.4.1a3'}
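
To make the correspondence between the file and the resulting dictionary concrete, here is a minimal sketch of a parser for name==version lines. It is not the parser DataFS uses: parse_requirements is a hypothetical helper, and treating a bare name without == as unpinned (None) is an assumption made for this example.

```python
def parse_requirements(text):
    """Parse ``name==version`` lines into a dependencies dictionary."""
    dependencies = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Skip blank lines.
            continue
        if '==' in line:
            # Pinned dependency: split once on the first '=='.
            name, version = line.split('==', 1)
            dependencies[name.strip()] = version.strip()
        else:
            # Assumption: a bare archive name means "unpinned".
            dependencies[line] = None
    return dependencies


sample = "dep1==1.0\ndep2==0.4.1a3\n"
parsed = parse_requirements(sample)
print(parsed)
```

A dictionary built this way has the same shape as the one accepted by the dependencies argument.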

Using Dependencies

Retrieve dependencies with DataArchive’s get_dependencies() method:

>>> my_archive.get_dependencies() 
{'dep1': '1.0', 'dep2': '0.4.1a3'}

Get dependencies for older versions using the version argument:

>>> my_archive.get_dependencies(version='0.0.1') 
{'archive2': '1.1', 'archive3': None}
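
Because each dependency is itself an archive version that may have dependencies, the per-version dictionaries form a graph. The sketch below walks such a graph breadth-first. It is an illustration only: walk_dependencies and lookup are hypothetical helpers, the graph is a hand-written stand-in for repeated get_dependencies() calls, and archive4 is an invented archive name.

```python
from collections import deque


def walk_dependencies(name, version, get_dependencies):
    """Return every (name, version) pair reachable from the starting archive.

    ``get_dependencies`` is any callable taking (name, version) and
    returning a dependencies dictionary.
    """
    seen = {(name, version)}
    queue = deque([(name, version)])
    found = []
    while queue:
        current = queue.popleft()
        deps = get_dependencies(*current)
        for dep_name in sorted(deps):
            node = (dep_name, deps[dep_name])
            if node not in seen:
                # Record each dependency once, which also guards against cycles.
                seen.add(node)
                found.append(node)
                queue.append(node)
    return found


# Hand-written stand-in for the dependency metadata of several archives.
graph = {
    ('my_archive', '0.0.1'): {'archive2': '1.1', 'archive3': None},
    ('archive2', '1.1'): {'archive4': '2.0'},
    ('archive4', '2.0'): {'archive2': '1.1'},  # deliberate cycle
}


def lookup(name, version):
    # Archives with no recorded dependencies return an empty dictionary.
    return graph.get((name, version), {})


transitive = walk_dependencies('my_archive', '0.0.1', lookup)
print(transitive)
```

Since dependencies are not validated, a walk like this should tolerate names or versions that do not exist; here they simply resolve to an empty dictionary.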