datafs.datafs module

Module contents

class datafs.DataAPI(default_versions=None, **kwargs)[source]

Bases: object

DefaultAuthorityName = None
attach_authority(service_name, service)[source]
batch_get_archive(archive_names, default_versions=None)[source]

Batch version of get_archive()

  • archive_names (list) – Iterable of archive names to retrieve
  • default_versions (str, object, or dict) – Default versions to assign to each returned archive. May be a dict with archive names as keys and versions as values, or may be a version, in which case the same version is used for all archives. Versions must be a strict version number string, a StrictVersion, or a BumpableVersion object.

Returns:

archives – List of DataArchive objects. If an archive is not found, it is omitted (batch_get_archive does not raise a KeyError on invalid archive names).

Return type:

list
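The default_versions argument accepts either a single version applied to every archive or a per-archive mapping. A minimal sketch of how such an argument can be normalized into one version per archive name (the helper name is hypothetical and not part of the datafs API):

```python
def normalize_default_versions(archive_names, default_versions):
    # Map each archive name to its default version. If
    # default_versions is a dict, look up each name (missing names
    # get None); otherwise apply the same version to every archive.
    # (Illustrative helper, not the datafs implementation.)
    if isinstance(default_versions, dict):
        return {name: default_versions.get(name) for name in archive_names}
    return {name: default_versions for name in archive_names}
```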
create(archive_name, authority_name=None, versioned=True, raise_on_err=True, metadata=None, tags=None, helper=False)[source]

Create a DataFS archive

  • archive_name (str) – Name of the archive
  • authority_name (str) – Name of the data service to use as the archive’s data authority
  • versioned (bool) – If true, store all versions with explicit version numbers (default)
  • raise_on_err (bool) – Raise an error if the archive already exists (default True)
  • metadata (dict) – Dictionary of additional archive metadata
  • helper (bool) – If true, interactively prompt for required metadata (default False)

delete_archive(archive_name)[source]

Delete an archive

Parameters:archive_name (str) – Name of the archive to delete
filter(pattern=None, engine='path', prefix=None)[source]

Performs a filtered search on the entire universe of archives according to a pattern or prefix.

  • prefix (str) – string matching beginning characters of the archive or set of archives you are filtering. Note that authority prefixes, e.g. local://my/archive.txt are not supported in prefix searches.
  • pattern (str) – string matching the characters within the archive or set of archives you are filtering on. Note that authority prefixes, e.g. local://my/archive.txt are not supported in pattern searches.
  • engine (str) – string of value ‘str’, ‘path’, or ‘regex’ indicating the type of pattern you are filtering on

Return type:

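The three engine values correspond to three different matching strategies: substring matching for ‘str’, Unix-style glob matching for ‘path’, and regular-expression search for ‘regex’. A minimal sketch of the distinction using only the standard library (an illustration of the matching semantics, not the datafs implementation):

```python
import fnmatch
import re

def matches(name, pattern, engine='path'):
    # 'str' checks for a literal substring, 'path' applies
    # Unix-glob rules via fnmatch, and 'regex' searches with a
    # regular expression. (Illustrative sketch only.)
    if engine == 'str':
        return pattern in name
    if engine == 'path':
        return fnmatch.fnmatch(name, pattern)
    if engine == 'regex':
        return re.search(pattern, name) is not None
    raise ValueError("engine must be 'str', 'path', or 'regex'")
```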

get_archive(archive_name, default_version=None)[source]

Retrieve a data archive

  • archive_name (str) – Name of the archive to retrieve
  • default_version (version) – str or StrictVersion giving the default version number to be used on read operations

Returns:

archive – New DataArchive object

Return type:

DataArchive

Raises:

KeyError – A KeyError is raised when the archive_name is not found

static hash_file(f)[source]

Utility function for hashing file contents

Overload this function to change the file equality checking algorithm

Parameters:f (file-like) – File-like object or file path from which to compute checksum value
Returns:checksum – dictionary with {‘algorithm’: ‘md5’, ‘checksum’: hexdigest}
Return type:dict
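The checksum dictionary described above can be reproduced with the standard library alone. A sketch of an md5-based implementation with the same return shape, assuming a file-like object opened in binary mode (this is an illustration, not the datafs source):

```python
import hashlib
from io import BytesIO

def md5_checksum(f, blocksize=65536):
    # Read the file in blocks and return a dict shaped like
    # hash_file's documented return value:
    # {'algorithm': 'md5', 'checksum': <hexdigest>}.
    # (Illustrative sketch, not the datafs implementation.)
    md5 = hashlib.md5()
    for block in iter(lambda: f.read(blocksize), b''):
        md5.update(block)
    return {'algorithm': 'md5', 'checksum': md5.hexdigest()}

# usage with an in-memory "file"
result = md5_checksum(BytesIO(b'hello!'))
```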
listdir(location, authority_name=None)[source]

List archive path components at a given location


When using listdir on versioned archives, listdir will provide the version numbers when a full archive path is supplied as the location argument. This is because DataFS stores the archive path as a directory and the versions as the actual files when versioning is on.

  • location (str) –

    Path of the “directory” to search

    location can be a path relative to the authority root (e.g /MyFiles/Data) or can include authority as a protocol (e.g. my_auth://MyFiles/Data). If the authority is specified as a protocol, the authority_name argument is ignored.

  • authority_name (str) –

    Name of the authority to search (optional)

    If no authority is specified, the default authority is used (if only one authority is attached or if DefaultAuthorityName is assigned).


Returns:

Archive path components that exist at the given “directory” location on the specified authority

Return type:

list

Raises:

ValueError – A ValueError is raised if the authority is ambiguous or invalid
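Because DataFS stores archive paths as nested “directories”, listing a location amounts to collecting the unique next path component of every archive under that prefix. A minimal sketch of that behavior over a flat list of archive names (illustrative only, not the datafs implementation):

```python
def list_components(archive_paths, location):
    # Return the unique next path components under `location`,
    # mimicking how listdir treats archive paths as directories.
    # (Illustrative sketch, not the datafs implementation.)
    prefix = location.strip('/')
    base = prefix.split('/') if prefix else []
    components = set()
    for path in archive_paths:
        parts = path.strip('/').split('/')
        if parts[:len(base)] == base and len(parts) > len(base):
            components.add(parts[len(base)])
    return sorted(components)
```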

search(*query, **kwargs)[source]

Search for archives based on user-specified tags

  • query (str) – Tags to search on. Multiple terms may be provided as a comma-delimited string.
  • prefix (str) – start of archive name. Providing a start string improves search speed.
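A tag search of this kind amounts to splitting the comma-delimited query into individual terms and returning the archives whose tag sets contain every term. A small sketch of that logic over an in-memory mapping of archive names to tags (illustrative only, not the datafs implementation):

```python
def search_by_tags(archive_tags, *query):
    # Split each query string on commas into individual tag terms,
    # then return the names of archives whose tag sets contain
    # every term. (Illustrative sketch, not the datafs source.)
    terms = set()
    for q in query:
        terms.update(t.strip() for t in q.split(','))
    return sorted(
        name for name, tags in archive_tags.items()
        if terms <= set(tags)
    )
```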
datafs.get_api(profile=None, config_file=None, requirements=None)[source]

Generate a datafs.DataAPI object from a config profile

get_api generates a DataAPI object based on a pre-configured datafs profile specified in your datafs config file.

To create a datafs config file, use the command line tool datafs configure --helper or export an existing DataAPI object with datafs.ConfigFile.write_config_from_api()

  • profile (str) – (optional) name of a profile in your datafs config file. If profile is not provided, the default profile specified in the file will be used.
  • config_file (str or file) – (optional) path to your datafs configuration file. By default, get_api uses your OS’s default datafs application directory.


The following specifies a simple API with a MongoDB manager and a temporary storage service:

>>> try:
...   from StringIO import StringIO
... except ImportError:
...   from io import StringIO
>>> import tempfile
>>> tempdir = tempfile.mkdtemp()
>>> config_file = StringIO("""
... default-profile: my-data
... profiles:
...     my-data:
...         manager:
...             class: MongoDBManager
...             kwargs:
...                 database_name: 'MyDatabase'
...                 table_name: 'DataFiles'
...         authorities:
...             local:
...                 service: OSFS
...                 args: ['{}']
... """.format(tempdir))
>>> # This file can be read in using the datafs.get_api helper function
>>> api = get_api(profile='my-data', config_file=config_file)
>>> api.manager.create_archive_table(
...     'DataFiles',
...     raise_on_err=False)
>>> archive = api.create(
...     'my_first_archive',
...     metadata = dict(description = 'My test data archive'),
...     raise_on_err=False)
>>> with archive.open('w+') as f:
...     res = f.write(u'hello!')
>>> with archive.open('r') as f:
...     print(f.read())
hello!
>>> # clean up
>>> archive.delete()
>>> import shutil
>>> shutil.rmtree(tempdir)
datafs.to_config_file(api, config_file=None, profile='default')[source]