DataFS API

Submodules

datafs.datafs module

Module contents

class datafs.DataAPI(default_versions=None, **kwargs)[source]

Bases: object

DefaultAuthorityName = None
attach_authority(service_name, service)[source]
attach_cache(service)[source]
attach_manager(manager)[source]
batch_get_archive(archive_names, default_versions=None)[source]

Batch version of get_archive()

Parameters:
  • archive_names (list) – Iterable of archive names to retrieve
  • default_versions (str, object, or dict) – Default versions to assign to each returned archive. May be a dict with archive names as keys and versions as values, or may be a version, in which case the same version is used for all archives. Versions must be a strict version number string, a StrictVersion, or a BumpableVersion object.
Returns:

archives – List of DataArchive objects. If an archive is not found, it is omitted (batch_get_archive does not raise a KeyError on invalid archive names).

Return type:

list
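The handling of the default_versions argument described above can be sketched as follows. This is a minimal illustration of the documented semantics only; resolve_default_versions is a hypothetical helper, not part of the DataFS API:

```python
def resolve_default_versions(archive_names, default_versions=None):
    # Hypothetical helper illustrating the documented semantics of the
    # `default_versions` argument to batch_get_archive: a dict maps archive
    # names to versions, while a single value applies to every archive.
    if isinstance(default_versions, dict):
        return {name: default_versions.get(name) for name in archive_names}
    return {name: default_versions for name in archive_names}

# A dict assigns versions per archive; names absent from the dict get None
versions = resolve_default_versions(
    ['archive1', 'archive2'],
    default_versions={'archive1': '1.0'})
```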

cache
close()[source]
create(archive_name, authority_name=None, versioned=True, raise_on_err=True, metadata=None, tags=None, helper=False)[source]

Create a DataFS archive

Parameters:
  • archive_name (str) – Name of the archive
  • authority_name (str) – Name of the data service to use as the archive’s data authority
  • versioned (bool) – If true, store all versions with explicit version numbers (default)
  • raise_on_err (bool) – Raise an error if the archive already exists (default True)
  • metadata (dict) – Dictionary of additional archive metadata
  • tags (list) – Tags to apply to the archive
  • helper (bool) – If true, interactively prompt for required metadata (default False)
default_authority
default_authority_name
default_versions
delete_archive(archive_name)[source]

Delete an archive

Parameters: archive_name (str) – Name of the archive to delete
filter(pattern=None, engine='path', prefix=None)[source]

Performs a filtered search over the entire universe of archives according to a pattern or prefix.

Parameters:
  • prefix (str) – String matching the beginning characters of the archive name or set of archives you are filtering on. Note that authority prefixes, e.g. local://my/archive.txt, are not supported in prefix searches.
  • pattern (str) – String matching characters within the archive name or set of archives you are filtering on. Note that authority prefixes, e.g. local://my/archive.txt, are not supported in pattern searches.
  • engine (str) – One of ‘str’, ‘path’, or ‘regex’, indicating the type of pattern matching to use
Returns:

Archive names matching the given pattern or prefix

Return type:

generator
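The three engine values can be illustrated with a short sketch. This is a hypothetical stand-in showing what each matching mode means, not DataFS's actual implementation:

```python
import fnmatch
import re

def matches(archive_name, pattern, engine='path'):
    # Hypothetical illustration of the three `engine` values:
    if engine == 'str':
        return pattern in archive_name                  # substring match
    elif engine == 'path':
        return fnmatch.fnmatch(archive_name, pattern)   # glob-style match
    elif engine == 'regex':
        return re.search(pattern, archive_name) is not None
    raise ValueError("engine must be 'str', 'path', or 'regex'")
```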

get_archive(archive_name, default_version=None)[source]

Retrieve a data archive

Parameters:
  • archive_name (str) – Name of the archive to retrieve
  • default_version (version) – str or StrictVersion giving the default version number to be used on read operations
Returns:

archive – New DataArchive object

Return type:

object

Raises:

KeyError – A KeyError is raised when the archive_name is not found

static hash_file(f)[source]

Utility function for hashing file contents

Overload this function to change the file equality checking algorithm

Parameters: f (file-like) – File-like object or file path from which to compute checksum value
Returns: checksum – dictionary with {'algorithm': 'md5', 'checksum': hexdigest}
Return type: dict
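The checksum dictionary format above can be reproduced with a short sketch. This assumes an MD5 digest over the raw bytes, matching the documented return value; the real implementation may read files differently:

```python
import hashlib

def hash_file(f):
    # Sketch of a hasher returning the documented checksum dictionary.
    # Accepts a file path or a file-like object opened in binary mode.
    md5 = hashlib.md5()
    if hasattr(f, 'read'):
        for chunk in iter(lambda: f.read(8192), b''):
            md5.update(chunk)
    else:
        with open(f, 'rb') as fp:
            for chunk in iter(lambda: fp.read(8192), b''):
                md5.update(chunk)
    return {'algorithm': 'md5', 'checksum': md5.hexdigest()}
```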
listdir(location, authority_name=None)[source]

List archive path components at a given location

Note

When using listdir on versioned archives, listdir will provide the version numbers when a full archive path is supplied as the location argument. This is because DataFS stores the archive path as a directory and the versions as the actual files when versioning is on.

Parameters:
  • location (str) –

    Path of the “directory” to search

    location can be a path relative to the authority root (e.g /MyFiles/Data) or can include authority as a protocol (e.g. my_auth://MyFiles/Data). If the authority is specified as a protocol, the authority_name argument is ignored.

  • authority_name (str) –

    Name of the authority to search (optional)

    If no authority is specified, the default authority is used (if only one authority is attached or if DefaultAuthorityName is assigned).

Returns:

Archive path components that exist at the given “directory” location on the specified authority

Return type:

list

Raises:

ValueError – A ValueError is raised if the authority is ambiguous or invalid
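The listing behavior described above can be mimicked with a small sketch. list_components is a hypothetical stand-in that operates on a flat list of archive names; the real method queries the attached storage service:

```python
def list_components(archive_names, location):
    # Hypothetical stand-in for DataAPI.listdir: collect the path
    # components one level below `location` from a flat list of names.
    stripped = location.strip('/')
    prefix = stripped + '/' if stripped else ''
    components = set()
    for name in archive_names:
        if name.startswith(prefix):
            components.add(name[len(prefix):].split('/')[0])
    return sorted(components)
```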

lock_authorities()[source]
lock_manager()[source]
manager
search(*query, **kwargs)[source]

Searches for archives based on user-specified tags

Parameters:
  • query (str) – Tags to search on. Multiple terms may be provided as a comma-delimited string.
  • prefix (str) – Start of the archive name. Providing a prefix improves search speed.
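The comma-delimited query format can be illustrated with a small sketch. parse_tags is a hypothetical helper showing how such terms might be split into individual tags; it is not part of the DataFS API:

```python
def parse_tags(*query):
    # Hypothetical helper: split comma-delimited tag strings into
    # individual search terms, as described for DataAPI.search.
    tags = []
    for q in query:
        tags.extend(term.strip() for term in q.split(',') if term.strip())
    return tags
```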
datafs.get_api(profile=None, config_file=None, requirements=None)[source]

Generate a datafs.DataAPI object from a config profile

get_api generates a DataAPI object based on a pre-configured datafs profile specified in your datafs config file.

To create a datafs config file, use the command line tool datafs configure --helper or export an existing DataAPI object with datafs.ConfigFile.write_config_from_api()

Parameters:
  • profile (str) – (optional) name of a profile in your datafs config file. If profile is not provided, the default profile specified in the file will be used.
  • config_file (str or file) – (optional) path to your datafs configuration file. By default, get_api uses your OS’s default datafs application directory.

Examples

The following specifies a simple API with a MongoDB manager and a temporary storage service:

>>> try:
...   from StringIO import StringIO
... except ImportError:
...   from io import StringIO
...
>>> import tempfile
>>> tempdir = tempfile.mkdtemp()
>>>
>>> config_file = StringIO("""
... default-profile: my-data
... profiles:
...     my-data:
...         manager:
...             class: MongoDBManager
...             kwargs:
...                 database_name: 'MyDatabase'
...                 table_name: 'DataFiles'
...
...         authorities:
...             local:
...                 service: OSFS
...                 args: ['{}']
... """.format(tempdir))
>>>
>>> # This file can be read in using the datafs.get_api helper function
...
>>>
>>> api = get_api(profile='my-data', config_file=config_file)
>>> api.manager.create_archive_table(
...     'DataFiles',
...     raise_on_err=False)
>>>
>>> archive = api.create(
...     'my_first_archive',
...     metadata = dict(description = 'My test data archive'),
...     raise_on_err=False)
>>>
>>> with archive.open('w+') as f:
...     res = f.write(u'hello!')
...
>>> with archive.open('r') as f:
...     print(f.read())
...
hello!
>>>
>>> # clean up
...
>>> archive.delete()
>>> import shutil
>>> shutil.rmtree(tempdir)
datafs.to_config_file(api, config_file=None, profile='default')[source]

Write the configuration of an existing DataAPI object to a datafs config file