datafs.core package

Submodules

datafs.core.data_api module

class datafs.core.data_api.DataAPI(default_versions=None, **kwargs)[source]

Bases: object

DefaultAuthorityName = None
attach_authority(service_name, service)[source]
attach_cache(service)[source]
attach_manager(manager)[source]
batch_get_archive(archive_names, default_versions=None)[source]

Batch version of get_archive()

Parameters:
  • archive_names (list) – Iterable of archive names to retrieve
  • default_versions (str, object, or dict) – Default versions to assign to each returned archive. May be a dict with archive names as keys and versions as values, or may be a single version, in which case the same version is used for all archives. Each version must be a strict version number string, a StrictVersion, or a BumpableVersion object.
Returns:

archives – List of DataArchive objects. If an archive is not found, it is omitted (batch_get_archive does not raise a KeyError on invalid archive names).

Return type:

list
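
The two accepted shapes of default_versions can be pictured with the sketch below. It is not DataFS code: resolve_default_versions is a hypothetical helper, versions are plain strings, and giving archives that are missing from the dict no default (None) is an assumption.

```python
def resolve_default_versions(archive_names, default_versions=None):
    """Map each archive name to its default version (or None).

    Hypothetical helper for illustration; not part of DataFS.
    """
    if isinstance(default_versions, dict):
        # Per-archive mapping: names missing from the dict get no default
        # (an assumption about the behaviour, not confirmed by the docs).
        return {name: default_versions.get(name) for name in archive_names}
    # A single version (or None) applies to every archive.
    return {name: default_versions for name in archive_names}


names = ["a.csv", "b.csv"]
per_archive = resolve_default_versions(names, {"a.csv": "1.0.0"})
shared = resolve_default_versions(names, "2.1.0")
```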

cache
close()[source]
create(archive_name, authority_name=None, versioned=True, raise_on_err=True, metadata=None, tags=None, helper=False)[source]

Create a DataFS archive

Parameters:
  • archive_name (str) – Name of the archive
  • authority_name (str) – Name of the data service to use as the archive’s data authority
  • versioned (bool) – If true, store all versions with explicit version numbers (default)
  • raise_on_err (bool) – Raise an error if the archive already exists (default True)
  • metadata (dict) – Dictionary of additional archive metadata
  • helper (bool) – If true, interactively prompt for required metadata (default False)
default_authority
default_authority_name
default_versions
delete_archive(archive_name)[source]

Delete an archive

Parameters:archive_name (str) – Name of the archive to delete
filter(pattern=None, engine='path', prefix=None)[source]

Performs a filtered search over all archives according to pattern or prefix.

Parameters:
  • prefix (str) – String matching the beginning characters of the archive or set of archives you are filtering. Note that authority prefixes, e.g. local://my/archive.txt, are not supported in prefix searches.
  • pattern (str) – String matching the characters within the archive or set of archives you are filtering on. Note that authority prefixes, e.g. local://my/archive.txt, are not supported in pattern searches.
  • engine (str) – One of ‘str’, ‘path’, or ‘regex’, indicating the type of pattern you are filtering on
Returns:

Return type:

generator
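
As a rough picture of how the three engine values might differ, the sketch below filters a plain list of names. It is not the DataFS implementation: filter_names is a hypothetical function, and the meaning given to each engine (‘str’ as substring test, ‘path’ as shell-style wildcards, ‘regex’ as a regular-expression search) is an assumption.

```python
import fnmatch
import re


def filter_names(names, pattern=None, engine="path", prefix=None):
    """Yield names that match an optional prefix and an optional pattern.

    Illustrative only. Assumed engine semantics:
    'str' = substring, 'path' = shell-style wildcards, 'regex' = re.search.
    """
    for name in names:
        if prefix is not None and not name.startswith(prefix):
            continue
        if pattern is None:
            yield name
        elif engine == "str" and pattern in name:
            yield name
        elif engine == "path" and fnmatch.fnmatchcase(name, pattern):
            yield name
        elif engine == "regex" and re.search(pattern, name):
            yield name


archives = ["team/a/rain.csv", "team/a/temp.nc", "team/b/rain.csv"]
by_path = list(filter_names(archives, pattern="*rain*", engine="path"))
by_regex = list(filter_names(archives, pattern=r"\.nc$", engine="regex"))
by_prefix = list(filter_names(archives, prefix="team/b"))
```

Like filter, the sketch returns a generator, so results are wrapped in list() when all matches are needed at once.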

get_archive(archive_name, default_version=None)[source]

Retrieve a data archive

Parameters:
  • archive_name (str) – Name of the archive to retrieve
  • default_version (version) – str or StrictVersion giving the default version number to be used on read operations
Returns:

archive – New DataArchive object

Return type:

object

Raises:

KeyError – A KeyError is raised when the archive_name is not found

static hash_file(f)[source]

Utility function for hashing file contents

Overload this function to change the file equality checking algorithm

Parameters:f (file-like) – File-like object or file path from which to compute checksum value
Returns:checksum – dictionary with {‘algorithm’: ‘md5’, ‘checksum’: hexdigest}
Return type:dict
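
A function with the documented return shape could look like the sketch below. It uses only the standard library and is not the DataFS source; the chunk_size parameter and the chunked reading are choices made for this illustration.

```python
import hashlib
import io


def hash_file(f, chunk_size=65536):
    """Return {'algorithm': 'md5', 'checksum': hexdigest} for a file.

    `f` may be a file path (str) or a binary file-like object.
    Sketch only; `chunk_size` is an illustrative parameter.
    """
    md5 = hashlib.md5()
    if isinstance(f, str):
        # A path was given: open it in binary mode and hash in chunks.
        with open(f, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                md5.update(chunk)
    else:
        # A file-like object was given: read it in chunks until exhausted.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return {"algorithm": "md5", "checksum": md5.hexdigest()}


result = hash_file(io.BytesIO(b"hello"))
```

A subclass that overrides hash_file with a different algorithm would presumably report that algorithm's name in the ‘algorithm’ field, so that checksums produced by different algorithms are not compared with one another.
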
listdir(location, authority_name=None)[source]

List archive path components at a given location

Note

When using listdir on versioned archives, listdir will provide the version numbers when a full archive path is supplied as the location argument. This is because DataFS stores the archive path as a directory and the versions as the actual files when versioning is on.

Parameters:
  • location (str) –

    Path of the “directory” to search

    location can be a path relative to the authority root (e.g /MyFiles/Data) or can include authority as a protocol (e.g. my_auth://MyFiles/Data). If the authority is specified as a protocol, the authority_name argument is ignored.

  • authority_name (str) –

    Name of the authority to search (optional)

    If no authority is specified, the default authority is used (if only one authority is attached or if DefaultAuthorityName is assigned).

Returns:

Archive path components that exist at the given “directory” location on the specified authority

Return type:

list

Raises:

ValueError – A ValueError is raised if the authority is ambiguous or invalid
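
The rules for location and authority_name described above can be sketched as follows. Both functions are hypothetical helpers, not DataFS code, and the order in which an assigned default and a single attached authority are considered is an assumption.

```python
def split_location(location, authority_name=None):
    """Return (authority_name, path) for a listdir-style location.

    Hypothetical helper for illustration.
    """
    if "://" in location:
        # An authority given as a protocol wins; authority_name is ignored.
        authority, path = location.split("://", 1)
        return authority, path
    return authority_name, location


def resolve_authority(attached, authority_name=None, default_name=None):
    """Pick an authority name or raise ValueError if ambiguous or invalid."""
    if authority_name is None:
        if default_name is not None:
            authority_name = default_name
        elif len(attached) == 1:
            authority_name = next(iter(attached))
        else:
            raise ValueError("authority is ambiguous")
    if authority_name not in attached:
        raise ValueError("unknown authority: {!r}".format(authority_name))
    return authority_name


with_protocol = split_location("my_auth://MyFiles/Data", authority_name="other")
plain = split_location("/MyFiles/Data", authority_name="other")
single = resolve_authority(["local"])
```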

lock_authorities()[source]
lock_manager()[source]
manager
search(*query, **kwargs)[source]

Searches based on tags specified by users

Parameters:
  • query (str) – Tags to search on. Multiple terms may be provided as a comma-delimited string.
  • prefix (str) – start of archive name. Providing a start string improves search speed.
exception datafs.core.data_api.PermissionError[source]

Bases: exceptions.NameError

datafs.core.data_archive module

class datafs.core.data_archive.DataArchive(api, archive_name, authority_name, archive_path, versioned=True, default_version=None)[source]

Bases: object

add_tags(*tags)[source]

Set tags for a given archive

archive_path
authority
authority_name
cache(version=None)[source]
delete()[source]

Delete the archive

Warning

Deleting an archive will erase all data and metadata permanently. For help setting user permissions, see Administrative Tools.

delete_tags(*tags)[source]

Deletes tags for a given archive

desc(version=None, *args, **kwargs)[source]

Return a short descriptive text regarding a path

download(filepath, version=None)[source]

Downloads a file from authority to local path

  1. Checks whether the file is in the cache and, if so, whether it is up to date
  2. If the cached copy is missing or out of date, downloads the file to the cache
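
The two steps above can be sketched with plain dicts standing in for the authority, the cache, and the recorded checksums. This is an illustration under stated assumptions, not the DataFS implementation: the names download_via_cache and latest_hashes are hypothetical, and judging freshness by comparing MD5 checksums is an assumption.

```python
import hashlib


def checksum(data):
    """MD5 hex digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def download_via_cache(path, authority, cache, latest_hashes):
    """Return file contents, refreshing the cached copy when it is stale.

    `authority` and `cache` map paths to bytes; `latest_hashes` maps paths
    to the checksum of the newest version. All three are stand-ins.
    """
    cached = cache.get(path)
    if cached is None or checksum(cached) != latest_hashes[path]:
        # Cache miss or stale copy: fetch from the authority into the cache.
        cache[path] = authority[path]
    return cache[path]


authority = {"a.csv": b"v2"}
cache = {"a.csv": b"v1"}
hashes = {"a.csv": checksum(b"v2")}
data = download_via_cache("a.csv", authority, cache, hashes)
```
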
exists(version=None, *args, **kwargs)[source]

Check whether a path exists as file or directory

get_default_version()[source]
get_dependencies(version=None)[source]
Parameters:version (str) – string representing version number whose dependencies you are looking up
get_history()[source]
get_latest_hash()[source]
get_latest_version()[source]
get_local_path(*args, **kwds)[source]

Returns a local path for read/write

Parameters:
  • version (str) – Version number of the file to retrieve (default latest)
  • bumpversion (str) – Version component to update on write if archive is versioned. Valid bumpversion values are ‘major’, ‘minor’, and ‘patch’, representing the three components of the strict version numbering system (e.g. “1.2.3”). If bumpversion is None the version number is not updated on write. Either bumpversion or prerelease (or both) must be a non-None value. If the archive is not versioned, bumpversion is ignored.
  • prerelease (str) – Prerelease component of archive version to update on write if archive is versioned. Valid prerelease values are ‘alpha’ and ‘beta’. Either bumpversion or prerelease (or both) must be a non-None value. If the archive is not versioned, prerelease is ignored.
  • metadata (dict) – Updates to archive metadata. Pass {key: None} to remove a key from the archive’s metadata.
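
To make the three bumpversion values concrete, here is a simplified sketch of bumping one component of a "major.minor.patch" string. It is not the BumpableVersion implementation: bump is a hypothetical function, prerelease tags are left out, and resetting the lower components to zero is an assumption based on common versioning practice.

```python
def bump(version, bumpversion=None):
    """Return `version` ('major.minor.patch') with one component bumped.

    Simplified sketch: prerelease tags such as 'a1' are not handled, and
    lower components are reset to zero (an assumption).
    """
    if bumpversion is None:
        return version  # the version number is left unchanged
    major, minor, patch = (int(part) for part in version.split("."))
    if bumpversion == "major":
        return "{}.0.0".format(major + 1)
    if bumpversion == "minor":
        return "{}.{}.0".format(major, minor + 1)
    if bumpversion == "patch":
        return "{}.{}.{}".format(major, minor, patch + 1)
    raise ValueError("bumpversion must be 'major', 'minor', or 'patch'")


after_patch = bump("1.2.3", "patch")
after_minor = bump("1.2.3", "minor")
after_major = bump("1.2.3", "major")
```
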
get_metadata()[source]
get_tags()[source]

Returns a list of tags for the archive

get_version_hash(version=None)[source]
get_version_path(version=None)[source]

Returns a storage path for the archive and version

If the archive is versioned, the version number is used as the file path and the archive path is the directory. If not, the archive path is used as the file path.

Parameters:version (str or object) – Version number to use as file name on versioned archives (default latest unless default_version set)

Examples

>>> arch = DataArchive(None, 'arch', None, 'a1', versioned=False)
>>> print(arch.get_version_path())
a1
>>>
>>> ver = DataArchive(None, 'ver', None, 'a2', versioned=True)
>>> print(ver.get_version_path('0.0.0'))
a2/0.0
>>>
>>> print(ver.get_version_path('0.0.1a1'))
a2/0.0.1a1
>>>
>>> print(ver.get_version_path('latest')) 
Traceback (most recent call last):
...
AttributeError: 'NoneType' object has no attribute 'manager'
get_versions()[source]
getinfo(version=None, *args, **kwargs)[source]

Return information about the path e.g. size, mtime

getmeta(version=None, *args, **kwargs)[source]

Get the value of a filesystem meta value, if it exists

hasmeta(version=None, *args, **kwargs)[source]

Check if a filesystem meta value exists

is_cached(version=None)[source]

Set the cache property to start/stop file caching for this archive

isfile(version=None, *args, **kwargs)[source]

Check whether the path exists and is a file

log()[source]
open(*args, **kwds)[source]

Opens a file for read/write

Parameters:
  • mode (str) – Specifies the mode in which the file is opened (default ‘r’)
  • version (str) – Version number of the file to open (default latest)
  • bumpversion (str) – Version component to update on write if archive is versioned. Valid bumpversion values are ‘major’, ‘minor’, and ‘patch’, representing the three components of the strict version numbering system (e.g. “1.2.3”). If bumpversion is None the version number is not updated on write. Either bumpversion or prerelease (or both) must be a non-None value. If the archive is not versioned, bumpversion is ignored.
  • prerelease (str) – Prerelease component of archive version to update on write if archive is versioned. Valid prerelease values are ‘alpha’ and ‘beta’. Either bumpversion or prerelease (or both) must be a non-None value. If the archive is not versioned, prerelease is ignored.
  • metadata (dict) – Updates to archive metadata. Pass {key: None} to remove a key from the archive’s metadata.

args, kwargs sent to file system opener

remove_from_cache(version=None)[source]
set_dependencies(dependencies=None)[source]
update(filepath, cache=False, remove=False, bumpversion=None, prerelease=None, dependencies=None, metadata=None, message=None)[source]

Add a new version to a DataArchive

Parameters:
  • filepath (str) – The path to the file on your local file system
  • cache (bool) – Turn on caching for this archive if not already on before update
  • remove (bool) – If true, remove the file from your local directory
  • bumpversion (str) – Version component to update on write if archive is versioned. Valid bumpversion values are ‘major’, ‘minor’, and ‘patch’, representing the three components of the strict version numbering system (e.g. “1.2.3”). If bumpversion is None the version number is not updated on write. Either bumpversion or prerelease (or both) must be a non-None value. If the archive is not versioned, bumpversion is ignored.
  • prerelease (str) – Prerelease component of archive version to update on write if archive is versioned. Valid prerelease values are ‘alpha’ and ‘beta’. Either bumpversion or prerelease (or both) must be a non-None value. If the archive is not versioned, prerelease is ignored.
  • metadata (dict) – Updates to archive metadata. Pass {key: None} to remove a key from the archive’s metadata.
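
The {key: None} convention for metadata can be pictured with a small merge function. This is a sketch of the documented behaviour using a plain dict; apply_metadata_updates is a hypothetical name, not a DataFS function.

```python
def apply_metadata_updates(metadata, updates):
    """Return a copy of `metadata` with `updates` applied.

    A value of None removes the key; any other value sets or replaces it.
    Hypothetical helper for illustration.
    """
    merged = dict(metadata)
    for key, value in updates.items():
        if value is None:
            merged.pop(key, None)  # removing a missing key is a no-op here
        else:
            merged[key] = value
    return merged


current = {"source": "station-12", "units": "mm"}
updated = apply_metadata_updates(current, {"units": None, "quality": "checked"})
```
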
update_metadata(metadata)[source]
versioned

datafs.core.data_file module

datafs.core.data_file.get_local_path(*args, **kwds)[source]

Context manager for retrieving a system path for I/O and updating on change

Parameters:
  • authority (object) – pyFilesystem filesystem object to use as the authoritative, up-to-date source for the archive
  • cache (object) – pyFilesystem filesystem object to use as the cache. Default None.
  • use_cache (bool) – update, service_path, version_check, **kwargs
datafs.core.data_file.open_file(*args, **kwds)[source]

Context manager for reading/writing an archive and uploading on changes

Parameters:
  • authority (object) – pyFilesystem filesystem object to use as the authoritative, up-to-date source for the archive
  • cache (object) – pyFilesystem filesystem object to use as the cache. Default None.
  • use_cache (bool) – update, service_path, version_check, **kwargs
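
The general pattern behind both context managers (hand out something local to work on, then push changes back on exit) can be sketched with the standard library alone. This is not the DataFS implementation: local_path is a hypothetical function, a dict stands in for the authority, and detecting changes by comparing bytes directly is a simplification.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def local_path(store, key):
    """Yield a temporary local path for `key`; write back to `store` on change.

    `store` is a dict of bytes standing in for the authority file system.
    """
    original = store.get(key, b"")
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "data")
        with open(path, "wb") as f:
            f.write(original)
        yield path
        # Reached only if the with-block body finished without an exception.
        with open(path, "rb") as f:
            current = f.read()
        if current != original:
            store[key] = current  # "upload" the changed contents


store = {"arch": b"old"}
with local_path(store, "arch") as p:
    with open(p, "wb") as f:
        f.write(b"new")
```

If the body of the with statement raises, the temporary directory is still cleaned up but nothing is written back, which keeps a failed edit from replacing the stored copy.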

Module contents