DataFS API

Subpackages

Submodules

datafs.datafs module

Module contents
class datafs.DataAPI(default_versions=None, **kwargs)

    Bases: object
    DefaultAuthorityName = None
    batch_get_archive(archive_names, default_versions=None)

        Batch version of get_archive()

        Parameters:
        - archive_names (list) – Iterable of archive names to retrieve
        - default_versions (str, object, or dict) – Default versions to assign to each returned archive. May be a dict with archive names as keys and versions as values, or may be a single version, in which case the same version is used for all archives. Versions must be a strict version number string, a StrictVersion, or a BumpableVersion object.

        Returns: archives – List of DataArchive objects. If an archive is not found, it is omitted (batch_get_archive does not raise a KeyError on invalid archive names).
        Return type: list
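The omit-on-missing behavior described above can be sketched with a plain dict standing in for the archive store. This is an illustrative stand-in, not the DataFS implementation; the names and data are hypothetical:

```python
def batch_get(store, names):
    """Return the values for names that exist in store, skipping missing keys.

    Mirrors batch_get_archive's documented behavior: invalid names do not
    raise a KeyError; the corresponding archives are simply omitted.
    """
    results = []
    for name in names:
        try:
            # the single-archive lookup may raise KeyError
            results.append(store[name])
        except KeyError:
            continue  # omit rather than raise
    return results


store = {'archive_a': 'data_a', 'archive_b': 'data_b'}
print(batch_get(store, ['archive_a', 'missing', 'archive_b']))
# ['data_a', 'data_b']
```

The contrast with get_archive is the error behavior: a single lookup of 'missing' would raise KeyError, while the batch call silently drops it.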
    cache
    create(archive_name, authority_name=None, versioned=True, raise_on_err=True, metadata=None, tags=None, helper=False)

        Create a DataFS archive

        Parameters:
        - archive_name (str) – Name of the archive
        - authority_name (str) – Name of the data service to use as the archive’s data authority
        - versioned (bool) – If true, store all versions with explicit version numbers (default True)
        - raise_on_err (bool) – Raise an error if the archive already exists (default True)
        - metadata (dict) – Dictionary of additional archive metadata
        - tags (list) – Tags to apply to the archive (optional)
        - helper (bool) – If true, interactively prompt for required metadata (default False)
    default_versions
    delete_archive(archive_name)

        Delete an archive

        Parameters: archive_name (str) – Name of the archive to delete
    filter(pattern=None, engine='path', prefix=None)

        Perform a filtered search over the entire universe of archives according to a pattern or prefix.

        Parameters:
        - prefix (str) – String matching the beginning characters of the archive or set of archives you are filtering. Note that authority prefixes, e.g. local://my/archive.txt, are not supported in prefix searches.
        - pattern (str) – String matching the characters within the archive or set of archives you are filtering on. Note that authority prefixes, e.g. local://my/archive.txt, are not supported in pattern searches.
        - engine (str) – One of 'str', 'path', or 'regex', indicating the type of pattern you are filtering on

        Return type: generator
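The three engine values can be illustrated with a small sketch over a list of names. The exact matching semantics here are an assumption (substring for 'str', fnmatch-style glob for 'path', regular-expression search for 'regex'), and the function and archive names are hypothetical:

```python
import fnmatch
import re


def filter_names(names, pattern, engine='path'):
    """Toy sketch of the three filter engines over archive name strings.

    Assumed semantics (not taken verbatim from the datafs source):
      'str'   -- plain substring match
      'path'  -- fnmatch-style glob pattern
      'regex' -- regular-expression search
    Yields matches lazily, like the documented generator return type.
    """
    for name in names:
        if engine == 'str' and pattern in name:
            yield name
        elif engine == 'path' and fnmatch.fnmatch(name, pattern):
            yield name
        elif engine == 'regex' and re.search(pattern, name):
            yield name


names = ['team/data_2020.csv', 'team/data_2021.csv', 'notes.txt']
print(list(filter_names(names, 'team/*.csv')))           # the two csv archives
print(list(filter_names(names, 'notes', engine='str')))  # ['notes.txt']
```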
    get_archive(archive_name, default_version=None)

        Retrieve a data archive

        Parameters:
        - archive_name (str) – Name of the archive to retrieve
        - default_version (version) – str or StrictVersion giving the default version number to be used on read operations

        Returns: archive – New DataArchive object
        Return type: DataArchive
        Raises: KeyError – A KeyError is raised when the archive_name is not found
    static hash_file(f)

        Utility function for hashing file contents

        Overload this function to change the file equality checking algorithm.

        Parameters: f (file-like) – File-like object or file path from which to compute checksum value
        Returns: checksum – Dictionary with {'algorithm': 'md5', 'checksum': hexdigest}
        Return type: dict
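A minimal sketch of the documented return shape, computing an md5 hexdigest over a file-like object in chunks. The function name and chunk size are hypothetical, not part of the DataFS API:

```python
import hashlib
import io


def hash_stream(f, chunk_size=8 * 1024 * 1024):
    """Compute an md5 checksum of a binary file-like object in chunks.

    Returns the same dict shape the docs describe:
    {'algorithm': 'md5', 'checksum': <hexdigest>}.
    Reading in chunks keeps memory bounded for large files.
    """
    md5 = hashlib.md5()
    for chunk in iter(lambda: f.read(chunk_size), b''):
        md5.update(chunk)
    return {'algorithm': 'md5', 'checksum': md5.hexdigest()}


result = hash_stream(io.BytesIO(b'hello!'))
print(result['algorithm'])  # md5
```

An overload that swaps `hashlib.md5` for `hashlib.sha256` (and updates the 'algorithm' field) would change the file equality check, as the docstring above suggests.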
    listdir(location, authority_name=None)

        List archive path components at a given location

        Note: When using listdir on versioned archives, listdir will provide the version numbers when a full archive path is supplied as the location argument. This is because DataFS stores the archive path as a directory and the versions as the actual files when versioning is on.

        Parameters:
        - location (str) – Path of the “directory” to search. location can be a path relative to the authority root (e.g. /MyFiles/Data) or can include the authority as a protocol (e.g. my_auth://MyFiles/Data). If the authority is specified as a protocol, the authority_name argument is ignored.
        - authority_name (str) – Name of the authority to search (optional). If no authority is specified, the default authority is used (if only one authority is attached or if DefaultAuthorityName is assigned).

        Returns: Archive path components that exist at the given “directory” location on the specified authority
        Return type: list
        Raises: ValueError – A ValueError is raised if the authority is ambiguous or invalid
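The note above — archive paths stored as directories, versions stored as the files inside them — can be sketched with a toy component lister over a flat list of stored paths. Everything here (function name, paths, version strings) is illustrative, not the DataFS implementation:

```python
def list_components(paths, location):
    """Return the sorted next-level path components under location.

    A stand-in for listdir: paths is a flat list of stored file paths,
    where each versioned archive is a 'directory' whose files are its
    version numbers.
    """
    prefix = location.rstrip('/') + '/'
    components = set()
    for p in paths:
        if p.startswith(prefix):
            # keep only the first component after the searched location
            components.add(p[len(prefix):].split('/', 1)[0])
    return sorted(components)


paths = [
    'MyFiles/Data/archive1/0.0.1',
    'MyFiles/Data/archive1/0.0.2',
    'MyFiles/Data/archive2/0.0.1',
]
print(list_components(paths, 'MyFiles/Data'))           # ['archive1', 'archive2']
print(list_components(paths, 'MyFiles/Data/archive1'))  # ['0.0.1', '0.0.2']
```

The second call shows the documented behavior: supplying a full archive path as the location yields version numbers, because the versions are the actual files under the archive "directory".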
    manager
datafs.get_api(profile=None, config_file=None, requirements=None)

    Generate a datafs.DataAPI object from a config profile

    get_api generates a DataAPI object based on a pre-configured datafs profile specified in your datafs config file. To create a datafs config file, use the command line tool datafs configure --helper or export an existing DataAPI object with datafs.ConfigFile.write_config_from_api().

    Parameters:
    - profile (str) – (optional) Name of a profile in your datafs config file. If profile is not provided, the default profile specified in the file will be used.
    - config_file (str or file) – (optional) Path to your datafs configuration file. By default, get_api uses your OS’s default datafs application directory.
Examples
The following specifies a simple API with a MongoDB manager and a temporary storage service:
    >>> try:
    ...     from StringIO import StringIO
    ... except ImportError:
    ...     from io import StringIO
    ...
    >>> import tempfile
    >>> tempdir = tempfile.mkdtemp()
    >>>
    >>> config_file = StringIO("""
    ... default-profile: my-data
    ... profiles:
    ...     my-data:
    ...         manager:
    ...             class: MongoDBManager
    ...             kwargs:
    ...                 database_name: 'MyDatabase'
    ...                 table_name: 'DataFiles'
    ...
    ...         authorities:
    ...             local:
    ...                 service: OSFS
    ...                 args: ['{}']
    ... """.format(tempdir))
    >>>
    >>> # This file can be read in using the datafs.get_api helper function
    ...
    >>> api = get_api(profile='my-data', config_file=config_file)
    >>> api.manager.create_archive_table(
    ...     'DataFiles',
    ...     raise_on_err=False)
    >>>
    >>> archive = api.create(
    ...     'my_first_archive',
    ...     metadata=dict(description='My test data archive'),
    ...     raise_on_err=False)
    >>>
    >>> with archive.open('w+') as f:
    ...     res = f.write(u'hello!')
    ...
    >>> with archive.open('r') as f:
    ...     print(f.read())
    ...
    hello!
    >>>
    >>> # clean up
    ...
    >>> archive.delete()
    >>> import shutil
    >>> shutil.rmtree(tempdir)