What’s New¶
These are new features and improvements of note in each release.
v0.7.0 (latest)¶
New features¶
- listdir() method allowing listing archive path components given a prefix
- new batch get archive method batch_get_archive()
Using the listdir search method¶
List archive path components given the path of the “directory” to search
Note
When using listdir on versioned archives, listdir will provide the version numbers when a full archive path is supplied as the location argument. This is because DataFS stores the archive path as a directory and the versions as the actual files when versioning is on.
Usage¶
>>> api.listdir('s3://ACP/climate/')
['gcm-modelweights.csv', 'hddcdd', 'smme']
$ datafs listdir s3://ACP/climate/
gcm-modelweights.csv
hddcdd
smme
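The behaviour shown above can be approximated in a few lines of plain Python. This is a minimal sketch of prefix-based listing, not the DataFS implementation: the helper name list_path_components, the file names under hddcdd and smme, and the flat list of archive names are all assumptions made for illustration.

```python
def list_path_components(archive_names, prefix):
    """Return the sorted, unique path components directly under ``prefix``."""
    # Normalise the prefix so 'a/b' and 'a/b/' behave the same way.
    prefix = prefix.rstrip('/') + '/'
    components = set()
    for name in archive_names:
        if name.startswith(prefix):
            # Keep only the first component after the prefix.
            remainder = name[len(prefix):]
            if remainder:
                components.add(remainder.split('/', 1)[0])
    return sorted(components)


# Hypothetical archive names, chosen to mirror the usage example.
archives = [
    'ACP/climate/gcm-modelweights.csv',
    'ACP/climate/hddcdd/tas.nc',
    'ACP/climate/smme/pr.nc',
    'ACP/labor/results.csv',
]
print(list_path_components(archives, 'ACP/climate/'))
# → ['gcm-modelweights.csv', 'hddcdd', 'smme']
```

Archives nested more deeply under the prefix contribute only their first component, which is why hddcdd and smme appear as "directories" rather than as full paths.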
Bulk archive retrieval with batch_get_archive¶
Batch version of get_archive(). Accepts an iterable of archive names to retrieve and optional default versions. Returns a dict of DataArchive objects, indexed by archive name. If an archive is not found, it is omitted (batch_get_archive does not raise a KeyError on invalid archive names).
Example¶
>>> api.batch_get_archive(api.search())
{arch1: <DataArchive s3://arch1>, arch2: <DataArchive s3://arch2>, ...}
batch_get_archive has no equivalent on the Command Line Interface.
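The "omit rather than raise" behaviour of a batch lookup can be sketched with an ordinary dict standing in for the archive index. The function batch_get and the store contents are hypothetical, and the optional default versions are left out of this sketch.

```python
def batch_get(store, names):
    """Return {name: value} for every requested name present in ``store``."""
    # Unknown names are skipped silently instead of raising KeyError.
    return {name: store[name] for name in names if name in store}


# Stand-in for the archive index; the values would be DataArchive objects.
store = {
    'arch1': '<DataArchive s3://arch1>',
    'arch2': '<DataArchive s3://arch2>',
}
result = batch_get(store, ['arch1', 'missing', 'arch2'])
print(sorted(result))
# → ['arch1', 'arch2']
```

Because missing names are dropped without an error, a caller that needs to know which lookups failed has to compare the returned keys against the requested names.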
Backwards incompatible API changes¶
Performance Improvements¶
Bug Fixes¶
See the issue tracker on GitHub for a complete list.
v0.6.9 (February 21, 2017)¶
New features¶
- archive pattern constraints (GH #168)
- set tags from command line
- add tagging and searching documentation
Archive pattern constraints¶
List of regex patterns that must match archive_name before archive creation is allowed
Create an archive pattern using manager.set_required_archive_patterns. For example, to require that archive names contain only \w, ., or / characters:
>>> api = get_api()
>>> api.manager.set_required_archive_patterns([r'^[\w\/\.]+$'])
Now, archives whose names do not match this pattern cannot be created.
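The effect of such a constraint can be illustrated with Python's re module. This is a minimal sketch, not DataFS's implementation: the helper name name_is_allowed and the sample archive names are hypothetical, and applying each pattern with re.search is an assumption.

```python
import re


def name_is_allowed(archive_name, required_patterns):
    """Return True only if the name matches every required pattern."""
    return all(re.search(p, archive_name) for p in required_patterns)


# The pattern from the example above: word characters, '.', and '/' only.
patterns = [r'^[\w\/\.]+$']

print(name_is_allowed('project/data.v1.csv', patterns))   # → True
print(name_is_allowed('project/my data!.csv', patterns))  # → False
```

The second name is rejected because it contains a space and an exclamation mark, neither of which falls inside the character class.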
Tagging from CLI¶
Added three new commands which reflect their DataArchive counterparts:
datafs add_tags
datafs get_tags
datafs delete_tags
Additionally, a --tag option was added to datafs create so that tags could be supplied on archive creation:
datafs create my_archive --description "my description" --tag tag1 \
--tag tag2 --source "where it's from" --tag tag3
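How a repeatable option such as --tag accumulates values can be sketched with the standard library's argparse. This is only an illustration: the DataFS CLI is not built this way, the program name create-sketch is made up, and treating --description and --source as ordinary options is an assumption.

```python
import argparse

parser = argparse.ArgumentParser(prog='create-sketch')
parser.add_argument('archive_name')
parser.add_argument('--description')
parser.add_argument('--source')
# action='append' collects every occurrence of --tag into one list.
parser.add_argument('--tag', action='append', default=[], dest='tags')

args = parser.parse_args([
    'my_archive', '--description', 'my description',
    '--tag', 'tag1', '--tag', 'tag2',
    '--source', "where it's from", '--tag', 'tag3',
])
print(args.tags)
# → ['tag1', 'tag2', 'tag3']
```

Tags are gathered in the order given, even when other options appear between them.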
Backwards incompatible API changes¶
- stop coercing underscores to slashes in archive_path
- drop archive_path argument from DataAPI.create
See the issue tracker on GitHub for a complete list.
v0.6.8 (February 7, 2017)¶
This is a patch release primarily improving the documentation and testing of DataFS. There are no backward incompatible changes in v0.6.8.
New features¶
- Add command line docstrings (GH #115)
- Add tests for Python API documentation snippets (GH #108)
- Integrate clatter - checks to make sure CLI documentation is accurate
Bug Fixes¶
- More robust *args, **kwargs handling in CLI, with better error messages
- Fix click round-trip compatibility issue - print results on CLI using \n instead of \r\n on Windows
- Raises error when loading a non-existent profile from config (GH #135)
See the issue tracker on GitHub for a complete list.
v0.6.7 (February 1, 2017)¶
New features¶
- Allow tag specification on create
Performance Improvements¶
- Restructure conftest.py: api_with_diverse_archives to be session-scoped
Under the hood¶
- Consolidate manager._create_archive_table and _create_spec_table into one function
- Move archive document creation to separate method in manager (allows batch write in tests)
- Add tests for search and filter queries on very large manager tables
See the issue tracker on GitHub for a complete list.
v0.6.6 (January 20, 2017)¶
New features¶
Introduces search features in command line:
datafs search
and the API:
api.search(query)
See the issue tracker on GitHub for a complete list.
v0.6.5 (January 13, 2017)¶
New features¶
- regex and unix-style filter searches
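The difference between the two filter styles can be sketched with the standard library's fnmatch and re modules. The helper filter_names, its engine argument, and the archive names are hypothetical and do not describe the DataFS signature.

```python
import fnmatch
import re


def filter_names(names, pattern, engine='glob'):
    """Return the names matching ``pattern`` under the chosen engine."""
    if engine == 'glob':
        # Unix shell-style wildcards; '*' also matches '/' in fnmatch.
        return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
    if engine == 'regex':
        compiled = re.compile(pattern)
        return [n for n in names if compiled.search(n)]
    raise ValueError('unknown engine: {!r}'.format(engine))


names = ['team1/raw/a.csv', 'team1/clean/a.csv', 'team2/raw/b.nc']
print(filter_names(names, '*raw*', engine='glob'))
# → ['team1/raw/a.csv', 'team2/raw/b.nc']
print(filter_names(names, r'\.csv$', engine='regex'))
# → ['team1/raw/a.csv', 'team1/clean/a.csv']
```

Glob patterns must match the whole name, so wildcards are needed on both sides of raw, whereas the regex engine in this sketch searches anywhere in the name.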
Backwards incompatible API changes¶
- Prevent update/write of empty files
See the issue tracker on GitHub for a complete list.
v0.6.4 (January 12, 2017)¶
Under the hood¶
- Test/fix handling of multiple read/write of large netCDF datasets
See the issue tracker on GitHub for a complete list.
v0.6.3 (January 11, 2017)¶
New features¶
- dependency handling
Backwards incompatible API changes¶
- raise error when passing non-None versions to unversioned archive methods
- change API method name: create_archive --> create()
- change CLI subcommand name: upload --> update
Under the hood¶
- improve test coverage
Bug Fixes¶
- prevent users from deleting required metadata elements
See the issue tracker on GitHub for a complete list.
v0.6.2 (January 9, 2017)¶
New features¶
- New template in docs for AWS configuration (GH #73)
Backwards incompatible API changes¶
- Drop DataArchive properties that access manager (GH #72)
- Manager archive listing attribute versions changed to version_history
Manager-calling properties converted to methods¶
latest_version --> get_latest_version()
versions --> get_versions()
latest_hash --> get_latest_hash()
history --> get_history()
metadata --> get_metadata()
See the issue tracker on GitHub for a complete list.
v0.6.1 (January 6, 2017)¶
See the issue tracker on GitHub for a complete list.
v0.6.0 (January 4, 2017)¶
New features¶
- Explicit versioning & version pinning (GH #62)
- Explicit dependency tracking (GH #63)
- Update metadata from the command line
- Support for version tracking & management with requirements files (GH #70)
- Configure archive specification on manager table
Set dependencies from Python API on write¶
DataArchive.update¶
def update(
        self,
        filepath,
        cache=False,
        remove=False,
        bumpversion='patch',
        prerelease=None,
        dependencies=None,
        **kwargs):
    ...
    self._update_manager(
        checksum,
        kwargs,
        version=next_version,
        dependencies=dependencies)
DataArchive.open¶
def open(
        self,
        mode='r',
        version=None,
        bumpversion='patch',
        prerelease=None,
        dependencies=None,
        *args,
        **kwargs):
    ...
    updater = lambda *args, **kwargs: self._update_manager(
        *args,
        version=next_version,
        dependencies=dependencies,
        **kwargs)
    ...
DataArchive.get_local_path¶
Similar to DataArchive.open.
DataArchive._update_manager¶
def _update_manager(
        self,
        checksum,
        metadata={},
        version=None,
        dependencies=None):
    # by default, dependencies is the last version of dependencies
    if dependencies is None:
        history = self.history
        if len(history) == 0:
            dependencies = []
        else:
            dependencies = history[-1]['dependencies']
    ...
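The default-dependency rule in _update_manager can be isolated as a small standalone function. This is a sketch under stated assumptions: resolve_dependencies is a hypothetical name, and the history entries and dependency values are invented for illustration.

```python
def resolve_dependencies(history, dependencies=None):
    """Pick the dependencies to record for a new version."""
    # Explicit dependencies always win.
    if dependencies is not None:
        return dependencies
    # With no prior versions there is nothing to inherit.
    if len(history) == 0:
        return []
    # Otherwise carry forward the most recent version's dependencies.
    return history[-1]['dependencies']


# Hypothetical history: one entry per recorded version, oldest first.
history = [
    {'version': '0.0.1', 'dependencies': []},
    {'version': '0.0.2', 'dependencies': ['arch1==0.1.0']},
]
print(resolve_dependencies([]))
# → []
print(resolve_dependencies(history))
# → ['arch1==0.1.0']
print(resolve_dependencies(history, ['arch2==1.0.0']))
# → ['arch2==1.0.0']
```

Under this rule, passing an empty list clears the dependencies, while passing nothing inherits them from the previous version.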
Under the hood¶
- Table schemas have been moved from the dynamo and mongo modules to the BaseDataManager.
- versions attr is now version_history in table schema and DataArchive method get_versions is now get_version_history()
See the issue tracker on GitHub for a complete list.
v0.5.0 (December 21, 2016)¶
v0.4.0 (December 15, 2016)¶
New features¶
- create API object from config file
See the issue tracker on GitHub for a complete list.
v0.3.0 (December 14, 2016)¶
v0.2.0 (December 12, 2016)¶
See the issue tracker on GitHub for a complete list.
v0.1.0 (November 18, 2016)¶
See the issue tracker on GitHub for a complete list.