What’s New¶
These are new features and improvements of note in each release.
v0.7.1 (May 1, 2017)¶
New features¶
- Archive names are normalized in DataAPI methods. See Normalizing archive names (GH #220 & GH #235).
- Tags are now normalized to lowercase strings. See Normalize tags (GH #243).
Normalizing archive names¶
Archive names passed to the DataAPI methods create(), get_archive(), batch_get_archive() and listdir(), and to the default_versions() property, are normalized using DataAPI._normalize_archive_name(). This allows users to create and get archives using leading slashes and authority names interchangeably. For example, the following are all equivalent:
>>> api.create('my/sample/archive.txt')
>>> api.create('/my/sample/archive.txt')
>>> api.create('authority://my/sample/archive.txt')
Furthermore, they can all be found using similarly flexible searches. The following will all return the archive_name or archive created in the above examples:
>>> api.get_archive('my/sample/archive.txt')
>>> api.get_archive('/my/sample/archive.txt')
>>> api.get_archive('authority://my/sample/archive.txt')
>>> api.batch_get_archive(['authority://my/sample/archive.txt'])
>>> api.search(prefix='my/samp')
>>> api.search(prefix='/my/samp')
>>> api.search(pattern='my/samp*')
>>> api.search(pattern='*my/samp*')
>>> api.search(pattern='/my/samp*')
Search patterns do not accept authority names:
>>> api.search(prefix='authority://my') # no results
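The equivalence described above can be illustrated with a minimal sketch. The helper name normalize_archive_name and its regular expression are hypothetical stand-ins for illustration; this is not the code in DataAPI._normalize_archive_name(), only one plausible way to strip an authority prefix and leading slashes.

```python
import re

# Hypothetical helper for illustration only; DataFS's own
# DataAPI._normalize_archive_name() may behave differently.
_AUTHORITY_PREFIX = re.compile(r'^[\w\-]+://')


def normalize_archive_name(name):
    """Strip an optional 'authority://' prefix and any leading slashes."""
    name = _AUTHORITY_PREFIX.sub('', name)
    return name.lstrip('/')


names = [
    'my/sample/archive.txt',
    '/my/sample/archive.txt',
    'authority://my/sample/archive.txt',
]

# All three spellings collapse to a single normalized name.
normalized = {normalize_archive_name(n) for n in names}
print(normalized)
```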
Normalize Tags¶
Tags are now normalized to lowercase strings in the DataAPI method create() and in the DataArchive methods add_tags() and get_tags().
>>> arch1 = api.create('my_archive', tags=['TAG1', 'tag2', 42])
>>> arch1.get_tags()
['tag1', 'tag2', '42']
>>>
>>> arch1.add_tags('tAg4', 21)
>>> arch1.get_tags()
['tag1', 'tag2', '42', 'tag4', '21']
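The behaviour shown in the session above amounts to coercing each tag to a string and lowercasing it. A minimal sketch, assuming a hypothetical helper named normalize_tags (not part of the DataFS API):

```python
def normalize_tags(tags):
    """Coerce each tag to a lowercase string, preserving input order.

    Hypothetical helper for illustration; not DataFS's implementation.
    """
    return [str(tag).lower() for tag in tags]


created_with = normalize_tags(['TAG1', 'tag2', 42])
added_later = normalize_tags(['tAg4', 21])
print(created_with + added_later)
```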
Archive name checking¶
The normalization process catches illegal archive names:
>>> api.create('!!!\\\\~')
Traceback (most recent call last):
...
fs.errors.InvalidCharsInPathError: Path contains invalid characters: !!!\\~
This error checking is done by fs, using the implementations of validatepath() on the relevant authority. Currently, both fs.osfs.OSFS.validatepath() and the validatepath() method of whatever filesystem is used by the authority are checked. This dual restriction is used because checking against OSFS restrictions helps prevent errors when using a cache.
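The dual check can be pictured as running a name through more than one validator and rejecting it if any of them objects. The sketch below uses plain Python functions as hypothetical stand-ins for the two validatepath() implementations; the character set and error messages are assumptions made for illustration and do not reproduce the rules enforced by fs.

```python
def check_archive_name(name, validators):
    """Run every validator; each one raises ValueError on an illegal name."""
    for validate in validators:
        validate(name)
    return name


def no_reserved_chars(name):
    # Hypothetical stand-in for an OS-level filesystem check.
    if set('<>:"|?*!~\\') & set(name):
        raise ValueError(
            'Path contains invalid characters: {}'.format(name))


def no_empty_components(name):
    # Hypothetical stand-in for an authority-level filesystem check.
    if any(part == '' for part in name.strip('/').split('/')):
        raise ValueError(
            'Path contains empty components: {}'.format(name))


VALIDATORS = [no_reserved_chars, no_empty_components]

# A legal name passes both checks and is returned unchanged.
print(check_archive_name('my/sample/archive.txt', VALIDATORS))
```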
Delete¶
Delete archives
>>> api.listdir('tas/global')
[u'0.0.1', u'0.0.2']
>>>
>>> api.listdir('tas')
[u'regional', u'global', u'adm1', u'adm2']
>>>
>>> tasg = api.get_archive('tas/global')
>>> tasg.delete()
>>> api.get_archive('tas/global')
...
KeyError: 'Archive "tas/global" not found'
>>>
>>> api.listdir('tas')
[u'regional', u'adm1', u'adm2']
The archive-level namespace is removed using the fs.osfs.OSFS.removedir() method.
Backwards incompatible API changes¶
- Authority names are now limited to names that match r'[\w\-]+'. This regex value is set by the module parameter _VALID_AUTHORITY_PATTERNS in datafs/core/data_api.py (GH #186).
- Introduces a new property datafs.DataAPI.default_versions(), which does archive name coercion/alignment. datafs.DataAPI._default_versions() should no longer be accessed under any circumstances (GH #220 and GH #235).
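To see which authority names the pattern r'[\w\-]+' admits, the following sketch tests a few candidates. The function name is hypothetical, and the use of re.fullmatch (requiring the whole name to match) is an assumption; the release notes do not say how DataFS applies the pattern.

```python
import re

# Pattern quoted in the release notes above.
VALID_AUTHORITY = re.compile(r'[\w\-]+')


def is_valid_authority_name(name):
    # Assumption: the entire name must match the pattern.
    return VALID_AUTHORITY.fullmatch(name) is not None


for candidate in ['my-authority_1', 's3://bucket', 'local disk']:
    print(candidate, is_valid_authority_name(candidate))
```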
Performance Improvements¶
Bug Fixes¶
- Implement missing default_version handling in get_archive() and batch_get_archive() (GH #240)
- Messages are now coerced to strings, and log() and the CLI log command no longer fail when used on archives with non-string messages (GH #232)
- examples/ondisk.py updated to reflect xarray 0.9.5 (GH #249)
- Configuration now creates the datafs app directory if it did not previously exist (GH #265)
- Delete will now remove the archive-level namespace in the filesystem as well as the version number namespace
Under the hood¶
- Conda dependencies pinned in requirements_conda.txt, and the channel conda-forge was added to the travis conda environment so we have access to the latest conda builds (GH #247)
- Running the configure command now creates an empty ‘default’ profile if no configuration file exists
- Additional documentation on tagging files and searching and finding files
See the issue tracker on GitHub for a complete list.
v0.7.0 (March 9, 2017)¶
New features¶
- listdir() method allowing listing archive path components given a prefix
- new batch get archive method batch_get_archive()
Using the listdir search method¶
List archive path components given the path of the “directory” to search
Note
When using listdir on versioned archives, listdir will provide the version numbers when a full archive path is supplied as the location argument. This is because DataFS stores the archive path as a directory and the versions as the actual files when versioning is on.
Usage¶
>>> api.listdir('s3://ACP/climate/')
['gcm-modelweights.csv', 'hddcdd', 'smme']
$ datafs listdir s3://ACP/climate/
gcm-modelweights.csv
hddcdd
smme
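The idea of listing path components under a "directory" can be sketched over a flat list of archive names. Everything here is hypothetical: the function name, the file names under hddcdd and smme, and the omission of the s3:// authority prefix are illustrative choices, and DataFS itself may compute the listing quite differently.

```python
def list_components(archive_names, location):
    """Return the distinct path components found directly under `location`."""
    prefix = location.strip('/')
    prefix = prefix + '/' if prefix else ''
    found = set()
    for name in archive_names:
        name = name.strip('/')
        # Keep only names strictly below the requested location.
        if name.startswith(prefix) and len(name) > len(prefix):
            found.add(name[len(prefix):].split('/')[0])
    return sorted(found)


# Hypothetical archive names, chosen to mirror the usage example above.
names = [
    'ACP/climate/gcm-modelweights.csv',
    'ACP/climate/hddcdd/annual.nc',
    'ACP/climate/smme/daily.nc',
    'ACP/economy/gdp.csv',
]

print(list_components(names, 'ACP/climate/'))
```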
Bulk archive retrieval with batch_get_archive¶
Batch version of get_archive(). Accepts an iterable of archive names to retrieve and optional default versions.
Returns a dict of DataArchive objects, indexed by archive name. If an archive is not found, it is omitted (batch_get_archive does not raise a KeyError on invalid archive names).
Example¶
>>> api.batch_get_archive(api.search())
{arch1: <DataArchive s3://arch1>, arch2: <DataArchive s3://arch2>, ...}
batch_get_archive has no equivalent on the Command Line Interface.
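The "omit instead of raise" behaviour of the batch lookup can be sketched with a plain dict standing in for the archive store. The function batch_get and the store contents are hypothetical; the strings are placeholders for DataArchive objects.

```python
def batch_get(store, archive_names):
    """Return {name: archive} for names present in `store`; skip the rest."""
    return {name: store[name] for name in archive_names if name in store}


# Placeholder strings stand in for DataArchive objects.
store = {'arch1': '<DataArchive arch1>', 'arch2': '<DataArchive arch2>'}

found = batch_get(store, ['arch1', 'missing', 'arch2'])
print(found)  # 'missing' is silently omitted, no KeyError is raised
```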
See the issue tracker on GitHub for a complete list.
v0.6.9 (February 21, 2017)¶
New features¶
- archive pattern constraints (GH #168)
- set tags from command line
- add tagging and searching documentation
Archive pattern constraints¶
A list of regex patterns that archive_name must match before archive creation is allowed.
Create an archive pattern using manager.set_required_archive_patterns. For example, to require only \w, ., or / characters:
>>> api = get_api()
>>> api.manager.set_required_archive_patterns([r'^[\w\/\.]+$'])
Now, archives that do not match this pattern will not be supported.
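To check which names the pattern r'^[\w\/\.]+$' admits, it can be tried directly with the re module. The helper below is a hypothetical illustration of "every required pattern must match"; it does not call DataFS and does not show the error DataFS raises for a rejected name.

```python
import re

# Pattern taken from the example above.
required_patterns = [r'^[\w\/\.]+$']


def matches_required_patterns(archive_name, patterns=required_patterns):
    """Return True only if the name matches every required pattern."""
    return all(re.search(p, archive_name) for p in patterns)


for candidate in ['my/sample/archive.txt', 'my archive!', 'my-archive']:
    print(candidate, matches_required_patterns(candidate))
```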
Tagging from CLI¶
Added three new commands which reflect their DataArchive counterparts:
datafs add_tags
datafs get_tags
datafs delete_tags
Additionally, a --tag option was added to datafs create so that tags could be supplied on archive creation:
datafs create my_archive --description "my description" --tag tag1 \
--tag tag2 --source "where it's from" --tag tag3
Backwards incompatible API changes¶
- stop coercing underscores to slashes in archive_path
- drop archive_path argument from DataAPI.create
See the issue tracker on GitHub for a complete list.
v0.6.8 (February 7, 2017)¶
This is a patch release primarily improving the documentation and testing of DataFS. There are no backward incompatible changes in v0.6.8.
New features¶
Bug Fixes¶
- More robust *args, **kwargs handling in CLI, with better error messages
- Fix click round-trip compatibility issue - print results on CLI using \n instead of \r\n on windows
- Raises error when loading a non-existent profile from config (GH #135)
See the issue tracker on GitHub for a complete list.
v0.6.7 (February 1, 2017)¶
New features¶
- Allow tag specification on create
Performance Improvements¶
- Restructure conftest.py: api_with_diverse_archives to be session-scoped
Under the hood¶
- Consolidate manager._create_archive_table and _create_spec_table into one function
- Move archive document creation to separate method in manager (allows batch write in tests)
- Add tests for search and filter queries on very large manager tables
See the issue tracker on GitHub for a complete list.
v0.6.6 (January 20, 2017)¶
New features¶
Introduces search features in command line:
datafs search
and the API:
api.search(query)
See the issue tracker on GitHub for a complete list.
v0.6.5 (January 13, 2017)¶
New features¶
- regex and unix-style filter searches
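As a rough picture of the difference between prefix, unix-style and regex filtering, the sketch below applies each to a small list of names using only the standard library. The archive names are hypothetical and the code does not call DataFS; it only illustrates the matching styles themselves.

```python
import fnmatch
import re

# Hypothetical archive names for illustration.
names = [
    'my/sample/archive.txt',
    'my/other/data.csv',
    'team/sample/notes.md',
]

# Prefix filter: names that start with the given string.
by_prefix = [n for n in names if n.startswith('my/samp')]

# Unix-style (glob) filter: '*' matches any run of characters.
by_glob = fnmatch.filter(names, '*sample*')

# Regex filter: names ending in .csv or .md.
by_regex = [n for n in names if re.search(r'\.(csv|md)$', n)]

print(by_prefix, by_glob, by_regex)
```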
Backwards incompatible API changes¶
- Prevent update/write of empty files
See the issue tracker on GitHub for a complete list.
v0.6.4 (January 12, 2017)¶
Under the hood¶
- Test/fix handling of multiple read/write of large netCDF datasets
See the issue tracker on GitHub for a complete list.
v0.6.3 (January 11, 2017)¶
New features¶
- dependency handling
Backwards incompatible API changes¶
- raise error when passing non-None versions to unversioned archive methods
- change API method name: create_archive –> create()
- change CLI subcommand name: upload –> update
Under the hood¶
- improve test coverage
Bug Fixes¶
- prevent users from deleting required metadata elements
See the issue tracker on GitHub for a complete list.
v0.6.2 (January 9, 2017)¶
Backwards incompatible API changes¶
- Drop DataArchive properties that access manager (GH #72)
- Manager archive listing attribute versions changed to version_history
Manager-calling properties converted to methods¶
latest_version –> get_latest_version()
versions –> get_versions()
latest_hash –> get_latest_hash()
history –> get_history()
metadata –> get_metadata()
See the issue tracker on GitHub for a complete list.
v0.6.1 (January 6, 2017)¶
See the issue tracker on GitHub for a complete list.
v0.6.0 (January 4, 2017)¶
New features¶
Set dependencies from Python API on write¶
DataArchive.update¶
def update(
        self,
        filepath,
        cache=False,
        remove=False,
        bumpversion='patch',
        prerelease=None,
        dependencies=None,
        **kwargs):
    ...
    self._update_manager(
        checksum,
        kwargs,
        version=next_version,
        dependencies=dependencies)
DataArchive.open¶
def open(
        self,
        mode='r',
        version=None,
        bumpversion='patch',
        prerelease=None,
        dependencies=None,
        *args,
        **kwargs):
    ...
    updater = lambda *args, **kwargs: self._update_manager(
        *args,
        version=next_version,
        dependencies=dependencies,
        **kwargs)
    ...
DataArchive.get_local_path¶
Similar to DataArchive.open.
DataArchive._update_manager¶
def _update_manager(
        self,
        checksum,
        metadata={},
        version=None,
        dependencies=None):
    # by default, dependencies is the last version of dependencies
    if dependencies is None:
        history = self.history
        if len(history) == 0:
            dependencies = []
        else:
            dependencies = history[-1]['dependencies']
    ...
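The defaulting rule in the excerpt above (reuse the dependencies recorded on the most recent history entry unless new ones are supplied) can be exercised in isolation. The function resolve_dependencies and the shape of the history entries below are hypothetical; only the branching mirrors the excerpt.

```python
def resolve_dependencies(history, dependencies=None):
    """Pick the dependencies to record for a new version.

    Mirrors the branching in the excerpt: explicit dependencies win,
    otherwise fall back to the latest history entry, otherwise empty.
    """
    if dependencies is None:
        if len(history) == 0:
            return []
        return history[-1]['dependencies']
    return dependencies


# Hypothetical history entries; the real record layout may differ.
history = [
    {'dependencies': {'arch1': '0.1.0'}},
    {'dependencies': {'arch1': '0.2.0'}},
]

print(resolve_dependencies([]))
print(resolve_dependencies(history))
print(resolve_dependencies(history, {'arch2': '1.0.0'}))
```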
Under the hood¶
- Table schemas have been moved from the dynamo and mongo modules to the BaseDataManager.
- versions attr is now version_history in table schema and DataArchive method get_versions is now get_version_history()
See the issue tracker on GitHub for a complete list.
v0.5.0 (December 21, 2016)¶
v0.4.0 (December 15, 2016)¶
New features¶
- create API object from config file
See the issue tracker on GitHub for a complete list.
v0.3.0 (December 14, 2016)¶
v0.2.0 (December 12, 2016)¶
See the issue tracker on GitHub for a complete list.
v0.1.0 (November 18, 2016)¶
See the issue tracker on GitHub for a complete list.