What’s New

These are new features and improvements of note in each release.

v0.7.1 (May 1, 2017)

New features

Normalizing archive names

Archive names passed to the DataAPI methods create(), get_archive(), batch_get_archive(), and listdir(), and to the default_versions property, are normalized using DataAPI._normalize_archive_name(). This allows users to create and get archives using leading slashes and authority names interchangeably. For example, the following are all equivalent:

>>> api.create('my/sample/archive.txt')
>>> api.create('/my/sample/archive.txt')
>>> api.create('authority://my/sample/archive.txt')

Furthermore, they can all be found using similarly flexible searches. The following will all locate the archive created in the examples above (the get calls return the archive itself; the search calls return its archive_name):

>>> api.get_archive('my/sample/archive.txt')
>>> api.get_archive('/my/sample/archive.txt')
>>> api.get_archive('authority://my/sample/archive.txt')
>>> api.batch_get_archive(['authority://my/sample/archive.txt'])
>>> api.search(prefix='my/samp')
>>> api.search(prefix='/my/samp')
>>> api.search(pattern='my/samp*')
>>> api.search(pattern='*my/samp*')
>>> api.search(pattern='/my/samp*')

Searches, however, do not accept authority names:

>>> api.search(prefix='authority://my') # no results

Normalizing tags

Tags are now normalized (coerced to lowercase strings) by the DataAPI method create() and the DataArchive methods add_tags() and get_tags():

>>> arch1 = api.create('my_archive', tags=['TAG1', 'tag2', 42])
>>> arch1.get_tags()
['tag1', 'tag2', '42']
>>>
>>> arch1.add_tags('tAg4', 21)
>>> arch1.get_tags()
['tag1', 'tag2', '42', 'tag4', '21']

Archive name checking

The normalization process catches illegal archive names:

>>> api.create('!!!\\\\~')
Traceback (most recent call last):
...
fs.errors.InvalidCharsInPathError: Path contains invalid characters: !!!\\~

This error checking is done by fs, using the validatepath() implementation on the relevant authority. Currently, both fs.osfs.OSFS.validatepath() and the validatepath() method of whatever filesystem the authority uses are checked. This dual restriction exists because checking against OSFS restrictions prevents errors when a local cache is in use.
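
A rough sketch of this dual check (the function and argument names are illustrative, not the actual datafs internals):

from fs.osfs import OSFS

def validate_on_both(authority_fs, path):
    # raise if the path is invalid on either filesystem
    OSFS('/').validatepath(path)     # local-cache (OSFS) restrictions
    authority_fs.validatepath(path)  # authority filesystem restrictions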

Delete

Delete archives

>>> api.listdir('tas/global')
[u'0.0.1', u'0.0.2']
>>>
>>> api.listdir('tas')
[u'regional', u'global', u'adm1', u'adm2']
>>>
>>> tasg = api.get_archive('tas/global')
>>> tasg.delete()
>>> api.get_archive('tas/global')
Traceback (most recent call last):
...
KeyError: 'Archive "tas/global" not found'
>>>
>>> api.listdir('tas')
[u'regional', u'adm1', u'adm2']

The archive-level namespace is removed using the fs.osfs.OSFS.removedir() method.
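
A rough sketch of this cleanup (the function and argument names are illustrative, not the actual datafs internals):

from fs.path import join

def remove_archive_namespace(fs, archive_path):
    # remove each version file stored under the archive-level directory
    for version_file in fs.listdir(archive_path):
        fs.remove(join(archive_path, version_file))

    # then remove the now-empty archive-level namespace itself
    fs.removedir(archive_path)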

Backwards incompatible API changes

  • Authority names are now limited to names that match r'[\w\-]+'. This regex value is set by the module parameter _VALID_AUTHORITY_PATTERNS in datafs/core/data_api.py (GH #186). See the sketch after this list.
  • Introduces a new property, datafs.DataAPI.default_versions, which performs archive name coercion/alignment. datafs.DataAPI._default_versions should no longer be accessed under any circumstances (GH #220 and GH #235).
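
For illustration, an authority name like 'my-authority' matches this pattern, while 'my.authority' does not (a minimal sketch using Python's re module directly, not datafs internals; the anchoring is an assumption):

>>> import re
>>> authority_pattern = r'[\w\-]+'  # the pattern from _VALID_AUTHORITY_PATTERNS
>>> bool(re.match(r'^{}$'.format(authority_pattern), 'my-authority'))
True
>>> bool(re.match(r'^{}$'.format(authority_pattern), 'my.authority'))
False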

Bug Fixes

  • Implement missing default_version handling in get_archive() and batch_get_archive() (GH #240)
  • Messages are now coerced to strings, and log() and the CLI log command no longer fail when used on archives with non-string messages (GH #232)
  • examples/ondisk.py updated to reflect xarray 0.9.5 (GH #249)
  • Configuration now creates the datafs app directory if it did not previously exist (GH #265)
  • Delete will now remove the archive-level namespace in the filesystem as well as the version number namespace

Under the hood

  • Use :issue: and :pull: directives to reference a GitHub issue or pull request (GH #209)
  • The sphinx build is now tested on Travis. Run the tests locally with the command sphinx-build -W -b html -d docs/_build/doctrees docs/. docs/_build/html (GH #211)
  • The docs structure has been reorganized
  • Conda dependencies are pinned in requirements_conda.txt, and the conda-forge channel was added to the Travis conda environment so we have access to the latest conda builds (GH #247)
  • Running the configure command now creates an empty ‘default’ profile if no configuration file exists
  • Additional documentation on tagging, searching, and finding files

See the issue tracker on GitHub for a complete list.

v0.7.0 (March 9, 2017)

New features

Using the listdir search method

List archive path components given the path of the “directory” to search

Note

When using listdir on versioned archives, listdir will provide the version numbers when a full archive path is supplied as the location argument. This is because DataFS stores the archive path as a directory and the versions as the actual files when versioning is on.

Usage
>>> api.listdir('s3://ACP/climate/')
['gcm-modelweights.csv', 'hddcdd', 'smme']
$ datafs listdir s3://ACP/climate/
gcm-modelweights.csv
hddcdd
smme

Bulk archive retrieval with batch_get_archive

Batch version of get_archive(). Accepts an iterable of archive names to retrieve and optional default versions.

Returns a dict of DataArchive objects, indexed by archive name. If an archive is not found, it is omitted (batch_get_archive does not raise a KeyError on invalid archive names).

Example
>>> api.batch_get_archive(api.search())
{'arch1': <DataArchive s3://arch1>, 'arch2': <DataArchive s3://arch2>, ...}

batch_get_archive has no equivalent on the Command Line Interface.

See the issue tracker on GitHub for a complete list.

v0.6.9 (February 21, 2017)

New features

  • archive pattern constraints (GH #168)
  • set tags from command line
  • add tagging and searching documentation

Archive pattern constraints

List of regex patterns that must match archive_name before archive creation is allowed

Create an archive pattern using manager.set_required_archive_patterns, e.g. to require only \w, ., or / characters:

>>> api = get_api()
>>> api.manager.set_required_archive_patterns([r'^[\w\/\.]+$'])

Now, archive names that do not match this pattern will be rejected on creation, as in the sketch below.
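
A minimal sketch of the resulting behavior (the exception type and message shown here are assumptions, not taken from these notes):

>>> api.create('my_archive!')
Traceback (most recent call last):
...
ValueError: archive name does not match the required pattern: 'my_archive!'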

Tagging from CLI

Added three new commands which reflect their DataArchive counterparts:

datafs add_tags
datafs get_tags
datafs delete_tags
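
Typical usage might look like the following sketch (the argument order is an assumption, not taken from these notes):

$ datafs add_tags my_archive tag1 tag2
$ datafs get_tags my_archive
tag1 tag2
$ datafs delete_tags my_archive tag1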

Additionally, a --tag option was added to datafs create so that tags could be supplied on archive creation:

datafs create my_archive --description "my description" --tag tag1 \
    --tag tag2 --source "where it's from" --tag tag3

Backwards incompatible API changes

  • stop coercing underscores to slashes in archive_path
  • drop archive_path argument from DataAPI.create

See the issue tracker on GitHub for a complete list.

v0.6.8 (February 7, 2017)

This is a patch release primarily improving the documentation and testing of DataFS. There are no backwards incompatible changes in v0.6.8.

New features

  • Add command line docstrings (GH #115)
  • Add tests for Python API documentation snippets (GH #108)
  • Integrate clatter, which checks that the CLI documentation is accurate

Bug Fixes

  • More robust *args, **kwargs handling in CLI, with better error messages
  • Fix click round-trip compatibility issue: print results on the CLI using \n instead of \r\n on Windows
  • Raise an error when loading a non-existent profile from config (GH #135)

See the issue tracker on GitHub for a complete list.

v0.6.7 (February 1, 2017)

New features

  • Allow tag specification on create

Performance Improvements

  • Restructure conftest.py so that api_with_diverse_archives is session-scoped

Under the hood

  • Consolidate manager._create_archive_table and _create_spec_table into one function
  • Move archive document creation to separate method in manager (allows batch write in tests)
  • Add tests for search and filter queries on very large manager tables

See the issue tracker on GitHub for a complete list.

v0.6.6 (January 20, 2017)

New features

  • Introduces search features in the command line:

    datafs search

    and the Python API:

    api.search(query)

See the issue tracker on GitHub for a complete list.

v0.6.5 (January 13, 2017)

New features

  • regex and unix-style filter searches
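
For example, unix-style filtering uses the pattern argument, as in the v0.7.1 notes above; a regex search might look like the second line of this sketch (the engine argument is an assumption, not taken from these notes):

>>> api.search(pattern='*my/samp*')
>>> api.search(pattern=r'^my/samp.*$', engine='regex')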

Backwards incompatible API changes

  • Prevent update/write of empty files

See the issue tracker on GitHub for a complete list.

v0.6.4 (January 12, 2017)

Under the hood

  • Test/fix handling of multiple read/write of large netCDF datasets

See the issue tracker on GitHub for a complete list.

v0.6.3 (January 11, 2017)

New features

  • dependency handling

Backwards incompatible API changes

  • raise error when passing non-None versions to unversioned archive methods
  • change API method name: create_archive -> create()
  • change CLI subcommand name: upload -> update

Under the hood

  • improve test coverage

Bug Fixes

  • prevent users from deleting required metadata elements

See the issue tracker on GitHub for a complete list.

v0.6.2 (January 9, 2017)

New features

  • New template in docs for AWS configuration (GH #73)

Backwards incompatible API changes

  • Drop DataArchive properties that access manager (GH #72)
  • Manager archive listing attribute versions changed to version_history

Manager-calling properties have been converted to methods.

See the issue tracker on GitHub for a complete list.

v0.6.1 (January 6, 2017)

See the issue tracker on GitHub for a complete list.

v0.6.0 (January 4, 2017)

New features

  • Explicit versioning & version pinning (GH #62)
  • Explicit dependency tracking (GH #63)
  • Update metadata from the command line
  • Support for version tracking & management with requirements files (GH #70)
  • Configure archive specification on manager table

Set dependencies from the Python API on write

DataArchive.update
def update(
    self,
    filepath,
    cache=False,
    remove=False,
    bumpversion='patch',
    prerelease=None,
    dependencies=None,
    **kwargs):

    ...

    self._update_manager(
        checksum,
        kwargs,
        version=next_version,
        dependencies=dependencies)
DataArchive.open
def open(
    self,
    mode='r',
    version=None,
    bumpversion='patch',
    prerelease=None,
    dependencies=None,
    *args,
    **kwargs):
    ...

    updater = lambda *args, **kwargs: self._update_manager(
            *args,
            version=next_version,
            dependencies=dependencies,
            **kwargs)
    ...
DataArchive.get_local_path

similar to DataArchive.open

DataArchive._update_manager
def _update_manager(
        self,
        checksum,
        metadata={},
        version=None,
        dependencies=None):

    # by default, use the dependencies of the most recent version
    if dependencies is None:
        history = self.history
        if len(history) == 0:
            dependencies = []
        else:
            dependencies = history[-1]['dependencies']

    ...
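
For example, dependencies might be pinned explicitly when writing a new version (a sketch; the archive name, file path, and the mapping form of dependencies are illustrative assumptions):

>>> archive = api.get_archive('my_archive')
>>> archive.update('my_file.txt', dependencies={'dep1': '0.1.1', 'dep2': None})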

Under the hood

  • Table schemas have been moved from the dynamo and mongo modules to the BaseDataManager.
  • versions attr is now version_history in the table schema, and the DataArchive method get_versions() is now get_version_history()

See the issue tracker on GitHub for a complete list.

v0.5.0 (December 21, 2016)

New features

  • command line download feature
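
Usage might look like the following sketch (the argument order and names are assumptions, not taken from these notes):

$ datafs download my_archive ~/my_archive_local_copy.txt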

See the issue tracker on GitHub for a complete list.

v0.4.0 (December 15, 2016)

New features

  • create API object from config file
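
For example (mirroring the get_api() call shown in the v0.6.9 notes above; the import location is an assumption):

>>> from datafs import get_api
>>> api = get_api()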

See the issue tracker on GitHub for a complete list.

v0.3.0 (December 14, 2016)

New features

  • cached read/write

See the issue tracker on GitHub for a complete list.

v0.2.0 (December 12, 2016)

See the issue tracker on GitHub for a complete list.

v0.1.0 (November 18, 2016)

See the issue tracker on GitHub for a complete list.