Caching Remote Results Locally
Setup
>>> from datafs.managers.manager_mongo import MongoDBManager
>>> from datafs import DataAPI
>>> from fs.tempfs import TempFS
>>> from fs.s3fs import S3FS
>>> import os
>>> import tempfile
>>> import shutil
Create an API and attach a manager:
>>> api = DataAPI(
...     username='My Name',
...     contact='my.email@example.com')
>>>
>>> manager = MongoDBManager(
...     database_name='MyDatabase',
...     table_name='DataFiles')
>>>
>>> manager.create_archive_table('DataFiles', raise_on_err=False)
>>> api.attach_manager(manager)
>>>
For this example we’ll use an AWS S3 store, though any filesystem will work:
>>> s3 = S3FS(
...     'test-bucket',
...     aws_access_key='MY_KEY',
...     aws_secret_key='MY_SECRET_KEY')
>>>
>>> api.attach_authority('aws', s3)
Create an archive:
>>> var = api.create(
...     'caching/archive.txt',
...     metadata=dict(description='My cached remote archive'),
...     authority_name='aws')
>>>
>>> with var.open('w+') as f:
... res = f.write(u'hello')
...
>>>
>>> with var.open('r') as f:
... print(f.read())
hello
Let’s peek under the hood to see where this data is stored:
>>> url = var.authority.fs.getpathurl(var.get_version_path())
>>> print(url)
https://test-bucket.s3.amazonaws.com/caching/...AWSAccessKeyId=MY_KEY
Now let’s set up a cache. This would typically be a local or networked directory, but we’ll use a temporary filesystem for this example:
>>> cache = TempFS()
>>> api.attach_cache(cache)
Now we can activate caching for our archive:
>>> var.cache()
The first read after enabling caching downloads the file into the cache, so future reads can be served locally:
>>> with var.open('r') as f:
... print(f.read())
hello
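The read-through behavior above can be sketched as follows: on a cache miss the remote file is copied into the cache, and later reads are served locally even if the remote copy disappears. Everything here (`cached_read` and the two directories) is a hypothetical stdlib illustration, not the DataFS implementation:

```python
import os
import shutil
import tempfile

remote_dir = tempfile.mkdtemp()  # stand-in for the S3 authority
cache_dir = tempfile.mkdtemp()   # stand-in for the attached cache

with open(os.path.join(remote_dir, 'archive.txt'), 'w') as f:
    f.write('hello')

def cached_read(path):
    """Read through the cache: copy from remote on a miss, reuse afterwards."""
    local = os.path.join(cache_dir, path)
    if not os.path.exists(local):
        # First access: "download" the remote file into the cache.
        shutil.copy(os.path.join(remote_dir, path), local)
    with open(local) as f:
        return f.read()

first = cached_read('archive.txt')                  # miss: copies into cache
os.remove(os.path.join(remote_dir, 'archive.txt'))  # remote copy removed
second = cached_read('archive.txt')                 # hit: served from cache
```

The second read succeeds even though the remote file is gone, which is the point of caching remote results locally.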
Cleanup
>>> var.delete()