Using DataFS Locally¶
This tutorial builds a DataFS server using MongoDB and the local filesystem.
Running this example¶
To run this example:
- Set up a MongoDB server by following the installation and startup instructions in MongoDB’s tutorial.
- Start the MongoDB server (e.g. mongod --dbpath . --nojournal)
- Follow the steps below
Set up the workspace¶
We need a few things for this example:
>>> from datafs.managers.manager_mongo import MongoDBManager
>>> from datafs import DataAPI
>>> from fs.osfs import OSFS
>>> import os
>>> import tempfile
>>> import shutil
>>>
>>> # overload unicode for python 3 compatibility:
>>>
>>> try:
...     unicode = unicode
... except NameError:
...     unicode = str
Additionally, you’ll need MongoDB and pymongo installed and a MongoDB instance running.
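Before continuing, it can help to confirm these prerequisites. The following is a minimal sketch using only the standard library; the helper names missing_modules and port_is_open are hypothetical, and it assumes MongoDB is listening on localhost at its default port, 27017, which is what the mongod command above uses when no port is given.

```python
import importlib.util
import socket


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def port_is_open(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # Raises OSError (or a subclass) if nothing accepts the connection
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False
    conn.close()
    return True


# Modules imported by this tutorial, plus the pymongo driver
missing = missing_modules(["datafs", "fs", "pymongo"])
print("Missing modules:", missing or "none")
print("MongoDB reachable:", port_is_open())
```

An open port only shows that something is listening there; it does not prove that the listener is MongoDB.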
Create an API¶
Begin by creating an API instance:
>>> api = DataAPI(
...     username='My Name',
...     contact='my.email@example.com')
Attach Manager¶
Next, we’ll choose an archive manager. DataFS currently supports MongoDB and DynamoDB managers. In this example we’ll use a MongoDB manager. Make sure you have a MongoDB server running, then create a MongoDBManager instance:
>>> manager = MongoDBManager(
...     database_name='MyDatabase',
...     table_name='DataFiles')
If this is the first time you’ve set up this database, you’ll need to create a table:
>>> manager.create_archive_table('DataFiles', raise_on_err=False)
All set. Now we can attach the manager to our DataAPI object:
>>> api.attach_manager(manager)
Attach Service¶
Now we need a storage service. DataFS is designed to be used with remote storage (S3, FTP, etc.), but it can also run on your local filesystem. In this tutorial we’ll use a local service.
First, let’s create a temporary filesystem to use for this example:
>>> temp = tempfile.mkdtemp()
>>> local = OSFS(temp)
We attach this filesystem to the API and give it a name:
>>> api.attach_authority('local', local)
>>> api.default_authority
<DataService:OSFS object at ...>
Create archives¶
Next we’ll create our first archive. An archive must have an archive_name. In addition, you can supply any additional keyword arguments, which will be stored as metadata. To suppress errors on re-creation, use the raise_on_err=False flag.
>>> api.create(
...     'my_first_archive',
...     metadata=dict(description='My test data archive'))
<DataArchive local://my_first_archive>
Retrieve archive metadata¶
Now that we have created an archive, we can retrieve it from anywhere as long as we have access to the correct service. When we retrieve the archive, we can see the metadata that was created when it was initialized.
>>> var = api.get_archive('my_first_archive')
We can access the metadata for this archive through the archive’s get_metadata() method:
>>> print(var.get_metadata()['description'])
My test data archive
Add a file to the archive¶
An archive is simply a versioned history of data files. So let’s get started adding data!
First, we’ll create a local file, test.txt, and put some data in it:
>>> with open('test.txt', 'w+') as f:
...     f.write('this is a test')
Now we can add this file to the archive:
>>> var.update('test.txt', remove=True)
The file has now been sent to our archive. Because we passed remove=True, the local copy was deleted:
>>> os.path.isfile('test.txt')
False
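The effect of remove=True can be pictured as "copy the file into storage, then delete the original". The sketch below illustrates that idea with the standard library only. It is not DataFS code: store_file is a hypothetical helper, a temporary directory stands in for the storage service, and DataFS's internals may differ.

```python
import os
import shutil
import tempfile


def store_file(src_path, store_dir, remove=False):
    """Copy src_path into store_dir and optionally delete the source."""
    dest = os.path.join(store_dir, os.path.basename(src_path))
    shutil.copyfile(src_path, dest)
    if remove:
        # Only delete the source once the copy has succeeded
        os.remove(src_path)
    return dest


work_dir = tempfile.mkdtemp()   # where the local file lives
store_dir = tempfile.mkdtemp()  # stand-in for the storage service

src = os.path.join(work_dir, "test.txt")
with open(src, "w") as f:
    f.write("this is a test")

dest = store_file(src, store_dir, remove=True)
source_still_exists = os.path.isfile(src)
with open(dest) as f:
    stored_contents = f.read()

# Remove both temporary directories
shutil.rmtree(work_dir)
shutil.rmtree(store_dir)
```

Deleting the source only after the copy completes means a failed copy never loses the data.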
Reading from the archive¶
Next we’ll read from the archive. The file object returned by var.open() can be read just like a regular file:
>>> with var.open('r') as f:
...     print(f.read())
...
this is a test
Updating the archive¶
Open the archive and write to the file:
>>> with var.open('w+') as f:
...     res = f.write(unicode('this is the next test'))
...
Retrieving the latest version¶
Now let’s make sure we’re getting the latest version:
>>> with var.open('r') as f:
...     print(f.read())
...
this is the next test
Looks good!
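The write-then-read cycle above follows from the idea that an archive is a versioned history of data files. As a rough mental model only, here is a toy in-memory version of that idea. ToyArchive and its methods are hypothetical names invented for this illustration; they are not part of DataFS and say nothing about how DataFS stores versions.

```python
class ToyArchive:
    """Illustrative in-memory stand-in for a versioned archive."""

    def __init__(self, name, metadata=None):
        self.name = name
        self.metadata = dict(metadata or {})
        self._versions = []  # oldest first

    def update(self, contents):
        """Record a new version; earlier versions are kept."""
        self._versions.append(contents)

    def latest(self):
        """Return the most recently written contents."""
        if not self._versions:
            raise ValueError("archive %r has no versions yet" % self.name)
        return self._versions[-1]

    def history(self):
        """Return a copy of all versions, oldest first."""
        return list(self._versions)


archive = ToyArchive("my_first_archive",
                     {"description": "My test data archive"})
archive.update("this is a test")
archive.update("this is the next test")

print(archive.latest())        # this is the next test
print(len(archive.history()))  # 2
```

Reading always returns the newest entry, while earlier entries stay available in the history.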
Cleaning up¶
>>> var.delete()
>>> api.manager.delete_table('DataFiles')
>>> shutil.rmtree(temp)
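If a step fails partway through, the temporary directory created earlier would be left behind. One way to guard against that, sketched here with the standard library only, is to wrap the work in try/finally. The file written inside the block is a placeholder for the tutorial's real steps.

```python
import os
import shutil
import tempfile

temp = tempfile.mkdtemp()
try:
    # Placeholder for the steps that write into the directory
    path = os.path.join(temp, "example.txt")
    with open(path, "w") as f:
        f.write("this is a test")
    created = os.path.isfile(path)
finally:
    # Runs whether or not the steps above raised an exception
    shutil.rmtree(temp)

removed = not os.path.exists(temp)
```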
Next steps¶
The next tutorial describes setting up DataFS for remote object stores, such as AWS storage.