Configuring DataFS for your Team¶
This tutorial walks through the process of creating the specification files and setting up resources for use on a large team. It assumes a basic level of familiarity with the purpose of DataFS, and also requires administrative access to any resources you’d like to use, such as AWS.
Set up a connection to AWS¶
To use AWS resources, you’ll need credentials. These are most easily specified in a credentials file.
We’ve provided a sample file here:
[aws-test]
aws_access_key_id=MY_AWS_ACCESS_KEY_ID
aws_secret_access_key=MY_AWS_SECRET_ACCESS_KEY
This file is located at ~/.aws/credentials by default, but for the purpose of this example we’ll tell AWS how to find it using an environment variable:
>>> import os
>>>
>>> # Change this to wherever your credentials file is:
... credentials_file_path = os.path.join(
...     os.path.dirname(__file__),
...     'credentials')
...
>>> os.environ['AWS_SHARED_CREDENTIALS_FILE'] = credentials_file_path
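If you’d like to verify that the credentials file is being picked up, here is a quick optional check. This is a sketch that assumes boto3 is installed; the access key printed is the placeholder value from the sample file above:

>>> import boto3
>>>
>>> # boto3 honors AWS_SHARED_CREDENTIALS_FILE, so the "aws-test"
>>> # profile from our sample file should now be visible:
>>> session = boto3.Session(profile_name='aws-test')
>>> print(session.get_credentials().access_key)
MY_AWS_ACCESS_KEY_ID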
Configure DataFS for your organization/use¶
Now that you have a connection to AWS, you can specify how you want DataFS to work. DataFS borrows the idea of profiles, allowing you to have multiple pre-configured file managers at once.
We’ll set up a test profile called “example” here:
# Specify a default profile
default-profile: example

# Configure your profiles here
profiles:

    # Everything under this key specifies the example profile
    example:

        api:

            # Enter user data for each user
            user_config:
                contact: me@email.com
                username: My Name

        # Add multiple data filesystems to use as
        # the authoritative source for an archive
        authorities:

            # The authority "local" is an OSFS
            # (local) filesystem, and has the relative
            # path "example_data_dir" as its root.
            local:
                service: OSFS
                args: [example_data_dir]

            # The authority "remote" is an AWS S3FS
            # filesystem, and uses the "aws-test"
            # profile in the aws config file to
            # connect to resources on Amazon's us-east-1
            remote:
                service: S3FS
                args: ['test-bucket']
                kwargs:
                    region_name: us-east-1
                    profile_name: 'aws-test'

        # Add one manager per profile
        manager:

            # This manager accesses the table
            # 'my-test-data' in a local instance
            # of AWS's DynamoDB. To use a live
            # DynamoDB, remove the endpoint_url
            # specification.
            class: DynamoDBManager
            kwargs:
                resource_args:
                    endpoint_url: 'http://localhost:8000/'
                    region_name: 'us-east-1'
                session_args:
                    profile_name: aws-test
                table_name: my-test-data
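Before moving on, you can optionally confirm the file parses as intended. This is just a sanity-check sketch; it assumes PyYAML is installed and that you’ve saved the configuration at the path used later in this tutorial:

>>> import yaml
>>>
>>> # Parse the config file and check the default profile:
>>> with open('examples/preconfigured/.datafs.yml') as f:
...     config = yaml.safe_load(f)
...
>>> print(config['default-profile'])
example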
Set up team managers and services¶
Make sure that the directories, buckets, etc. that your services are connecting to exist:
>>> if not os.path.isdir('example_data_dir'):
...     os.makedirs('example_data_dir')
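The same goes for the S3 bucket used by the “remote” authority. Here is a sketch using boto3; it assumes the “aws-test” profile has permission to list and create buckets, and can be skipped if the bucket already exists:

>>> import boto3
>>>
>>> # Create 'test-bucket' if it doesn't exist yet:
>>> s3 = boto3.Session(profile_name='aws-test').client('s3')
>>> existing = [b['Name'] for b in s3.list_buckets()['Buckets']]
>>> if 'test-bucket' not in existing:
...     _ = s3.create_bucket(Bucket='test-bucket')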
Now, boot up an API and create the archive table on your manager that corresponds to the one specified in your configuration file:
>>> import datafs
>>> api = datafs.get_api(
...     profile='example',
...     config_file='examples/preconfigured/.datafs.yml')
>>>
>>> api.manager.create_archive_table('my-test-data')
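If you’re running against the local DynamoDB instance from the configuration above, you can confirm the table was created. This optional check is a sketch; the endpoint, region, and profile match the config above:

>>> import boto3
>>>
>>> # List tables on the local DynamoDB endpoint from the config:
>>> dynamo = boto3.Session(profile_name='aws-test').client(
...     'dynamodb',
...     endpoint_url='http://localhost:8000/',
...     region_name='us-east-1')
>>> 'my-test-data' in dynamo.list_tables()['TableNames']
True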
Finally, we’ll set some basic reporting requirements that will be enforced when users interact with the data.
We can require user information when writing/updating an archive.
set_required_user_config() allows administrators to set user configuration requirements and provide a prompt to help users:
>>> api.manager.set_required_user_config({
...     'username': 'your full name',
...     'contact': 'your email address'})
Similarly, set_required_archive_metadata() sets the metadata that is required for each archive:
>>> api.manager.set_required_archive_metadata({
...     'description': 'a long description of the archive'})
Attempts by users to create/update archives without these attributes will now fail.
Using the API¶
At this point, any users with properly specified credentials and config files can use the data API.
From within python:
>>> import datafs
>>> api = datafs.get_api(
...     profile='example',
...     config_file='examples/preconfigured/.datafs.yml')
>>>
>>> archive = api.create(
...     'archive1',
...     authority_name='local',
...     metadata={'description': 'my new archive'})
Note that the metadata requirements you specified are enforced. If a user tries to skip the description, an error is raised and the archive is not created:
>>> archive = api.create(
...     'archive2',
...     authority_name='local')
Traceback (most recent call last):
...
AssertionError: Required value "description" not found. Use helper=True or
the --helper flag for assistance.
>>>
>>> print(next(api.filter()))
archive1
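From here, users can read and write data through the archive object. A minimal usage sketch, assuming the file-like open() interface from the DataFS quickstart:

>>> # Write a first version of the archive, then read it back:
>>> with archive.open('w+') as f:
...     res = f.write(u'first version of archive1')
...
>>> with archive.open('r') as f:
...     print(f.read())
...
first version of archive1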
Setting User Permissions¶
Users can be managed using policies on AWS’s admin console. An example policy allowing users to create, update, and find archives without allowing them to delete archives or to modify the required metadata specification is provided here:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchGetItem",
                "dynamodb:BatchWriteItem",
                "dynamodb:DescribeTable",
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:Query",
                "dynamodb:Scan",
                "dynamodb:UpdateItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:*:*:table/my-test-data"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchGetItem",
                "dynamodb:DescribeTable",
                "dynamodb:GetItem",
                "dynamodb:Query",
                "dynamodb:Scan"
            ],
            "Resource": [
                "arn:aws:dynamodb:*:*:table/my-test-data.spec"
            ]
        }
    ]
}
A user with AWS access keys using this policy will see an AccessDeniedException when attempting to take restricted actions:
>>> import datafs
>>> api = datafs.get_api(profile='user')
>>>
>>> archive = api.get_archive('archive1')
>>>
>>> archive.delete()
Traceback (most recent call last):
...
botocore.exceptions.ClientError: An error occurred (AccessDeniedException)
when calling the DeleteItem operation: ...
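Actions permitted by the policy continue to work for the same user. For example, listing archives, as we did above, still succeeds:

>>> print(next(api.filter()))
archive1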
Teardown¶
A user with full privileges can completely remove archives and manager tables:
>>> api.delete_archive('archive1')
>>> api.manager.delete_table('my-test-data')
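To finish cleaning up the local example environment, you can also remove the directory backing the “local” authority. A sketch; only do this if the directory holds nothing you need:

>>> import shutil
>>>
>>> # Remove the local authority's root directory and its contents:
>>> shutil.rmtree('example_data_dir')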