Finding Archives
This section shows how to find archives from the command line.
Archives can be found from either the command line interface or Python; this page mirrors the Python documentation.
Using listdir
Our database contains many archives. We know that impactlab is a top-level, directory-like namespace in the database. Let’s have a look.
$ datafs listdir impactlab
labor
climate
conflict
mortality
We see that labor, climate, mortality, and conflict are all directory-like namespace groupings below impactlab. Let’s have a look at conflict.
$ datafs listdir impactlab/conflict
global
Let’s see what is in impactlab/conflict/global.
$ datafs listdir impactlab/conflict/global
conflict_global_daily.csv
$ datafs listdir impactlab/conflict/global/conflict_global_daily.csv
0.0.1
We can see that there is currently only one version, 0.0.1, of conflict_global_daily.csv.
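To make the idea of directory-like namespaces concrete, here is a minimal Python sketch of how such a listing could be derived from flat, slash-separated archive names. It is an illustration only, not DataFS's implementation: the listdir function below is a stand-in written for this example, the labor and climate archive names are hypothetical, and the version level shown by the last command above is not modeled.

```python
def listdir(archive_names, location):
    """Return the sorted, de-duplicated entries directly under `location`."""
    prefix = location.strip("/") + "/"
    children = set()
    for name in archive_names:
        if name.startswith(prefix):
            # Keep only the first path component after the prefix.
            children.add(name[len(prefix):].split("/", 1)[0])
    return sorted(children)


# The conflict archive comes from the listing above; the other two names
# are hypothetical and exist only to populate the example.
names = [
    "impactlab/conflict/global/conflict_global_daily.csv",
    "impactlab/labor/global/labor_global_daily.csv",
    "impactlab/climate/global/climate_global_daily.csv",
]

print(listdir(names, "impactlab"))
print(listdir(names, "impactlab/conflict"))
print(listdir(names, "impactlab/conflict/global"))
```

Each call returns only the next level of the hierarchy, which is why walking down to an archive takes several listdir calls.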
Using filter
DataFS lets you filter archives by name to narrow the search space. At the command line, you can filter with the prefix option, or with a pattern interpreted by the path, str, or regex engine.
Let’s start with the prefix option, which matches the beginning of a set of archive names. Here we use the prefix project1_variable1_.
$ datafs filter --prefix project1_variable1_
project1_variable1_scenario5.nc
project1_variable1_scenario1.nc
project1_variable1_scenario4.nc
project1_variable1_scenario2.nc
project1_variable1_scenario3.nc
We can also filter on path. In this case we want all NetCDF files whose names match a specific pattern, so we set engine to path and supply our search pattern.
$ datafs filter --pattern '*_variable4_scenario4.nc' --engine path
project1_variable4_scenario4.nc
project2_variable4_scenario4.nc
project3_variable4_scenario4.nc
project5_variable4_scenario4.nc
project4_variable4_scenario4.nc
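The wildcard pattern above behaves like shell-style globbing. As a rough illustration, and assuming that the path engine follows the usual glob rules (the source does not say how it is implemented), the same selection can be reproduced with Python's standard fnmatch module. The archive names are taken from listings on this page.

```python
from fnmatch import fnmatchcase

# Archive names that appear elsewhere on this page.
names = [
    "project1_variable4_scenario4.nc",
    "project2_variable4_scenario4.nc",
    "project1_variable1_scenario4.nc",
    "project5_variable4_scenario3.nc",
]

pattern = "*_variable4_scenario4.nc"

# fnmatchcase performs case-sensitive glob matching; "*" matches any run
# of characters, so only the project part of the name is free to vary.
matches = [name for name in names if fnmatchcase(name, pattern)]
print(matches)
```

Only names that end in _variable4_scenario4.nc are kept; the variable and scenario must both match.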
We can also find archives whose names contain a specific string by setting engine to str. In this example we want all archives containing the string variable2.
$ datafs filter --pattern variable2 --engine str
project1_variable2_scenario1.nc
project1_variable2_scenario2.nc
project1_variable2_scenario3.nc
...
project5_variable2_scenario3.nc
project5_variable2_scenario4.nc
project5_variable2_scenario5.nc
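The regex engine is mentioned above but not demonstrated. The following Python sketch contrasts substring matching with regular-expression matching on a few archive names from this page. It is a stand-alone illustration: filter_names is a hypothetical helper, and treating the regular expression as a search anywhere in the name (rather than a match anchored at the start) is an assumption, not documented DataFS behavior.

```python
import re


def filter_names(names, pattern, engine="str"):
    """Filter archive names by substring ("str") or regular expression ("regex")."""
    if engine == "str":
        return [name for name in names if pattern in name]
    if engine == "regex":
        compiled = re.compile(pattern)
        # Assumption: the pattern may match anywhere in the name.
        return [name for name in names if compiled.search(name)]
    raise ValueError("unsupported engine: {!r}".format(engine))


names = [
    "project1_variable2_scenario1.nc",
    "project5_variable2_scenario5.nc",
    "project1_variable1_scenario4.nc",
    "project5_variable4_scenario3.nc",
]

# Substring match: every name containing "variable2".
print(filter_names(names, "variable2", engine="str"))

# Regex match: project5 archives for scenario 3 or scenario 5.
print(filter_names(names, r"^project5_.*_scenario[35]\.nc$", engine="regex"))
```

A regular expression can express conditions, such as a choice between scenarios, that neither a plain substring nor a prefix can.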
Using search
DataFS search capabilities are enabled by tagging archives. The arguments of the search command are tags associated with a given archive. Archives that have not been tagged cannot be found by search. See the tagging reference for how to tag archives.
Our archives have been tagged with team1, team2, or team3. Let’s search for some archives with the tag team3.
$ datafs search team3
project2_variable2_scenario2.nc
project5_variable4_scenario1.nc
project1_variable5_scenario4.nc
project3_variable2_scenario1.nc
project2_variable1_scenario1.nc
...
project5_variable1_scenario2.nc
project2_variable5_scenario5.nc
project5_variable2_scenario5.nc
project3_variable2_scenario5.nc
Let’s use get_tags to have a look at one of our archives’ tags.
$ datafs get_tags project2_variable2_scenario2.nc
team3
We can see that it has indeed been tagged with team3.
For completeness, let’s have a look at archives with the tag team1.
$ datafs search team1
project1_variable1_scenario4.nc
project1_variable2_scenario2.nc
project1_variable2_scenario5.nc
project1_variable3_scenario3.nc
project1_variable4_scenario1.nc
project1_variable4_scenario4.nc
...
project5_variable3_scenario2.nc
project5_variable3_scenario5.nc
project5_variable4_scenario3.nc
project5_variable5_scenario1.nc
project5_variable5_scenario4.nc
And now let’s have a look at one of them to see what tags are associated with it.
$ datafs get_tags project2_variable5_scenario1.nc
team1
We can see that this archive has been tagged with team1.
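The relationship between search and get_tags can be pictured as a lookup over a mapping from archive names to tags. The sketch below is a simplified, in-memory model for illustration only; the search and get_tags functions and the tag_index dictionary are hypothetical stand-ins, not the DataFS API, and untagged_example.nc is an invented name used to show that untagged archives are never returned.

```python
def search(tag_index, tag):
    """Return the sorted names of all archives carrying `tag`."""
    return sorted(name for name, tags in tag_index.items() if tag in tags)


def get_tags(tag_index, name):
    """Return the sorted tags recorded for one archive (empty if untagged)."""
    return sorted(tag_index.get(name, []))


# Two tagged archives from the examples above, plus one hypothetical
# untagged archive.
tag_index = {
    "project2_variable2_scenario2.nc": ["team3"],
    "project2_variable5_scenario1.nc": ["team1"],
    "untagged_example.nc": [],
}

print(search(tag_index, "team3"))
print(get_tags(tag_index, "project2_variable5_scenario1.nc"))
print(search(tag_index, "team2"))
```

Because the lookup runs over tags rather than names, an archive with an empty tag list cannot appear in any search result, whatever its name.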
We want your feedback. If you find bugs or have suggestions to improve this documentation, please consider contributing.