Finding Archives

In this section we’ll take a look at finding archives via the command line.

You can find archives from the command line interface or from Python. This documentation mirrors the Python documentation.

Using listdir

Our database contains many archives. We know that impactlab is a top-level, directory-like namespace in our database, so let’s list its contents.

$ datafs listdir impactlab # doctest: +SKIP
labor
climate
conflict
mortality

OK. We see that labor, climate, conflict and mortality are all directory-like namespace groupings below impactlab. Let’s have a look at conflict.

$ datafs listdir impactlab/conflict
global

Let’s see what is in impactlab/conflict/global.

$ datafs listdir impactlab/conflict/global
conflict_global_daily.csv

Listing the archive itself shows its available versions.

$ datafs listdir impactlab/conflict/global/conflict_global_daily.csv
0.0.1

We can see that there is currently only one version, 0.0.1, of conflict_global_daily.csv.
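
The directory-like listing above can be pictured as grouping slash-separated archive names by their next path component. The following is a minimal Python sketch of that idea, not the DataFS implementation. The function name list_children and all sample archive names other than conflict_global_daily.csv are hypothetical and exist only for illustration.

```python
def list_children(archive_names, location):
    """Return the distinct path components found directly below `location`."""
    prefix = location.strip("/") + "/"
    children = []
    for name in archive_names:
        if name.startswith(prefix):
            # Keep only the first path component after the location.
            child = name[len(prefix):].split("/", 1)[0]
            if child not in children:
                children.append(child)
    return children


# Hypothetical archive names; only the first one appears in this section.
names = [
    "impactlab/conflict/global/conflict_global_daily.csv",
    "impactlab/labor/example_a.csv",
    "impactlab/climate/example_b.nc",
]

print(list_children(names, "impactlab"))
print(list_children(names, "impactlab/conflict"))
```

Each call descends one level, which mirrors how the listdir commands above were chained to walk from impactlab down to a single archive.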

Using filter

DataFS lets you filter archive names to narrow the search space. At the command line, you can filter by prefix, or by a pattern that is interpreted by the path, str, or regex engine. Let’s start with the prefix option, which matches the beginning of a set of archive names. Here we use the prefix project1_variable1_.

$ datafs filter --prefix project1_variable1_ # doctest: +SKIP
project1_variable1_scenario5.nc
project1_variable1_scenario1.nc
project1_variable1_scenario4.nc
project1_variable1_scenario2.nc
project1_variable1_scenario3.nc
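
Prefix filtering amounts to a "starts with" test on each archive name. Here is a minimal sketch in plain Python, assuming a hypothetical set of archive names that follows the projectN_variableN_scenarioN.nc scheme shown above (the number of variables is a guess). The helper filter_prefix is illustrative and is not part of DataFS.

```python
# Hypothetical archive names following the naming scheme in this section.
names = [
    "project{}_variable{}_scenario{}.nc".format(p, v, s)
    for p in range(1, 6)
    for v in range(1, 6)
    for s in range(1, 6)
]


def filter_prefix(archive_names, prefix):
    """Keep only the names that begin with `prefix`."""
    return [n for n in archive_names if n.startswith(prefix)]


matches = filter_prefix(names, "project1_variable1_")
for name in matches:
    print(name)
```

Because the sketch walks an ordered list, its results come back in scenario order. The command output above is not sorted, so do not rely on ordering from the real tool.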

We can also filter on path. In this case we want to find all NetCDF files whose names match a specific pattern. To do so, we set the engine to path and pass in our search pattern.

$ datafs filter --pattern '*_variable4_scenario4.nc' --engine path # doctest: +SKIP
project1_variable4_scenario4.nc
project2_variable4_scenario4.nc
project3_variable4_scenario4.nc
project5_variable4_scenario4.nc
project4_variable4_scenario4.nc
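
The pattern above looks like a shell-style wildcard, where * stands for any run of characters. Assuming the path engine behaves like such globbing (an assumption, not something this section confirms), the match can be sketched with Python's standard fnmatch module. The archive names and the helper filter_path are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical archive names following the naming scheme in this section.
names = [
    "project{}_variable{}_scenario{}.nc".format(p, v, s)
    for p in range(1, 6)
    for v in range(1, 6)
    for s in range(1, 6)
]


def filter_path(archive_names, pattern):
    """Keep names matching a shell-style wildcard pattern (case-sensitive)."""
    return [n for n in archive_names if fnmatchcase(n, pattern)]


matches = filter_path(names, "*_variable4_scenario4.nc")
for name in matches:
    print(name)
```

In a real shell, quote a pattern that contains * so that the shell does not expand it against local file names before the command sees it.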

We can also find archives whose names contain a specific string by setting the engine to str. In this example we want all archives whose names contain the string variable2.

$ datafs filter --pattern variable2 --engine str # doctest: +ELLIPSIS +SKIP
project1_variable2_scenario1.nc
project1_variable2_scenario2.nc
project1_variable2_scenario3.nc
...
project5_variable2_scenario3.nc
project5_variable2_scenario4.nc
project5_variable2_scenario5.nc
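
The str engine corresponds to a plain substring test. The regex engine, which is mentioned above but not demonstrated, presumably applies a regular expression instead. The sketch below shows both ideas with the Python standard library. The archive names, the helpers filter_str and filter_regex, and the regular expression are all hypothetical, and the exact matching rules DataFS uses for regex are an assumption.

```python
import re

# Hypothetical archive names following the naming scheme in this section.
names = [
    "project{}_variable{}_scenario{}.nc".format(p, v, s)
    for p in range(1, 6)
    for v in range(1, 6)
    for s in range(1, 6)
]


def filter_str(archive_names, substring):
    """Keep names that contain `substring` anywhere."""
    return [n for n in archive_names if substring in n]


def filter_regex(archive_names, pattern):
    """Keep names in which the regular expression `pattern` is found."""
    compiled = re.compile(pattern)
    return [n for n in archive_names if compiled.search(n)]


str_matches = filter_str(names, "variable2")
regex_matches = filter_regex(names, r"^project[12]_variable2_")

print(len(str_matches), str_matches[0], str_matches[-1])
print(len(regex_matches))
```

A substring test is the simplest choice when the fragment can appear anywhere in the name. A regular expression is only worth the extra complexity when position or alternatives matter, as with the project[12] group here.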