Finding Archives

In this section we’ll take a look at finding archives via the command line.

You can find archives from the command line interface or from Python. This documentation mirrors the Python documentation.

Using listdir

In our database we have many archives. We know that impactlab is a top-level directory-like namespace in our database. Let’s have a look.

$ datafs listdir impactlab

OK. We see that labor, climate, mortality, and conflict are all directory-like namespace groupings below impactlab. Let’s have a look at conflict.

$ datafs listdir impactlab/conflict

Let’s see what is in impactlab/conflict/global.

$ datafs listdir impactlab/conflict/global
$ datafs listdir impactlab/conflict/global/conflict_global_daily.csv

We can see that there is currently only one version, 0.0.1, of conflict_global_daily.csv.
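To make the idea of directory-like namespaces concrete, here is a minimal Python sketch of how a listing could be derived from flat archive names. It is an illustration only, not the DataFS implementation: the function list_namespace and every archive name except conflict_global_daily.csv are hypothetical, and version listing is not modelled.

```python
def list_namespace(archive_names, location):
    """Return the sorted, unique next-level entries below `location`."""
    prefix = location.strip("/") + "/"
    children = set()
    for name in archive_names:
        if name.startswith(prefix):
            # Keep only the first path component after the prefix.
            children.add(name[len(prefix):].split("/", 1)[0])
    return sorted(children)


# Hypothetical archive names; only the conflict entry appears in the text.
archives = [
    "impactlab/labor/example_labor.csv",
    "impactlab/climate/example_climate.nc",
    "impactlab/mortality/example_mortality.csv",
    "impactlab/conflict/global/conflict_global_daily.csv",
]

top_level = list_namespace(archives, "impactlab")
conflict = list_namespace(archives, "impactlab/conflict")
leaf = list_namespace(archives, "impactlab/conflict/global")
```

Each call returns one level of the hierarchy, which is why repeated listdir calls walk down from impactlab to the archive itself.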

Using filter

DataFS lets you filter archives by name to narrow the search space. At the command line, you can filter with the prefix option or with a pattern evaluated by the path, str, or regex engine. Let’s start with the prefix option, which matches the beginning of a set of archive names. Here we use the prefix project1_variable1_.

$ datafs filter --prefix project1_variable1_
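As a rough illustration of what prefix filtering means, the sketch below keeps only names that begin with the given string. The archive names and the helper filter_by_prefix are hypothetical and stand in for whatever the database actually holds.

```python
# Hypothetical archive names following a project/variable/scenario pattern.
archive_names = [
    "project1_variable1_scenario1.nc",
    "project1_variable1_scenario2.nc",
    "project1_variable2_scenario1.nc",
    "project2_variable1_scenario1.nc",
]


def filter_by_prefix(names, prefix):
    """Keep the names that start with `prefix`, preserving order."""
    return [name for name in names if name.startswith(prefix)]


matches = filter_by_prefix(archive_names, "project1_variable1_")
```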

We can also filter on path. In this case we want to find all NetCDF files that match a specific pattern, so we set the engine to path and supply the search pattern.

$ datafs filter --pattern '*' --engine path
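The sketch below shows one way path-style matching could behave, assuming the path engine follows shell-style wildcard rules like those in Python's fnmatch module. That assumption, the archive names, and the pattern *_scenario1.nc are all illustrative rather than taken from the DataFS source.

```python
import fnmatch

# Hypothetical archive names; the final entry is not a NetCDF file.
archive_names = [
    "project1_variable1_scenario1.nc",
    "project1_variable2_scenario1.nc",
    "project1_variable2_scenario2.nc",
    "project1_variable1_scenario1.csv",
]


def filter_by_path(names, pattern):
    """Keep the names matching a shell-style wildcard pattern."""
    # fnmatchcase compares case-sensitively on every platform.
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


netcdf_scenario1 = filter_by_path(archive_names, "*_scenario1.nc")
```

Unlike a prefix, a wildcard pattern can pin down the end or the middle of a name, which is what makes it useful for selecting by file extension.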

We can also find archives whose names contain a specific string by setting the engine to str. In this example we want all archives whose names contain the string variable2.

$ datafs filter --pattern variable2 --engine str
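The str and regex engines can be pictured with the following sketch, which assumes that str means a plain substring test and regex means a regular-expression search anywhere in the name. The function filter_archives, the archive names, and the regular expression are hypothetical.

```python
import re

# Hypothetical archive names used only for illustration.
archive_names = [
    "project1_variable1_scenario1.nc",
    "project1_variable2_scenario1.nc",
    "project2_variable2_scenario2.nc",
    "project2_variable3_scenario1.nc",
]


def filter_archives(names, pattern, engine="str"):
    """Filter names by substring ("str") or regular expression ("regex")."""
    if engine == "str":
        return [name for name in names if pattern in name]
    if engine == "regex":
        compiled = re.compile(pattern)
        # search() matches anywhere in the name, not only at the start.
        return [name for name in names if compiled.search(name)]
    raise ValueError("unsupported engine: {!r}".format(engine))


with_variable2 = filter_archives(archive_names, "variable2", engine="str")
variable1_or_3 = filter_archives(archive_names, r"variable[13]_", engine="regex")
```

The substring test is the simplest option when the exact text is known, while a regular expression can express alternatives such as "variable1 or variable3" in a single pattern.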