from fairdatanow import RemoteData
import os
Exploring your remote data in a breeze
To use fairdatanow, you need a Nextcloud server URL, a username, and a password that give access to a folder on a Nextcloud server. The recommended way to handle these credentials in Jupyter notebooks is to store the username and password as environment variables on your system and retrieve them with the os.getenv() function.
This way you avoid typing them directly into the notebook, which is not safe if you need to share the notebook with others.
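As a minimal sketch of this pattern, the snippet below reads both credentials from the environment and fails early with a clear message when one is missing, since os.getenv() returns None for an unset variable. The helpers require_env and load_credentials are hypothetical and not part of fairdatanow; the variable names NC_AUTH_USER and NC_AUTH_PASS are the ones used in the configuration further down this page.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)  # os.getenv returns None when the variable is not set
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


def load_credentials(user_var="NC_AUTH_USER", password_var="NC_AUTH_PASS"):
    """Collect username and password from the environment (hypothetical helper)."""
    return {"user": require_env(user_var), "password": require_env(password_var)}
```

Calling load_credentials() before building the configuration dictionary surfaces a missing variable immediately, rather than as an authentication failure later on.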
To get started, import the RemoteData class from the package and instantiate it with the Nextcloud configuration dictionary. Depending on the number of files in the cloud storage, it might take some time to build the interactive file table.
If you already know which files you are looking for, you can provide an optional regular expression search string as the search_regex= argument. You can adapt this search string in the interactive table later to filter differently.
configuration = {
    'url': "https://laboppad.nl/falnama-project",
    'user': os.getenv('NC_AUTH_USER'),
    'password': os.getenv('NC_AUTH_PASS')
}
remote_data = RemoteData(configuration)
We can now look at the contents of the cloud folder using the RemoteData.listdir() method. This creates an interactive table with all project files. If we already have a better idea of what we are looking for, we can shorten the table by providing a regular expression search string as the optional search_regex= argument. As an example, let's walk through the process of locating a set of x-ray tif files that the Rijksmuseum created for us.
remote_data.listdir(search_regex='xray')
Please wait while scanning all file paths in remote folder...
Ready building file table for 'falnama-project'
Total number of files and directories: 6342
Total size of the files: 194.8 GiB
If we scroll through this first selection of 209 entries, we see that the interactive table contains all kinds of files related to the x-ray images. Using the Custom Search Builder and/or adjusting the regular expression in the search bar, we can interactively narrow the filter down to show only the specific files we currently need. You can try this yourself.
It turns out that with search_regex='edited.tif' we obtain all 28 tif files that we need for further processing. They contain the top and bottom halves of 14 pages that were imaged using x-rays.
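One detail worth knowing about a pattern such as 'edited.tif': in a regular expression an unescaped dot matches any single character. The sketch below illustrates this with Python's re module on a few made-up paths (they are hypothetical and not taken from the project folder). It assumes the table applies the pattern as an unanchored regular expression search; the table's own regex engine may differ from Python's in other respects.

```python
import re

# Hypothetical remote paths, made up for illustration only.
paths = [
    "xray/page01_top_edited.tif",
    "xray/page01_top_edited_tif_notes.txt",
    "xray/page01_top_raw.tif",
]

loose = re.compile(r"edited.tif")     # '.' matches any single character
strict = re.compile(r"edited\.tif$")  # literal dot, anchored at the end of the path

loose_hits = [p for p in paths if loose.search(p)]
strict_hits = [p for p in paths if strict.search(p)]

print(loose_hits)   # the first two paths
print(strict_hits)  # only the first path
```

If the loose pattern ever returns more rows than expected, escaping the dot and anchoring the end of the pattern is a simple way to tighten the filter.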
We can now select rows in this interactive table by Shift-clicking and Ctrl-clicking. Selected rows are colored blue. The selected files can then be downloaded into a local cache folder on your machine with the .download_selected() method. Downloading is skipped for selected files that are already present locally. The local file paths in the cache folder are returned in the files list for further processing.
files = remote_data.download_selected()
Ready with downloading 28 selected remote files to local cache: /home/frank/.cache/fairdatanow /edited pictures/71803-8_bottom_Falnama_grenz_2-2_edited.tif
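Before processing the returned files list, it can help to group the paths per page. The sketch below assumes a naming scheme of the form <page id>_<top|bottom>_<rest>.tif, inferred from the single file name shown in the output above; the function pair_halves is a hypothetical helper, not part of fairdatanow, and the scheme may not hold for every file.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: <page id>_<top|bottom>_<rest>.tif
HALF_PATTERN = re.compile(r"^(?P<page>.+?)_(?P<half>top|bottom)_")


def pair_halves(files):
    """Group file paths by page id, mapping each page to its 'top' and 'bottom' file."""
    pages = defaultdict(dict)
    for f in files:
        match = HALF_PATTERN.match(Path(f).name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        pages[match["page"]][match["half"]] = Path(f)
    return dict(pages)
```

With pairs = pair_halves(files), any page whose dictionary lacks either a 'top' or a 'bottom' entry can be spotted before further processing.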
Ok, we can now start working with this data to see if we can stitch these halves together. This is the topic of the next section.
Using the custom search builder
In some cases we might need a more powerful filter to select precisely the files that we need. Here is an example of such a predefined search query. See the DataTables SearchBuilder documentation for details on the syntax.
searchBuilder_rma_zips = {
    "preDefined": {
        "criteria": [
            {"data": "path", "condition": "contains", "value": ["RMA"]},
            {"data": "ext", "condition": "=", "value": [".zip"]}
        ]
    }
}
remote_data.listdir(search_regex='xray', searchBuilder=searchBuilder_rma_zips)
Please wait while scanning all file paths in remote folder...
Ready building file table for 'falnama-project'
Total number of files and directories: 6342
Total size of the files: 194.8 GiB
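Because a predefined query like the one above is a plain nested dictionary, it can also be assembled programmatically. The helpers criterion and make_search_builder below are hypothetical and not part of fairdatanow. The optional logic key ("AND" or "OR") is a DataTables SearchBuilder option; whether fairdatanow passes it through unchanged is an assumption, so it is only added when explicitly requested.

```python
def criterion(column, condition, *values):
    """Build one SearchBuilder criterion for a table column."""
    return {"data": column, "condition": condition, "value": list(values)}


def make_search_builder(*criteria, logic=None):
    """Wrap criteria in the nested 'preDefined' structure used above."""
    pre_defined = {"criteria": list(criteria)}
    if logic is not None:
        pre_defined["logic"] = logic  # assumption: forwarded as-is to SearchBuilder
    return {"preDefined": pre_defined}


# Reproduces the RMA zip query from above.
rma_zips = make_search_builder(
    criterion("path", "contains", "RMA"),
    criterion("ext", "=", ".zip"),
)
```

The resulting rma_zips dictionary has the same content as the hand-written query and could be passed as the searchBuilder= argument in the same way.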
FUNCTIONS
RemoteData
RemoteData (configuration)
Recursively scan the contents of a remote webdav server as specified by configuration.