Tutorial¶
This tutorial documents how to use JDMA from the jdma
command line interface
for a typical user workflow. It details how to set up the user to use JDMA,
and examples of data migration and retrieval. All of the functions of JDMA will
be covered in this tutorial.
Setting up the user, user settings and user info¶
Initialisation¶
To use JDMA, first install the JDMA client software by following the instructions in the Installation section.
Once the JDMA client software is installed, and the virtual environment is activated, the user needs to initialise their account with the command:
jdma init <email_address> -w <default_workspace>
which returns, on success:
** SUCCESS ** - user initiliased with:
Username : <user_name>
Email : <email_address>
This will do two things:
- Create a user account on the JDMA server.
- Create a configuration file in the user’s home directory with the path
~/.jdma.json
. See the section Configuration File for full details of this file. The user may wish to change thedefault_storage
anddefault_gws
settings in the configuration file to their preferred defaults.
User info¶
The user can get the details stored on the server with the command:
jdma info
which will return, on success:
** SUCCESS ** - user info:
username: <user_name>
email : <email_address>
notify : off
User settings¶
JDMA has the ability to notify a user, via email, when their data migration or retrieval has completed. These notifications can be switched on or off, and rely on a valid email address for the user. These settings can be changed using two commands of the JDMA client.
The user can change the email address used for notifications with the command:
jdma email <new_email_address>
which returns, on success:
** SUCCESS ** - user email updated to: <new_email_address>
The user can switch notifications on or off with the command:
jdma notify
which returns, on success:
** SUCCESS ** - user notifications updated to: off
Note that this is a toggle. Issuing the jdma notify
command again will
result in:
** SUCCESS ** - user notifications updated to: *on*
Listing storage¶
The storage systems that data can be uploaded to can be listed with the command:
jdma storage
which results in:
Name Short ID
0 FTP ftp
1 Elastic Tape elastictape
2 Object Store objectstore
(Note that not all of these storage systems may be returned when the user issues the same command.)
When performing jdma put
or jdma migrate
commands, the storage system to
upload the data to can be specified with the -s <storage short id>
option.
However, if no -s
option is given then the default_storage
field in the
user’s .jdma.json
configuration file will be used. This file can be edited,
using a text editor, and the Short ID
of the storage system should be set
for the default_storage
field. See the section Configuration File
for full details of this file.
Uploading data to a storage system¶
There are two different methods of uploading data with JDMA: PUT
and
MIGRATE
. PUT
uploads the data and leaves the data where it is, whereas
MIGRATE
uploads the data and then deletes the data from its original
location.
Two different methods of specifying the data to upload are also available. The first is to specify the directory at the command. The second is to specify a list of file-names to upload in a text-file. The file-names in the text-file must contain the full path to the file and be separated by a line break (i.e. each file-name is on a separate line.)
Put¶
To begin a PUT
upload of a directory to a storage system use the command:
jdma -s <storage_short_id> -w <workspace> put <directory>
The -s <storage_short_id>
can be omitted. In this case the data will be
uploaded to the storage system indicated in the default_storage
field in
the ~/.jdma.json
configuration file. Similarly, the -w <workspace>
can
be omitted and the data will be uploaded as part of the workspace named in the
default_workspace
field in the configuration file.
An optional label for the batch can be specified with the -l <label>
option.
If no label is specified then it will default to the name of the directory to be
uploaded
On success the command will output:
** SUCCESS ** - batch (PUT) requested:
Request id : 17
Request type : PUT
Batch id : 8
Workspace : <workspace>
Label : jdma_test
Date : 2018-12-17 16:15
Ex. storage : <storage_short_id>
Stage : PUT_START
Request¶
To query which requests a user has made, use the command:
jdma request
Which returns a list of all the requests a user has made, information about the requests and the stage each request is at:
req id type batch id user workspace batch label storage date stage
17 PUT 8 nrmassey cedaproc jdma_test elastictape 2018-12-17 16:15 PUTTING
A list of stages and what they mean is available in the section Transfer states.
Group workspace managers can list requests for all users in a group workspace by using the -f filter option: -f workspace (the -w <workspace> is optional, and the default workspace will be used if this argument is not supplied):
jdma request -f workspace -w <workspace>
which returns:
req id type batch id user workspace batch label storage date stage
5 PUT 5 mpritcha cedaproc ukmo-mslp-data objectstore 2019-11-07 10:25 PUT_COMPLETED
7 GET 5 nrmassey cedaproc ukmo-mslp-data objectstore 2019-11-07 14:52 GET_COMPLETED
8 PUT 6 nrmassey cedaproc nrmassey objectstore 2019-11-14 15:23 PUT_START
9 GET 1 mpritcha cedaproc plotting_talk objectstore 2019-11-14 15:39 GET_START
10 GET 5 mpritcha cedaproc ukmo-mslp-data objectstore 2019-11-14 15:39 GET_START
To get information about a specific request use the command:
jdma request <req_id>
which returns:
Request for user: nrmassey
Request id : 17
Request type : PUT
Batch id : 8
Workspace : <workspace>
Batch label : jdma_test
Ex. storage : <storage_short_id>
Request date : 2018-12-17 16:15
Stage : PUTTING
This has assigned both a Request id (17
) and a Batch id (8
) to the
request. These ids are used for identifying the request and batch in the
subsequent commands.
If the user is a group workspace manager then they can list requests for other users in the GWS as well, even if they do not own the request, simply by using the request number.
While the data is being transferred, the stage will go through a number of values describing the process that is currently happening. These stages are (in order):
PUT_START
PUT_BUILDING
PUT_PACKING
PUT_PENDING
PUTTING
VERIFY_PENDING
VERIFY_GETTING
VERIFYING
PUT_TIDY
PUT_COMPLETED
When the stage PUT_COMPLETED
is reached the data is successfully stored on
the external storage and can then be downloaded.
To ensure that the data has been written to the external storage without any
corruption, a verification process takes place. This downloads the data from
the external storage system (VERIFY_GETTING
) and, during the VERIFYING
stage compares a checksum to one that was calculated on the data during the
PUT_START
stage.
JDMA treats each upload (PUT) request as a batch. The data transferred via this PUT request is stored on the external storage as a batch, which can be thought of as a collection of files.
If notifications are switched on, and a valid email address is supplied, then the user will get an email notification once their batch has fully uploaded.
The final stage for a put
request is PUT_COMPLETED
. Requests that
have reached this stage will be shown in the list of requests for one day after
completion.
Querying data on a storage system¶
Batches¶
When a user has some data stored on one or more of the external storage systems, as batches, they can list the batches that belong to them with the command:
jdma batch
which results in:
batch id user workspace batch label storage date stage
7 nrmassey cpdn_rapidwatch OXPEWWES_2_calib elastictape 2018-10-05 13:54 ON_STORAGE
8 nrmassey cedaproc jdma_test elastictape 2018-12-17 16:15 PUTTING
For more information about batches, see the Batches section.
Users can filter which workspace they list batches for using the -w <workspace> option:
jdma batch -w <workspace>
batch id user workspace batch label storage date stage
7 nrmassey cpdn_rapidwatch OXPEWWES_2_calib elastictape 2018-10-05 13:54 ON_STORAGE
Group workspace managers can list batches for all users in a group workspace by using the -f filter option: -f workspace (the -w <workspace> is optional, and the default workspace will be used if this argument is not supplied):
jdma batch -f workspace -w cedaproc
batch id user workspace batch label storage date stage
1 nrmassey cedaproc plotting_talk objectstore 2019-11-04 17:05 ON_STORAGE
5 mpritcha cedaproc ukmo-mslp-data objectstore 2019-11-07 10:25 ON_STORAGE
6 nrmassey cedaproc nrmassey objectstore 2019-11-14 15:23 ON_DISK
To get information about a specific batch, the batch id can be supplied to the batch command:
jdma batch <batch_id>
returns:
Batch for user: nrmassey
Batch id : 7
Workspace : cpdn_rapidwatch
Batch label : OXPEWWES_2_calibration
Ex. storage : elastictape
Date : 2018-10-05 13:54
External id : 14048
Stage : ON_STORAGE
If the user is a group workspace manager then they can list batches for other users in the GWS as well, even if they do not own the batch, simply by using the batch number.
Archives¶
To see which archives belong to the user, use the command:
jdma archives
which returns:
b.id user workspace batch label storage archive size
7 nrmassey cpdn_rapidwatch OXPEWWES_2_c elastictape archive_0000000002 71.7 MB
8 nrmassey cedaproc jdma_test elastictape archive_0000000003 25.0 GB
To show just the archives in a particular workspace use the -w <workspace> option:
jdma archives -w <workspace>
b.id user workspace batch label storage archive size
7 nrmassey cpdn_rapidwatch OXPEWWES_2_c elastictape archive_0000000002 71.7 MB
Group workspace managers can list batches for all users in a group workspace by using the -f filter option: -f workspace (the -w <workspace> is optional, and the default workspace will be used if this argument is not supplied):
jdma archives -f workspace -w cedaproc
:
b.id user workspace batch label storage archive size
1 nrmassey cedaproc plotting_tal objectstore archive_0000000001 26.6 MB
archive_0000000002 296.0 MB
5 mpritcha cedaproc ukmo-mslp-da objectstore archive_0000000062 1.2 MB
archive_0000000006 1.0 MB
To list the archives in a single batch:
jdma archives <batch_id>
If the user is a group workspace manager then they can list archives in the batches of other users in the GWS as well, even if they do not own the batch, simply by using the batch number.
Files¶
To get a list of files belonging to the user, use the command:
jdma files
To limit this to a single workspace:
jdma files -w <workspace>
To list all the files for all users in a workspace:
jdma files -f workspace -w <workspace>
This returns all of the files for the user(s), so it is more useful to list the files in a batch:
jdma files <batch_id>
which returns:
b.id workspace user batch label storage archive file size
7 cedaproc nrmassey OXPEWWES_2_c elastictape archive_0000000002 ents/2003_2004/oxfaga_2003-09-01T01-00-00_2004-04-08T07-00-00.nc 4.0 MB
ents/1993_1994/oxfaga_1993-09-01T01-00-00_1994-04-29T19-00-00.nc 3.9 MB
ents/1990_1991/oxfaga_1990-09-01T01-00-00_1991-04-30T19-00-00.nc 3.8 MB
ents/2007_2008/oxfaga_2007-09-01T01-00-00_2008-04-29T19-00-00.nc 3.8 MB
ents/1994_1995/oxfaga_1994-09-01T07-00-00_1995-04-22T19-00-00.nc 3.7 MB
ents/1995_1996/oxfaga_1995-09-02T01-00-00_1996-04-30T19-00-00.nc 3.7 MB
ents/1996_1997/oxfaga_1996-09-01T01-00-00_1997-04-28T07-00-00.nc 3.7 MB
ents/1997_1998/oxfaga_1997-09-01T01-00-00_1998-04-29T07-00-00.nc 3.7 MB
ents/1991_1992/oxfaga_1991-09-01T13-00-00_1992-04-13T01-00-00.nc 3.6 MB
ents/1992_1993/oxfaga_1992-09-01T01-00-00_1993-04-11T13-00-00.nc 3.6 MB
ents/2002_2003/oxfaga_2002-09-01T01-00-00_2003-04-30T01-00-00.nc 3.6 MB
ents/2000_2001/oxfaga_2000-09-01T01-00-00_2001-04-08T01-00-00.nc 3.6 MB
ents/2008_2009/oxfaga_2008-09-01T01-00-00_2009-04-01T07-00-00.nc 3.6 MB
ents/2005_2006/oxfaga_2005-09-01T01-00-00_2006-04-29T13-00-00.nc 3.6 MB
ents/1999_2000/oxfaga_1999-09-01T01-00-00_2000-04-12T13-00-00.nc 3.5 MB
ents/1998_1999/oxfaga_1998-09-01T01-00-00_1999-04-20T19-00-00.nc 3.5 MB
ents/2006_2007/oxfaga_2006-09-01T01-00-00_2007-04-27T01-00-00.nc 3.5 MB
ents/2004_2005/oxfaga_2004-09-01T01-00-00_2005-04-20T13-00-00.nc 3.5 MB
ents/2001_2002/oxfaga_2001-09-01T01-00-00_2002-04-30T19-00-00.nc 3.4 MB
ents/1989_1990/oxfaga_1989-12-01T01-00-00_1990-04-01T19-00-00.nc 2.5 MB
This produces a hierarchical view of the files, showing which batch and archive
each file belongs to, along with a truncated path. To get a simple list of
files with full pathnames, the -t
option can be used:
jdma -t files <batch_id>
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/2003_2004/oxfaga_2003-09-01T01-00-00_2004-04-08T07-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1993_1994/oxfaga_1993-09-01T01-00-00_1994-04-29T19-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1990_1991/oxfaga_1990-09-01T01-00-00_1991-04-30T19-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/2007_2008/oxfaga_2007-09-01T01-00-00_2008-04-29T19-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1994_1995/oxfaga_1994-09-01T07-00-00_1995-04-22T19-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1995_1996/oxfaga_1995-09-02T01-00-00_1996-04-30T19-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1996_1997/oxfaga_1996-09-01T01-00-00_1997-04-28T07-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1997_1998/oxfaga_1997-09-01T01-00-00_1998-04-29T07-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1991_1992/oxfaga_1991-09-01T13-00-00_1992-04-13T01-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1992_1993/oxfaga_1992-09-01T01-00-00_1993-04-11T13-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/2002_2003/oxfaga_2002-09-01T01-00-00_2003-04-30T01-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/2000_2001/oxfaga_2000-09-01T01-00-00_2001-04-08T01-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/2008_2009/oxfaga_2008-09-01T01-00-00_2009-04-01T07-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/2005_2006/oxfaga_2005-09-01T01-00-00_2006-04-29T13-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1999_2000/oxfaga_1999-09-01T01-00-00_2000-04-12T13-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1998_1999/oxfaga_1998-09-01T01-00-00_1999-04-20T19-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/2006_2007/oxfaga_2006-09-01T01-00-00_2007-04-27T01-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/2004_2005/oxfaga_2004-09-01T01-00-00_2005-04-20T13-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/2001_2002/oxfaga_2001-09-01T01-00-00_2002-04-30T19-00-00.nc
/group_workspaces/jasmin4/cedaproc/nrmassey/OXPEWWES_2/calibration/events/1989_1990/oxfaga_1989-12-01T01-00-00_1990-04-01T19-00-00.nc
If the user is a group workspace manager then they can list files in the batches of other users in the GWS as well, even if they do not own the batch, simply by using the batch number.
Downloading data from a storage system¶
A user can download the files contained in an uploaded batch with the command:
jdma -r <target_path> get <batch_id>
which, on success, produces:
** SUCCESS ** - retrieval (GET) requested:
Request id : 32
Batch id : <batch_id>
Workspace : cedaproc
Label : OXPEWWES_2_calibration
Date : 2018-12-20 13:08
Request type : GET
Stage : GET_START
Target : <target_path>
This will download all the files in the batch to the directory specified in
<target_path>
.
To download a subset of the files, the user can specify which files to download in a filelist file:
jdma -r <target_path> get <batch_id> <filelist_filename>
which, on success, outputs:
** SUCCESS ** - retrieval (GET) requested:
Request id : 33
Batch id : <batch_id>
Workspace : cedaproc
Label : OXPEWWES_2_calibration
Date : 2018-12-20 13:08
Request type : GET
Stage : GET_START
Target : <target_path>
Filelist : events/1989_1990/oxfaga_1989-12-01T01-00-00_1990-04-01T19-00-00.nc...
The best way to generate the filelist file is to use the files
command with
the --simple
(or -t
) option and pipe the output to a file. This file
can then be edited.
jdma -t files <batch_id> > filelist
To check the progress of a download, use the request
command, either on its
own, or with the request id as an option:
jdma request <request_id>
Request for user: nrmassey
Request id : 33
Request type : GET
Batch id : 10
Workspace : cedaproc
Batch label : calibration
Ex. storage : objectstore
Request date : 2018-12-21 09:57
Stage : GET_COMPLETED
If notifications are switched on, and a valid email address is supplied, then the user will get an email notification once their batch (or sub-batch if a filelist is used) has fully downloaded.
The final stage for a get
request is GET_COMPLETED
. Requests that
have reached this stage will be shown in the list of requests for one day after
completion.
Deleting data on a storage system¶
Users can delete batches from a storage system. Deleting a batch will also
delete any requests associated with it. For example, the put
request issued
to upload the batch to the storage system may still be progressing. If a
delete
command is issued for the batch, the upload will stop, any files that
have been uploaded will be deleted and the put
request will also be deleted.
Users can delete a batch with the command:
jdma delete <batch_id>
This will result in the batch information being displayed, and the user being asked whether they really want to delete the batch:
** WARNING ** - this will delete the following batch:
Batch for user: nrmassey
Batch id : 10
Workspace : cedaproc
Batch label : calibration
Ex. storage : objectstore
Date : 2018-12-20 13:08
External id : gws-cedaproc-0000000002
Stage : ON_STORAGE
Do you wish to continue? [y/N]
Answering y
or yes
to this question will result in the output:
** SUCCESS ** - removal (DELETE) requested:
Request id : 35
Batch id : 10
Workspace : cedaproc
Label : calibration
Date : 2018-12-20 13:08
Request type : DELETE
Stage : DELETE_START
If the user does not wish to be asked this question, or the command is issued in
an automated workflow with no user intervention, the -f
(or --force
)
option can be supplied to the command:
jdma delete -f <batch_id>
This will delete the batch without any user intervention so use it carefully!
If notifications are switched on, and a valid email address is supplied, then the user will get an email notification once their batch is deleted.
The final stage for a delete
request is DELETE_COMPLETED
. Requests that
have reached this stage will be shown in the list of requests for one day after
completion.
Data flow¶
PUT or MIGRATE requests¶
When a user instructs JDMA to put
or migrate
some data to JDMA, the
request will go through a number of stages. These stages are listed below,
and more information is available at Transfer states.
PUT_START -> PUT_BUILDING -> PUT_PACKING -> PUT_PENDING -> PUTTING -> VERIFY_PENDING
By this stage (VERIFY_PENDING
) the data has been transferred to the external
storage system. However, to ensure that the data held on the storage is not
corrupt and matches exactly the data before the transfer, a verification stage
is performed. This downloads the data to a temporary area, automatically and
unseen by the user, then verifies the data using checksum information obtained
in the PUT_START
stage. These stages are:
VERIFY_PENDING -> VERIFY_GETTING -> VERIFYING
Once the VERIFYING
stage is passed the data is known to be uncorrupted.
JDMA then cleans up with two stages:
VERIFYING -> PUT_TIDY
-> PUT_COMPLETED
The request will stay in the PUT_COMPLETED
stage for one day after
completion and then the request will be removed from JDMA. The batch upload
will, however, will remain on the storage.
The batch created by the request also has a number of stages. These are:
ON_DISK -> PUTTING -> ON_STORAGE
With ON_STORAGE
being the last stage and recording that the data uploaded
to the storage successfully.
GET requests¶
When a user instructs JDMA to get
some data from JDMA, the
request will go through a number of stages. These stages are listed below,
and more information is available at Transfer states.
GET_START -> GET_PENDING -> GETTING -> GET_UNPACK
At this stage the data has been retrieved from the storage system. JDMA now does some post-download processing, including unpacking if the data was packed by JDMA on upload, verifying the checksums to detect any corrupt data, and restoring the ownership and permissions of the downloaded files. These stages are:
GET_UNPACK -> GET_RESTORE
The data has now been fully restored and so JDMA cleans up with the two stages:
GET_RESTORE -> GET_TIDY -> GET_COMPLETED
The request will stay in the GET_COMPLETED
stage for one day after
completion and then the request will be removed from JDMA.
DELETE requests¶
When a user instructs JDMA to delete
some data from JDMA, the
request will go through a number of stages. These stages are listed below,
and more information is available at Transfer states.
DELETE_START -> DELETE_PENDING -> DELETING -> DELETE_TIDY -> DELETE_COMPLETED
The request will stay in the DELETE_COMPLETED
stage for one day after
completion and then the request will be removed from JDMA.