Server config
The server config file controls the behaviour of the NLDS. It is a JSON file split into dictionary sections, with each section delineating configuration for a specific part of the program. There is an example server_config in the templates section of the main nlds package (nlds.templates.server_config) to get you started, but this page will demystify the configuration needed for (a) a local development copy of the NLDS, and (b) a production system spread across several pods/virtual machines.
Note: The NLDS is still being developed, so the following is subject to change in future versions.
Required sections
There are two required sections for every server_config: authentication and rabbitMQ.
Authentication
This deals with how users are authenticated through the OAuth2 flow used in the client. The following fields are required in the dictionary:
"authentication" : {
"authenticator_backend" : "jasmin_authenticator",
"jasmin_authenticator" : {
"user_profile_url" : "{{ user_profile_url }}",
"user_services_url" : "{{ user_services_url }}",
"oauth_token_introspect_url" : "{{ token_introspect_url }}"
}
}
where authenticator_backend dictates which form of authentication you would like to use. Currently the only implemented authenticator is the jasmin_authenticator, but there are plans to expand this to also work with other industry-standard authenticators like Google and Microsoft.
The authenticator setup is then specified in a separate dictionary named after the authenticator, which is specific to each authenticator. The jasmin_authenticator requires, as above, values for user_profile_url, user_services_url, and oauth_token_introspect_url. These values cannot be divulged publicly on GitHub for JASMIN, so please get in contact for the actual values to use.
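For illustration, a filled-in authentication section might look like the following. The URLs shown are placeholders, not the real JASMIN endpoints:
"authentication" : {
    "authenticator_backend" : "jasmin_authenticator",
    "jasmin_authenticator" : {
        "user_profile_url" : "https://accounts.example.ac.uk/api/profile/",
        "user_services_url" : "https://accounts.example.ac.uk/api/services/",
        "oauth_token_introspect_url" : "https://accounts.example.ac.uk/oauth/introspect/"
    }
}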
RabbitMQ
This deals with how the nlds connects to the RabbitMQ queue and message brokering system. The following is an outline of what is required:
"rabbitMQ": {
"user": "{{ rabbit_user }}",
"password": "{{ rabbit_password }}",
"heartbeat": "{{ rabbit_heartbeat }}",
"server": "{{ rabbit_server }}",
"vhost": "{{ rabbit_vhost }}",
"exchange": {
"name": "{{ rabbit_exchange_name }}",
"type": "{{ rabbit_exchange_type }}",
"delayed": "{{ rabbit_exchange_delayed }}"
},
"queues": [
{
"name": "{{ rabbit_queue_name }}",
"bindings": [
{
"exchange": "{{ rabbit_exchange_name }}",
"routing_key": "{{ rabbit_queue_routing_key }}"
}
]
}
]
}
Here the user and password fields refer to the username and password for the rabbit server you wish to connect to, which is in turn specified with server. vhost is similarly the virtualhost on the rabbit server that you wish to connect to. heartbeat is a recent addition which determines the heartbeat interval on the BlockingConnection that the NLDS makes with the rabbit server. This essentially puts a hard limit on how long a connection can remain unresponsive before it is killed by the server; see the rabbit docs for more details.
The next two dictionaries are context specific. All publishing elements of the NLDS, i.e. parts that will send messages, require an exchange to publish messages to. exchange determines that exchange, with three required subfields: name, type, and delayed. The former two are self-descriptive: they should simply be the name of the exchange on the virtualhost and its corresponding type, e.g. one of fanout, direct, or topic. delayed is a boolean (true or false in json-speak) dictating whether to use the delay functionality utilised within the NLDS. Note that this requires the rabbit server to have the DelayedRabbitExchange plugin installed.
Exchanges can be declared and created, if not present on the virtualhost, the first time the NLDS is run; virtualhosts cannot, and so will have to be created beforehand, either manually on the server or through the admin interface. If an exchange is requested but incorrect information is given about either its type or delayed status, then the NLDS will throw an error.
queues is a list of queue dictionaries and must be implemented on consumers, i.e. message processors, to tell pika where to take messages from. Each queue dictionary consists of a name and a list of bindings, with each binding being a dictionary containing the name of the exchange the queue takes messages from, and the routing key that a message must have to be accepted onto the queue. For more information on exchanges, routing keys, and other RabbitMQ features, please see Rabbit’s excellent documentation.
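As a concrete sketch, a minimal development rabbitMQ section with the template values filled in might look like this. Every value below, including the routing key, is illustrative rather than a default:
"rabbitMQ": {
    "user": "nlds_user",
    "password": "change_me",
    "heartbeat": 300,
    "server": "localhost",
    "vhost": "nlds_dev",
    "exchange": {
        "name": "nlds",
        "type": "topic",
        "delayed": true
    },
    "queues": [
        {
            "name": "nlds_q",
            "bindings": [
                {
                    "exchange": "nlds",
                    "routing_key": "nlds-api.*.*"
                }
            ]
        }
    ]
}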
Generic optional sections
There are two generic sections, i.e. those which are used across the NLDS ecosystem, but are optional and therefore fall back on a default configuration if not specified. These are logging and general.
Logging
The logging configuration options look like the following:
"logging": {
"enable": boolean,
"log_level": str - ("none" | "debug" | "info" | "warning" | "error" | "critical"),
"log_format": str - see python logging docs for details,
"add_stdout_fl": boolean,
"stdout_log_level": str - ("none" | "debug" | "info" | "warning" | "error" | "critical"),
"log_files": List[str],
"max_bytes": int,
"backup_count": int
}
These all set default options for the native python logging system, with log_level being the log level, log_format being a string describing the log output format, and rollover describing the frequency of rollover for log files in the standard manner. For details on all of this, see the python docs for inbuilt logging. enable and add_stdout_fl are boolean flags controlling log output to files and stdout respectively, and stdout_log_level is the log level for the stdout logging, if you require it to be different from the default log level.
log_files is a list of strings describing the path or paths to log files being written to. If no log file paths are given then no file logging will be done. If active, the file logging will be done with a RotatingFileHandler, i.e. the files will be rotated when they reach a certain size. The threshold size is determined by max_bytes, and the maximum number of files kept after rotation is controlled by backup_count, both integers. For more information on this please refer to the python logging docs.
As stated, these all set the default log options for all publishers and consumers within the NLDS. These can be overridden on a consumer-specific basis by inserting a logging sub-dictionary into a consumer-specific optional section. Each sub-dictionary has identical configuration options to those listed above.
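For example, to make a single consumer log more verbosely than the global default, a logging sub-dictionary can be placed inside that consumer's section. A minimal sketch (the file path is illustrative):
"index_q": {
    "logging": {
        "enable": true,
        "log_level": "debug",
        "log_files": ["/var/log/nlds/index.log"]
    }
}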
General
The general config, as of writing this page, only covers one option: the retry_delays list:
"general": {
"retry_delays": List[int]
}
This retry_delays list gives the delay applied to retried messages in seconds, with the nth element being the delay for the nth retry. Setting the value here sets a default for all consumers, but the retry_delays option can be inserted into any consumer-specific config section to override this.
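For example, the following would make every consumer retry a failed message immediately the first time, then after 30 seconds, then after 60 seconds (illustrative values, not defaults):
"general": {
    "retry_delays": [0, 30, 60]
}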
Consumer-specific optional sections
Each of the consumers has its own configuration dictionary, named by convention as {consumername}_q, e.g. transfer_put_q. Each has a set of default options and will accept both a logging dictionary and a retry_delays list for consumer-specific override of the defaults, as mentioned above. Each consumer also has a specific set of config options, some shared, which control its behaviour. The following is a brief rundown of the server config options for each consumer.
NLDS Worker
The server config section is nlds_q, and the following options are available:
"nlds_q": {
"logging": [standard_logging_dictionary],
"retry_delays": List[int],
"print_tracebacks_fl": boolean
}
Not much happens in the NLDS worker that specifically requires configuration, so it basically just has the default settings. One option that has not been covered yet, print_tracebacks_fl, is a boolean flag to control whether the full stacktrace of any caught exception is sent to the logger. This is standard across all consumers. You may set retry_delays if you wish, but the NLDS worker does not retry messages specifically, only in the case of something going unexpectedly wrong.
Indexer
The server config section is index_q, and the following options are available:
"index_q": {
"logging": {standard_logging_dictionary},
"retry_delays": List[int],
"print_tracebacks_fl": boolean,
"filelist_max_length": int,
"message_threshold": int,
"max_retries": int,
"check_permissions_fl": boolean,
"check_filesize_fl": boolean,
"max_filesize": int
}
where logging, retry_delays, and print_tracebacks_fl are, as above, standard configurables within the NLDS consumer ecosystem.
filelist_max_length determines the maximum length of any file-list provided to the indexer consumer during the init (i.e. split) step. Any transaction initially given a list longer than this value will be split into several sub-transactions, each with this as a maximum length. For example, with the default value of 1000, a transaction with an initial list size of 2500 will be split into 3 sub-transactions: two with a list of 1000 files each, and the remaining 500 files placed in the third sub-transaction.
message_threshold is very similar in that it places a limit on the total size of files within a given filelist. It is applied at the indexing (nlds.index) step, when files have actually been statted, and so will further sub-divide any sub-transactions at that point if they are too large or are revealed upon indexing to contain lots of folders with files in them. max_retries controls the maximum number of times an entry in a filelist can be attempted to be indexed, either because it doesn’t exist or because the user doesn’t have the appropriate permissions to access it at the time of indexing. This feeds into retry delays: each subsequent time a sub-transaction is retried, it will be delayed by the amount specified at that index within the retry_delays list. If max_retries exceeds len(retry_delays), then any retries which don’t have an explicit retry delay will use the final element in the retry_delays list.
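To make that interplay concrete: with the illustrative values below, retries 1, 2 and 3 are delayed by 10, 60 and 300 seconds respectively, while retries 4 and 5 run off the end of the list and so reuse the final 300-second delay.
"index_q": {
    "max_retries": 5,
    "retry_delays": [10, 60, 300]
}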
check_permissions_fl and check_filesize_fl are commonly used boolean flags to control whether the indexer checks the permissions and filesize of files respectively during the indexing step. If the filesize is being checked, max_filesize determines the maximum filesize, in bytes, of an individual file which can be added to any given holding. This defaults to 500GB, but is typically determined by the size of the cache in front of the tape, which for the STFC CTA instance is 500GB (hence the default value).
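Pulling these together, an illustrative index_q section might look like the following. Only filelist_max_length is shown at its stated default of 1000; the other values, including the byte count chosen for max_filesize and message_threshold, are placeholders for illustration:
"index_q": {
    "print_tracebacks_fl": false,
    "filelist_max_length": 1000,
    "message_threshold": 10000,
    "check_permissions_fl": true,
    "check_filesize_fl": true,
    "max_filesize": 536870912000
}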
Cataloguer
The server config entry for the catalog consumer is as follows:
"catalog_q": {
"logging": {standard_logging_dictionary},
"retry_delays": List[int],
"print_tracebacks_fl": boolean,
"max_retries": int,
"db_engine": str,
"db_options": {
"db_name" : str,
"db_user" : str,
"db_passwd" : str,
"echo": boolean
},
"default_tenancy": str,
"default_tape_url": str
}
where logging, retry_delays, and print_tracebacks_fl are, as above, standard configurables within the NLDS consumer ecosystem. max_retries is similarly available in the cataloguer, with the same meaning as defined above.
Here we also have two keys which control database behaviour via SQLAlchemy: db_engine and db_options. db_engine is a string which specifies which SQL flavour you would like SQLAlchemy to use. Currently this has been tried with SQLite and PostgreSQL but, given how SQLAlchemy works, we expect few roadblocks interacting with other database types. db_options is a further sub-dictionary specifying the database name (which must be appropriate for your chosen flavour of database), along with the database username and password (if in use), respectively controlled by the keys db_name, db_user, and db_passwd. Finally, in this sub-dictionary, echo is an optional boolean flag which controls the auto-logging of the SQLAlchemy engine.
Finally, default_tenancy and default_tape_url are the default values to place into the Catalog for a new Location’s tenancy and tape_url values if these are not explicitly defined before reaching the catalog. This will happen if, for example, the user does not define a tenancy in their client config.
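As a sketch, the database portion of a local development catalog config could look like the following, assuming SQLAlchemy-style engine names. The engine string, database path and default_tape_url below are assumptions for illustration only:
"catalog_q": {
    "db_engine": "sqlite",
    "db_options": {
        "db_name": "/tmp/nlds_catalog.db",
        "db_user": "",
        "db_passwd": "",
        "echo": false
    },
    "default_tenancy": "cedadev-o.s3.jc.rl.ac.uk",
    "default_tape_url": "root://tape.example.ac.uk//eos/cta"
}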
Transfer-put and Transfer-get
The server entry for the transfer-put consumer is as follows:
"transfer_put_q": {
"logging": {standard_logging_dictionary},
"max_retries": int,
"retry_delays": List[int],
"print_tracebacks_fl": boolean,
"filelist_max_length": int,
"check_permissions_fl": boolean,
"tenancy": str,
"require_secure_fl": false
}
where we have logging, retry_delays, and print_tracebacks_fl with their standard definitions from above, and max_retries, filelist_max_length, and check_permissions_fl defined the same as for the Indexer consumer.
New definitions for the transfer processor are tenancy and require_secure_fl, which control minio behaviour. tenancy is a string which denotes the address of the object store tenancy to upload/download files to/from (e.g. cedadev-o.s3.jc.rl.ac.uk), and require_secure_fl specifies whether or not you require signed SSL certificates at the tenancy location.
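An illustrative transfer_put_q section, reusing the example tenancy from the paragraph above (none of these values are defaults):
"transfer_put_q": {
    "print_tracebacks_fl": false,
    "filelist_max_length": 1000,
    "check_permissions_fl": true,
    "tenancy": "cedadev-o.s3.jc.rl.ac.uk",
    "require_secure_fl": true
}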
The transfer-get consumer is identical except for the addition of config controlling the change-ownership functionality on downloaded files – see Changing ownership of files for details on why this is necessary. The additional config is as follows:
"transfer_get_q": {
...
"chown_fl": boolean,
"chown_cmd": str
}
where chown_fl is a boolean flag to specify whether to attempt to chown files back to the requesting user, and chown_cmd is the name of the executable to use to chown said files.
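For instance, a deployment that hands downloaded files back to the requesting user might set the following (the chown_cmd value is an assumption for illustration):
"transfer_get_q": {
    "chown_fl": true,
    "chown_cmd": "chown"
}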
Monitor
The server config entry for the monitor consumer is as follows:
"monitor_q": {
"logging": {standard_logging_dictionary},
"retry_delays": List[int],
"print_tracebacks_fl": boolean,
"db_engine": str,
"db_options": {
"db_name" : str,
"db_user" : str,
"db_passwd" : str,
"echo": boolean
}
}
where logging, retry_delays, and print_tracebacks_fl have the standard, previously stated definitions, and db_engine and db_options are as defined for the Catalog consumer, due to the use of an SQL database on the Monitor. Note the minimal retry control, as the monitor only retries messages which failed due to an unexpected exception.
Logger
The server config entry for the Logger consumer is as follows:
"logging_q": {
"logging": {standard_logging_dictionary},
"print_tracebacks_fl": boolean,
}
where the options have been previously defined. Note that there is no special configurable behaviour on the Logger consumer, as it is simply a relay for redirecting logging messages into log files. It should also be noted that the log_files option should be set in the logging sub-dictionary for this to work properly; this may become a mandatory setting in future versions.
Archive-Put and Archive-Get
Finally, the server config entry for the archive-put consumer is as follows:
"archive_put_q": {
"logging": {standard_logging_dictionary}
"max_retries": int,
"retry_delays": List[int],
"print_tracebacks_fl": boolean,
"tenancy": str,
"check_permissions_fl": boolean,
"require_secure_fl": boolean,
"tape_url": str,
"tape_pool": str,
"query_checksum_fl": boolean,
"chunk_size": int
}
which is a combination of standard configuration, object-store configuration, and as-yet-unseen tape configuration. Firstly, we have the standard options logging, max_retries, retry_delays, and print_tracebacks_fl, which we have defined above. Then we have the object-store configuration options, which we saw previously in the Transfer-put and Transfer-get consumer config and which have the same definitions.
The latter four options control tape configuration: tape_url and tape_pool define the xrootd URL and tape pool onto which to attempt to put files; note that these two values are combined into a single tape_path in the archiver. query_checksum_fl is a boolean flag to control whether the ADLER32 checksum calculated during streaming is used to check file integrity at the end of a write. Finally, chunk_size is the size, in bytes, to chunk the stream into when writing into or reading from the CTA cache. This defaults to 5 MiB, as this is the lower limit for part_size when uploading back to object-store during an archive-get, but it has not been properly benchmarked or optimised yet.
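An illustrative tape portion of the archive-put config, with chunk_size set to its 5 MiB default (5242880 bytes); the tape_url and tape_pool values are placeholders:
"archive_put_q": {
    "tape_url": "root://tape.example.ac.uk",
    "tape_pool": "nlds_pool",
    "query_checksum_fl": true,
    "chunk_size": 5242880
}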
Note that the above options have been listed for the archive-put consumer, but they are shared by the archive-get consumer. The archive-get does have one additional config option:
"archive_get_q": {
...
"prepare_requeue": int
}
where prepare_requeue is the prepare-requeue delay, i.e. the delay, in milliseconds, before an archive recall message is requeued after a negative read-preparedness query. This defaults to 30 seconds.
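For example, to reproduce the default 30-second delay explicitly:
"archive_get_q": {
    "prepare_requeue": 30000
}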
Publisher-specific optional sections
There are two non-consumer elements to the NLDS which can optionally be configured, listed below.
RPC Publisher
The Remote Procedure Call (RPC) Publisher, the specific rabbit publisher which sits inside the API server and makes RPCs to the databases for quick metadata access from the client, has its own small config section:
"rpc_publisher": {
"time_limit": int,
"queue_exclusivity_fl": boolean
}
where time_limit is the number of seconds the publisher waits before declaring the RPC timed out and the receiving consumer non-responsive, and queue_exclusivity_fl controls whether the queue declared by the publisher is exclusive to the publisher. These default to 30 seconds and true respectively.
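A sketch which simply reproduces those defaults explicitly:
"rpc_publisher": {
    "time_limit": 30,
    "queue_exclusivity_fl": true
}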
Cronjob Publisher
The Archive-Put process, as described in Archive Put Cronjob, is periodically initiated by a cronjob which sends a message to the catalog to get the next unarchived holding. This requires a small amount of configuration in order to (a) get access to the object store, and (b) change the default tenancy or tape_url, if necessary. As such, the allowed config options look like:
"cronjob_publisher": {
"access_key": str,
"secret_key": str,
"tenancy": str,
"tape_url": str
}
where tape_url is identical to that specified in Archive-Put and Archive-Get, and access_key, secret_key, and tenancy are specified as in the client config, referring to the object store tenancy located at tenancy and the token and secret_key required for accessing it. In practice, only the access_key and secret_key are specified during deployment.
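A typical deployed section might therefore contain only the two keys (the values shown are illustrative placeholders, not real credentials):
"cronjob_publisher": {
    "access_key": "0123456789abcdef",
    "secret_key": "s3cr3t-placeholder"
}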