FAQs

Connecting from the ScienceCluster to a website with a .int domain (e.g., example.int) does not work. How do I fix this?

Unfortunately, DNS resolution for domains ending in ".int" does not work on the ScienceCluster. This issue will be fixed in the next version of the ScienceCluster; until then, you will need a workaround to connect to these sites.

Here is an example wget command that fails to resolve a .int domain website:

user@login1:/data/$ wget https://example.int/data.zip
--2021-12-16 14:02:43--  https://example.int/data.zip
Resolving example.int (example.int)... failed: Name or service not known.
wget: unable to resolve host address 'example.int'

The workaround is to skip the DNS resolution by directly using the IP of the website.

First, we need to find the IP address of the website. To do so, you can run ping or the same wget command from another computer with working DNS resolution; both approaches are shown below:

user@mylaptop:~$ wget https://example.int/data.zip
--2021-12-16 13:55:31--  https://example.int/data.zip
Resolving example.int (example.int)... 193.147.153.153
Connecting to example.int (example.int)|193.147.153.153|:443... connected.
HTTP request sent, awaiting response... 200
Length: unspecified [application/x-gzip]
Saving to: 'data.zip'
[..]
The IP address can be found on the "Resolving example.int" line.
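
Alternatively, a ping from a machine with working DNS resolution reveals the same address (illustrative output; the exact format varies by operating system):

user@mylaptop:~$ ping -c 1 example.int
PING example.int (193.147.153.153) 56(84) bytes of data.
[..]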

Then replace the domain (example.int) in the URL with the IP address. Because the site's TLS certificate is issued for the domain name rather than the IP, you also need to add the --no-check-certificate option to bypass the certificate hostname check:

user@login1:/data/$ wget --no-check-certificate  https://193.147.153.153/data.zip
Connecting to 193.147.153.153:443... connected.
The certificate's owner does not match hostname '193.147.153.153'
HTTP request sent, awaiting response... 200 
Length: unspecified [application/x-gzip]
Saving to: 'data.zip'

data.zip     [                                                                       <=> ]  57.53M  3.25MB/s    in 14s     

2022-01-13 10:17:51 (3.97 MB/s) - 'data.zip' saved [60323470]

This workaround can be applied to any command or script that takes a hostname or URL, by substituting the IP address for the .int hostname (e.g., curl, a job that downloads data, lynx, ping, ssh, telnet, etc.).
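
For instance, a curl equivalent of the wget command above might look like this (a sketch using the same example IP; curl's -k flag skips certificate verification, analogous to wget's --no-check-certificate):

user@login1:/data/$ curl -k -O https://193.147.153.153/data.zip

Where the tool supports it, a cleaner variant is to pin the hostname to the IP so that certificate checks still pass; curl provides this via --resolve:

user@login1:/data/$ curl --resolve example.int:443:193.147.153.153 -O https://example.int/data.zip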

What will happen to my queued jobs during maintenance?

When ScienceCluster maintenance occurs, S3IT will "drain" the ScienceCluster nodes so that the hardware and/or software used within the cluster can be updated. When a node is "drained", all currently running jobs are allowed to finish, but no additional jobs from the queue are accepted to run on that node. The maintenance is then performed once the node has completed all running jobs (i.e., there is no activity on the node).

During this process, the SLURM queue will continue to hold all jobs with their assigned priority. As soon as the ScienceCluster maintenance window has closed and the nodes are freed from their "drained" status, all jobs in the queue will continue to run normally.
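
If you want to check whether nodes are currently drained, the standard SLURM commands show the node states and your queued jobs (output depends on the cluster configuration):

user@login1:~$ sinfo -R          # list drained/down nodes with the reason set by the administrators
user@login1:~$ squeue -u $USER   # your jobs, including those still pending in the queue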

Of note, it will not be possible to schedule jobs whose time limits overlap with a scheduled maintenance window on a node. These jobs will simply be rejected from the queue when you attempt to submit them via sbatch. When this situation occurs, you should either adjust the time limit so it doesn't overlap with the maintenance window or simply submit the job(s) after the maintenance has been completed.
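For example, if a maintenance window starts in roughly six hours, a submission whose time limit reaches into the window would be rejected, while a shorter one fits before it (the script name and limits here are illustrative):

user@login1:~$ sbatch --time=12:00:00 myjob.sh   # overlaps the maintenance window
user@login1:~$ sbatch --time=04:00:00 myjob.sh   # completes before the window begins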


Last update: January 31, 2022