Connecting from the ScienceCluster to a website on a .int domain (e.g., example.int) does not work. How do I fix this?¶
Unfortunately, DNS resolution for domains ending in ".int" does not work on the ScienceCluster. This issue will be fixed in the next version of the ScienceCluster; until then, you will need a workaround to connect to these sites.
Here is an example wget command that fails to resolve a .int domain website:
user@login1:/data/$ wget https://example.int/data.zip
--2021-12-16 14:02:43--  https://example.int/data.zip
Resolving example.int (example.int)... failed: Name or service not known.
wget: unable to resolve host address 'example.int'
The workaround is to skip DNS resolution by using the website's IP address directly.
First, find the IP address of the website. To do so, you can run ping or the same wget command from another computer where DNS works:
user@mylaptop:~$ wget https://example.int/data.zip
--2021-12-16 13:55:31--  https://example.int/data.zip
Resolving example.int (example.int)... 203.0.113.10
Connecting to example.int (example.int)|203.0.113.10|:443... connected.
HTTP request sent, awaiting response... 200
Length: unspecified [application/x-gzip]
Saving to: 'data.zip'
[..]
Then replace the domain (example.int) in the URL with the IP address, and add the --no-check-certificate option to skip certificate hostname verification (the certificate is issued for the domain name, not for the IP):
user@login1:/data/$ wget --no-check-certificate https://203.0.113.10/data.zip
Connecting to 203.0.113.10:443... connected.
The certificate's owner does not match hostname '203.0.113.10'
HTTP request sent, awaiting response... 200
Length: unspecified [application/x-gzip]
Saving to: 'data.zip'
data.zip  [ <=> ]  57.53M  3.25MB/s  in 14s
2022-01-13 10:17:51 (3.97 MB/s) - 'data.zip' saved
This workaround can be applied to any command or script that takes a URL (e.g., curl, or a job script that downloads data).
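With curl, for example, the same idea can be sketched in two ways (203.0.113.10 stands in for the address found earlier and example.int for the target site): either use the IP directly and disable certificate verification with -k, or use curl's --resolve option to map the domain to the IP for this one request, which avoids the DNS lookup while keeping certificate checks intact:

```shell
# Option 1: use the IP directly; -k skips certificate verification,
# analogous to wget's --no-check-certificate.
curl -k -o data.zip https://203.0.113.10/data.zip

# Option 2: pin the domain to the known IP for this request only.
# No DNS lookup is performed, and the certificate is still validated
# against the domain name.
curl --resolve example.int:443:203.0.113.10 -o data.zip https://example.int/data.zip
```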
What will happen to my queued jobs during maintenance?¶
When ScienceCluster maintenance occurs, S3IT will "drain" the ScienceCluster nodes so that the hardware and/or software used within the cluster can be updated. When a node is "drained", all currently running jobs are allowed to finish, but no additional jobs from the queue are accepted to run on it. The maintenance is then performed once the node has completed all running jobs (i.e., there is no activity on the node).
During this process, the SLURM queue will continue to hold all jobs with their assigned priority. As soon as the ScienceCluster maintenance window has closed, and the nodes are freed from their "drained" status, all jobs in the queue will continue to run normally.
Of note, it is not possible to schedule jobs whose time limits overlap with a scheduled maintenance window on a node. Such jobs are simply rejected from the queue when you attempt to submit them via sbatch. When this situation occurs, either adjust the time limit so that the job finishes before the maintenance window begins, or simply submit the job(s) after the maintenance has been completed.
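As a sketch of the first option: maintenance windows are typically announced as SLURM reservations, which you can list with scontrol, and the job's time limit can then be shortened so it ends before the window opens (the 20-hour limit and the script name myjob.sh below are hypothetical):

```shell
# List any reservations (e.g., an upcoming maintenance window)
# announced on the cluster.
scontrol show reservation

# Submit with a time limit short enough that the job completes
# before the maintenance window begins.
sbatch --time=20:00:00 myjob.sh
```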