In addition to the information contained on this page, there are a number of tutorials written by Alex Knudson discussing how to install specific versions of R and Python using miniconda and how to access the department server using Jupyter notebooks. You may also wish to consult the slides for the talk given by Grant Schissler on March 31, 2021 in the graduate student seminar. Finally, the slides for the talk given by Eric Olson on October 4, 2023 are also available.
Since this page has become rather long, please use the following table of contents to navigate to the section you are interested in:
You should be able to access either server by name or by IP number from on campus. From off campus you will need the UNR VPN. More information about VPN access is available in the section How to Connect to Okapi.
The goal of Okapi is to make a server available to all graduate students and department faculty for small computational runs and as a convenient software development environment for learning. This improves the research environment in the department and supports our advanced degree programs. Feel free to try things out.
Medium-sized computational runs are now supported on Caprine using the slow batch queue. Note this queue is hidden, so it will not appear unless specified explicitly with -pslow or with -a to view all queues. For example,
$ sinfo -a
displays the current state of all queues and
$ squeue -a
views information about all jobs in the system. See Running Jobs in the Slow Queue for more information.
For very large computations please use the UNR Pronghorn supercomputer.
Note that the Mathematics and Statistics Department owns a fully-paid 32-core CPU node on Pronghorn. This node may be reserved by contacting Mihye Ahn and filling out the appropriate paperwork. While an okapi is arguably cuter than a pronghorn, please use Pronghorn if you have a very large computation to finish.
After connecting through the VPN, access to Okapi and Caprine will be the same as from on campus. Please let me know if you have any difficulty getting the VPN set up.
Here is an example using Singularity to download a container for the Julia programming language. A similar procedure can be followed for R and many other programs used in the mathematical sciences.
First search the package repository for julia.
$ singularity search julia
Found 1 users for 'julia'
	library://julian

No collections found for 'julia'

Found 5 containers for 'julia'
	library://sylabs/examples/julia.sif
		Tags: latest
	library://sylabs/examples/julia
		Tags: latest
	library://dtrudg-utsw/demo/julia
		Tags: 20190319 latest
	library://sebastian_mc/default/julia
		Tags: julia1.1.0
	library://crown421/default/juliabase
		Tags: 1.3.1 1.4.2 latest
Choose a version, and pull the container.
$ singularity pull library://crown421/default/juliabase:latest
INFO:    Downloading library image
 137.2MiB / 137.2MiB [=======================================] 100 % 5.6 MiB/s 0s
WARNING: Container might not be trusted; run 'singularity verify juliabase_latest.sif' to show who signed it
Verify the container with
$ singularity verify juliabase_latest.sif
Container is signed by 1 key(s):

Verifying partition: FS:
69FC410C07D1F59F435D3E4D8987BB3E9255805E [REMOTE] Steffen Ridderbusch (xps-linux)
[OK] Data integrity verified

INFO:    Container verified: juliabase_latest.sif
And finally run the container by typing
$ singularity exec juliabase_latest.sif julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.2 (2020-05-23)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> 1+1
2

julia> exit();
A similar procedure can be used to install specific versions of R, or anything else in the repository. It is also possible to create your own containers.
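While the details are beyond the scope of this page, a container is built from a definition file. The following is a minimal sketch, assuming a hypothetical file myr.def describing an Ubuntu-based container with R installed from the Ubuntu package repository:
Bootstrap: docker
From: ubuntu:20.04

%post
    # commands run inside the container at build time
    apt-get -y update
    apt-get -y install r-base
Building a container requires root privileges, so the build is typically done on your own machine with
$ sudo singularity build myr.sif myr.def
and the resulting myr.sif file copied to the server.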
that is accessible only from on campus. Note that even though the web interface doesn't use a TLS-encrypted socket, login credentials are encrypted using RSA public-key cryptography. Please request any additional R packages from the Ubuntu repository that need to be installed to run your code.
Batch GPU jobs can also be run from the default queue by adding
#SBATCH --gres=gpu:1
to your job configuration file. Note that both GPUs can be scheduled. Generally the first job will run on the V100 and the second on the P100. If you want your program to wait until a particular GPU is available use
#SBATCH --gres=gpu:volta:1
or
#SBATCH --gres=gpu:pascal:1
to specify the V100 and P100 GPUs respectively. More information about using the batch queues is provided in the section Running Programs in the Batch Queue.
The CUDA compiler nvcc is installed and the resulting executables can be launched from the command prompt. If you simultaneously run multiple interactive GPU programs, they will run in shared mode on the V100. Set the environment variable CUDA_VISIBLE_DEVICES=1 to run an interactive GPU program on the P100.
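For example, assuming your CUDA source is in a hypothetical file mygpucalc.cu, it can be compiled at the command prompt with
$ nvcc -o mygpucalc mygpucalc.cu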
For example, supposing your program is called mygpucalc, then you could run it on the P100 using the commands
$ export CUDA_VISIBLE_DEVICES=1
$ ./mygpucalc
Similarly, if you want your interactive program to run on the V100, then type
$ export CUDA_VISIBLE_DEVICES=0 $ ./mygpucalc
A more convenient way to run a series of computations is using the Slurm batch scheduling system. CPUs, GPUs and RAM are all configured as consumable resources. CPU limits are strictly enforced using Linux cgroups, while GPU and RAM limits are advisory in nature. Thus, your program can exceed the memory limits without being killed. However, if multiple jobs use more memory than expected, the system may experience a performance problem called thrashing, in which swap usage dominates the total execution time. The default for a single job is 1 CPU core and 4GB RAM. Additional resources up to 24 cores and 256GB of memory can be specified.
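For example, a hypothetical job that needs 8 cores and 32GB of memory could be requested with a configuration file of the form
#!/bin/bash
#SBATCH -n 8
#SBATCH --mem=32GB
./a.out
Here ./a.out stands in for your own program.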
Please don't submit GPU jobs without adding
#SBATCH --gres=gpu:1
Otherwise, the GPUs will run in shared mode with a resulting loss of performance.
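For example, a minimal configuration file for a batch job that uses one GPU looks like
#!/bin/bash
#SBATCH --gres=gpu:1
./mygpucalc
where ./mygpucalc again stands in for your own program.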
$ squeue
JOBID     NAME    USER ST  TIME  MIN CPU REASON
  109 dfft.slm ejolson  R  0:02   4G   5 None
$ scancel 109
If you want to cancel all your jobs type
$ scancel -u <your_net_id>
For example, to run a single-processor job create a configuration file onecpujob.slm of the form
#!/bin/bash
./a.out
Then, in the terminal window type
$ sbatch onecpujob.slm
When there are available resources on the Okapi server, it will execute your program and place the output in a file of the form slurm-X.out where X is the job sequence number displayed when you submitted the job. To check the status of the queue type
$ sinfo
$ squeue
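To list only your own jobs, you can also type
$ squeue -u <your_net_id>
with <your_net_id> replaced by your actual UNR NetID.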
To submit a job to the slow queue add the line
#SBATCH -p slow
to your configuration file.
For example, to run a single processor job in the slow queue create a configuration file oneslowjob.slm of the form
#!/bin/bash
#SBATCH -p slow
./a.out
Then, in the terminal window type
$ sbatch oneslowjob.slm
This job and others submitted in the same way will then be scheduled using a separate queue that will not block other users from getting their work done.
Note that the slow queue is hidden and does not appear unless you explicitly specify -pslow or use the -a option. For example, to view information about jobs in the slow queue type
$ squeue -pslow
To run an R script called mycalc.R as a batch job, create a configuration file mycalc.slm of the form
#!/bin/bash
Rscript mycalc.R
Then, in the terminal window type
$ sbatch mycalc.slm
Again, if you plan to submit many R script jobs at the same time please use the slow queue by changing mycalc.slm to read
#!/bin/bash
#SBATCH -p slow
Rscript mycalc.R
Note that the slow queue is hidden. To view information about jobs in the slow queue type
$ squeue -pslow
More information about the slow queue may be found in the section Running Jobs in the Slow Queue.
After downloading demo.R, create a Slurm configuration file called demo.slm of the form
#!/bin/bash
#SBATCH --mem=16GB
Rscript demo.R
Then, in the terminal window type
$ sbatch demo.slm
The demo job will take about 2 minutes to run. You can monitor its progress by typing
$ squeue
to see whether the job has started running or for how long it has been running. Another way to check what calculations are running on the system is with an interactive program called top. Run this program as
$ top
To exit the top program type "q" on the keyboard. After the demo script has finished, you may view the results of the calculation using the command
$ less slurm-X.out
where X is the job sequence number that was assigned when you submitted the job using the sbatch command. Note that less is an interactive text-file viewer that can page up and page down while reading a file. Type "h" for help and "q" to exit the less viewer.
A batch configuration file for an R script that uses 4 CPU cores and 32GB of memory takes the form
#!/bin/bash
#SBATCH -n 4
#SBATCH --mem=32GB
Rscript mycalc.R
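Note that requesting 4 cores does not by itself parallelize the R code; the script is presumed to start its own worker processes, for example using a parallel library, matching the number of cores requested.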
Similarly, to run the script flops.m create a configuration file flops.slm of the form
#!/bin/bash
Mscript flops.m
Then, in the terminal window type
$ sbatch flops.slm
Then, in the terminal window type
$ ./mkslm
$ for i in *.slm; do sbatch $i; done
More information about this set of batch jobs was presented in the Fall 2020 graduate student seminar.
$ python3 pi_dartboard.py
Number of random points to include in each trial = 100
Number of trials to run = 10000
Doing 10000 trials of 100 points each
Executed trial 0 using random seed 23173 with result 85
Executed trial 1000 using random seed 28951 with result 81
Executed trial 2000 using random seed 22201 with result 79
Executed trial 3000 using random seed 54954 with result 74
Executed trial 4000 using random seed 65485 with result 74
Executed trial 5000 using random seed 53049 with result 78
Executed trial 6000 using random seed 10095 with result 86
Executed trial 7000 using random seed 26957 with result 80
Executed trial 8000 using random seed 48687 with result 82
Executed trial 9000 using random seed 39487 with result 81
The value of Pi is estimated to be 3.14905600000000 using 1000000 points
To run this program as a batch job the inputs must be specified in the configuration file. This can be done with a configuration file named pi_dartboard.slm of the form
#!/bin/bash
printf '100\n10000\n' | python3 pi_dartboard.py
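Here printf writes the two input values, 100 points per trial and 10000 trials, each followed by a newline, and the pipe feeds them to the Python program exactly as if they had been typed at the prompts.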
Then, in the terminal window submit the batch job by typing
$ sbatch pi_dartboard.slm
A traditional MPI parallel job using 12 ranks and 18GB of memory requires a batch configuration file mpijob.slm of the form
#!/bin/bash
#SBATCH -n 12
#SBATCH --mem=18GB
mpirun ./mympiprog
Then in the terminal window type
$ sbatch mpijob.slm
Since Slurm will execute each batch job only when resources are available, you can queue many different jobs and they will run one after the other, perhaps overnight, until all the computations are finished.
A multi-threaded MPI parallel job using 4 ranks each with 3 threads requires 12 available CPU cores and a batch configuration file of the form
#!/bin/bash
#SBATCH -n 4
#SBATCH -c 3
export OMP_NUM_THREADS=3
export CILK_NWORKERS=3
mpirun ./hybridprog
Details for submitting a multi-threaded MPI job to the Slurm batch scheduler are the same as for a traditional MPI parallel job.
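For example, assuming the configuration file above is saved as hybridjob.slm (a hypothetical name), submit it by typing
$ sbatch hybridjob.slm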
You may also obtain funding (either external or through an internal grant sponsored, for example, by the graduate school) for pay-as-you-go access to the entire cluster. This would be useful if you want to use significantly more than 32 cores for a short period of time.
Since Pronghorn does not support remote desktop but only secure shell, it may be easier to develop programs on Okapi and later move them to Pronghorn to perform the final computations. In this section we describe how to move files back and forth between the two machines as well as some modifications that need to be made to the batch submission files.
First create a mount point in your home directory on Okapi by typing
$ cd
$ mkdir pronghorn
Next mount your Pronghorn home directory by typing
$ cd
$ sshfs pronghorn.rc.unr.edu: pronghorn
At this point it should be possible to see your home directory and files from Pronghorn as if they were local files on Okapi. In particular, you can use the file explorer to drag and drop files and directories from Okapi to Pronghorn.
Before logging out, please unmount your Pronghorn home directory from Okapi. Do this with the command
$ cd
$ fusermount -u pronghorn
This will free up resources on both machines.
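If you only need to copy a few files, an alternative to mounting the remote directory is a direct copy with scp. For example, the following sketch, in which mycalc.R is the script from the earlier examples, copies a file from Okapi into your home directory on Pronghorn:
$ scp mycalc.R pronghorn.rc.unr.edu: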
Suppose you want to run an R script called mycalc.R which has already been transferred to Pronghorn. This must be done using the batch system. First, log into Pronghorn using a command such as
$ slogin pronghorn.rc.unr.edu -l <your_net_id>
with <your_net_id> replaced by your actual UNR NetID. More information about how to log into Pronghorn may be found here. Like on Okapi, create a mycalc.slm configuration file. The contents, however, are slightly different because Pronghorn deploys R through a Singularity container. For example, a batch configuration file to run R on Pronghorn might look like
#!/bin/bash
#SBATCH -p cpu-s1-ahn-0
#SBATCH -A cpu-s1-ahn-0
#SBATCH --mem=4GB
singularity exec /apps/R/r-base-3.4.3.simg R -q -f mycalc.R
Note that the above configuration file explicitly specifies that 4GB of RAM are needed to run the job. If you do not specify a memory limit, Pronghorn (unlike Okapi) assumes the job needs all available memory on the node. Such a job cannot start until the node is completely empty, and while it runs it blocks all other jobs, even jobs using only one core, until it is finished. If you specify the memory option properly, then it is possible to run 32 single-core jobs on the node at the same time.
Submitting to the batch queue is the same as on Okapi and done by typing
$ sbatch mycalc.slm
When cancelling a job it can be difficult to find your job hidden among all the other jobs on the system. To view only your jobs type
$ squeue -u <your_net_id>
with <your_net_id> replaced by your actual UNR NetID. Once you locate the job number you wish to cancel, it can be cancelled using the command "scancel X" where X is the sequence number of the job to cancel. For example, if you wish to cancel job 798644 then type
$ scancel 798644
As with Okapi, you may cancel all your jobs with
$ scancel -u <your_net_id>
Additional help on using Pronghorn is available at https://unrrc.slack.com from the UNR Research Computing team.
------------------------------------------------------------------------
Dual Intel Xeon 6126 Gold 2.60GHz/384GB               okapi.math.unr.edu

     @__@           Welcome to the UNR Mathematics and
    (oo)\           Statistics Department server!
    |_/\ \________
    \    ===)       This system based on Void Linux.
    |=-----==|\     Unauthorized use prohibited.
    || ||  || ||
------------------------------------------------------------------------