2024 Resource Allocations Competition Results

List of Resource Allocation Competition 2024 Awards

Summary

The Alliance Federation delivers Canada’s national advanced research computing (ARC) platform in partnership with regional digital research infrastructure (DRI) organizations (Compute Ontario, Calcul Québec, ACENET, the BC DRI Group and the Prairie DRI Group) and institutions across Canada. Providing researchers with access to the infrastructure and expertise they need to accomplish globally competitive, data-driven and transformative research, this national ARC platform serves the needs of more than 20,000 researchers, including over 5,483 faculty based at Canadian institutions as of January 1, 2024.

For the 2024-2025 allocation period, the total available capacity of the national ARC platform for the Resource Allocation Competition is 232,560 CPUs, 56,405 vCPUs (virtual CPUs), 4,237 Reference GPU units (RGU-years) and 210.7 PB of storage on Arbutus (University of Victoria), Cedar (Simon Fraser University), Graham (University of Waterloo), Niagara (University of Toronto), and Béluga and Narval (Calcul Québec).

Ongoing growth in researcher demand for resources means that demand continues to outstrip supply. This year’s RAC was able to award, on average, 42% of the total compute requested, 76% of the total storage requested, 21% of the total RGU-years requested and 70% of the total vCPUs requested on the Arbutus, Béluga, Cedar and Graham clouds. The 2024 RAC received 670 applications, 37 fewer than in the previous year (see Table 1), mainly due to the increase in the minimum amount of resources required to be eligible to apply for the competition.

Note: The Alliance national systems will be replaced by spring 2025 as a result of $225M in capital funding received from federal, provincial and institutional sources. The transition process from current to new systems will necessitate service disruptions during the winter of 2025, although the Alliance will endeavour to minimise the effects on researchers. This transition may make it difficult for projects to achieve their full allocation over the year. Researchers should monitor the status pages (https://status.alliancecan.ca) for updates as we start receiving the new infrastructure and begin the transition process from the current systems to the new ones. The new systems will result in a significant increase of compute and storage resources available to researchers, as well as increased reliability and availability.

Table 1: Applications Submitted to the Resource Allocation Competition

Year	Applications submitted	Year-over-year change
2024	670	-5%
2023	707	-1%
2022	716	10%
2021	651	10%
2020	590	16%
2019	507	8%
2018	469	15%
2017	409	12%
2016	366	5%
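The year-over-year column can be reproduced from the application counts. The following is a minimal Python sketch, not part of the official report; the function name is illustrative, and it assumes the column is the percentage change relative to the previous year, rounded to the nearest whole percent.

```python
# Applications submitted per competition year, copied from Table 1.
applications = {
    2016: 366, 2017: 409, 2018: 469, 2019: 507, 2020: 590,
    2021: 651, 2022: 716, 2023: 707, 2024: 670,
}


def year_over_year_change(counts, year):
    """Percentage change in applications from the previous year to `year`."""
    previous = counts[year - 1]
    return (counts[year] - previous) / previous * 100


# 2016 is skipped because the table has no 2015 count to compare against.
for year in sorted(applications)[1:]:
    print(f"{year}: {year_over_year_change(applications, year):+.1f}%")
```

Rounding each printed value to a whole percent gives the figures shown in the table for 2017 through 2024.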

If you have questions about the terminology used on this page, please consult the Technical Glossary. If you have any questions about the overall report, contact allocations@tech.alliancecan.ca.

Computational Resources

Minimum Size of RAC Requests and Opportunistic Compute Access

A minimum of compute resources (currently set at 200 core-years for CPU and 25 RGU-years for GPUs) is required to be eligible to submit a RAC application. These minimum values are set in part to control the number of applications requiring peer-review. A minimum RAC award will also ensure higher job priorities than for non-RAC awardees.

All researchers and their sponsored users with an active account can automatically make opportunistic use of CPU and GPU resources on any system. There is no guarantee on how much CPU or GPU can be consumed by non-RAC holders, as their use of the systems is purely opportunistic.

Historical utilization data shows that many groups are able to reach (or even exceed) the RAC minimums specified above. Non-RAC users who want to maximize their compute usage should adopt strategies that ensure they:

regularly have jobs in the queue;
are able to tolerate longer wait times for jobs to start; and
submit jobs with “optimal” characteristics.

For example, opportunistic jobs with short time limits that request a few cores on a general purpose (GP) system will generally run much sooner than those requesting dozens of cores.

Please read this useful documentation about allocation scheduling priorities and job scheduling policies or contact support@tech.alliancecan.ca for advice on how to maximize usage for non-RAC awardees.

CPU Allocations

RAC 2024 was able to meet 43% of all of the CPU resources requested, 5% less than last year. Béluga, Cedar, Graham, Narval and Niagara provide approximately 232,560 cores, of which around 80% (on average) is allocated through the RAC.

Table 2: 2024 CPU allocations per cluster

Cluster	Capacity: Available CPU (core-years)*	Demand: core-years requested	Provided: core-years allocated*	% of CPU capacity allocated
Béluga	28,960	46,540	20,164	70%
Cedar	40,000	92,684	39,212	98%
Graham	26,000	43,136	15,792	61%
Narval	61,760	112,007	50,503	82%
Niagara	75,840	150,443	63,431	84%
Total	232,560	444,810	189,102	81%

*The amount of resources allocated includes resources that will be unavailable during the downtime needed to replace the old infrastructure. For example, an allocation of 100 cores for the 2024-2025 allocation period on a cluster that will be completely shut down for infrastructure upgrades is reported as 100 core-years, even though it is only expected to be usable for 11 months. The unallocated portion (share) of the cluster made available to researchers for opportunistic use without a RAC award (that is, the difference between available and allocated) includes resources that will likely fail because they are no longer under support, or that will be lost to planned or unforeseen outages. In practice, this portion is therefore much smaller, more intermittent and less reliable than the numbers in the table may suggest.
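The last column of Table 2 and the footnote's 11-month example can be checked with a short calculation. This is an illustrative Python sketch with hypothetical function names. It assumes that an allocation is usable at its full rate whenever the cluster is up, and that a shutdown simply removes whole months from the year.

```python
# (capacity, allocated) in core-years per cluster, copied from Table 2.
cpu_2024 = {
    "Béluga": (28_960, 20_164),
    "Cedar": (40_000, 39_212),
    "Graham": (26_000, 15_792),
    "Narval": (61_760, 50_503),
    "Niagara": (75_840, 63_431),
}


def share_allocated(capacity, allocated):
    """Percentage of a cluster's capacity that was allocated through the RAC."""
    return allocated / capacity * 100


def effective_core_years(cores, months_available):
    """Core-years actually usable if the cluster runs for only part of the year."""
    return cores * months_available / 12


total_capacity = sum(capacity for capacity, _ in cpu_2024.values())
total_allocated = sum(allocated for _, allocated in cpu_2024.values())

for cluster, (capacity, allocated) in cpu_2024.items():
    print(f"{cluster}: {share_allocated(capacity, allocated):.0f}% allocated")
print(f"Total: {share_allocated(total_capacity, total_allocated):.0f}% allocated")

# The footnote's example: 100 cores reported as 100 core-years, usable for 11 months.
print(f"Effective core-years: {effective_core_years(100, 11):.1f}")
```

Under these assumptions, the 100 core-year allocation in the footnote corresponds to roughly 92 usable core-years.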

Table 3: Historical CPU ask vs. allocation

Year	Capacity: Available CPU (core-years)	Demand: core-years Requested	Provided: core-years allocated	% of the demand allocated
2024	232,560	444,810	189,102	43%
2023	263,326	460,346	216,164	47%
2022	293,312	436,780	234,275	54%
2021	232,704	468,498	188,925	40%
2020	232,704	455,892	181,502	40%
2019	201,320	390,352	157,262	40%
2018	211,020	284,347	158,612	56%
2017	182,760	255,638	148,100	58%
2016	155,952	237,862	128,463	54%

Scaling CPU Requests

As previously stated, there were insufficient ARC resources to fully meet the CPU demand through RAC 2024.

As a result, a scaling function was applied to the 2024 competition to provide a means by which decisions on RAC allocations, in a context of insufficient capacity, could be made. This function, which is endorsed by the chairs of the review committees, was established so that only applications with a science score greater than 2.0 (out of 5) received an allocation. Applicants who did not receive a CPU allocation can still make opportunistic use of system resources via the Rapid Access Service. The average score of all of the applications submitted to the RAC 2024 was 3.9.

CPU requests are scaled based on the overall score of the application and the size of the request. Details and examples of the scaling function are available here. For further questions, contact allocations@tech.alliancecan.ca.

GPU Allocations

The Reference GPU Unit (RGU) was introduced in RAC 2024 to request, allocate and measure GPU resources. It represents the "cost" of using a particular GPU model, whose RGU value varies with its performance. GPU allocations are now set, and usage is charged, in RGU-years rather than GPU-years. For more information about RGU, visit this page. Since this is a new unit, we are not able to provide historical numbers for GPU capacity, demand and supply in RGU-years, which is why Table 4 uses RGU-years and Table 5 uses GPU-years.

Competition for GPU resources continues to be stronger than for CPU resources. As Table 5 shows, demand for GPU resources has been stable over the last four years, but the gap between demand and capacity remains quite large. In 2024, the RGU allocation rate compared to the demand was 21%.

Table 4: 2024 GPU allocations per cluster (in RGU-years)

GPU resource	Need: Total RGU-years requested	Provided: Total RGU-years allocated	Supply: Allocatable RGU-years	% of RGU capacity allocated
Béluga	4,164	1,219	1,548	79%
Cedar	5,686	1,660	1,922	86%
Graham	2,751	448	598	75%
Narval	11,389	1,695	2,096	81%
Total	23,990	5,022	6,164	81%

*The amount of resources allocated includes resources that will be unavailable during the downtime needed to replace the old infrastructure. For example, an allocation of 10 GPU-years or 26 RGU-years for the 2024-2025 allocation period on a cluster that will be completely shut down for infrastructure upgrades is reported as 10 GPU-years or 26 RGU-years, even though it is only expected to be usable for 11 months. The unallocated portion (share) of the cluster made available to researchers for opportunistic use without a RAC award (that is, the difference between available and allocated) includes resources that will likely fail because they are no longer under support, or that will be lost to planned or unforeseen outages. In practice, this portion is therefore much smaller, more intermittent and less reliable than the numbers in the table may suggest.
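The relationship between GPU-years and RGU-years can be sketched as a simple multiplication. The Python example below is illustrative only. The per-GPU RGU value is inferred from the footnote's example (10 GPU-years reported as 26 RGU-years); actual RGU values depend on the GPU model and are not listed in this report, and the function names are hypothetical.

```python
# RGU value per GPU, inferred from the footnote's example
# (26 RGU-years for 10 GPU-years). Assumption: other GPU models have
# different values, which are not given in this report.
EXAMPLE_RGU_PER_GPU = 26 / 10


def gpu_years_to_rgu_years(gpu_years, rgu_per_gpu):
    """Convert GPU-years on one GPU model to RGU-years."""
    return gpu_years * rgu_per_gpu


def rgu_years_to_gpu_years(rgu_years, rgu_per_gpu):
    """Convert RGU-years back to GPU-years on one GPU model."""
    return rgu_years / rgu_per_gpu


# The footnote's example allocation.
print(gpu_years_to_rgu_years(10, EXAMPLE_RGU_PER_GPU))

# The 25 RGU-year RAC minimum, expressed in GPUs of the example model.
print(f"{rgu_years_to_gpu_years(25, EXAMPLE_RGU_PER_GPU):.1f} GPU-years")
```

With this example value, the 25 RGU-year minimum for a RAC application is a little under 10 GPU-years; on a GPU model with a higher RGU value, the same minimum would correspond to fewer GPUs.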

Table 5: Historical GPU demand vs. supply (in GPU-years)

Year	Capacity: Available GPU (GPU-years)	Demand: GPU-years Requested	Provided: GPU-years allocated	% of the demand allocated
2024	2,416	8,947	1,935	22%
2023	2,569	9,826	2,012	20%
2022	3,062	9,070	2,161	24%
2021	2,610	9,980	2,187	22%
2020	2,552	12,885	1,936	15%
2019	1,664	6,555	1,331	20%
2018	976	4,092	840	20%
2017	1,420	2,790	1,047	39%
2016	373	1,357	269	20%

Scaling GPU requests

GPU allocations are determined by the following factors:

the overall score of the RAC application,
the technical justification provided,
evidence of previous GPU utilization,
the research area for which GPUs are requested (e.g., artificial intelligence or machine learning),
the size of the research group.

Keep in mind the following:

GPU allocations are constrained, among other things, by the type of GPU requested and available in each system.
The demand for GPUs for AI applications has increased considerably.
In general, RAC applicants find it difficult to estimate their GPU needs, which in most cases are over-requested (and underutilized). We strongly encourage future RAC applicants to do two things before applying: start using the GPUs in order to get a better understanding of your needs and consult with our technical staff before submitting a RAC application. Our staff can provide advice on how to benchmark your codes and calculate your GPU needs as accurately as possible.

Storage Allocations

Storage integrated with Arbutus, Béluga, Cedar, Graham, Narval and Niagara provided approximately 210.7 PB of storage capacity for 2024. Across all types of storage, 76% of the total storage capacity was allocated.

Table 6: Historical storage demand vs. supply

Year	Capacity: Available storage (TB)	Demand: Storage requested (TB)	Provided: Total storage allocated (TB)	% of demand allocated
2024	210,764	209,642	159,746	76%
2023	190,479	192,363	153,639	72%
2022	190,479	161,186	151,775	87%
2021	150,915	135,427	122,272	91%
2020	143,914	109,718	100,222	90%
2019	101,344	89,898	77,923	94%
2018	63,340	60,126	43,508	80%

Table 7: 2024 Storage demand vs. supply by storage type

Category	Type	Supply (TB)	Need: Storage requested (TB)	Provided: Storage allocated (TB)	% of the capacity allocated
HPC	Project	63,550	81,400	51,438	81%
HPC	Nearline	117,800	87,561	77,536	66%
HPC	dCache	13,467	18,300	13,467	100%
Cloud	Volumes and snapshot storage	4,947	3,389	3,175	64%
Cloud	Object storage	15,803	15,803	11,135	100%
Cloud	Shared cloud storage	3,000	3,223	3,060	102%
	Total	210,764	209,676	159,811	76%

Cloud Allocations

The Arbutus cluster at the University of Victoria has 41,920 allocatable vCPUs (virtual CPUs). These are available via RAC and RAS and are also used for internal services such as software development and hosting. Relatively small cloud offerings are also implemented on Cedar, Graham and Béluga. For RAC 2024, the overall request for vCPUs increased by about 30%, with a 30% increase in the request for compute vCPUs and a 43% increase for persistent vCPUs.

Overprovisioning of persistent vCPUs at a 10:1 ratio has allowed a significant expansion in persistent allocations in the previous few years. Demand for real cores has continued to increase, and aging equipment has begun to reduce available capacity; these pressures have pushed utilization numbers well over 80%, a threshold beyond which it becomes more difficult to schedule larger compute instances.
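The effect of overprovisioning on core utilization can be illustrated with a rough calculation. This Python sketch is not the Alliance's capacity model. It assumes that compute vCPUs map one-to-one onto physical cores while persistent vCPUs share cores at the 10:1 ratio mentioned above, and every quantity in the example call is hypothetical.

```python
PERSISTENT_OVERPROVISION_RATIO = 10  # persistent vCPUs per physical core (from the text)
SCHEDULING_THRESHOLD = 0.80          # utilization level cited in the text


def physical_cores_needed(compute_vcpus, persistent_vcpus):
    """Physical cores required to back a mix of compute and persistent vCPUs."""
    # Assumption: compute vCPUs are not overprovisioned (one core each).
    return compute_vcpus + persistent_vcpus / PERSISTENT_OVERPROVISION_RATIO


def utilization(compute_vcpus, persistent_vcpus, physical_cores):
    """Fraction of the available physical cores that the vCPUs consume."""
    return physical_cores_needed(compute_vcpus, persistent_vcpus) / physical_cores


# Hypothetical numbers, chosen only to illustrate the calculation.
u = utilization(compute_vcpus=600, persistent_vcpus=2_500, physical_cores=1_000)
print(f"utilization: {u:.0%}, above threshold: {u > SCHEDULING_THRESHOLD}")
```

In this made-up case, 2,500 persistent vCPUs consume only 250 cores, yet the cloud is already past the 80% mark because compute vCPUs take whole cores.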

Between Arbutus and the additional nodes on Cedar, Graham, and Béluga, this year’s RAC was able to allocate 70% of the total virtual CPUs requested, 7% less than in the previous year.

Table 8: Historical Cloud vCPU Demand vs. Supply

Year	Capacity: vCPU Years available	Demand: vCPU Years requested	Provided: vCPU Years allocated	% of demand allocated
2024	56,405	46,192	32,511	70%
2023	56,405	35,618	27,313	77%
2022	62,549	34,536	27,444	79%
2021	62,549	30,323	24,443	81%
2020	50,501	18,330	18,229	99%
2019	29,147	19,479	18,511	95%
2018	24,854	12,480	11,829	95%

Assessment Process

The RAC involves two review processes each year:

a scientific review, which is a peer-review process involving more than 100 discipline-specific experts from Canadian academic institutions. These volunteers assess and rate the merits of the computational research projects submitted. The scientific review results in a single score that provides a critical and objective measure to guide allocation decisions; and
a technical review that is undertaken by staff who are responsible for verifying the accuracy of the computational resources needed for each project based on the technical requirements outlined in the application and for making recommendations about the national system to which the resources should be allocated to meet the project's needs.

The overall process is overseen by the Resource Access Program Administrative Committee, which includes representatives from each region and national system host sites.

Note that while new applications receive both scientific and technical reviews, applications submitted via the Fast Track process and Research Platforms and Portals (RPP) with a multi-year award receive only a technical review.

Guiding Principles

RAC is guided by the following principles:

all applications are given fair consideration through both a scientific and technical review process;
resources are awarded based on the merits of the computational research project presented, rather than the merits of the overall research program;
there is no direct correlation between the amount of computational resources needed and the quality (excellence) of the research outcomes of a project - important research can be done with a small amount of computational resources; and
the challenges arising from the shortage of resources and other constraints within the system are shared among all applicants.

Technical Review

The technical review is conducted by technical experts who:

ensure the appropriate system is requested by the PI;
ensure that the required software is available;
evaluate application efficiency and scalability;
identify groups that may need help with application and workflow optimization;
identify discrepancies between the online request and the complete description of the project;
identify special software requirements; and,
provide a technical opinion on the reasonableness of the request.

Technical reviewers are required to sign a Non-Disclosure Agreement prior to accessing any RAC application.

Science Review

New applications submitted to the RAC are peer-reviewed and scored. Scientific reviewers are required to sign a Non-Disclosure Agreement and accept the Conflict of Interest Policy prior to accessing any RAC application.

The final RAC score is based on the following:

the scientific excellence of the specific research project for which computational resources are being requested;
the scientific and technical feasibility of the proposed research project;
the appropriateness of the resources requested to achieve the project’s objectives; and,
the likelihood that the resources requested will be efficiently used.

Applications are reviewed in one of the committees below:

Astronomy, Astrophysics and Cosmology
Bioinformatics
Chemistry, Biochemistry and Biophysics
Computer Sciences and Mathematics
Engineering
Environmental and Earth Sciences
Humanities and Social Sciences
Nano, Materials and Condensed Matter
Neurosciences, Medical Imaging and Medical Physics
Subatomic Physics, Nuclear Physics and Space Physics

Monetary Value of the 2024 Allocations

These values represent an average across the national ARC platform’s facilities and include total capital and operational costs incurred to deliver the resources and associated services. These are not commercial or market values. For the 2024 competition, the value of the resources allocated was calculated using the following rates:

Table 9: Financial Value of RAC Awards

Resources	2024
1 CPU core-year	$107.63
1 RGU-year (for GPU and vGPU)	$1,145.31
1 TB of project storage	$59.24
1 TB of nearline storage	$26.53
1 vCPU-year	$39.15
1 TB of cloud storage (Ceph)	$50.85
1 TB of object storage	$50.85
1 TB of shared filesystem storage	$50.85
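The value of an individual award follows from multiplying each allocated quantity by its rate in Table 9 and summing the results. The Python sketch below is illustrative: the dictionary keys and function name are hypothetical, and the example award is made up, combining the RAC minimums of 200 core-years and 25 RGU-years with an arbitrary 10 TB of project storage.

```python
from decimal import Decimal

# 2024 rates in dollars per unit, copied from Table 9. Key names are illustrative.
RATES_2024 = {
    "cpu_core_year": Decimal("107.63"),
    "rgu_year": Decimal("1145.31"),
    "project_storage_tb": Decimal("59.24"),
    "nearline_storage_tb": Decimal("26.53"),
    "vcpu_year": Decimal("39.15"),
    "cloud_storage_tb": Decimal("50.85"),
    "object_storage_tb": Decimal("50.85"),
    "shared_filesystem_tb": Decimal("50.85"),
}


def award_value(award):
    """Sum quantity times rate over every resource in an award."""
    return sum(
        (RATES_2024[resource] * Decimal(quantity) for resource, quantity in award.items()),
        Decimal("0"),
    )


# Hypothetical award: quantities are integers in the units of Table 9.
example_award = {"cpu_core_year": 200, "rgu_year": 25, "project_storage_tb": 10}
print(f"${award_value(example_award):,.2f}")
```

Decimal arithmetic is used here so that the dollar amounts add up exactly; as noted above, the result is an averaged cost figure, not a commercial or market value.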