On Canada’s Future Digital Research Infrastructure

Dr Ross Dickson
Research Consultant, ACENET

My 35-year career has been spent in either research computing or the support thereof. As an employee of a regional research computing consortium I have a vested interest in the success of NDRIO. I trust the reader will not view this as a conflict of interest.

I’ve structured this paper around a list of issues I’ve encountered when assisting researchers. Thoughts on solutions follow each item.

  • Amount of hardware (CPUs, GPUs, storage, etc.)
  • Lifetime of archival storage
  • Openness versus security
  • Programming and development expertise
  • Centralization versus specialization

Amount of hardware: Demand will always exceed supply, since researchers can always come up with new ways to use computers, but continuous expansion and renewal of the hardware is vital. This issue is so obvious I nearly omitted it, but I want to be clear that the supply of hardware stands with the retention of expert personnel as a sine qua non.

Lifetime of archival storage: The ongoing move from paper to digital as the medium of information preservation has led to a commendable focus on Research Data Management in recent years, including new requirements from funding agencies and institutions. However, I see researchers scrambling for technological means to preserve digital information with a lifespan even approaching the lifespan of paper. Such technologies simply do not exist yet, and no cure-all is on the immediate horizon.

The only solution I can imagine for this is a multi-decade commitment of funding to operate digital archives. Such archives must be continually staffed and continually supplied with new equipment, including the capability to move data from obsolete media to newer media. NDRIO might be tempted to do this centrally, but there are risks to over-centralization (see “Centralization” below). In my opinion it is more appropriately done at an institutional level, as FRDR is trying to do, but it will be challenging to ensure that each research institution takes its responsibility seriously and assigns appropriate funding. In the old days, if you cut your library budget, it meant your (paper) collection stopped growing. Nowadays, if you cut your digital library funding, your (digital) collection becomes unusable on a timescale of only a few years.

Openness versus security: As the Open Science movement argues, research thrives when it is widely available. Web sites have proliferated both for publication and for collaboration. However, internet security threats have proliferated at the same time, both from bad actors interested directly in the research and from those interested only in the computing resources it uses (e.g. for DDoS attacks or cryptocurrency mining). As a result, researchers trying to build web sites for collaboration, data sharing, etc., are forced to become informed about computer security issues beyond what they might reasonably expect or desire.

I suggest two mutually supporting ways to address this issue. (1) Continue to grow the budget both for computer security personnel and for security training and education for researchers. (2) Enable more Platform-as-a-Service (PaaS) or Software-as-a-Service (SaaS) offerings tailored to specific research needs. Such offerings could both improve research data security and reduce the burden on researchers of having to know about so many different threats, but would require new investments in personnel to create and operate them.

Programming and development expertise: Researchers are experts in their particular field of study. Few of them should have to become experts in computing in order to forward their research goals. The previous item, “Security”, provides a sharp example. But if the researcher is not to be a computer expert, who then is?

The Research Software Engineer is recognized as a specialization in several other countries now (https://researchsoftware.org/). NDRIO should encourage the growth and recognition of this specialization within Canada, especially beyond Compute Canada or its successor organization. This is a specialization that should be given its due in research groups and universities.

Centralization versus specialization: The increasing centralization of DRI in Canada in the past 5 – 10 years has provided enormous benefits for researchers. The common software stack, scheduler, authentication, documentation, and support systems make it possible for researchers to move between clusters with great ease, giving them access to a far larger pool of hardware. These improvements should continue to be built upon and expanded.

But it must be remembered that centralization is a means to an end, not a goal in itself. It can have negative effects.

  • Users with unique needs are occasionally ill-served. The larger and more complicated the resource one manages, the harder it is to make changes to suit this client without harming that client, or driving up your costs, or overtaxing your staff.
  • A single, large, heterogeneous cluster requires complex and therefore opaque scheduling policies.
  • Single points-of-failure, such as the poor performance or failure of a key filesystem, affect a large number of users.

I encourage NDRIO to separate the ideas of “a data centre” and “a cluster”. A single data centre can host more than one cluster. A smaller cluster can be more homogeneous than a large one, and hence simpler to manage and to use. Smaller clusters can be specialized, and the range of needs of researchers can be handled not by running wildly different workloads on a single cluster, but by directing researchers onto a cluster appropriate to their workload. For example, we might have a (national) machine-learning cluster, loaded with GPUs and with a filesystem designed and managed for the needs and wants of the machine-learning community. Another cluster might be designed for health data, with certain tradeoffs between security and accessibility tilted in the direction of security.

In my opinion the organization of resources (hardware and human) more along disciplinary lines and less along geographical lines might improve flexibility and responsiveness. This must be done carefully, though, to preserve the sense of community among staff that has been built by Compute Canada over many years. The sense of community and open communication among CCF staff has allowed expertise and innovation to flow between disciplines.