Lead HPC Architect (High Performance Computing)

Organization: Digital Research Alliance of Canada
Location: Remote, anywhere in Canada
Type: Full time
Salary Range: Interested candidates are asked to supply a statement that details salary expectations for the role.

ABOUT THE ALLIANCE

The Digital Research Alliance of Canada (the Alliance) serves Canadian researchers, with the objective of advancing Canada’s position as a leader in the knowledge economy on the international stage. By integrating, championing, and funding the infrastructure and activities required for advanced research computing (ARC), research data management (RDM) and research software (RS), we provide the platform for the research community to access tools and services faster than ever before. We have an ambitious mandate: to transform how research across all academic disciplines is organized, managed, stored, and used. We work with other ecosystem partners and stakeholders across the country to help provide Canadian researchers with the support they need for leading-edge research excellence, innovation, and advancement across all disciplines.

 

POSITION SUMMARY

The Lead HPC Architect is responsible for providing strategic leadership in the development of the technology architecture that supports Alliance initiatives in Advanced Research Computing (ARC). As a member of the Strategy & Planning team, the Lead HPC Architect reports to the Director of Architecture and provides architectural leadership and best practices for the organization, with a focus on HPC/supercomputing, high-throughput computing (HTC), and associated technologies. This position will deliver innovative solutions and high-quality architectural design services to address challenges in computationally intensive and data-intensive supercomputing environments. This highly influential role will work cohesively with other Architects and will collaborate continuously with Alliance team members, various national working groups, digital research infrastructure (DRI) ecosystem stakeholders, and Alliance executives.

This is a permanent position; however, a secondment from a Canadian higher education institution or other broader public sector entity is an option.

 

RESPONSIBILITIES

· Work in a fast-moving team environment to analyze ARC requirements from diverse stakeholder groups, and transform those into scalable, flexible, and resilient technical architectures.

· Perform architecture options and feasibility analysis, proactively debate alternatives with subject matter experts, and build consensus on recommended architecture within the ARC SME community.

· Deliver well-founded ARC technical architecture recommendations and defend the recommendations to the Architecture Review Board with rigorous analysis.

· Deliver comprehensive architecture models and documentation that describe technical architectures for various stakeholders and at various levels of detail.

· Communicate technical information to both technical and non-technical staff and stakeholders and participate in Alliance training initiatives.

· Support the architecture governance process and Architecture Review Board (ARB) through detailed architecture review, analysis, and recommendations.

· Develop strong working relationships with community stakeholders and vendors.

· Validate technical architectures with industry experts.

· Work with the vendor and stakeholder community to understand the latest HPC and ARC developments, and how they might be incorporated into the Alliance’s services and offerings.

· Lead working groups and committees in HPC, HTC, AI, and other technology streams.

· Lead experimental and proof-of-concept projects to test feasibility and value of initiatives.

· Participate in a range of national and international committees and working groups, and occasional speaking engagements to provide architectural and technical expertise.

· Participate in procurement exercises such as scenario design, requirements definition, and the evaluation of CFP/RFPs.

· Keep up with HPC/supercomputing emerging trends and market insights, both in academia and in industry.

· Coach, mentor and guide Alliance staff and community members on matters related to HPC architecture and architecture documentation.

· Supervise project team members, as required.

 

QUALIFICATIONS

· Master’s degree in computer science, computational science, or a related area, with 10+ years of experience; or equivalent experience/training.

· Proven expert in HPC and HTC, with knowledge of associated infrastructure.

· Experience delivering technical architecture design documents.

· 5+ years’ experience working in an academic research environment, or related experience.

· Proficient storytelling and adept use of visual aids and diagrams to effectively communicate designs and strategies.

· Demonstrated success in managing geographically dispersed collaborators from diverse disciplines and backgrounds.

· Proven leadership skills with a collaborative approach that will facilitate interaction within all levels of the organization, as well as with ecosystem partners to generate high-level stakeholder engagement.

· Ability to develop strong working relationships with stakeholders at all levels.

· Ability to be agile and flexible in responding to the changing context and shifting priorities of the research ecosystem.

· Highly advanced skills and demonstrated experience associated with several of the following:

  • Designing and integrating cutting-edge hardware and software resources into complex HPC/supercomputing system solutions.
  • Using enterprise architecture frameworks and methodologies.
  • Advanced knowledge of the HPC middleware stack, including cluster management tools, job schedulers, and resource managers. Examples include Slurm, HTCondor, PBS (or derivatives), Maui, oneSIS, OpenHPC, and Rocks.
  • Advanced knowledge of high-performance storage technology, e.g., Ceph, GPFS, and BeeGFS.
  • In-depth experience in cluster management tasks including deployment, configuration, and troubleshooting of compute nodes, management nodes, network switches, high-performance file systems, and file servers.
  • HPC hardware power and performance analysis.
  • Software performance analysis.
  • Research, design, modification, implementation, and deployment of HPC / data science applications and tools of large-scale scope.
  • Experience researching and evaluating new technology and solutions for complex ARC environments, including high-performance computing, high-throughput computing, quantum and AI-focused computing, parallel workflows, MPI, OpenMP, virtualization, orchestration tools, high-performance file systems, storage systems, networking at scale, research platforms, and edge computing.

NICE-TO-HAVES

· IT consulting experience in a client-facing role.

· Experience in virtualization, containerization, and public and private cloud technologies and associated management and orchestration tools.

· Experience with Python, MATLAB, R, and other scientific and engineering software and scripting tools.

· Cisco, Cray, Dell, HPE, or IBM training.

· Experience working in large data centers.

· Exposure to quantum and AI technologies and workloads.

· TOGAF, ITIL, or other industry certifications.

· Fluency or working proficiency in both official languages.

 

BENEFITS / WORK PERKS

In addition to a competitive salary and a rewarding career where you can truly make a difference in the Canadian research community, we offer a comprehensive benefits package that meets the various needs of our diverse team across Canada, including:

● Comprehensive Benefits Plan, including:

o Health

o Dental

o Long-Term Disability

o Life Insurance

o Flexible Spending Account

o Wellness Spending Account

o Mental Health Supports

● Defined Benefit Pension Plan

● Paid Vacation

● Remote Work

 

Please apply here: Lead HPC Architect

 

The Alliance is strongly committed to equity and inclusion within the community and encourages applications from all qualified candidates, including women, members of racialized groups, people of colour, persons with disabilities, and Indigenous- and 2SLGBTQIA+ identified people.