Case study · Cloud Automation

AUTOMATED COMPUTE GRIDS

THE FACTORY - AUTOMATED COMPUTE GRIDS

A research institute replaced an end-of-life compute grid with Azure Batch and end-to-end automation, so research teams run their workloads themselves with cost and security controls in place.

Microsoft Azure

The compute grid runs on Microsoft Azure, with cloud-native services from end to end.

Azure Batch

High-performance compute workloads on Azure Batch, sized for parallel processing.

Terraform & GitLab CI

Infrastructure described in Terraform, applied through GitLab CI/CD pipelines.

Ansible configuration

Configuration management and orchestration with Ansible (and AWX).

Azure Functions

Serverless compute for event-driven automation and workflow orchestration.

Performance gains

One specific calculation moved from two months to 46 hours.

01 Summary

A research institute's compute grid, used by international research teams for long-running scientific workloads, had reached end of support, which triggered a migration to the cloud. An MVP on Azure Compute and Azure Batch showed that the platform was a fit, and we extended it with end-to-end cloud automation so that research teams can use the public cloud platform on a self-service basis.

02 The challenge

Give researchers a consistent way to run their workloads on Azure. Help them size jobs against Azure Batch, Azure Functions and related services. Keep cost and security controls in place. Let platform changes flow through to every environment as code, not as tickets.

03 The solution

We started with the solution design: services, components, networking, naming and tagging conventions. The full landing zone was provisioned with Terraform and Ansible (with AWX) through automated CI/CD pipelines on GitLab CI. For each research group we defined a blueprint that could be deployed from input parameters.
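As an illustration of a blueprint "deployed from input parameters", here is a minimal sketch in Python. The case study does not publish the actual blueprint format, so the `Blueprint` class, its field names, the `<group>-<environment>-<region>` naming pattern and the tag keys are all assumptions made for this example. The sketch turns a handful of per-group inputs into a JSON document that could be written to a `.tfvars.json` file and passed to Terraform with `-var-file`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Blueprint:
    """Hypothetical input parameters for one research group's environment."""

    group: str            # research group short name (assumed field)
    environment: str      # e.g. "dev" or "prod" (assumed field)
    region: str           # Azure region name, e.g. "westeurope"
    batch_vm_size: str    # VM size for the Batch pool
    batch_max_nodes: int  # upper bound on pool size, a simple cost control
    cost_center: str      # used for tagging and billing (assumed field)
    extra_tags: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.batch_max_nodes < 1:
            raise ValueError("batch_max_nodes must be at least 1")

    def resource_prefix(self) -> str:
        # Assumed naming convention: <group>-<environment>-<region>, lowercase.
        return f"{self.group}-{self.environment}-{self.region}".lower()

    def tags(self) -> dict:
        # Required tags are merged last so extra_tags cannot override them.
        required = {
            "research_group": self.group,
            "environment": self.environment,
            "cost_center": self.cost_center,
        }
        return {**self.extra_tags, **required}

    def to_tfvars(self) -> dict:
        # Variable names here are hypothetical; they would have to match
        # the variables declared in the Terraform module being deployed.
        return {
            "resource_prefix": self.resource_prefix(),
            "location": self.region,
            "batch_vm_size": self.batch_vm_size,
            "batch_max_nodes": self.batch_max_nodes,
            "tags": self.tags(),
        }


def render_tfvars_json(blueprint: Blueprint) -> str:
    """Serialize the blueprint as JSON suitable for a .tfvars.json file."""
    return json.dumps(blueprint.to_tfvars(), indent=2, sort_keys=True)


if __name__ == "__main__":
    example = Blueprint(
        group="Genomics",
        environment="dev",
        region="westeurope",
        batch_vm_size="Standard_D4s_v3",
        batch_max_nodes=20,
        cost_center="CC-1234",
    )
    print(render_tfvars_json(example))
```

Deriving names and tags from the same inputs in one place is one way to keep conventions consistent across research groups; a CI/CD job could generate the file and run the Terraform plan against it.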

Each blueprint shipped with dashboards and billing alerts, so research project leaders kept control of their budget and resource allocation. Access to the platform was provided through a Site-to-Site VPN from on-premises and a Point-to-Site VPN for remote workers.
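The logic behind a billing alert can be sketched in a few lines. The case study does not say how the alerts were configured, so the function name and the 50%, 80% and 100% thresholds below are assumptions; in a real deployment a managed budget feature such as the one in Azure Cost Management would typically do this job rather than custom code.

```python
def crossed_thresholds(budget, spend, thresholds=(0.5, 0.8, 1.0)):
    """Return the budget thresholds (as fractions) that spend has reached.

    The threshold values are illustrative assumptions, not taken from
    the project described in this case study.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in sorted(thresholds) if ratio >= t]


if __name__ == "__main__":
    # A project with a budget of 1000 that has spent 850 so far
    # has passed the 50% and 80% marks but not yet 100%.
    print(crossed_thresholds(1000, 850))
```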

Workload runtimes dropped sharply, with cost savings on top. One specific calculation moved from two months to 46 hours.
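To put the two-months-to-46-hours figure in perspective, the short calculation below estimates the speedup. It assumes that "two months" means roughly 60 days of continuous wall-clock time, which the case study does not state, so the result is an order-of-magnitude indication only.

```python
HOURS_PER_DAY = 24
ASSUMED_DAYS_IN_TWO_MONTHS = 60  # assumption: the source only says "two months"

before_hours = ASSUMED_DAYS_IN_TWO_MONTHS * HOURS_PER_DAY  # 1440 hours
after_hours = 46                                           # from the case study

speedup = before_hours / after_hours
reduction_pct = (1 - after_hours / before_hours) * 100

print(f"Speedup: about {speedup:.1f}x")                # about 31.3x
print(f"Runtime reduction: about {reduction_pct:.1f}%")  # about 96.8%
```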

  • AWS, Azure
  • Design, Build, Run
  • Everything in code: fast, predictable, consistent
  • Centrally managed cloud infrastructure
  • Automation, Security, Monitoring, Support
Read about our managed public cloud service

The Factory has migrated many workloads from on-premises to public cloud. We help define a cloud strategy, cloud-first or hybrid, and run the right workloads in the right cloud. Our architects help with strategy, target operating model, migration, and day-two operations, all secure and cost-aware.
