Disaster Recovery Prerequisites
This page consolidates the prerequisites that must be met before performing any backup/restore operation on a HostedCluster. All disaster recovery guides in this section reference these prerequisites.
General Prerequisites
Ensure the following requirements are met on the Management cluster (connected or disconnected):
- A valid StorageClass configured in the Management cluster.
- Cluster-admin access to the Management cluster.
- Access to online storage compatible with OpenShift ADP cloud storage providers (e.g., S3, Azure, GCP, MinIO).
- HostedControlPlane pods are accessible and functioning correctly.
- Access to the
openshift-adpsubscription through a CatalogSource (version depends on the DR procedure you follow).
Important
Before proceeding with any backup/restore procedure, keep in mind:
- Restoration will occur in a green field environment. After the HostedCluster has been backed up, it must be destroyed to initiate the restoration process.
- Node reprovisioning will take place. Back up workloads in the Data Plane before deleting the HostedCluster.
HostedCluster Service Publishing Strategy Requirements
Critical Requirement for Backup/Restore to a Different Management Cluster
When restoring a HostedCluster to a different Management cluster, all services in the HostedCluster must be configured with a fixed hostname in their servicePublishingStrategy. This applies to all platforms (AWS, Agent, KubeVirt, OpenStack, etc.).
The most critical service is the APIServer, which must have a fixed hostname. Without it, the restore will fail and nodes will be unable to rejoin the cluster.
Why Is This Required?
When a HostedCluster is restored on a different Management cluster:
- The infrastructure endpoints (e.g., Load Balancer addresses, Route URLs) change because they are ephemeral and tied to the original Management cluster.
- Nodes store the KAS (Kube API Server) address in their kubelet configuration. If that address was an ephemeral Load Balancer or Route URL, nodes will point to the old address after restore.
- TLS certificates (SAN - Subject Alternative Name) will not match the new ephemeral endpoints, causing certificate validation failures.
- A fixed hostname configured via DNS allows you to update the DNS record to point to the new Management cluster's endpoint, making the migration transparent for existing nodes.
Minimum Required Configuration
At a minimum, the APIServer service must have a fixed hostname:
spec:
services:
- service: APIServer
servicePublishingStrategy:
type: LoadBalancer
loadBalancer:
hostname: api-int.example.com
Recommended Production Configuration
For production environments, it is strongly recommended to configure all services with fixed hostnames:
spec:
services:
- service: APIServer
servicePublishingStrategy:
type: LoadBalancer
loadBalancer:
hostname: api-int.example.com
- service: OAuthServer
servicePublishingStrategy:
type: Route
route:
hostname: oauth.example.com
- service: OIDC
servicePublishingStrategy:
type: Route
route:
hostname: oidc.example.com
- service: Konnectivity
servicePublishingStrategy:
type: Route
route:
hostname: konnectivity.example.com
- service: Ignition
servicePublishingStrategy:
type: Route
route:
hostname: ignition.example.com
This ensures full service continuity and DNS consistency during the restore process on a different Management cluster.
AWS Self-Managed Platform Specifics
When using AWS platform with self-managed infrastructure, the APIServer can also use a Route service publishing strategy with a fixed hostname:
spec:
platform:
aws:
endpointAccess: Public
services:
- service: APIServer
servicePublishingStrategy:
type: Route
route:
hostname: api.example.com
Platform-Specific Prerequisites
Bare Metal / Agent Provider
InfraEnv Lifecycle
Since the InfraEnv has a different lifecycle than the HostedCluster, it should reside in a namespace separate from that of the HostedControlPlane and must not be deleted during backup or restore procedures.
AWS Provider
- Ensure OIDC provider configuration is accessible for post-restore fixup (see Fixing OIDC After Restore).
- If using S3 for backup storage, ensure IAM roles and policies are configured following the official documentation.
Fixing OIDC After Restore
After completing an OADP restore on AWS, if the control-plane-operator reports WebIdentityErr errors or NodePool nodes remain not-ready due to a missing default security group, run the OIDC disaster recovery command:
hypershift fix dr-oidc-iam \
--hc-name <cluster-name> \
--hc-namespace <namespace> \
--aws-creds ~/.aws/credentials
This re-uploads the OIDC discovery documents using the existing cluster signing key and recreates the IAM OIDC provider if needed. See the AWS Disaster Recovery documentation for full details.