Self-Management

How ArgoCD maintains and updates itself on the Infra Management Cluster

The ArgoCD instance on the Infra Management Cluster is self-managing: it uses an ArgoCD Application to deploy and update its own configuration. This creates a powerful but carefully controlled GitOps loop.

Self-Referencing Architecture

graph TD
    A["ArgoCD Manifest<br/>(Initial YAML)"] -->|1. Manual Deploy| B["ArgoCD Instance<br/>(Running in K8s)"]
    B -->|2. Create| C["ArgoCD App<br/>(argocd.yaml)"]
    C -->|3. References<br/>source.path| A
    C -->|4. Manages| D["ArgoCD manages<br/>itself + other<br/>applications"]

Bootstrap Process

  1. Initial Manual Deploy: ArgoCD is first deployed by hand using the HA manifest
  2. Application Creation: An ArgoCD Application resource is created that points to the ArgoCD configuration in Git
  3. Self-Reference: The Application’s source.path points to k8s/infra-services/argocd/overlays/infra-platform-cluster
  4. Ongoing Management: From this point, ArgoCD manages its own configuration, including updates

The ArgoCD Application

The self-referencing Application is defined in k8s/infra-services/argocd/overlays/infra-platform-cluster/apps/argocd.yaml:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
  labels:
    cluster: 'infra-platform-mgmt'
    environment: 'prod'
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  syncPolicy:
    automated:
      prune: false      # Manual prune for safety
      selfHeal: false   # Manual heal for stability
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  project: infra-services
  source:
    path: k8s/infra-services/argocd/overlays/infra-platform-cluster
    repoURL: https://github.com/Titanbay/infra-services
    targetRevision: 'main'

Conservative Sync Policy

The ArgoCD application uses deliberately conservative sync settings:

| Setting | Value | Reason |
| --- | --- | --- |
| automated.prune | false | Prevents accidental deletion of resources |
| automated.selfHeal | false | Allows manual intervention for drift |

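This approach prioritises stability over automation for the core infrastructure. Automated sync itself stays enabled, so new commits are still applied without intervention; only pruning of removed resources and correction of live drift wait for a manual sync.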

App-of-Apps Pattern

The ArgoCD Application is also an app-of-apps: it manages not only ArgoCD itself but also all other Applications defined in the apps/ and application-sets/ directories.

What Gets Managed

When ArgoCD syncs itself, it also syncs:

  1. ArgoCD core components (from the base HA manifest)
  2. ArgoCD configuration (ConfigMaps, Secrets, Ingress)
  3. All other Applications defined in apps/
  4. All ApplicationSets in application-sets/
  5. AppProjects in projects/

This means a single sync of the argocd Application can bootstrap the entire infrastructure.
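
A minimal sketch of how the overlay's kustomization.yaml could pull these pieces together. The real file is not shown in this page, so the layout below is an assumption: it presumes that apps/, application-sets/ and projects/ each contain their own kustomization.yaml listing their manifests.

```yaml
# Hypothetical k8s/infra-services/argocd/overlays/infra-platform-cluster/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base          # ArgoCD core components (HA manifest)
  - apps/               # argocd.yaml plus every other Application
  - application-sets/   # ApplicationSets
  - projects/           # AppProjects
```

Because every directory is rendered by the same Kustomize build, one sync of the argocd Application applies all of them together.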

Updating ArgoCD

Upgrading the Version

To upgrade ArgoCD:

  1. Download the new HA manifest from the ArgoCD releases
  2. Add it to k8s/infra-services/argocd/base/ (e.g., argocd-ha-3.3.0.yaml)
  3. Update base/kustomization.yaml to reference the new file
  4. Commit and push to main
  5. ArgoCD will detect the change and mark the Application as OutOfSync
  6. Trigger a sync manually, or wait for the next sync cycle to apply it
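
Step 3 might look like the sketch below. The contents of base/kustomization.yaml are not shown in this page, so this is an assumed shape; the file names reuse the 3.3.0 example above and the 3.2.1 manifest mentioned under Recovery Procedure.

```yaml
# Hypothetical k8s/infra-services/argocd/base/kustomization.yaml after the upgrade
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # - argocd-ha-3.2.1.yaml   # previous manifest, no longer referenced
  - argocd-ha-3.3.0.yaml     # new manifest added in step 2
```

Since automated.prune is false, any resource that exists in the old manifest but not in the new one stays in the cluster until it is pruned by hand.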

Configuration Changes

For configuration updates:

  1. Modify files in the overlay or base
  2. Commit and push to main
  3. ArgoCD auto-syncs the changes (resources removed from Git still need a manual prune, because automated.prune is false)

Patch Files

The overlay uses Kustomize patches for customisation:

| Patch | Purpose |
| --- | --- |
| argo-cd-cm.yaml | ConfigMap settings |
| argocd-cmd-params-cm.yaml | Command parameters |
| argocd-rbac-cm.yaml | RBAC policies |
| argocd-server-resources.yaml | Server resource limits |
| argocd-app-controller-resources.yaml | Controller resources |
| argocd-repo-server-resources.yaml | Repo server resources |
| dex-env-vars.yaml | Dex environment variables |
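
As an illustration of how such patches are usually wired up, here is a sketch of a Kustomize patches entry and one strategic-merge patch. The patch locations and the resource values are assumptions for illustration only; the real files may differ.

```yaml
# Fragment of the overlay kustomization.yaml (assumed: patch files sit next to it)
patches:
  - path: argocd-cmd-params-cm.yaml
  - path: argocd-server-resources.yaml
---
# Hypothetical argocd-server-resources.yaml: a strategic-merge patch that only
# touches the resources of the argocd-server container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
spec:
  template:
    spec:
      containers:
        - name: argocd-server
          resources:
            requests:
              cpu: 250m        # illustrative value
              memory: 256Mi    # illustrative value
            limits:
              memory: 512Mi    # illustrative value
```

Keeping the customisation in patches leaves the upstream HA manifest untouched, which makes version upgrades a file swap rather than a merge.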

Safety Considerations

Cascade Delete Protection

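The Application carries the resources finalizer, which tells ArgoCD to delete every resource the Application manages before the Application itself is removed: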

finalizers:
  - resources-finalizer.argocd.argoproj.io

Warning: Deleting the argocd Application from the cluster will trigger a cascading delete of all managed resources, including ArgoCD itself and all other Applications.

Recovery Procedure

If ArgoCD becomes unavailable:

  1. The HA manifest can be reapplied manually: kubectl apply -f argocd-ha-3.2.1.yaml
  2. The Application will resync from Git once ArgoCD is running
  3. All managed Applications will be restored

Notifications

The ArgoCD Application is configured with Slack notifications:

annotations:
  notifications.argoproj.io/subscribe.on-app-synced.slack: platform-infra-notifications
  notifications.argoproj.io/subscribe.on-app-outofsync.slack: platform-infra-notifications
  notifications.argoproj.io/subscribe.on-app-sync-failed.slack: platform-infra-notifications
  notifications.argoproj.io/subscribe.on-app-degraded.slack: platform-infra-notifications

This ensures the platform team is alerted to any issues with the core infrastructure.
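
Each annotation follows the pattern subscribe.<trigger>.<service>: <channel>, so a trigger and a service with matching names have to exist in the notifications ConfigMap. That configuration is not shown in this page; the sketch below is an assumption of what one trigger could look like, and the secret key name slack-token is hypothetical.

```yaml
# Hypothetical excerpt of argocd-notifications-cm
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  # Slack service; $slack-token is read from argocd-notifications-secret
  service.slack: |
    token: $slack-token
  # Trigger name must match the annotation: subscribe.on-app-sync-failed.slack
  trigger.on-app-sync-failed: |
    - when: app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]
  # Message template referenced by the trigger above
  template.app-sync-failed: |
    message: |
      Application {{.app.metadata.name}} failed to sync.
```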

Managing TB Platform ArgoCD Instances

The hub ArgoCD also manages the ArgoCD installations on tb-platform clusters via Helm. These are defined in k8s/infra-services/argocd/tb-platform/:

tb-platform/
├── base/
│   └── argocd-helm.yaml      # Helm chart Application
└── overlays/
    ├── tb-platform-dev/      # Dev domain patch
    ├── tb-platform-qa/       # QA domain patch
    └── tb-platform-prod/     # Prod domain patch

The hub creates Applications that deploy the ArgoCD Helm chart to each tb-platform cluster, with environment-specific domain configurations:

| Environment | ArgoCD Domain |
| --- | --- |
| Dev | argocd-dev.nessie-chimera.ts.net |
| QA | argocd-qa.nessie-chimera.ts.net |
| Prod | argocd-prod.nessie-chimera.ts.net |
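
A rough sketch of what the rendered Dev Application might look like once the overlay patch is applied. The contents of argocd-helm.yaml are not shown in this page, so the Application name, destination cluster name, project, chart version and the use of the chart's global.domain value are all assumptions.

```yaml
# Hypothetical result of base/argocd-helm.yaml plus the tb-platform-dev overlay
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd-tb-platform-dev          # hypothetical name
  namespace: argocd
spec:
  project: infra-services               # assumed: same project as the hub Application
  destination:
    name: tb-platform-dev               # assumed cluster name registered in the hub
    namespace: argocd
  source:
    repoURL: https://argoproj.github.io/argo-helm
    chart: argo-cd
    targetRevision: "<chart-version>"   # placeholder, pin to the version in use
    helm:
      values: |
        global:
          domain: argocd-dev.nessie-chimera.ts.net   # value set by the Dev overlay
```

Under this layout only the domain differs between environments, so the QA and Prod overlays would change that single value.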