Monitoring Stack
Grafana, Loki, Alloy, and Tempo for observability
The observability stack on the Infra Management Cluster consists of Grafana for dashboards, Loki for logs, Alloy for telemetry collection, and Tempo for distributed tracing.
ArgoCD Resources
| Application | Namespace | Source Type | Chart/Path |
|---|---|---|---|
| grafana | monitoring | Kustomize | k8s/infra-services/grafana/base |
| grafana-loki | grafana-loki | Helm | loki (grafana.github.io) |
| grafana-alloy-hub | grafana-alloy | Helm | alloy (grafana.github.io) |
| grafana-tempo | grafana-tempo | Helm | tempo-distributed (grafana.github.io) |
File Paths
| Application | File |
|---|---|
| grafana | apps/monitoring-services.yaml |
| grafana-loki | apps/grafana-loki.yaml |
| grafana-alloy-hub | apps/grafana-alloy.yaml |
| grafana-tempo | apps/grafana-tempo.yaml |
Grafana
Dashboards and visualisation platform.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grafana
spec:
  syncPolicy:
    automated: {}
  destination:
    namespace: monitoring
    server: https://kubernetes.default.svc
  project: infra-services
  source:
    path: k8s/infra-services/grafana/base
    repoURL: https://github.com/Titanbay/infra-services
    targetRevision: 'main'
Source Structure:
k8s/infra-services/grafana/
└── base/
    └── ... (Grafana manifests)
Grafana Loki
Log aggregation system deployed in distributed mode.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grafana-loki
spec:
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - ServerSideApply=true
  destination:
    namespace: grafana-loki
    server: https://kubernetes.default.svc
  project: infra-services
  source:
    chart: loki
    repoURL: https://grafana.github.io/helm-charts
    targetRevision: 6.31.0
    helm:
      valuesObject:
        deploymentMode: Distributed
        loki:
          auth_enabled: false
          storage:
            type: gcs
            bucketNames:
              chunks: tb-grafana-loki
              ruler: tb-grafana-loki
              admin: tb-grafana-loki
        serviceAccount:
          create: false
          name: grafana-loki
Key Configuration:
- Deployment mode: Distributed (for production scale)
- Storage: GCS bucket tb-grafana-loki
- Auth: Disabled (internal use)
- Tracing: Enabled with OTEL export to Alloy
Grafana Alloy
Telemetry collection agent (successor to Grafana Agent).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grafana-alloy-hub
spec:
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - ServerSideApply=true
  destination:
    namespace: grafana-alloy
    server: https://kubernetes.default.svc
  project: infra-services
  source:
    chart: alloy
    repoURL: https://grafana.github.io/helm-charts
    targetRevision: 1.1.2
    helm:
      valuesObject:
        fullnameOverride: grafana-alloy
        alloy:
          configMap:
            create: false
            name: grafana-alloy
            key: config.alloy
          clustering:
            enabled: true
            name: "grafana-alloy-hub"
          extraPorts:
            - name: "otel-grpc"
              port: 4317
            - name: "otel-http"
              port: 4318
        controller:
          type: 'deployment'
          replicas: 2
        serviceAccount:
          create: false
          name: grafana-alloy
Key Configuration:
- Clustering enabled for HA
- OTEL ports exposed (4317 gRPC, 4318 HTTP)
- External ConfigMap for configuration
- 2 replicas with topology spread
Grafana Tempo
Distributed tracing backend.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grafana-tempo
spec:
  syncPolicy:
    automated: {}
  destination:
    namespace: grafana-tempo
    server: https://kubernetes.default.svc
  project: infra-services
  source:
    chart: tempo-distributed
    repoURL: https://grafana.github.io/helm-charts
    targetRevision: 1.46.0
    helm:
      valuesObject:
        fullnameOverride: 'grafana-tempo'
        serviceAccount:
          create: false
          name: tempo-gcs
        ingester:
          replicas: 2
        metricsGenerator:
          enabled: true
        distributor:
          replicas: 1
        compactor:
          replicas: 2
        querier:
          replicas: 2
Key Configuration:
- Distributed deployment mode
- Metrics generator enabled
- GCS storage via Workload Identity
- Two replicas each for the ingester, compactor, and querier for HA (the distributor runs a single replica)
Source Structure
k8s/infra-services/
├── grafana/
│   └── base/            # Grafana Kustomize manifests
├── grafana-alloy/       # Alloy ConfigMaps and resources
│   └── ...
└── loki/                # Additional Loki resources
    └── ...
How to Update
Upgrading Helm Chart Versions
- Update targetRevision in the Application YAML
- Review the chart's changelog for breaking changes
- Update valuesObject if needed
- Commit and push to main
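As a rough sketch of the first step, assuming the Loki Application in apps/grafana-loki.yaml is the one being upgraded, only source.targetRevision needs to change. The new version number shown is a placeholder for illustration, not a recommended release:

```yaml
# apps/grafana-loki.yaml (excerpt; unrelated fields omitted)
spec:
  source:
    chart: loki
    repoURL: https://grafana.github.io/helm-charts
    # Previously 6.31.0. The value below is a hypothetical placeholder;
    # use the chart version whose changelog you have actually reviewed.
    targetRevision: 6.32.0
```

Since the Application has automated sync enabled, ArgoCD should pick up the new chart version once the commit reaches main.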
Modifying Configuration
For Helm-based applications:
- Edit the valuesObject in the Application YAML
- Commit and push to main
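A minimal sketch of such an edit, assuming the goal is to run a second Tempo distributor. The change itself is illustrative only; the key path matches the grafana-tempo Application shown earlier:

```yaml
# apps/grafana-tempo.yaml (excerpt; unrelated fields omitted)
spec:
  source:
    helm:
      valuesObject:
        distributor:
          # Illustrative change: the manifest shown earlier sets this to 1.
          replicas: 2
```

Keys that are not touched keep their existing values, so only the changed lines need to appear in the diff.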
For Grafana (Kustomize):
- Edit manifests in k8s/infra-services/grafana/base/
- Commit and push to main
Alloy Configuration
Alloy uses an external ConfigMap for its configuration:
- Edit k8s/infra-services/grafana-alloy/resources
- The ConfigMap grafana-alloy contains the Alloy config
- Commit and push to main
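For orientation, the ConfigMap might look roughly like the sketch below. The ConfigMap name, namespace, key, and OTLP ports come from the Alloy Application above; the pipeline body and both backend URLs are assumptions made for illustration and are not taken from the real configuration. The metrics path to GCP Monitoring is left out.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-alloy          # matches alloy.configMap.name
  namespace: grafana-alloy
data:
  # Key matches alloy.configMap.key in the Application values.
  config.alloy: |
    // Receive OTLP on the ports exposed via extraPorts.
    otelcol.receiver.otlp "default" {
      grpc {
        endpoint = "0.0.0.0:4317"
      }
      http {
        endpoint = "0.0.0.0:4318"
      }
      output {
        logs   = [otelcol.exporter.loki.default.input]
        traces = [otelcol.exporter.otlp.tempo.input]
      }
    }

    // Convert OTLP logs to Loki entries and push them to Loki.
    otelcol.exporter.loki "default" {
      forward_to = [loki.write.default.receiver]
    }

    loki.write "default" {
      endpoint {
        // Assumed service URL; check the actual Loki gateway service name.
        url = "http://grafana-loki-gateway.grafana-loki.svc/loki/api/v1/push"
      }
    }

    // Forward traces to Tempo over OTLP gRPC.
    otelcol.exporter.otlp "tempo" {
      client {
        // Assumed service name and port; requires OTLP gRPC to be enabled in Tempo.
        endpoint = "grafana-tempo-distributor.grafana-tempo.svc:4317"
        tls {
          insecure = true
        }
      }
    }
```

Because the chart is configured with configMap.create: false, the Alloy pods read whatever this externally managed ConfigMap contains, so a pipeline change is a change to this file rather than to the Application YAML.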
Integration
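Applications send logs, traces, and metrics to Alloy. Alloy forwards logs to Loki, traces to Tempo, and metrics to GCP Monitoring, and Grafana reads from all three: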
graph LR
A[Applications] -->|logs| B[Alloy]
A -->|traces| B
A -->|metrics| B
B -->|logs| C[Loki]
B -->|traces| D[Tempo]
B -->|metrics| E[GCP Monitoring]
C --> F[Grafana]
D --> F
E --> F
Related Resources
| Resource | Purpose |
|---|---|
| grafana-loki ServiceAccount | Workload Identity for GCS access |
| tempo-gcs ServiceAccount | Workload Identity for GCS access |
| grafana-alloy ConfigMap | Alloy pipeline configuration |
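
The ServiceAccounts listed here are created outside the Helm charts (the Applications set serviceAccount.create: false). On GKE, Workload Identity is usually wired up through an annotation on the Kubernetes ServiceAccount. A minimal sketch for tempo-gcs follows, assuming a GKE cluster; the Google service account name and PROJECT_ID are hypothetical placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tempo-gcs              # referenced by the grafana-tempo Application
  namespace: grafana-tempo
  annotations:
    # Hypothetical Google service account; replace with the real identity
    # that has access to the Tempo GCS bucket.
    iam.gke.io/gcp-service-account: tempo-gcs@PROJECT_ID.iam.gserviceaccount.com
```

The grafana-loki ServiceAccount would presumably follow the same pattern in the grafana-loki namespace, bound to an identity with access to the tb-grafana-loki bucket.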