Observability

Content

Understanding how to do cluster monitoring:

Built-in Monitoring Stack

  • Cluster Monitoring: Leverage OpenShift’s integrated Prometheus and Alertmanager

  • Web Console: Use built-in monitoring dashboards and metrics views

  • Cluster Monitoring Operator: Manage the monitoring stack configuration

  • User Workload Monitoring: Enable monitoring for user applications (optional configuration)

Metrics Collection

  • Platform Metrics: Monitor control plane, nodes, and OpenShift components automatically

  • Node Metrics: Collect system-level metrics via built-in node-exporter

  • Application Metrics: Expose application metrics via /metrics endpoints for Prometheus scraping

  • Custom Resources: Monitor custom resource metrics through ServiceMonitor objects

Alerting and Notifications

  • Default Alert Rules: Use pre-configured alerts for cluster health and performance

  • Custom Alerts: Create PrometheusRule objects for application-specific alerts

  • Alertmanager Configuration: Configure notification channels (email, webhooks, etc.)

  • Alert Routing: Set up alert routing and grouping policies

Log Management (Basic)

  • Container Logs: Access pod and container logs via oc logs and Console

  • Event Logs: Monitor Kubernetes events for troubleshooting

  • Audit Logs: Configure API server audit logging (basic level)

  • Journal Logs: Access systemd journal logs on cluster nodes

Health Checks and Probes

  • Liveness Probes: Configure application health checks for automatic restarts

  • Readiness Probes: Ensure pods are ready to receive traffic

  • Startup Probes: Handle slow-starting applications appropriately

  • Cluster Health: Monitor overall cluster component health

Performance Monitoring

  • Resource Utilization: Track CPU, memory, storage, and network usage

  • Capacity Planning: Monitor resource trends and utilization patterns

  • SLI Monitoring: Track basic Service Level Indicators

  • Quota Monitoring: Monitor namespace resource quotas and limits

References

Knowledge Check