Version: 2.x (Latest)

Metrics & Monitoring

Authorizer exposes Prometheus-compatible metrics for monitoring authentication activity, API performance, security events, and infrastructure health.

Endpoints

`/healthz` - Liveness Probe

Returns storage health status. Use with Kubernetes liveness probes.

curl http://localhost:8080/healthz
# {"status":"ok"}

Returns 503 with {"status":"unhealthy","error":"storage unavailable"} when the database is unreachable (details are logged server-side only).

`/readyz` - Readiness Probe

Checks storage and memory store health. Use with Kubernetes readiness probes.

curl http://localhost:8080/readyz
# {"status":"ready"}

Returns 503 with {"status":"not ready","error":"storage unavailable"} when the system is not ready to serve traffic (details are logged server-side only).
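Both probes signal failure through the HTTP status code and a small JSON body, so a client can classify them with the same logic. The following is a minimal sketch, not part of Authorizer: `interpret_probe` and `fetch_probe` are hypothetical helper names, and the only response shapes assumed are the ones documented above.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def interpret_probe(status_code: int, body: str) -> Tuple[bool, str]:
    """Return (healthy, detail) for a /healthz or /readyz response."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False, f"non-JSON body with HTTP {status_code}"
    if not isinstance(payload, dict):
        return False, f"unexpected JSON body with HTTP {status_code}"
    status = str(payload.get("status", ""))
    # Documented success bodies: {"status":"ok"} and {"status":"ready"}.
    if status_code == 200 and status in ("ok", "ready"):
        return True, status
    # Documented failure bodies carry an "error" field alongside the status.
    return False, str(payload.get("error") or status or f"HTTP {status_code}")


def fetch_probe(url: str, timeout: float = 2.0) -> Tuple[bool, str]:
    """Call a probe endpoint and classify the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_probe(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises on 503, but the error object still exposes the body.
        return interpret_probe(err.code, err.read().decode("utf-8"))
    except OSError as err:
        return False, f"unreachable: {err}"
```

For example, `fetch_probe("http://localhost:8080/readyz")` would return `(True, "ready")` against a healthy instance on the default port.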

`/metrics` - Prometheus Metrics

Serves all metrics in Prometheus exposition format.

/metrics is never served by the main HTTP server. A separate, minimal HTTP server serves it in parallel with the main Gin app. By default --http-port is 8080 and --metrics-port is 8081; the two ports must differ, or the process exits at startup.

The metrics listener is not reachable from other machines by default: --metrics-host defaults to 127.0.0.1, so only loopback can scrape unless you change it (see below).

curl http://127.0.0.1:8081/metrics
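The response is plain text in the Prometheus exposition format: comment lines start with `#`, and every other line is a metric name, an optional label block, and a value. The sketch below parses that shape for quick ad hoc inspection. It is a simplified reader, not a full implementation of the format (label values are left escaped, and timestamps are ignored), and the sample text uses made-up values purely for illustration.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# name, optional {labels}, value, optional integer timestamp
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> List[Sample]:
    """Parse exposition text into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks plus HELP/TYPE comments
        match = _LINE.match(line)
        if match is None:
            continue  # ignore lines this simplified reader cannot handle
        name, label_block, value = match.groups()
        labels = dict(_LABEL.findall(label_block or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative input only; the numbers are invented.
SAMPLE_TEXT = """\
# TYPE authorizer_http_requests_total counter
authorizer_http_requests_total{method="POST",path="/graphql",status="200"} 12
# TYPE authorizer_active_sessions gauge
authorizer_active_sessions 3
"""
```

Calling `parse_exposition(SAMPLE_TEXT)` yields two samples, one labeled and one unlabeled, which is enough to filter by metric name or label in a script.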

Bind address and security

Single host / node-exporter style: Keep defaults (127.0.0.1 + --metrics-port). Run Prometheus (or an agent) on the same host and scrape 127.0.0.1:8081, or use a reverse proxy that forwards from an internal network to that socket.
Docker / Kubernetes / another machine scrapes the pod: Set --metrics-host=0.0.0.0 (or the pod IP interface you use) so the metrics port accepts connections on the container network. Do not put the metrics port on a public ingress or internet-facing load balancer; use a ClusterIP Service (or internal Docker network) and scrape from inside the cluster only.

For Docker EXPOSE versus -p or Compose ports: mappings, and for Kubernetes pod versus Service exposure, see Docker deployment and Kubernetes deployment.
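The rules in this section can be checked before a deployment goes out. The sketch below is a hypothetical pre-flight lint, not Authorizer's own startup validation: `lint_metrics_listener` is an illustrative name, and the only facts it encodes are that the two ports must differ and that a non-loopback bind address widens who can scrape.

```python
import ipaddress
from typing import List


def lint_metrics_listener(http_port: int, metrics_port: int, metrics_host: str) -> List[str]:
    """Return human-readable findings for a planned listener configuration."""
    findings: List[str] = []
    if http_port == metrics_port:
        findings.append("error: --http-port and --metrics-port must differ")
    try:
        addr = ipaddress.ip_address(metrics_host)
    except ValueError:
        findings.append(f"warning: {metrics_host!r} is not an IP literal; check what it resolves to")
        return findings
    if addr.is_unspecified:
        # 0.0.0.0 or :: accepts connections on every interface.
        findings.append("warning: metrics listener binds all interfaces; keep it off public ingress")
    elif not addr.is_loopback:
        findings.append(f"warning: metrics listener is reachable on {addr}; restrict it to internal networks")
    return findings
```

With the defaults (`8080`, `8081`, `127.0.0.1`) the function returns an empty list; switching the host to `0.0.0.0` produces a single warning rather than an error, since that setting is expected for in-cluster scraping.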

Available Metrics

HTTP Metrics

Metric	Type	Labels	Description
`authorizer_http_requests_total`	Counter	`method`, `path`, `status`	Total HTTP requests received
`authorizer_http_request_duration_seconds`	Histogram	`method`, `path`	HTTP request latency in seconds

For routes that do not match a registered Gin pattern, path is recorded as unmatched (not the raw URL), to keep Prometheus cardinality bounded.
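To see why this matters, consider a scanner probing random URLs: if the raw URL became the label value, each probe would create a new time series. The sketch below illustrates the bounding idea only. The route set and the exact-match lookup are stand-ins for a real router, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

UNMATCHED = "unmatched"

# Stand-in for the router's registered patterns (illustrative subset).
REGISTERED = {"/graphql", "/healthz", "/readyz"}


def matched_pattern(url_path: str) -> Optional[str]:
    """Return the registered pattern for a path, or None when nothing matches."""
    return url_path if url_path in REGISTERED else None


def path_label(pattern: Optional[str]) -> str:
    """Label value for the `path` label: the pattern, or a fixed placeholder."""
    return pattern if pattern else UNMATCHED


def count_requests(paths: Iterable[str]) -> Counter:
    """Tally requests per label value, as a counter metric would."""
    counts: Counter = Counter()
    for url_path in paths:
        counts[path_label(matched_pattern(url_path))] += 1
    return counts
```

However many distinct unknown URLs arrive, the tally never holds more than `len(REGISTERED) + 1` keys, which is the cardinality bound the placeholder provides.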

Authentication Metrics

Metric	Type	Labels	Description
`authorizer_auth_events_total`	Counter	`event`, `status`	Authentication event count
`authorizer_active_sessions`	Gauge	—	Approximate active session count
`authorizer_api_operations_total`	Counter	`protocol`, `operation`, `status`	API operations served, attributed to the protocol they came in on

The protocol label on authorizer_api_operations_total is one of graphql, grpc, or rest (the rest value covers calls made through the grpc-gateway /v1/* surface). operation is the gRPC method / GraphQL operation name and status is ok or error. The same protocol is also recorded on each audit-log entry (under the protocol key of the entry's metadata field), so operations are attributable to their transport in both metrics and the audit trail.

Auth event values:

Event	Description
`login`	User email/password login
`signup`	New user registration
`logout`	User session termination
`forgot_password`	Password reset request
`reset_password`	Password reset completion
`verify_email`	Email verification
`verify_otp`	OTP verification
`magic_link_login`	Magic link authentication
`admin_login`	Admin dashboard login
`admin_logout`	Admin dashboard logout
`oauth_login`	OAuth provider redirect
`oauth_callback`	OAuth provider callback
`token_refresh`	Token refresh
`token_revoke`	Token revocation

Status values: success, failure

Security Metrics

Metric	Type	Labels	Description
`authorizer_security_events_total`	Counter	`event`, `reason`	Security-sensitive events for alerting
`authorizer_client_id_header_missing_total`	Counter	—	Requests with no `X-Authorizer-Client-ID` header (allowed for some routes)

Security event examples:

Event	Reason	Trigger
`invalid_credentials`	`bad_password`	Failed password comparison
`invalid_credentials`	`user_not_found`	Login with non-existent email
`account_revoked`	`login_attempt`	Login to a revoked account
`invalid_admin_secret`	`admin_login`	Failed admin authentication

GraphQL Metrics

Metric	Type	Labels	Description
`authorizer_graphql_errors_total`	Counter	`operation`	GraphQL responses containing errors (HTTP 200 with errors)
`authorizer_graphql_request_duration_seconds`	Histogram	`operation`	GraphQL operation latency
`authorizer_graphql_limit_rejections_total`	Counter	`limit`	Operations rejected for exceeding a configured query limit

The operation label is anonymous for unnamed operations. For named operations it is op_ followed by a short SHA-256 prefix of the operation name, so client-controlled names cannot create unbounded time series.
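If you need to map a known operation name to its label value when building a dashboard, the derivation can be approximated as below. This is a sketch under stated assumptions: the prefix length (`PREFIX_HEX_CHARS`) and the hex encoding are guesses, since this page only says "a short SHA-256 prefix", so compare the result against a real label from `/metrics` before relying on it.

```python
import hashlib

# Assumption: the number of hex characters kept is not documented here.
PREFIX_HEX_CHARS = 12


def operation_label(operation_name: str) -> str:
    """Derive a bounded label value from a client-supplied operation name."""
    if not operation_name:
        return "anonymous"
    digest = hashlib.sha256(operation_name.encode("utf-8")).hexdigest()
    return "op_" + digest[:PREFIX_HEX_CHARS]
```

Because the label is a fixed-length hash rather than the name itself, an arbitrarily long or unusual client-supplied name still produces a short, well-formed label value.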

GraphQL APIs return HTTP 200 even when the response contains errors. These metrics capture those application-level errors that would otherwise be invisible to HTTP-level monitoring.

The limit label on authorizer_graphql_limit_rejections_total is one of:

Value	What was exceeded	Tunable via
`depth`	Selection-set nesting depth	`--graphql-max-depth`
`complexity`	Total complexity score	`--graphql-max-complexity`
`alias`	Total aliased fields per operation	`--graphql-max-aliases`
`body_size`	HTTP request body size	`--graphql-max-body-bytes`

A sustained non-zero rate on any label usually means either an exploration attempt or a legitimate operation that needs the limit raised. Alert at the rate that distinguishes the two for your traffic profile. See GraphQL hardening for the limits themselves.

Authorization (FGA) Metrics

These metrics cover the embedded fine-grained authorization (OpenFGA) engine. They appear only once FGA is enabled and the corresponding operation has run at least once. See Authorization (FGA).

Metric	Type	Labels	Description
`authorizer_fga_checks_total`	Counter	`operation`, `result`	Access decisions from `check_permissions`. The headline metric for adoption and denial/error alerting.
`authorizer_fga_check_duration_seconds`	Histogram	`operation`	Latency of the client-facing FGA engine reads.
`authorizer_fga_operations_total`	Counter	`operation`, `result`	Non-decision FGA operations (model/tuple management, enumeration, reset) by outcome.

authorizer_fga_checks_total labels:

Label	Values
`operation`	`check_permissions` (each supplied pair is counted individually)
`result`	`allowed` · `denied` · `error` (the engine call failed — fail-closed, so the caller was denied)

authorizer_fga_check_duration_seconds operation: check_permissions · list_permissions. The histogram's _count also gives you a call rate per operation for free.

authorizer_fga_operations_total operation: get_model · write_model · read_tuples · write_tuples · delete_tuples · list_users · expand · list_permissions · reset. result: success · error.

Useful queries:

# FGA denial rate (last 5 minutes)
sum(rate(authorizer_fga_checks_total{result="denied"}[5m]))

# FGA check error rate — should be ~0; a spike means the engine/store is failing closed
sum(rate(authorizer_fga_checks_total{result="error"}[5m]))

# Admin authorization changes (model/tuple writes, resets)
sum by (operation) (increase(authorizer_fga_operations_total{operation=~"write_model|write_tuples|delete_tuples|reset"}[1h]))

# p99 check latency
histogram_quantile(0.99, sum by (le, operation) (rate(authorizer_fga_check_duration_seconds_bucket[5m])))

A non-zero result="error" rate on authorizer_fga_checks_total is an operational signal — the engine or its datastore is failing, and checks are denying as a result. Page on it.
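The relationship between the three `result` values and fail-closed behavior can be sketched as follows. This is an illustration of the pattern, not the engine's code: `engine_check` is a hypothetical callable standing in for the real check, and the tally is a plain `Counter` rather than a Prometheus client.

```python
from collections import Counter
from typing import Callable, Tuple

results: Counter = Counter()  # stand-in for authorizer_fga_checks_total by result


def check_fail_closed(engine_check: Callable[[], bool]) -> Tuple[bool, str]:
    """Run one check and return (allowed, result_label)."""
    try:
        allowed = engine_check()
    except Exception:
        # Fail closed: an engine or datastore failure denies the caller,
        # but is recorded separately from an ordinary policy denial.
        results["error"] += 1
        return False, "error"
    label = "allowed" if allowed else "denied"
    results[label] += 1
    return allowed, label
```

Keeping `error` distinct from `denied` is what makes the paging rule above workable: policy denials are normal traffic, while errors indicate an outage that happens to look like denials to callers.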

Infrastructure Metrics

Metric	Type	Labels	Description
`authorizer_db_health_check_total`	Counter	`status`	Database health check outcomes (`healthy`/`unhealthy`)

Prometheus Configuration

Add Authorizer as a scrape target in your prometheus.yml:

scrape_configs:
  - job_name: 'authorizer'
    scrape_interval: 15s
    static_configs:
      # In Docker/K8s, use --metrics-host=0.0.0.0 so the scraper can reach the pod/container; scrape via internal DNS/service.
      - targets: ['authorizer:8081']  # default --metrics-port; same host only: use 127.0.0.1:8081

For Kubernetes with service discovery:

scrape_configs:
  - job_name: 'authorizer'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: authorizer
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: '$1:8081'  # metrics port; ensure deployment sets --metrics-host=0.0.0.0 for in-cluster scrape
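The second relabel rule relies on Prometheus defaults: when `regex` is omitted it is `(.*)`, the match is anchored to the whole value, and `$1` in `replacement` refers to the first capture group. The sketch below mimics that substitution to show what ends up in `__address__`. It uses Python's `re` module rather than RE2, handles only the `$N` reference form, and the function name is hypothetical.

```python
import re
from typing import Optional


def relabel_replace(source_value: str, regex: str = "(.*)", replacement: str = "$1") -> Optional[str]:
    """Mimic a `replace` relabel step for a single source label value.

    Returns None when the regex does not match, meaning the target label
    would be left unchanged.
    """
    match = re.fullmatch(regex, source_value)  # relabel regexes are fully anchored
    if match is None:
        return None
    # Expand $1, $2, ... using the captured groups.
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))) or "", replacement)
```

For a pod IP of `10.1.2.3`, `relabel_replace("10.1.2.3", replacement="$1:8081")` gives `10.1.2.3:8081`, which is the scrape address for the dedicated metrics listener.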

Grafana Dashboard

Suggested Panels

Authentication Overview:

# Login success ratio (last 5 minutes)
# sum() drops the status label so the numerator and denominator match
sum(rate(authorizer_auth_events_total{event="login",status="success"}[5m]))
/ sum(rate(authorizer_auth_events_total{event="login"}[5m]))

# Signup rate
rate(authorizer_auth_events_total{event="signup",status="success"}[5m])

# Active sessions
authorizer_active_sessions
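A success ratio divides successful logins by all login events regardless of status, so the `status` label has to be aggregated away in the denominator. The arithmetic can be sketched on two scrapes of the counter. This is illustrative only: the dictionary shape is an assumption, the numbers in the usage note are invented, and counter resets between scrapes are not handled.

```python
from typing import Dict, Optional, Tuple

# (event, status) -> cumulative counter value at scrape time
Counters = Dict[Tuple[str, str], float]


def login_success_ratio(before: Counters, after: Counters) -> Optional[float]:
    """Fraction of logins that succeeded between two scrapes, or None if idle."""

    def delta(status: str) -> float:
        key = ("login", status)
        return after.get(key, 0.0) - before.get(key, 0.0)

    success = delta("success")
    total = success + delta("failure")  # all statuses, not just success
    if total <= 0:
        return None  # no login traffic in the window
    return success / total
```

With 30 new successes and 10 new failures between scrapes, the function returns `0.75`. Dividing the success series by itself, which is what happens when labels are not aggregated, would always give `1`.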

Security Alerts:

# Failed login rate (alert if > 10/min); rate() is per second, so scale by 60
sum(rate(authorizer_security_events_total{event="invalid_credentials"}[1m])) * 60 > 10

# Failed admin login attempts
increase(authorizer_security_events_total{event="invalid_admin_secret"}[5m])

# Revoked account login attempts
increase(authorizer_security_events_total{event="account_revoked"}[5m])

API Performance:

# GraphQL p99 latency
histogram_quantile(0.99, rate(authorizer_graphql_request_duration_seconds_bucket[5m]))

# HTTP p95 latency by path
histogram_quantile(0.95, sum(rate(authorizer_http_request_duration_seconds_bucket[5m])) by (le, path))

# GraphQL error rate
rate(authorizer_graphql_errors_total[5m])
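The latency panels depend on how `histogram_quantile` estimates a quantile from cumulative buckets: it finds the bucket containing the target rank and interpolates linearly inside it. The sketch below follows that documented behavior for classic histograms in simplified form (no monotonicity repair, no native histograms). The bucket boundaries in the usage note are invented, not Authorizer's actual buckets.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from (upper_bound, cumulative_count) pairs."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan  # a +Inf bucket is required
    total = buckets[-1][1]
    if total <= 0:
        return math.nan  # no observations
    rank = q * total
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[index]
    if index == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0.0  # first bucket is assumed to start at 0
    else:
        lower, below = buckets[index - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return lower
    return lower + (upper - lower) * (rank - below) / in_bucket
```

For buckets `[(0.1, 50), (0.5, 90), (1.0, 99), (inf, 100)]`, the 95th percentile lands in the 0.5 to 1.0 bucket and interpolates to roughly 0.78 seconds. The estimate is only as precise as the bucket layout, so a p99 near the top finite bound deserves some skepticism.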

Infrastructure:

# Database health check failure rate
rate(authorizer_db_health_check_total{status="unhealthy"}[5m])

# Request rate by endpoint
sum(rate(authorizer_http_requests_total[5m])) by (path)

Alerting Rules

Example Prometheus alerting rules:

groups:
  - name: authorizer
    rules:
      - alert: HighLoginFailureRate
        expr: rate(authorizer_security_events_total{event="invalid_credentials"}[5m]) > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High login failure rate detected"
          description: "More than 0.5 failed logins/sec for 2 minutes — possible brute force."

      - alert: DatabaseUnhealthy
        expr: increase(authorizer_db_health_check_total{status="unhealthy"}[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Authorizer database health check failing"

      - alert: HighGraphQLErrorRate
        expr: rate(authorizer_graphql_errors_total[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Elevated GraphQL error rate"
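All of the `*_total` metrics are counters, so their raw values only grow until the process restarts. Alert expressions therefore usually compare the change over a window rather than the value itself. The sketch below shows that computation on a list of ordered samples, including the reset case. It is a simplification: PromQL's `increase()` also extrapolates to the window edges, which this does not attempt.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase across ordered samples, treating any drop as a reset."""
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter restarted from zero, so everything counted
            # since the restart is new.
            increase += current
    return increase
```

A counter stuck at 3 yields `counter_increase([3, 3, 3, 3]) == 0.0`, meaning nothing failed in the window even though the raw value is still above zero.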

Manual Testing

Start Authorizer and verify metrics are working:

# 1. Start Authorizer (dev mode with SQLite)
make dev

# 2. Check health endpoints
curl http://localhost:8080/healthz
curl http://localhost:8080/readyz

# 3. View raw metrics (default: loopback + metrics port)
curl http://127.0.0.1:8081/metrics

# 4. Generate some auth events via GraphQL
curl -X POST http://localhost:8080/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"mutation { login(params: {email: \"test@example.com\", password: \"wrong\"}) { message } }"}'

# 5. Check metrics again — look for auth and security counters
curl -s http://127.0.0.1:8081/metrics | grep authorizer_auth
curl -s http://127.0.0.1:8081/metrics | grep authorizer_security
curl -s http://127.0.0.1:8081/metrics | grep authorizer_graphql

# 6. Run integration tests
TEST_DBS="sqlite" go test -p 1 -v -run "TestMetrics|TestHealth|TestReady|TestAuthEvent|TestAdminLoginMetrics|TestGraphQLError" ./internal/integration_tests/

CLI Flags

Flag	Default	Description
`--metrics-port`	`8081`	Port for the dedicated Prometheus `/metrics` listener (must differ from `--http-port`)
`--metrics-host`	`127.0.0.1`	Bind address for that dedicated listener only (use `0.0.0.0` for in-cluster or cross-container scrape; never expose on the public internet without a proxy and auth)

/healthz, /readyz, and /health stay on the main HTTP port (--host:--http-port). /metrics is only on the dedicated listener (--metrics-host:--metrics-port).
