Telemetry & Observability

Dahlke PressCenter integrates OpenTelemetry for metrics collection, providing visibility into both machine status (PLC connection, messages, symbols) and application health (Quartz jobs, runtime metrics).

Metrics are exported via OTLP/HTTP to a local collector and visualized in Grafana, accessible over Tailscale.

Architecture

PressCenter ──OTLP/HTTP──> OTEL Collector ──> Prometheus (metrics)
                                            ──> Loki (logs)
                                                      │
                                                Grafana (:3000)

The Dahlke.Lib.Telemetry project owns all OpenTelemetry setup: meters, instrument definitions, and OTLP exporter configuration. Other projects reference it and record measurements through its instrument classes, which they receive via constructor injection.

Important

PressCenter is an Avalonia desktop app and does not run a .NET Generic Host (IHost), the host used by ASP.NET Core and worker services. The telemetry setup therefore uses OpenTelemetrySdk.Create() (introduced in SDK 1.10.0), which is the official API for non-hosted applications. This starts the MeterProvider and its periodic exporting reader immediately, without requiring a host to call StartAsync.
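A minimal sketch of what such a host-less setup can look like. The actual TelemetrySetup.cs is not reproduced in this document, so the structure below is an assumption; it uses the four meter names from Project Structure and the default endpoint and interval from Configuration, and it needs the packages listed under NuGet Packages.

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Exporter;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;

// Create() builds and starts the providers right away; no IHost is involved.
OpenTelemetrySdk sdk = OpenTelemetrySdk.Create(builder => builder
    .ConfigureResource(resource => resource.AddService("Dahlke.PressCenter"))
    .WithMetrics(metrics => metrics
        // Only meters that are named here are collected and exported.
        .AddMeter("Dahlke.Plc", "Dahlke.Jobs", "Dahlke.App", "Dahlke.Statistics")
        .AddRuntimeInstrumentation()
        .AddProcessInstrumentation()
        .AddOtlpExporter((exporter, reader) =>
        {
            exporter.Protocol = OtlpExportProtocol.HttpProtobuf;
            // Full signal URL: a programmatically set endpoint is used as-is.
            exporter.Endpoint = new Uri("http://localhost:4318/v1/metrics");
            reader.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds = 15000;
        })));

// ... application runs ...

// Disposing the SDK flushes pending metrics and stops the periodic reader.
sdk.Dispose();
```

Keeping the returned OpenTelemetrySdk instance alive for the lifetime of the application matters here: once it is disposed, export stops.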

Project Structure

Dahlke.Lib.Telemetry/
├── Dahlke.Lib.Telemetry.csproj
├── TelemetrySetup.cs               # OpenTelemetrySdk.Create() + DI registration
├── Instruments/
│   ├── PlcInstruments.cs           # Meter "Dahlke.Plc"
│   ├── JobInstruments.cs           # Meter "Dahlke.Jobs"
│   ├── AppInstruments.cs           # Meter "Dahlke.App"
│   └── StatisticsInstruments.cs    # Meter "Dahlke.Statistics"
└── Settings/
    └── TelemetrySettings.cs        # IOptions<TelemetrySettings>

Configuration

Telemetry is configured in appsettings.json:

{
  "Telemetry": {
    "Enabled": true,
    "ServiceName": "Dahlke.PressCenter",
    "OtlpEndpoint": "http://localhost:4318",
    "ExportIntervalMs": 15000
  }
}

| Key | Default | Description |
|---|---|---|
| Enabled | true | Master switch. When false, no OTel SDK is registered. |
| ServiceName | Dahlke.PressCenter | Service name in exported metrics. |
| OtlpEndpoint | http://localhost:4318 | OTLP/HTTP receiver base URL. The code appends /v1/metrics automatically (see note below). |
| ExportIntervalMs | 15000 | Periodic export interval in milliseconds. |
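To make the defaults concrete, here is a small sketch of a settings class whose property initializers mirror the documented defaults. TelemetrySettingsSketch is a hypothetical stand-in for TelemetrySettings, and System.Text.Json is used only so the example runs without extra packages; the application itself binds the section through IOptions<TelemetrySettings>.

```csharp
using System;
using System.Text.Json;

// Only "Enabled" is overridden; the remaining keys fall back to the property defaults.
const string json = "{ \"Telemetry\": { \"Enabled\": false } }";

using JsonDocument doc = JsonDocument.Parse(json);
TelemetrySettingsSketch settings =
    doc.RootElement.GetProperty("Telemetry").Deserialize<TelemetrySettingsSketch>()
    ?? new TelemetrySettingsSketch();

Console.WriteLine($"{settings.Enabled} {settings.OtlpEndpoint} {settings.ExportIntervalMs}");

// Hypothetical stand-in for TelemetrySettings; defaults mirror the documented ones.
public sealed class TelemetrySettingsSketch
{
    public bool Enabled { get; set; } = true;
    public string ServiceName { get; set; } = "Dahlke.PressCenter";
    public string OtlpEndpoint { get; set; } = "http://localhost:4318";
    public int ExportIntervalMs { get; set; } = 15000;
}
```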

Note

When setting exporterOptions.Endpoint programmatically in the .NET OTLP SDK, the URL is used as-is — the SDK does not auto-append /v1/metrics (unlike the OTEL_EXPORTER_OTLP_ENDPOINT environment variable, which does). TelemetrySetup.cs therefore appends /v1/metrics to the configured OtlpEndpoint value. Keep OtlpEndpoint set to the base URL (e.g. http://localhost:4318).
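The appending step described in this note could be sketched as follows. BuildMetricsEndpoint is a hypothetical helper name, not necessarily what TelemetrySetup.cs uses, and the guard against an already-complete URL is an added assumption for robustness.

```csharp
using System;

Console.WriteLine(BuildMetricsEndpoint("http://localhost:4318"));

// Turns the configured base URL into the full OTLP/HTTP metrics URL.
static Uri BuildMetricsEndpoint(string baseUrl)
{
    // Trim a trailing slash so "http://host:4318/" and "http://host:4318" behave the same.
    string trimmed = baseUrl.TrimEnd('/');

    // Do not append the signal path twice if the value already carries it.
    if (!trimmed.EndsWith("/v1/metrics", StringComparison.OrdinalIgnoreCase))
    {
        trimmed += "/v1/metrics";
    }

    return new Uri(trimmed, UriKind.Absolute);
}
```

The resulting Uri is what would be assigned to exporterOptions.Endpoint.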

Custom Instruments

Four custom meters expose domain-specific metrics.

Meter: Dahlke.Plc

PLC connection health, message flow, and operation latencies.

| Instrument | Type | Unit | Description |
|---|---|---|---|
| plc.connection.state | ObservableGauge<int> | | Current PLC state (0=INVALID, 1=RUN, 2=STOP) |
| plc.connection.reconnects | Counter<long> | reconnects | Cumulative reconnect count |
| plc.communication_lost | Counter<long> | events | CommunicationLost events fired |
| plc.messages.received | Counter<long> | messages | Total PLC messages received |
| plc.messages.active | ObservableGauge<int> | messages | Currently active PLC messages |
| plc.symbols.count | ObservableGauge<int> | symbols | Number of loaded PLC symbols |
| plc.rpc.duration | Histogram<double> | ms | RPC method call duration |
| plc.write.duration | Histogram<double> | ms | Symbol write duration |
| plc.heartbeat.latency | Histogram<double> | ms | Heartbeat TryReadState latency |
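As an illustration of how three of these instrument types can be declared with System.Diagnostics.Metrics, here is a hedged sketch. PlcInstrumentsSketch, SetState, and the property names are hypothetical; only the meter name, instrument names, units, and descriptions come from the table. A MeterListener stands in for the OpenTelemetry SDK so the example runs without extra packages.

```csharp
using System;
using System.Diagnostics.Metrics;
using System.Threading;

using var plc = new PlcInstrumentsSketch();
long reconnects = 0;
int observedState = -1;

// MeterListener plays the role of the SDK: it subscribes to the meter and receives measurements.
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Dahlke.Plc") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((inst, value, tags, state) => reconnects += value);
listener.SetMeasurementEventCallback<int>((inst, value, tags, state) => observedState = value);
listener.Start();

plc.Reconnects.Add(1);                    // push-style: recorded at the call site
plc.SetState(1);                          // pull-style: stored now, read at collection time
listener.RecordObservableInstruments();   // triggers the gauge callback

Console.WriteLine($"reconnects={reconnects}, state={observedState}");

// Hypothetical shape of an instrument class; the real PlcInstruments may differ.
public sealed class PlcInstrumentsSketch : IDisposable
{
    private readonly Meter _meter = new("Dahlke.Plc");
    private int _state; // 0=INVALID, 1=RUN, 2=STOP

    public Counter<long> Reconnects { get; }
    public Histogram<double> RpcDuration { get; }

    public PlcInstrumentsSketch()
    {
        Reconnects = _meter.CreateCounter<long>(
            "plc.connection.reconnects", unit: "reconnects", description: "Cumulative reconnect count");
        RpcDuration = _meter.CreateHistogram<double>(
            "plc.rpc.duration", unit: "ms", description: "RPC method call duration");
        // The callback is invoked by the collector, not by the PLC code.
        _meter.CreateObservableGauge<int>(
            "plc.connection.state", () => Volatile.Read(ref _state), description: "Current PLC state");
    }

    public void SetState(int state) => Volatile.Write(ref _state, state);

    public void Dispose() => _meter.Dispose();
}
```

Counters and histograms are recorded where the event happens, while observable gauges are read on each export cycle, which is why the gauge only needs the latest value to be stored.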

Meter: Dahlke.Jobs

Quartz job execution metrics. All instruments are tagged with job.name.

| Instrument | Type | Unit | Description |
|---|---|---|---|
| job.execution.duration | Histogram<double> | ms | Per-job execution time |
| job.execution.count | Counter<long> | executions | Executions per job |
| job.execution.errors | Counter<long> | errors | Failed executions |
| job.messages.polled | Counter<long> | messages | Messages polled by MessagePollingJob |

Meter: Dahlke.App

Application-level health metrics.

| Instrument | Type | Unit | Description |
|---|---|---|---|
| app.uptime | ObservableGauge<double> | seconds | Time since application start |
| app.errors.unhandled | Counter<long> | errors | Unhandled exception count |

Meter: Dahlke.Statistics

Production statistics and power monitoring metrics. Power data is sourced from PowerMonitoringPollingJob; sheet count from TotalSheetCounter.

| Instrument | Type | Unit | Description |
|---|---|---|---|
| production.sheets.total | ObservableGauge<long> | sheets | Cumulative total sheet count |
| power.active | ObservableGauge<double> | W | Total active power |
| power.energy | ObservableGauge<double> | Wh | Total active energy consumed |
| power.frequency | ObservableGauge<double> | Hz | Grid frequency |
| power.factor | ObservableGauge<double> | | Total power factor |
| power.phase.voltage | ObservableGauge<double> | V | Per-phase voltage (tagged phase=L1/L2/L3) |
| power.phase.current | ObservableGauge<double> | A | Per-phase current (tagged phase=L1/L2/L3) |
| production.sheets.good | ObservableGauge<long> | sheets | Good sheets produced per counter group (tagged counter=shift/job/stack) |
| production.sheets.waste | ObservableGauge<long> | sheets | Waste sheets per counter group (tagged counter=shift/job/stack) |
| production.sheets.target | ObservableGauge<long> | sheets | Target sheet count per counter group (tagged counter=shift/job/stack) |
| production.speed | ObservableGauge<int> | sheets/h | Current production speed |
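The tagged gauges can be served by a single callback that returns one measurement per tag value. The sketch below shows this for power.phase.voltage; the dictionary, the lock, and the sample readings are illustrative assumptions and do not reflect how StatisticsInstruments stores its data.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

using var meter = new Meter("Dahlke.Statistics");

// Latest readings as a polling job might store them (values are made up).
var voltages = new Dictionary<string, double> { ["L1"] = 230.1, ["L2"] = 229.8, ["L3"] = 231.0 };
var gate = new object();

// One callback reports all three phases; each measurement carries its own "phase" tag.
Func<IEnumerable<Measurement<double>>> observeVoltages = () =>
{
    lock (gate)
    {
        var result = new List<Measurement<double>>();
        foreach (var (phase, volts) in voltages)
        {
            result.Add(new Measurement<double>(volts, new KeyValuePair<string, object?>("phase", phase)));
        }
        return result;
    }
};

meter.CreateObservableGauge<double>(
    "power.phase.voltage", observeVoltages, unit: "V", description: "Per-phase voltage");

// A MeterListener stands in for the SDK to show what one collection cycle yields.
var seenPhases = new List<string>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Dahlke.Statistics") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<double>((inst, value, tags, state) =>
{
    seenPhases.Add((string)tags[0].Value!);
});
listener.Start();
listener.RecordObservableInstruments();

Console.WriteLine(string.Join(",", seenPhases));
```

In Prometheus each tag value becomes its own time series, so the three phases can be plotted or alerted on separately.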

Built-in Instrumentation

Additionally, OpenTelemetry.Instrumentation.Process and OpenTelemetry.Instrumentation.Runtime provide:

  • CPU time, memory usage, thread count (process)

  • GC collections, heap size, thread pool queue length (runtime)

Integration Points

App.axaml.cs

Telemetry is registered during startup. AddDahlkeTelemetry creates the OpenTelemetrySdk eagerly — the MeterProvider and its periodic reader start exporting immediately:

services.AddDahlkeTelemetry(config);

The OpenTelemetrySdk instance is registered as a singleton. On shutdown, App.This_ShutdownRequested disposes it to flush any remaining metrics:

var otelSdk = Ioc.Default.GetService<OpenTelemetry.OpenTelemetrySdk>();
otelSdk?.Dispose();

The AppDomain.CurrentDomain.UnhandledException handler records unhandled errors via AppInstruments.
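A hedged sketch of how such a handler and the two Dahlke.App instruments might fit together. AppInstrumentsSketch and RecordUnhandled are hypothetical names; the real AppInstruments class is not shown in this document.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

using var app = new AppInstrumentsSketch();

// Subscribe once at startup. The handler only counts the event; it does not keep the process alive.
AppDomain.CurrentDomain.UnhandledException += (_, _) => app.RecordUnhandled();

// Hypothetical shape of the application-level instruments.
public sealed class AppInstrumentsSketch : IDisposable
{
    private readonly Meter _meter = new("Dahlke.App");
    private readonly Stopwatch _uptime = Stopwatch.StartNew();
    private readonly Counter<long> _unhandled;

    public AppInstrumentsSketch()
    {
        _unhandled = _meter.CreateCounter<long>(
            "app.errors.unhandled", unit: "errors", description: "Unhandled exception count");
        // Uptime is computed on demand whenever the gauge is collected.
        _meter.CreateObservableGauge<double>(
            "app.uptime", () => _uptime.Elapsed.TotalSeconds, unit: "seconds",
            description: "Time since application start");
    }

    public void RecordUnhandled() => _unhandled.Add(1);

    public void Dispose() => _meter.Dispose();
}
```

Because the process usually terminates after an unhandled exception, the final increment only reaches the collector if the SDK is flushed or disposed before exit.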

Warning

Do not use services.AddOpenTelemetry() from the OpenTelemetry.Extensions.Hosting package. That API registers an IHostedService that is only started by IHost — Avalonia has no host, so the MeterProvider would never activate. Use OpenTelemetrySdk.Create() from the base OpenTelemetry package instead.

AdsPlcCommunication

Accepts an optional PlcInstruments via constructor injection. Records:

  • Symbol count after Init()

  • Heartbeat latency and connection state

  • Reconnect and communication-lost events

  • Message received counts

  • RPC and write operation durations

OfflineAdsPlcCommunication

Same injection pattern, so dashboards work in simulation mode. Reports state as RUN and records simulated messages.

Quartz Jobs

Each job accepts an optional JobInstruments. Execution is wrapped with a Stopwatch to record duration, count, and errors:

  • MessagePollingJob — also records job.messages.polled

  • PowerMonitoringPollingJob — also updates StatisticsInstruments power metrics

  • MetalcolorBackgroundJob
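The Stopwatch wrapping with optional instruments could look roughly like the sketch below. JobRunner and JobInstrumentsSketch are hypothetical names, a plain Action stands in for a Quartz job body, and counting failed runs in job.execution.count as well is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

using var instruments = new JobInstrumentsSketch();

JobRunner.Run("MessagePollingJob", instruments, () => { /* job body */ });
JobRunner.Run("MessagePollingJob", null, () => { /* telemetry disabled: nothing is recorded */ });

public static class JobRunner
{
    public static void Run(string jobName, JobInstrumentsSketch? instruments, Action body)
    {
        // Every measurement carries the job.name tag.
        var tag = new KeyValuePair<string, object?>("job.name", jobName);
        var stopwatch = Stopwatch.StartNew();
        try
        {
            body();
        }
        catch
        {
            instruments?.Errors.Add(1, tag);
            throw; // the scheduler still sees the failure
        }
        finally
        {
            stopwatch.Stop();
            instruments?.Count.Add(1, tag);
            instruments?.Duration.Record(stopwatch.Elapsed.TotalMilliseconds, tag);
        }
    }
}

// Hypothetical shape of the job instruments.
public sealed class JobInstrumentsSketch : IDisposable
{
    private readonly Meter _meter = new("Dahlke.Jobs");

    public Histogram<double> Duration { get; }
    public Counter<long> Count { get; }
    public Counter<long> Errors { get; }

    public JobInstrumentsSketch()
    {
        Duration = _meter.CreateHistogram<double>("job.execution.duration", unit: "ms", description: "Per-job execution time");
        Count = _meter.CreateCounter<long>("job.execution.count", unit: "executions", description: "Executions per job");
        Errors = _meter.CreateCounter<long>("job.execution.errors", unit: "errors", description: "Failed executions");
    }

    public void Dispose() => _meter.Dispose();
}
```

The null-conditional calls are what let the same job code run unchanged when telemetry is disabled and no instruments are injected.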

TotalSheetCounter

Accepts an optional StatisticsInstruments via constructor. Updates production.sheets.total whenever the cumulative sheet count changes.

Docker Compose Stack

An observability stack is provided at docker/observability/, ready for deployment when Docker is available on the target machine.

cd docker/observability
cp .env.example .env        # edit .env to set GF_SECURITY_ADMIN_PASSWORD
docker compose up -d

The Grafana admin password is read from the .env file (not committed to git). Copy .env.example and set a password before starting the stack.

| Service | Image | Port | Purpose |
|---|---|---|---|
| otel-collector | otel/opentelemetry-collector-contrib | 4318 | Receive and route telemetry |
| prometheus | prom/prometheus | 9090 | Metrics storage |
| loki | grafana/loki | 3100 | Log aggregation |
| grafana | grafana/grafana | 3000 | Dashboards (accessible via Tailscale) |

The stack includes:

  • otel-collector-config.yaml — OTLP receiver, batch processor, Prometheus + Loki exporters

  • prometheus.yml — scrapes otel-collector every 15 seconds

  • grafana/provisioning/datasources/datasources.yaml — auto-provisions Prometheus and Loki datasources

Native Installation (Windows)

When Docker is not available, each component can be installed natively on Windows. This section walks through downloading, configuring, and running the full observability stack as Windows services.

Services are managed with WinSW — a single-binary, XML-configured service wrapper (MIT license). Download WinSW-net461.exe from the WinSW releases page and copy it into each service directory, renaming it to match the service ID (e.g. OtelCollector.exe). Place a matching <service-id>.xml file next to it.

Directory Layout

Create a base directory for the stack:

mkdir C:\Observability
mkdir C:\Observability\otel-collector
mkdir C:\Observability\prometheus
mkdir C:\Observability\prometheus\data
mkdir C:\Observability\loki
mkdir C:\Observability\loki\data
mkdir C:\Observability\grafana

Step 1: OpenTelemetry Collector

Download the latest release from the OpenTelemetry Collector Contrib releases page. Choose the otelcol-contrib_<version>_windows_amd64.tar.gz asset.

Extract otelcol-contrib.exe to C:\Observability\otel-collector\.

Create the configuration file at C:\Observability\otel-collector\config.yaml:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    resource_to_telemetry_conversion:
      enabled: true

  otlphttp/loki:
    endpoint: http://localhost:3100/otlp

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]

Note

Compared to the Docker Compose config, http://loki:3100 becomes http://localhost:3100 since all services run on the same machine.

Test it manually first:

C:\Observability\otel-collector\otelcol-contrib.exe --config C:\Observability\otel-collector\config.yaml

Register as a Windows service with WinSW. Copy WinSW-net461.exe to C:\Observability\otel-collector\OtelCollector.exe and create C:\Observability\otel-collector\OtelCollector.xml:

<service>
  <id>OtelCollector</id>
  <name>OpenTelemetry Collector</name>
  <description>Receives and routes telemetry data</description>
  <executable>%BASE%\otelcol-contrib.exe</executable>
  <arguments>--config "%BASE%\config.yaml"</arguments>
  <startmode>Automatic</startmode>
  <log mode="roll"/>
  <onfailure action="restart" delay="10 sec"/>
  <onfailure action="restart" delay="30 sec"/>
  <workingdirectory>%BASE%</workingdirectory>
</service>

Install and start the service:

C:\Observability\otel-collector\OtelCollector.exe install
C:\Observability\otel-collector\OtelCollector.exe start

Step 2: Prometheus

Download the latest release from the Prometheus downloads page. Choose the prometheus-<version>.windows-amd64.zip asset.

Extract prometheus.exe to C:\Observability\prometheus\.

Create the configuration file at C:\Observability\prometheus\prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "otel-collector"
    static_configs:
      - targets: ["localhost:8889"]

Test it manually:

C:\Observability\prometheus\prometheus.exe --config.file=C:\Observability\prometheus\prometheus.yml --storage.tsdb.path=C:\Observability\prometheus\data

Register as a Windows service with WinSW. Copy WinSW-net461.exe to C:\Observability\prometheus\Prometheus.exe and create C:\Observability\prometheus\Prometheus.xml:

<service>
  <id>Prometheus</id>
  <name>Prometheus</name>
  <description>Prometheus metrics storage</description>
  <executable>%BASE%\prometheus.exe</executable>
  <arguments>--config.file="%BASE%\prometheus.yml" --storage.tsdb.path="%BASE%\data"</arguments>
  <startmode>Automatic</startmode>
  <log mode="roll"/>
  <onfailure action="restart" delay="10 sec"/>
  <onfailure action="restart" delay="30 sec"/>
  <workingdirectory>%BASE%</workingdirectory>
</service>

Note

WinSW locates its configuration by file name, so the renamed executable must share the same base name as the XML file (e.g. Prometheus.exe + Prometheus.xml).

C:\Observability\prometheus\Prometheus.exe install
C:\Observability\prometheus\Prometheus.exe start

Prometheus UI will be available at http://localhost:9090.

Step 3: Loki

Download the latest release from the Grafana Loki releases page. Choose the loki-windows-amd64.exe.zip asset.

Extract loki-windows-amd64.exe to C:\Observability\loki\ and rename to loki.exe.

Create a minimal configuration file at C:\Observability\loki\loki-config.yaml:

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: C:\Observability\loki\data
  storage:
    filesystem:
      chunks_directory: C:\Observability\loki\data\chunks
      rules_directory: C:\Observability\loki\data\rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: "2024-01-01"
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  allow_structured_metadata: true

Test it manually:

C:\Observability\loki\loki.exe -config.file=C:\Observability\loki\loki-config.yaml

Register as a Windows service with WinSW. Copy WinSW-net461.exe to C:\Observability\loki\Loki.exe and create C:\Observability\loki\Loki.xml:

<service>
  <id>Loki</id>
  <name>Grafana Loki</name>
  <description>Grafana Loki log aggregation</description>
  <executable>%BASE%\loki.exe</executable>
  <arguments>-config.file="%BASE%\loki-config.yaml"</arguments>
  <startmode>Automatic</startmode>
  <log mode="roll"/>
  <onfailure action="restart" delay="10 sec"/>
  <onfailure action="restart" delay="30 sec"/>
  <workingdirectory>%BASE%</workingdirectory>
</service>

Install and start the service:

C:\Observability\loki\Loki.exe install
C:\Observability\loki\Loki.exe start

Step 4: Grafana

Download the latest Windows installer (.msi) from the Grafana download page.

Run the installer. The default installation directory is C:\Program Files\GrafanaLabs\grafana.

Grafana installs itself as a Windows service automatically. After installation:

  1. Open http://localhost:3000 in a browser.

  2. Log in with the default credentials (admin / admin), then set a new password.

  3. Add datasources manually via Connections > Data sources:

    Prometheus:

    • Type: Prometheus

    • URL: http://localhost:9090

    • Set as default

    Loki:

    • Type: Loki

    • URL: http://localhost:3100

Alternatively, provision datasources via file. Create C:\Program Files\GrafanaLabs\grafana\conf\provisioning\datasources\datasources.yaml:

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: true

  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100
    editable: true

Then restart the Grafana service:

net stop Grafana
net start Grafana

Verification

Once all services are running, verify the stack:

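# Check OTEL Collector is listening on the OTLP/HTTP port
# (the root path has no handler, so an HTTP 404 response still confirms the receiver is up)
# curl.exe is used because "curl" alone is an Invoke-WebRequest alias in Windows PowerShell
curl.exe -i http://localhost:4318

# Check Prometheus targets (should show otel-collector as UP)
curl.exe http://localhost:9090/api/v1/targets

# Check Loki is ready
curl.exe http://localhost:3100/ready

# Open Grafana
Start-Process http://localhost:3000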

In Grafana, navigate to Explore, select the Prometheus datasource, and query plc_connection_state to confirm metrics are flowing from PressCenter through the collector.

Windows Firewall

If accessing Grafana remotely via Tailscale, allow inbound traffic on port 3000:

New-NetFirewallRule -DisplayName "Grafana" -Direction Inbound -LocalPort 3000 -Protocol TCP -Action Allow

The other services (ports 4317, 4318, 8889, 9090, 3100) only need local access and should not be exposed externally.

Managing Services

Each WinSW-managed service supports status, start, stop, restart, and uninstall commands via its own executable:

# Check service status
# (sc.exe, because "sc" alone is a Set-Content alias in Windows PowerShell)
C:\Observability\otel-collector\OtelCollector.exe status
C:\Observability\prometheus\Prometheus.exe status
C:\Observability\loki\Loki.exe status
sc.exe query Grafana

# Stop all observability services
C:\Observability\otel-collector\OtelCollector.exe stop
C:\Observability\prometheus\Prometheus.exe stop
C:\Observability\loki\Loki.exe stop
net stop Grafana

# Start all observability services
C:\Observability\otel-collector\OtelCollector.exe start
C:\Observability\prometheus\Prometheus.exe start
C:\Observability\loki\Loki.exe start
net start Grafana

# Remove a service (if uninstalling)
C:\Observability\otel-collector\OtelCollector.exe uninstall
C:\Observability\prometheus\Prometheus.exe uninstall
C:\Observability\loki\Loki.exe uninstall

Log Export

When telemetry is enabled, Serilog logs are exported via OTLP/HTTP to the OpenTelemetry Collector, which forwards them to Loki for storage and querying in Grafana.

The pipeline is:

Serilog ──WriteTo.OpenTelemetry──> OTEL Collector (:4318) ──otlphttp/loki──> Loki (:3100)

This is configured automatically: when Telemetry.Enabled is true and Telemetry.OtlpEndpoint is set, Log.UpdateDefaultLogger() adds the OpenTelemetry sink alongside the existing Console and File sinks.

The sink uses OtlpProtocol.HttpProtobuf and sets service.name = Dahlke.PressCenter as a resource attribute, so logs can be filtered by service in Grafana’s Explore view with the Loki datasource.

To verify, start PressCenter with the observability stack running, then open Grafana > Explore > Loki and query {service_name="Dahlke.PressCenter"}.

NuGet Packages

The Dahlke.Lib.Telemetry project references:

| Package | Version |
|---|---|
| OpenTelemetry | 1.11.2 |
| OpenTelemetry.Exporter.OpenTelemetryProtocol | 1.11.2 |
| OpenTelemetry.Instrumentation.Process | 1.11.0-beta.1 |
| OpenTelemetry.Instrumentation.Runtime | 1.11.0 |

Note

The base OpenTelemetry package provides OpenTelemetrySdk.Create(). Do not add OpenTelemetry.Extensions.Hosting — it is designed for ASP.NET Core / Generic Host apps and will not work in Avalonia.

Disabling Telemetry

Set Telemetry.Enabled to false in appsettings.json. When disabled, no OpenTelemetry SDK is created, no metrics are exported, and the instrument classes are not instantiated. Because every consumer accepts its instruments as an optional dependency, the application otherwise behaves identically.