OPENTELEMETRY
Initial Setup
Instrumentation
Set up traces
Export Configuration
Sensitivity: Public
Configure export endpoints
Backend Setup
Validation
curl -L https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar \
  -o opentelemetry-javaagent.jar
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-config
data:
  OTEL_SERVICE_NAME: "${POD_NAME}"
  OTEL_TRACES_SAMPLER: "parentbased_traceidratio"
  OTEL_TRACES_SAMPLER_ARG: "1.0"
  OTEL_METRICS_EXPORTER: "otlp"
  OTEL_LOGS_EXPORTER: "otlp"
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4317"
  OTEL_RESOURCE_ATTRIBUTES: "deployment.environment=production"
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
      memory_limiter:
        limit_mib: 1500
        spike_limit_mib: 512
        check_interval: 5s
    exporters:
      otlp:
        endpoint: "<your-backend-endpoint>"
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]
        metrics:
          receivers: [otlp]
          exporters: [otlp]
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          volumeMounts:
            - name: otel-agent
              mountPath: /opt/opentelemetry
          env:
            - name: JAVA_TOOL_OPTIONS
              value: "-javaagent:/opt/opentelemetry/opentelemetry-javaagent.jar"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          envFrom:
            - configMapRef:
                name: otel-agent-config
      volumes:
        - name: otel-agent
          configMap:
            name: otel-agent-jar
#!/bin/bash
# Update all deployments in the target namespaces
for ns in $NAMESPACES; do
done
done
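The body of the loop above did not survive in this copy. As a minimal sketch of what such a rollout could look like, assuming the goal is to attach the otel-agent-config ConfigMap to every Deployment in each namespace and that NAMESPACES is a space-separated list, one possible version follows. The instrument_namespace helper is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical sketch: import the keys of the otel-agent-config ConfigMap as
# environment variables into every Deployment of each listed namespace.

instrument_namespace() {
  target_ns="$1"
  # "-o name" prints one resource per line, e.g. deployment.apps/web
  for deploy in $(kubectl get deployments -n "$target_ns" -o name); do
    kubectl set env "$deploy" -n "$target_ns" --from=configmap/otel-agent-config
  done
}

# NAMESPACES is assumed to be a space-separated list, e.g. "team-a team-b".
for ns in ${NAMESPACES:-}; do
  instrument_namespace "$ns"
done
```

Changing the pod template triggers a rolling restart of each Deployment, so trying this against a single namespace first is the cautious route.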
Verify Instrumentation
resources:
  limits:
    cpu: "1"
    memory: 2Gi
  requests:
    cpu: "200m"
    memory: 400Mi
Set Up Health Checks
livenessProbe:
  httpGet:
    path: /health
    port: 13133
readinessProbe:
  httpGet:
    path: /health
    port: 13133
Important Notes:
Ensure the backend system can handle the telemetry volume from 80 services.
First, create the ConfigMap for collector configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
      memory_limiter:
        limit_mib: 1500
        spike_limit_mib: 512
        check_interval: 5s
    exporters:
      otlp:
        endpoint: "<your-backend-endpoint>"
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]
        metrics:
          receivers: [otlp]
          exporters: [otlp]
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest
          ports:
            - containerPort: 4317 # OTLP gRPC
            - containerPort: 4318 # OTLP HTTP
          volumeMounts:
            - name: config
              mountPath: /conf
          args:
            - --config=/conf/config.yaml
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: "200m"
              memory: 400Mi
          livenessProbe:
            httpGet:
              path: /health
              port: 13133
          readinessProbe:
            httpGet:
              path: /health
              port: 13133
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
    - name: otlp-http
      port: 4318
      targetPort: 4318
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
spec:
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:latest
          ports:
            - containerPort: 16686 # UI
          env:
            - name: COLLECTOR_OTLP_ENABLED
              value: "true"
apiVersion: v1
kind: Service
metadata:
  name: jaeger
spec:
  selector:
    app: jaeger
  ports:
    - name: ui
      port: 16686
      targetPort: 16686
    - name: grpc
      port: 14250
      targetPort: 14250
    - name: http
      port: 14268
      targetPort: 14268
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
      memory_limiter:
        limit_mib: 1500
        spike_limit_mib: 512
        check_interval: 5s
    exporters:
      jaeger:
        endpoint: "jaeger:14250"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [jaeger]
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jaeger-ingress
spec:
  rules:
    - host: jaeger.your-domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: jaeger
                port:
                  number: 16686
Add a step that uses a Service of type LoadBalancer, instead of an Ingress, to expose the Jaeger endpoint.
apiVersion: v1
kind: Service
metadata:
  name: jaeger
spec:
  type: LoadBalancer
  selector:
    app: jaeger
  ports:
    - name: ui
      port: 16686
      targetPort: 16686
    - name: grpc
      port: 14250
      targetPort: 14250
    - name: http
      port: 14268
      targetPort: 14268
Once deployed:
OpenTelemetry Collector Setup
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      jaeger:
        endpoint: jaeger:14250
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [jaeger]
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest
          volumeMounts:
            - name: config
              mountPath: /conf
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
spec:
  containers:
    - name: app
      image: your-app:version
      env:
        - name: JAVA_TOOL_OPTIONS
          value: "-javaagent:/opt/opentelemetry-javaagent.jar"
        - name: OTEL_SERVICE_NAME
          value: "your-service-name"
        - name: OTEL_TRACES_EXPORTER
          value: "otlp"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector:4317"
      volumeMounts:
        - name: otel-agent
          mountPath: /opt
  volumes:
    - name: otel-agent
      emptyDir: {}
  initContainers:
    - name: otel-agent-downloader
      image: curlimages/curl
      volumeMounts:
        - name: otel-agent
          mountPath: /agent
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
spec:
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:latest
          ports:
            - containerPort: 16686
            - containerPort: 14250
            - containerPort: 14268
apiVersion: v1
kind: Service
metadata:
  name: jaeger
spec:
  type: LoadBalancer
  selector:
    app: jaeger
  ports:
    - name: ui
      port: 16686
      targetPort: 16686
    - name: grpc
      port: 14250
      targetPort: 14250
    - name: http
      port: 14268
      targetPort: 14268
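Once a LoadBalancer Service like the one above is applied, the cloud provider assigns it an external address. A rough sketch of how that address could be looked up and turned into the UI URL follows. It assumes the Service is named jaeger in the current namespace; the jaeger_ui_url helper is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the Jaeger UI URL exposed by the LoadBalancer.

jaeger_ui_url() {
  # Some providers publish an IP address, others a DNS hostname; try both.
  addr=$(kubectl get svc jaeger -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$addr" ]; then
    addr=$(kubectl get svc jaeger -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
  fi
  if [ -z "$addr" ]; then
    echo "external address not assigned yet" >&2
    return 1
  fi
  # 16686 is the UI port declared in the Service.
  echo "http://${addr}:16686"
}
```

Provisioning can take a minute or two, so a non-zero return right after applying the manifest usually just means the address is still pending.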
Review Confirmation:
✓ Auto-instrumentation requires no code changes
Add more explanation for each of the steps you have provided and what each one does, for instance spans, the sampling rate, etc.
I'll break down each component and explain their roles in detail:
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
    send_batch_size: 10000
    timeout: 10s
    # Caution: max_queue_size looks like an SDK BatchSpanProcessor setting;
    # the Collector's batch processor may reject it.
    max_queue_size: 5000
exporters:
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]
env:
  - name: OTEL_TRACES_SAMPLER
    value: "parentbased_traceidratio"
  - name: OTEL_TRACES_SAMPLER_ARG
    value: "0.1" # 10% sampling rate
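To get a feel for what a sampling ratio means for backend load, here is a back-of-the-envelope sketch. With parentbased_traceidratio the ratio is applied when a new trace starts, and child spans follow their parent's decision, so the estimate below counts root traces. The 80 services and the 0.1 ratio come from this guide; the figure of 50 new traces per second per service is purely an assumed example, and estimate_sampled_traces is a hypothetical helper.

```shell
#!/bin/bash
# Usage: estimate_sampled_traces SERVICES TRACES_PER_SERVICE_PER_SEC RATIO
# Prints the approximate number of traces exported per second.

estimate_sampled_traces() {
  awk -v s="$1" -v r="$2" -v p="$3" 'BEGIN { printf "%.0f\n", s * r * p }'
}

# 80 services, an assumed 50 new traces/sec each, 10% sampling.
estimate_sampled_traces 80 50 0.1   # prints 400
```

At a ratio of 1.0, as in the earlier ConfigMap, the same assumed traffic would export all 4000 traces per second, which is worth checking against the backend's capacity.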
env:
  - name: OTEL_PROPAGATORS
    value: "tracecontext,baggage,b3multi"
Propagation Types:
Resource Attributes
env:
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "deployment.environment=production,service.version=${APP_VERSION}"
env:
  - name: OTEL_INSTRUMENTATION_COMMON_DEFAULT_ENABLED
    value: "true"
  - name: OTEL_INSTRUMENTATION_SPRING_WEB_ENABLED
    value: "true"
  - name: OTEL_METRIC_EXPORT_INTERVAL
    value: "60000"
  - name: OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT
    value: "128"
  - name: OTEL_SPAN_EVENT_COUNT_LIMIT
    value: "128"
  - name: OTEL_SPAN_LINK_COUNT_LIMIT
    value: "128"
Spans:
Contains:
Name
Start/End timestamps
Status (success/error)
Trace Context:
ParentSpanId: Links to parent operation
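With the tracecontext propagator enabled, this context travels between services in the W3C traceparent HTTP header, whose format is version-traceid-parentid-flags. The sketch below splits such a header to show where the trace ID, the parent span ID and the sampled flag sit. The parse_traceparent helper is hypothetical, and the sample value is the example from the W3C Trace Context specification.

```shell
#!/bin/bash
# Hypothetical helper: split a W3C traceparent header into its fields.

parse_traceparent() {
  header="$1"
  trace_id=$(printf '%s' "$header" | cut -d- -f2)
  parent_id=$(printf '%s' "$header" | cut -d- -f3)
  flags=$(printf '%s' "$header" | cut -d- -f4)
  # The lowest bit of the flags byte marks the trace as sampled,
  # which is the case when the last hex digit is odd.
  case "$flags" in
    *[13579bdf]) sampled=yes ;;
    *) sampled=no ;;
  esac
  echo "trace_id=$trace_id parent_span_id=$parent_id sampled=$sampled"
}

parse_traceparent "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
```

A downstream service using a parent-based sampler reads that sampled flag to decide whether to record its own spans, which is how one sampling decision stays consistent across a whole trace.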
Sampling Strategies:
env:
  - name: OTEL_TRACES_SAMPLER
    value: "parentbased_traceidratio"
  - name: OTEL_TRACES_SAMPLER_ARG
spec:
  containers:
    - name: otel-collector
      resources:
        limits:
          cpu: 1
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 400Mi
spec:
  containers:
    - name: jaeger
      env:
        - name: SPAN_STORAGE_TYPE
          value: "elasticsearch" # or "cassandra"
        - name: ES_SERVER_URLS
          value: "http://elasticsearch:9200"
        - name: ES_USERNAME
          valueFrom:
            secretKeyRef:
              name: es-credentials
              key: username
Health Checks:
spec:
  containers:
    - name: otel-collector
      livenessProbe:
        httpGet:
          path: /
          port: 13133
      readinessProbe:
        httpGet:
          path: /
          port: 13133
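These probes only succeed if the Collector actually serves a health endpoint on port 13133, which, as far as I know, requires the health_check extension to be enabled in the Collector configuration. A quick manual check is sketched below, assuming the port has been forwarded locally first. The collector_healthy helper is hypothetical.

```shell
#!/bin/bash
# Assumed prerequisite, run in another terminal:
#   kubectl port-forward deploy/otel-collector 13133:13133

# Hypothetical helper: report whether the health endpoint answers with success.
collector_healthy() {
  # -f makes curl return non-zero on HTTP errors; -sS hides progress output
  # but still reports connection failures.
  if curl -fsS -o /dev/null "$1"; then
    echo healthy
  else
    echo unhealthy
  fi
}
```

Calling collector_healthy http://localhost:13133/ uses the same path and port as the probes above.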
Performance Considerations:
Batch Processing:
Memory Management:
Sampling Strategy:
Storage Scaling:
Collector Metrics:
Queue length
Drop count
Application Impact:
CPU overhead
Memory usage
Network traffic
Storage Metrics:
Write throughput
Query latency
Storage growth
Configurable sampling
Production-ready performance
Remember to: