opentelemetry-collector概述与helm安装

[TOC]

#opentelemetry 概述
> 详情请参考官方文档：https://opentelemetry.io/docs/what-is-opentelemetry/

OpenTelemetry 是一个可观测性框架和工具包，旨在创建和管理遥测数据，例如跟踪、指标和日志。
至关重要的是，OpenTelemetry 与供应商和工具无关，这意味着它可以与各种可观测性后端一起使用，包括 Jaeger 和 Prometheus 等开源工具，以及商业产品。
OpenTelemetry 是一个云原生计算基金会 （CNCF） 项目。

OpenTelemetry 不像 Jaeger、Prometheus 或商业供应商那样是可观测性后端。
OpenTelemetry 专注于遥测数据的生成、收集、管理和导出。这些数据的存储和可视化有意留给其他工具。

## collector概念
https://opentelemetry.io/docs/collector/configuration/
任何收集器配置文件的结构都由访问遥测数据的以下四类管道组件组成：

### Receivers <img src="https://trojan.zh3.fun/svgs/Receivers.svg" width = "60" height = "40" alt="" align=center />
接收器从一个或多个源收集遥测数据。它们可以基于拉取或推送，并且可以支持一个或多个数据源。
许多接收器都带有默认设置，因此指定接收器的名称就足以对其进行配置。

### Processors <img src="https://trojan.zh3.fun/svgs/Processors.svg" width = "60" height = "40" alt="" align=center />
处理者获取接收者收集的数据，并在将其发送给导出器之前对其进行修改或转换。
数据处理根据为每个处理器定义的规则或设置进行，其中可能包括筛选、删除、重命名或重新计算遥测数据等操作。
管道中处理器的顺序决定了收集器应用于信号的处理操作的顺序。
### Exporters <img src="https://trojan.zh3.fun/svgs/Exporters.svg" width = "60" height = "40" alt="" align=center />
导出器将数据发送到一个或多个后端或目标。导出器可以基于拉取或推送，并且可以支持一个或多个数据源。
大多数导出器需要至少指定目标的配置，以及安全设置，如身份验证令牌或 TLS 证书。

### Connectors <img src="https://trojan.zh3.fun/svgs/Connectors.svg" width = "60" height = "40" alt="" align=center />
连接器连接两个管道，同时充当输出器和接收器。连接器在一个管道的末尾作为导出器使用数据，并在另一个管道的开头作为接收器发出数据。
使用和发出的数据可以是同一类型，也可以是不同的数据类型。可以使用连接器来汇总、复制或路由使用的数据。
默认情况下，未配置任何连接器。每种类型的连接器都设计为使用一对或多对数据类型，并且只能用于相应地连接管道。

# 环境和工具情况简述
环境：k8s集群版本 1.26.3
系统：aliyun os 2.x == centos 7.x
helm版本：3.13.2
```
NAME                                  	CHART VERSION	APP VERSION	DESCRIPTION                                      
open-telemetry/opentelemetry-collector	0.67.0       	0.84.0     	OpenTelemetry Collector Helm chart for Kubernetes
```

# 准备
```
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

# 查看所有版本，因为新版本废弃了jaeger exporter的支持，我就选用了0.67.0
helm search repo opentelemetry-collector -l

helm pull open-telemetry/opentelemetry-collector --version 0.67.0

tar xf opentelemetry-collector-0.67.0.tgz

mv opentelemetry-collector-0.67.0.tgz opentelemetry-collector/

```

## opentelemetry-collector values.yaml
>span指标参考这里填写的：https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/spanmetricsconnector/README.md

```
# Default values for opentelemetry-collector.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

nameOverride: ""
fullnameOverride: ""

# Valid values are "daemonset", "deployment", and "statefulset".
mode: "deployment"

# Specify which namespace should be used to deploy the resources into
namespaceOverride: ""

# Handles basic configuration of components that
# also require k8s modifications to work correctly.
# .Values.config can be used to modify/add to a preset
# component configuration, but CANNOT be used to remove
# preset configuration. If you require removal of any
# sections of a preset configuration, you cannot use
# the preset. Instead, configure the component manually in
# .Values.config and use the other fields supplied in the
# values.yaml to configure k8s as necessary.
presets:
  # Configures the collector to collect logs.
  # Adds the filelog receiver to the logs pipeline
  # and adds the necessary volumes and volume mounts.
  # Best used with mode = daemonset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#filelog-receiver for details on the receiver.
  logsCollection:
    enabled: false
    includeCollectorLogs: false
    # Enabling this writes checkpoints in /var/lib/otelcol/ host directory.
    # Note this changes collector's user to root, so that it can write to host directory.
    storeCheckpoints: false
    # The maximum bytes size of the recombined field.
    # Once the size exceeds the limit, all received entries of the source will be combined and flushed.
    maxRecombineLogSize: 102400
  # Configures the collector to collect host metrics.
  # Adds the hostmetrics receiver to the metrics pipeline
  # and adds the necessary volumes and volume mounts.
  # Best used with mode = daemonset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#host-metrics-receiver for details on the receiver.
  hostMetrics:
    enabled: false
  # Configures the Kubernetes Processor to add Kubernetes metadata.
  # Adds the k8sattributes processor to all the pipelines
  # and adds the necessary rules to ClusteRole.
  # Best used with mode = daemonset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#kubernetes-attributes-processor for details on the receiver.
  kubernetesAttributes:
    enabled: false
    # When enabled the processor will extra all labels for an associated pod and add them as resource attributes.
    # The label's exact name will be the key.
    extractAllPodLabels: false
    # When enabled the processor will extra all annotations for an associated pod and add them as resource attributes.
    # The annotation's exact name will be the key.
    extractAllPodAnnotations: false
  # Configures the collector to collect node, pod, and container metrics from the API server on a kubelet..
  # Adds the kubeletstats receiver to the metrics pipeline
  # and adds the necessary rules to ClusteRole.
  # Best used with mode = daemonset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#kubeletstats-receiver for details on the receiver.
  kubeletMetrics:
    enabled: false
  # Configures the collector to collect kubernetes events.
  # Adds the k8sobject receiver to the logs pipeline
  # and collects kubernetes events by default.
  # Best used with mode = deployment or statefulset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#kubernetes-objects-receiver for details on the receiver.
  kubernetesEvents:
    enabled: false
  # Configures the Kubernetes Cluster Receiver to collect cluster-level metrics.
  # Adds the k8s_cluster receiver to the metrics pipeline
  # and adds the necessary rules to ClusteRole.
  # Best used with mode = deployment or statefulset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#kubernetes-cluster-receiver for details on the receiver.
  clusterMetrics:
    enabled: false

configMap:
  # Specifies whether a configMap should be created (true by default)
  create: true

# Base collector configuration.
# Supports templating. To escape existing instances of {{ }}, use {{` <original content> `}}.
# For example, {{ REDACTED_EMAIL }} becomes {{` {{ REDACTED_EMAIL }} `}}.
config:
  exporters:
    logging: {}
    jaeger:
      endpoint: "jaeger-collector.tracing.svc.cluster.local:14250"
      tls:
        insecure: true
    prometheusremotewrite:
      endpoint: "http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/write"
      target_info:
        enabled: true
#      external_labels:
#        cluster: dev-test

extensions:
    # The health_check extension is mandatory for this chart.
    # Without the health_check extension the collector will fail the readiness and liveliness probes.
    # The health_check extension can be modified, but should never be removed.
    health_check: {}
    memory_ballast: {}
  processors:
    batch: {}
    # If set to null, will be overridden with values based on k8s resource limits
    memory_limiter: null
  receivers:
    jaeger:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:14250
        thrift_http:
          endpoint: ${env:MY_POD_IP}:14268
        thrift_compact:
          endpoint: ${env:MY_POD_IP}:6831
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318
#    prometheus:
#      config:
#        scrape_configs:
#          - job_name: opentelemetry-collector
#            scrape_interval: 10s
#            static_configs:
#              - targets:
#                  - ${env:MY_POD_IP}:8888
    zipkin:
      endpoint: ${env:MY_POD_IP}:9411

connectors:
    spanmetrics:
      histogram:
        explicit:
          buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
      dimensions:
        - name: http.method
          default: GET
        - name: http.status_code
#          default: GET
      exemplars:
        enabled: true
#      exclude_dimensions: ['status.code']
      dimensions_cache_size: 1000
      aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
      metrics_flush_interval: 15s
#      events:
#        enabled: true
#        dimensions:
#          - name: exception.type
#          - name: exception.message

service:
    telemetry:
      metrics:
        address: ${env:MY_POD_IP}:8888
    extensions:
      - health_check
      - memory_ballast
    pipelines:
#      logs:
#        exporters:
#          - logging
#        processors:
#          - memory_limiter
#          - batch
#        receivers:
#          - otlp
      metrics:
        exporters:
#          - logging
          - prometheusremotewrite
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - spanmetrics
      traces:
        exporters:
#          - logging
          - jaeger
          - spanmetrics
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - jaeger
          - zipkin

image:
  # If you want to use the core image `otel/opentelemetry-collector`, you also need to change `command.name` value to `otelcol`.
  repository: otel/opentelemetry-collector-contrib
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: ""
  # When digest is set to a non-empty value, images will be pulled by digest (regardless of tag value).
  digest: ""
imagePullSecrets: []

# OpenTelemetry Collector executable
command:
  name: otelcol-contrib
  extraArgs: []

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""

clusterRole:
  # Specifies whether a clusterRole should be created
  # Some presets also trigger the creation of a cluster role and cluster role binding.
  # If using one of those presets, this field is no-op.
  create: false
  # Annotations to add to the clusterRole
  # Can be used in combination with presets that create a cluster role.
  annotations: {}
  # The name of the clusterRole to use.
  # If not set a name is generated using the fullname template
  # Can be used in combination with presets that create a cluster role.
  name: ""
  # A set of rules as documented here : https://kubernetes.io/docs/reference/access-authn-authz/rbac/
  # Can be used in combination with presets that create a cluster role to add additional rules.
  rules: []
  # - apiGroups:
  #   - ''
  #   resources:
  #   - 'pods'
  #   - 'nodes'
  #   verbs:
  #   - 'get'
  #   - 'list'
  #   - 'watch'

clusterRoleBinding:
    # Annotations to add to the clusterRoleBinding
    # Can be used in combination with presets that create a cluster role binding.
    annotations: {}
    # The name of the clusterRoleBinding to use.
    # If not set a name is generated using the fullname template
    # Can be used in combination with presets that create a cluster role binding.
    name: ""

podSecurityContext: {}
securityContext: {}

nodeSelector: {}
tolerations: []
affinity: {}
topologySpreadConstraints: []

# Allows for pod scheduler prioritisation
priorityClassName: ""

extraEnvs: []
extraEnvsFrom: []
extraVolumes: []
extraVolumeMounts: []

# Configuration for ports
# nodePort is also allowed
ports:
  otlp:
    enabled: true
    containerPort: 4317
    servicePort: 4317
    hostPort: 4317
    protocol: TCP
    #nodePort: 30317
    appProtocol: grpc
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    hostPort: 4318
    protocol: TCP
  jaeger-compact:
    enabled: true
    containerPort: 6831
    servicePort: 6831
    hostPort: 6831
    protocol: UDP
  jaeger-thrift:
    enabled: true
    containerPort: 14268
    servicePort: 14268
    hostPort: 14268
    protocol: TCP
  jaeger-grpc:
    enabled: true
    containerPort: 14250
    servicePort: 14250
    hostPort: 14250
    protocol: TCP
  zipkin:
    enabled: true
    containerPort: 9411
    servicePort: 9411
    hostPort: 9411
    protocol: TCP
  metrics:
    # The metrics port is disabled by default. However you need to enable the port
    # in order to use the ServiceMonitor (serviceMonitor.enabled) or PodMonitor (podMonitor.enabled).
    enabled: true
    containerPort: 8888
    servicePort: 8888
    protocol: TCP

# Resource limits & requests. Update according to your own use case as these values might be too low for a typical deployment.
#resources: {}
resources:
  limits:
    cpu: 250m
    memory: 512Mi

podAnnotations: {}

podLabels: {}

# Host networking requested for this pod. Use the host's network namespace.
hostNetwork: false

# Pod DNS policy ClusterFirst, ClusterFirstWithHostNet, None, Default, None
dnsPolicy: ""

# Custom DNS config. Required when DNS policy is None.
dnsConfig: {}

# only used with deployment mode
replicaCount: 1

# only used with deployment mode
revisionHistoryLimit: 10

annotations: {}

# List of extra sidecars to add
extraContainers: []
# extraContainers:
#   - name: test
#     command:
#       - cp
#     args:
#       - /bin/sleep
#       - /test/sleep
#     image: busybox:latest
#     volumeMounts:
#       - name: test
#         mountPath: /test

# List of init container specs, e.g. for copying a binary to be executed as a lifecycle hook.
# Another usage of init containers is e.g. initializing filesystem permissions to the OTLP Collector user `10001` in case you are using persistence and the volume is producing a permission denied error for the OTLP Collector container.
initContainers: []
# initContainers:
#   - name: test
#     image: busybox:latest
#     command:
#       - cp
#     args:
#       - /bin/sleep
#       - /test/sleep
#     volumeMounts:
#       - name: test
#         mountPath: /test
#  - name: init-fs
#    image: busybox:latest
#    command:
#      - sh
#      - '-c'
#      - 'chown -R 10001: /var/lib/storage/otc' # use the path given as per `extensions.file_storage.directory` & `extraVolumeMounts[x].mountPath`
#    volumeMounts:
#      - name: opentelemetry-collector-data # use the name of the volume used for persistence
#        mountPath: /var/lib/storage/otc # use the path given as per `extensions.file_storage.directory` & `extraVolumeMounts[x].mountPath`

# Pod lifecycle policies.
lifecycleHooks: {}
# lifecycleHooks:
#   preStop:
#     exec:
#       command:
#       - /test/sleep
#       - "5"

# liveness probe configuration
# Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
##
livenessProbe:
  # Number of seconds after the container has started before startup, liveness or readiness probes are initiated.
  # initialDelaySeconds: 1
  # How often in seconds to perform the probe.
  # periodSeconds: 10
  # Number of seconds after which the probe times out.
  # timeoutSeconds: 1
  # Minimum consecutive failures for the probe to be considered failed after having succeeded.
  # failureThreshold: 1
  # Duration in seconds the pod needs to terminate gracefully upon probe failure.
  # terminationGracePeriodSeconds: 10
  httpGet:
    port: 13133
    path: /

# readiness probe configuration
# Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
##
readinessProbe:
  # Number of seconds after the container has started before startup, liveness or readiness probes are initiated.
  # initialDelaySeconds: 1
  # How often (in seconds) to perform the probe.
  # periodSeconds: 10
  # Number of seconds after which the probe times out.
  # timeoutSeconds: 1
  # Minimum consecutive successes for the probe to be considered successful after having failed.
  # successThreshold: 1
  # Minimum consecutive failures for the probe to be considered failed after having succeeded.
  # failureThreshold: 1
  httpGet:
    port: 13133
    path: /

service:
  # Enable the creation of a Service.
  # By default, it's enabled on mode != daemonset.
  # However, to enable it on mode = daemonset, its creation must be explicitly enabled
  # enabled: true

type: ClusterIP
  #type: NodePort
  # type: LoadBalancer
  # loadBalancerIP: 1.2.3.4
  # loadBalancerSourceRanges: []

# By default, Service of type 'LoadBalancer' will be created setting 'externalTrafficPolicy: Cluster'
  # unless other value is explicitly set.
  # Possible values are Cluster or Local (https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip)
  # externalTrafficPolicy: Cluster

annotations: {}

# By default, Service will be created setting 'internalTrafficPolicy: Local' on mode = daemonset
  # unless other value is explicitly set.
  # Setting 'internalTrafficPolicy: Cluster' on a daemonset is not recommended
  # internalTrafficPolicy: Cluster

ingress:
  enabled: false
  # annotations: {}
  # ingressClassName: nginx
  # hosts:
  #   - host: collector.example.com
  #     paths:
  #       - path: /
  #         pathType: Prefix
  #         port: 4318
  # tls:
  #   - secretName: collector-tls
  #     hosts:
  #       - collector.example.com

# Additional ingresses - only created if ingress.enabled is true
  # Useful for when differently annotated ingress services are required
  # Each additional ingress needs key "name" set to something unique
  additionalIngresses: []
  # - name: cloudwatch
  #   ingressClassName: nginx
  #   annotations: {}
  #   hosts:
  #     - host: collector.example.com
  #       paths:
  #         - path: /
  #           pathType: Prefix
  #           port: 4318
  #   tls:
  #     - secretName: collector-tls
  #       hosts:
  #         - collector.example.com

podMonitor:
  # The pod monitor by default scrapes the metrics port.
  # The metrics port needs to be enabled as well.
  enabled: true
  metricsEndpoints:
    - port: metrics
      interval: 30s
      path: /metrics
      scheme: http

# additional labels for the PodMonitor
  extraLabels: {}
  #   release: kube-prometheus-stack

serviceMonitor:
  # The service monitor by default scrapes the metrics port.
  # The metrics port needs to be enabled as well.
  enabled: true
  metricsEndpoints:
    - port: metrics
      interval: 30s
      path: /metrics
      scheme: http

# additional labels for the ServiceMonitor
  extraLabels: {}
  #  release: kube-prometheus-stack

# PodDisruptionBudget is used only if deployment enabled
podDisruptionBudget:
  enabled: false
#   minAvailable: 2
#   maxUnavailable: 1

# autoscaling is used only if deployment enabled
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 10
  behavior: {}
  targetCPUUtilizationPercentage: 80
  # targetMemoryUtilizationPercentage: 80

rollout:
  rollingUpdate: {}
  # When 'mode: daemonset', maxSurge cannot be used when hostPort is set for any of the ports
  # maxSurge: 25%
  # maxUnavailable: 0
  strategy: RollingUpdate

prometheusRule:
  enabled: false
  groups: []
  # Create default rules for monitoring the collector
  defaultRules:
    enabled: false

# additional labels for the PrometheusRule
  extraLabels: {}

statefulset:
  # volumeClaimTemplates for a statefulset
  volumeClaimTemplates: []
  podManagementPolicy: "Parallel"

networkPolicy:
  enabled: false

# Annotations to add to the NetworkPolicy
  annotations: {}

# Configure the 'from' clause of the NetworkPolicy.
  # By default this will restrict traffic to ports enabled for the Collector. If
  # you wish to further restrict traffic to other hosts or specific namespaces,
  # see the standard NetworkPolicy 'spec.ingress.from' definition for more info:
  # https://kubernetes.io/docs/reference/kubernetes-api/policy-resources/network-policy-v1/
  allowIngressFrom: []
  # # Allow traffic from any pod in any namespace, but not external hosts
  # - namespaceSelector: {}
  # # Allow external access from a specific cidr block
  # - ipBlock:
  #     cidr: 192.168.1.64/32
  # # Allow access from pods in specific namespaces
  # - namespaceSelector:
  #     matchExpressions:
  #       - key: kubernetes.io/metadata.name
  #         operator: In
  #         values:
  #           - "cats"
  #           - "dogs"

# Add additional ingress rules to specific ports
  # Useful to allow external hosts/services to access specific ports
  # An example is allowing an external prometheus server to scrape metrics
  #
  # See the standard NetworkPolicy 'spec.ingress' definition for more info:
  # https://kubernetes.io/docs/reference/kubernetes-api/policy-resources/network-policy-v1/
  extraIngressRules: []
  # - ports:
  #   - port: metrics
  #     protocol: TCP
  #   from:
  #     - ipBlock:
  #         cidr: 192.168.1.64/32

# Restrict egress traffic from the OpenTelemetry collector pod
  # See the standard NetworkPolicy 'spec.egress' definition for more info:
  # https://kubernetes.io/docs/reference/kubernetes-api/policy-resources/network-policy-v1/
  egressRules: []
  #  - to:
  #      - namespaceSelector: {}
  #      - ipBlock:
  #          cidr: 192.168.10.10/24
  #    ports:
  #      - port: 1234
  #        protocol: TCP

```

# 安装
```
#安装
helm install opentelemetry-collector -n tracing ./opentelemetry-collector-0.67.0/
#更新
helm upgrade opentelemetry-collector -n tracing ./opentelemetry-collector-0.67.0/
#删除
helm uninstall opentelemetry-collector -n tracing
```

# 问题
将采集的span数据上传到prometheus，会出现权限问题和remote_write_url开启问题，以下是基于kube-prometheus-0.13.0的operator（非helm）两处更改

## prometheus 报错日志
> 需要把日志级别修改为**debug**级别
以下报的是权限问题

```
ts=2023-12-04T09:20:00.155Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: failed to list *v1.Endpoints: endpoints is:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"tracing\""
ts=2023-12-04T09:20:00.155Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: Failed to watch *v1.Endpoints: failed tpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"tracing\""
ts=2023-12-04T09:20:07.428Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: failed to list *v1.Pod: pods is forbidden:ount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"tracing\""
ts=2023-12-04T09:20:07.428Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: Failed to watch *v1.Pod: failed to listen: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"tracing\""
ts=2023-12-04T09:20:28.846Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: failed to list *v1.Service: services is forviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"tracing\""
```

## prometheus-prometheus.yaml
```
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.46.0
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: alertmanager-main
      namespace: monitoring
      port: web
  enableFeatures: []
  externalLabels: {}
  image: reg.maltcost.com/monitoring/prometheus:v2.46.0
  nodeSelector:
    kubernetes.io/os: linux
    kubernetes.io/role: prom
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.46.0
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 1
  resources:
    limits:
      cpu: 3
      memory: 12Gi
    requests:
      cpu: 2
      memory: 4Gi
  ruleNamespaceSelector: {}
  ruleSelector: {}
  securityContext:
    #    fsGroup: 2000
    #runAsNonRoot: true
    #    runAsUser: 1000
    #   runAsUser: 0
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.46.0
  #  retention: 30d
  volumeMounts:
  - name: prometheus-hostpath
    mountPath: /data
  - name: prometheus-config
    mountPath: /etc/config
  volumes:
  - name: prometheus-hostpath
    hostPath:
      path: /data/prometheus
      type: DirectoryOrCreate
  - name: prometheus-config
    configMap:
      name: prometheus-cm
  containers:
    - name: prometheus
      args:
        - --web.console.templates=/etc/prometheus/consoles
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --config.file=/etc/prometheus/config_out/prometheus.env.yaml
#        - --config.file=/etc/config/prometheus.yaml
        - --web.enable-lifecycle
        - --web.route-prefix=/
        - --storage.tsdb.retention.time=30d
        - --storage.tsdb.path=/data
        - --web.config.file=/etc/prometheus/web_config/web-config.yaml
        - --web.enable-remote-write-receiver # 增加了这个
#        - --log.level=debug
      securityContext:
        runAsUser: 0
        #privileged: true
```

## prometheus-clusterRole.yaml 原始
```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.46.0
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get

```

## prometheus-clusterRole.yaml 修改后
```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.46.0
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - pods
  - endpoints
  - nodes/proxy
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
```

# 使用
原始需求来源于开发，开发想要用otel sdk替换jaeger sdk 进行数据的采集，opentelemetry经过多年的整合与变化，成为了行业的标准，不仅可以采集traces，还可以采集metrics和logs。

go sdk的使用参考：
https://opentelemetry.io/docs/instrumentation/go/getting-started/

# 参考
https://opentelemetry.io
https://www.jaegertracing.io
https://prometheus.io
https://medium.com/@zzzzzYang/implement-opentelemetry-to-export-data-to-jaeger-prometheus-and-grafana-1098352370c0

导航

最近发表

友情链接