跳到主要内容

04、SkyWalking Linux服务端集群部署

一、版本要求

  • SkyWalking5.0.0-GA
  • ElasticSearch-5.x

注意6.X版本不支持;新版本的skywalking使用ES作为存储,所以先安装es。

  • JDK8+ (SkyWalking collector和WebUI部署在jdk8及以上版本)
  • JDK6+(被监控的应用程序运行在jdk6及以上版本)
  • 其他

二、地址

  • apache官网下载地址

http://skywalking.apache.org/downloads/

  • gitlib部署文档

https://github.com/apache/incubator-skywalking/blob/v5.0.0-alpha/docs/README_ZH.md

  • 项目地址

https://github.com/OpenSkywalking/skywalking-netcore

三、部署过程

【参考】

https://github.com/apache/incubator-skywalking/blob/5.x/docs/cn/Deploy-backend-in-cluster-mode-CN.md

1、Elasticsearch

参考文档:2、2-SkyWalking部署文档-Linux.docx

2、SW-collector

2.1 端口要求

确保端口10800,11800,12800不被占用

2.2 存储要求

collector配置ElasticSearch作为运行存储介质

2.3 时间设置要求

被监控应用的宿主服务器系统时间(包含时区)与collectors,UIs部署的宿主服务器时间设置正确且相同

2.4 配置

2、 4.1位置;

..\config\application.yml

2、 4.2配置项解释;

【官方解释】

下面是关于collector连接配置的5种类型方式

<1>、naming :agent使用HTTP协议连接collectors

<2>、agent_gRPC :agent使用gRPC协议连接collectors

<3>、remote :Collector使用gRPC协议连接collector

<4>、ui :使用HTTP协议连接collector,(大多数情况不需要修改)

<5>、agent_jetty:agent使用HTTP协议连接collectors(可选连接)

2、 4.3配置内容-参考1;

参考文档:2、2-SkyWalking部署文档-Linux.docx

2、 4.4配置内容-参考2-节点1;

##########节点1:ip=192.168.0.3

#cluster:

zookeeper:

hostPort: 192.168.0.101:2181, 192.168.0.102:2181,192.168.103.10

# zookeeper集群的主机IP地址及端口,使用逗号分隔

sessionTimeout: 100000

naming:

jetty:

#OS real network IP(binding required), for agent to find collector cluster

host: 192.168.0.3

# 部署collector的主机IP地址,即当前节点的ip地址

port: 10800

contextPath: /

cache:

# guava:

caffeine:

remote:

gRPC:

# OS real network IP(binding required), for collector nodes communicate with each other in cluster. collectorN --(gRPC) --> collectorM

host: 192.168.0.3

# 部署collector的主机IP地址,即当前节点的ip地址

port: 11800

agent_gRPC:

gRPC:

#OS real network IP(binding required), for agent to uplink data(trace/metrics) to collector. agent--(gRPC)--> collector

host: 192.168.0.3

# 部署collector的主机IP地址,即当前节点的ip地址

port: 11800

# Set these two setting to open ssl

#sslCertChainFile: $path

#sslPrivateKeyFile: $path

# Set your own token to active auth

#authentication: xxxxxx

agent_jetty:

jetty:

# OS real network IP(binding required), for agent to uplink data(trace/metrics) to collector through HTTP. agent--(HTTP)--> collector

# SkyWalking native Java/.Net/node.js agents don't use this.

# Open this for other implementor.

host: 192.168.0.3

# 部署collector的主机IP地址,即当前节点的ip地址

port: 12800

contextPath: /

analysis_register:

default:

analysis_jvm:

default:

analysis_segment_parser:

default:

bufferFilePath: ../buffer/

bufferOffsetMaxFileSize: 10M

bufferSegmentMaxFileSize: 500M

bufferFileCleanWhenRestart: true

ui:

jetty:

# Stay in \localhost\ if UI starts up in default mode.

# Change it to OS real network IP(binding required), if deploy collector in different machine.

host: 192.168.0.3

# 部署collector的主机IP地址,即当前节点的ip地址

port: 12800

contextPath: /

storage:

elasticsearch:

clusterName: eqx-jz-test

# Elasticsearch的cluster.name值

clusterTransportSniffer: true

clusterNodes: 192.168.0.17:9303

# Elasticsearch的ip和端口

indexShardsNumber: 2

indexReplicasNumber: 0

highPerformanceMode: true

# Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html

bulkActions: 2000 # Execute the bulk every 2000 requests

bulkSize: 20 # flush the bulk every 20mb

flushInterval: 10 # flush the bulk every 10 seconds whatever the number of requests

concurrentRequests: 2 # the number of concurrent requests

# Set a timeout on metric data. After the timeout has expired, the metric data will automatically be deleted.

traceDataTTL: 90 # Unit is minute

minuteMetricDataTTL: 90 # Unit is minute

hourMetricDataTTL: 36 # Unit is hour

dayMetricDataTTL: 45 # Unit is day

monthMetricDataTTL: 18 # Unit is month

#storage:

# h2:

# url: jdbc:h2:~/memorydb

# userName: sa

configuration:

default:

#namespace: xxxxx

# alarm threshold

applicationApdexThreshold: 2000

serviceErrorRateThreshold: 10.00

serviceAverageResponseTimeThreshold: 2000

instanceErrorRateThreshold: 10.00

instanceAverageResponseTimeThreshold: 2000

applicationErrorRateThreshold: 10.00

applicationAverageResponseTimeThreshold: 2000

# thermodynamic

thermodynamicResponseTimeStep: 50

thermodynamicCountOfResponseTimeSteps: 40

# max collection's size of worker cache collection, setting it smaller when collector OutOfMemory crashed.

workerCacheMaxSize: 10000

#receiver_zipkin:

# default:

# host: localhost

# port: 9411

# contextPath: /

2、 4.4配置内容-参考2-节点2;

##########节点1:ip=192.168.0.4

#cluster:

zookeeper:

hostPort: 192.168.0.101:2181, 192.168.0.102:2181,192.168.103.10

# zookeeper集群的主机IP地址及端口,使用逗号分隔

sessionTimeout: 100000

naming:

jetty:

#OS real network IP(binding required), for agent to find collector cluster

host: 192.168.0.4

# 部署collector的主机IP地址,即当前节点的ip地址

port: 10800

contextPath: /

cache:

# guava:

caffeine:

remote:

gRPC:

# OS real network IP(binding required), for collector nodes communicate with each other in cluster. collectorN --(gRPC) --> collectorM

host: 192.168.0.4

# 部署collector的主机IP地址,即当前节点的ip地址

port: 11800

agent_gRPC:

gRPC:

#OS real network IP(binding required), for agent to uplink data(trace/metrics) to collector. agent--(gRPC)--> collector

host: 192.168.0.4

# 部署collector的主机IP地址,即当前节点的ip地址

port: 11800

# Set these two setting to open ssl

#sslCertChainFile: $path

#sslPrivateKeyFile: $path

# Set your own token to active auth

#authentication: xxxxxx

agent_jetty:

jetty:

# OS real network IP(binding required), for agent to uplink data(trace/metrics) to collector through HTTP. agent--(HTTP)--> collector

# SkyWalking native Java/.Net/node.js agents don't use this.

# Open this for other implementor.

host: 192.168.0.4

# 部署collector的主机IP地址,即当前节点的ip地址

port: 12800

contextPath: /

analysis_register:

default:

analysis_jvm:

default:

analysis_segment_parser:

default:

bufferFilePath: ../buffer/

bufferOffsetMaxFileSize: 10M

bufferSegmentMaxFileSize: 500M

bufferFileCleanWhenRestart: true

ui:

jetty:

# Stay in \localhost\ if UI starts up in default mode.

# Change it to OS real network IP(binding required), if deploy collector in different machine.

host: 192.168.0.4

# 部署collector的主机IP地址,即当前节点的ip地址

port: 12800

contextPath: /

storage:

elasticsearch:

clusterName: eqx-jz-test

# Elasticsearch的cluster.name值

clusterTransportSniffer: true

clusterNodes: 192.168.0.17:9303

# Elasticsearch的ip和端口

indexShardsNumber: 2

indexReplicasNumber: 0

highPerformanceMode: true

# Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html

bulkActions: 2000 # Execute the bulk every 2000 requests

bulkSize: 20 # flush the bulk every 20mb

flushInterval: 10 # flush the bulk every 10 seconds whatever the number of requests

concurrentRequests: 2 # the number of concurrent requests

# Set a timeout on metric data. After the timeout has expired, the metric data will automatically be deleted.

traceDataTTL: 90 # Unit is minute

minuteMetricDataTTL: 90 # Unit is minute

hourMetricDataTTL: 36 # Unit is hour

dayMetricDataTTL: 45 # Unit is day

monthMetricDataTTL: 18 # Unit is month

#storage:

# h2:

# url: jdbc:h2:~/memorydb

# userName: sa

configuration:

default:

#namespace: xxxxx

# alarm threshold

applicationApdexThreshold: 2000

serviceErrorRateThreshold: 10.00

serviceAverageResponseTimeThreshold: 2000

instanceErrorRateThreshold: 10.00

instanceAverageResponseTimeThreshold: 2000

applicationErrorRateThreshold: 10.00

applicationAverageResponseTimeThreshold: 2000

# thermodynamic

thermodynamicResponseTimeStep: 50

thermodynamicCountOfResponseTimeSteps: 40

# max collection's size of worker cache collection, setting it smaller when collector OutOfMemory crashed.

workerCacheMaxSize: 10000

#receiver_zipkin:

# default:

# host: localhost

# port: 9411

# contextPath: /

2.5 启动

参考文档:2、2-SkyWalking部署文档-Linux.docx

【注】请分别启动各个节点的collectorService.sh脚本,然后再分别启动步骤3.2.3中的脚本webappService.sh

2.6 验证

参考文档:2、2-SkyWalking部署文档-Linux.docx

3、SW-Web UI

3.1 位置

WebUI的配置项保存在**\webapp\webapp.yml中

3.2 配置

其他配置参考文档:2、2-SkyWalking部署文档-Linux.docx

3、 2.1、节点1;

server:

port: 8091

collector:

path: /graphql

ribbon:

ReadTimeout: 10000

listOfServers: 192.168.0.3:10800,192.168.0.4:10800

security:

user:

admin:

password: admin

3、 2.2、节点2;

server:

port: 8091

collector:

path: /graphql

ribbon:

ReadTimeout: 10000

listOfServers: 192.168.0.3:10800,192.168.0.4:10800

security:

user:

admin:

password: admin

3.2.3 启动

分别启动集群每个节点脚本 bin/webappService.sh,

3.2.4 验证

参考文档:2、2-SkyWalking部署文档-Linux.docx

4、SW-Agent

参考文档:2、4-SkyWalking部署文档-Docker-Agent客户端集群.docx

  • 启动顺序

1、ElasticSearch

启动elasticsearch.sh

2、SkyWalking

【注】集群环境,首先在各个节点分别启动脚本collectorService.sh,然后在各个节点分别启动脚本webappService.sh

3、启动被监控的程序

参考文档:2、2-SkyWalking部署文档-Linux.docx

六、参考资源

1、collector集群配置文件

 

 

2、collector集群和web ui关系

问题:

 

答案: