
Nokia

CSD

1 Background
1.1 Overview

ETCD issues such as "Key not found" have occasionally been observed during CSD/SM
instantiation and upgrade. They are caused by slow response from the disk on which the
ETCD DB is mounted.

ETCD (a key-value database) is integrated in the CSD VNF Application Blueprint to supply
service discovery and configuration functionality for all the platform components, such as HA,
SNMP, DNS, NTP, IPCONFIG, HTTP, ZABBIX, MARIADB, and METRICS.

Some custom functions of the CSD application also make minimal (optional) use of ETCD,
for example route add, multiple diameter address groups, and dde_config.

ETCD currently runs on 3 nodes within the CSD/SM VNF and forms a cluster from the following
VNFCs:
OAM-1 (index 0)
OAM-2 (index 1)
DB-1 (index 0)

All the ETCD-hosting nodes are connected over the internal network (192.x.x.x) using TCP
port 2379.

Hardware recommendations:
ETCD usually runs well with limited resources for development or testing purposes; it is common
to develop with etcd on a laptop or a cheap cloud machine. However, when running etcd clusters
in production, some hardware guidelines are useful for proper administration. These suggestions
are not hard rules; they serve as a good starting point for a robust production deployment. As
always, deployments should be tested with simulated workloads before running in production.
For more information, refer to the following article:
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#hardware-recommendations

Tuning:
The default settings in etcd should work well for installations on a local network where the
average network latency is low. However, when using etcd across multiple data centers or over
networks with high latency, the heartbeat interval and election timeout settings may need tuning.

The network isn't the only source of latency. Each request and response may be impacted by
slow disks on both the leader and the follower. Each of these timeouts represents the total time from
request to successful response from the other machine. For more information, refer to
the following article:

https://github.com/etcd-io/etcd/blob/master/Documentation/tuning.md

Version: 1.0 Version Date: 06/03/2019
Nokia Proprietary - Use Pursuant to Company Instructions

The current CSD/SM ETCD tunable parameters (in /etc/etcd/etcd.conf) are as follows:

#[tuning]
ETCD_SNAPSHOT_COUNT="1000"
ETCD_HEARTBEAT_INTERVAL="300"
ETCD_ELECTION_TIMEOUT="3000"

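As a quick sanity check on these settings, the ratio between the election timeout and the heartbeat interval can be read straight from the configuration file. The sketch below assumes the file consists of plain shell-style KEY="value" assignments, as in the excerpt above, so that it can be sourced safely. The helper name etcd_timeout_ratio is hypothetical and is not part of etcd or CSD.

```shell
#!/bin/sh
# Print ETCD_ELECTION_TIMEOUT divided by ETCD_HEARTBEAT_INTERVAL for a config file.
# etcd_timeout_ratio is a hypothetical helper name, not an etcd tool.
etcd_timeout_ratio() {
    conf="${1:-/etc/etcd/etcd.conf}"
    (
        # Source the file in a subshell so its variables do not leak out.
        . "$conf" || exit 1
        # Abort with an error if either setting is missing.
        : "${ETCD_HEARTBEAT_INTERVAL:?not set in $conf}"
        : "${ETCD_ELECTION_TIMEOUT:?not set in $conf}"
        # Integer division; both values are in milliseconds.
        echo $(( $ETCD_ELECTION_TIMEOUT / $ETCD_HEARTBEAT_INTERVAL ))
    )
}

# Usage: etcd_timeout_ratio /etc/etcd/etcd.conf
```

With the values listed above (300 and 3000) the function prints 10. Whether that ratio suits a given deployment still depends on the measured round-trip time between the members, as the tuning article explains.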
1.2 How to check the ETCD performance?

From any of the OAM VMs, run the following commands to check the ETCD cluster health:
. /etc/etcd/etcd.client.conf
etcdctl --endpoints=${ETCDCTL_ENDPOINT} cluster-health

If the ETCD cluster is healthy, the output contains: “cluster is healthy”

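When the health check is run from a script rather than read by eye, the result line can be tested directly. The following is a minimal sketch that assumes only what is stated above, namely that a healthy cluster reports the line "cluster is healthy". The helper name cluster_is_healthy is hypothetical.

```shell
#!/bin/sh
# Succeed only if the text on stdin contains the exact line "cluster is healthy".
# cluster_is_healthy is a hypothetical helper name, not an etcdctl feature.
cluster_is_healthy() {
    grep -qx 'cluster is healthy'
}

# Usage (on an OAM VM):
#   . /etc/etcd/etcd.client.conf
#   if etcdctl --endpoints=${ETCDCTL_ENDPOINT} cluster-health | cluster_is_healthy; then
#       echo "ETCD OK"
#   else
#       echo "ETCD needs attention"
#   fi
```

Matching the whole line (grep -x) avoids treating a differently worded status line as a success.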
Option-1 (preferable):
From OAM-1, run the following commands, passing the OAM-2 internal IP address (change the internal IP as per the VNF
deployment).

# check the backend commit duration
curl -s http://192.168.3.11:2379/metrics | grep -E backend_commit_duration

Monitor backend_commit_duration_seconds (p99 duration should be less than 25ms) to confirm the disk is
reasonably fast.

# check the WAL fsync duration
curl -s http://192.168.3.11:2379/metrics | grep -E wal_fsync_duration

Monitor wal_fsync_duration_seconds (p99 duration should be less than 10ms) to confirm the disk is
reasonably fast.

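The metrics endpoint does not print a p99 value directly. It exposes each duration as a Prometheus histogram, that is, a series of cumulative _bucket{le="..."} counters. The sketch below estimates an upper bound for p99 by finding the smallest bucket that covers at least 99% of the observations. It assumes the full metric names are etcd_disk_backend_commit_duration_seconds and etcd_disk_wal_fsync_duration_seconds, and that the le="+Inf" bucket is the last bucket line of each histogram. The helper name p99_upper_bound is hypothetical.

```shell
#!/bin/sh
# Read Prometheus text-format metrics on stdin and print the smallest bucket
# upper bound (in seconds) that covers at least 99% of the observations of the
# histogram named in $1. p99_upper_bound is a hypothetical helper name.
p99_upper_bound() {
    awk -v name="$1" '
        # Keep only the cumulative bucket lines of the requested histogram.
        index($0, name "_bucket{") == 1 {
            le = $0
            sub(/^.*le="/, "", le)   # drop everything before the bound
            sub(/".*$/, "", le)      # drop everything after the bound
            n++
            bound[n] = le
            count[n] = $NF + 0
        }
        END {
            # The last bucket (le="+Inf") holds the total observation count.
            if (n == 0 || count[n] == 0) { print "no samples"; exit 1 }
            for (i = 1; i <= n; i++)
                if (count[i] * 100 >= count[n] * 99) { print bound[i]; exit 0 }
        }
    '
}

# Usage:
#   curl -s http://192.168.3.11:2379/metrics | p99_upper_bound etcd_disk_wal_fsync_duration_seconds
#   curl -s http://192.168.3.11:2379/metrics | p99_upper_bound etcd_disk_backend_commit_duration_seconds
```

The printed value is in seconds, so 0.008 means that p99 is at most 8ms. Because the counters accumulate from process start, the result describes the whole lifetime of the etcd process rather than only the most recent minutes.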
Option-2:
From OAM-1, run the following commands, passing the OAM-2 internal IP address (change the internal IP as per the VNF
deployment).

# check the disk performance
ETCDCTL_API=3 etcdctl --endpoints http://192.168.3.11:2379 check perf --load="l"
Expected output: the overall status of the above command should be PASS

# remove the keys written by the performance check
ETCDCTL_API=3 etcdctl --endpoints http://192.168.3.11:2379 del --prefix /etcdctl-check-perf/

1.3 How to recover from error “mvcc: database space exceeded”
Removing excessive keyspace data and defragmenting the backend database will put the cluster back
within the quota limits:

# get current revision
rev=$(ETCDCTL_API=3 etcdctl --endpoints=192.168.3.11:2379 endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')

# compact away all old revisions
ETCDCTL_API=3 etcdctl --endpoints=192.168.3.11:2379 compact $rev

# defragment away excessive space
ETCDCTL_API=3 etcdctl --endpoints=192.168.3.11:2379 defrag

# disarm alarm
ETCDCTL_API=3 etcdctl --endpoints=192.168.3.11:2379 alarm disarm

# test puts are allowed again
ETCDCTL_API=3 etcdctl --endpoints=192.168.3.11:2379 put newkey 123
Expected output: OK

# check the health of the ETCD cluster again
. /etc/etcd/etcd.client.conf
etcdctl --endpoints=${ETCDCTL_ENDPOINT} cluster-health

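The "get current revision" step relies on a two-stage grep. If the endpoint cannot be reached, rev ends up empty and the compact command receives no revision. The sketch below isolates that extraction and adds a guard that can be run before compacting. Both helper names, extract_revision and is_valid_revision, are hypothetical. The sketch also assumes that the JSON status output contains a "revision":<number> field, which is what the original pipeline already depends on.

```shell
#!/bin/sh
# Pull the numeric revision out of `endpoint status --write-out="json"` output
# read from stdin, using the same two-stage grep as the recovery procedure.
extract_revision() {
    grep -E -o '"revision":[0-9]*' | grep -E -o '[0-9].*'
}

# Succeed only if $1 is a non-empty string of digits.
is_valid_revision() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage:
#   rev=$(ETCDCTL_API=3 etcdctl --endpoints=192.168.3.11:2379 endpoint status \
#           --write-out="json" | extract_revision)
#   is_valid_revision "$rev" || { echo "could not read revision" >&2; exit 1; }
```

The guard also rejects the multi-line value that results when several endpoints are queried at once, so the compaction only proceeds with a single, unambiguous revision.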
Note:
 1. Even with a slow mechanical disk or a virtualized network disk, such as Amazon’s EBS or Google’s PD,
    applying a request should normally take fewer than 50 milliseconds.
 2. If the average apply duration exceeds 100 milliseconds, etcd will warn that entries are taking too long to
    apply.
 3. To rule out a slow disk, monitor backend_commit_duration_seconds (p99 duration should be less than
    25ms) to confirm the disk is reasonably fast.
 4. If the Ceph storage speed is not close to the etcd benchmarked numbers, a fast Ceph pool implementation
    is an alternative option.
 5. Consideration: SSDs provide faster access than other disk types.
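The limits quoted in this document (10ms, 25ms, 50ms and 100ms) are given in milliseconds, while the etcd metrics are reported in seconds. The sketch below compares a duration in seconds against a limit in milliseconds, so the two units are not mixed up by hand. The helper name within_limit_ms is hypothetical. Any value that is not a plain decimal number, such as +Inf, is treated as exceeding the limit.

```shell
#!/bin/sh
# Succeed if the duration $1 (seconds) is strictly below the limit $2 (milliseconds).
# within_limit_ms is a hypothetical helper name.
within_limit_ms() {
    awk -v s="$1" -v ms="$2" 'BEGIN {
        # Reject anything that is not a plain decimal number (for example "+Inf").
        if (s !~ /^[0-9]+(\.[0-9]+)?$/) exit 1
        # Convert the limit to seconds and compare.
        exit !(s + 0 < ms / 1000)
    }'
}

# Usage:
#   within_limit_ms 0.008 25 && echo "backend commit p99 within limit"
#   within_limit_ms 0.016 10 || echo "WAL fsync p99 too slow"
```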
