
To perform an ISSU (In-Service Software Upgrade), complete the following steps:

1. Load the Junos Software package on the device - KB20955
2. Verify the Health of the Cluster (Important step) - KB20956
3. Create backup of the current configuration and set the rescue config - KB20957
4. Start the In-Service Software Upgrade - KB20958
5. Process to follow, in the event of the ISSU process stalling in the middle of the upgrade - KB19500

Load the Junos Software package on the device - KB20955


a. Junos software installation requires the package to be on the SRX device.
For help on getting the Junos software package, refer to Downloading Software Packages from
Juniper Networks.
There are multiple methods for transferring the software package to the device. Copy/transfer the
software package on to the device by using FTP or USB. (You can determine the amount of
temporary storage space left on the device by following KB17367.)
b. Once the package is loaded, verify that it is available under the directory (/cf/var/tmp). You can do
this by following any of the methods shown below:
From Shell:
{primary:node0}
root@test-node0> exit
root@test-node0% ls /var/tmp
juniper.conf.spu.gz
juniper.data
junos-srx5000-10.1R1.8-domestic.tgz
From CLI:
{primary:node0}
root@test-node0> file list /cf/var/tmp/junos*
junos-srx5000-10.1R1.8-domestic.tgz
c. To ensure that the package transferred to the device is not truncated or corrupted, run an MD5
checksum on it and compare the result with the checksum published for the package.
> file checksum md5 /var/tmp/junos-srx5000-10.1R1.8-domestic.tgz

Verify the Health of the Cluster (Important step) - KB20956


Review the output of the following commands and verify that the cluster is healthy.
It is highly recommended that ISSU is done only if the Chassis Cluster is in a healthy
failover state.
1. Confirm the Chassis Cluster is in the Primary/Secondary state with a proper priority.
Follow the steps here: KB20673 - How to verify that Chassis Cluster in
Primary/Secondary State has proper priority. KB20673 is the common method for
verifying the Chassis Cluster health. However, for an ISSU upgrade, also perform the
following steps.
Run the command show chassis cluster status on either node to verify the Chassis
Cluster status:
{primary:node0}
root@J-SRX> show chassis cluster status
Cluster ID: 1
Node      Priority   Status      Preempt   Manual failover

Redundancy group: 0 , Failover count: 1
    node0     100        secondary   no        no
    node1     150        primary     no        no

Redundancy group: 1 , Failover count: 1
    node0     100        secondary   no        no
    node1     150        primary     no        no

Do you see one node with the status of primary and one node with the status of secondary?

Yes - Proceed with Step 2


No - Go to KB20641 - Troubleshooting steps when the Chassis Cluster does not come up

What is the priority of each node?

If the priority is 0, then proceed to KB16869 - What does priority 0 mean in a JSRP chassis
cluster?

Priority 0 means that the node is in the ineligible state. There are several reasons
why you could see the ineligible state:

Cold sync failure (see J-Series/SRX Security Configuration Guide for more
details)
Monitored interface down
IP Tracking is failing (SRX3000 and SRX5000)
Possible hardware issue
Perform the following to correct the priority 0 state:

Check chassis cluster statistics. Are there any missing heartbeats or probes?
Check chassis cluster interfaces. Are any of the monitored interfaces down or is a
tracked IP missed?
Check jsrpd logs. Are there any errors?
Check chassisd logs. Is any hardware down?
Check messages log. Are there any events leading up to the problem?

If the priority is 255, then proceed to KB16870 - What does priority 255 mean in a JSRP
chassis cluster?

Priority 255 means that a manual failover was initiated. The Manual failover column
shows 'yes' in that scenario.
After a manual failover, it is always recommended to reset the manual flag in the
cluster status. Otherwise, no additional failovers may occur for that redundancy
group.
To remove the manual failover state and restore the proper priority, use the CLI
command below.
request chassis cluster failover reset redundancy-group <0-128>

If the priority is between 1 and 254, proceed to Step 3.

If the priority for both nodes is between 1 and 254, it means that the Chassis Cluster
is in a healthy state.
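The checks above can be sketched as a small Python script that reads saved show chassis cluster status output. This is only an illustration under stated assumptions: the regular expressions assume the column layout shown in this article, and the function names (parse_cluster_status, classify_priority, group_is_healthy) are hypothetical helpers, not Junos tools.

```python
import re

# Sample text in the layout shown in this article.
SAMPLE_STATUS = """\
Cluster ID: 1
Node      Priority   Status      Preempt   Manual failover

Redundancy group: 0 , Failover count: 1
    node0     100        secondary   no        no
    node1     150        primary     no        no

Redundancy group: 1 , Failover count: 1
    node0     100        secondary   no        no
    node1     150        primary     no        no
"""

GROUP_RE = re.compile(r"^Redundancy group:\s*(\d+)")
NODE_RE = re.compile(r"^\s*(node\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def parse_cluster_status(text):
    """Return {group_id: [(node, priority, status, preempt, manual), ...]}."""
    groups = {}
    current = None
    for line in text.splitlines():
        group_match = GROUP_RE.match(line)
        if group_match:
            current = int(group_match.group(1))
            groups[current] = []
            continue
        node_match = NODE_RE.match(line)
        if node_match and current is not None:
            node, priority, status, preempt, manual = node_match.groups()
            groups[current].append((node, int(priority), status, preempt, manual))
    return groups


def classify_priority(priority):
    """Map a priority value to the interpretation given in the text."""
    if priority == 0:
        return "ineligible"
    if priority == 255:
        return "manual failover"
    if 1 <= priority <= 254:
        return "healthy"
    return "unexpected"


def group_is_healthy(rows):
    """One primary, one secondary, and every priority between 1 and 254."""
    statuses = sorted(row[2] for row in rows)
    priorities_ok = all(classify_priority(row[1]) == "healthy" for row in rows)
    return statuses == ["primary", "secondary"] and priorities_ok


status = parse_cluster_status(SAMPLE_STATUS)
for group_id, rows in status.items():
    print(group_id, "healthy" if group_is_healthy(rows) else "needs attention")
```

A group reported as "needs attention" should be investigated with the KB articles referenced above before the upgrade is started.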
Each Redundancy Group (other than Redundancy Group 0) contains one or more redundant
Ethernet interfaces. A redundant Ethernet interface is a pseudo interface that contains a pair
of physical Gigabit Ethernet interfaces or a pair of Fast Ethernet interfaces. If a Redundancy
Group is active on node 0, then the child links of all the associated redundant Ethernet
interfaces on node 0 are active. If the redundancy group fails over to node 1, then the child
links of all redundant Ethernet interfaces on node 1 become active.
2. Verify that all the FPCs and the PICs are showing online. If any of the FPCs are
showing as Present or Offline, it is important to determine the cause for this and
make sure that on both nodes they come up as online before proceeding further. (If the
FPC in non-online mode does not take part in the Chassis Cluster failover, then it
should be OK to proceed. Contact your technical support representative, if in doubt at
this stage.)
{primary:node0}
root@test-node0> show chassis fpc pic-status
node0:
-------------------------------------------------------------------------
Slot 0 Online SRX5k DPC 40x 1GE
PIC 0 Online 10x 1GE RichQ
PIC 1 Online 10x 1GE RichQ
PIC 2 Online 10x 1GE RichQ
PIC 3 Online 10x 1GE RichQ
Slot 3 Online SRX5k SPC
PIC 0 Online SPU Cp
PIC 1 Online SPU Flow
Slot 4 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 5 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 6 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 7 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 8 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
node1:
-------------------------------------------------------------------------
Slot 0 Online SRX5k DPC 40x 1GE
PIC 0 Online 10x 1GE RichQ
PIC 1 Online 10x 1GE RichQ
PIC 2 Online 10x 1GE RichQ
PIC 3 Online 10x 1GE RichQ
Slot 3 Online SRX5k SPC
PIC 0 Online SPU Cp
PIC 1 Online SPU Flow
Slot 4 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 5 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 6 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 7 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 8 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
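With many slots on two nodes, a non-online entry is easy to miss by eye. The following Python sketch scans saved show chassis fpc pic-status output and lists every Slot or PIC that is not Online. It assumes the line layout shown above; the sample text is shortened and deliberately altered (one PIC set to Offline, one Slot set to Present) to show what is reported, and find_not_online is a hypothetical helper name.

```python
import re

# Shortened sample with two deliberately altered states (hypothetical data).
SAMPLE_PIC_STATUS = """\
node0:
--------------------------------------------------------------------------
Slot 0 Online SRX5k DPC 40x 1GE
PIC 0 Online 10x 1GE RichQ
PIC 1 Online 10x 1GE RichQ
Slot 3 Online SRX5k SPC
PIC 0 Online SPU Cp
PIC 1 Offline SPU Flow
node1:
--------------------------------------------------------------------------
Slot 0 Present SRX5k DPC 40x 1GE
Slot 3 Online SRX5k SPC
PIC 0 Online SPU Cp
PIC 1 Online SPU Flow
"""

NODE_RE = re.compile(r"^(node\d+):\s*$")
ITEM_RE = re.compile(r"^\s*(Slot|PIC)\s+(\d+)\s+(\S+)")


def find_not_online(text):
    """Return (node, slot, pic, state) for every Slot or PIC that is not Online.

    pic is None when the entry is the Slot (FPC) line itself.
    """
    problems = []
    node = None
    slot = None
    for line in text.splitlines():
        node_match = NODE_RE.match(line)
        if node_match:
            node = node_match.group(1)
            slot = None
            continue
        item_match = ITEM_RE.match(line)
        if not item_match:
            continue
        kind, number, state = item_match.groups()
        if kind == "Slot":
            slot = int(number)
            pic = None
        else:
            pic = int(number)
        if state != "Online":
            problems.append((node, slot, pic, state))
    return problems


problems = find_not_online(SAMPLE_PIC_STATUS)
for node, slot, pic, state in problems:
    print(node, "slot", slot, "pic", pic, state)
```

An empty result means every Slot and PIC in the captured output was Online on both nodes.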
3. Check the Chassis control link and verify that the sent and received packet counts
are closely matched. Also confirm that the error count is not increasing. It is
suggested to run this command twice and compare the two outputs.
srx> show chassis cluster control-plane statistics
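Comparing two runs of the command can be sketched in Python as follows. This article does not show the output of the command, so the sample snapshots below only assume a "label: number" layout grouped under section headings; the labels and values are illustrative assumptions, and parse_counters and rising_error_counters are hypothetical helper names.

```python
import re

COUNTER_RE = re.compile(r"^\s*(.+?):\s*(\d+)\s*$")

# Assumed layout and made-up values; adjust to the real output on your device.
SNAPSHOT_1 = """\
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 1000
        Heartbeat packets received: 998
        Heartbeat packet errors: 0
Fabric link statistics:
    Child link 0
        Probes sent: 1000
        Probes received: 1000
"""

SNAPSHOT_2 = """\
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 1060
        Heartbeat packets received: 1058
        Heartbeat packet errors: 3
Fabric link statistics:
    Child link 0
        Probes sent: 1060
        Probes received: 1060
"""


def parse_counters(text):
    """Collect 'label: number' lines, keyed by (nearest heading, label)."""
    counters = {}
    section = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        match = COUNTER_RE.match(line)
        if match:
            counters[(section, match.group(1).strip())] = int(match.group(2))
        else:
            # Any non-counter line is treated as the heading for what follows.
            section = line.strip().rstrip(":")
    return counters


def rising_error_counters(before, after):
    """Return keys of error counters whose value grew between two snapshots."""
    return [
        key
        for key, value in after.items()
        if "error" in key[1].lower() and value > before.get(key, 0)
    ]


before = parse_counters(SNAPSHOT_1)
after = parse_counters(SNAPSHOT_2)
rising = rising_error_counters(before, after)
print(rising)
```

Any counter returned here grew between the two runs and should be investigated before the upgrade is started.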

4. It is best to have all the Redundancy Groups primary on one node, e.g. node
0. If they are not, fail over the Redundancy Groups before you proceed
further.
srx> request chassis cluster failover redundancy-group 0 node 0
srx> request chassis cluster failover reset redundancy-group 0
srx> request chassis cluster failover redundancy-group 1 node 0
srx> request chassis cluster failover reset redundancy-group 1

For the failover of RG0, there might be a slight lag, and you may need to wait
2-3 minutes at most while the RE fails over. The rest of the Redundancy Groups
will fail over faster.
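When a cluster has more than two Redundancy Groups, the command pairs can be generated rather than typed by hand. The following Python sketch builds them from a mapping of Redundancy Group number to the node that is currently primary; the function name failover_commands and the input mapping are assumptions for illustration, and the generated strings are the same two commands shown above.

```python
def failover_commands(primary_by_group, target_node=0):
    """Build the CLI commands that move every redundancy group to target_node.

    primary_by_group maps a redundancy-group number to the node number that
    is currently primary for it. Groups already on target_node are skipped.
    """
    commands = []
    for group in sorted(primary_by_group):
        if primary_by_group[group] == target_node:
            continue
        commands.append(
            f"request chassis cluster failover redundancy-group {group} node {target_node}"
        )
        commands.append(
            f"request chassis cluster failover reset redundancy-group {group}"
        )
    return commands


# Example: RG0 is primary on node 1, RG1 is already primary on node 0.
commands = failover_commands({0: 1, 1: 0})
for command in commands:
    print(command)
```

The sketch only produces the command text. Each failover should be allowed to finish (check show chassis cluster status) before the matching reset command is entered.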
5. Verify there are no alarms. Run the following command:
{primary:node0}
root@test-node0> show chassis cluster information | no-more

The cluster information and statistics should not show any alarms that could
cause a disruption during the upgrade. Check the SPU counts; they
should match the number of SPUs installed in the device.
Check whether the events show any irregular problems. If so, resolve them before
proceeding further.
Verify that the date and time match the system date and time.
Check for errors that occurred during the troubleshooting period prior to the upgrade.
Check whether the packet counts and SPU counts match each other. If you find
discrepancies, contact your technical support representative before
proceeding further.

Create backup of the current configuration and set the rescue config - KB20957
If the current configuration is a good one, save it as RESCUE:
{primary:node0}
root@test-node0> request system configuration rescue save

Reason: This is a precautionary step for ISSU. Even if the active configuration gets
wiped out for some reason, the device will always have a rescue config to load from once it
boots up. You can verify the rescue configuration on the device by doing a 'file list /config/'
to confirm that the file exists.
Then do a 'file show /config/rescue.conf.gz' and review the contents.
Further, it is recommended that you keep an up-to-date copy of the running configuration on
a different storage device/server for easy retrieval if required.

Start the In-Service Software Upgrade - KB20958


1. First, verify that you have console connectivity to both the primary and
secondary nodes, and verify that 'logging' is enabled on both terminal sessions. This
is necessary to verify and monitor the ISSU process as it upgrades the Junos image.
2. Perform the upgrade with the following command:
{primary:node0}
root@test-node0> request system software in-service-upgrade /var/tmp/junos-srx5000-10.4R3.7-domestic.tgz reboot <---be sure to include the 'reboot' option

Important: Make sure that the reboot option is specified in the command.
If it is not specified, the process upgrades node 1 but does not reboot it, and a manual reboot of
node 1 is needed before the automatic failover happens. This could also cause the ISSU
process to stall. If it stalls, follow the ISSU abort process in KB19500.
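Because a missing reboot option is easy to overlook, the command can be assembled and checked before it is pasted into the console. The following is a small Python sketch; build_issu_command and has_reboot_option are hypothetical helper names, and the path check (absolute path ending in .tgz) is an assumption based on the package names used in this article.

```python
def build_issu_command(package_path):
    """Return the ISSU command for package_path, always ending with 'reboot'."""
    if not package_path.startswith("/") or not package_path.endswith(".tgz"):
        raise ValueError(f"expected an absolute path to a .tgz package: {package_path!r}")
    return f"request system software in-service-upgrade {package_path} reboot"


def has_reboot_option(command):
    """True when the last word of the command is the 'reboot' option."""
    words = command.split()
    return bool(words) and words[-1] == "reboot"


command = build_issu_command("/var/tmp/junos-srx5000-10.4R3.7-domestic.tgz")
print(command)
print(has_reboot_option(command))
```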
The messages reported on node 0 and node 1 will be similar to the following.
(Messages that are not important have been omitted.)
NODE 0:
{primary:node0}
root@test-node0> request system software in-service-upgrade
/var...(complete the package information as shown above)
Chassis ISSU Started
node1:
------------------------------------------------------------------------
Chassis ISSU Started
ISSU: Validating Image
Inititating in-service-upgrade
node1:
------------------------------------------------------------------------
Inititating in-service-upgrade
Checking compatibility with configuration
Initializing...
Verified manifest signed by PackageProduction_10_1_0
Verified junos-10.1-domestic signed by PackageProduction_10_1_0
Using /var/tmp/junos-srx5000-domestic.tgz
Checking junos requirements on /
Saving boot file package in /var/sw/pkg/junos-boot-srx5000-10.1R4.4.tgz
Verified manifest signed by PackageProduction_10_1_0
Hardware Database regeneration succeeded
Validating against /config/juniper.conf.gz
mgd: commit complete
Validation succeeded
Validating against /config/rescue.conf.gz
mgd: commit complete
Validation succeeded
ISSU: Preparing Backup RE
Pushing bundle to node1
JUNOS 10. become active at next reboot
WARNING: A reboot is required to load this software correctly
WARNING: Use the 'request system reboot' command
WARNING: when software installation is complete
Saving package file in /var/sw/pkg/junos-10..tgz ...
Saving state for rollback ...
Finished upgrading secondary node node1
Rebooting Secondary Node
node1:
------------------------------------------------------------------------
Shutdown NOW!
[pid 21958]
[pid 21958]
ISSU: Backup RE Prepare Done
Waiting for node1 to reboot.
node1 booted up.
Waiting for node1 to become secondary
node1 became secondary.
Waiting for node1 to be ready for failover
ISSU: Preparing Daemons
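When the console sessions are logged to a file, the milestones in the messages above can be used to see how far the process got, which helps when deciding whether it has stalled. The following Python sketch searches a captured log for those milestones in order. The milestone strings are taken from the sample output in this article; messages may differ between Junos releases, and issu_progress is a hypothetical helper name.

```python
# Milestones in the order they appear in the sample output above.
MILESTONES = [
    "Chassis ISSU Started",
    "ISSU: Validating Image",
    "ISSU: Preparing Backup RE",
    "Finished upgrading secondary node node1",
    "ISSU: Backup RE Prepare Done",
    "node1 booted up.",
    "node1 became secondary.",
    "ISSU: Preparing Daemons",
]


def issu_progress(log_text):
    """Return (reached, pending): milestones seen in order, and those still missing."""
    position = 0
    reached = []
    for milestone in MILESTONES:
        found = log_text.find(milestone, position)
        if found == -1:
            break
        reached.append(milestone)
        position = found + len(milestone)
    pending = MILESTONES[len(reached):]
    return reached, pending


# Captured log that stops while node1 is rebooting (shortened, hypothetical).
SAMPLE_LOG = """\
Chassis ISSU Started
ISSU: Validating Image
Validation succeeded
ISSU: Preparing Backup RE
Pushing bundle to node1
Finished upgrading secondary node node1
Rebooting Secondary Node
ISSU: Backup RE Prepare Done
Waiting for node1 to reboot.
"""

reached, pending = issu_progress(SAMPLE_LOG)
print("last milestone:", reached[-1])
print("next expected:", pending[0])
```

A log that stays at the same milestone for a long time is a candidate for the abort process in KB19500.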

Once this is done, the following will be reported on NODE 1:

{secondary:node1}
root@test-node1> show chassis cluster status
Cluster ID: 2
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 2
node0 254 primary no no
node1 2 secondary no no
Redundancy group: 1 , Failover count: 2
node0 254 primary no no
node1 0 secondary no no

At this stage, Node 1 has rebooted successfully and is on the Junos version that you
upgraded to. Run the command show version to verify this. Also, check the
following commands.
srx> show chassis cluster status
srx> show chassis fpc pic-status (all the PICs in NODE 1 should be online; keep
monitoring for about 2 minutes to make sure all are online)
srx> show chassis alarms
srx> show system alarms
srx> show log messages | grep issu

Now the automatic failover will happen, and once that is done, the upgrade of Node 0
will begin. The messages reported are similar to those above, but still monitor them for
any problems or warnings in the boot messages.
Node 0 should come back up in a healthy state. Verify everything as described in
KB20673 - How to verify that Chassis Cluster in Primary/Secondary State has
proper priority.
Note that the Redundancy Groups are now primary on Node 1. To bring them back to
node 0, follow the process shown below:
srx> request chassis cluster failover redundancy-group 0 node 0
srx> request chassis cluster failover redundancy-group 1 node 0
srx> request chassis cluster failover redundancy-group X node 0
srx> request chassis cluster failover reset redundancy-group X

As mentioned earlier, the failover of RG0 might take some time. The rest of
the Redundancy Groups should fail over quickly.
The ISSU process is now complete, and you can check the health of the Cluster as described
in KB20673 - How to verify that Chassis Cluster in Primary/Secondary State has proper
priority.

Process to follow, in the event of the ISSU process stalling in the middle of the upgrade - KB19500
If the system does not complete the ISSU process, perform the following steps to completely
stop the ISSU process and roll back to the previous state.

If both nodes completed the upgrade (verify with 'show version'), run the following commands on
both nodes simultaneously to roll back to the previous Junos version:
request chassis cluster in-service-upgrade abort
request system software rollback
request system reboot

If only one node completed the upgrade (verify with 'show version'), run the following commands:
1) On the upgraded node
request chassis cluster in-service-upgrade abort
request system software rollback

2) On the node that did not complete the upgrade
request chassis cluster in-service-upgrade abort

3) On both nodes after completing the above steps
request system reboot

If neither node completed the upgrade successfully (verify with 'show version'), run the
following commands on both nodes simultaneously to roll back to the previous Junos version:
request chassis cluster in-service-upgrade abort
request system reboot
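The three cases above follow one rule: every node aborts the ISSU, a node that was upgraded also rolls back its software, and every node is rebooted last. The following Python sketch expresses that rule; recovery_plan is a hypothetical helper name, and the input (whether 'show version' on each node reports the new release) has to be determined by the operator.

```python
ABORT = "request chassis cluster in-service-upgrade abort"
ROLLBACK = "request system software rollback"
REBOOT = "request system reboot"


def recovery_plan(upgraded):
    """Return {node: [commands]} for a stalled ISSU.

    upgraded maps each node name to True when 'show version' on that node
    reports the new Junos release, and to False otherwise.
    """
    plan = {}
    for node, is_upgraded in upgraded.items():
        commands = [ABORT]
        if is_upgraded:
            commands.append(ROLLBACK)
        commands.append(REBOOT)
        plan[node] = commands
    return plan


# Example: only node1 completed the upgrade.
plan = recovery_plan({"node0": False, "node1": True})
for node, commands in plan.items():
    print(node, commands)
```

The per-node lists do not capture ordering between the nodes. In the mixed case the reboot is entered on both nodes only after the abort and rollback steps are finished on each of them, as described above.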
