
I came across an issue with the Knowledge Consistency Checker (KCC) in which Active Directory replication appears to be fine, yet a single DC in one (or more) sites begins logging Knowledge Consistency Checker (KCC) Warning and Error events in the Directory Service event log. I included sample events below.

The KCC is a distributed application that runs on every domain controller. It is responsible for creating the connections between domain controllers that collectively form the replication topology. The KCC uses Active Directory data to determine where (from which source domain controller to which destination domain controller) to create these connections.
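If you suspect the topology is stale, you can make the KCC recalculate immediately rather than waiting for its next scheduled pass. A minimal sketch, using the DC name from this article's examples:

repadmin /kcc IFFANDAD02

This triggers a KCC run on the named DC; run with no arguments it targets the local DC.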

In some cases these errors are logged continuously; in others they are logged at regular intervals, clear on their own, and then reappear like clockwork. Typically the other DCs in the same site(s), perhaps even in the whole forest, report no KCC errors at all. In some cases the DC logging these errors has a small number of connection objects compared with its peer DCs in the same site, which you can check as shown below.
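One way to compare, assuming repadmin is available, is to run /showconn against the suspect DC and a healthy peer (the first name is from this article's examples; the second is a placeholder):

repadmin /showconn IFFANDAD02
repadmin /showconn <healthy peer DC>

The output ends with a count of connection objects found, which makes the comparison easy.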


In some cases this event is also seen. It suggests name resolution is working but a network port is blocked.
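A quick way to test for a blocked port, assuming the Microsoft PortQry tool is installed, is to probe the RPC endpoint mapper on the remote DC:

portqry -n IFF-USSF-DC01 -e 135

LISTENING means the port is reachable; FILTERED points at a firewall silently dropping the traffic.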

If you encounter this issue, it could be that the DC logging the errors is hosting the Intersite Topology Generator (ISTG) role for its site. This role is responsible for maintaining all of the inter-site connection objects for the site. The ISTG polls each DC in its site for connection objects that have failed, and if failures are reported by the peer DCs, the ISTG logs these events indicating something is not right with connectivity.
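For reference, the current ISTG for a site is recorded in the interSiteTopologyGenerator attribute of that site's NTDS Site Settings object. A sketch of reading it directly (the site name and forest DN below are from this article's examples; substitute your own):

dsquery * "CN=NTDS Site Settings,CN=NavisiteProd,CN=Sites,CN=Configuration,DC=mail,DC=global,DC=iff,DC=com" -attr interSiteTopologyGenerator

The repadmin /istg command in step 1 below reports the same information in a friendlier format.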

For those wondering what these events mean, here is a quick rundown:
The 1311 event indicates the KCC could not connect up all the sites.
The 1566 event indicates the DC could not replicate from any server in the site identified in the event description.
The 1865 event, when logged, contains secondary information about the failure to connect the sites and tells which sites are disconnected from the site where the KCC errors are occurring.
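To pull just these events from the Directory Service log on the suspect DC, one option (a sketch using the built-in wevtutil tool) is:

wevtutil qe "Directory Service" /q:"*[System[(EventID=1311 or EventID=1566 or EventID=1865)]]" /c:20 /f:text /rd:true

/rd:true returns the newest events first and /c:20 caps the output at twenty events.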

These errors point to a topology or a connectivity issue: either there are not enough site links to connect all the sites or, more likely, network connectivity is failing for some reason.

If your network is not fully routed (fully routed meaning any DC in the forest can perform an RPC bind to every other DC in the forest), make certain Bridge All Site Links (BASL) is unchecked. If BASL is unchecked, Site Links and/or Site Link Bridges must be configured. Site Links and Site Link Bridges provide the KCC with the information it needs to build connections over existing network routes. If the network is fully routed and you have BASL checked, that is fine. While the network routes may exist, the ports needed for Active Directory to replicate must not be restricted.
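For what it's worth, the BASL checkbox maps to bit 0x2 (BRIDGES_REQUIRED) of the options attribute on the inter-site transport container, so you can verify the setting from the command line. A sketch, using the forest DN from this article's examples:

dsquery * "CN=IP,CN=Inter-Site Transports,CN=Sites,CN=Configuration,DC=mail,DC=global,DC=iff,DC=com" -attr options

If the 0x2 bit is set, BASL is unchecked and explicit Site Links and/or Site Link Bridges are required.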
To locate the source of the KCC events and identify the root cause, run the following commands while the KCC events are being logged.
1) Identify the ISTG covering each site by running the command below, then determine from the output whether the DC logging these events (DC1X) is the ISTG.
repadmin /istg
The output lists all sites in the forest and the ISTG for each site:
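The output resembles this sketch (the topology-gathering line names the DC you ran it from; the site and ISTG names are from this article's examples, and the second row is illustrative):

C:\>repadmin /istg
Gathering topology from site NavisiteProd (IFFANDAD02.mail.global.iff.com):

                 Site                       ISTG
 ================================  ====================
             NavisiteProd                IFFANDAD02
             USSF                        IFF-USSF-DC01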

2) If the DC logging the events is the ISTG, any one of the DCs in the same site as this ISTG could have connectivity issues to the site identified in the 1566 event. You can identify which DC(s) are failing to replicate from that site by running the command below, which targets all DCs in the site where the ISTG logging the errors resides.
For example, IFFANDAD02 is logging the events and it is the ISTG for NavisiteProd. Per the 1566 event, USSF is the site with which we have the connectivity issue. To identify which DCs in NavisiteProd are failing to replicate from USSF, run this command:
repadmin /failcache site:NavisiteProd >site-failcache.txt

The failcache output shows three DCs in NavisiteProd:

The output identifies the Destination DC as IFF-USSF-DC01 in USSF, which is failing to inbound replicate from NavisiteProd. In some cases the DC name is not resolved and shows as a GUID (s9hr423d-a477-4285-adc5-2644b5a170f0._msdcs.contoso.com). If the DC name is not resolved, determine the hostname of the Destination DC by pinging the fully qualified CNAME, as shown below.
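For example, pinging the GUID-based CNAME from above reveals the host behind it:

ping s9hr423d-a477-4285-adc5-2644b5a170f0._msdcs.contoso.com

The first reply line ("Pinging <hostname> [<IP address>] ...") shows the resolved hostname. If ICMP is blocked, nslookup against the same name returns the canonical name as well.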

NOTE: IFF-USSF-DC01 may or may not be logging Error events in its Directory Service event log the way IFFANDAD02, the ISTG, is.
3) Log on to the Destination DC identified in the previous step and determine whether RPC connectivity from the Destination DC to the Source DC is working.
repadmin /bind IFFANDAD02.mail.global.iff.com
If repadmin /bind <source DC> from the Destination DC succeeds:
Run repadmin /showrepl <Destination DC> and examine the output to determine whether Active Directory replication is blocked. The reason for the replication failure should be identified in the output. Take the appropriate corrective action to get replication working.
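As a sketch with this article's DC names:

repadmin /showrepl IFF-USSF-DC01

Each directory partition in the output lists its inbound partners with a status line such as "Last attempt @ <time> failed, result 1722 (0x6ba): The RPC server is unavailable." That result code is the lead to chase.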
If repadmin /bind <source DC> from the Destination DC fails:
Verify firewall rules are not interfering with connectivity between the Destination DC and the Source DC. If the port blockage between the Destination DC and the Source DC cannot be resolved, configure the other DCs in the site where the errors are logged as Preferred Bridgeheads and force the KCC to build new connection objects with the Preferred Bridgeheads only.
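Preferred Bridgeheads are designated in Active Directory Sites and Services (open the server object's properties and add the IP transport to the bridgehead list). Once set, you can force the KCC to recalculate across the whole site rather than waiting for its next scheduled run; the site name below is from this article's example:

repadmin /kcc site:NavisiteProd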
NOTE: Running repadmin /bind <source DC> from the ISTG logging the KCC errors may reveal no connectivity issues to <source DC> in the remote site. As noted earlier, the ISTG is responsible for maintaining inter-site connectivity and may not itself be the DC having the problem. For this reason the command must be run from the Destination DC that repadmin /failcache identified as failing to inbound replicate.
A successful bind looks similar to this:
C:\>repadmin /bind IFFANDAD02
Bind to IFFANDAD02 succeeded.
NTDSAPI V1 BindState, printing extended members.
bindAddr: IFFANDAD02
Extensions supported (cb=48):
BASE : Yes
ASYNCREPL : Yes
REMOVEAPI : Yes
MOVEREQ_V2 : Yes
GETCHG_COMPRESS : Yes
DCINFO_V1 : Yes
RESTORE_USN_OPTIMIZATION : Yes
KCC_EXECUTE : Yes
ADDENTRY_V2 : Yes
LINKED_VALUE_REPLICATION : Yes
DCINFO_V2 : Yes
INSTANCE_TYPE_NOT_REQ_ON_MOD : Yes
CRYPTO_BIND : Yes
GET_REPL_INFO : Yes
STRONG_ENCRYPTION : Yes
DCINFO_VFFFFFFFF : Yes
TRANSITIVE_MEMBERSHIP : Yes
ADD_SID_HISTORY : Yes
POST_BETA3 : Yes
GET_MEMBERSHIPS2 : Yes
GETCHGREQ_V6 (WHISTLER PREVIEW) : Yes
NONDOMAIN_NCS : Yes
GETCHGREQ_V8 (WHISTLER BETA 1) : Yes
GETCHGREPLY_V5 (WHISTLER BETA 2) : Yes
GETCHGREPLY_V6 (WHISTLER BETA 2) : Yes
ADDENTRYREPLY_V3 (WHISTLER BETA 3): Yes
GETCHGREPLY_V7 (WHISTLER BETA 3) : Yes
VERIFY_OBJECT (WHISTLER BETA 3) : Yes
XPRESS_COMPRESSION : Yes
DRS_EXT_ADAM : No
Site GUID: stn45bf5-f33f-4d53-9b1b-e7a0371f9a3d
Repl epoch: 0
Forest GUID: idk4734-eeca-11d2-a5d8-00805f9f21f5
Security information on the binding is as follows:
SPN Requested: LDAP/IFFANDAD02
Authn Service: 9
Authn Level: 6
Authz Service: 0
4) If these events occur at specific periods of the day or week and then resolve on their own, verify DNS Scavenging is not set too aggressively. It could be that DNS Scavenging is so aggressive that SRV, A, CNAME and other valid records are purged from DNS, causing name resolution between DCs to fail. If this is the behavior you are seeing, verify scavenging settings on these DNS zones:
_msdcs.forestroot.com
forestroot.com
Scavenging settings also need to be checked in child domains if the Source or Destination DCs are in child domains.
Example: if scavenging is set this way, the outage will occur every 24 hours (a record becomes eligible for deletion 16 hours after its last refresh, and a scavenging pass runs every 8 hours):
Non-refresh period: 8 hours
Refresh period: 8 hours
Scavenging period: 8 hours
To correct this, change the Refresh and Non-refresh periods to 1 day each and set the scavenging period to 3 days.
See "Managing the aging and scavenging of server data" on TechNet to configure these settings for the DNS Server and/or zones.
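A sketch of checking and correcting these values with the built-in dnscmd tool (the zone name is the placeholder used above; interval values are in hours):

dnscmd /ZoneInfo _msdcs.forestroot.com
dnscmd /Config _msdcs.forestroot.com /NoRefreshInterval 24
dnscmd /Config _msdcs.forestroot.com /RefreshInterval 24
dnscmd /Config /ScavengingInterval 72

/ZoneInfo reports the zone's current Aging, NoRefreshInterval, and RefreshInterval; the /Config lines set the no-refresh and refresh windows to 1 day each and the server-wide scavenging period to 3 days.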
Hopefully this clears up the mysterious KCC errors on that one DC.
I would welcome advice and suggestions or corrections from our AD Experts :)
