
Path Length

1) Introduction

To calculate the average path length, we must find the shortest path from a source node
to every other node in the graph. Previously, we found that calculating the path length
of a graph experimentally with an inefficient algorithm can be time consuming. Since
then, I have looked further into the necessary algorithms for solving this problem.

2) Verification of the Algorithm and Code


Algorithm

The algorithm I used to solve the single source shortest path problem within the
polymeric gel was a breadth first search. A breadth first search begins at a starting node
and explores all of its neighboring nodes. Then, for each of those nearest nodes, it
explores their unexplored neighbors, and so on. For the single source shortest path
problem there is no single goal node, so the search continues until every reachable node
has been visited.

For every node s within the graph do {
    Initialize the distances to all other vertices as -1 (not computed)
    Initialize the queue to empty
    Store s (start node) in the queue
    Set the distance to s to be 0 in the Distance Table
    While there are vertices in the queue {
        Remove a vertex v from the front of the queue
        For all vertices w adjacent to v {
            If distance to w is -1 (not computed) do {
                Make distance to w equal to (distance to v) + 1
                Add w to the back of the queue
            }
        }
    }
}
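
For illustration, the inner part of the pseudocode above (one pass from a single start node) can be sketched in Python. This is not the FORTRAN code used for the results in this report; the function name `bfs_distances`, the adjacency-list input, and the five-node example graph are assumptions made for the sketch.

```python
from collections import deque

def bfs_distances(adjacency, start):
    """Shortest hop count from start to every node; -1 means not reachable."""
    dist = [-1] * len(adjacency)      # -1 marks "not computed"
    dist[start] = 0
    queue = deque([start])
    while queue:
        v = queue.popleft()           # take the oldest vertex in the queue
        for w in adjacency[v]:
            if dist[w] == -1:         # first visit gives the shortest distance
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Hypothetical example: a chain 0-1-2-3 plus node 4 with no connections (a rattler).
adjacency = [[1], [0, 2], [1, 3], [2], []]
print(bfs_distances(adjacency, 0))    # [0, 1, 2, 3, -1]
```

Because each vertex enters the queue at most once, a single pass costs time proportional to the number of nodes plus links.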

The algorithm can be designed to generate a distribution of path lengths between nodes.
This is done by tracking the following process within a given timestep. Starting from a
given node, the algorithm counts how many nodes are immediate neighbors of this starting
node. The path length between these immediate neighbors and the starting node is one.
The same is then done for the second nearest neighbors, whose path length to the starting
node is two. This process continues until all reachable neighbors are visited.

With the polymeric gel, it was necessary to repeat this algorithm N times (N being the
number of aggregates within a timestep), starting from each node within a given timestep.
It is well known that within our gel network not all aggregates are necessarily connected.
Rattlers and a disconnected graph resulting in a giant component are two examples of this
situation. By repeating the algorithm from each aggregate and continually updating shorter
path lengths for a given timestep, the shortest paths between all of the aggregates are
found and the discontinuous nature of the polymeric gel network is accounted for.
From here a distribution is created by counting the number of occurrences of each
specific path length within each timestep.
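
As a rough sketch of this counting step, the Python below repeats the breadth first search from every node of one timestep and tallies how often each path length occurs. The function name `path_length_distribution` and the example graph are hypothetical, and the sketch assumes that each pair of nodes is counted once per direction (i to j and j to i) and that unreachable pairs are simply left out.

```python
from collections import Counter, deque

def path_length_distribution(adjacency):
    """Count the shortest path lengths found from every start node."""
    counts = Counter()
    for s in range(len(adjacency)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for node, d in dist.items():
            if node != s:             # skip the zero-length path to itself
                counts[d] += 1
    return dict(counts)

# Hypothetical example: chain 0-1-2-3 plus an unconnected node 4.
adjacency = [[1], [0, 2], [1, 3], [2], []]
print(path_length_distribution(adjacency))   # {1: 6, 2: 4, 3: 2}
```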

Erdős-Rényi Random Graph

In an effort to validate the accuracy of the FORTRAN code, I compared experimental
results for an Erdős-Rényi (E.R.) random graph to the values calculated from well-known
formulas. Starting with N disconnected nodes, Erdős-Rényi random graphs are generated by
connecting pairs of randomly selected nodes, prohibiting multiple connections, until
the number of edges equals K (S. Boccaletti et al./ Physics Reports 424 (2006) 175-308).
Self-loops were also excluded: any connection that would join a node to itself was not
allowed.
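
A minimal Python sketch of this construction is given below. It is not the code used for the results here; the function name `random_graph` and the `seed` parameter are assumptions, and "loop" is read as a connection from a node to itself.

```python
import random

def random_graph(n_nodes, n_edges, seed=None):
    """Build an adjacency list with n_edges distinct links and no self-loops."""
    if n_edges > n_nodes * (n_nodes - 1) // 2:
        raise ValueError("more links requested than distinct node pairs")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        i = rng.randrange(n_nodes)
        j = rng.randrange(n_nodes)
        if i == j:
            continue                        # reject self-loops
        edges.add((min(i, j), max(i, j)))   # the set rejects repeated links
    adjacency = [[] for _ in range(n_nodes)]
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)
    return adjacency

# Example sizes taken from the T = 0.30 column of Table 1 (142 nodes, 734 links).
graph = random_graph(142, 734, seed=1)
print(sum(len(nbrs) for nbrs in graph) // 2)   # 734
```

Nodes that are never drawn end up with an empty neighbor list, which is how unconnected nodes arise in this construction.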
Using this definition of an E.R. random graph, I created networks that contain the same
number of nodes and links as our gel at each given temperature. I looked at two
experimental values and three calculated values for each temperature. The experimental
values were gathered using the FORTRAN code of the breadth first search algorithm.
The first was the average path length as calculated from the probability distribution of
path lengths. I calculated the probability distribution by dividing the path length
distribution by the sum over all of its entries (this includes any disconnections),

$$ P_i(k) = \frac{L_i}{\sum_i L_i} $$

From here the average path length is defined as

$$ l = \sum_k P(k)\, k \qquad [1] $$

In the following tables of data, this value is labeled Experimental 1.
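
A short Python sketch of Eq. [1] follows. The name `average_path_length` and the dictionary input (path length mapped to its count) are assumptions, as is the way disconnections are shown here: as entries of length zero that enlarge the normalization without adding to the sum.

```python
def average_path_length(distribution):
    """Eq. [1]: mean path length from a {path length: count} distribution."""
    total = sum(distribution.values())
    if total == 0:
        return 0.0
    # Sum of k * P(k) with P(k) = count / total.
    return sum(k * count for k, count in distribution.items()) / total

# Hypothetical counts for a chain 0-1-2-3 (ordered pairs, connected pairs only).
connected_only = {1: 6, 2: 4, 3: 2}
print(average_path_length(connected_only))       # 20/12, about 1.667

# Same counts plus 8 disconnected ordered pairs entered with length zero.
with_disconnections = {0: 8, 1: 6, 2: 4, 3: 2}
print(average_path_length(with_disconnections))  # 1.0
```

The second call illustrates how including disconnections in the normalization pulls the average down, which would be consistent with the very small gel values at high temperature.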

The second method of experimentally calculating the average path length is defined as
follows

$$ l = \frac{1}{N(N-1)} \sum_{i,j} D_{i,j} \qquad [2] $$

In an unweighted graph, $D_{i,j}$ is the shortest distance between node i and node j. This
definition assumes that $D_{i,j} = 0$ if node i cannot be reached from node j or if i = j.
N in this definition is the total number of nodes that have connections. In the following
tables of data, this value is labeled Experimental 2.
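
Eq. [2] can be sketched in Python as below, under the conventions just stated: unreachable pairs contribute zero, and N counts only nodes with at least one connection. The function name `average_path_length_eq2` and the two example graphs are hypothetical.

```python
from collections import deque

def average_path_length_eq2(adjacency):
    """Eq. [2]: sum of shortest distances over ordered pairs, divided by N(N-1)."""
    connected = [v for v, nbrs in enumerate(adjacency) if nbrs]
    n = len(connected)                 # N = nodes that have connections
    if n < 2:
        return 0.0
    total = 0
    for s in connected:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())    # unreachable nodes add nothing (D = 0)
    return total / (n * (n - 1))

chain = [[1], [0, 2], [1, 3], [2], []]       # chain 0-1-2-3, node 4 unconnected
two_pairs = [[1], [0], [3], [2]]             # two separate links 0-1 and 2-3
print(average_path_length_eq2(chain))        # 20/12, about 1.667
print(average_path_length_eq2(two_pairs))    # 4/12, about 0.333
```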

I compared these two methods of experimentally gathering average path lengths to
calculated values for a random E.R. graph using formulas found in M. Newman et al. /
Phys. Rev. E 64 026118 (2001). In each case $z_m$ is the average number of neighbors at
distance m. The first is as follows

$$ l = \frac{\ln\left[(N-1)(z_2 - z_1) + z_1^2\right] - \ln z_1^2}{\ln(z_2 / z_1)} \qquad [3] $$

and will be labeled Calculated 1 in the following table of data (Table 1). In the special
circumstance where the following two conditions hold,

$$ N \gg z_1, \qquad z_2 \gg z_1 $$

Eq. [3] reduces to

$$ l = \frac{\ln(N / z_1)}{\ln(z_2 / z_1)} + 1 \qquad [4] $$

This value will be labeled Calculated 2. In the special case of an E.R. random graph,
for which $z_1 = \langle k \rangle$ and $z_2 = \langle k \rangle^2$, Eq. [4] reduces to the
following

$$ l = \frac{\ln N}{\ln \langle k \rangle} \qquad [5] $$

(S. Boccaletti et al./ Physics Reports 424 (2006) 175-308). In the following tables of
data, this value will be labeled Calculated 3.
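
The three formulas can be evaluated with a few lines of Python. The function names below are hypothetical. The example uses the T = 0.30 entries of Table 1 (z1 = 10.3521, z2 = 69.2113), with N taken as the cluster count of 142 and no giant-component correction; both choices are assumptions made for this sketch. Under those assumptions the results come out close to the tabulated Calculated 1 to 3 entries for that temperature (2.29601, 2.37825, 2.12042).

```python
import math

def calculated_1(n, z1, z2):
    """Eq. [3]: general expression in terms of z1 and z2."""
    numerator = math.log((n - 1) * (z2 - z1) + z1 ** 2) - math.log(z1 ** 2)
    return numerator / math.log(z2 / z1)

def calculated_2(n, z1, z2):
    """Eq. [4]: limit of Eq. [3] for N >> z1 and z2 >> z1."""
    return math.log(n / z1) / math.log(z2 / z1) + 1

def calculated_3(n, mean_degree):
    """Eq. [5]: E.R. special case with z1 = <k> and z2 = <k>^2."""
    return math.log(n) / math.log(mean_degree)

n, z1, z2 = 142, 10.3521, 69.2113
print(round(calculated_1(n, z1, z2), 3))   # 2.296
print(round(calculated_2(n, z1, z2), 3))   # 2.378
print(round(calculated_3(n, z1), 3))       # 2.12
```

Setting z2 equal to z1 squared in `calculated_2` gives the same number as `calculated_3`, which is the reduction from Eq. [4] to Eq. [5].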

There are a couple of considerations to take into account. In the creation of the E.R.
random networks we started with the same number of nodes and links as the gel at each
given temperature. Importantly, when we randomly choose to make connections between
N nodes, not all of the N nodes will be selected. There will be some nodes that do not
have connections to others. This results in a network with K links in which the total
number of connected nodes is less than the number of desired nodes. This fact is
important when comparing calculated values to experimental results.
I am assuming that, based upon the definition of an E.R. random graph according to S.
Boccaletti et al., the value for N must include those nodes that do not have connections.
A second consideration is that the created E.R. random graph might be disconnected. The
formulas used to calculate average path lengths assume that all nodes are reachable from
any randomly chosen starting node. As stated in M. Newman et al. / Phys. Rev. E 64 026118
(2001), in general this will not be true and Eq. [4] is meaningless. A better
approximation to l may therefore be given by replacing N in Eq. [4] by NS, where S is the
fraction of the graph occupied by the giant component. Therefore, I made this
approximation. I averaged the size of the largest component of the random graph per
timestep and included this factor in each of the three calculated values.
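
One way to obtain S for a single graph is to label components with repeated breadth first searches and keep the largest, as in the Python sketch below. The function name `giant_component_fraction` and the example graph are hypothetical, and S is taken here relative to all nodes, including unconnected ones.

```python
from collections import deque

def giant_component_fraction(adjacency):
    """Fraction S of all nodes that belong to the largest connected component."""
    n = len(adjacency)
    if n == 0:
        return 0.0
    seen = [False] * n
    largest = 0
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        size = 0
        queue = deque([s])
        while queue:                   # flood one component starting at s
            v = queue.popleft()
            size += 1
            for w in adjacency[v]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        largest = max(largest, size)
    return largest / n

# Hypothetical example: chain 0-1-2-3 plus an unconnected node 4.
adjacency = [[1], [0, 2], [1, 3], [2], []]
s_fraction = giant_component_fraction(adjacency)
print(s_fraction)                      # 0.8
print(len(adjacency) * s_fraction)     # N*S = 4.0, used in place of N
```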

Figures 1 and 2 contain plots of average path length versus temperature for the gel and
random graphs.

[Figure 1: "Path Length". Average path length (vertical axis) versus temperature
(horizontal axis). Series: Gel - Exp 1, Random - Exp 1, Random - Calc 1, Random - Calc 2,
Random - Calc 3.]
Figure 1 contains a graph of the average path length data versus temperature for the
polymeric gel and E.R. random graphs.

[Figure 2: "Path Length". Average path length (vertical axis) versus temperature
(horizontal axis, 0.3 to 0.9). Series: Gel - Exp 1, Random - Exp 1, Random - Calc 1,
Random - Calc 2, Random - Calc 3.]
Figure 2 contains a graph of the average path length data versus temperature for the
polymeric gel and E.R. random graphs. This is the same data as in Figure 1, in a closer
view.

Tables 1 and 2 contain the average path length data (experimental and calculated) for the
random graph and the polymeric gel.

Table 1

E.R. RANDOM MATRIX PATH LENGTH

N = 2000

Clusters = Cluster Count (approx); Nodes = # of Nodes; z1, z2 = Average Z1, Average Z2;
S = Ratio - Giant Component to Total Cluster Count.

T     Clusters  Links  Nodes    z1        z2        <k>       <k>^2     S
2.00  1413      971    1053.1   1.84598   2.5348    1.84598   3.40764   0.65511
1.80  1335      966    1020     1.89608   2.75686   1.89608   3.59511   0.71167
1.50  1250      960    981.8    1.95763   3.00387   1.95763   3.83231   0.77297
1.20  1117      948    910.7    2.08411   3.52454   2.08411   4.34352   0.84781
1.00  977       934    833.4    2.24382   4.27094   2.24382   5.03473   0.90557
0.90  883       921    773      2.38551   4.96455   2.38551   5.69066   0.93105
0.80  762       904    691.2    2.61863   6.19878   2.61863   6.85725   0.9634
0.70  606       877    573.4    3.06243   8.76317   3.06243   9.37851   0.98884
0.60  427       836    418.5    4         15.2793   4         16        0.99689
0.50  261       790    260.5    6.07294   33.557    6.07294   36.8806
0.45  206       772    205.8    7.51215   47.1817   7.51215   56.4324
0.40  176       747    175.9    8.504832  56.30699  8.504832  72.33217
0.35  161       724    161      9.00621   59.6199   9.00621   81.1118
0.30  142       734    142      10.3521   69.2113   10.3521   107.166

Average Path Length (Exp = Experimental values, Calc = Calculated values)

T     Exp 1     Exp 2     Calc 1    Calc 2    Calc 3
2.00  6.63186   6.63839   16.5118   20.607    11.1426
1.80  7.10304   7.11042   14.5079   17.6088   10.7169
1.50  7.71975   7.72812   13.0273   15.4842   10.2323
1.20  7.95787   7.96762   10.9471   12.6461   9.33257
1.00  7.7241    7.73413   9.13104   10.2862   8.39577
0.90  7.28176   7.29168   8.08003   8.9716    7.72001
0.80  6.72078   6.7307    6.90495   7.54056   6.85466
0.70  5.79176   5.80216   5.61081   6.01875   5.71444
0.60  4.49197   4.50281   4.25684   4.48258   4.3668
0.50  3.26783   3.28044   3.08397   3.19999   3.08482
0.45  2.83802   2.85189   2.70885   2.80211   2.64211
0.40  2.62216   2.63716   2.517848  2.602939  2.415398
0.35  2.53295   2.54878   2.44093   2.5256    2.31192
0.30  2.35146   2.36814   2.29601   2.37825   2.12042

Table 2

POLYMERIC GEL PATH LENGTH

N = 2000

Clusters = Cluster Count (approx); Nodes = # of Nodes; z1, z2 = Average Z1, Average Z2;
S = Ratio - Giant Component to Total Cluster Count.

T     Clusters  Links  Nodes     z1        z2        <k>       <k>^2     S
2.00  1413      971    1413.39   1.36991   1.17664   1.37713   1.89648   0.0345
1.80  1335      966    1335.94   1.43859   1.47748   1.44645   2.09223   0.0677
1.50  1250      960    1250.81   1.52291   1.88046   1.53145   2.34535   0.1652
1.20  1117      948    1117.64   1.67481   2.70186   1.6843    2.83688   0.46671
1.00  977       934    977.685   1.86827   3.89262   1.87857   3.52903   0.66513
0.90  883       921    883.087   2.02433   4.94847   2.03531   4.14248   0.74723
0.80  762       904    762.985   2.26189   6.69323   2.27322   5.16751   0.82145
0.70  609       877    609.691   2.66008   9.75338   2.67195   7.13932   0.88617
0.60  429       836    429.014   3.36427   14.8364   3.37635   11.3998   0.93782
0.55  338       815    338.657   3.91926   17.9091   3.93092   15.4521   0.95747
0.50  261       790    261.948   4.62495   20.9416   4.63552   21.488    0.97467
0.45  206       772    206.97    5.35033   23.409    5.35828   28.7112   0.98735
0.40  176       747    176.1839  5.770649  24.95407  5.775521  33.35664  0.995831
0.35  161       724    161.984   5.96497   24.7829   5.96762   35.6125   0.99874
0.30  142       734    144.111   6.19191   25.5831   6.33488   40.1307   0.99259

Average Path Length (Exp = Experimental values)

T     Exp 1     Exp 2
2.00  0.02007   0.02009
1.80  0.07875   0.07883
1.50  0.47734   0.48037
1.20  2.95585   2.95965
1.00  4.43444   4.44104
0.90  4.77851   4.78668
0.80  4.8884    4.89813
0.70  4.76399   4.77602
0.60  4.45435   4.46983
0.55  4.26854   4.28665
0.50  4.08551   4.10622
0.45  3.91669   3.93957
0.40  3.774252  3.797657
0.35  3.73232   3.75663
0.30  3.47408   3.50048

3) Discussion

For the random graph, as seen in Table 1, the experimentally gathered path lengths
(Eqs. [1] and [2]) are in close agreement. Yet the difference between the calculated
values increases with temperature. I expected the path lengths calculated using Eqs. [3],
[4], and [5] to be more consistent with the experimental results, but Figure 1 shows that
this is not the case. Calculated 3 (Eq. [5]) seems to be the closest to the experimental
results for a random graph at all temperatures. However, the two conditions $N \gg z_1$
and $z_2 \gg z_1$ are not met for all temperatures. I feel that the second condition is
not met for temperatures greater than 0.6. Because of this, I would expect the value
calculated using Eq. [5] to start deviating from the experimental value at T = 0.6. As
seen in Figure 2, however, Random - Calc 3 (Eq. [5]) remains consistent with the
experimental results up to T = 0.8.
Since Eqs. [4] and [5] are obtained by reducing Eq. [3] under special conditions, I have
assumed that Eq. [3] should hold for the random graph at all temperatures, without these
two conditions. Yet this calculation is only the second closest to the experimental
results.
If I were to change the value of N by using only the number of connected nodes, this
would result in a lower calculated value in all three cases. However, the calculated
values would still deviate at higher temperatures.

To check that the FORTRAN code was functioning as desired, I built two small networks of
20 nodes. For each, I manually drew the connections between nodes and verified that the
resulting shortest path lengths and distributions are correct.
In this work, if a starting node does not have a path to another node, its shortest path
length (zero) is not counted. Earlier you explained how to deal with these disconnections
while taking the inverse, but I no longer remember the details. Could you refresh my
memory?
