
#A6 INTEGERS 11 (2011)

MAXIMUM GCD AMONG PAIRS OF RANDOM INTEGERS


R. W. R. Darling
Mathematics Research Group, National Security Agency, Fort George G. Meade,
Maryland, USA
E. E. Pyle
Mathematics Research Group, National Security Agency, Fort George G. Meade,
Maryland, USA
Received: 11/13/09, Revised: 9/3/10, Accepted: 11/23/10, Published: 1/13/11
Abstract

Fix $\alpha > 0$, and sample $N$ integers uniformly at random from $\{1, 2, \ldots, \lfloor e^{\alpha N} \rfloor\}$. Given $\eta > 0$, the probability that the maximum of the pairwise GCDs lies between $N^{2-\eta}$ and $N^{2+\eta}$ converges to 1 as $N \to \infty$. More precise estimates are obtained. This is a Birthday Problem: two of the random integers are likely to share some prime factor of order $N^2/\log(N)$. The proof generalizes to any arithmetical semigroup where a suitable form of the Prime Number Theorem is valid.
1. Main Result

The distribution of the sizes of the prime divisors of a random integer has been well studied; see portions of Billingsley [1]. Diaconis and Erdős [3] compute the probability distribution of the GCD of two random integers; moments of this distribution were estimated earlier by Cesàro [2]; the GCD of $k$ random integers was treated by Nymann [5]. However the authors are unaware of any published results on the pairwise Greatest Common Divisors (GCDs) among a large collection of random integers. Theorem 1 establishes probabilistic upper and lower bounds for the maximum of these pairwise GCDs.
Theorem 1. Suppose $\alpha > 0$, and $T_1, \ldots, T_N$ is a random sample, drawn with replacement, from the integers $\{n \in \mathbb{N} : n \le e^{\alpha N}\}$. Let $\Gamma_{j,k}$ denote the Greatest Common Divisor of $T_j$ and $T_k$. For any $\eta > 0$,

$$\lim_{N \to \infty} P\Big[ N^{2-\eta} < \max_{1 \le j < k \le N} \{\Gamma_{j,k}\} < N^{2+\eta} \Big] = 1. \qquad (1)$$

Indeed there are more precise estimates: for all $s \in (0,1)$, and $b > 0$, the right side of (2) is finite, and

$$P\Big[ \max_{1 \le j < k \le N} \{\Gamma_{j,k}\} \ge N^{2/s} b^{1/s} \Big] \le \frac{1}{2b} \prod_{p \in \mathcal{P}} \Big( 1 + \frac{p^s - 1}{p^2 - p^s} \Big), \qquad (2)$$

where $\mathcal{P}$ denotes the rational primes; while if $\Lambda_{j,k}$ denotes the largest common prime factor of $T_j$ and $T_k$, then for all $\beta > 0$,

$$\lim_{N \to \infty} P\Big[ \max_{1 \le k < j \le N} \{\Lambda_{j,k}\} < \frac{N^2}{\log(N^{\beta})} \Big] \le e^{-\beta/8}. \qquad (3)$$
Remark. There is an upper bound, similar to (2), for the radical (i.e., the largest square-free divisor) $\mathrm{rad}(\Gamma_{j,k})$ of the GCD:

$$P\Big[ \max_{1 \le j < k \le N} \{\mathrm{rad}(\Gamma_{j,k})\} \ge N^{2/s} b^{1/s} \Big] \le \frac{1}{2b} \prod_{p \in \mathcal{P}} \big( 1 - p^{-2} + p^{s-2} \big). \qquad (4)$$

The proof, which is omitted, uses methods similar to those of Proposition 2, based upon a Bernoulli model for occurrence of prime divisors, instead of a Geometric model for prime divisor multiplicities. For example, when $s = 0.999$, the product on the right side of (4) is approximately 12.44; for the right side of (2), it is approximately 17.64.
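As a quick numerical illustration, the sketch below evaluates finite truncations of the products appearing on the right sides of (2) and (4). The cutoff, the sample value of $s$, and the use of sympy for prime generation are arbitrary choices made for this illustration; for $s$ close to 1 the products converge very slowly, so a truncation of this kind only gives a rough approximation of the infinite products.

```python
# Sketch: truncated evaluation of the Euler-type products in bounds (2) and (4).
# Assumptions: sympy is available; the cutoff and s = 0.9 are illustrative only.
from sympy import primerange

def truncated_products(s, cutoff=10**6):
    prod2 = 1.0  # partial product for (2): prod_p (1 + (p^s - 1) / (p^2 - p^s))
    prod4 = 1.0  # partial product for (4): prod_p (1 - p^{-2} + p^{s-2})
    for p in primerange(2, cutoff):
        ps = p ** s
        prod2 *= 1.0 + (ps - 1.0) / (p * p - ps)
        prod4 *= 1.0 - p ** (-2.0) + p ** (s - 2.0)
    return prod2, prod4

if __name__ == "__main__":
    c2, c4 = truncated_products(s=0.9)
    print("truncated product for (2):", round(c2, 4))
    print("truncated product for (4):", round(c4, 4))
```

Since the tail factors behave like $1 + p^{s-2}$, the truncation error shrinks as $s$ decreases, while the constants themselves blow up as $s \uparrow 1$.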
1.1. Overview of the Proof of Theorem 1

Let $Z_i^k$ be a Bernoulli random variable which takes the value 1 when prime $p_i$ divides $T_k$. As a first step towards the proof, imagine proving a comparable result in the case where $\{Z_i^k, 1 \le k \le N, i \ge 1\}$ were independent, and $P[Z_i^k = 1] = 1/p_i$.

The harder parts of the proof arise in dealing with the reality that, for fixed $k$, $\{Z_i^k, i \ge 1\}$ are negatively associated, and change with $N$. Convergence of the series

$$\sum_{p \in \mathcal{P}} p^{-2} \log(p) < \infty$$

ensures that the parameter $\alpha$, which governs the range of integers being sampled, appears in none of the bounds (1), (2), nor (3). However the proof for the lower bound depends crucially on an exponential (in $N$) rate of growth in the range, in order to moderate the dependence among $\{Z_i^k, i \ge 1\}$ for fixed $k$.

Consider primes as labels on a set of urns; the random variable $T_j$ contributes a ball to the urn labelled $p$ if prime $p$ divides $T_j$. The lower bound comes from showing that, with asymptotic probability at least $1 - e^{-\beta/8}$, some urn with a label $p > N^2/\log(N^{\beta})$ contains more than one ball; in that case prime $p$ is a common divisor of two distinct members of the list $T_1, \ldots, T_N$. The upper bound comes from a first moment estimate: multiply the number of pairs by the probability that a specific pair has a GCD above some threshold.
If $T_1, \ldots, T_N$ were sampled uniformly without replacement from the integers from 1 to $N^2$, the lower bound (3) would fail; see the analysis in [1] of the distribution of the largest prime divisor of a random integer. In the case of sampling from integers from 1 to $N^r$, where $r \ge 3$, the upper bound (2) remains valid, but we do not know whether the lower bound (3) holds or not.
1.2. Generalizations to Arithmetical Semigroups

Although details will not be given, the techniques used to prove Theorem 1 will be valid in the more general context of a commutative semigroup $G$ with identity element 1, containing a countably infinite subset $\mathcal{P} = \{p_1, p_2, \ldots\}$ called the primes of $G$, such that every element $a \ne 1$ of $G$ has a unique factorization of the form

$$a = \prod_{i \ge 1} p_i^{e_i}, \qquad (e_1, e_2, \ldots) \in \mathbb{Z}_+^{\infty},$$

where all but finitely many of the $e_i$ are zero. Assume in addition that $G$ is an arithmetical semigroup in the sense of [4], meaning that there exists a real-valued norm $|\cdot|$ on $G$ such that:

- $|1| = 1$, and $|p_i| > 1$ for all $p_i \in \mathcal{P}$.
- $|ab| = |a| |b|$ for all $a, b \in G$.
- The set $\pi_G(x) = \{i \ge 1 : |p_i| \le e^x\}$ is finite, for each real $x > 0$.

The only analytic condition needed is an abstract form of the Prime Number Theorem (see [4, Chapter 6]):

$$\lim_{x \to \infty} x e^{-x} |\pi_G(x)| = 1,$$

used in the proof of Proposition 5. This in turn will imply convergence of series such as

$$\sum_{p \in \mathcal{P}} \log\big( 1 + |p|^{s-2} \big), \qquad s < 1,$$
which appear (in an exponentiated form) in the bound (2). For example, Landau's Prime Ideal Theorem provides such a result in the case where $G$ is the set of integral ideals in an algebraic number field, $\mathcal{P}$ is the set of prime ideals, and $|a|$ is the norm of $a$. Knopfmacher [4] also studies a more general setting where, for some $\delta > 0$, the quantity $x e^{-\delta x} |\pi_G(x)|$ converges to a positive limit as $x \to \infty$. The authors have not attempted to modify Theorem 1 to fit this case.
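For orientation, the abstract condition above reduces to the classical Prime Number Theorem when $G$ is the multiplicative semigroup of positive integers with the usual absolute value; this worked special case is included here only as a reminder:

$$|\pi_G(x)| = \#\{p \text{ prime} : p \le e^x\} = \pi(e^x) \sim \frac{e^x}{\log(e^x)} = \frac{e^x}{x} \quad (x \to \infty), \qquad \text{so} \qquad \lim_{x \to \infty} x e^{-x} |\pi_G(x)| = 1.$$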
1.3. Future Lines of Enquiry

Test of Arithmetic Randomness. The authors do not know whether

$$N^{-2} \max_{1 \le j < k \le N} \{\Gamma_{j,k}\}$$

has a limit in distribution as $N \to \infty$. An anonymous referee points out that, if it does, then a test for the arithmetic randomness of a sequence of $N$ integers would result: namely, compute the maximum of the pairwise GCDs, divide by $N^2$, and compute a $p$-value.
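The statistic is straightforward to simulate for small $N$. The sketch below draws $N$ integers uniformly from $\{1, \ldots, \lfloor e^{\alpha N} \rfloor\}$ and reports $N^{-2} \max_{j<k} \Gamma_{j,k}$; the parameter values, the naive $O(N^2)$ double loop, and the floating-point evaluation of $e^{\alpha N}$ are illustrative shortcuts, not a proposal for the test itself.

```python
# Sketch: Monte Carlo draws of the normalized statistic N^{-2} * max pairwise GCD.
# Assumptions: alpha, N and the number of trials are arbitrary illustrative values;
# the float in math.exp only sets the sampling range approximately, which is
# immaterial for an illustration.
import math
import random

def max_pairwise_gcd(sample):
    """Naive O(N^2) maximum of gcd(T_j, T_k) over all pairs."""
    best = 1
    for j in range(len(sample)):
        for k in range(j + 1, len(sample)):
            best = max(best, math.gcd(sample[j], sample[k]))
    return best

def normalized_statistic(N, alpha=1.0, rng=random):
    upper = int(math.exp(alpha * N))           # roughly floor(e^{alpha N})
    sample = [rng.randint(1, upper) for _ in range(N)]
    return max_pairwise_gcd(sample) / N**2

if __name__ == "__main__":
    random.seed(0)
    print(sorted(round(normalized_statistic(60), 3) for _ in range(20)))
```

Turning such draws into an actual randomness test would require exactly the limiting distribution whose existence is the open question above.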
Efficient Computation. How might the maximum of the pairwise GCDs of $N$ large random integers be computed efficiently? Perhaps smaller prime factors could be removed by a sieve. Is there an efficient way to detect a squared prime of size about $N^2/\log(N)$ in the product of all the integers? To detect the largest common prime factor among all pairs of integers, is it better to compute, for $k = 1, \ldots, N-1$, the GCD of the product $T_1 T_2 \cdots T_k$ with $T_{k+1}$, rather than to compute each of the pairwise GCDs?
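The cumulative-product variant mentioned in the last question can be transcribed directly; the sketch below is only that transcription (with illustrative helper names and sympy used for factoring), not the efficient algorithm being asked for. Both functions return the largest prime dividing some pair $T_j, T_k$, or None if no two sample members share a prime factor.

```python
# Sketch: two equivalent ways to find the largest common prime factor over all pairs.
# Assumptions: sympy is available for factoring, and the inputs are small enough
# that factoring the intermediate GCDs is feasible.
from math import gcd
from sympy import primefactors

def largest_common_prime_pairwise(sample):
    """Baseline: factor gcd(T_j, T_k) for every pair and keep the largest prime."""
    best = None
    for j in range(len(sample)):
        for k in range(j + 1, len(sample)):
            g = gcd(sample[j], sample[k])
            if g > 1:
                p = max(primefactors(g))
                best = p if best is None else max(best, p)
    return best

def largest_common_prime_cumulative(sample):
    """Variant from the text: compare each T_{k+1} with the product T_1 * ... * T_k."""
    best, product = None, 1
    for t in sample:
        g = gcd(product, t)
        if g > 1:
            p = max(primefactors(g))   # such a prime divides t and some earlier T_i
            best = p if best is None else max(best, p)
        product *= t
    return best
```

Whether the cumulative variant is actually preferable depends on how the growth of the running product and the cost of factoring the intermediate GCDs trade off, which is precisely the question left open above.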
2. Pairwise Minima in a Geometric Probability Model

2.1. Geometric Random Vectors

Let $\mathcal{P} = \{p_1, p_2, \ldots\}$ denote the rational primes $\{2, 3, 5, \ldots\}$ in increasing order. Let $\mathcal{I}$ denote the set of non-negative integer vectors $(e_1, e_2, \ldots)$ for which $\sum e_i < \infty$.
Let $X_1, X_2, \ldots$ be (possibly dependent) non-negative integer random variables whose joint law has the property that, for every $k \in \mathbb{N}$, and every $(e_1, e_2, \ldots) \in \mathcal{I}$ for which $e_k = 0$,

$$P\Big[ X_k \ge m \;\Big|\; \bigcap_{i \ne k} \{X_i = e_i\} \Big] \le \Big( \frac{1}{p_k} \Big)^{m}. \qquad (5)$$
Consider $X_1, X_2, \ldots$ as a general model for prime multiplicities in the prime factorization of a random integer, without specifying exactly how that integer will be sampled. Let $\xi$ denote the random vector

$$\xi = (X_1, X_2, \ldots) \in \mathbb{N}^{\mathbb{N}}. \qquad (6)$$

Let $\xi^{(1)}, \xi^{(2)}, \ldots, \xi^{(N)}$ be independent random vectors, all having the same law as in (6). Write $\xi^{(k)}$ as $(X_1^k, X_2^k, \ldots)$. Then

$$L_{j,k} = \sum_i \min\{X_i^k, X_i^j\} \log(p_i)$$

is a model for the log of the GCD of two such random integers. We shall now derive an upper bound for

$$\Sigma_N = \max_{1 \le k < j \le N} \{L_{j,k}\},$$

which models the log maximum of the pairwise GCD among a set of $N$ large, random integers.
Proposition 2. Assume the joint law of the components of $\xi$ satisfies (5). Then

(i) For every $s \in (0,1)$, the following expectation is finite:

$$E\big[ e^{s L_{k,j}} \big] \le \prod_i \Big( 1 + \frac{p_i^s - 1}{p_i^2 - p_i^s} \Big) = C_s < \infty. \qquad (7)$$

(ii) For any $s \in (0,1)$ and $b > C_s/2$, with $C_s$ as in (7), there is an upper bound:

$$P\big[ \Sigma_N \ge \log(N^{2/s}) + s^{-1} \log(b) \big] \le \frac{C_s}{2b} < 1. \qquad (8)$$
Proof. Consider first the case where $X_1, X_2, \ldots$ are independent Geometric random variables, and

$$P[X_k \ge m] = \Big( \frac{1}{p_k} \Big)^{m}, \qquad m = 1, 2, \ldots$$

It is elementary to check that, for $s \in (0,1)$, and any $p \in \mathcal{P}$, if $X', X''$ are independent Geometric random variables with

$$P[X' \ge m] = p^{-m} = P[X'' \ge m], \qquad m = 1, 2, \ldots,$$

then their minimum is also a Geometric random variable, which satisfies

$$E\big[ p^{\,s \min\{X', X''\}} \big] = 1 + \frac{p^s - 1}{p^2 - p^s} < 1 + p^{s-2}.$$
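In detail, the elementary check runs as follows: by independence, $P[\min\{X', X''\} \ge m] = P[X' \ge m]\, P[X'' \ge m] = p^{-2m}$, so the minimum is Geometric with ratio $p^{-2}$, and

$$E\big[ p^{\,s \min\{X', X''\}} \big] = \sum_{m \ge 0} p^{sm} \big( p^{-2m} - p^{-2(m+1)} \big) = \big( 1 - p^{-2} \big) \sum_{m \ge 0} p^{(s-2)m} = \frac{p^2 - 1}{p^2 - p^s} = 1 + \frac{p^s - 1}{p^2 - p^s}.$$

The final inequality $\frac{p^s - 1}{p^2 - p^s} < p^{s-2}$ is equivalent to $p^{2s} < p^2$, which holds because $s < 1$.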
It follows from the independence assumption that

$$E\big[ e^{s L_{k,j}} \big] = E\Big[ \prod_i p_i^{\,s \min\{X_i^k, X_i^j\}} \Big] = \prod_i \Big( 1 + \frac{p_i^s - 1}{p_i^2 - p_i^s} \Big) = C_s.$$

This verifies the assertion (7).

Markov's inequality $a P[X \ge a] \le E[X]$ shows that, for any $s \in (0,1)$,

$$C_s e^{-st} \ge P[L_{k,j} \ge t].$$

Furthermore

$$P\Big[ \max_{1 \le k < j \le N} \{L_{k,j}\} \ge t \Big] = P\Big[ \bigcup_{1 \le k < j \le N} \{L_{k,j} \ge t\} \Big] \le \sum_{1 \le k < j \le N} P[L_{k,j} \ge t] = \frac{N(N-1)}{2}\, P[L_{k,j} \ge t].$$
It follows that, for $s \in (0,1)$, $b > 0$, and $t = s^{-1} \log(bN^2)$,

$$P\big[ \Sigma_N \ge \log(N^{2/s}) + s^{-1} \log(b) \big] \le \frac{N^2}{2} e^{-st} C_s = \frac{C_s}{2b}.$$
It remains to consider the case where $X_1, X_2, \ldots$ satisfy (5) without the independence assumption. This will be easy because, under (5), large values of the $X_i$ are less likely than in the independent Geometric case, so the same upper bound remains valid. A coupling construction will be used to build dependent random variables on the same probability space as the one for independent random variables. By taking products of probability spaces, construct a probability space $(\Omega, \mathcal{F}, P)$ on which independent Geometric random variables $X'_1, X'_2, \ldots$ and $X''_1, X''_2, \ldots$ are defined, such that for all $i \ge 1$,

$$P[X'_i \ge m] = p_i^{-m} = P[X''_i \ge m], \qquad m = 1, 2, \ldots.$$

We propose to construct $\xi^{(1)} = (X_1^1, X_2^1, \ldots)$ and $\xi^{(2)} = (X_1^2, X_2^2, \ldots)$ by induction on this probability space $(\Omega, \mathcal{F}, P)$, so that for each $n \ge 1$, $\{(X_i^1, X_i^2)\}_{1 \le i \le n}$ have the correct joint law, and

$$X_i^1 \le X'_i, \qquad X_i^2 \le X''_i, \qquad i = 1, 2, \ldots.$$
Once this is achieved, monotonicity implies

$$E\big[ e^{s L_{1,2}} \big] \le E\Big[ \prod_i p_i^{\,s \min\{X'_i, X''_i\}} \Big],$$

so the desired result will follow from the previous one for independent Geometric random variables.

Since $\xi^{(1)}$ and $\xi^{(2)}$ are independent, it suffices to construct $\xi^{(1)}$ in terms of $X'_1, X'_2, \ldots$ so that $X_i^1 \le X'_i$ for all $i$. Let $(U_{i,j}, i \ge 1, j \ge 0)$ be independent Uniform$(0,1)$ random variables. Suppose either $i = 1$, or else some values $X_1^1 = e_1, X_2^1 = e_2, \ldots, X_{i-1}^1 = e_{i-1}$ have already been determined. By assumption, there exist parameters

$$q_{i,k} = P\Big[ X_i \ge k \;\Big|\; \bigcap_{j < i} \{X_j = e_j\} \Big] \le \Big( \frac{1}{p_i} \Big)^{k}, \qquad k = 1, 2, \ldots.$$

Use these to construct $X'_i$ and $X_i^1$ as follows:

$$X'_i = \min\Big\{ k : U_{i,0} U_{i,1} \cdots U_{i,k} > \Big( \frac{1}{p_i} \Big)^{k} \Big\},$$
$$X_i^1 = \min\big\{ k : U_{i,0} U_{i,1} \cdots U_{i,k} > q_{i,k} \big\} \le X'_i.$$

This completes the construction and the proof, giving the result (8).
3. Lower Bound for Largest Collision

3.1. Random Vectors with Independent Components

Let $\mathcal{P} = \{p_1, p_2, \ldots\}$ denote the rational primes $\{2, 3, 5, \ldots\}$ in increasing order, and let $a_j = (\log(p_j))^{1/2}$. Instead of the Geometric model (5), switch to a Bernoulli model in which $Z_1, Z_2, \ldots$ are independent Bernoulli random variables, with

$$P[Z_j = 1] = \frac{1}{p_j}. \qquad (9)$$

Let $\zeta$ denote the random vector

$$\zeta = (a_1 Z_1, a_2 Z_2, \ldots) \in [0, \infty)^{\mathbb{N}} \qquad (10)$$

under this new assumption, and let $\zeta^{(1)}, \zeta^{(2)}, \ldots, \zeta^{(N)}$ be independent random vectors, all having the same law as $\zeta$. Note that $\zeta^{(1)} \cdot \zeta^{(2)}$ is not a suitable model for the GCD of two random integers, because the independence assumption (9) is not realistic. However it is a useful context to develop the techniques which will establish the lower bound in Theorem 1.

Write $\zeta^{(k)} = (a_1 Z_1^k, a_2 Z_2^k, \ldots)$. Informally, we seek a lower bound on the log $\Psi_N$ of the largest prime $p_i$ at which a collision occurs; collision means that $Z_i^j = 1 = Z_i^k$ for some $j, k$. Formally,

$$\Psi_N = \max_{1 \le k < j \le N} \Big\{ \max_i \big\{ Z_i^j Z_i^k \log(p_i) \big\} \Big\} \le \max_{1 \le k < j \le N} \big\{ \zeta^{(k)} \cdot \zeta^{(j)} \big\}.$$
Proposition 3. Given $\gamma \in (0, \infty)$, and $N$ such that $N^2 > 8\gamma$, we may define $\rho_N = \rho_N(\gamma)$ implicitly by the identity

$$\int_{\rho_N}^{2\rho_N} \frac{N^2 \, dx}{2 x^2 \log(x)} = \gamma. \qquad (11)$$

Under the assumption of independence of the components of the random vector (10),

$$\lim_{N \to \infty} P\big[ \Psi_N \ge \log(\rho_N(\gamma)) \big] \ge 1 - e^{-\gamma}. \qquad (12)$$
Remark. Existence of such a $\rho_N$ is assured by the fact that the integral of $x^{-2}/\log x$ from 2 to 4 is greater than $1/4$. From the integration bounds

$$\frac{1}{2\rho_N \log(2\rho_N)} = \frac{1}{\log(2\rho_N)} \int_{\rho_N}^{2\rho_N} \frac{dx}{x^2} < \frac{2\gamma}{N^2} < \frac{1}{\log(\rho_N)} \int_{\rho_N}^{2\rho_N} \frac{dx}{x^2} = \frac{1}{2\rho_N \log(\rho_N)},$$

it follows that $\rho_N$, defined in (11), satisfies $\rho_N \log(\rho_N)/N^2 \to 0.25/\gamma$. Hence for all sufficiently large $N$, $\rho_N < N^2/2$, and

$$\rho_N > \frac{N^2}{4\gamma \log(2\rho_N)} > \frac{N^2}{8\gamma \log(N)}. \qquad (13)$$
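Since (11) defines $\rho_N$ only implicitly, a numerical illustration may be useful. The sketch below solves (11) for $\rho_N$ by bisection, using a midpoint rule for the integral; the panel count, tolerance, bracketing interval, and sample values of $N$ and $\gamma$ are arbitrary choices made for the illustration.

```python
# Sketch: numerically solve (11) for rho_N by bisection.
# Assumptions: the midpoint rule with 10_000 panels and the bracketing interval
# [2, N**2] are ad hoc choices; gamma and N below are sample values only.
import math

def integral(rho, N, panels=10_000):
    """Midpoint-rule approximation of the integral in (11) over [rho, 2*rho]."""
    h = rho / panels
    total = 0.0
    for i in range(panels):
        x = rho + (i + 0.5) * h
        total += N**2 / (2.0 * x**2 * math.log(x)) * h
    return total

def rho_N(gamma, N, tol=1e-9):
    """Bisection: the integral decreases in rho, so find where it equals gamma."""
    lo, hi = 2.0, float(N) ** 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if integral(mid, N) > gamma:
            lo = mid          # integral still too large: rho_N lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    N, gamma = 1000, 1.0
    r = rho_N(gamma, N)
    print(r, N**2 / (8 * gamma * math.log(N)))  # compare with the lower bound (13)
```

Printing the computed value alongside the lower bound in (13) gives a quick consistency check.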
The proof uses the following technical lemma, which the reader may treat as a warm-up exercise for the more difficult Proposition 5.

Lemma 4. Let $P_N$ denote the set of primes $p$ such that $\rho_N < p \le 2\rho_N$. Let $\{Z_p^k, p \in P_N, 1 \le k \le N\}$ be independent Bernoulli random variables, where $P[Z_p^k = 1] = 1/p$. Take $D_p = Z_p^1 + \cdots + Z_p^N$. Then

$$\lim_{N \to \infty} P\Big[ \bigcup_{p \in P_N} \{D_p \ge 2\} \Big] = 1 - e^{-\gamma}. \qquad (14)$$
Proof. Binomial probabilities give:

$$P[D_p \le 1] = \Big(1 - \frac{1}{p}\Big)^{N} + \frac{N}{p} \Big(1 - \frac{1}{p}\Big)^{N-1} = \Big(1 - \frac{1}{p}\Big)^{N} \Big(1 + \frac{N}{p-1}\Big)$$
$$= \Big(1 - \frac{N}{p} + \frac{N(N-1)}{2p^2} - \cdots \Big) \Big(1 + \frac{N}{p-1}\Big) = 1 - \frac{N^2}{2p^2} + O\big( N \rho_N^{-2} \big) + O\big( (N/\rho_N)^3 \big).$$

Independence of $\{Z_p^k, p \in P_N, 1 \le k \le N\}$ implies independence of $\{D_p, p \in P_N\}$, so

$$\log P\Big[ \bigcap_{p \in P_N} \{D_p \le 1\} \Big] = \sum_{p \in P_N} \log\big( P[D_p \le 1] \big) = \sum_{p \in P_N} \log\Big( 1 - \frac{N^2}{2p^2} \Big) + O\big( N |P_N| \rho_N^{-2} \big) + O\big( N^3 |P_N| \rho_N^{-3} \big).$$

Using the estimates $\rho_N \log \rho_N = O(N^2)$, $|P_N| = O(\rho_N / \log \rho_N)$, and $p/\rho_N \le 2$, the last expression becomes

$$= - \sum_{p \in P_N} \frac{N^2}{2p^2} + O\big( N^2 \rho_N^{-2} \big) + O\Big( \frac{N}{\rho_N \log(\rho_N)} \Big) + O\Big( \frac{N^3}{\rho_N^2 \log(\rho_N)} \Big).$$

All terms but the first vanish in the limit, while the Prime Number Theorem ensures that

$$\lim_{N \to \infty} \sum_{p \in P_N} \frac{N^2}{2p^2} = \gamma.$$

Therefore

$$\lim_{N \to \infty} P\Big[ \bigcap_{p \in P_N} \{D_p \le 1\} \Big] = e^{-\gamma},$$

and the limit (14) follows.
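Lemma 4 lends itself to a direct Monte Carlo check. The sketch below simulates the independent Bernoulli urn model over the primes in $P_N$ and estimates the collision probability, to be compared with $1 - e^{-\gamma}$; here $\rho_N$ is replaced by a crude fixed-point approximation based on $\rho \log \rho \approx N^2/(4\gamma)$, and $N$, $\gamma$ and the number of trials are illustrative values, so the agreement is only approximate at moderate $N$.

```python
# Sketch: Monte Carlo check of the limit (14) under the independent Bernoulli model.
# Assumptions: rho_N is approximated by fixed-point iteration on
# rho * log(rho) = N^2 / (4 * gamma); N, gamma and trials are illustrative values.
import math
import random
from sympy import primerange

def approx_rho(N, gamma):
    rho = N**2 / (4.0 * gamma * math.log(N))
    for _ in range(50):                     # crude fixed-point iteration
        rho = N**2 / (4.0 * gamma * math.log(rho))
    return rho

def collision_probability(N, gamma, trials=1000, rng=random):
    rho = approx_rho(N, gamma)
    primes = list(primerange(int(rho) + 1, int(2 * rho) + 1))   # the set P_N
    hits = 0
    for _ in range(trials):
        # Some D_p >= 2: at least one urn labelled by a prime in P_N gets two balls.
        if any(sum(rng.random() < 1.0 / p for _ in range(N)) >= 2 for p in primes):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    random.seed(0)
    gamma = 1.0
    print(collision_probability(N=100, gamma=gamma), 1 - math.exp(-gamma))
```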
Proof of Proposition 3. According to our model, if $D_p \ge 2$ for some $p = p_i \in P_N$, then there are indices $1 \le k < j \le N$ for which $Z_i^j = 1 = Z_i^k$. Since $\log(p_i) \ge \log(\rho_N(\gamma))$,

$$\lim_{N \to \infty} P\big[ \Psi_N \ge \log(\rho_N(\gamma)) \big] \ge \lim_{N \to \infty} P\Big[ \bigcup_{p \in P_N} \{D_p \ge 2\} \Big] = 1 - e^{-\gamma}.$$

This verifies (12).
4. Application: Pairwise GCDs of Many Uniform Random Integers

We shall now prove an analogue of Lemma 4 which applies to random integers, dropping the independence assumption for the components of the random vector (10).

Proposition 5. Suppose $\alpha > 0$, and $T_1, \ldots, T_N$ is a random sample, drawn with replacement, from the integers $\{n \in \mathbb{N} : n \le e^{\alpha N}\}$. Given $\gamma \in (0, \infty)$, define $\rho_N = \rho_N(\gamma)$ implicitly by the identity (11). Let $P_N$ denote the set of primes $p$ such that $\rho_N < p \le 2\rho_N$; for $p \in P_N$ let $D_p$ denote the number of elements of $\{T_1, \ldots, T_N\}$ which are divisible by $p$. Then

$$\lim_{N \to \infty} P\Big[ \bigcup_{p \in P_N} \{D_p \ge 2\} \Big] = 1 - e^{-\gamma}. \qquad (15)$$
Proof. As noted above, the Prime Number Theorem ensures that

$$\lim_{N \to \infty} \sum_{p \in P_N} \frac{N^2}{2p^2} = \gamma.$$

More generally, the alternating series for the exponential function ensures that there is an even integer $d \ge 1$ such that, given $\epsilon \in (0,1)$, for all sufficiently large $N$,

$$1 - e^{-\gamma/(1+\epsilon)} < \sum_{r=1}^{d} (-1)^{r+1} I_r < 1 - e^{-\gamma/(1-\epsilon)},$$

where, for $\{p_1, \ldots, p_r\} \subset P_N$,

$$I_r = \sum_{p_1 < \cdots < p_r} \frac{N^{2r}}{2^r (p_1 \cdots p_r)^2}, \qquad r = 1, 2, \ldots, d.$$

Because $\rho_N / N^2 \to 0$, it follows that, for every $\{p_1, \ldots, p_d\} \subset P_N$,

$$\frac{p_1 \cdots p_d}{e^{\alpha N}} < \frac{(2\rho_N)^d}{e^{\alpha N}} < e^{2d \log(N) - \alpha N} \to 0.$$

Suppose that, for this constant value of $d$, we fix some $\{p_1, \ldots, p_d\} \subset P_N$; instead of sampling $T_1, \ldots, T_N$ uniformly from integers up to $e^{\alpha N}$, sample $T'_1, \ldots, T'_N$ uniformly from integers up to

$$p_1 \cdots p_d \Big\lfloor \frac{e^{\alpha N}}{p_1 \cdots p_d} \Big\rfloor.$$

From symmetry considerations, the Bernoulli random variables $B'_1, \ldots, B'_d$ are independent, with parameters $1/p_1, \ldots, 1/p_d$, respectively, where $B'_i$ is the indicator of the event that $p_i$ divides $T'_1$. By elementary reasoning,

$$P[D_p \ge 2] = \frac{N^2}{2p^2} + O\big( (N/\rho_N)^3 \big);$$
$$P[D_{p_1} \ge 2, \ldots, D_{p_r} \ge 2] = \frac{N^{2r}}{2^r (p_1 \cdots p_r)^2} + O\big( (N/\rho_N)^{2r+1} \big),$$

for $r = 1, 2, \ldots, d$.

If we were to sample $T_1, \ldots, T_N$ instead of $T'_1, \ldots, T'_N$, the most that such a probability could change is

$$P\Big[ \bigcup_{i=1}^{N} \{T_i \ne T'_i\} \Big] \le \frac{N p_1 \cdots p_d}{e^{\alpha N}} < e^{(2d+1)\log(N) - \alpha N}.$$

The same estimate holds for any choice of $\{p_1, \ldots, p_d\} \subset P_N$. By the inclusion-exclusion formula, taken to the first $d$ terms,

$$P\Big[ \bigcup_{p \in P_N} \{D_p \ge 2\} \Big] \ge \sum_{p \in P_N} P[D_p \ge 2] - \sum_{p_1 < p_2} P[D_{p_1} \ge 2, D_{p_2} \ge 2] + \cdots - \sum_{p_1 < \cdots < p_d} P[D_{p_1} \ge 2, \ldots, D_{p_d} \ge 2]$$
$$= \sum_{r=1}^{d} (-1)^{r+1} I_r + O\big( (N/\rho_N)^3 \big) + \binom{N}{d} e^{(2d+1)\log(N) - \alpha N}.$$

So under this simplified model, the reasoning above combines to show that, for all sufficiently large $N$,

$$1 - e^{-\gamma/(1+\epsilon)} < P\Big[ \bigcup_{p \in P_N} \{D_p \ge 2\} \Big] < 1 - e^{-\gamma/(1-\epsilon)}.$$

Since $\epsilon$ can be made arbitrarily small, this verifies the result.
4.1. Proof of Theorem 1

Suppose $\alpha > 0$, and $T_1, \ldots, T_N$ is a random sample, drawn with replacement, from the integers $\{n \in \mathbb{N} : n \le e^{\alpha N}\}$. Let $\Lambda_{j,k}$ denote the largest common prime factor of $T_j$ and $T_k$. Take

$$\Psi_N = \max_{1 \le k < j \le N} \{ \log(\Lambda_{j,k}) \}.$$

In the language of Proposition 5, if $D_p \ge 2$ for some $p \in P_N$, then there are indices $1 \le k < j \le N$ for which $\Lambda_{j,k} > \rho_N$. So inequality (13) and Proposition 5 imply that, for any $\beta = 8\gamma > 0$,

$$\lim_{N \to \infty} P\big[ \Psi_N \ge 2\log(N) - \log\big(\log(N^{\beta})\big) \big] \ge \lim_{N \to \infty} P\big[ \Psi_N \ge \log(\rho_N(\beta/8)) \big] \ge \lim_{N \to \infty} P\Big[ \bigcup_{p \in P_N} \{D_p \ge 2\} \Big] = 1 - e^{-\beta/8}.$$

This is precisely the lower bound (3). For any $\eta > 0$, the lower bound in (1) follows from:

$$\lim_{N \to \infty} P\big[ \Psi_N > (2 - \eta) \log(N) \big] = 1.$$
Let $\Gamma_{j,k} \ge \Lambda_{j,k}$ denote the Greatest Common Divisor of $T_j$ and $T_k$. To obtain the upper bound (2) on $\Gamma_{j,k}$, it suffices by Proposition 2 to check that condition (5) is valid, when $X_i$ denotes the multiplicity to which prime $p_i$ divides $T_1$. Take any positive integer $r \ge 1$, any prime $p_k$ coprime to $r$, and any $m \ge 1$. The conditional probability that $p_k^m$ divides $T_1$, given that $r$ divides $T_1$, is

$$\frac{\big\lfloor e^{\alpha N} / (r p_k^m) \big\rfloor}{\big\lfloor e^{\alpha N} / r \big\rfloor} \le \Big( \frac{1}{p_k} \Big)^{m}.$$

So condition (5) holds. Thus (8) holds, which is equivalent to (2).
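The counting inequality used here can be checked directly on a finite range: among integers up to $M$, those divisible by $r p_k^m$ number $\lfloor M/(r p_k^m) \rfloor \le \lfloor M/r \rfloor / p_k^m$. A minimal sketch, with arbitrary sample values:

```python
# Sketch: exact-count check of the conditional divisibility bound, for a uniform
# draw T from {1, ..., M}. The sample values of M, r, p, m are arbitrary; r and p
# are taken coprime as in the text.
from math import gcd

def conditional_probability(M, r, p, m):
    """P[p^m divides T | r divides T] for T uniform on {1, ..., M}."""
    assert gcd(r, p) == 1
    return (M // (r * p**m)) / (M // r)

if __name__ == "__main__":
    M, r, p, m = 10**7, 6, 5, 2
    ratio = conditional_probability(M, r, p, m)
    print(ratio, (1 / p) ** m, ratio <= (1 / p) ** m)
```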
Finally we derive the upper bound in (1), for an arbitrary $\eta > 0$. Fix $\delta \in (0,1)$ and $\eta > 0$. Select $s \in (0,1)$ to satisfy $2/s = 2 + \eta/2$. Then choose $b = C_s / \delta$. According to (8),

$$P\big[ \Sigma_N \ge (2 + \eta/2) \log(N) + s^{-1} \log(b) \big] \le \delta/2.$$

For any $N$ sufficiently large so that $(\eta/2) \log(N) > s^{-1} \log(b)$,

$$P\big[ \Sigma_N \ge (2 + \eta) \log(N) \big] \le \delta/2.$$

This yields the desired bound (1).
References

[1] Patrick Billingsley, Convergence of Probability Measures, Wiley, 1999.

[2] Ernesto Cesàro, Étude moyenne du plus grand commun diviseur de deux nombres, Annali di Matematica Pura ed Applicata, 13(2), 233-268, 1885.

[3] Persi Diaconis and Paul Erdős, On the distribution of the greatest common divisor, A Festschrift for Herman Rubin, 56-61, IMS Lecture Notes Monograph Series 45, Institute of Mathematical Statistics, 2004.

[4] John Knopfmacher, Abstract Analytic Number Theory, Dover, New York, 1990.

[5] J. E. Nymann, On the probability that k positive integers are relatively prime, Journal of Number Theory, 4(5), 469-473, 1972.