Article info

Keywords: Graph mining; Social network analysis; Temporal network analysis; Peer-to-peer lending

Abstract

There is an increasing focus on methods for network data analysis that consider temporal aspects of the data. We propose a method of network analysis based on the idea of a time-respecting subgraph composed of paths of consecutive edge activations. We present an algorithm to identify these structures and apply the algorithm to a network comprising data from The Prosper Marketplace, an online peer-to-peer lending system. To examine the flow of funds in the network, we extract time-respecting subgraphs. In the larger time-respecting structures, some members act as both borrowers and lenders, possibly attempting to profit from the difference between interest rates of incoming and outgoing loans. We present an analysis of the distribution of time-respecting structures over the lifetime of The Prosper Marketplace and we examine some structures in detail to show that they do represent arbitrage.

© 2012 Elsevier Ltd. All rights reserved.
0957-4174/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.eswa.2012.12.077
U. Redmond, P. Cunningham / Expert Systems with Applications 40 (2013) 3715–3721
2.1. Temporal network analysis

Although temporal information may be available, much analysis is performed as if the network were static. This yields problems even for basic connectivity. In a static network, given directed edges (u, v) and (v, w), then u is connected indirectly to w via a path through v. The same is not necessarily true in a temporal network context. In this case, if the edge (v, w) activates before (u, v), then u and w are disconnected, since nothing can propagate between them via v.

In much of the literature, network data is segmented into time windows, within which the network is regarded as static. This approach suffers from the connectivity issues mentioned, as well as the difficulty in choosing the time window size correctly. Certain aggregations may fail to capture network properties of interest. For example, in a network of mobile telephone calls, networks aggregated over different time intervals yielded different insights (Krings et al., 2012). To counteract this, the use of sets of window lengths has been proposed, to provide an accurate multi-level view of temporal structure (Lijffijt et al., 2012).

Another problem is that the broader context outside of each time window, which may be extremely relevant, is lost. Also, independent processes may be grouped together in the same time-slice, which may not be useful. Our approach counteracts this effect by, instead of grouping interactions within a time window, grouping interactions that are part of the same process. This means that the broader context of the interactions is captured.

A review of network concepts that include temporal information is provided by Holme and Saramäki (2011). Some of the key definitions are presented here. In a temporal graph, Kempe et al. (2002) define a time-respecting path as a sequence of contacts with non-decreasing times that connect sets of vertices. According to Nicosia et al. (2012), two vertices i and j are strongly connected if there is a directed, time-respecting path connecting i to j and vice versa. Vertices i and j are weakly connected if there are undirected time-respecting paths from i to j and vice versa. In this case, the directions of the contacts are not observed (Nicosia et al., 2012).

In a reachability graph, a directed edge exists between vertices i and j if there is a time-respecting path between them. The algorithm supplied by Moody (2002) to identify these structures reveals the vertices which are reachable from each other. Bearman et al. (2004) analyze a reachability graph constructed from a dating network of high-school students. Another interpretation of this type of graph is the associated influence digraph of a time-stamped graph, which encodes the ability of vertices to influence, or reach, each other (Cheng et al., 2003). Dynamic reachability sets have also been introduced (Macropol and Singh, 2012). In this formulation, the reachability set of a node is found by traversing edges outward from the node, with each step incrementally later in time. The edges in the set must occur within a given time interval.

In the formulation of Zhao et al. (2010), the lifespan of a piece of information is defined as the time between the end of one communication and the beginning of another. In another work, the relay time of an edge is defined as the time taken for a newly infected node to further spread the infection via the next interaction that the link participates in (Kivelä et al., 2012). Thus, limiting the time allowed between interactions is an important concept in a variety of network types.

Correspondingly in this work, we require the time delay between contacts on a time-respecting path not to exceed a threshold \(\delta\). To see why, consider a shortest path between vertices i and j via a vertex k. The centrality of the vertex k is vulnerable, in that it depends on the interval between the communication from i to k and from k to j. The longer this interval, the higher the chance that the information intended for j will be disrupted (Tang et al., 2010). Also, although it is possible that there is no causality relationship between any two adjacent communications, it appears that the closer in time they take place, the more likely they are to be about the same topic (Zhao et al., 2010).

2.2. Peer-to-peer lending

Peer-to-peer lending provides an online platform for the lending of money directly to borrowers, without the intermediation of traditional banks. There is not necessarily a prior relationship between lenders and borrowers, although members may join groups with similar affiliations or interests. Lenders choose loans in which to invest, based on the credit profile of borrowers and the potential return on investment.

Social lending among such peers is also influenced by network effects. Since there is very little contact between lenders and borrowers, the problem of information asymmetry is a factor. A borrower may offset this by being as descriptive as possible in their loan listing and joining a group (Lerner et al., 2011). Lenders also choose borrowers based on their friendship network and their endorsements from other members (Lin et al., 2009). This selection criterion sometimes leads to irrational behaviour, with lenders funding the loans of borrowers who have a higher risk of default. It is interesting to see from Lin et al. (2009) that members acting as both borrowers and lenders tend to outperform pure borrowers on loan repayment.

3. Theory

There have been varied approaches to the representation of temporal networks in studies thus far, as noted in Section 2. We choose the following definition, since it maintains the original topology of the network, while incorporating the available temporal information.

Definition 1. A directed temporal graph G consists of a set V of vertices and a set E of ordered pairs of vertices representing interactions. An interaction \(e_i\) in E is represented by a four-tuple \(e_i = (u_i, v_i, t_i, d_i)\), in which \(u_i\) is the source vertex, \(v_i\) is the target vertex, \(t_i\) is the initiation time of the interaction and \(d_i\) is the duration of the interaction.

Thus, time is encoded as an explicit part of the representation. In this setting, the terms edge, interaction and event may be understood to mean the same thing. Each interaction begins at a given time, and lasts for a given duration. This allows us to generalize to many applications, such as in telephone contact networks (in which the durations of calls differ), or in networks of epidemics (in which a disease may render an individual infectious for variable amounts of time).

A path in this context is only meaningful if composed of edges whose activations follow each other in time. Thus, we introduce the notion of time-respecting edge pairs, which are the building blocks of larger time-respecting graph structures, such as paths and subgraphs. Fig. 1 illustrates the definition.

Fig. 1. A pair of edges \((e_i, e_j)\) which are time-respecting. Vertex \(u_i\) initiates an interaction at time \(t_i\). The interaction concludes after an amount of time \(d_i\), at time \(t_i + d_i\). After some time delay \(\delta\), interaction \(e_j\) begins at time \(t_j\). It can be seen that \(0 \le t_j - (t_i + d_i) \le \delta\). Thus, \(0 \le t_j - t_i - d_i \le \delta\).
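As a minimal sketch of the four-tuple representation and of the inequality in the caption of Fig. 1, an interaction can be held as a small record and the time-respecting test written as a single predicate. The names `TemporalEdge`, `is_time_respecting_pair` and `delta` are illustrative assumptions rather than anything taken from the paper, and the sample edges are invented.

```python
from typing import NamedTuple


class TemporalEdge(NamedTuple):
    """One interaction e_i = (u_i, v_i, t_i, d_i); field names are assumptions."""
    source: str      # u_i
    target: str      # v_i
    start: float     # t_i, initiation time
    duration: float  # d_i


def is_time_respecting_pair(ei: TemporalEdge, ej: TemporalEdge, delta: float) -> bool:
    """True if ej leaves ei's target vertex and 0 <= t_j - t_i - d_i <= delta."""
    gap = ej.start - ei.start - ei.duration
    return ei.target == ej.source and 0 <= gap <= delta


# Invented example data.
e1 = TemporalEdge("u", "v", start=1, duration=2)  # ends at time 3
e2 = TemporalEdge("v", "w", start=5, duration=0)  # starts 2 units after e1 ends
e3 = TemporalEdge("v", "w", start=2, duration=0)  # starts before e1 has ended

print(is_time_respecting_pair(e1, e2, delta=3))  # True
print(is_time_respecting_pair(e1, e3, delta=3))  # False
```

The second call mirrors the connectivity argument of Section 2.1: an edge (v, w) that activates before (u, v) has finished cannot carry anything from u to w.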
Definition 2. Let \(e_i\) and \(e_j\) be edges in a temporal graph. The edges are time-respecting if \(v_i = u_j\) and \(0 \le t_j - t_i - d_i \le \delta\), for some threshold \(\delta\).

A time-delay threshold between interactions, \(\delta\), is incorporated to model a variety of real-world scenarios. For example, in an epidemic network, an individual is infectious for some time after the contact which resulted in their infection, during which time another contact may spread the infection further. In a communication network, a piece of information may be propagated further some time after it has been received. Various models for waiting times between two consecutive interactions have been proposed, including exponential and power law (Barabasi, 2005). For the purposes of this work, we specify a constant value.

To simplify our theoretical development, we set to zero the value of \(d_i\) for each edge. This implies that each interaction is instantaneous. This is the correct interpretation for scenarios such as financial transactions between individuals, which we will see more of later.

Definition 3. Let v and w be two vertices in a temporal graph. A directed time-respecting path between v and w in the graph is a finite alternating sequence \(v = v_0, e_1, v_1, e_2, \ldots, e_n, v_n = w\) of non-repeating vertices and edges of the graph such that each pair of adjacent edges is time-respecting.

A path composed of time-respecting edge pairs is illustrated in Fig. 2. Each edge is labeled with the day on which the interaction took place. This path describes a non-decreasing sequence of edge activations, as introduced in Pan and Saramäki (2011).

Fig. 2. A time-respecting path, each edge with time labels, and \(\delta = 5\).

A motivating example of a time-respecting subgraph in a temporal graph is illustrated in Fig. 3. Each directed path in the subgraph is time-respecting. Where the paths intersect at a vertex, any incoming edges occur before any outgoing edges.

Fig. 3. A time-respecting subgraph, each edge with time labels, and \(\delta = 5\).

Definition 4. A time-respecting subgraph \(S = (V', E')\) of a temporal graph \(G = (V, E)\) is composed of a vertex set \(V' \subseteq V\), and a set of interactions \(E' \subseteq E\) such that every edge pair \((e_i, e_j)\) in which \(v_i = u_j\) is time-respecting.

From a reachability point of view, it can be seen in Fig. 3 that from any vertex in the time-respecting subgraph, at least one of the vertices on the rightmost frontier can be reached via a time-respecting path. The lengths of the paths that comprise a time-respecting subgraph are not constrained by their overall duration (Pan and Saramäki, 2011), but rather by the duration of edge pairs along each path, so that in theory a path may originate at the graph's inception, and terminate at the last interaction. This avoids effectively time-slicing the graph, as in the case of the Dynamic Reachability Set (DRS) (Macropol and Singh, 2012). In the DRS setting, a threshold \(\Delta\) specifies the longest permitted duration of a path in a set of vertices reachable from a given source vertex. In a DRS, consecutive events must occur in integer increments, with no further delay time allowed between the interactions. In our model, the encoding of the duration of an interaction and an inter-interaction waiting time allows for more general application. Henceforth, for ease of exposition, we abbreviate time-respecting path and time-respecting subgraph to path and subgraph, respectively.

3.1. Algorithm

The search for subgraphs in a temporal network is composed of repeated applications of a breadth-first-search (BFS) on the edges, as outlined in Algorithm 1. The search starts with an edge not part of a previously found subgraph, and expands from each edge e via out-edges from the target vertex of e which obey the time-respecting property, and via in-edges to the same target which happened at the same time. We allow the expansion via in-edges since these will also be time-respecting when compared with out-edges found in that step. These in-edges also allow for correct merging within the BFS, as illustrated in Fig. 4.

After all subgraphs have been found from repeated applications of BFS, a final merge step is enacted. This is required since each BFS begins with a single edge, while a subgraph may be initiated by an individual interacting with many other individuals within a short space of time. Hence, any nodes on the node-frontier of the subgraphs that match, and whose out-edges occur within time \(\delta\) of each other, will lead to the merging of the subgraphs they inhabit. The remaining subgraphs are maximal (see Fig. 5).

A BFS may incorporate edges which have already been included in another subgraph, with which a merge is not permitted. Allowing an interaction to reside in multiple internally consistent subgraphs is important, since the individual initiating such an interaction may have been influenced by an interaction in any one of those subgraphs. Fig. 6 illustrates this situation. These two subgraphs should not be merged, since the in-edge at time 3 happens after the out-edge at time 2. The correct behaviour emerges since a BFS from the edge at time 3 will not merge with the first subgraph found, since the edge encountered (at time 4) is not on the edge-frontier of that subgraph.

Fig. 4. The merge step, with \(\delta = 2\). A BFS starts at \(e_a\), and finds \(e_b\) via the out-edges from \(v_a\). From \(e_b\), \(e_c\) is found via the in-edges to \(v_b\). The next iteration of BFS starts at \(e_d\), and finds \(e_c\) via the out-edges from \(v_d\). Since \(e_c\) is on the edge-frontier of the first subgraph, the two subgraphs are merged.
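The edge-based search described in Section 3.1 could be sketched roughly as follows. This is a simplified, assumption-laden illustration: it runs a single BFS from one seed edge and deliberately omits both the merge with previously found subgraphs and the final merge step. The names `Edge`, `bfs_subgraph` and the sample edges are hypothetical, and edges are assumed to be distinct tuples.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    target: str
    start: float
    duration: float = 0.0  # instantaneous by default, as for financial transactions


def bfs_subgraph(edges, seed, delta):
    """Collect the edges reached from `seed` by time-respecting expansion.

    Simplification: no merging with other subgraphs is attempted.
    """
    out_by_vertex = defaultdict(list)
    in_by_vertex = defaultdict(list)
    for e in edges:
        out_by_vertex[e.source].append(e)
        in_by_vertex[e.target].append(e)

    found = {seed}
    queue = deque([seed])
    while queue:
        e = queue.popleft()
        end = e.start + e.duration
        # Out-edges of e's target that begin within delta of e's end.
        candidates = [o for o in out_by_vertex[e.target] if 0 <= o.start - end <= delta]
        # In-edges to the same target that occur at the same time (gap of zero).
        candidates += [i for i in in_by_vertex[e.target] if i.start == end]
        for c in candidates:
            if c not in found:
                found.add(c)
                queue.append(c)
    return found


# Invented example data.
a = Edge("p", "q", 1)
b = Edge("q", "r", 2)
c = Edge("s", "r", 2)    # in-edge to r at the same time as b
d = Edge("r", "t", 10)   # too late when delta = 2
e = Edge("q", "x", 0)    # happens before a, so never reached from a
graph = [a, b, c, d, e]

print(bfs_subgraph(graph, a, delta=2) == {a, b, c})  # True
```

Indexing edges by source and target vertex up front keeps each expansion step proportional to the degree of the vertex being visited rather than to the size of the whole edge list.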
Algorithm 1. Find maximal time-respecting subgraph (G, e, δ)

function bfs_augmented(G, e, δ)
    q ← e, s ← e
    while q is non-empty do
        t ← q.dequeue()
        adj_edges ← get_out_edges(G, t, δ) + get_in_edges(G, t)
        for all o ∈ adj_edges, o ∉ s do
            if o ∈ edge-frontier of some subgraph r then
                s ← merge(s, r)
            else
                q ← o, s ← o
            end if
        end for
    end while
    return s
end function

function get_out_edges(G, e_i, δ)
    out_edges ← ∅, v_i ← e_i.target()
    for all e_j ∈ G.out_edges(v_i) do
        if 0 ≤ t_j − t_i − d_i ≤ δ then
            out_edges ← e_j
        end if
    end for
    return out_edges
end function

function get_in_edges(G, e_i)
    in_edges ← ∅, v_i ← e_i.target()
    for all e_j ∈ G.in_edges(v_i) do
        if 0 ≤ t_j − t_i − d_i ≤ 0 then
            in_edges ← e_j
        end if
    end for
    return in_edges
end function

4. Methods

This section introduces the data set which we examined using the methods described in Section 3. We also outline the implementation framework under which the results were gathered.

4.1. The Prosper Marketplace

The Prosper Marketplace (henceforth Prosper) opened to the public in February 2006. It closed for regulatory reasons in November 2008 but relaunched in July 2009. The Prosper website provides a nightly snapshot of all data pertaining to listings, bids, users, groups and loans, in order to facilitate the statistical analysis of the system. As of September 2011, there were 8 916 105 bids on 401 180 listings between 1 207 418 members. Of those listings, 43 576 were accepted as loans.

A member of the Prosper system is a registered user, who may have roles including that of borrower, lender, group leader or trader. A borrower creates a listing in order to solicit bids. If enough bids are received to reach the amount requested, the listing becomes a loan after the listing period ends. A lender creates a bid, specifying an amount and a minimum rate required, should the bid win the auction and the listing become a loan. The possible statuses of a loan include current, late, paid, defaulted upon and cancelled. For a further discussion on the institutional background of social lending on Prosper, the interested reader is referred to Lin et al. (2009).

4.2. Network data

A network composed of Prosper members and their loans may be constructed. The vertex set consists of members who received or contributed to loans. The edge set captures the flow of funds from members who acted as lenders to those who acted as borrowers for the purpose of the transaction. The edge is time-stamped with the origination date of the loan, which marks the time when the borrower received funds and amortization began. For further analysis, also included in the edge data is the status, amount and grade or rating of the loan, along with the lender rate and borrower rate of interest. The network takes the form of a directed graph with parallel edges, but no self-loops.

The data we analyze comes from transactions occurring from November 2005 to September 2011. Since the Prosper system was temporarily closed for regulatory reasons, we have split the data set into a pre- and post-closure network. Before the closure there were 1 995 399 edges and 72 334 vertices. Afterwards, there were 1 399 580 edges and 32 191 vertices.

In order to present a meaningful comparison, without seasonal variation, between behaviour before and after the temporary closure, we selected a calendar year of activity from each time frame from which to begin the algorithm. The algorithm explored the network via BFS until termination, which may have occurred at any date within the overall time frame. This allows each time-respecting subgraph to represent an entire process, without being artificially restricted to a given time window. We compare 2007 (169 162 edges) with 2010 (91 435 edges), so as to avoid the initial periods in which the user bases were not yet firmly established.
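The data preparation described in Section 4.2 might be sketched as below: loans become time-stamped lender-to-borrower edges, the edge list is split around the closure, and a calendar year supplies the seed edges for the search. Everything here is an illustrative assumption. The `LoanEdge` fields, the helper names and the sample loans are invented, and the two boundary dates are placeholders, since the text gives only the months of closure and relaunch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LoanEdge:
    lender: str       # source vertex
    borrower: str     # target vertex
    originated: date  # origination date of the loan (the edge time-stamp)
    amount: float


# Assumed day-level boundaries; only the months are stated in the text.
CLOSED = date(2008, 11, 1)
RELAUNCHED = date(2009, 7, 1)


def split_by_closure(edges):
    """Return (pre-closure edges, post-closure edges)."""
    pre = [e for e in edges if e.originated < CLOSED]
    post = [e for e in edges if e.originated >= RELAUNCHED]
    return pre, post


def seeds_for_year(edges, year):
    """Edges originating in `year`, used only as starting points for the BFS."""
    return [e for e in edges if e.originated.year == year]


# Invented loans; the first two form a pair of parallel edges.
loans = [
    LoanEdge("m1", "m2", date(2007, 3, 14), 50.0),
    LoanEdge("m1", "m2", date(2007, 3, 20), 25.0),
    LoanEdge("m2", "m3", date(2008, 1, 5), 100.0),
    LoanEdge("m4", "m1", date(2010, 6, 1), 75.0),
]

pre, post = split_by_closure(loans)
print(len(pre), len(post))             # 3 1
print(len(seeds_for_year(pre, 2007)))  # 2
```

Restricting only the seeds to a calendar year, rather than the edges the search may visit, matches the stated aim of letting a subgraph extend to any date within the overall time frame.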
Fig. 5. The final merging step, with \(\delta = 2\). Given that \(u_a\) and \(u_b\) are the same node, and that their out-edges occur within time \(\delta\) of each other, their subgraphs are merged.

Fig. 6. A temporal graph, in which two subgraphs are found at \(\delta = 3\). The edge at time 4 is incorporated into both, since it is time-respecting when compared with both the edge at time 1 and that at time 3.

4.3. Implementation

To facilitate our later analysis of arbitrage strategies in social lending, we select a threshold for the time-delay between interactions which is meaningful in this context. Since the first repayment on a loan is due one month after the origination date, we require that reinvestment occurs before this month has elapsed. This makes it more likely that the member will cover the cost of borrowing with the money earned on investment. Since transactions occur instantaneously, our value for \(d_i\) is set to zero.

The Python programming language (Python Software Foundation, 2012), which provides the NetworkX (NetworkX Developers, 2010) and matplotlib (Hunter et al., 2011) libraries, was used to generate
5. Results

Given that many loans which originated in 2010 are still current, the strategies analyzed here are from before the closure. In

Table 1. This table shows a winning strategy from before the closure, in which the arbitrageur gained $218.40, given a return on investment of $1237.27 and a borrowing cost of $1018.87.
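The gain quoted in the caption is the return on investment less the borrowing cost. A quick check of that arithmetic, using a hypothetical helper name and exact decimal values to avoid floating-point rounding:

```python
from decimal import Decimal


def arbitrage_gain(return_on_investment: str, borrowing_cost: str) -> Decimal:
    """Net gain = money earned from lending minus the cost of borrowing."""
    return Decimal(return_on_investment) - Decimal(borrowing_cost)


gain = arbitrage_gain("1237.27", "1018.87")
print(gain)  # 218.40
```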
References

Barabasi, A. (2005). The origin of bursts and heavy tails in human dynamics. Nature, 435, 207–211.
Bearman, P., Moody, J., & Stovel, K. (2004). Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology, 110, 44–91.
Cheng, E., Grossman, J. W., & Lipman, M. J. (2003). Time-stamped graphs and their associated influence digraphs. Discrete Applied Mathematics, 128, 317–335.
Gephi. (2012). <Gephi.gephi.org>.
Greene, D., Doyle, D., & Cunningham, P. (2010). Tracking the evolution of communities in dynamic social networks. In N. Memon & R. Alhajj (Eds.), International conference on advances in social networks analysis and mining (ASONAM 2010) (pp. 176–183). IEEE Computer Society.
Holme, P., & Saramäki, J. (2011). Temporal networks. CoRR, abs/1108.1780.
Hunter, J., Dale, D., & Droettboom, M. (2011). Matplotlib. <http://www.matplotlib.sourceforge.net>.
Kempe, D., Kleinberg, J., & Kumar, A. (2002). Connectivity and inference problems for temporal networks. Journal of Computer and System Sciences, 76.
Kivelä, M., Pan, R. K., Kaski, K., Kertész, J., Saramäki, J., & Karsai, M. (2012). Multiscale analysis of spreading in a large communication network. Journal of Statistical Mechanics: Theory and Experiment, 3, P03005.
Krings, G., Karsai, M., Bernhardsson, S., Blondel, V., & Saramäki, J. (2012). Effects of time window size and placement on the structure of aggregated networks. EPJ Data Science, 1.
Lerner, J., Brandes, U., & Nick, B. (2011). Network effects on interest rates in online social lending. In Proceedings from Sunbelt XXXI.
Lijffijt, J., Papapetrou, P., & Puolamäki, K. (2012). Size matters: Finding the most informative set of window lengths. In ECML/PKDD.
Lin, M., Prabhala, N. R., & Viswanathan, S. (2009). Judging borrowers by the company they keep: Social networks and adverse selection in online peer-to-peer lending. SSRN eLibrary.
Macropol, K., & Singh, A. (2012). Reachability analysis and modeling of dynamic event networks. In ECML/PKDD (Vol. 1).
Moody, J. (2002). The importance of relationship timing for diffusion. Social Forces, 81, 25–56.
NetworkX Developers. (2010). NetworkX.
Nicosia, V., Tang, J., Musolesi, M., Russo, G., Mascolo, C., & Latora, V. (2012). Components in time-varying graphs. CoRR, abs/1106.2134.
Pan, R. K., & Saramäki, J. (2011). Path lengths, correlations, and centrality in temporal networks. CoRR, abs/1101.5913v2.
Prosper Marketplace. (2012). Prosper.
Python Software Foundation. (2012). Python.
Redmond, U., Harrigan, M., & Cunningham, P. (2012). Mining dense structures to uncover anomalous behaviour in financial network data. Modeling and Mining Ubiquitous Social Media, 60–76.
Tang, J., Musolesi, M., Mascolo, C., Latora, V., & Nicosia, V. (2010). Analysing information flows and key mediators through temporal centrality metrics. In Proceedings of the third workshop on social networks systems (Vol. 3).
Zhao, Q., Tian, Y., He, Q., Oliver, N., Jin, R., & Lee, W. C. (2010). Communication motifs: A tool to characterize social communications. In Proceedings of the 19th ACM international conference on information and knowledge management (p. 1645).

Fig. 12. The time-respecting ego-centric subgraph of the losing strategy presented in Table 2. The day on which each loan originated is displayed on each edge.

or for which payment is so late that any future repayment is assumed non-forthcoming.) This is not an unexpected outcome, since the arbitrageur has chosen borrowers with very high interest rates, caused by their very low credit grade. Fig. 12 illustrates the part of a time-respecting subgraph which represents this arbitrage attempt.

5.5. Reasons for the decline in arbitrage attempts

In the early stages of peer-to-peer lending, people were excited about the potential for earning big returns through arbitrage. As it turned out, a significant amount of effort was involved for a relatively small return. Success was not likely, since borrowers could default at will. This fact is illustrated in Fig. 11. By the time Prosper was relaunched, it had already been established that social lending arbitrage was not lucrative, so members partook less. This is also reflected in the decline in the sizes of subgraphs, whose backbone is composed of members who may be attempting arbitrage.

6. Conclusions

This paper has developed the idea of examining a network with temporal information as an explicit property of the edges. This approach maintains the original network topology, unlike other methodologies which slice the network in various manners.

In order to explore sections of the network which facilitate flow, we defined a time-respecting subgraph, composed of sequences of
consecutive edge activations. We presented an algorithm to iden-