The inhomogeneous evolution of subgraphs and cycles in complex networks

Subgraphs and cycles are often used to characterize the local properties of complex networks. Here we show that the subgraph structure of real networks is highly time dependent: as the network grows, the density of some subgraphs remains unchanged, while the density of others increases at a rate that is determined by the network's degree distribution and clustering properties. This inhomogeneous evolution process, supported by direct measurements on several real networks, leads to systematic shifts in the overall subgraph spectrum and to an inevitable overrepresentation of some subgraphs and cycles.

Subgraphs, representing a subset of connected vertices in a graph, provide important information about the structure of many real networks. For example, in cellular regulatory networks feed-forward loops play a key role in processing regulatory information [1], while in protein interaction networks highly connected subgraphs represent evolutionarily conserved groups of proteins [2]. In a similar vein, cycles, a special class of subgraphs, offer evidence for autonomous behavior in ecosystems [3], cyclical exchanges give stability to social structures [4], and cycles contribute to reader orientation in hypertext [5]. Finally, understanding the nature and frequency of cycles is important for uncovering the equilibrium properties of various network models [6].
Motivated by these practical and theoretical questions, a series of statistical tools has recently been introduced to evaluate the abundance of subgraphs [1,2,7] and cycles [8,9,10,11], offering a better description of a network's local organization. Yet most of these methods were designed to capture the subgraph structure of a specific snapshot of a network, that is, to characterize static graphs. Most real networks, however, are the result of a growth process and continue to evolve in time [12]. While growth often leaves some of the network's global features unchanged, it does alter its local, subgraph-based structure, potentially modifying everything from subgraph densities to cycle abundance. The currently available statistical methods cannot anticipate or describe such changes.
In this paper we show that during growth the subgraph structure of complex networks undergoes a systematic reorganization. We find that the evolution of the relative subgraph and cycle abundance can be predicted from the degree distribution $P(k)$ and the degree-dependent average clustering coefficient $C(k)$. The results indicate that the subgraph composition of complex networks changes in a very inhomogeneous manner: while the density of many subgraphs is independent of the network size, they coexist with a class of subgraphs whose density increases at a subgraph-dependent rate as the network expands. Therefore in the thermodynamic limit a few subgraphs will be highly overrepresented [1], a prediction that is supported by direct measurements on a number of real networks for which time-resolved network topologies are available. This finding questions our ability to characterize networks based on the subgraph abundance obtained from a single topological snapshot. We show that a combined understanding of network evolution and subgraph abundance offers a more complete picture.

Subgraphs:
We consider subgraphs with $n$ vertices and $n - 1 + t$ edges, in which a central vertex has links to $n - 1$ neighbors, which in turn have $t$ links among themselves (Fig. 1a). The total number of $n$-node subgraphs that can pass through a node of degree $k$ is $\binom{k}{n-1}$. Each of these $n$-node subgraphs can have at most $n_p = (n-1)(n-2)/2$ edges between the $n - 1$ neighbors of the central node. The probability that there is an edge between two neighbors of a degree-$k$ vertex is given by the clustering coefficient $C(k)$. Therefore, the probability of obtaining $t$ connected pairs and $n_p - t$ disconnected pairs is given by the binomial distribution of $n_p$ trials with success probability $C(k)$. The expected number of $(n, t)$ subgraphs in the network is obtained after averaging over the degree distribution, resulting in

$$N_{nt} = g_{nt}\, N \sum_{k=1}^{k_{\max}} P(k) \binom{k}{n-1} \binom{n_p}{t}\, C(k)^{t}\, [1 - C(k)]^{n_p - t}, \tag{1}$$

where $k_{\max}$ is the maximum degree and the geometric factor $g_{nt}$ takes into account that the same subgraph can have more than one central vertex. For instance, a triangle will be counted three times since each vertex is connected to the others, therefore $g_{31} = 1/3$.

For networks where $P(k) \sim k^{-\gamma}$ and $C(k) = C_0 k^{-\alpha}$, where $\gamma$ and $\alpha$ are the degree distribution and clustering hierarchy exponents, in the thermodynamic limit $k_{\max} \to \infty$ Eq. (1) predicts the existence of two subgraph classes [7]:

$$\frac{N_{nt}}{N} \sim \begin{cases} C_0^{t}\, k_{\max}^{\,n - \gamma - \alpha t}, & n - \gamma - \alpha t > 0 \quad \text{(Type I)}, \\ C_0^{t}, & n - \gamma - \alpha t < 0 \quad \text{(Type II)}. \end{cases} \tag{2}$$

Therefore, for the Type I subgraphs the density $N_{nt}/N$ increases with increasing network size, while $N_{nt}/N$ is independent of $N$ for Type II subgraphs.

In the following we provide direct evidence for the two subgraph types in several real networks for which data at varying network sizes are available: the co-authorship network of mathematical publications [13], the autonomous system representation of the Internet [14,15], and the semantic web of English synonyms [16]. In each of these networks the maximum degree increases as $k_{\max} \sim N^{\delta}$. We estimated $\delta$ from the scaling of the degree distribution moments with the graph size, $\langle k^{n} \rangle \sim N^{\delta(n + 1 - \gamma)}$, with $n = 2, 3, 4$.
Furthermore, we find that the prefactor $C_0$ in $C(k) = C_0 k^{-\alpha}$ also depends on the network size as $C_0 \sim N^{\theta}$, where $\theta$ can be estimated using $C_0 = \sum_{k \ge 2} C(k) / \sum_{k \ge 2} k^{-\alpha}$, which gives a better estimate than a direct fit of $C(k)$. The exponents characterizing each network are summarized in Table I. In Fig. 2 we show the density of all five-vertex subgraphs ($n = 5$) as a function of $t$. For the Internet and language networks $C_0$ increases with $N$, therefore the density increases with the network size for all subgraphs. This consequence of the non-stationarity of the clustering coefficient is removed by normalizing $N_{nt}$ by $C_0^{t}$. For the co-authorship graph with $\alpha = 0$ (Table I), only Type I subgraphs are observed, as predicted by (2). In contrast, for the Internet and semantic networks $\alpha > 0$, therefore the overrepresented Type I phase is expected to end approximately at the phase boundary predicted by (2). Indeed, to the left of the arrow denoting the $n - \gamma - \alpha t = 0$ phase boundary we continue to observe a systematic increase in $N_{5t}/(N C_0^{t})$, as expected for Type I subgraphs. In contrast, beyond the phase boundary the subgraph densities obtained for different network sizes are independent of $N$, collapsing into a single curve.
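The two regimes can be checked numerically by evaluating the degree-averaged binomial count behind Eq. (1) at two cutoffs. The following is a minimal sketch, not the authors' code: it assumes a pure power law $P(k) \propto k^{-\gamma}$ on $k = 1, \dots, k_{\max}$, clips $C(k) = C_0 k^{-\alpha}$ at 1 so that it remains a probability, and sets $g_{nt} = 1$. The function names and all parameter values are hypothetical and are not taken from Table I.

```python
from math import comb


def power_law_pk(gamma: float, k_max: int) -> list[float]:
    """Normalized P(k) proportional to k^-gamma for k = 1..k_max (index 0 unused)."""
    weights = [0.0] + [k ** -gamma for k in range(1, k_max + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


def subgraph_density(n, t, gamma, alpha, c0, k_max, g_nt=1.0):
    """N_nt / N from the degree-averaged binomial count described in the text."""
    n_p = (n - 1) * (n - 2) // 2           # pairs among the n-1 neighbors
    if not 0 <= t <= n_p:
        raise ValueError("t must lie between 0 and n_p")
    pk = power_law_pk(gamma, k_max)
    total = 0.0
    for k in range(1, k_max + 1):
        ck = min(1.0, c0 * k ** -alpha)    # assumption: clip C(k) at 1
        ways = comb(k, n - 1)              # choices of n-1 neighbors (0 if k < n-1)
        total += pk[k] * ways * comb(n_p, t) * ck ** t * (1.0 - ck) ** (n_p - t)
    return g_nt * total


# Hypothetical parameters: n - gamma - alpha*t = 2 > 0, so Type I is expected.
type_one = [subgraph_density(5, 1, 2.5, 0.5, 0.5, k_max) for k_max in (1000, 10000)]
# Hypothetical parameters: n - gamma - alpha*t = -3.6 < 0, so Type II is expected.
type_two = [subgraph_density(5, 6, 2.6, 1.0, 0.5, k_max) for k_max in (1000, 10000)]

print("Type I ratio :", type_one[1] / type_one[0])
print("Type II ratio:", type_two[1] / type_two[0])
```

Under these assumptions the first ratio grows roughly as the cutoff ratio raised to the power $n - \gamma - \alpha t$, while the second stays close to 1, which is the size independence expected beyond the phase boundary.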
We also compared our predictions with direct counts in a growing deterministic network model [17], characterized by a degree exponent $\gamma = 1 + \ln 3 / \ln 2 \approx 2.6$ and a degree-dependent clustering coefficient $C(k) = C_0 k^{-\alpha}$, with $C_0 = 2$ and $\alpha = 1$. In Fig. 2d we show the number of $(n = 5, t)$ subgraphs for different values of $t$ and graph sizes. The arrow indicating the predicted phase transition point $n - \gamma - \alpha t = 0$ clearly separates the Type I from the Type II subgraphs, a numerical finding that is supported by exact calculations as well. Note that only one Type II $n = 5$ subgraph is present in the deterministic network, due to its particular evolution rule.
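For the deterministic model the position of the phase boundary follows directly from the sign of $n - \gamma - \alpha t$. The short sketch below tabulates that sign for $n = 5$ using the exponents quoted above. The function names are illustrative, and the marginal case of an exactly zero exponent is lumped with Type II purely for simplicity. The classification says nothing about which subgraphs actually occur in the model, which depends on its evolution rule.

```python
from math import log


def scaling_exponent(n: int, t: int, gamma: float, alpha: float) -> float:
    """Exponent of k_max in the predicted density: n - gamma - alpha * t."""
    return n - gamma - alpha * t


def subgraph_type(n: int, t: int, gamma: float, alpha: float) -> str:
    """'I' if the density grows with k_max, 'II' otherwise (zero counted as 'II')."""
    return "I" if scaling_exponent(n, t, gamma, alpha) > 0 else "II"


gamma_model = 1 + log(3) / log(2)    # degree exponent quoted for the model
alpha_model = 1.0                    # clustering exponent quoted for the model
n = 5
n_p = (n - 1) * (n - 2) // 2         # t ranges over 0..n_p
types = [subgraph_type(n, t, gamma_model, alpha_model) for t in range(n_p + 1)]

for t, kind in enumerate(types):
    print(f"t = {t}: exponent = {scaling_exponent(n, t, gamma_model, alpha_model):+.3f}, Type {kind}")
```

With these values the exponent changes sign between $t = 2$ and $t = 3$, so the predicted boundary for five-vertex subgraphs lies there.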
Cycles: The formalism developed above can be generalized to predict cycle abundance as well. Consider the set of centrally connected cycles shown in Fig. 1b. If the central vertex has degree $k$, we can form $\binom{k}{h-1}$ different groups of $h$ vertices, each consisting of the central vertex and $h - 1$ vertices selected from its $k$ neighbors. Each ordering of the $h - 1$ selected neighbors corresponds to a different cycle, therefore we multiply by half the number of their permutations, $(h - 1)!/2$ (since 123 is the same as 321). Finally, to obtain the number of $h$-cycles we multiply the result by the probability of having $h - 2$ edges between consecutive neighbors, $C(k)^{h-2}$, and sum over the degree distribution $P(k)$, finding

$$N_h = g_h\, N \sum_{k=1}^{k_{\max}} P(k) \binom{k}{h-1} \frac{(h-1)!}{2}\, C(k)^{h-2}, \tag{3}$$

where $g_h$ is again a geometric factor correcting for multiple counting of the same cycle. Note that (3) represents a lower bound for the total number of $h$-cycles, which also includes cycles without a central vertex. Depending on the values of $h$, $\gamma$ and $\alpha$, the sum in (3) may converge or diverge in the limit $k_{\max} \to \infty$. When it converges, the density of $h$-cycles is independent of $N$ (Type II), otherwise it grows with $N$ (Type I). Since in preferential attachment models without clustering the density of $h$-cycles decreases with increasing $N$ [18], we conclude that clustering is the essential feature that gives rise to the observed high $h$-cycle number in real networks such as the Internet [8]. To further characterize the cycle spectrum, we need to distinguish two different cases, $0 < \alpha < 1$ and $\alpha \ge 1$.

$0 < \alpha < 1$: In the $k_{\max} \to \infty$ limit the cycle density follows

$$\frac{N_h}{N} \sim \begin{cases} C_0^{h-2}, & h < h_c, \\ C_0^{h-2}\, k_{\max}^{(1-\alpha)(h - h_c)}, & h > h_c, \end{cases} \tag{4}$$

where $h_c = (\gamma - 2\alpha)/(1 - \alpha)$. Therefore, large cycles ($h > h_c$) are abundant, their density growing with the network size $N$. As $\alpha \to 1$ the threshold $h_c \to \infty$, therefore the range of $h$ for which the density is size independent expands significantly. Direct calculations using (3) show that $N_h$ exhibits a maximum at some intermediate value of $h$ (see Fig. 3a), as already reported for the deterministic model [10].
The maximum represents a finite-size effect, as the characteristic cycle length $h^*$, corresponding to the maximum of $N_h$, scales as $h^* \sim k_{\max}$ (Fig. 3b). Next we show, however, that this behavior is not generic, but depends on the value of $\alpha$.

$\alpha \ge 1$: For all $\gamma > 2$ only Type II subgraphs are expected ($N_h/N \sim C_0^{h-2}$), as suggested by the divergence of $h_c$ in the $\alpha \to 1$ limit. If $C_0 > 1$ the number of $h$-cycles continues to exhibit a maximum and the characteristic cycle length $h^*$ scales as $h^* \sim k_{\max}$. If $C_0 < 1$, however, the number of $h$-cycles decreases with $h$, although a small local minimum is seen for small cycles. More importantly, in this case $N_h/N$ is independent of the network size (see Fig. 3c), in contrast with the size dependence observed earlier (Fig. 3a and [10]). Thus, for networks with $\alpha > 1$, or with $\alpha = 1$ and $C_0 < 1$, the cycle spectrum is stationary, independent of the stage of the growth process in which we inspect the network.
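The finite-size maximum of the cycle spectrum for $0 < \alpha < 1$ can be reproduced by summing the centrally connected cycle count over a truncated power-law degree distribution. This is a rough sketch under stated assumptions: $P(k) \propto k^{-\gamma}$ on $k = 1, \dots, k_{\max}$, $C(k) = C_0 k^{-\alpha}$ clipped at 1, and $g_h = 1$. The function names and the values $\gamma = 2.5$, $\alpha = 0.5$, $C_0 = 1$ are hypothetical and do not describe any of the networks studied here.

```python
from math import perm


def critical_cycle_length(gamma: float, alpha: float) -> float:
    """h_c = (gamma - 2*alpha) / (1 - alpha), defined for alpha < 1."""
    if alpha >= 1:
        raise ValueError("h_c diverges as alpha -> 1; use alpha < 1")
    return (gamma - 2 * alpha) / (1 - alpha)


def central_cycle_density(h, gamma, alpha, c0, k_max, g_h=1.0):
    """N_h / N for centrally connected h-cycles, summed over k = 1..k_max."""
    norm = sum(k ** -gamma for k in range(1, k_max + 1))
    total = 0.0
    for k in range(1, k_max + 1):
        pk = k ** -gamma / norm
        ck = min(1.0, c0 * k ** -alpha)    # assumption: clip C(k) at 1
        # perm(k, h-1) = binom(k, h-1) * (h-1)!: ordered choices of h-1 neighbors.
        # Halve it because a cycle and its reversal are the same cycle.
        total += pk * perm(k, h - 1) / 2 * ck ** (h - 2)
    return g_h * total


def characteristic_length(gamma, alpha, c0, k_max):
    """Cycle length h* at which the centrally connected cycle count peaks."""
    spectrum = {h: central_cycle_density(h, gamma, alpha, c0, k_max)
                for h in range(3, k_max + 2)}   # a central h-cycle needs h-1 <= k_max
    return max(spectrum, key=spectrum.get)


h_c = critical_cycle_length(2.5, 0.5)
h_star_small = characteristic_length(2.5, 0.5, 1.0, 30)
h_star_large = characteristic_length(2.5, 0.5, 1.0, 50)
print("h_c =", h_c, " h*(k_max=30) =", h_star_small, " h*(k_max=50) =", h_star_large)
```

In this toy setting the peak sits at an intermediate cycle length and moves to larger $h$ when the cutoff is raised, in line with the finite-size interpretation given above.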
Our predictions for the cycle abundance are based on centrally connected cycles, in which a central vertex is connected to all vertices of the cycle (Fig. 1b). In the following we show that our predictions capture the scaling of all $h$-cycles as well, not only of those that are centrally connected. To this end, in Fig. 4 we plot the number of $h = 3, 4, 5$ cycles (i.e. all cycles as well as those that are centrally connected) as a function of the graph size for the studied real and model networks, together with our predictions (continuous line). First we note that in many cases ($h = 3$ and 4) the full cycle density and the density of the centrally connected cycles overlap. In the few cases ($h = 5$) where there are systematic differences between the two densities, the $N$-dependence of the two quantities is the same, indicating that our calculations correctly predict the scaling of all cycles. For the co-authorship and Internet graphs $\alpha < 1$ and $h_c < 3$, therefore the $h = 3, 4, 5$ cycles are predicted to be in the Type I regime ($h > h_c$). In this case, substituting $C_0 \sim N^{\theta}$ and $k_{\max} \sim N^{\delta}$ into the Type I scaling of the cycle density gives $N_h/N \sim N^{\zeta_h}$ with $\zeta_h = \theta(h - 2) + \delta(1 - \alpha)(h - h_c)$. For the language graph $\alpha = 1$, therefore $\zeta_h = \theta(h - 2)$. For the deterministic model a direct count of the $h$-cycles reveals that they are of Type II, i.e. their density is independent of $N$ [10], in agreement with our predictions for $\alpha \ge 1$. These predictions are shown as continuous lines in Fig. 4, indicating good agreement with the real measurements.
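A comparison of this kind amounts to measuring the slope of the cycle density against the graph size on a log-log scale and setting it beside the predicted exponent. The sketch below illustrates the procedure with an ordinary least-squares fit on synthetic data. The value of $\theta$, the network sizes, and the prefactor are all hypothetical, and only the $\alpha = 1$ form $\zeta_h = \theta(h - 2)$ quoted in the text is used for the prediction.

```python
from math import log


def loglog_slope(sizes, densities):
    """Least-squares slope of log(density) versus log(size)."""
    xs = [log(n) for n in sizes]
    ys = [log(d) for d in densities]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def zeta_alpha_one(h: int, theta: float) -> float:
    """Predicted growth exponent for alpha = 1: theta * (h - 2)."""
    return theta * (h - 2)


theta = 0.2                                    # hypothetical, not from Table I
h = 4
sizes = [1_000, 2_000, 4_000, 8_000, 16_000]   # synthetic network sizes
# Synthetic densities that follow the predicted power law exactly.
densities = [0.01 * n ** zeta_alpha_one(h, theta) for n in sizes]

fitted = loglog_slope(sizes, densities)
print("predicted:", zeta_alpha_one(h, theta), " fitted:", fitted)
```

With real data the fitted slope would carry statistical error, so agreement should be judged over the full range of available sizes rather than from two or three snapshots.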
Our results offer evidence of quite complex subgraph dynamics. As the network grows, the density of the Type II subgraphs remains unchanged, being independent of the system size. In contrast, the density of the Type I subgraphs increases in an inhomogeneous fashion. Indeed, each $(n, t)$ subgraph has its own growth exponent $\zeta_{nt}$ (defined by $N_{nt}/N \sim N^{\zeta_{nt}}$), which means that their densities increase in a differentiated manner: the density of some Type I subgraphs will grow faster than the density of other Type I subgraphs. Thus, inspecting the system at several time intervals one expects significant shifts in subgraph densities. As a group, with increasing network size the Type I subgraphs will significantly outnumber the constant-density Type II subgraphs. Therefore the inspection of the graph density at a given moment will offer us valuable, but limited, information about the overall local structure of a complex network. However, the $P(k)$ and $C(k)$ functions allow us to predict with high precision the future shifts in subgraph densities, indicating that a precise knowledge of the global network characteristics is needed to fully understand the local structure of the network at any moment. These results will eventually force us to reevaluate a number of concepts, ranging from the potential characterization of complex networks based on their subgraph spectrum to our understanding of the impact of subgraphs on processes taking place on complex networks [19,20].