Dataset | Type | Number of nodes (N) | Number of edges (M) | Diameter | Average clustering coefficient | Max(k) |
---|---|---|---|---|---|---|
DS1 | Synthetic | 10,000 | 70,622 | 4 | 0.3977 | 33 |
DS2 | Synthetic | 20,000 | 144,741 | 4 | 0.3935 | 38 |
DS3 | Synthetic | 50,000 | 365,883 | 4 | 0.3929 | 42 |
DS4 | Synthetic | 100,000 | 734,416 | 4 | 0.3908 | 46 |
ego-Facebook | Real | 4,039 | 88,234 | 8 | 0.6055 | 115 |
email-Enron | Real | 36,692 | 183,831 | 11 | 0.4970 | 43 |
roadNet-TX | Real | 1,379,917 | 1,921,660 | 1,054 | 0.0470 | 3 |
roadNet-CA | Real | 1,965,206 | 2,766,607 | 849 | 0.0464 | 3 |
com-LiveJournal | Real | 3,997,962 | 34,681,189 | 17 | 0.2843 | 296 |
soc-LiveJournal1 | Real | 4,847,571 | 68,993,773 | 16 | 0.2742 | 318 |
Real datasets are provided by Stanford University (SNAP). Synthetic datasets were created using the graph data generator proposed by [3] [4].
We implemented our approach on top of the Akka framework, a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications. To evaluate its performance, we used 17 m3.medium instances on Amazon EC2. Each m3.medium instance has 1 virtual 64-bit CPU, 3.75 GB of main memory, and 4 GB of local instance storage.
The HBase-based solution [1] was tested using 9 m3.medium instances on Amazon EC2: one acting as HMaster, NameNode, and ZooKeeper node, and eight acting as DataNodes and HBase region servers.
The approach of Li et al. [2] was tested on a machine equipped with two Intel(R) Xeon(R) E5-2440 CPUs (2.40 GHz) and 192 GB of memory.
Dataset | Number of cut edges | Avg. insertion time (ms): inter-partition | Avg. insertion time (ms): intra-partition | Avg. insertion time (ms): HBase-based approach* | Avg. insertion time (ms): sequential approach | Avg. deletion time (ms): inter-partition | Avg. deletion time (ms): intra-partition | Avg. deletion time (ms): HBase-based approach* | Avg. deletion time (ms): sequential approach |
---|---|---|---|---|---|---|---|---|---|
DS1 | 61,803 (87.51%) | 27 | 6 | 851 | 9 | 20 | 4 | 73 | 0.45 |
DS2 | 126,720 (87.54%) | 39 | 16 | 1994 | 8 | 27 | 9 | 166 | 0.69 |
DS3 | 320,318 (87.54%) | 42 | 10 | 8463 | 8 | 32 | 8 | 461 | 1.42 |
DS4 | 643,189 (87.57%) | 30 | 10 | 25279 | 13 | 25 | 8 | 1067 | 1.9 |
ego-Facebook | 77,253 (87.55%) | 38 | 15 | 27 | 1 | 32 | 10 | 10 | 0.17 |
email-Enron | 161,055 (87.61%) | 32 | 8 | 1082 | 1 | 28 | 6 | 11 | 0.29 |
roadNet-TX | 1,681,830 (87.51%) | 28 | 9 | 192 | 9201 | 25 | 7 | 11 | 4605 |
roadNet-CA | 2,420,674 (87.49%) | 30 | 12 | 136 | 12970 | 26 | 10 | 11 | 6680 |
com-LiveJournal | 30,348,426 (87.50%) | 256 | 30 | 128 | 263 | 205 | 27 | 9 | 95 |
soc-LiveJournal1 | 59,916,050 (86.84%) | 579 | 27 | 35 | 377 | 499 | 25 | 8 | 201 |
*The reported runtimes correspond to running the HBase-based approach with a fixed k value (k = max(k)).
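The percentage shown next to each cut-edge count is consistent with the number of cut edges divided by the total number of edges (M) listed for the same dataset in the first table. The following Java snippet is a minimal sketch of that arithmetic; the class and method names (`CutEdgeShare`, `cutEdgePercent`) are hypothetical and are not part of the downloadable implementations.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative helper (hypothetical name): recomputes the cut-edge
// percentages reported in the results table from the raw counts.
final class CutEdgeShare {

    /** Returns the share of cut edges as a percentage of all edges. */
    static double cutEdgePercent(long cutEdges, long totalEdges) {
        if (totalEdges <= 0 || cutEdges < 0 || cutEdges > totalEdges) {
            throw new IllegalArgumentException(
                "expected totalEdges > 0 and 0 <= cutEdges <= totalEdges");
        }
        return 100.0 * cutEdges / totalEdges;
    }

    public static void main(String[] args) {
        // Each entry holds {cut edges, total edges M}, copied from the tables.
        Map<String, long[]> rows = new LinkedHashMap<>();
        rows.put("DS1", new long[] {61_803L, 70_622L});
        rows.put("email-Enron", new long[] {161_055L, 183_831L});
        rows.put("soc-LiveJournal1", new long[] {59_916_050L, 68_993_773L});

        for (Map.Entry<String, long[]> row : rows.entrySet()) {
            long[] counts = row.getValue();
            double percent = cutEdgePercent(counts[0], counts[1]);
            // Locale.ROOT keeps the decimal point independent of system locale.
            System.out.println(
                String.format(Locale.ROOT, "%s: %.2f%%", row.getKey(), percent));
        }
    }
}
```

Rounded to two decimals, the three rows used here reproduce the 87.51%, 87.61%, and 86.84% values given in the table.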
In this section you can download our Java implementations of the solutions tested in our experimental study.
The implementation of our proposed Akka-based approach is available here.
The implementation of the HBase-based approach [1] is available here.
The implementation of the sequential approach [2] is available here.
We also provide a tutorial for HBase installation and configuration.