• Distributed k-core Decomposition and Maintenance in Large Dynamic Graphs
    • Sabeur Aridhi, Martin Brugnara, Alberto Montresor and Yannis Velegrakis
  • Datasets   |   Results   |   Downloads

    • Datasets:
    • In order to evaluate and test the effectiveness of our approach we performed an extensive set of experiments on a number of different real and synthetic datasets. Datasets
      Dataset Type Number of nodes (N) Number of edges (M) Diameter Average clustering coefficient Max(k)
      DS1Synthetic10,00070,62240.397733
      DS2Synthetic20,000144,74140.393538
      DS3Synthetic50,000365,88340.392942
      DS4Synthetic100,000734,41640.390846
      ego-FacebookReal4,03988,23480.6055115
      email-EnronReal36,692183,831110.497043
      roadNet-TXReal1,379,9171,921,6601,0540.04703
      roadNet-CAReal1,965,2062,766,6078490.04643
      com-LiveJournalReal3,997,96234,681,189170.2843296
      soc-LiveJournal1Real4,847,57168,993,773160.2742318
      Real datasets are provided by Stanford University (SNAP).
      Synthetic datasets were created using the graph data generator proposed by [3] [4].
    • Results:

      • Experiments setup

      We have implemented our approach on top of the akka framework, a toolkit and runtime for building highly concurrent, distributed, resilient message-driven applications. In order to evaluate the performance of our approach, we used 17 m3.medium instances on Amazon EC2. Each m3.medium instance contained 1 virtual 64-bit CPU, 3.75 GB of main memory a 4 GB of local instance storage.

      The HBase-based solution [1] was tested using 9 m3.medium instances on Amazon EC2: 1 acting as hmaster, namenode and zookeeper node, 8 as datanode and hregion.

      The approach of Li et al.’s [2] was tested on a machine equipped with two Intel(R) Xeon(R) E5-2440 CPUs (2.40GHz) and 192 GB of memory.


      • Speedup
       
      DatasetNumber of cut edgesAverage Insertion Time (ms)Average Deletion Time (ms)
      inter-partitionintra-partitionHBase-based approach*Sequential approachinter-partitionintra-partitionHBase-based approach*Sequential approach
      DS161,803 (87.51%)2768519204730,45
      DS2126,720 (87.54%)3916199482791660,69
      DS3320,318 (87.54%)4210846383284611,42
      DS4643,189 (87.57%)3010252791325810671,9
      ego-Facebook77,253 (87.55%)38152713210100,17
      email-Enron161,055 (87.61%)32810821286110,29
      roadNet-TX1,681,830 (87.51%)2891929201257114605
      roadNet-CA2,420,674 (87.49%)3012136129702610116680
      com-LiveJournal30,348,426 (87.50%)2563012826320527995
      soc-LiveJournal159,916,050 (86.84%)5792735377499258201
      *The presented runtime values correspond to the execution of the HBase-based approach with a fixed k value (k=max(k)).
      Insertion\Deletion
    • Downloads:
    • In this section you can download our Java implementations of the solutions tested in our experimental study.

      The implementation of our proposed Akka-based approach is available here
      The implementation of the HBase-based [1] approach is available here
      The implementation of the sequential approach [2] is available here.


      We also provide a tutorial for HBase installation and configuration.

    • References:
    • [1] Aksu H., Canim M., Yuan-Chi Chang, Korpeoglu I., Ulusoy O. Distributed k-Core View Materialization and Maintenance for Large Dynamic Graphs. Knowledge and Data Engineering, IEEE Transactions on , vol.26, no.10, pp.2439-2452, Oct. 2014
      [2] Rong-Hua Li and Jeffrey Xu Yu and Rui Mao. Efficient Core Maintenance in Large Dynamic Graphs. IEEE Trans. Knowl. Data Eng. 26(10):2453–2465, 2014.
      [3] Alessandra Sala, Lili Cao, Christo Wilson, Robert Zablit, Haitao Zheng, and Ben Y. Zhao. Measurement-calibrated Graph Models for Social Network Experiments. Proceedings of The 19th World Wide Web Conference (WWW 2010). ACM, New York, NY, USA, 861-870.
      [4] Alessandra Sala, Xiaohan Zhao, Christo Wilson, Haitao Zheng, and Ben Y. Zhao. Sharing Graphs using Differentially Private Graph Models. In Proceedings of The 11th ACM SIGCOMM Internet Measurement Conference (IMC 2011). ACM, New York, NY, USA, 81-98.