Skip to main content
Log in

GT-scheduler: a hybrid graph-partitioning and tabu-search based task scheduler for distributed data stream processing systems

  • Published:
Cluster Computing Aims and scope Submit manuscript

Abstract

The continual increase in the amount of generated data by social media, IoT devices, and monitoring systems have motivated the use of Distributed Data Stream Processing (DSP) systems to harness data in a real-time manner. The scheduling of processing tasks in DSP systems across the machines in a cluster or cloud environment is an NP-Hard problem. Different scheduling schemes have been proposed to address the scheduling problem, but most fail to take into account the runtime adaptation and workload changes after initial scheduling. In this paper, we propose a new scheduler (GT-Scheduler) that leverages a heuristic and rule-based algorithm to schedule tasks at near-optimal performance alongside using a meta-heuristic algorithm to make runtime adaptation. Firstly, K-way graph partitioning divides the tasks in an application graph according to the communication patterns. It places tasks with the highest amount of communication near each other to limit an increase in the topology response time. Secondly, instead of assigning tasks to the worker nodes, a partition of tasks is assigned to the nodes by adopting a greedy strategy. If the capacity of nodes is insufficient to host a specific partition of tasks, this partition is iteratively divided by the k-partitioning algorithm to assign to a proper node. The idea of runtime adaptation lies in detecting overutilized worker nodes and reassigning their tasks by exploiting a Tabu-Search and a new scoring strategy to find the best solution in a way that no worker node is overutilized. GT-Scheduler is implemented on the standard Apache Storm and using the standard benchmarks, it is shown that GT-Scheduler outperforms the R-Storm and the Online-Scheduler by at least 35% in reducing the topology response time.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Algorithm 1
Algorithm 2
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16

Similar content being viewed by others

Data availability

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Notes

  1. http://www.gutenberg.org/ebooks/11mhfh.

  2. https://www.kaggle.com/kazanova/sentiment140.

References

  1. Hiessl, T., Karagiannis, V., Hochreiner, C., Schulte, S., Nardelli, M.: Optimal placement of stream processing operators in the fog. IEEE 3rd international conference on fog and edge computing (ICFEC), 1–10 May 2019

  2. Hadian, H., Farrokh, M., Sharifi, M., Jafari, A.: An elastic and traffic-aware scheduler for distributed data stream processing in heterogeneous clusters. J. Supercomput. 79, 1–38 (2022)

    Google Scholar 

  3. Liu, X., Buyya, R.: Performance-oriented deployment of streaming applications on cloud. IEEE Trans. Big Data. 5(1), 46–59 (2017)

    Article  Google Scholar 

  4. De Assuncao, M.D., da Silva Veith, A., Buyya, R.: Distributed data stream processing and edge computing: a survey on resource elasticity and future directions. J. Netw. Comput. Appl. 103, 1–17 (2018)

    Article  Google Scholar 

  5. Moulik, S., Devaraj, R., Sarkar, A. COST: A cluster-oriented scheduling technique for heterogeneous multi-cores. IEEE international conference on systems, man, and, cybernetics: (SMC), 1951–1957 October 2018 

  6. Moulik, S., Das, Z., Devaraj, R., Chakraborty, S.: SEAMERS: a semi-partitioned energy-aware scheduler for heterogeneous multicore real-time systems. J. Syst. Architect. 114, 101953 (2021)

    Article  Google Scholar 

  7. Sharma, Y., Chakraborty, S., Moulik, S.: ETA-HP: an energy and temperature-aware real-time scheduler for heterogeneous platforms. J. Supercomputing 78(8), 1–25 (2022)

    Article  Google Scholar 

  8. Sharma, Y., Moulik, S.: CETAS: a cluster based energy and temperature efficient real-time scheduler for heterogeneous platforms. In proceedings of the 37th ACM/SIGAPP symposium on applied computing, 501–509 April  2022 

  9. Liu, X., Buyya, R.: D-Storm: Dynamic resource-efficient scheduling of stream processing applications. IEEE 23rd international conference on parallel and distributed systems (ICPADS), 485–492 December 2017

  10. Cardellini, V., Grassi, V., Presti, F.L., Nardelli, M.: Optimal operator replication and placement for distributed stream processing systems. ACM SIGMETRICS Perform. Eval. Rev. 44(4), 11–22 (2017)

    Article  Google Scholar 

  11. Afzal, S., Tashtarian, F., Hadian, H., Erfanian, A., Timmerer, C., Prodan, R.: OTEC: an optimized transcoding task scheduler for cloud and fog environments. In proceedings of the 2nd international workshop on design, deployment, and evaluation of network-assisted video streaming, 21–26 December 2022

  12. Farrokh, M., Hadian, H., Sharifi, M., Jafari, A.: SP-ant: An ant colony optimization-based operator scheduler for high performance distributed stream processing on heterogeneous clusters. Expert Syst. Appl. 191, 116322 (2022)

    Article  Google Scholar 

  13. Nardelli, M., Cardellini, V., Grassi, V., Presti, F.L.: Efficient operator placement for distributed data stream processing applications. IEEE Trans. Parallel. Distrib. Syst. 30(8), 1753–1767 (2019)

    Article  Google Scholar 

  14. Moulik, S., Das, Z., Saikia, G.: CEAT: a cluster based energy aware scheduler for real-time heterogeneous systems. IEEE international conference on systems, man, and cybernetics (SMC), 1815–1821 October 2020 

  15. Moulik, S., Chaudhary, R., Das, Z., Sarkar, A.: EA-HRT: An energy-aware scheduler for heterogeneous real-time systems. Asia and south pacific design automation IEEE conference (ASP-DAC), 500–505 January 2020 

  16. Sharma, Y., Moulik, S.: FATS-2TC: A Fault tolerant real-time scheduler for energy and temperature aware heterogeneous platforms with two types of cores. Microprocess. Microsyst. 96, 104744 (2023)

    Article  Google Scholar 

  17. Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K., Patel, J.M., Kulkarni, S., Jackson, J., et al.: Storm@ twitter. In proceedings of the ACM SIGMOD international conference on management of data, 147–156 2014

  18. Kulkarni, S., Bhagat, N., Fu, M., Kedigehalli, V., Kellogg, C., Mittal, S., Patel, J.M., Ramasamy, K., Taneja, S.: Twitter heron: stream processing at scale. In proceedings of the ACM SIGMOD international conference on management of data, 239–250 May 2015

  19. Flink.apache.org:. Apache Flink: Stateful computations over data streams. http://flink.apache.org/  (2021). Accessed 19 Aug 2021

  20. Setayesh, A., Hadian, H., Prodan, R.: An efficient online prediction of host workloads using pruned GRU neural nets. arXiv preprint arXiv:2303.16601 (2023)

  21. Cardellini, V., Nardelli, M., Luzi, D.: Elastic stateful stream processing in storm. In international conference on high performance computing & simulation (HPCS), 583–590 July 2016

  22. Farahabady, M.R.H., Samani, H.R.D., Wang, Y., Zomaya, A.Y., Tari, Z.: A QOS-aware controller for apache storm. In IEEE 15th international symposium on network computing and applications (NCA), 334–342 October 2016

  23. Eskandari, L., Huang, Z., Eyers, D.: P-Scheduler: Adaptive hierarchical scheduling in Apache Storm. In Proceedings of the Australasian Computer Science Week Multiconference, 1–10 February 2016

  24. Zookeeper.apache.org:. Apache zookeeper.  https://zookeeper.apache.org/ (2021). Accessed 19 August 2021

  25. Bilal, M., Canini, M.: Towards automatic parameter tuning of stream processing systems. In proceedings of the symposium on cloud computing, 189–200 September 2017

  26. Aniello, L., Baldoni, R., Querzoni, L.: Adaptive online scheduling in storm. In proceedings of the 7th ACM international conference on distributed event-based systems, 207–218 June 2013

  27. Liu, S., Weng, J., Wang, J.H., An, C., Zhou, Y., Wang, J.: An adaptive online scheme for scheduling and resource enforcement in storm. IEEE/ACM Trans. Netw. 27(4), 1373–1386 (2019)

    Article  Google Scholar 

  28. Khandekar, R., Hildrum, K., Parekh, S., Rajan, D., Wolf, J., Wu, K.L., and, Gedik, B.: COLA: Optimizing stream processing applications via graph partitioning. In middleware: ACM/IFIP/USENIX, 10th international middleware conference, Urbana, IL, USA. proceedings 10,  308–327 2009

  29. Fischer, L., Bernstein, A.: Workload scheduling in distributed stream processors using graph partitioning. IEEE international conference on big data (big data), 124–133 October 2015

  30. Ghaderi, J., Shakkottai, S., Srikant, R.: Scheduling storms and streams in the cloud. In proceedings of the 2015 ACM SIGMETRICS international conference on measurement and modeling of computer systems, 439–440 June 2015

  31. Eskandari, L., Mair, J., Huang, Z., Eyers, D.: T3-Scheduler: a topology and traffic aware two-level scheduler for stream processing systems in a heterogeneous cluster. Future Gener. Comput. Syst. 89, 617–632 (2018)

    Article  Google Scholar 

  32. Eskandari, L., Mair, J., Huang, Z., Eyers, D.: I-Scheduler: iterative scheduling for distributed stream processing systems. Future Gener. Comput. Syst. 117(219), 233 (2020)

    Google Scholar 

  33. Xu, J., Chen, Z., Tang, J., Su, S.: T-storm: traffic-aware online scheduling in storm. IEEE 34th international conference on distributed computing systems, 535–544 June 2014

  34. Peng, B., Hosseini, M., Hong, Z., Farivar, R., Campbell, R.: R-storm: resource-aware scheduling in storm, In proceedings of the 16th annual middleware conference,  149–161 November 2015

  35. Stanoi, I., Mihaila, G., Palpanas, T., Lang, C.: Whitewater: distributed processing of fast streams. IEEE Trans. Knowl. Data Eng. 19(9), 1214–1226 (2007)

    Article  Google Scholar 

  36. Smirnov, P., Melnik, M., Nasonov, D.: Performance-aware scheduling of streaming applications using genetic algorithm. Procedia Comput. Sci. 108, 2240–2249 (2017)

    Article  Google Scholar 

  37. Russo, G.R., Cardellini, V., Presti, F.L.: Reinforcement learning based policies for elastic stream processing on heterogeneous resources. In proceedings of the 13th ACM international conference on distributed and event-based systems, 31–42 June 2019

  38. da Silva Veith, A., De Souza, F.R., de Assuncao, M.D., Lefèvre, L., Anjos, D.: J. C. S. Multi-objective reinforcement learning for reconfiguring data stream analytics on edge computing. In proceedings of the 48th international conference on parallel processing, 1–10 August 2019

  39. Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48(1), 96–129 (1998)

    Article  Google Scholar 

  40. Fukunaga, A.S., Korf, R.E.: Bin-completion algorithms for multicontainer packing and covering problems.  IJCAI Int. Joint Conf. Artif. Intel. 28, 117–124 (2005)

    Google Scholar 

  41. Hendrickson, B., Leland, R.: A multilevel algorithm for partitioning graphs. Proceedings of the 1995 ACM/IEEE conference on supercomputing. ACM, 28 December 1995

  42. Glover, F., Laguna, M.: Handbook of combinatorial optimization, pp. 2093–2229. Springer, Berlin (1998)

    Book  Google Scholar 

  43. Nasim, R., Kassler, A.J.: A robust tabu search heuristic for VM consolidation under demand uncertainty in virtualized datacenters. IEEE/ACM international symposium on cluster, cloud and grid computing (CCGRID), 170–180 May 2017

  44. Gendreau, M.: An introduction to tabu search. In handbook of metaheuristics, 37–54 July 2003

  45. Validi, A., Kashansky, V., Khiari, J., Hadian, H., Prodan, R., Li, J., Wang, F.Y., Olaverri-Monreal, C.: Hybrid on/off blockchain approach for vehicle data management, processing and visualization exemplified by the adapt platform. In: IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), pp. 3152–3158 (2022)

  46. Illecker, M.: SentiStorm, https://github.com/millecker/senti-storm(2015)

  47. Lombardi, F., Aniello, L., Bonomi, S., Querzoni, L.: Elastic symbiotic scaling of operators and resources in stream processing systems. IEEE Trans. Parallel Distrib. Syst. 29(3), 572–585 (2018)

    Article  Google Scholar 

  48. Ziekow, H., Jerzak, Z.: The DEBS 2014 grand challenge. In proceedings of the 8th ACM international conference on distributed event-based systems, DEBS, 10-1145 July 2014

Download references

Funding

Enquiries about data availability should be directed to the authors.

Author information

Authors and Affiliations

Authors

Contributions

Dear Editor-in-Chief,Please find enclosed the manuscript: " GT-Scheduler: A Hybrid Graph-Partitioning and Tabu-Search Based Task Scheduler for Distributed Data Stream Processing Systems”, by Hamid Hadian and Mohsen Sharifi, which we wish to submit. All co-authors have contributed to the manuscript and have seen and agreed with the final version. There is no financial interest to report. We certify that the submission was not previously published and is not under review in any other publication.

Corresponding author

Correspondence to Hamid Hadian.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

Fig. 17

Fig. 17
figure 17

The process of scheduling in GT-Scheduler

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hadian, H., Sharifi, M. GT-scheduler: a hybrid graph-partitioning and tabu-search based task scheduler for distributed data stream processing systems. Cluster Comput (2024). https://doi.org/10.1007/s10586-023-04260-y

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10586-023-04260-y

Keywords

Navigation