A hierarchical multi-objective task scheduling approach for fast big data processing

The Journal of Supercomputing

Abstract

Due to the rapid growth in the production and dissemination of big data from various sources, the speed of data processing must inevitably increase. In distributed big data processing systems such as cloud computing, the task scheduler is responsible for mapping a large set of diverse tasks to a set of possibly heterogeneous computing nodes so as to raise resource efficiency and data locality and to reduce makespan. Scheduling strategies that try to achieve these goals in one pass have lower performance than multi-pass strategies. To achieve higher performance, we propose MOTS, a hierarchical multi-objective task scheduling scheme that first clusters tasks using the K-means algorithm alongside a load balancing equation to increase resource efficiency, and then optimizes the clusters using evolutionary algorithms to reduce makespan. The latter is achieved by using the state of physical machines and sending related consecutive tasks to the same physical machine to eliminate data transfer. We have simulated and tested our scheme in Cloudsim. Our experiments show an approximately 10% reduction in makespan and 4% higher CPU efficiency compared to Mai’s reinforcement learning approach and Bugerya’s parallel implementation method. The cost of data transfer between consecutive tasks is also decreased by 10% compared to Bugerya’s method. Given these results and the fact that our proposed task scheduling scheme is inspired by the iHadoop method for parallel implementation, it is suitable for use in distributed big data processing systems. Information about previous executions of tasks and the current status of computing nodes is highly influential in the efficient mapping of tasks to computing nodes. Predictions of the future resource needs of tasks and the available capacities of computing nodes can complement this historical information in finding a closer-to-optimal mapping, resulting in faster data processing. This issue, and the evaluation of our proposed scheme using real data, will be pursued in the future.
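The two-pass idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the load balancing equation, the task features, or the evolutionary operators, so everything below is an assumption made for illustration. Tasks are hypothetical `(cpu_demand, data_size)` tuples, the first pass is plain Lloyd's K-means, and the second pass is a simple elitist genetic algorithm that assigns whole clusters to machines so that the makespan (finish time of the busiest machine) is small. Keeping a cluster on one machine stands in for sending related tasks to the same physical machine.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


def cluster_loads(tasks, labels, k):
    """Total CPU demand per cluster (assumes feature 0 is CPU demand)."""
    loads = [0.0] * k
    for task, lab in zip(tasks, labels):
        loads[lab] += task[0]
    return loads


def makespan(assignment, loads, speeds):
    """Finish time of the busiest machine; assignment[i] is the machine of cluster i."""
    busy = [0.0] * len(speeds)
    for cluster, machine in enumerate(assignment):
        busy[machine] += loads[cluster] / speeds[machine]
    return max(busy)


def evolve(loads, speeds, pop_size=30, generations=100, seed=0):
    """Elitist genetic algorithm over cluster-to-machine assignments."""
    rng = random.Random(seed)
    n, m = len(loads), len(speeds)

    def fitness(assignment):
        return makespan(assignment, loads, speeds)

    pop = [[rng.randrange(m) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]                   # one-point crossover
            child[rng.randrange(n)] = rng.randrange(m)  # single-gene mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)


if __name__ == "__main__":
    # Hypothetical tasks: (cpu_demand, data_size).
    tasks = [(1.0, 1.0), (1.2, 0.9), (10.0, 10.0), (10.5, 9.8), (5.0, 5.0)]
    k = 3
    labels = kmeans(tasks, k)
    loads = cluster_loads(tasks, labels, k)
    speeds = [1.0, 2.0]  # hypothetical relative machine speeds
    best = evolve(loads, speeds)
    print("cluster -> machine:", best, "makespan:", makespan(best, loads, speeds))
```

Because the better half of each generation is carried over unchanged, the best makespan found never gets worse from one generation to the next. A fuller model would also account for machine state and inter-task data transfer cost, which the abstract names as inputs to the optimization step.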

References

  1. Singh T, Srivastava DK, Aggarwal A (2017) A novel approach for CPU utilization on a multicore paradigm using parallel quicksort. In: IEEE International Conference on Computational Intelligence and Communication Technology. pp. 1–6

  2. Gao Ch, Ma J, Shen Y, Li T, Li F, Gao Y (2019) Cloud computing task scheduling based on improved differential evolutionary. IEEE Int Conf Netw Network Appl. https://doi.org/10.1109/NaNA.2019.00084

  3. Jena RK (2015) Multi objective task scheduling in cloud environment using nested PSO framework. Proc Comput Sci 57:1219–1227. https://doi.org/10.1016/j.procs.2015.07.419

  4. Arunarani A, Manjula D, Sugumaran V (2019) Task scheduling techniques in cloud computing: a literature survey. Future Gener Comput Syst 91:407–415

  5. Elnikety E, Elsayed T, Ramadan HE (2011) iHadoop: asynchronous iterations for MapReduce. In: Third IEEE International Conference on Cloud Computing Technology and Science. pp. 81–90

  6. Mai L, Dao N, Park M (2018) Real-time task assignment approach leveraging reinforcement learning with evolutionary strategies for long-term latency minimization in fog computing. J Sensors. pp. 1–19

  7. Bugerya AB, Kim ES, Solovev MA (2019) Parallelization of implementations of purely sequential algorithms. J Program Comput Softw 7:381–389

  8. Tian Q, Li J, Xue D, Wu W, Wang J, Chen L, Wang J (2020) A hybrid task scheduling algorithm based on task clustering. J Mobile Netw Appl. https://doi.org/10.1007/s11036-019-01356-x, pp. 1–10

  9. Abuallgah L, Diabat A (2020) A novel hybrid AntLion optimization algorithm for multi-objective task. J Clust Comput. https://doi.org/10.1007/s10586-020-03075-5, pp. 1–19

  10. Narayanan D, Santhanam K, Kazhamiaka F, Phanishayee A, Zaharia M (2020) Heterogeneity-aware cluster scheduling policies for deep learning workloads. In: http://arxiv.org/abs/2008.09213v1 pp. 1–19

  11. Azumah KK, Kosta S, Sorensen LT (2018) Scheduling in the hybrid cloud constrained by process mining. In: IEEE International Conference on Cloud Computing Technology and Science (CloudCom)

  12. Azumah KK, Sorensen LT, Montella R, Kosta S (2020) Process mining-constrained scheduling in the hybrid cloud. J WILEY. https://doi.org/10.1002/cpe.6025, pp. 1–20

  13. Jafar RA (2015) Best-worst multi-criteria decision-making method. Omega 53:49–57. https://doi.org/10.1016/j.omega.2014.11.009

  14. Ablhubaishy A, Aljuhani A (2020) The best-worst method for resource allocation and task scheduling in cloud computing. J IEEE Xplore. 978–1–7281–4213–5/20, pp. 1–6

  15. Ullah I, Youn HY (2020) Task classification and scheduling based on K-Means clustering for edge computing. J Wireless Personal Commun. https://doi.org/10.1007/s11277-020-07343-w

  16. Suresh S, Mani V, Omkar SN, Kim HJ (2006) Divisible load scheduling in distributed system with buffer constraints: genetic algorithm and linear programming approach. Int J Parallel Emerg Distrib Syst 21(5):303–321

  17. Velliangiri S, Karthikeyan P, Arul Xavier VM, Baswaraj D (2021) Hybrid electro search with genetic algorithm for task scheduling in cloud computing. Ain Shams Eng J 12(1):631–639. https://doi.org/10.1016/j.asej.2020.07.003

  18. Motlagh AA, Movaghar A, Rahmani AM (2019) Task scheduling mechanisms in cloud computing: a systematic review. J WILEY. https://doi.org/10.1002/dac.4302, pp. 1–23

  19. Silva EC, Gabriel PHR (2020) A comprehensive review of evolutionary algorithms for multiprocessor dag scheduling. J Comput 26:1–16

  20. Ggasemnezhad SMK, Rahmani AAH, Saemi B, Babazadeh M, Sangaiah AK, Bian G (2019) An enhancement of task scheduling in cloud computing based on imperialist competitive algorithm and firefly algorithm. J Supercomput. https://doi.org/10.1007/s11227-019-02816-7

  21. Sharma P, Shilakari S, Chourasia U, Dixit P, Pandey A (2020) A survey on various types of task scheduling algorithm in cloud computing environment. Int J Sci Technol Res 1:1513–1521

  22. Yin S, Bao J, Li J, Zhang J (2019) Real-time task processing method based on edge computing. J Front Mech Eng. https://doi.org/10.1007/s11465-019-0542-1, no. 3, pp. 320–331

  23. Utrera G, Farreras M, Fornes J (2019) Task packing: efficient task scheduling in unbalanced parallel programs to maximize CPU utilization. J Parallel Distributed Comput 134:37–49

  24. Bulchandani N, Chourasia U, Agrawal S, Dixit P, Pandey A (2020) A survey on task scheduling algorithms in cloud. Int J Sci Technol Res 1:460–464

  25. Liang B, Dong X, Wang Y, Zhang X (2020) A low-power task scheduling algorithm for heterogeneous cloud computing. J Supercomput. https://doi.org/10.1007/s11227-020-03163-8, pp. 1–25

  26. Aljarah I, Ludwig SA (2012) Parallel particle swarm optimization clustering algorithm based on mapreduce methodology. J IEEE, pp 1–8

  27. Jalalian Z, Sharifi M (2017) Autonomous task scheduling for fast big data processing. In: TopHPC Conference, pp. 1–4, 2018

  28. Wang S, Li Y, Pang S, Lu Q, Wang S, Zhao J (2020) A task scheduling strategy in edge-cloud collaborative scenario based on deadline. J Sci Program. https://doi.org/10.1155/2020/3967847, pp. 1–9

  29. Tsai F, Huang C-H, Lin MH (2021) An optimal task assignment strategy in cloud-fog computing environment. J Appl Sci. https://doi.org/10.3390/app11041909

  30. Singh H, Tyagi S, Kumar P (2021) Comparative analysis of various simulations tools used in a cloud environment for task-resource mapping. In: International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems. https://doi.org/10.1007/978-981-15-7533-4_32

  31. Rodriguez MA, Buyya R (2018) Scheduling dynamic workloads in multi-tenant scientific workflow as a service platforms. Future Gener Comput Syst 79:739–750. https://doi.org/10.1016/j.future.2017.05.009

  32. Bulaja D, Bozic K, Penevski N, Dzakula NB (2019) Introduction to Cloudsim. J Adv Comput Cloud Comput. https://doi.org/10.15308/Sinteza, pp. 189–194

Acknowledgements

We thank the anonymous reviewers of the journal, whose valuable comments helped us make the revised version of the paper stronger.

Author information

Corresponding author

Correspondence to Mohsen Sharifi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jalalian, Z., Sharifi, M. A hierarchical multi-objective task scheduling approach for fast big data processing. J Supercomput 78, 2307–2336 (2022). https://doi.org/10.1007/s11227-021-03960-9

  • DOI: https://doi.org/10.1007/s11227-021-03960-9
