A Survey on Optimal Scheduler: Improving Efficiency in Parallel Execution Tasks in Hadoop

Aishwarya S, Kirthikka Devi D, Nandhini J, Renown Manjuna G


Hadoop’s implementation of the MapReduce programming model pipelines data processing and provides fault tolerance. Input data stored on the Hadoop Distributed File System (HDFS) is divided into splits, and each split is assigned as a map task to a cluster node for parallel execution; the map function is then applied to the records of that split. iShuffle determines the number of map output partitions and decides on which node each partition is placed. It consists of two components, the shufflers and the shuffle manager: a shuffler pushes the output of the map phase to the destination nodes as it is written (shuffle-on-write). In this work, multiple servers are used to produce results in a shorter time: data sets related to air pollution are collected and processed in parallel across the servers, which increases efficiency and reduces job completion time.
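The map, partition, and push steps described above can be illustrated with a minimal single-process sketch. It is not iShuffle's implementation: the names `ShuffleManager`, `Shuffler`, `map_fn`, and `run_job`, the round-robin placement policy, the CRC32 hash partitioner, and the `station_id,pm25_reading` input format are all hypothetical choices made for illustration, and the readings are toy values rather than the air-pollution data sets used in the survey. The point it shows is that each map output record is sent to the node owning its partition at write time, rather than waiting for reduce tasks to fetch it after the map finishes, as in stock Hadoop.

```python
import zlib
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, Dict, Iterable, List, Tuple

Record = Tuple[str, float]
# node name -> partition id -> records pushed to that node
NodeBuffers = DefaultDict[str, DefaultDict[int, List[Record]]]


class ShuffleManager:
    """Hypothetical shuffle manager: fixes the number of map output partitions
    and decides which node hosts each partition."""

    def __init__(self, num_partitions: int, nodes: List[str]) -> None:
        self.num_partitions = num_partitions
        # Assumption: round-robin placement; iShuffle's real policy may differ.
        self.placement: Dict[int, str] = {
            p: nodes[p % len(nodes)] for p in range(num_partitions)
        }

    def partition_of(self, key: str) -> int:
        # Deterministic hash partitioning (CRC32 keeps runs reproducible).
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


class Shuffler:
    """Hypothetical per-map-task shuffler: pushes every record to the node that
    owns its partition at the moment it is written (shuffle-on-write)."""

    def __init__(self, manager: ShuffleManager, buffers: NodeBuffers) -> None:
        self.manager = manager
        self.buffers = buffers

    def write(self, key: str, value: float) -> None:
        partition = self.manager.partition_of(key)
        node = self.manager.placement[partition]
        self.buffers[node][partition].append((key, value))


def map_fn(line: str) -> Iterable[Record]:
    # Hypothetical input format: "station_id,pm25_reading"
    station, reading = line.split(",")
    yield station.strip(), float(reading)


def run_job(splits: List[List[str]], nodes: List[str], num_partitions: int) -> Dict[str, float]:
    manager = ShuffleManager(num_partitions, nodes)
    buffers: NodeBuffers = defaultdict(lambda: defaultdict(list))

    # Map phase: one map task per input split; output is pushed as it is produced.
    for split in splits:
        shuffler = Shuffler(manager, buffers)
        for line in split:
            for key, value in map_fn(line):
                shuffler.write(key, value)

    # Reduce phase: each node reduces only the partitions already resident on it.
    results: Dict[str, float] = {}
    for node_partitions in buffers.values():
        for records in node_partitions.values():
            grouped: DefaultDict[str, List[float]] = defaultdict(list)
            for key, value in records:
                grouped[key].append(value)
            for key, values in grouped.items():
                results[key] = mean(values)  # average reading per station
    return results


if __name__ == "__main__":
    splits = [["s1,180.0", "s2,60.0"], ["s1,200.0", "s3,90.0"], ["s2,70.0"]]
    print(sorted(run_job(splits, nodes=["node-a", "node-b"], num_partitions=4).items()))
```

Running the module prints `[('s1', 190.0), ('s2', 65.0), ('s3', 90.0)]`. Because a key always hashes to the same partition, all readings for a station end up on one node, so the reduce step needs no further data movement. In this sketch the phases still run one after the other; the benefit of pushing at write time (overlapping shuffle with the map phase) only appears in a real distributed run.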

References

1. Yanfei Guo, Jia Rao, Dazhao Cheng, and Xiaobo Zhou, "iShuffle: Improving Hadoop Performance with Shuffle-on-Write", IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 6, June 2017.

2. Fan Zhang, Majd F. Sakr, Kai Hwang, and Samee U. Khan, "Empirical Discovery of Power-Law Distribution in MapReduce Scalability", IEEE Transactions on Cloud Computing, 2017.

3. Haripriya Ayyalasomayajula and Edgar Gabriel, "Air Quality Simulations using Big Data Programming Models", 2016 IEEE Second International Conference on Big Data Computing Service and Applications, 2016.

4. Lamari and Slaoui, "Clustering categorical data based on the relational analysis approach and MapReduce", Journal of Big Data, 2017.

5. M. Brahmwar, M. Kumar, and G. Sikka, "Tolhit – A Scheduling Algorithm for Hadoop Cluster", Twelfth International Multi-Conference on Information Processing, 2016.

6. Zacheilas and Kalogeraki, "A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads", EURASIP Journal on Embedded Systems, 2017.

7. Alberto Fernández, Sara del Río, Nitesh V. Chawla, and Francisco Herrera, "An insight into imbalanced Big Data classification: outcomes and challenges", Complex Intell. Syst., 2017.

8. Xu-qing Chai, Yong-liang Dong, and Jun-fei Li, "Profit-oriented task scheduling algorithm in Hadoop cluster", Journal on Embedded Systems, 2016.

9. R. Anandkrishna and Dhananjay Kumar, "Improving MapReduce for Incremental Processing Using Map Data Storage", 4th International Conference on Recent Trends in Computer Science & Engineering, 2016.

10. Sreedhar et al., "Clustering large datasets using K-means modified inter and intra clustering (KM-I2C) in Hadoop", Journal of Big Data, 2017.