Harsha Vardhan Reddy Goli
The exponential growth of data generated by modern digital systems poses significant challenges for traditional data processing architectures. While Hadoop and its MapReduce paradigm offer scalable storage and batch processing, their disk-based execution model limits efficiency, especially in iterative and real-time applications. In contrast, Apache Spark provides high-performance, in-memory processing and supports a wide range of analytics tasks, including streaming, machine learning, and interactive queries. This paper proposes an integrated Hadoop-Spark architecture that combines the storage resilience of the Hadoop Distributed File System (HDFS) with Spark’s in-memory computation capabilities. We present a case study involving a 100 GB semi-structured web log dataset to evaluate the performance, scalability, and resource efficiency of the integrated approach. Benchmark results show that the hybrid model significantly reduces execution time and improves CPU utilizati
Copyright © 2021 IJMRSET All Rights Reserved