Monday, Sep 30th, 2024 | ISSN: 2456-1134

International Scientific Journal of Contemporary Research in Engineering Science and Management

ISSN Approved Journal | Impact factor: 7.521 (calculated by Google Scholar and Semantic Scholar) | Follows UGC CARE Journal Norms and Guidelines
Monthly, Peer-Reviewed, Refereed, Scholarly, Multidisciplinary and Open Access Journal | Indexed in all major databases

ANALYZING LARGE DATASETS WITH HADOOP AND SPARK: AN INTEGRATED APPROACH

Harsha Vardhan Reddy Goli

Abstract

The exponential growth of data generated by modern digital systems poses significant challenges for traditional data processing architectures. While Hadoop and its MapReduce paradigm offer scalable storage and batch processing, their disk-based execution model limits efficiency, especially in iterative and real-time applications. Conversely, Apache Spark provides high-performance, in-memory processing and supports a wide range of analytics tasks, including streaming, machine learning, and interactive queries. This paper proposes an integrated Hadoop-Spark architecture that combines the storage resilience of the Hadoop Distributed File System (HDFS) with Spark’s advanced in-memory computation capabilities. We present a case study involving a 100 GB semi-structured web log dataset to evaluate the performance, scalability, and resource efficiency of the integrated approach. Benchmark results show that the hybrid model significantly reduces execution time and improves CPU utilization […]
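The integration described in the abstract (logs stored in HDFS, processed in memory by Spark) might look roughly like the following minimal sketch. It is not the paper's implementation: the abstract does not give the log schema, so the Apache Common Log Format is assumed, and the function names `parse_line`, `status_counts_local`, and `status_counts_spark`, as well as any HDFS path passed in, are hypothetical. The Spark function needs `pyspark` and a reachable cluster; the pure-Python functions run on their own.

```python
import re
from collections import Counter
from operator import add

# ASSUMPTION: Apache Common Log Format. The paper's actual log schema is not given.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def status_counts_local(lines):
    """Single-machine reference: count requests per HTTP status code."""
    records = (parse_line(line) for line in lines)
    return Counter(r["status"] for r in records if r is not None)


def status_counts_spark(hdfs_path):
    """Same aggregation on Spark, reading text files from HDFS.

    hdfs_path is a hypothetical location such as "hdfs:///logs/access.log".
    """
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    spark = SparkSession.builder.appName("weblog-sketch").getOrCreate()
    records = (
        spark.sparkContext.textFile(hdfs_path)  # storage layer: HDFS
        .map(parse_line)
        .filter(lambda r: r is not None)
        .cache()  # keep parsed records in memory so both jobs below reuse them
    )
    by_status = dict(
        records.map(lambda r: (r["status"], 1)).reduceByKey(add).collect()
    )
    total_bytes = records.map(lambda r: r["size"]).sum()
    return by_status, total_bytes
```

The `cache()` call is the point of the sketch: the parsed records are held in memory and reused by the second aggregation, whereas a chain of MapReduce jobs would typically write intermediate results to disk between stages.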
