• Monday, Sep 30th, 2024 |
  • ISSN: 2456-1134 |
  • +91 9940572462 |
  • isjcresm@gmail.com

International Scientific Journal of Contemporary Research in Engineering Science and Management

ISSN Approved Journal | Impact factor: 7.521 | Follows UGC CARE Journal Norms and Guidelines
Monthly, Peer-Reviewed, Refereed, Scholarly, Multidisciplinary and Open Access Journal
Impact factor 7.521 (calculated by Google Scholar and Semantic Scholar) | AI-Powered Research Tool | Indexing in all Major Databases & Metadata, Citation Generator

Big Data Pipeline Optimization using Apache Kafka and Spark Streaming

Choon Lin Tan

Abstract

Real-time data processing has become essential in domains such as finance, e-commerce, and IoT, where high-velocity data must be ingested, processed, and analyzed with minimal delay. This paper explores the design and optimization of big data pipelines using Apache Kafka for data ingestion and Apache Spark Streaming for distributed processing. We implement a prototype pipeline that simulates a stock market feed handling millions of events per hour across a 10-node cluster. Key optimizations include Kafka topic partition tuning, Spark batch interval adjustments, memory and executor configuration, and fault-tolerant checkpointing. Performance is evaluated using metrics such as throughput, end-to-end latency, and resource utilization. Our results demonstrate that aligning Kafka’s partition-to-consumer mapping with Spark’s task parallelism, along with fine-tuned micro-batching, yields a 27% increase in throughput and a 30% reduction in latency. We also analyze fault recovery,
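
The alignment between Kafka partitions and Spark task parallelism described in the abstract can be illustrated with a small sizing sketch. It is not the authors' implementation. It assumes Spark's direct Kafka integration, where each Kafka partition becomes one Spark task per micro-batch, and every name and number in it (`PipelineConfig`, 4 cores per executor, 48 or 40 partitions, a 2-second batch interval, 1.5 seconds per task) is hypothetical apart from the 10-node cluster size.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical sizing parameters; none of these values come from the paper."""
    kafka_partitions: int      # partitions of the input topic
    executors: int             # Spark executors (assumed: one per node)
    cores_per_executor: int    # concurrent tasks each executor can run
    batch_interval_s: float    # micro-batch interval in seconds


def task_slots(cfg: PipelineConfig) -> int:
    """Number of tasks the cluster can run at the same time."""
    return cfg.executors * cfg.cores_per_executor


def waves_per_batch(cfg: PipelineConfig) -> int:
    """Scheduling rounds needed per micro-batch.

    Assumes one Spark task per Kafka partition, as in the direct
    Kafka integration for Spark Streaming.
    """
    return math.ceil(cfg.kafka_partitions / task_slots(cfg))


def idle_slots_in_last_wave(cfg: PipelineConfig) -> int:
    """Task slots left without work in the final scheduling round."""
    slots = task_slots(cfg)
    remainder = cfg.kafka_partitions % slots
    return 0 if remainder == 0 else slots - remainder


def is_stable(cfg: PipelineConfig, per_task_seconds: float) -> bool:
    """A micro-batch keeps up only if it finishes within the batch interval.

    Simplification: every task is assumed to take per_task_seconds and
    scheduling overhead is ignored.
    """
    return waves_per_batch(cfg) * per_task_seconds < cfg.batch_interval_s


# Hypothetical comparison on a 10-executor cluster with 4 cores each.
misaligned = PipelineConfig(kafka_partitions=48, executors=10,
                            cores_per_executor=4, batch_interval_s=2.0)
aligned = PipelineConfig(kafka_partitions=40, executors=10,
                         cores_per_executor=4, batch_interval_s=2.0)

for name, cfg in (("misaligned", misaligned), ("aligned", aligned)):
    print(name, waves_per_batch(cfg), idle_slots_in_last_wave(cfg),
          is_stable(cfg, per_task_seconds=1.5))
```

Under these assumptions, 48 partitions on 40 task slots need a second scheduling round that leaves most of the cluster idle, while 40 partitions finish in a single round. This is one plausible reading of why matching the partition count to the available parallelism could improve both throughput and latency.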
