AI-Based Data Engineering Pipelines for Real-Time Cybersecurity Threat Detection
DOI: https://doi.org/10.15662/IJEETR.2023.0506028
Keywords:
Cybersecurity Threat Detection, AI-Driven Security Analytics, Real-Time Threat Detection, ML-Based Detection Systems, Security Data Pipelines, Threat Intelligence Systems, Feature Extraction for Security, Data Ingestion Pipelines, Security Data Transformation, Data Quality in Security, Security Data Governance, Data Provenance Tracking, Real-Time Analytics Systems, Detection Accuracy Optimization, Security Pipeline Architecture, Threat Detection Metrics, AI-Enhanced Data Engineering, Near Real-Time Monitoring, Security Compliance Systems, Adaptive Threat Detection
Abstract
Timely detection and mitigation of cybersecurity threats are critical for organizations and the broader ecosystem. The variety of incident types, a wide attack surface, and a dispersed threat landscape call for advanced detection capabilities across multiple areas. AI and ML techniques show promise for improving the accuracy and efficiency of detection, and for enhancing the data engineering processes that support it. The fundamentals of detection tasks and of data engineering for real-time analysis provide the foundation for this work. A framework is proposed for deploying AI-enhanced data pipelines that support detection in real time or near real time.
The pipeline architecture and its individual components emphasize the aspects of data engineering needed for timely threat detection: ingestion, feature extraction, transformation, quality, governance, compliance, and provenance. Factors that challenge each of these areas are identified, along with appropriate metrics and a consolidated set of support mechanisms and techniques that aid real-time capability. Together with design, classification, and evaluation guidelines, these provide a comprehensive foundation for the pipeline aspects of data engineering within real-time analytics (swiftness, completeness, correctness, and cost-effectiveness).
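The stages named above can be illustrated with a minimal Python sketch. It is not the framework proposed in the paper: the event schema (`src_ip`, `bytes`, `duration_s`), the single `bytes_per_s` feature, the fixed alert threshold, and all function names are hypothetical, chosen only to show how ingestion, a quality gate, feature extraction, provenance stamping, and simple completeness and latency metrics might fit together.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

# Hypothetical schema: fields every event must carry to pass the quality gate.
REQUIRED_FIELDS = ("src_ip", "bytes", "duration_s")


@dataclass
class Record:
    payload: dict
    provenance: list = field(default_factory=list)  # ordered (stage, digest) pairs

    def stamp(self, stage: str) -> None:
        # Record which stage touched the payload and a short hash of its state.
        blob = json.dumps(self.payload, sort_keys=True).encode()
        self.provenance.append((stage, hashlib.sha256(blob).hexdigest()[:12]))


def ingest(raw_events):
    for event in raw_events:
        rec = Record(dict(event))
        rec.stamp("ingest")
        yield rec


def quality_gate(records, stats):
    # Drop records with missing fields or a non-positive duration.
    for rec in records:
        stats["seen"] += 1
        complete = all(rec.payload.get(k) is not None for k in REQUIRED_FIELDS)
        if complete and rec.payload["duration_s"] > 0:
            stats["passed"] += 1
            rec.stamp("quality")
            yield rec


def extract_features(records):
    for rec in records:
        rec.payload["bytes_per_s"] = rec.payload["bytes"] / rec.payload["duration_s"]
        rec.stamp("features")
        yield rec


def detect(records, threshold=1_000_000.0):
    # Placeholder detector: a fixed-rate threshold stands in for an ML model.
    for rec in records:
        rec.payload["alert"] = rec.payload["bytes_per_s"] > threshold
        rec.stamp("detect")
        yield rec


def run_pipeline(raw_events):
    stats = {"seen": 0, "passed": 0}
    start = time.perf_counter()
    stream = detect(extract_features(quality_gate(ingest(raw_events), stats)))
    results = list(stream)
    metrics = {
        "completeness": stats["passed"] / stats["seen"] if stats["seen"] else 0.0,
        "latency_s": time.perf_counter() - start,
        "alerts": sum(r.payload["alert"] for r in results),
    }
    return results, metrics


if __name__ == "__main__":
    sample = [
        {"src_ip": "10.0.0.1", "bytes": 500, "duration_s": 1.0},
        {"src_ip": "10.0.0.2", "bytes": 5_000_000, "duration_s": 2.0},
        {"src_ip": "10.0.0.3", "bytes": None, "duration_s": 1.0},  # fails the gate
    ]
    results, metrics = run_pipeline(sample)
    print(metrics)
    print(results[0].provenance)
```

The stages are chained as generators, so each record flows through the whole pipeline before the next is read. This is one simple way to approximate streaming behaviour; a production system would more likely rely on a dedicated stream processor.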