From Raw Trades to Audit-Ready Insights: Designing Regulator-Grade Market Surveillance Pipelines

Authors

  • Janardhan Reddy Kasireddy, Lead Data Engineer, Info Drive Systems (FINRA Contractor), USA

DOI:

https://doi.org/10.15662/IJEETR.2022.0402003

Keywords:

market surveillance, auditability, lineage, governance, immutable logs, regulatory analytics, data engineering, NYSE TAQ dataset

Abstract

This paper proposes a holistic approach to designing end-to-end market surveillance data pipelines that convert raw trade and quote data into regulator-grade analytics while preserving auditability, traceability, and compliance with governance rules. We use the NYSE Trades and Quotes (TAQ) dataset available on Kaggle, which contains high-frequency trades and quotes for a range of equities, to run a multi-stage data engineering pipeline covering ingestion, validation, transformation, enrichment, and storage. Raw data is first loaded into a distributed object store and checked for schema consistency, timestamp synchronization, and completeness. Each stage writes immutable logs, which record lineage and keep the pipeline audit-ready. Apache Spark performs the data transformations (aggregation, normalization, and anomaly detection) on a scalable cluster architecture that processes multi-terabyte datasets in parallel. The evaluation shows that the pipeline ingests live trade streams with sub-second latency while maintaining a complete trace from raw trading data to analytical results. Lineage and immutable logging allow auditors to recreate any derived metric, which satisfies regulatory requirements for accountability and transparency. By systematically integrating distributed processing, strong data governance, and end-to-end lineage, the paper provides a repeatable system for financial institutions that want to operationalize market surveillance analytics at scale.
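
The validation and immutable-logging stages described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the record schema (`REQUIRED_FIELDS`), the `validate` function, and the `LineageLog` class are hypothetical names introduced here. A simple non-decreasing timestamp check stands in for timestamp synchronization, and a SHA-256 hash chain is assumed as one possible way to make log entries tamper-evident. The sketch uses only the standard library rather than Apache Spark.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical record schema; the actual TAQ column names may differ.
REQUIRED_FIELDS = ("symbol", "timestamp", "price", "size")


def validate(records):
    """Split records into (valid, rejected) using completeness and timestamp-order checks."""
    valid, rejected = [], []
    last_ts = None
    for rec in records:
        complete = all(rec.get(name) is not None for name in REQUIRED_FIELDS)
        in_order = complete and (last_ts is None or rec["timestamp"] >= last_ts)
        if in_order:
            valid.append(rec)
            last_ts = rec["timestamp"]
        else:
            rejected.append(rec)
    return valid, rejected


def digest(obj):
    """Deterministic SHA-256 digest of a JSON-serializable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class LineageLog:
    """Append-only log; each entry embeds the hash of the previous entry."""
    entries: list = field(default_factory=list)

    def append(self, stage, inputs, outputs):
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {
            "stage": stage,
            "input_hash": digest(inputs),
            "output_hash": digest(outputs),
            "prev_hash": prev_hash,
        }
        entry = dict(body, entry_hash=digest(body))
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute every hash; any edited entry breaks the chain."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if entry["prev_hash"] != prev_hash or digest(body) != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True


# Illustrative data only, not drawn from the TAQ dataset.
raw = [
    {"symbol": "AAA", "timestamp": 1, "price": 10.0, "size": 100},
    {"symbol": "AAA", "timestamp": 2, "price": 10.5, "size": None},  # incomplete
    {"symbol": "AAA", "timestamp": 3, "price": 11.0, "size": 300},
]

log = LineageLog()
valid, rejected = validate(raw)
log.append("validate", raw, valid)

# Derived metric: volume-weighted average price over the valid trades.
volume = sum(r["size"] for r in valid)
vwap = sum(r["price"] * r["size"] for r in valid) / volume
log.append("aggregate", valid, {"vwap": vwap})
```

Because the `input_hash` of the aggregation entry equals the `output_hash` of the validation entry, an auditor could in principle walk the chain backwards from a derived metric to the raw input and recompute each stage, and `log.verify()` would report whether any entry had been altered after the fact.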

 

Published

2022-04-04

How to Cite

From Raw Trades to Audit-Ready Insights: Designing Regulator-Grade Market Surveillance Pipelines. (2022). International Journal of Engineering & Extended Technologies Research (IJEETR), 4(2), 4609-4616. https://doi.org/10.15662/IJEETR.2022.0402003