Governed Lakehouse Architecture: Leveraging Databricks Unity Catalog for Scalable, Secure Data Mesh Implementation
DOI: https://doi.org/10.15662/IJEETR.2023.0502007
Keywords: Data Mesh, Governed Lakehouse Architecture, Federated Data Ecosystems, Enterprise Architecture, Domain-Driven Design, Semantic Federation, Metadata Management, FAIR Principles
Abstract
The trajectory of enterprise data architecture has long been defined by a dialectical tension between the rigid order of the warehouse and the scalable anarchy of the data lake. This cycle currently manifests in the sociologically seductive but technically immature concept of the Data Mesh. The existing literature privileges the organizational benefits of domain ownership and relies heavily on "gray literature" and blog posts rather than rigorous academic scrutiny. It also remains dangerously silent on the engineering mechanics required to prevent governance fragmentation. This study operationalizes the "Governed Lakehouse" architecture, using a Federated Data Insights Platform to enforce global invariants (security, semantic federation, and auditability) across a heterogeneous environment. The architecture distinguishes between the global schema (presentation) and local schemas (storage), and on that basis it replaces ad hoc provisioning with a structured mediation mechanism. However, our analysis indicates that although this control plane successfully enforces syntactic interoperability, true integration requires semantic enrichment to bridge disparate systems. Ultimately, this research shows that a viable Data Mesh is not merely a decentralized organizational model. To function, it requires a rigid, centralized metadata substrate, specifically a Minimal Lakehouse Architecture. This redefines governance not as a bureaucratic gatekeeper but as a computed property of the infrastructure itself.
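The abstract's central mechanism is a mediation layer. It presents one global schema over domain-owned local schemas and enforces access control and auditability in a shared control plane. The minimal Python sketch below illustrates that idea under stated assumptions. It is not the Unity Catalog API and not the authors' implementation. The classes `LocalSource` and `Mediator`, the principal-to-domain grant model, and the audit-event format are all hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LocalSource:
    """A domain-owned table, described in its own (storage-level) column names."""
    domain: str
    rows: list[dict]
    # Hypothetical mapping: global attribute name -> local column name.
    column_map: dict[str, str]


@dataclass
class Mediator:
    """Hypothetical global-as-view mediator: one global schema over many local ones.

    Access control and auditing live here, in the shared control plane,
    rather than in each domain's storage layer.
    """
    global_schema: tuple[str, ...]
    sources: list[LocalSource] = field(default_factory=list)
    # Hypothetical grant model: principal -> set of domains it may read.
    grants: dict[str, set[str]] = field(default_factory=dict)
    # Every read or denial is recorded as (principal, event).
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def register(self, source: LocalSource) -> None:
        """Admit a domain only if it maps every attribute of the global schema."""
        missing = set(self.global_schema) - set(source.column_map)
        if missing:
            raise ValueError(f"{source.domain} does not map {sorted(missing)}")
        self.sources.append(source)

    def query(self, principal: str) -> list[dict]:
        """Return rows from all permitted domains, renamed into the global schema."""
        allowed = self.grants.get(principal, set())
        result = []
        for src in self.sources:
            if src.domain not in allowed:
                self.audit_log.append((principal, f"DENY {src.domain}"))
                continue
            self.audit_log.append((principal, f"READ {src.domain}"))
            for row in src.rows:
                # Purely syntactic mapping: columns are renamed, values are untouched.
                result.append({attr: row[src.column_map[attr]] for attr in self.global_schema})
        return result


if __name__ == "__main__":
    mediator = Mediator(global_schema=("customer_id", "country"))
    mediator.register(LocalSource("sales", [{"cust": 1, "ctry": "DE"}],
                                  {"customer_id": "cust", "country": "ctry"}))
    mediator.register(LocalSource("support", [{"id": 7, "nation": "France"}],
                                  {"customer_id": "id", "country": "nation"}))
    mediator.grants["analyst"] = {"sales"}
    mediator.grants["steward"] = {"sales", "support"}
    print(mediator.query("analyst"))   # only the sales row
    print(mediator.query("steward"))   # both rows, with mismatched country encodings
    print(mediator.audit_log)          # READ/DENY events for both principals
```

The steward's result illustrates the caveat raised in the abstract. The mediator guarantees that both domains answer in the same column vocabulary, which is syntactic interoperability. However, `"DE"` and `"France"` still encode the same attribute differently. Semantic enrichment would have to close that gap; a purely structural control plane cannot.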