Modelling and Data Analysis
2023. Vol. 13, no. 2, 180–193
doi:10.17759/mda.2023130210
ISSN: 2219-3758 / 2311-9454 (online)
Development of an ETL Process Based on Open Source Technologies to Solve the Problem of Data Delivery to Consumers
Abstract
The article addresses the development of an ETL process for a data warehouse built on open source technologies rather than proprietary vendor-supplied software. The process delivers data from the source to the consumer, with a focus on delivery speed, resource consumption, and ease of development. The solution architecture is presented together with a description of the processes being replaced, and data transfer over the new process is implemented. Modern data processing tools are employed; the methods of interacting with them and the selection of technical parameters for the process are described.
General Information
Keywords: database, open source, software, ETL process, data delivery
Journal rubric: Software
Article type: scientific article
DOI: https://doi.org/10.17759/mda.2023130210
Received: 12.04.2023
For citation: Starkov V.V., Gorbatova S.S., Vodolaga V.I. Development of an ETL Process Based on Open Source Technologies to Solve the Problem of Data Delivery to Consumers. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2023. Vol. 13, no. 2, pp. 180–193. DOI: 10.17759/mda.2023130210. (In Russ., abstr. in Engl.)