Development of an ETL Process Based on Open Source Technologies to Solve the Problem of Data Delivery to Consumers

V.V. Starkov; S.S. Gorbatova; V.I. Vodolaga

doi:10.17759/mda.2023130210

Modelling and Data Analysis
2023. Vol. 13, no. 2, 180–193
ISSN: 2219-3758 / 2311-9454 (online)

Abstract

The article discusses the development of an ETL process for a data warehouse built on open source technologies rather than on proprietary software supplied by a vendor. The process delivers data from the source to the consumer, with a focus on delivery speed, resource consumption and ease of development. The architecture of the solution is presented together with a description of the processes being replaced, and data transfer through the new process is implemented. Modern data processing tools are used; the methods of interacting with them and the selection of technical characteristics for the process are described.
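
As a rough illustration of the extract, transform and load stages the abstract refers to, the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the CSV source, the `orders` table, the column names and the helper functions `extract`, `transform`, `load` and `run_etl` are all hypothetical and chosen only to keep the example self-contained.

```python
import csv
import io
import sqlite3


def extract(source_text):
    """Extract: parse CSV text from the (hypothetical) source into dict rows."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: skip rows without an amount, normalise types and name casing."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # incomplete record, not delivered to the consumer
        cleaned.append(
            (int(row["id"]), row["customer"].strip().title(), float(row["amount"]))
        )
    return cleaned


def load(records, connection):
    """Load: write cleaned records into the (hypothetical) consumer table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent when the job is re-run.
    connection.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    connection.commit()


def run_etl(source_text, connection):
    """Run the three stages in order and return the number of loaded records."""
    records = transform(extract(source_text))
    load(records, connection)
    return len(records)


# Illustrative sample data only; the second row has no amount and is skipped.
SAMPLE_SOURCE = "id,customer,amount\n1, alice ,10.5\n2,bob,\n3,carol,7\n"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_etl(SAMPLE_SOURCE, conn))
```

Keeping each stage as a separate function makes it possible to swap the source or the target store independently, which is the kind of flexibility an open source ETL stack is usually chosen for.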

General Information

Keywords: database, open source, software, ETL process, data delivery

Journal rubric: Software

Article type: scientific article

DOI: https://doi.org/10.17759/mda.2023130210

Received 12.04.2023

Published 12.05.2023

For citation: Starkov, V.V., Gorbatova, S.S., Vodolaga, V.I. (2023). Development of an ETL Process Based on Open Source Technologies to Solve the Problem of Data Delivery to Consumers. Modelling and Data Analysis, 13(2), 180–193. (In Russ.). https://doi.org/10.17759/mda.2023130210

License: CC BY-NC 4.0

Information About the Authors

Viacheslav V. Starkov, e-mail: starkov.viatcheslav@yandex.ru

Svetlana S. Gorbatova, Senior Lecturer, Moscow Institute of Steel and Alloys (National Research Technological University) (NUST MISIS), Moscow, Russian Federation, ORCID: https://orcid.org/0009-0005-5213-6780, e-mail: ssgorbatova@misis.ru

Victoria I. Vodolaga, Master's Degree, Lomonosov Moscow State University (MSU), Moscow, Russian Federation, ORCID: https://orcid.org/0009-0003-1816-0088, e-mail: vikavodolaga1@gmail.com
