Identification of linear regression parameters and a mathematical rule for generating values of a dummy variable included in the model

 
Abstract

Context and relevance. Constructing regression models in which some explanatory variables are categorical is a topical problem. Categories may represent, for example, gender (male and female), level of higher education (bachelor's, specialist, and master's degrees), or season (winter, spring, summer, and autumn). Dichotomous categories are introduced into a model through dummy variables, which take the value 1 if a certain feature is present and 0 if it is absent. The values of these dummy variables are often assigned by predefined rules.

Objective. The aim of the work is to formulate a problem of simultaneously identifying the parameters of multiple linear regression models and the values of the dummy variables included in them.

Hypothesis. The formulated problem can be formalized using the apparatus of mathematical programming.

Methods and materials. The unknown regression parameters are estimated by the least absolute deviations (LAD) method.

Results. The problem of identifying the regression parameters together with the dummy-variable values is formalized as a mixed 0-1 integer linear programming problem. Its solution yields not only estimates of the linear regression parameters and optimal values of the dummy variable, but also a rule by which these values are identified. The rule is obtained by applying the integer floor function to a linear combination of the explanatory variables. Using known data on the operation of an evaporator at a large industrial enterprise, four types of linear regression with a dummy variable were constructed.

Conclusions. Because the proposed method also identifies the rule that generates the dummy variable, the resulting regressions can be used for forecasting. Once the rule has been identified, the regressions can, if necessary, be re-estimated, for example by ordinary least squares, which offers a more extensive set of model validation criteria.
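The abstract does not give the model equations, so the following Python sketch only illustrates the two ingredients it names, under stated assumptions; it is not the author's mixed 0-1 formulation. First, it assumes a hypothetical rule of the form d = floor(c0 + c1·x1 + ... + ck·xk) whose argument stays in [0, 2), so that the result is 0 or 1. By definition, floor(z) = d exactly when d ≤ z < d + 1 for an integer d, which is one standard way to express such a rule with linear constraints in an integer program. Second, it estimates a regression containing an already known dummy by LAD. Instead of solving a linear program (the classical approach, cf. Wagner, 1959) or the paper's mixed-integer problem, it enumerates basic solutions: when the design matrix has full column rank p, some LAD optimum passes exactly through p observations, so checking every p-point subset finds the exact minimum. This is practical only for very small samples. The function names `floor_dummy`, `lad_fit`, and `_solve`, and all numbers in the demo, are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def floor_dummy(x: Sequence[float], coef: Sequence[float], intercept: float) -> int:
    """Hypothetical rule: d = floor(intercept + sum_j coef[j] * x[j]), required to be 0 or 1."""
    if len(x) != len(coef):
        raise ValueError("x and coef must have the same length")
    z = intercept + sum(c * v for c, v in zip(coef, x))
    d = math.floor(z)  # floor(z) = d  <=>  d <= z < d + 1
    if d not in (0, 1):
        raise ValueError(f"floor({z}) = {d}; the argument must lie in [0, 2)")
    return d


def _solve(a: List[List[float]], b: List[float], tol: float = 1e-12) -> Optional[List[float]]:
    """Gaussian elimination with partial pivoting; returns None for a (near-)singular system."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < tol:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = s / m[r][r]
    return beta


def lad_fit(rows: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Exact LAD fit by checking every fit that passes through p observations (small n only)."""
    p = len(rows[0])
    best: Optional[List[float]] = None
    best_loss = math.inf
    for idx in combinations(range(len(y)), p):
        beta = _solve([list(rows[i]) for i in idx], [y[i] for i in idx])
        if beta is None:  # the chosen rows are linearly dependent
            continue
        loss = sum(abs(yi - sum(b * v for b, v in zip(beta, r))) for r, yi in zip(rows, y))
        if loss < best_loss:
            best, best_loss = beta, loss
    if best is None:
        raise ValueError("the design matrix does not have full column rank")
    return best, best_loss


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5]
    d = [floor_dummy([xi], [0.25], 0.125) for xi in x]  # [0, 0, 0, 0, 1, 1]
    y = [1 + 2 * xi + 3 * di for xi, di in zip(x, d)]  # y = 1 + 2x + 3d, no noise
    y[2] = 20  # one gross outlier (the clean value would be 5)
    design = [(1.0, xi, di) for xi, di in zip(x, d)]  # columns: intercept, x, d
    beta, loss = lad_fit(design, y)
    print([round(b, 6) for b in beta], round(loss, 6))
```

In the demo, the hypothetical rule assigns d = 1 only to the last two observations, and the single gross outlier does not move the LAD estimates away from (1, 2, 3), the coefficients used to generate the clean data. In the paper's setting, by contrast, the coefficients of the rule are themselves unknowns identified jointly with the regression.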

General Information

Keywords: regression analysis, dummy variable, integer floor function, least absolute deviations method, mixed 0-1 integer linear programming problem

Journal rubric: Numerical Methods

Article type: scientific article

DOI: https://doi.org/10.17759/mda.2025150208

Received 10.04.2025

For citation: Bazilevskiy, M.P. (2025). Identification of linear regression parameters and a mathematical rule for generating values of a dummy variable included in the model. Modelling and Data Analysis, 15(2), 139–151. (In Russ.). https://doi.org/10.17759/mda.2025150208

© Bazilevskiy M.P., 2025

License: CC BY-NC 4.0

Information About the Authors

Mikhail P. Bazilevskiy, Candidate of Science (Engineering), Associate Professor, Department of Mathematics, Irkutsk State Transport University (ISTU), Irkutsk, Russian Federation, ORCID: https://orcid.org/0000-0002-3253-5697, e-mail: mik2178@yandex.ru
