RT info:eu-repo/semantics/bookPart
T1 Deploying a scalable data science environment using Docker
A1 Colebrook Santamaría, Marcos Alejandro
A1 Martín Santana, Sergio
A1 Pérez González, Carlos Javier
A1 Roda García, José Luis
A1 González Yanes, Pedro
A2 Ingeniería Informática y de Sistemas
A2 Investigación Operativa TARO: Ingeniería del Software y Bases de Datos
AB Within the Data Science stack, the infrastructure layer supporting the distributed computing engine is a key part that plays an important role in order to obtain timely and accurate insights in a digital business. However, sometimes the expense of using such Data Science facilities in a commercial cloud infrastructure is not affordable to everyone. In this sense, we present a computing environment based on free software tools over commodity computers. Thus, we show how to deploy an easily scalable Spark cluster using Docker including both Jupyter and RStudio that support Python and R programming languages. Moreover, we present a successful case study where this computing framework has been used to analyze statistical results using data collected from meteorological stations located in the Canary Islands (Spain).
YR 2019
FD 2019
LK http://riull.ull.es/xmlui/handle/915/26945
UL http://riull.ull.es/xmlui/handle/915/26945
LA en
DS Repositorio institucional de la Universidad de La Laguna
RD 18-nov-2024