RT info:eu-repo/semantics/bookPart
T1 Deploying a scalable data science environment using Docker
A1 Colebrook Santamaría, Marcos Alejandro
A1 Martín Santana, Sergio
A1 Pérez González, Carlos Javier
A1 Roda García, José Luis
A1 González Yanes, Pedro
A2 Ingeniería Informática y de Sistemas
A2 Investigación Operativa
TARO: Ingeniería del Software y Bases de Datos
AB Within the Data Science stack, the infrastructure layer supporting the distributed computing engine is a key part that plays an important role in order to obtain timely and accurate insights in a digital business. However, sometimes the expense of using such Data Science facilities in a commercial cloud infrastructure is not affordable to everyone. In this sense, we present a computing environment based on free software tools over commodity computers. Thus, we show how to deploy an easily scalable Spark cluster using Docker including both Jupyter and RStudio that support Python and R programming languages. Moreover, we present a successful case study where this computing framework has been used to analyze statistical results using data collected from meteorological stations located in the Canary Islands (Spain).
YR 2019
FD 2019
LK http://riull.ull.es/xmlui/handle/915/26945
UL http://riull.ull.es/xmlui/handle/915/26945
LA en
DS Repositorio institucional de la Universidad de La Laguna
RD 04-Apr-2025