Deploying a scalable data science environment using Docker
Date
2019
Abstract
Within the Data Science stack, the infrastructure layer that supports the
distributed computing engine plays a key role in obtaining timely and accurate
insights in a digital business. However, the cost of running such Data Science
workloads on a commercial cloud infrastructure is not always affordable. To
address this, we present a computing environment based on free software tools
running on commodity computers. We show how to deploy an easily scalable Spark
cluster using Docker, including both Jupyter and RStudio, which support the
Python and R programming languages. Moreover, we present a successful case study
in which this computing framework was used to analyze statistical results using
data collected from meteorological stations located in the Canary Islands
(Spain).
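The deployment described above can be sketched as a Docker Compose file that wires a Spark master, scalable workers, and Jupyter and RStudio front ends into one network. This is a minimal illustrative sketch, not the authors' exact configuration: the image names (`bitnami/spark`, `jupyter/pyspark-notebook`, `rocker/rstudio`), versions, and port mappings are assumptions.

```yaml
# Hypothetical docker-compose.yml: a one-master Spark cluster with
# Jupyter and RStudio front ends. Images and ports are illustrative.
version: "3"
services:
  spark-master:
    image: bitnami/spark            # assumed image; any Spark image works
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"                 # Spark master web UI
      - "7077:7077"                 # Spark master RPC port
  spark-worker:
    image: bitnami/spark
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    depends_on:
      - spark-master
  jupyter:
    image: jupyter/pyspark-notebook # Python notebooks with PySpark
    ports:
      - "8888:8888"
  rstudio:
    image: rocker/rstudio           # R IDE; can reach Spark via sparklyr
    ports:
      - "8787:8787"
```

The cluster scales horizontally by replicating the worker service, e.g. `docker-compose up --scale spark-worker=4`, which is what makes this commodity-hardware setup "easily scalable".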