Show the simple item record

dc.contributor.advisor    Sigut Saavedra, José Francisco
dc.contributor.advisor    González Mora, José Luis
dc.contributor.author     Merino Gracia, Carlos
dc.contributor.other      Programa de doctorado en Física e Ingeniería
dc.date.accessioned       2022-02-14T16:19:36Z
dc.date.available         2022-02-14T16:19:36Z
dc.date.issued            2015
dc.identifier.uri         http://riull.ull.es/xmlui/handle/915/26489
dc.description.abstract   Since the early days of computer science, researchers have sought to devise a machine that could automatically read text to help people with visual impairments. The problem of extracting and recognising text on document images has been largely resolved, but reading text from images of natural scenes remains a challenge. Scene text can present uneven lighting, complex backgrounds or perspective and lens distortion; it usually appears as short sentences or isolated words and shows a very diverse set of typefaces. However, video sequences of natural scenes provide a temporal redundancy that can be exploited to compensate for some of these deficiencies. Here we present a complete end-to-end, real-time scene text reading system for video images based on perspective-aware text tracking. The main contribution of this work is a system that automatically detects, recognises and tracks text in videos of natural scenes in real time. The focus of our method is on large text found in outdoor environments, such as shop signs, street names and billboards. We introduce novel efficient techniques for text detection, text aggregation and text perspective estimation. Furthermore, we propose using a set of Unscented Kalman Filters (UKF) to maintain each text region's identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide baseline changes in orientation. The orientation of each text line is estimated using a method that relies on the geometry of the characters themselves to estimate a rectifying homography. This is done irrespective of the view of the text over a large range of orientations. We also demonstrate a wearable head-mounted device for text reading that encases a camera for image acquisition and a pair of headphones for synthesized speech output. Our system is designed for continuous and unsupervised operation over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised in order to maximize the usage of available processing power and to achieve real-time operation. We show comparative results that improve the current state-of-the-art when correcting perspective deformation of scene text. The end-to-end system performance is demonstrated on sequences recorded in outdoor scenarios. Finally, we also release a dataset of text tracking videos along with the annotated ground-truth of text regions.    en_EN
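The abstract describes rectifying scene text via a homography into a fronto-parallel view. As an illustrative aside only (this is not the thesis's character-geometry method, and all point values below are made up), such a rectifying homography can be sketched with the standard Direct Linear Transform (DLT) from four corner correspondences:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src using the DLT algorithm.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the DLT system A h = 0.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # h is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Apply H to (N, 2) points, including the perspective division."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Corners of a text region seen under perspective (hypothetical values) ...
src = np.array([[10.0, 20.0], [210.0, 40.0], [220.0, 120.0], [5.0, 150.0]])
# ... mapped to a fronto-parallel rectangle.
dst = np.array([[0.0, 0.0], [200.0, 0.0], [200.0, 100.0], [0.0, 100.0]])

H = estimate_homography(src, dst)
rectified = apply_homography(H, src)
```

With exactly four correspondences the DLT solution is exact, so `rectified` recovers the target rectangle; in the tracking setting described above, the homography parameters would instead be maintained over time by a filter such as the UKF.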
dc.format.mimetype        application/pdf
dc.rights                 Attribution-NonCommercial-NoDerivatives 4.0 Internacional    *
dc.rights.uri             http://creativecommons.org/licenses/by-nc-nd/4.0/    *
dc.title                  Desarrollo de un sistema de lectura de textos sobre imágenes de video    es_ES
dc.title.alternative      Development of a text reading system on video images    en
dc.type                   info:eu-repo/semantics/doctoralThesis
dc.rights.accessRights    info:eu-repo/semantics/openAccess    es_ES
dc.subject.keyword        Imagen    es_ES
dc.subject.keyword        Tecnología    es_ES
dc.subject.keyword        Informática    es_ES


Files in this item

This item appears in the following collection(s)

  • TD. Arquitectura e Ingenierías
    Theses in Technical Architecture, Agricultural Engineering, Civil Engineering, Nautical Science, Naval Machinery and Radio-Electronics, Electronic, Industrial and Automation Engineering, Mechanical Engineering, Industrial Chemical Engineering, etc.


Attribution-NonCommercial-NoDerivatives 4.0 Internacional
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 Internacional