Machine learning and NLP approaches in address matching
dc.contributor.advisor | Sánchez Berriel, Isabel | |
dc.contributor.advisor | Moreno de Antonio, Luz Marina | |
dc.contributor.author | Syne, Lamine | |
dc.contributor.other | Máster Universitario en Ciberseguridad e Inteligencia de Datos Por la Ull | |
dc.date.accessioned | 2023-02-27T10:15:18Z | |
dc.date.available | 2023-02-27T10:15:18Z | |
dc.date.issued | 2022 | |
dc.identifier.uri | http://riull.ull.es/xmlui/handle/915/31641 | |
dc.description.abstract | The aim of this project is to explore the potential of machine learning and NLP in the address matching sub-field of geographic information science. To this end, we study word and sentence embedding models in depth: how they work and how they can be used to generate numerical representations of an address. For each word or sentence embedding model, we generate vector representations of the addresses in the database and calculate the cosine similarity between them to determine which ones represent the same geographic position. We then use the confusion matrix to evaluate the performance of each model on a dataset of already matched addresses created from ISTAC [1] data sources, and compare the models. Finally, a use case is presented using the best-performing of the models studied. This could be a starting point for building a powerful tool for matching address pairs across the Canary Islands. Keywords: machine learning, NLP, language model, address matching, word embedding, similarity | en |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.rights | Creative Commons License (Attribution-NonCommercial-NoDerivatives 4.0 International) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es_ES | |
dc.title | Machine learning and NLP approaches in address matching | |
dc.type | info:eu-repo/semantics/masterThesis |
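
The pipeline outlined in the abstract (embed each address, compare pairs by cosine similarity, score the decisions with a confusion matrix) can be illustrated with a minimal sketch. This is not the thesis implementation: the `embed` function below is a hypothetical stand-in that counts character trigrams instead of using a trained word or sentence embedding model, and the 0.5 decision threshold, the function names, and the sample addresses are assumptions made only for illustration.

```python
import math
from collections import Counter


def embed(address, n=3):
    """Toy stand-in for an embedding model (assumption): character n-gram counts."""
    text = " ".join(address.lower().split())  # lowercase and collapse whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def confusion_matrix(pairs, labels, threshold=0.5):
    """Count TP/FP/FN/TN for 'same location' decisions at a given threshold.

    The threshold value is an illustrative assumption, not a tuned setting.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for (a, b), is_match in zip(pairs, labels):
        predicted = cosine_similarity(embed(a), embed(b)) >= threshold
        if predicted and is_match:
            counts["tp"] += 1
        elif predicted and not is_match:
            counts["fp"] += 1
        elif not predicted and is_match:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical address pairs with hand-assigned ground-truth labels.
    pairs = [
        ("Calle Real 5, La Laguna", "calle real 5,  la laguna"),
        ("Calle Real 5, La Laguna", "Avenida Trinidad 61"),
    ]
    labels = [True, False]
    print(confusion_matrix(pairs, labels))
```

Swapping `embed` for a real embedding model would leave the similarity and evaluation steps unchanged, which is what makes a like-for-like comparison between models possible.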