dc.contributor.advisor: Sánchez Berriel, Isabel
dc.contributor.advisor: Moreno de Antonio, Luz Marina
dc.contributor.author: Syne, Lamine
dc.contributor.other: Máster Universitario en Ciberseguridad e Inteligencia de Datos Por la Ull
dc.date.accessioned: 2023-02-27T10:15:18Z
dc.date.available: 2023-02-27T10:15:18Z
dc.date.issued: 2022
dc.identifier.uri: http://riull.ull.es/xmlui/handle/915/31641
dc.description.abstract: The aim of this project is to explore the potential of machine learning and NLP for address matching, a sub-field of geographic information science. To this end, word and sentence embedding models were studied in depth: how they work and how they can be used to generate numerical representations of an address. For each word or sentence embedding model, we generate vector representations of the addresses in the database and compute the cosine similarity between them to determine which ones refer to the same geographic position. We then use the confusion matrix to evaluate the performance of each model on a dataset of already matched addresses built from ISTAC [1] data sources, and compare the models. Finally, a use case is shown with the best-performing of the models studied. This model can serve as a starting point for building a powerful tool for matching address pairs across the Canary Islands. Keywords: machine learning, NLP, language model, address matching, word embedding, similarity
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.rights: Creative Commons License (Attribution-NonCommercial-NoDerivatives 4.0 International)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es_ES
dc.title: Machine learning and NLP approaches in address matching
dc.type: info:eu-repo/semantics/masterThesis
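
The abstract describes comparing address vectors by cosine similarity and scoring the resulting matches with a confusion matrix. The following is a minimal sketch of that pipeline, not the thesis's implementation. It assumes a toy character n-gram count vector in place of the word and sentence embedding models studied. The function names, the 0.5 decision threshold, and any example addresses are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def embed(address: str, n: int = 3) -> Counter:
    """Toy stand-in for an embedding model (assumption): character n-gram counts."""
    # Lowercase and collapse repeated whitespace before extracting n-grams.
    text = " ".join(address.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def confusion_matrix(pairs, labels, threshold: float = 0.5) -> dict:
    """Count TP/FP/FN/TN for address pairs against known match labels.

    A pair is predicted to match when its cosine similarity reaches the
    threshold; the default of 0.5 is an arbitrary illustrative value.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for (a, b), is_match in zip(pairs, labels):
        predicted = cosine_similarity(embed(a), embed(b)) >= threshold
        if predicted and is_match:
            counts["tp"] += 1
        elif predicted and not is_match:
            counts["fp"] += 1
        elif not predicted and is_match:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts
```

Calling `confusion_matrix(pairs, labels)` on a list of address pairs and their known match labels gives the four counts from which precision, recall, and accuracy can be derived. To compare models as the abstract describes, one would swap `embed` for each model in turn and compare the resulting matrices on the same labelled dataset.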


Except where otherwise noted, this item's license is described as Creative Commons License (Attribution-NonCommercial-NoDerivatives 4.0 International)