RT info:eu-repo/semantics/masterThesis
T1 Machine learning and NLP approaches in address matching
A1 Syne, Lamine
A2 Máster Universitario en Ciberseguridad e Inteligencia de Datos Por la Ull
AB The objective of this project is to explore the potential of machine learning and NLP in the address matching subfield of geographic information science. To this end, an in-depth study of word and sentence embedding models was carried out, covering how they work and how they can be used to generate numerical representations of an address. For each word or sentence embedding model, we generate vector representations of the addresses in the database and calculate the cosine similarity between them to determine which ones represent the same geographic position. We then use the confusion matrix to evaluate the performance of each model on a dataset of already matched addresses created from ISTAC [1] data sources, and we compare the models. Finally, a use case is shown using the best-performing of the models studied. This can be a starting point for building a powerful tool for matching address pairs across the Canary Islands. Keywords: machine learning, NLP, language model, address matching, word embedding, similarity
YR 2022
FD 2022
LK http://riull.ull.es/xmlui/handle/915/31641
UL http://riull.ull.es/xmlui/handle/915/31641
LA en
DS Repositorio institucional de la Universidad de La Laguna
RD 24-nov-2024