MODELING THE DATABASE ARCHITECTURE FOR PUBLIC TRANSPORTATION BIG DATA WITH A FOCUS ON DATA INTEGRATION-ETL
DOI: https://doi.org/10.22533/at.ed.317452429016
Keywords: BIG DATA; PUBLIC TRANSPORT; ETL; ELECTRONIC TICKETING
Abstract:
Analyzing and planning public transport systems involves aspects that must be treated in an integrated way, from understanding the phenomenon to analyzing the data that represent it. Studies that propose using massive transport data leave a gap: the data are treated inadequately or stored without proper structure. This work presents a method for structuring a public transport database using transformation, data mining, and natural language processing techniques. The method has three stages: contextualization; cleaning and transformation; and loading and evaluation. The results show that human-produced data are more inconsistent than machine-generated data. The GPS and ticketing databases were integrated and achieved a compatibility and treatment rate above the usual average of 65%.
- Kaio Gefferson de Almeida Mesquita
- Kleberson Leandro da Rocha
- Gabriel Amorim Rabelo Nobre