Graduate Programs Portal (UFBA)

SIGAA - Integrated System for Academic Activities Management

PGCOMP/IC
GRADUATE PROGRAM IN COMPUTER SCIENCE (PGCOMP)
INSTITUTE OF COMPUTING
Phone: (71) 3283-5750
E-mail: pgcomp@ufba.br
https://posgraduacao.ufba.br/pgcomp

DEFENSE committee: CLICIA DOS SANTOS PINTO

A DOCTORAL DEFENSE committee has been registered by the program.
STUDENT: CLICIA DOS SANTOS PINTO
DATE: 28/07/2020
TIME: 15:00
LOCATION: Virtual (Google Meet)
TITLE:

Exploiting heterogeneous computing techniques to address probabilistic big data linkage

KEYWORDS:

Data linkage. Load balancing. Heterogeneous parallel computing. Graphical accelerators.

PAGES: 85
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science
SUBAREA: Computing Systems
ABSTRACT:

Although heterogeneous computing is a powerful approach to solving computationally intensive problems, its performance and efficiency depend heavily on the workload to which it is exposed. Managing large volumes of data in heterogeneous environments involves choosing efficient scheduling and partitioning algorithms that minimize the response time and the volume of communication among processing units while ensuring scalability. This requirement has become more urgent as the devices composing such heterogeneous platforms become more numerous and diverse. This work presents a methodology for using heterogeneous computing techniques in hybrid CPU+GPU environments to distribute data and tasks within big data linkage applications. The methodology was integrated into the AtyImo tool, which was partially developed during this research to provide probabilistic record linkage. As a proof of concept, the implemented solution was used to integrate a large-scale (100 million records) socioeconomic database with public health data from disparate governmental sources. The proposed methodology is able to perform 1x10^12 pairwise comparisons in around one hour, a notable result among existing data linkage tools. The observed results indicate that the developed solution achieves good performance and can be an alternative for solving scalability issues in data linkage contexts. The main contributions of this work are the ability to probabilistically link massive datasets using hybrid architectures and to exploit the heterogeneous nature of the available resources with an efficient execution time.
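
The abstract does not say how work is divided between the CPU and GPU devices. As a rough illustration of the general idea only, the Python sketch below splits a batch of pairwise comparisons across devices in proportion to a previously measured throughput, and computes the aggregate rate implied by the figures quoted above. The `Device` class, the function names, and the throughput values are hypothetical and are not taken from AtyImo.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str          # hypothetical label, e.g. "cpu" or "gpu0"
    throughput: float  # assumed comparisons per second, measured beforehand


def split_comparisons(total: int, devices: list[Device]) -> dict[str, int]:
    """Statically split `total` pairwise comparisons in proportion to throughput."""
    capacity = sum(d.throughput for d in devices)
    shares = {d.name: int(total * d.throughput / capacity) for d in devices}
    # Assign any rounding remainder to the fastest device so shares sum to total.
    fastest = max(devices, key=lambda d: d.throughput)
    shares[fastest.name] += total - sum(shares.values())
    return shares


def required_throughput(total: int, seconds: float) -> float:
    """Aggregate comparisons per second needed to finish `total` in `seconds`."""
    return total / seconds


if __name__ == "__main__":
    # Illustrative numbers only: a GPU assumed to be three times faster than the CPU.
    devices = [Device("cpu", 1.0e8), Device("gpu0", 3.0e8)]
    print(split_comparisons(10**12, devices))
    print(required_throughput(10**12, 3600.0))
```

Under these assumptions, 10^12 comparisons in one hour corresponds to an aggregate rate of roughly 2.8 x 10^8 comparisons per second. A static proportional split is the simplest form of load balancing; a dynamic scheduler would instead hand out smaller chunks as devices become idle.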

COMMITTEE MEMBERS:
Chair - 2810986 - MARCOS ENNES BARRETO
Internal - 2215121 - GEORGE MARCONI DE ARAUJO LIMA
Internal - 1850683 - MAYCON LEONE MACIEL PEIXOTO
External to the Institution - ESBEL TOMÁS VALERO ORELLANA - UESC-BA
External to the Institution - RODRIGO DA ROSA RIGHI - Unisinos

Notice registered on: 10/11/2020 15:39