DEFENSE Committee: PÉTALA GARDÊNIA DA SILVA ESTRELA TUY

A MASTER'S DEFENSE committee has been registered by the program.
STUDENT: PÉTALA GARDÊNIA DA SILVA ESTRELA TUY
DATE: 15/04/2020
TIME: 10:00
LOCATION: https://eu.yourcircuit.com/guest?token=40fd2a9a-0272-440e-a08c-e46875641a07
TITLE:

On the use of fuzzy clustering to build fuzzy rule-based systems to address Big Data


KEYWORDS:

1. Fuzzy Clustering. 2. Big Data. 3. MapReduce.


PAGES: 91
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science
SUBAREA: Computing Systems
SPECIALTY: Computer Systems Architecture
ABSTRACT:

Big Data is a trending topic that has gained attention in both business and academic environments. The term refers to the huge amount of data generated every day from a variety of sources and in a variety of formats. A significant part of Big Data is text, which can be used to solve various real-life problems, such as spam detection, author identification, web page classification and sentiment analysis. Text datasets are especially challenging because their high dimensionality can be both vertical (a large number of instances) and horizontal (a large number of attributes). In order to extract useful knowledge from such high dimensional datasets, data analysis techniques must be able to cope with their challenges: volume, velocity, variety and variability. Fuzzy Rule-Based Classification Systems (FRBCSs) have been shown to deal effectively with the uncertainty, vagueness, and noise inherent to data. However, the performance of FRBCSs is highly affected by the increasing number of instances and attributes present in Big Data. Previously proposed approaches try to adapt FRBCSs to Big Data by distributing data processing with the MapReduce paradigm, in which the data is processed in two stages: Map and Reduce. In the Map stage, the data is divided into multiple blocks and distributed among processing nodes, each of which processes its block independently. In the Reduce stage, the results coming from every node in the Map stage are aggregated and a final result is returned. This methodology tackles vertical high dimensionality, but it does not address datasets with simultaneous vertical and horizontal high dimensionality, as is the case with text datasets. Horizontal high dimensionality could be reduced with common feature selection techniques, such as Mutual Information (MI) and Chi-squared.
However, using such feature selection techniques may not be the best alternative, since model accuracy might be affected by the loss of information when only a subset of attributes is kept. In this work, we deal with the aforementioned drawbacks by proposing Summarizer, an approach for building reduced feature spaces for horizontally high dimensional data. To this end, we carry out an empirical study that compares a well-known classifier proposed for vertically high dimensional datasets with and without the horizontal dimensionality reduction process proposed by Summarizer. Our findings show that existing classifiers that tackle vertical Big Data problems can be improved by adding the Summarizer approach to the learning process, which suggests that a unified learning algorithm for datasets with a high number of instances as well as a high number of attributes might be possible.
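
The two-stage Map and Reduce flow described in the abstract can be illustrated with a minimal, single-machine Python sketch. It is not the thesis implementation and it is not the Summarizer approach: the function names (`split_into_blocks`, `map_block`, `reduce_counts`, `run_job`), the toy records, and the choice of counting (label, token) pairs as the per-block computation are all assumptions made purely for illustration. In a real cluster each call to `map_block` would run on a separate node.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(records, n_blocks):
    """Divide the records into at most n_blocks contiguous blocks."""
    size = max(1, -(-len(records) // n_blocks))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_block(block):
    """Map stage: process one block independently of all others.

    Here the (assumed) per-block work is counting (label, token) pairs.
    """
    counts = Counter()
    for label, text in block:
        for token in text.lower().split():
            counts[(label, token)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce stage: aggregate two partial results into one."""
    return left + right


def run_job(records, n_blocks=2):
    blocks = split_into_blocks(records, n_blocks)
    partials = [map_block(block) for block in blocks]  # one result per block
    return reduce(reduce_counts, partials, Counter())


# Hypothetical toy dataset of (label, text) records.
records = [
    ("spam", "win money now"),
    ("ham", "meeting at noon"),
    ("spam", "win a prize"),
    ("ham", "lunch at noon"),
]

totals = run_job(records, n_blocks=2)
print(totals[("spam", "win")])  # 2
print(totals[("ham", "noon")])  # 2
```

Because each block is mapped independently and the partial counts are simply added together, the final result does not depend on how many blocks the data is split into. This is why the scheme scales with the number of instances (vertical dimensionality), while the number of distinct tokens (horizontal dimensionality) is left untouched.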


COMMITTEE MEMBERS:
Internal - 2810986 - MARCOS ENNES BARRETO
Internal - 2115505 - TATIANE NOGUEIRA RIOS
External to the Institution - MATHEUS GIOVANNI PIRES - UEFS
Notice registered on: 18/05/2020 12:52
SIGAA | STI/SUPAC | Copyright © 2006-2024 - UFBA