Outlier detection: a procedure to capture atypical groups of observations

Tavares, Ana Helena; Afreixo, Vera; Brito, Paula

Please use this identifier to cite or link to this item: http://hdl.handle.net/10773/35321

Full metadata record

DC Field	Value	Language
dc.contributor.author	Tavares, Ana Helena	pt_PT
dc.contributor.author	Afreixo, Vera	pt_PT
dc.contributor.author	Brito, Paula	pt_PT
dc.date.accessioned	2022-11-28T11:15:20Z	-
dc.date.available	2022-11-28T11:15:20Z	-
dc.date.issued	2022	-
dc.identifier.uri	http://hdl.handle.net/10773/35321	-
dc.description.abstract	In this work, we introduce the concept of an atypical group of observations and propose a procedure for its identification. By atypical group, we mean a cluster of observations whose ‘mean’ pattern stands out from the majority of the ‘mean’ patterns of the remaining clusters. Two challenges arise in atypical group detection: first, identifying a meaningful segmentation of the data, and second, flagging the atypical segments. Our work focuses on data whose elements are discrete distributions. If heterogeneous datasets, where distinct patterns coexist, can validly be clustered, then the class prototypes provide a simplified description of the data. Thus, the key idea of our proposal is to combine a clustering method with a functional outlyingness criterion to capture atypical class prototypes. To identify a segmentation of the distributional data, we iteratively combine two steps. The first creates a hierarchy of clusters, while the second flags atypical curves within each cluster, based on a measure of functional outlyingness which accounts for the shape of the distributions [1]. Segments with atypical curves are forwarded for (sub)clustering, and the procedure is repeated until no outlying curves are identified in any cluster. Once the final partition is obtained, each cluster is represented by a class prototype, whose outlyingness is evaluated according to the same functional approach. Clusters with an atypical class prototype are flagged as atypical. We apply our procedure to investigate clusters of genomic words in human DNA by studying their inter-word lag distributions. These experiments demonstrate the potential of the new method for identifying clusters of words with outlying patterns.	pt_PT
dc.language.iso	eng	pt_PT
dc.publisher	CLAD, FEUP	pt_PT
dc.relation	info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F04106%2F2020/PT	pt_PT
dc.relation	info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F04106%2F2020/PT	pt_PT
dc.rights	restrictedAccess	pt_PT
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	pt_PT
dc.subject	Outlyingness	pt_PT
dc.subject	Clustering	pt_PT
dc.subject	Distributional data	pt_PT
dc.subject	Functional data	pt_PT
dc.title	Outlier detection: a procedure to capture atypical groups of observations	pt_PT
dc.type	conferenceObject	pt_PT
dc.description.version	published	pt_PT
dc.peerreviewed	yes	pt_PT
ua.event.date	19-23 July, 2022	pt_PT
degois.publication.location	Porto	pt_PT
degois.publication.title	17th Conference of the International Federation of Classification Societies, IFCS 2022	pt_PT
Appears in Collections:	CIDMA - Comunicações; ESTGA - Comunicações; PSG - Comunicações
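
The iterative procedure outlined in the abstract (cluster, flag atypical curves, sub-cluster the segments that contain them, then evaluate the class prototypes) can be illustrated with a minimal sketch. This is not the authors' implementation. The functional outlyingness measure of [1] is not specified in this record, so the sketch substitutes a simple stand-in: the L1 distance to the pointwise median curve, thresholded at median + c * MAD. The average-linkage split into two subclusters, all function names, and the parameters `c` and `min_size` are likewise assumptions made for illustration.

```python
from statistics import median

def l1(p, q):
    """L1 distance between two discrete distributions on a common support."""
    return sum(abs(a - b) for a, b in zip(p, q))

def flag_outliers(curves, c=3.0):
    """Stand-in outlyingness rule (an assumption, not the measure of [1]):
    flag curves whose L1 distance to the pointwise median curve exceeds
    median + c * MAD of those distances."""
    centre = [median(col) for col in zip(*curves)]
    dist = [l1(x, centre) for x in curves]
    med = median(dist)
    mad = median(abs(d - med) for d in dist)
    # Small tolerance so floating-point noise alone never triggers a flag.
    return [i for i, d in enumerate(dist) if d > med + c * mad + 1e-12]

def split_in_two(curves):
    """Average-linkage agglomerative clustering, cut at two clusters.
    Returns two lists of positions into `curves`."""
    groups = [[i] for i in range(len(curves))]

    def link(a, b):
        return sum(l1(curves[i], curves[j]) for i in a for j in b) / (len(a) * len(b))

    while len(groups) > 2:
        # Merge the pair of groups with the smallest average distance.
        _, i, j = min((link(groups[i], groups[j]), i, j)
                      for i in range(len(groups))
                      for j in range(i + 1, len(groups)))
        groups[i].extend(groups.pop(j))
    return groups

def atypical_groups(data, c=3.0, min_size=4):
    """Return (final clusters as index lists, positions of atypical clusters)."""
    pending, final = [list(range(len(data)))], []
    while pending:
        idx = pending.pop()
        curves = [data[i] for i in idx]
        # Segments that still contain atypical curves are sub-clustered.
        if len(idx) >= min_size and flag_outliers(curves, c):
            for part in split_in_two(curves):
                pending.append([idx[k] for k in part])
        else:
            final.append(idx)
    # Class prototype of each cluster: the pointwise mean of its members.
    prototypes = [[sum(col) / len(col) for col in zip(*(data[i] for i in idx))]
                  for idx in final]
    # The same criterion is applied to the prototypes.
    return final, flag_outliers(prototypes, c)
```

Calling `atypical_groups(data)` on a list of equal-length probability vectors returns the final partition as lists of row indices, together with the positions (within that partition) of the clusters whose prototype is flagged. Clusters smaller than `min_size` are never split further, which guarantees termination.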

Files in This Item:

File	Description	Size	Format
IFCS2022_AnaTavares.pdf		101.17 kB	Adobe PDF
