Please use this identifier to cite or link to this item: http://hdl.handle.net/10773/35321
Full metadata record
DC Field | Value | Language
dc.contributor.author | Tavares, Ana Helena | pt_PT
dc.contributor.author | Vera Afreixo | pt_PT
dc.contributor.author | Brito, Paula | pt_PT
dc.date.accessioned | 2022-11-28T11:15:20Z | -
dc.date.available | 2022-11-28T11:15:20Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://hdl.handle.net/10773/35321 | -
dc.description.abstract | In this work, we introduce the concept of an atypical group of observations and propose a procedure for identifying such groups. By atypical group, we mean a cluster of observations whose ‘mean’ pattern stands out from the majority of the ‘mean’ patterns of the remaining clusters. Atypical group detection poses two challenges: first, identifying a meaningful segmentation of the data; second, flagging the atypical segments. Our work focuses on data whose elements are discrete distributions. If heterogeneous datasets, where distinct patterns coexist, can validly be clustered, then the class prototypes provide a simplified description of the data. Thus, the key idea of our proposal is to combine a clustering method with a functional outlyingness criterion to capture atypical class prototypes. To identify a segmentation of the distributional data, we iteratively combine two steps. The first creates a hierarchy of clusters, while the second flags atypical curves within each cluster, based on a measure of functional outlyingness that accounts for the shape of the distributions [1]. Segments with atypical curves are forwarded for (sub)clustering, and the procedure is repeated until no outlying curves are identified in any cluster. Once the final partition is obtained, each cluster is represented by a class prototype, whose outlyingness is evaluated according to the same functional approach. Clusters with an atypical class prototype are flagged as atypical. We apply our procedure to investigate clusters of genomic words in human DNA by studying their inter-word lag distributions. These experiments demonstrate the potential of the new method for identifying clusters of words with outlying patterns. | pt_PT
dc.language.iso | eng | pt_PT
dc.publisher | CLAD, FEUP | pt_PT
dc.relation | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F04106%2F2020/PT | pt_PT
dc.relation | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F04106%2F2020/PT | pt_PT
dc.rights | restrictedAccess | pt_PT
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | pt_PT
dc.subject | Outlyingness | pt_PT
dc.subject | Clustering | pt_PT
dc.subject | Distributional data | pt_PT
dc.subject | Functional data | pt_PT
dc.title | Outlier detection: a procedure to capture atypical groups of observations | pt_PT
dc.type | conferenceObject | pt_PT
dc.description.version | published | pt_PT
dc.peerreviewed | yes | pt_PT
ua.event.date | 19-23 July 2022 | pt_PT
degois.publication.location | Porto | pt_PT
degois.publication.title | 17th Conference of the International Federation of Classification Societies, IFCS 2022 | pt_PT
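
The procedure summarised in the abstract (cluster, flag outlying curves within each cluster, re-cluster the segments that contain them, then test the class prototypes) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every function name here is hypothetical. A two-seed split stands in for the hierarchical clustering step, and an L1 distance to the pointwise median with a median-plus-MAD cutoff stands in for the shape-based functional outlyingness measure of [1]. The toy distributions are invented purely for illustration.

```python
from itertools import combinations
from statistics import median


def l1(p, q):
    """L1 distance between two discrete distributions on the same support."""
    return sum(abs(a - b) for a, b in zip(p, q))


def flag_outlying(curves, cutoff=3.0):
    """Positions of curves unusually far from the pointwise median curve.

    Assumed stand-in rule: distance > median distance + cutoff * MAD.
    """
    center = [median(col) for col in zip(*curves)]
    dists = [l1(c, center) for c in curves]
    med = median(dists)
    mad = median(abs(d - med) for d in dists)
    return [i for i, d in enumerate(dists) if d - med > cutoff * mad + 1e-12]


def split_in_two(curves, idx):
    """Assumed stand-in for cutting a hierarchy: seed with the two most
    distant curves and assign every curve to the nearer seed."""
    a, b = max(combinations(idx, 2),
               key=lambda ij: l1(curves[ij[0]], curves[ij[1]]))
    left = [i for i in idx if l1(curves[i], curves[a]) <= l1(curves[i], curves[b])]
    right = [i for i in idx if i not in left]
    return left, right


def segment(curves, idx=None, min_size=3):
    """Split recursively until no examined cluster contains an outlying curve.

    Clusters smaller than min_size are not examined further.
    """
    if idx is None:
        idx = list(range(len(curves)))
    if len(idx) < min_size or not flag_outlying([curves[i] for i in idx]):
        return [idx]
    left, right = split_in_two(curves, idx)
    return segment(curves, left, min_size) + segment(curves, right, min_size)


def prototype(curves, idx):
    """Class prototype: pointwise mean of the cluster's distributions."""
    return [sum(col) / len(idx) for col in zip(*(curves[i] for i in idx))]


def atypical_clusters(curves):
    """Return the final partition and the clusters whose prototype is outlying."""
    partition = segment(curves)
    protos = [prototype(curves, idx) for idx in partition]
    return partition, [partition[k] for k in flag_outlying(protos)]


# Toy data (invented): three similar patterns and one that puts all of its
# mass on the last support point.
curves = (
    [[0.5, 0.25, 0.25]] * 5
    + [[0.25, 0.5, 0.25]] * 2
    + [[0.4375, 0.3125, 0.25]] * 2
    + [[0.0, 0.0, 1.0]] * 2
)
partition, atypical = atypical_clusters(curves)
print(partition)
print(atypical)
```

On this toy input the sketch ends with four clusters and flags only the one holding the last two curves (indices 9 and 10), whose prototype lies far from the other three. Applying the same outlyingness rule to curves and to prototypes mirrors the abstract's statement that prototypes are evaluated "according to the same functional approach"; the specific rule and cutoff are assumptions.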
Appears in Collections: CIDMA - Comunicações
ESTGA - Comunicações
PSG - Comunicações

Files in This Item:
File | Description | Size | Format
IFCS2022_AnaTavares.pdf | | 101.17 kB | Adobe PDF (restrictedAccess)



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.