Please use this identifier to cite or link to this item: http://hdl.handle.net/10773/17427
Full metadata record
DC Field | Value | Language
dc.contributor.author | Gouveia, Sónia | pt
dc.contributor.author | Scotto, Manuel G. | pt
dc.contributor.author | Weiss, Christian H. | pt
dc.contributor.author | Ferreira, Paulo Jorge S. G. | pt
dc.date.accessioned | 2017-05-15T13:00:24Z | -
dc.date.issued | 2017-02 | -
dc.identifier.issn | 1467-9876 | pt
dc.identifier.uri | http://hdl.handle.net/10773/17427 | -
dc.description.abstract | Symbolic or categorical sequences occur in many contexts and can be characterized, for example, by integer-valued intersymbol distances or binary-valued indicator sequences. The analysis of these numerical sequences often sheds light on the properties of the original symbolic sequences. This work introduces new statistical tools for exploring auto-correlation structure in the indicator sequences, for the specific case of deoxyribonucleic acid (DNA) sequences. It is known that the probability distribution of internucleotide distances of DNA sequences deviates significantly from the distribution obtained by assuming independent random placement (i.e. the geometric distribution) and that the deviations can be used either to discriminate between species or to build phylogenetic trees. To investigate the extent to which auto-correlation structure explains these deviations, the 0–1 indicator sequence of each nucleotide (A, C, G and T) is endowed with a binary auto-regressive (AR) model of optimum order. The corresponding binary AR geometric distribution is derived analytically and compared with the observed internucleotide distance distribution by appropriate goodness-of-fit testing. Results in 34 mitochondrial DNA sequences show that the hypothesis of equal observed/expected frequencies is rejected less often when a binary AR model is considered instead of independence (76/136 versus 125/136 rejections at the 1% level), in spite of chi-square testing tending to reject for large samples, regardless of how close observed/expected values are. Furthermore, binary AR structure also leads to a median discrepancy reduction of 90% for G, 80% for C, 60% for T and 30% for nucleotide A. Therefore, these models are useful to describe the dependences within a given nucleotide and encourage the development of a model-based framework to compact internucleotide distance information and to understand DNA differences among species further. | pt
dc.language.iso | eng | pt
dc.publisher | Wiley | pt
dc.relation | FCT-UID/CEC/00127/2013 | pt
dc.relation | FCT-UID/MAT/04106/2013 | pt
dc.relation | FCT - SFRH/BPD/87037/2012 | pt
dc.rights | restrictedAccess | por
dc.subject | Binary auto-regressive models | pt
dc.subject | χ2-testing | pt
dc.subject | DNA sequence analysis | pt
dc.subject | Geometric distribution | pt
dc.subject | Internucleotide distances | pt
dc.title | Binary auto-regressive geometric modelling in a DNA context | pt
dc.type | article | pt
dc.peerreviewed | yes | pt
ua.distribution | international | pt
degois.publication.firstPage | 253 | pt
degois.publication.issue | 2 | pt
degois.publication.lastPage | 271 | pt
degois.publication.title | Journal of the Royal Statistical Society: Series C | pt
degois.publication.volume | 66 | pt
dc.date.embargo | 10000-01-01 | -
dc.identifier.doi | 10.1111/rssc.12172 | pt
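
The independence baseline described in the abstract (internucleotide distances compared against a geometric distribution) can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not include the binary AR model. The function names, the toy sequence and the binning with a pooled tail are all assumptions made for illustration only.

```python
from collections import Counter


def internucleotide_distances(seq, nucleotide):
    """Distances between consecutive occurrences of `nucleotide` in `seq`."""
    positions = [i for i, s in enumerate(seq) if s == nucleotide]
    return [b - a for a, b in zip(positions, positions[1:])]


def geometric_pmf(k, p):
    """P(distance = k) under independent placement with probability p."""
    return p * (1.0 - p) ** (k - 1)


def chi_square_statistic(distances, p, max_k):
    """Pearson statistic over bins 1..max_k plus one pooled tail bin (> max_k)."""
    n = len(distances)
    counts = Counter(distances)
    stat = 0.0
    for k in range(1, max_k + 1):
        expected = n * geometric_pmf(k, p)
        stat += (counts.get(k, 0) - expected) ** 2 / expected
    # Under the geometric law, P(distance > max_k) = (1 - p) ** max_k.
    tail_observed = sum(c for k, c in counts.items() if k > max_k)
    tail_expected = n * (1.0 - p) ** max_k
    stat += (tail_observed - tail_expected) ** 2 / tail_expected
    return stat


# Toy sequence (hypothetical, far too short for real inference).
seq = "ACGTAACGATTAGACA"
dists = internucleotide_distances(seq, "A")
p_hat = seq.count("A") / len(seq)  # relative frequency of A
stat = chi_square_statistic(dists, p_hat, max_k=3)
print(dists, p_hat, stat)
```

A large value of the statistic relative to the chi-square reference distribution would indicate that the observed distances depart from the geometric distribution expected under independent placement.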
Appears in Collections: CIDMA - Artigos
IEETA - Artigos
PSG - Artigos

Files in This Item:
File | Description | Size | Format
2017 Gouveiaetal.pdf | | 778.66 kB | Adobe PDF (restrictedAccess)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.