Coding sequences (CDSs) of genomes are generally not structured. This is in agreement with the results of the RANDFOLD analysis: most intergenic families with aligned SLSs (see Table) are enriched in highly structured SLSs, whereas this is true for only one genic family, Myp. These observations support the general hypothesis that many of these sequence families fold into a secondary structure at the RNA level, especially those located in intergenic regions, where the translation machinery is not expected to interfere with secondary structure formation.

Three novel intergenic structured families, Hin in H. influenzae, Nem in N. meningitidis and Pam in P. multocida, are composed of similar sequences, characterized by the repetition of short, abundant oligonucleotides referred to as DNA uptake signal sequences (DUS). The recurrence, at precise short distances, of this basic oligonucleotide module, which is shorter than the searched pattern, produces a conserved SLS larger than the required threshold. It is possible that these sequences function as transcriptional terminators, and it has recently been reported that terminator hairpins are indeed frequently formed by closely spaced, complementary instances of exogenous DNA uptake signal sequences.

Some novel structured families are located within CDSs. They usually consist of repetitive motifs of one or a few coding regions, such as Lac in L. johnsonii, Pae in P. aeruginosa and Efa in E. faecalis. Interestingly, the Cod family defines a very small repeat, found within multiple CDSs, that encodes different peptides in different frames. Cod repeats resemble the repetitive sequence elements found by Claverie and coworkers in protein-coding genes of R. conorii. Five genic families found in M. pneumoniae are part of large (kb-scale), possibly mobile repeated DNA sequences with coding capacity.

About one third of the identified families were found to be "unstructured". These sequences were not the object of the original search; a possible explanation for their detection is the incidental presence of SLSs within large repeated sequences. Most such families fall within CDSs (see Table, and Myt in Figure as an example). Ten of them are contributed by only two genomes: M. tuberculosis and M. pneumoniae. Other unstructured families are clustered within the same CDS (a pair of Bor families in B. bronchiseptica) or are dispersed within multiple CDSs that share a common protein domain (a pair of Bor families in B. bronchiseptica, and Pae and Ppu in P. aeruginosa and P. putida, respectively).

Conclusion

A systematic analysis of bacterial genomes is presented, aimed at identifying repeated sequence families that share a common predicted secondary structure. This procedure identified virtually all previously described families meeting these constraints, as well as a larger number of novel, undescribed nucleic acid repeats. About two thirds of the families shared a predicted conserved secondary structure, frequently a stem-loop-based one. Interestingly, these families are mainly composed of elements located within intergenic regions. This localization is consistent with the hypothesis that RNA folding is more likely to occur within these regions, where it is not affected by the translation machinery. The identification of repetitive sequence families that are able to fold into secondary structures and are preferentially located within intergenic regions reinforces the notion that, also in prokaryotic genomes, which are generally much more compact than eukaryotic ones, a fairly large fraction that does not code for proteins is likely to play a biological role by encoding functional RNAs.