Author Topic: genbank 2020/01/28  (Read 104 times)

gsgs

genbank 2020/01/28
« on: January 29, 2020, 09:59:10 pm »
GenBank release 235 (2019/12/15) had 35 files with viral sequences,
see https://ftp.ncbi.nlm.nih.gov/genbank/gbrel.txt
These gave xxxxx GenBank records, 15.1 GB in total.

31686 of these contained the keyword "orona", making 284 MB for the records and
120.2 MB for the nucleotide sequences.
One record with > 300000 nucleotides was removed.
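The keyword and length step above could be sketched roughly like this. It is only an illustration, not the script actually used: the function names are made up, it assumes the standard GenBank flat-file layout (records start with LOCUS, the sequence follows ORIGIN, records end with //), and it assumes the keyword is searched in the whole record text, which the post does not state.

```python
def split_genbank_records(lines):
    """Yield each GenBank flat-file record as a list of lines.

    Anything before the first LOCUS line (e.g. the file header) is skipped.
    """
    record = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("LOCUS"):
            record = [line]
        elif record is not None:
            if line.strip() == "//":
                yield record
                record = None
            else:
                record.append(line)


def nucleotide_sequence(record):
    """Return the sequence after the ORIGIN line, without digits or spaces."""
    parts = []
    in_origin = False
    for line in record:
        if line.startswith("ORIGIN"):
            in_origin = True
        elif in_origin:
            parts.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(parts)


def filter_records(lines, keyword="orona", max_len=300000):
    """Keep (locus, sequence) for records that contain `keyword`
    and whose sequence has at most `max_len` nucleotides."""
    kept = []
    for record in split_genbank_records(lines):
        # Assumption: plain substring search over every line of the record.
        if not any(keyword in line for line in record):
            continue
        seq = nucleotide_sequence(record)
        if len(seq) > max_len:
            continue
        fields = record[0].split()
        locus = fields[1] if len(fields) > 1 else "?"
        kept.append((locus, seq))
    return kept


# Hypothetical usage on one release file:
# with open("gbvrl1.seq") as fh:
#     hits = filter_records(fh)
```

Searching for "orona" rather than "corona" matches both "Corona" and "corona" without a case conversion. A keyword that is wrapped across two lines of a record would be missed by this sketch.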

File oro2.nuc: 31685 nucleotide sequences of lengths 40-33578; 2911 of these have length > 25000.
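Numbers like these (count, shortest, longest, how many above 25000) can be recomputed with a few lines. The sketch below assumes the .nuc file is FASTA-like, with a ">" header line before each sequence; the post does not say what the format is, and the function names are hypothetical.

```python
def read_fasta_lengths(lines):
    """Return the length of every sequence in FASTA-like text."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous sequence, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths, threshold=25000):
    """Count, min, max, and number of sequences longer than `threshold`."""
    if not lengths:
        return {"count": 0, "min": None, "max": None, "longer": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "longer": sum(1 for n in lengths if n > threshold),
    }


# Hypothetical usage:
# with open("oro2.nuc") as fh:
#     print(summarize(read_fasta_lengths(fh)))
```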

8766 of the 31685 passed a filter for having some common epitope with the new Wuhan strain
(my program epifox from 2013: epifox wuhan oro2.nuc 28 15 m5).
These are in file oro3.nuc.
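For readers without epifox, the general idea of keeping only sequences that share something with a reference can be illustrated with a plain shared-substring (k-mer) filter. This is not epifox: its algorithm and the meaning of its arguments (28 15 m5) are not described here. The value k=12 and all names below are arbitrary choices for illustration only.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_kmer(reference_kmers, seq, k):
    """True if seq has at least one length-k substring in reference_kmers."""
    return any(seq[i:i + k] in reference_kmers
               for i in range(len(seq) - k + 1))


def filter_by_shared_kmer(reference, sequences, k=12):
    """Keep the (name, seq) pairs sharing an exact k-mer with `reference`.

    Stand-in technique: exact matching only, no mismatches allowed.
    """
    ref = kmers(reference.upper(), k)
    return [(name, seq) for name, seq in sequences
            if shares_kmer(ref, seq.upper(), k)]
```

A larger k or a requirement of several shared k-mers makes such a filter stricter, which is one way a much smaller output set could arise.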

Better, stronger filtering gives 366 sequences in file oro4.nuc, lengths 130-30256; 255 of these have length > 25000.

« Last Edit: January 29, 2020, 10:47:46 pm by gsgs »
