Author Topic: genbank 2020-04-11  (Read 242 times)

gsgs

  • Administrator
  • Full Member
  • *****
  • Posts: 196
    • View Profile
genbank 2020-04-11
« on: April 11, 2020, 09:50:21 am »
COVID-sequences links :
Nov20,40432+107300+631

https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?VirusLineage_ss=Severe%20acute%20respiratory%20syndrome%20coronavirus%202,%20taxid:2697049&SeqType_s=Nucleotide
https://www.cogconsortium.uk/data/
https://civnb.info/sequences/
http://www.insdc.org/
https://www.ebi.ac.uk/




-------------------------------------------------------------------------------
other links :
https://www.viprbrc.org/brc/home.spg?decorator=vipr

flu : https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi?go=database

releases : https://ftp.ncbi.nih.gov/genbank/gbrel.txt



----------------------------------------------------------------
582 COVID-19 sequences (507 of these full genome) downloaded on Apr11 from :
(615 on Apr12)
(876 on Apr14 , 795 full) + 102 full German ones from the Drosten tweet, see below
(1207 on Apr21)

https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?VirusLineage_ss=Severe%20acute%20respiratory%20syndrome%20coronavirus%202,%20taxid:2697049&SeqType_s=Nucleotide

386 from USA (earliest from Jan19 ; IL-Jan21 , CA-Jan22 , WA-Jan25)
94 from China (66 full , earliest from Dec.23)
25 from Spain (earliest Feb26)
17 from Iran (2 full , earliest Mar09)
6 from Italy (1 full , earliest Jan30)
...

dates = symptom-onset ?


mutation-picture : http://magictour.free.fr/sars2-4.GIF
rows are viruses , columns are RNA-nucleotide-positions , black pixels are mutations
rows and columns are sorted so as to give the largest connected black components
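
A minimal sketch of how such a picture could be built, assuming each genome has already been reduced to a set of mutation strings. The sorting used here (columns by frequency, rows by their bit pattern) is a simple stand-in and not necessarily the method behind the GIF; all names are hypothetical.

```python
def mutation_matrix(genomes):
    """genomes: dict name -> set of mutation strings such as 'C241T'."""
    # Columns: every mutation seen, most frequent first (ties by name).
    counts = {}
    for muts in genomes.values():
        for m in muts:
            counts[m] = counts.get(m, 0) + 1
    columns = sorted(counts, key=lambda m: (-counts[m], m))
    # Rows: one 0/1 vector per genome; 1 = mutated = black pixel.
    rows = {name: [1 if m in muts else 0 for m in columns]
            for name, muts in genomes.items()}
    # Sorting rows by their bit pattern puts genomes that share the
    # frequent mutations next to each other, which groups black pixels.
    order = sorted(rows, key=lambda name: rows[name], reverse=True)
    return columns, [(name, rows[name]) for name in order]


def render(ordered_rows):
    """Text rendering: '#' for a mutation, '.' for the reference base."""
    return ["".join("#" if bit else "." for bit in bits) + " " + name
            for name, bits in ordered_rows]


# Toy input, not real data.
demo = {
    "g1": {"C241T", "C3037T", "A23403G"},
    "g2": {"C8782T", "T28144C"},
    "g3": {"C241T", "C3037T", "A23403G", "C14408T"},
    "g4": {"C8782T", "T28144C", "C18060T"},
}
cols, ordered = mutation_matrix(demo)
picture = render(ordered)
```

With real data a proper clustering of rows and columns would likely give larger connected blocks than this lexicographic sort.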

community-spread in Washington State since at least Jan19,
one day before Zhong Nanshan raised the alarm on Chinese TV
and 4 days before Wuhan lockdown !

Jan19 was sample-collection, sequence-release date was Mar26
another sequence from WA , Jan25, release date Feb05
also already with that C27969T (my current enumeration) - mutation typical for WA

MN985325 , https://wwwnc.cdc.gov/eid/article/26/6/20-0516_article
China-traveler , symptom-onset = Jan16 , sampled Jan19 , USA-WA , genbank Mar27
https://www.ncbi.nlm.nih.gov/nuccore/MN985325
had already the C27969T mutation

---------------------------------------------

« Last Edit: February 06, 2021, 04:54:34 am by gsgs »



gsgs

Re: genbank 04/11
« Reply #1 on: April 12, 2020, 12:30:37 am »
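earliest outside China :
(columns : accession , accession , release date , length , country and isolation source , collection date)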

MN970003 MN970003 2020-01-23   290 Thailand lung, oronasopharynx 2020-01-08
MT072688 MT072688 2020-02-18 29811 Nepal oronasopharynx 2020-01-13
MN970004 MN970004 2020-01-23   290 Thailand lung, oronasopharynx 2020-01-13
MN985325 MN985325 2020-01-24 29882 USA oronasopharynx 2020-01-19
MT233526 MT233526 2020-03-26 29847 USA: WA oronasopharynx 2020-01-19
MT246667 MT246667 2020-03-26 29867 USA: WA oronasopharynx 2020-01-19
MN988713 MN988713 2020-01-25 29882 USA: Illinois lung, oronasopharynx 2020-01-21
MN997409 MN997409 2020-01-28 29882 USA: AZ feces 2020-01-22
MN994468 MN994468 2020-01-28 29883 USA: CA oronasopharynx 2020-01-22
MN994467 MN994467 2020-01-28 29882 USA: CA oronasopharynx 2020-01-23
MT007544 MT007544 2020-01-31 29893 Australia: Victoria  2020-01-25
MT192772 MT192772 2020-03-16 29891 Viet Nam: Ho Chi Minh city  2020-01-22
MT192773 MT192773 2020-03-16 29890 Viet Nam: Ho Chi Minh city  2020-01-22
LC523809 LC523809 2020-02-13   357 Philippines  2020-01-23
MT066159 MT066159 2020-02-14   290 Malaysia oronasopharynx 2020-01-24
MT066157 MT066157 2020-02-14   290 Malaysia oronasopharynx 2020-01-24
MT066158 MT066158 2020-02-14   290 Malaysia oronasopharynx 2020-01-24
MT192759 MT192759 2020-03-16 29862 Taiwan lung, oronasopharynx 2020-01-25
MT020881 MT020881 2020-02-05 29882 USA: WA oronasopharynx 2020-01-25
MT020880 MT020880 2020-02-05 29882 USA: WA oronasopharynx 2020-01-25
LC523808 LC523808 2020-02-13   357 Philippines  2020-01-26
LC522350 LC522350 2020-02-08   182 Philippines  2020-01-26
MT044258 MT044258 2020-02-12 29858 USA: CA oronasopharynx 2020-01-27
MT012098 MT012098 2020-03-06 29854 India: Kerala State oronasopharynx 2020-01-27
MT044257 MT044257 2020-02-12 29882 USA: IL lung, oronasopharynx 2020-01-28
MT039888 MT039888 2020-02-11 29882 USA: MA oronasopharynx 2020-01-29
MT027062 MT027062 2020-02-07 29882 USA: CA oronasopharynx 2020-01-29
MT027063 MT027063 2020-02-07 29882 USA: CA oronasopharynx 2020-01-29
MT027064 MT027064 2020-02-07 29882 USA: CA oronasopharynx 2020-01-29
MT020781 MT020781 2020-02-05 29806 Finland  2020-01-29
MT066156 MT066156 2020-03-09 29867 Italy lung, oronasopharynx 2020-01-30
MT050493 MT050493 2020-03-06 29851 India: Kerala State oronasopharynx 2020-01-31
MT066175 MT066175 2020-02-14 29870 Taiwan  2020-01-31
MT039887 MT039887 2020-02-11 29879 USA: WI oronasopharynx 2020-01-31
MT008023 MT008023 2020-01-31   322 Italy: Rome oronasopharynx 2020-01
MT008022 MT008022 2020-01-31   322 Italy: Rome oronasopharynx 2020-01
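
A small sketch for working with the listing above: it splits a row into fields and computes the delay between sample collection and release. The column layout is inferred from the rows themselves, and the function names are hypothetical.

```python
from datetime import date


def parse_row(line):
    """Split one row: accession, accession, release date, length,
    place and source (free text), collection date."""
    tokens = line.split()
    return {
        "accession": tokens[0],
        "released": tokens[2],
        "length": int(tokens[3]),
        "place_and_source": " ".join(tokens[4:-1]),
        "collected": tokens[-1],
    }


def release_lag_days(record):
    """Days from collection to release, or None for month-only dates."""
    try:
        collected = date.fromisoformat(record["collected"])
    except ValueError:
        return None  # e.g. '2020-01' without a day
    return (date.fromisoformat(record["released"]) - collected).days


rows = [
    "MN970003 MN970003 2020-01-23   290 Thailand lung, oronasopharynx 2020-01-08",
    "MT233526 MT233526 2020-03-26 29847 USA: WA oronasopharynx 2020-01-19",
    "MT007544 MT007544 2020-01-31 29893 Australia: Victoria  2020-01-25",
    "MT008023 MT008023 2020-01-31   322 Italy: Rome oronasopharynx 2020-01",
]
records = [parse_row(r) for r in rows]
# String sort works for ISO dates; month-only dates sort before the days of that month.
by_collection = sorted(records, key=lambda r: r["collected"])
lags = {r["accession"]: release_lag_days(r) for r in records}
```

For MT233526 this gives the Jan19 to Mar26 gap mentioned in the first post.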

----------------------------------



                                                                       
« Last Edit: April 12, 2020, 02:11:09 am by gsgs »

gsgs

Re: genbank 04/11
« Reply #2 on: April 13, 2020, 02:14:56 am »
the enumeration above is wrong : it doesn't consider the first digit (the program expected only flu-sequences)
it is corrected now in the mutations below




most genomes are from 2 Washington-strains

WA2,101 genomes , 4+2+1 mutations
C00150T{Spain},C02946T{Spain},C14317T{Spain},A23312G{Spain}  , {RI-Feb28}
  G25472T{CA-Feb29}  , C00968T{CA-Feb29}  ,
 (G29462A{WA-Mar13})

WA1,169 genomes , 2+1+2 mutations
C08691T{China},T28053C{China},  {Wuhan,Dec26}
C17969T{WA-Jan19}, 
C17656T{Mar-WA},A17767G{Mar-WA}   {Iran-Feb26}

we had 2 clearly distinguished strains already in Wuhan, Dec.26 .
One is the predecessor of WA1 and one is the predecessor of WA2

positions are plus or minus some constant ; the first and last parts of the sequences (~100 nucleotides)
are skipped because they often contain mutations that are presumably sequencing errors

18 genomes that didn't easily align because of insertions or deletions were excluded
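
A minimal sketch of the comparison described above, assuming genomes of the same length as the reference can be compared position by position. The function name and the handling of unequal lengths are assumptions for illustration, not the program actually used here.

```python
def call_mutations(reference, genome, skip=100):
    """Return substitutions such as 'C241T' (1-based positions).

    The first and last `skip` positions are ignored. Genomes whose length
    differs from the reference return None, since they would need a real
    alignment (insertions or deletions).
    """
    if len(genome) != len(reference):
        return None
    calls = []
    for i in range(skip, len(reference) - skip):
        ref_base, base = reference[i], genome[i]
        if base != ref_base:
            calls.append(f"{ref_base}{i + 1}{base}")
    return calls


# Toy sequences; skip=2 so the short example still has a middle part.
ref = "ACGTACGTAC"
gen = "TCGTATGTAA"  # differs at positions 1, 6 and 10
middle_only = call_mutations(ref, gen, skip=2)
not_aligned = call_mutations(ref, gen[:-1], skip=2)
```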

there was 1 genome from France, it had a 3bp deletion , none from Germany

----------------------------------------------------
102 German sequences :  https://civnb.info/public/charite-SARS-CoV-2.fasta.gz
@c_drosten  2020/04/13/08:06UTC
https://civnb.info/sequences/

An overview of #SARS-CoV-2 genome sequences from Germany, including early releases by
@charitevirology. Some interesting insights into local clustering and wider dispersal, for
people familiar with German geography.  https://civnb.info/sequences/

102>7+4+87 , >3 different strains
most have C241T,C3037T,A23403G  and most of these also have C14408T

this is WA2 above (add 91 to my positions , e.g. C00150T = C241T)
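
The shift can be sketched as follows, assuming a fixed offset of 91 applies to every position as stated; the helper name is hypothetical.

```python
import re

OFFSET = 91  # "add 91", as stated in the posts


def to_reference(mutation, offset=OFFSET):
    """'C00150T' -> 'C241T': shift the position and drop the zero padding."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not a substitution: {mutation!r}")
    ref_base, position, new_base = match.groups()
    return f"{ref_base}{int(position) + offset}{new_base}"


wa2 = ["C00150T", "C02946T", "C14317T", "A23312G"]
converted = [to_reference(m) for m in wa2]
```

The four converted WA2 positions match the mutations listed for the German sequences.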

the one from Jan28 has C241T,C3037T, and A23403G

I assume that was the Webasto-introduction. The whole of WA2 and Spain and maybe Italy, France
seem to have descended from it ! (after C14317T had developed in Germany)

WA2,101 genomes , 4+2+1 mutations
C00150T{Spain},C02946T{Spain},C14317T{Spain},A23312G{Spain}  , {RI-Feb28}
  G25472T{CA-Feb29}  , C00968T{CA-Feb29}  ,
 (G29462A{WA-Mar13})
« Last Edit: October 27, 2020, 05:59:06 am by gsgs »

gsgs

  • Administrator
  • Full Member
  • *****
  • Posts: 196
    • View Profile
Re: genbank 04/11
« Reply #3 on: April 14, 2020, 12:57:54 am »
Wittkowski paper :
https://www.medrxiv.org/content/10.1101/2020.03.28.20036715v2.full.pdf

 > The epidemiological data does not support the hypothesis that SARS-CoV-2 spread from Munich
 > in Germany to Italy.(Kupferschmidt 2020)
 > Instead, the virus may have spread from Italy to its neighboring
 > countries, Switzerland, France, Spain, Austria, and Slovenia, within just a few days
 > of arriving from Iran.

there is one partial sequence from Qum, Feb09, at genbank and it does not have the T28688C mutation
that the sequences from Tehran have

Iran sequences seem to have a 6bp  insertion at position 14606

Iran1,C08782T,C17753T,A17864G,C18066T,T28150C
Iran2,G01397A,G11083T, TCCTTA-insertion at 14606 , C8383T,G9380A,G9748T     {Mar09}

WA1,169 genomes , 2+1+2 mutations  [add 91 to all positions]
C08691T{China},T28053C{China},  {Wuhan,Dec26}
C17969T{WA-Jan19},
C17656T{Mar-WA},A17767G{Mar-WA}   {Iran-Feb26}
« Last Edit: April 14, 2020, 02:19:07 am by gsgs »

gsgs

  • Administrator
  • Full Member
  • *****
  • Posts: 196
    • View Profile
Re: genbank 04/11
« Reply #4 on: April 14, 2020, 10:26:33 pm »
[add 91 ?!]

Wuhan1, Dec.26
-

Wuhan2
C08691T{China},T28053C{China},  {Wuhan,Dec26}

WA1,169 genomes , 2+1+2 mutations
C08691T{China},T28053C{China},  {Wuhan,Dec26}
C17969T{WA-Jan19}, 
C17656T{Mar-WA},A17767G{Mar-WA}   {Iran-Feb26}

WA2,101 genomes , 4+2+1 mutations
C00150T{Spain},C02946T{Spain},C14317T{Spain},A23312G{Spain}  , {RI-Feb28}
  G25472T{CA-Feb29}  , C00968T{CA-Feb29}  ,
 (G29462A{WA-Mar13})

Webasto,
C241T,C3037T,A23403G   { C14408T }
« Last Edit: April 17, 2020, 11:25:06 am by gsgs »

gsgs

  • Administrator
  • Full Member
  • *****
  • Posts: 196
    • View Profile
Re: genbank 04/11
« Reply #5 on: April 23, 2020, 10:15:01 pm »
2020/04/23   , 1432 sequences in total

many (>100) sequences from NY  today
Hong Kong  , collected in Jan
« Last Edit: April 23, 2020, 11:04:43 pm by gsgs »

gsgs

  • Administrator
  • Full Member
  • *****
  • Posts: 196
    • View Profile
Re: genbank 04/11
« Reply #6 on: April 27, 2020, 10:26:17 pm »
2020/04/28 , 10567 genomes from UK : https://www.cogconsortium.uk/data/


2020/05/22 ,  the total number of sequences is 16380
2020/09/03 , 48561 sequences
« Last Edit: October 22, 2020, 10:16:10 am by gsgs »

gsgs

  • Administrator
  • Full Member
  • *****
  • Posts: 196
    • View Profile
Re: genbank 04/11
« Reply #7 on: May 14, 2020, 10:28:23 pm »
2020/05/15 , 3812 genomes , 3415 of these are easily aligned  (no insertions or deletions) , full size

2049 (=60%) of these have C241T
2088 have C3037T
2114 have C14408T
2113 have A23403G
------------------------------

1434 have G25563T
1234 have C1059T
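
Counts like these can be produced along the following lines, assuming one set of mutations per genome. This is only a sketch with hypothetical names and toy input; the last line rechecks the 60% figure from the numbers given above.

```python
from collections import Counter


def mutation_frequencies(genomes):
    """genomes: iterable of mutation collections, one per genome.
    Returns a Counter mapping mutation -> number of genomes carrying it."""
    counts = Counter()
    for muts in genomes:
        counts.update(set(muts))  # count each mutation once per genome
    return counts


def percent(part, total):
    return 100.0 * part / total


# Toy input, not real data.
toy = [
    {"C241T", "C3037T"},
    {"C241T", "C3037T", "C14408T"},
    {"C8782T"},
    {"C241T"},
]
freq = mutation_frequencies(toy)
share_c241t = percent(2049, 3415)  # numbers from the post
```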

2126 from USA
81 from France
69 from China
809 from Australia:Victoria
91 from Greece:Athens
38 from India:Ahmedabad
23 from Thailand
21 from Taiwan
20 from HongKong
18 from India
12 from PuertoRico

(German and UK sequences see above )


why so many C-->T
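
To put a number on that impression, the substitution types can be tallied. A minimal sketch using only the six mutations named in this post (with the spike mutation written as A23403G, as elsewhere in the thread); the helper name is hypothetical.

```python
from collections import Counter


def substitution_type(mutation):
    """'C241T' -> 'C>T' (first and last character of the mutation string)."""
    return f"{mutation[0]}>{mutation[-1]}"


listed = ["C241T", "C3037T", "C14408T", "A23403G", "G25563T", "C1059T"]
types = Counter(substitution_type(m) for m in listed)
```

Four of these six are C>T; the same tally over a full mutation list would show whether the excess holds in general.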
« Last Edit: May 14, 2020, 11:33:22 pm by gsgs »

gsgs

Re: genbank 04/11
« Reply #8 on: September 28, 2020, 09:52:42 pm »
2020/09/28

24923 COVID-19 sequences at genbank (was 16438 on 2020/08/18), file sars2-19
24185 complete ones (>29400 nucleotides)
22,208,199,355,6040,5033,2561,2869,4494,639,219 with collection months 2019/12,...,2020/09
USA:14091,AUS:5562,IND:570,EGY:233,BGD:231,THA:227,CHN:195,IRN:101,GRC:98,GER:92,
PER:89,FRA:89,JPN:84,IRQ:76,MEX:66,ITA:60,SAU:58,TUR:57,SPA:48,GHA:46
USA:WA=3455,CA=1953,FL=1520,WI=1340,VA=903,MA=876,MI=493,NY=318
human:22574,mink:13,mink:12,5:cat,2:dog,tiger:1
onp:6269,saliva:54,swab:310,saliva/onp:54,lung/oronasopharynx:49,lung:28,blood:3,feces:4,
1:onp/onp,1:placenta,1:urine
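
The "complete" count above uses a length threshold of 29400 nucleotides. A sketch of such a filter over FASTA text, with a hand-written parser so that no library is assumed; function names are hypothetical.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def complete_records(text, min_length=29400):
    """Headers of the records longer than min_length nucleotides."""
    return [h for h, seq in read_fasta(text) if len(seq) > min_length]


# Toy input with a small threshold so the example stays short.
toy = ">a\nACGT\nACGT\n>b\nAC\n"
kept = complete_records(toy, min_length=5)
```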

gsgs

Re: genbank 04/11
« Reply #9 on: October 28, 2020, 10:19:00 pm »
2020/10/29 , 36846 sequences



----------------------------------------
    Charite has 544 German sequences now,
    36 from October , all with the D614G mutation
    https://civnb.info/sequences/

    the most frequent mutations from October are :

    36,C3037T,C241T,C14408T,A23403G
    14,G28883C,14,G28882A,14,G28881A

    10,G21255C
    9,T445C
    9,G29645T
    9,G25563T
    9,C6286T
    9,C28932T
    9,C22227T
    8,G27870T
    8,C6040T
    8,C27944T
    8,C26801T
    8,C21614T
    7,C28854T

    27 (10 from October) are the new Spanish strain "20A.EU1" from this paper :
    https://www.medrxiv.org/content/10.1...063v1.full.pdf
    mutation picture : http://magictour.free.fr/sars2de5.GIF
   

gsgs

Re: genbank 2020-04-11
« Reply #10 on: December 28, 2020, 12:13:05 am »
2020-12-09 , "sars2-21" , 44708 sequences

countries :

Code:
 518862 ,13303 ,  25 ,Australia
  71659 ,23785 , 331 ,USA
  68834 ,  119 ,   1 ,Bahrain
  22619 ,   10 ,   0 ,Malta
  21801 ,   29 ,   1 ,Timor-Leste
  13072 ,  114 ,   8 ,Serbia
  12756 ,   96 ,   7 ,Hong Kong
   9975 ,    4 ,   0 ,Belize
   9424 ,   98 ,  10 ,Greece
   3780 ,  143 ,  37 ,Poland
   3248 ,  227 ,  69 ,Thailand
   2899 ,  245 ,  84 ,Iran
   2832 ,   94 ,  33 ,Peru
   2696 ,    8 ,   2 ,Jamaica
   2694 ,   32 ,  11 ,Tunisia
   2633 ,   27 ,  10 ,Jordan
   2324 ,  240 , 103 ,Egypt
   2222 ,    5 ,   2 ,Gabon
   2145 ,   23 ,  10 ,Czechia
   2068 ,   12 ,   5 ,Denmark
   2007 ,    8 ,   3 ,Georgia
   1919 ,   78 ,  40 ,Iraq
   1653 ,   58 ,  35 ,Saudi Arabia
   1649 ,  273 , 165 ,Bangladesh
   1506 ,   91 ,  60 ,Italy
   1466 ,   46 ,  31 ,Ghana
   1366 ,   11 ,   8 ,Sierra Leone
   1362 ,   89 ,  65 ,France
   1342 ,   32 ,  23 ,Taiwan
   1096 ,   92 ,  83 ,Germany
   1090 ,   51 ,  46 ,Spain
   1087 ,   10 ,   9 ,Israel
    932 ,   16 ,  17 ,Netherlands
    918 ,  116 , 126 ,Japan
    755 ,   64 ,  84 ,Turkey
    704 ,   20 ,  28 ,Venezuela
    587 ,    4 ,   6 ,Lebanon
    573 ,   11 ,  19 ,Chile
    571 ,   74 , 129 ,Mexico
    553 ,   10 ,  18 ,Guatemala
    504 ,  700 ,1386 ,India
    474 ,   18 ,  37 ,Canada
    287 ,    1 ,   3 ,Uruguay
    276 ,    9 ,  32 ,Malaysia
    269 ,   10 ,  37 ,Morocco
    258 ,    3 ,  11 ,Belgium
    253 ,   28 , 110 ,Philippines
    239 ,   35 , 145 ,Russia
    225 ,    4 ,  17 ,Ecuador
    211 ,    4 ,  18 ,Kazakhstan
    199 ,    1 ,   5 ,New Zealand
    197 ,    2 ,  10 ,Sweden
    194 ,   10 ,  51 ,S. Korea
    186 ,    4 ,  21 ,Sri Lanka
    180 ,    1 ,   5 ,Finland
    176 ,    2 ,  11 ,Cuba
    143 ,  207 ,1439 ,China
     93 ,   20 , 213 ,Brazil
     73 ,    5 ,  68 ,UK
     71 ,    7 ,  97 ,Vietnam
     59 ,    2 ,  33 ,Uzbekistan
     59 ,    1 ,  16 ,Cambodia
     53 ,   12 , 222 ,Pakistan
     53 ,    1 ,  18 ,Zambia
     52 ,    1 ,  19 ,Romania
     39 ,    2 ,  51 ,Colombia
     36 ,    2 ,  54 ,Kenya
     34 ,    1 ,  29 ,Nepal
     19 ,    4 , 208 ,Nigeria
     16 ,    1 ,  59 ,South Africa

-------------------------------------
columns of the table above :
column 1 : SARS2-sequences at genbank on 2020/12/09 per billion population
column 2 : SARS2-sequences at genbank on 2020/12/09
column 3 : population in million
column 4 : country


 1.701412E+38 ,    3 ,   0 ,Guam
 1.701412E+38 ,    2 ,   0 ,West Bank
 1.701412E+38 ,   14 ,   0 ,Puerto Rico
(the population column is 0 for these three , hence the overflow value in the first column)
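
The first column of the country table can be recomputed as below. This is a sketch with a hypothetical function name; it returns None instead of an overflow value when the population entry is 0. Results differ slightly from the table (for the USA it lists 71659), presumably because unrounded population figures were used there, which is an assumption.

```python
def per_billion(sequences, population_millions):
    """Sequences per billion inhabitants, or None if the population is 0 or missing."""
    if not population_millions or population_millions <= 0:
        return None
    return sequences / population_millions * 1000.0


usa = per_billion(23785, 331)  # values from the table
guam = per_billion(3, 0)       # population entry is 0 in the table
```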



mutations :

Code:
  36235,A23403G
  36105,C3037T
  35945,C14408T
  34533,C241T
  15954,G28881A
  15931,G28883C
  15929,G28882A
  14983,G25563T
  12027,C1059T
  11227,A1163T
  10960,C18555T
  10845,G23401A
  10834,G16647T
  10810,T7540C
   9845,G22992A
   3493,C27964T
   2913,T28144C
   2867,C8782T
   2030,A20268G
   1961,C18060T
   1939,C28854T
   1929,A17858G
   1888,C17747T
   1696,C18877T
   1670,C10319T
   1415,G11083T
   1382,C22480T
   1309,C14805T
   1260,T19839C
   1037,A18424G
   1027,G25907T
   1014,C28472T
    983,C21304T
    982,C2416T
    957,C28869T
    933,C29870A
    873,C29784T
    850,T833C
    833,G15594T
    829,C11916T
    747,C15933T
    743,C27059T
    717,A17337G
    667,G16968T
    663,C18568T
    649,G3114T
    635,C26735T
    626,C16260T
    615,C28821A
    602,C15324T
    584,G8083A
    561,C17850T
    533,G29553A
    524,C920T
    524,C5144T
    522,C66T
    515,A29700G
    495,C24034T
    491,C3177T
    488,G28077C
    476,T18736C
    474,T490A
    474,C313T
    467,T26729C
    451,T27785C
    449,G3871T
    444,T3931C
    440,C28887T
    433,A696C
    424,G26144T
    423,C21575T
    423,A22320G
    418,T24076C
    417,C28977T
    406,C36T
    405,C23707T
    396,C18998T
    392,G29540A
    376,C28087T
    368,C379A
    368,C28311T
    364,G21255C
    362,A26162G
    338,C22444T
    331,G29402T
    315,G1738T
    311,C22227T
    308,C28657T
    306,C5672T
    300,C6541T
    296,C4226T
    280,C6286T
    280,A24694T
    279,A7837C
    275,G29645T
    270,C6040T
    270,C28932T
    270,A35T
    268,C26801G
    262,C7086T
    260,T445C
    260,C2836T
    259,C6033T
    256,G3242A
    255,G26233T
    254,C1385T
    253,A14084G
    248,C28961T
    246,T9477A
    245,G29692T
    244,C3773T
    244,C28863T
    244,C26256T
    242,G25979T
    242,C4540T
    240,C23457A
    240,A24253T
    238,T14191C
    238,C16092T
    237,G26526T
    236,C29668T
    232,G23900C
    230,C25710T
    224,C23655T
    223,C29546T
    220,G29808T
    219,G21724T
    218,C8389T
    215,G26690T
    214,G19677T
    211,G3564T
    206,C7162T
    198,A10323G
    197,G29868A
    197,G14241T
    197,A34T
    194,T22162C
    194,C18486T
    193,G29711T
    193,C13730T
    192,G24933T
    189,C23929T
    187,G7798T
    187,C29870M
    184,C3738T
    184,C13536T
    180,C11109T
    180,C10188T
    179,C28868T
    177,C15352T
    173,G12478A
    172,G22225T
    172,C4543T
    172,C19524T
    170,G8371T
    170,G16813A
    170,C20759T
    169,T17247C
    168,T24982C
    167,C23731T
    166,G28487A
    166,C7600T
    166,C28896G
    164,G28842T
    164,C4002T
    162,C11497T
    159,C14768T
    158,G15766T
    158,G13993T
    157,G9526T
    157,G17019T
    157,C16887T
    156,C6312A
    156,C27603T
    156,C18744T
    155,T26876C
    155,G9130T
    155,C16616T
    155,A29871G
    154,T27957C
    154,G28975C
    154,A16889G
    153,G29399A
    153,G10265A
    153,C27635T
    152,T23548C
    152,G29742T
    152,C15654T
    151,C9996T
    150,C5184T
    150,C26885T
    150,C17639T
    150,C11379Y
    148,T21925C
    148,C21516T
    148,C1917T
    146,C29769T
    144,T17091C
    144,G29777T
    144,G26257T
    144,G19684T
    144,G15907A
    144,C14362T
    142,T8041C
    142,C28253T
    142,C12025T
    141,C2558T
    141,C16111T
    140,T27384C
    140,C19217T
    140,C11575T
    140,A6512C
    139,G26062T
    139,C29614T
    137,G1397A
    137,C1288T
    136,T27319C
    136,C12116T
    136,A27024C
    136,A21137G
    135,G29734C
    135,C23191T
    134,C19718T
    133,G5629K
    133,G29706T
    133,A2480G
    133,A12199G
    132,C222T
    131,C19170T
    130,T28688C
    130,C28256T
    130,C16762T
    127,T29867A
    126,G28878A
    126,C25916T
    126,C10376T
    123,T10717C
    123,C9430T
    123,C37A
    123,C25572T
    123,C16376T
    122,C15540T
    120,G29742A
    120,G105T
    120,C19488T
    120,A15972G
    118,G11417T
    118,C29353T
    118,C24351T
    117,C16289T
    116,T25111A
    116,G17721T
    116,C11956T
    116,A4624G
    115,C6285T
    115,C23185T
    114,G21468T
    113,G25429T
    113,G16912T
    113,C2488T
    113,A26759T
    112,T15276A
    111,G28373T
    110,G20580T
    110,C11379T
    109,G4300T
    109,G10097A
    108,G23593T
    108,G18985T
    108,C6445T
    108,C12763T
    107,C12880T
    106,C5365T
    106,C28849T
    105,C29366Y
    105,C22987T
    105,C15720T
    104,C3411T
    104,C24904T
    104,A2161G
    103,T15354C
    103,C29095T
    103,C25350T
    103,C24378T
    102,T25902W
    102,C4832T
    101,T25908W
    101,T1927C
    101,G28727T
    101,G24992C
    101,G1401A
    101,C4113T
    100,T29867W
    100,C6651T
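
Some entries in the list end in letters other than A, C, G, T (M, Y, K, W). These are IUPAC ambiguity codes for mixed base calls, not real substitutions. A sketch for parsing the "count,mutation" lines and setting those entries aside; function names are hypothetical.

```python
AMBIGUITY = {  # IUPAC nucleotide codes for mixed base calls
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_line(line):
    """'    187,C29870M' -> (187, 'C', 29870, 'M')."""
    count, mutation = line.strip().split(",")
    return int(count), mutation[0], int(mutation[1:-1]), mutation[-1]


def split_ambiguous(lines):
    """Separate clean substitutions from ambiguous base calls."""
    clean, ambiguous = [], []
    for line in lines:
        record = parse_line(line)
        (ambiguous if record[3] in AMBIGUITY else clean).append(record)
    return clean, ambiguous


sample = [
    "  36235,A23403G",
    "    187,C29870M",
    "    150,C11379Y",
    "    133,G5629K",
    "    102,T25902W",
]
clean, ambiguous = split_ambiguous(sample)
```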

« Last Edit: December 28, 2020, 12:23:50 am by gsgs »