nCoV_info
General Category => genetics => Topic started by: gsgs on April 11, 2020, 09:50:21 am
-
COVID-sequences links :
Nov20,40432+107300+631
https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?VirusLineage_ss=Severe%20acute%20respiratory%20syndrome%20coronavirus%202,%20taxid:2697049&SeqType_s=Nucleotide
https://www.cogconsortium.uk/data/
https://civnb.info/sequences/
http://www.insdc.org/
https://www.ebi.ac.uk/
-------------------------------------------------------------------------------
other links :
ViPR : https://www.viprbrc.org/brc/home.spg?decorator=vipr
flu : https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi?go=database
releases : https://ftp.ncbi.nih.gov/genbank/gbrel.txt
----------------------------------------------------------------
582 COVID-19 sequences (507 of these full genome) downloaded on Apr11 from :
(615 on Apr12)
(876 on Apr14 , 795 full) + 102 full German ones from the Drosten tweet , see below
(1207 on Apr21)
https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?VirusLineage_ss=Severe%20acute%20respiratory%20syndrome%20coronavirus%202,%20taxid:2697049&SeqType_s=Nucleotide
386 from USA (earliest from Jan19 , IL-Jan21 , CA-Jan22 , WA-Jan25)
94 from China (66 full , earliest from Dec.23)
25 from Spain (earliest Feb26)
17 from Iran (2 full , earliest Mar09)
6 from Italy (1 full , Jan30)
...
dates = symptom-onset ?
mutation-picture : http://magictour.free.fr/sars2-4.GIF
rows are viruses , columns are RNA-nucleotide-positions , black pixels are mutations
horizontally and vertically sorted so to give maximum black connected components
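A minimal sketch of how such a picture can be laid out, assuming the genomes are already aligned to the same length as the reference. The function names and the sort keys are illustrative assumptions: columns are ordered by how many genomes carry the mutation and rows by their mutation pattern, which is a simpler heuristic than sorting for maximum connected black components.

```python
def mutation_matrix(reference, genomes):
    """One row per genome: 1 where it differs from the reference, else 0."""
    return [[int(g[i] != reference[i]) for i in range(len(reference))]
            for g in genomes]

def sort_for_display(matrix):
    """Keep only mutated columns and group similar rows together.

    Heuristic (an assumption, not the method used for the GIF):
    columns by descending mutation count, rows by descending pattern.
    """
    n_cols = len(matrix[0]) if matrix else 0
    cols = [c for c in range(n_cols) if any(row[c] for row in matrix)]
    cols.sort(key=lambda c: -sum(row[c] for row in matrix))
    rows = sorted(([row[c] for c in cols] for row in matrix), reverse=True)
    return cols, rows

def render(rows):
    """Text version of the picture: '#' = mutation (black pixel), '.' = none."""
    return "\n".join("".join("#" if v else "." for v in row) for row in rows)

# toy example with made-up sequences
cols, rows = sort_for_display(mutation_matrix(
    "ACGTACGT", ["ACGTACGT", "ATGTACGT", "ATGTACGA", "ACGTACGA"]))
print(render(rows))
```

Genomes that share a mutation pattern end up in adjacent rows, so strains show up as blocks.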
community-spread in Washington State since at least Jan19,
one day before Zhong Nanshan raised the alarm on Chinese TV
and 4 days before Wuhan lockdown !
Jan19 was sample-collection, sequence-release date was Mar26
another sequence from WA , Jan25, release date Feb05
also already with that C27969T (my current enumeration) - mutation typical for WA
MN985325 , https://wwwnc.cdc.gov/eid/article/26/6/20-0516_article
China-traveler , symptom-onset = Jan16 , sampled Jan19 , USA-WA , genbank Mar27
https://www.ncbi.nlm.nih.gov/nuccore/MN985325
had already the C27969T mutation
---------------------------------------------
-
earliest outside China :
MN970003 MN970003 2020-01-23 290 Thailand lung, oronasopharynx 2020-01-08
MT072688 MT072688 2020-02-18 29811 Nepal oronasopharynx 2020-01-13
MN970004 MN970004 2020-01-23 290 Thailand lung, oronasopharynx 2020-01-13
MN985325 MN985325 2020-01-24 29882 USA oronasopharynx 2020-01-19
MT233526 MT233526 2020-03-26 29847 USA: WA oronasopharynx 2020-01-19
MT246667 MT246667 2020-03-26 29867 USA: WA oronasopharynx 2020-01-19
MN988713 MN988713 2020-01-25 29882 USA: Illinois lung, oronasopharynx 2020-01-21
MN997409 MN997409 2020-01-28 29882 USA: AZ feces 2020-01-22
MN994468 MN994468 2020-01-28 29883 USA: CA oronasopharynx 2020-01-22
MN994467 MN994467 2020-01-28 29882 USA: CA oronasopharynx 2020-01-23
MT007544 MT007544 2020-01-31 29893 Australia: Victoria 2020-01-25
MT192772 MT192772 2020-03-16 29891 Viet Nam: Ho Chi Minh city 2020-01-22
MT192773 MT192773 2020-03-16 29890 Viet Nam: Ho Chi Minh city 2020-01-22
LC523809 LC523809 2020-02-13 357 Philippines 2020-01-23
MT066159 MT066159 2020-02-14 290 Malaysia oronasopharynx 2020-01-24
MT066157 MT066157 2020-02-14 290 Malaysia oronasopharynx 2020-01-24
MT066158 MT066158 2020-02-14 290 Malaysia oronasopharynx 2020-01-24
MT192759 MT192759 2020-03-16 29862 Taiwan lung, oronasopharynx 2020-01-25
MT020881 MT020881 2020-02-05 29882 USA: WA oronasopharynx 2020-01-25
MT020880 MT020880 2020-02-05 29882 USA: WA oronasopharynx 2020-01-25
LC523808 LC523808 2020-02-13 357 Philippines 2020-01-26
LC522350 LC522350 2020-02-08 182 Philippines 2020-01-26
MT044258 MT044258 2020-02-12 29858 USA: CA oronasopharynx 2020-01-27
MT012098 MT012098 2020-03-06 29854 India: Kerala State oronasopharynx 2020-01-27
MT044257 MT044257 2020-02-12 29882 USA: IL lung, oronasopharynx 2020-01-28
MT039888 MT039888 2020-02-11 29882 USA: MA oronasopharynx 2020-01-29
MT027062 MT027062 2020-02-07 29882 USA: CA oronasopharynx 2020-01-29
MT027063 MT027063 2020-02-07 29882 USA: CA oronasopharynx 2020-01-29
MT027064 MT027064 2020-02-07 29882 USA: CA oronasopharynx 2020-01-29
MT020781 MT020781 2020-02-05 29806 Finland 2020-01-29
MT066156 MT066156 2020-03-09 29867 Italy lung, oronasopharynx 2020-01-30
MT050493 MT050493 2020-03-06 29851 India: Kerala State oronasopharynx 2020-01-31
MT066175 MT066175 2020-02-14 29870 Taiwan 2020-01-31
MT039887 MT039887 2020-02-11 29879 USA: WI oronasopharynx 2020-01-31
MT008023 MT008023 2020-01-31 322 Italy: Rome oronasopharynx 2020-01
MT008022 MT008022 2020-01-31 322 Italy: Rome oronasopharynx 2020-01
----------------------------------
-
above enumeration is wrong, doesn't consider the first digit {expecting only flu-sequences}
it is corrected now in the mutations below
mutation-picture : http://magictour.free.fr/sars2-4.GIF
most genomes are from 2 Washington-strains
WA2,101 genomes , 4+2+1 mutations
C00150T{Spain},C02946T{Spain},C14317T{Spain},A23312G{Spain} , {RI-Feb28}
G25472T{CA-Feb29} , C00968T{CA-Feb29} ,
(G29462A{WA-Mar13})
WA1,169 genomes , 2+1+2 mutations
C08691T{China},T28053C{China}, {Wuhan,Dec26}
C17969T{WA-Jan19},
C17656T{Mar-WA},A17767G{Mar-WA} {Iran-Feb26}
we had 2 clearly distinguished strains already in Wuhan, Dec.26 .
One is the predecessor of WA1 and one is the predecessor of WA2
plus or minus some constant , the first and last part of the sequences (~100 nucleotides)
are skipped because they often contain mutations supposed to be sequencing errors
18 genomes that didn't easily align because of insertions or deletions were excluded
there was 1 genome from France, it had a 3bp deletion , none from Germany
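The filtering described above can be sketched as follows. This is only an illustration under stated assumptions: a genome counts as "easily aligned" when it has exactly the reference length, the first and last `skip` nucleotides are ignored, and `N` calls are not reported. The function name and parameters are hypothetical.

```python
def call_substitutions(reference, genome, skip=100, offset=0):
    """List substitutions such as 'C241T' between two equal-length sequences.

    Returns None when the lengths differ (insertion or deletion suspected),
    so the caller can exclude that genome. Positions are 1-based and
    `offset` is added to every reported position.
    """
    if len(genome) != len(reference):
        return None
    calls = []
    # skip the ends, where apparent mutations are often sequencing errors
    for i in range(skip, len(reference) - skip):
        ref, alt = reference[i], genome[i]
        if ref != alt and alt != "N":
            calls.append(f"{ref}{i + 1 + offset}{alt}")
    return calls

# toy example with made-up sequences and a short skip
ref = "AAAAACAAAAA"
print(call_substitutions(ref, "TAAAATAAAAA", skip=2))  # the T at position 1 is skipped
print(call_substitutions(ref, "AAAAACAAAA", skip=2))   # shorter genome: None
```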
----------------------------------------------------
102 German sequences : https://civnb.info/public/charite-SARS-CoV-2.fasta.gz
@c_drosten 2020/04/13/08:06UTC
https://civnb.info/sequences/
An overview of #SARS-CoV-2 genome sequences from Germany, including early releases by
@charitevirology. Some interesting insights into local clustering and wider dispersal, for
people familiar with German geography. https://civnb.info/sequences/
102>7+4+87 , >3 different strains
most have C241T,C3037T,A23403G and most of these also have C14408T
this is WA2 above (add 91)
the one from Jan28 has C241T,C3037T, and A23403G
I assume that was the Webasto-introduction. The whole WA2 and Spain and maybe Italy, France
seem to have descended from it ! (after C14317T had developed in Germany)
WA2,101 genomes , 4+2+1 mutations
C00150T{Spain},C02946T{Spain},C14317T{Spain},A23312G{Spain} , {RI-Feb28}
G25472T{CA-Feb29} , C00968T{CA-Feb29} ,
(G29462A{WA-Mar13})
-
Wittkowski paper :
https://www.medrxiv.org/content/10.1101/2020.03.28.20036715v2.full.pdf
> The epidemiological data does not support the hypothesis that SARS-CoV-2 spread from Munich
> in Germany to Italy.(Kupferschmidt 2020)
> Instead, the virus may have spread from Italy to its neighboring
> countries, Switzerland, France, Spain, Austria, and Slovenia, within just a few days
> of arriving from Iran.
there is one partial sequence from Qum, Feb09, at genbank and it does not have the T28688C mutation
which the sequences from Tehran have
Iran sequences seem to have a 6bp insertion at position 14606
Iran1,C08782T,C17753T,A17864G,C18066T,T28150C
Iran2,G01397A,G11083T, TCCTTA-insertion at 14606 , C8383T,G9380A,G9748T {Mar09}
WA1,169 genomes , 2+1+2 mutations [add 91 to all positions]
C08691T{China},T28053C{China}, {Wuhan,Dec26}
C17969T{WA-Jan19},
C17656T{Mar-WA},A17767G{Mar-WA} {Iran-Feb26}
-
[add 91 ?!]
Wuhan1, Dec.26
-
Wuhan2
C08691T{China},T28053C{China}, {Wuhan,Dec26}
WA1,169 genomes , 2+1+2 mutations
C08691T{China},T28053C{China}, {Wuhan,Dec26}
C17969T{WA-Jan19},
C17656T{Mar-WA},A17767G{Mar-WA} {Iran-Feb26}
WA2,101 genomes , 4+2+1 mutations
C00150T{Spain},C02946T{Spain},C14317T{Spain},A23312G{Spain} , {RI-Feb28}
G25472T{CA-Feb29} , C00968T{CA-Feb29} ,
(G29462A{WA-Mar13})
Webasto,
C241T,C3037T,A23403G { C14408T }
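The two enumerations in the lists above differ by the constant 91 mentioned earlier: adding 91 to the WA2 positions gives exactly the Webasto positions. A small sketch of that conversion (the helper name is hypothetical):

```python
import re

OFFSET = 91  # the constant from the notes ("add 91")

def shift_mutation(label, offset=OFFSET):
    """Shift the position inside a label like 'C14317T' by `offset`."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if m is None:
        raise ValueError(f"not a substitution label: {label!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    return f"{ref}{pos + offset}{alt}"

wa2 = ["C00150T", "C02946T", "C14317T", "A23312G"]
print([shift_mutation(x) for x in wa2])
# ['C241T', 'C3037T', 'C14408T', 'A23403G']
```

The same shift turns C17969T into C18060T and C08691T into C8782T, both of which appear in the 2020-12-09 mutation list.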
-
2020/04/23 , 1432 sequences in total
many (>100) sequences from NY today
Hong Kong , collected in Jan
-
2020/04/28 , 10567 genomes from UK : https://www.cogconsortium.uk/data/
2020/05/22 , the total number of sequences is 16380
2020/09/03 , 48561 sequences
-
2020/05/15 , 3812 genomes , 3415 of these are easily aligned (no insertions or deletions) , full size
2049 (=60%) of these have C241T
2088 have C3037T
2114 have C14408T
2113 have A23403G
------------------------------
1434 have G25563T
1234 have C1059T
2126 from USA
81 from France
69 from China
809 from Australia:Victoria
91 from Greece:Athens
38 from India:Ahmedabad
23 from Thailand
21 from Taiwan
20 from HongKong
18 from India
12 from PuertoRico
(German and UK sequences see above )
why so many C-->T ?
-
2020/09/28
24923 COVID-19 sequences at genbank (was 16438 on 2020/08/18), file sars2-19
24185 complete ones (>29400 nucleotides)
22,208,199,355,6040,5033,2561,2869,4494,639,219 with collection months 2019/12,...,2020/09
USA:14091,AUS:5562,IND:570,EGY:233,BGD:231,THA:227,CHN:195,IRN:101,GRC:98,GER:92,
PER:89,FRA:89,JPN:84,IRQ:76,MEX:66,ITA:60,SAU:58,TUR:57,SPA:48,GHA:46
USA:WA=3455,CA=1953,FL=1520,WI=1340,VA=903,MA=876,MI=493,NY=318
human:22574,mink:13,mink:12,cat:5,dog:2,tiger:1
onp:6269,saliva:54,swab:310,saliva/onp:54,lung/oronasopharynx:49,lung:28,blood:3,feces:4,
onp/onp:1,placenta:1,urine:1
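Tallies like these can be produced from the sequence metadata with a few counters. The sketch below assumes records of the form (length, location, collection date), with locations like "USA: WA" and the >29400 threshold for complete genomes; the record layout and the function name are assumptions, and whether the counts above were taken over all sequences or only the complete ones is not stated. The sample rows are copied from the "earliest outside China" list.

```python
from collections import Counter

def summarize(records, min_complete=29400):
    """records: (length, location, collection_date) tuples.

    Dates are 'YYYY-MM-DD' or 'YYYY-MM'; the country is the part of the
    location before the first colon.
    """
    n_complete = sum(1 for length, _, _ in records if length > min_complete)
    by_month = Counter(date[:7] for _, _, date in records)
    by_country = Counter(loc.split(":")[0].strip() for _, loc, _ in records)
    return n_complete, by_month, by_country

records = [
    (290, "Thailand", "2020-01-08"),
    (29811, "Nepal", "2020-01-13"),
    (29882, "USA", "2020-01-19"),
    (29847, "USA: WA", "2020-01-19"),
    (322, "Italy: Rome", "2020-01"),
]
n_complete, by_month, by_country = summarize(records)
print(n_complete, dict(by_month), by_country.most_common())
```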
-
2020/10/29 , 36846 sequences
----------------------------------------
Charite has 544 German sequences now,
36 from October , all with the D614G mutation
https://civnb.info/sequences/
the most frequent mutations from October are :
36,C3037T,C241T,C14408T,A23403G
14,G28883C,14,G28882A,14,G28881A
10,G21255C
9,T445C
9,G29645T
9,G25563T
9,C6286T
9,C28932T
9,C22227T
8,G27870T
8,C6040T
8,C27944T
8,C26801T
8,C21614T
7,C28854T
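A frequency list of this kind can be built by counting, for every mutation, the number of genomes that carry it. A minimal sketch, assuming one list of mutation labels per genome; the tie order (label descending) only mirrors what the listing above appears to use and is a guess.

```python
from collections import Counter

def mutation_frequencies(calls_per_genome):
    """Return (count, label) pairs, most frequent mutation first.

    calls_per_genome: one list of labels such as 'C241T' per genome.
    Each genome counts at most once per mutation.
    """
    counts = Counter()
    for calls in calls_per_genome:
        counts.update(set(calls))
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [(n, label) for label, n in ordered]

# toy example with three made-up genomes
genomes = [["C241T", "A23403G"], ["C241T"], ["C241T", "G25563T"]]
for n, label in mutation_frequencies(genomes):
    print(f"{n},{label}")
```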
27 (10 from October) are the new Spanish strain "20A.EU1" from this paper :
https://www.medrxiv.org/content/10.1...063v1.full.pdf
mutation picture : http://magictour.free.fr/sars2de5.GIF
-
2020-12-09 , "sars2-21" , 44708 sequences
countries :
518862 ,13303 , 25 ,Australia
71659 ,23785 , 331 ,USA
68834 , 119 , 1 ,Bahrain
22619 , 10 , 0 ,Malta
21801 , 29 , 1 ,Timor-Leste
13072 , 114 , 8 ,Serbia
12756 , 96 , 7 ,Hong Kong
9975 , 4 , 0 ,Belize
9424 , 98 , 10 ,Greece
3780 , 143 , 37 ,Poland
3248 , 227 , 69 ,Thailand
2899 , 245 , 84 ,Iran
2832 , 94 , 33 ,Peru
2696 , 8 , 2 ,Jamaica
2694 , 32 , 11 ,Tunisia
2633 , 27 , 10 ,Jordan
2324 , 240 , 103 ,Egypt
2222 , 5 , 2 ,Gabon
2145 , 23 , 10 ,Czechia
2068 , 12 , 5 ,Denmark
2007 , 8 , 3 ,Georgia
1919 , 78 , 40 ,Iraq
1653 , 58 , 35 ,Saudi Arabia
1649 , 273 , 165 ,Bangladesh
1506 , 91 , 60 ,Italy
1466 , 46 , 31 ,Ghana
1366 , 11 , 8 ,Sierra Leone
1362 , 89 , 65 ,France
1342 , 32 , 23 ,Taiwan
1096 , 92 , 83 ,Germany
1090 , 51 , 46 ,Spain
1087 , 10 , 9 ,Israel
932 , 16 , 17 ,Netherlands
918 , 116 , 126 ,Japan
755 , 64 , 84 ,Turkey
704 , 20 , 28 ,Venezuela
587 , 4 , 6 ,Lebanon
573 , 11 , 19 ,Chile
571 , 74 , 129 ,Mexico
553 , 10 , 18 ,Guatemala
504 , 700 ,1386 ,India
474 , 18 , 37 ,Canada
287 , 1 , 3 ,Uruguay
276 , 9 , 32 ,Malaysia
269 , 10 , 37 ,Morocco
258 , 3 , 11 ,Belgium
253 , 28 , 110 ,Philippines
239 , 35 , 145 ,Russia
225 , 4 , 17 ,Ecuador
211 , 4 , 18 ,Kazakhstan
199 , 1 , 5 ,New Zealand
197 , 2 , 10 ,Sweden
194 , 10 , 51 ,S. Korea
186 , 4 , 21 ,Sri Lanka
180 , 1 , 5 ,Finland
176 , 2 , 11 ,Cuba
143 , 207 ,1439 ,China
93 , 20 , 213 ,Brazil
73 , 5 , 68 ,UK
71 , 7 , 97 ,Vietnam
59 , 2 , 33 ,Uzbekistan
59 , 1 , 16 ,Cambodia
53 , 12 , 222 ,Pakistan
53 , 1 , 18 ,Zambia
52 , 1 , 19 ,Romania
39 , 2 , 51 ,Colombia
36 , 2 , 54 ,Kenya
34 , 1 , 29 ,Nepal
19 , 4 , 208 ,Nigeria
16 , 1 , 59 ,South Africa
-------------------------------------
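columns in the table above : SARS2-sequences at genbank on 2020/12/09 per billion population ,
SARS2-sequences at genbank on 2020/12/09 , population in million
population shown as 0 for the following , so 1.701412E+38 is presumably a division overflow , not a real value :
1.701412E+38 , 3 , 0 ,Guam
1.701412E+38 , 2 , 0 ,West Bank
1.701412E+38 , 14 , 0 ,Puerto Rico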
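The first column is sequences divided by population in billions. A small sketch of that calculation with a guard for a missing or zero population, which seems to be what produced the 1.701412E+38 entries. The function name is hypothetical, and results differ slightly from the table because the populations shown there are rounded to whole millions.

```python
def per_billion(sequences, population_million):
    """Sequences per billion inhabitants, or None if the population is 0 or missing."""
    if not population_million:
        return None
    return sequences / population_million * 1000

print(round(per_billion(92, 83)))  # Germany, table value is 1096 (unrounded population)
print(per_billion(3, 0))           # Guam: None instead of an overflow value
```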
mutations :
36235,A23403G
36105,C3037T
35945,C14408T
34533,C241T
15954,G28881A
15931,G28883C
15929,G28882A
14983,G25563T
12027,C1059T
11227,A1163T
10960,C18555T
10845,G23401A
10834,G16647T
10810,T7540C
9845,G22992A
3493,C27964T
2913,T28144C
2867,C8782T
2030,A20268G
1961,C18060T
1939,C28854T
1929,A17858G
1888,C17747T
1696,C18877T
1670,C10319T
1415,G11083T
1382,C22480T
1309,C14805T
1260,T19839C
1037,A18424G
1027,G25907T
1014,C28472T
983,C21304T
982,C2416T
957,C28869T
933,C29870A
873,C29784T
850,T833C
833,G15594T
829,C11916T
747,C15933T
743,C27059T
717,A17337G
667,G16968T
663,C18568T
649,G3114T
635,C26735T
626,C16260T
615,C28821A
602,C15324T
584,G8083A
561,C17850T
533,G29553A
524,C920T
524,C5144T
522,C66T
515,A29700G
495,C24034T
491,C3177T
488,G28077C
476,T18736C
474,T490A
474,C313T
467,T26729C
451,T27785C
449,G3871T
444,T3931C
440,C28887T
433,A696C
424,G26144T
423,C21575T
423,A22320G
418,T24076C
417,C28977T
406,C36T
405,C23707T
396,C18998T
392,G29540A
376,C28087T
368,C379A
368,C28311T
364,G21255C
362,A26162G
338,C22444T
331,G29402T
315,G1738T
311,C22227T
308,C28657T
306,C5672T
300,C6541T
296,C4226T
280,C6286T
280,A24694T
279,A7837C
275,G29645T
270,C6040T
270,C28932T
270,A35T
268,C26801G
262,C7086T
260,T445C
260,C2836T
259,C6033T
256,G3242A
255,G26233T
254,C1385T
253,A14084G
248,C28961T
246,T9477A
245,G29692T
244,C3773T
244,C28863T
244,C26256T
242,G25979T
242,C4540T
240,C23457A
240,A24253T
238,T14191C
238,C16092T
237,G26526T
236,C29668T
232,G23900C
230,C25710T
224,C23655T
223,C29546T
220,G29808T
219,G21724T
218,C8389T
215,G26690T
214,G19677T
211,G3564T
206,C7162T
198,A10323G
197,G29868A
197,G14241T
197,A34T
194,T22162C
194,C18486T
193,G29711T
193,C13730T
192,G24933T
189,C23929T
187,G7798T
187,C29870M
184,C3738T
184,C13536T
180,C11109T
180,C10188T
179,C28868T
177,C15352T
173,G12478A
172,G22225T
172,C4543T
172,C19524T
170,G8371T
170,G16813A
170,C20759T
169,T17247C
168,T24982C
167,C23731T
166,G28487A
166,C7600T
166,C28896G
164,G28842T
164,C4002T
162,C11497T
159,C14768T
158,G15766T
158,G13993T
157,G9526T
157,G17019T
157,C16887T
156,C6312A
156,C27603T
156,C18744T
155,T26876C
155,G9130T
155,C16616T
155,A29871G
154,T27957C
154,G28975C
154,A16889G
153,G29399A
153,G10265A
153,C27635T
152,T23548C
152,G29742T
152,C15654T
151,C9996T
150,C5184T
150,C26885T
150,C17639T
150,C11379Y
148,T21925C
148,C21516T
148,C1917T
146,C29769T
144,T17091C
144,G29777T
144,G26257T
144,G19684T
144,G15907A
144,C14362T
142,T8041C
142,C28253T
142,C12025T
141,C2558T
141,C16111T
140,T27384C
140,C19217T
140,C11575T
140,A6512C
139,G26062T
139,C29614T
137,G1397A
137,C1288T
136,T27319C
136,C12116T
136,A27024C
136,A21137G
135,G29734C
135,C23191T
134,C19718T
133,G5629K
133,G29706T
133,A2480G
133,A12199G
132,C222T
131,C19170T
130,T28688C
130,C28256T
130,C16762T
127,T29867A
126,G28878A
126,C25916T
126,C10376T
123,T10717C
123,C9430T
123,C37A
123,C25572T
123,C16376T
122,C15540T
120,G29742A
120,G105T
120,C19488T
120,A15972G
118,G11417T
118,C29353T
118,C24351T
117,C16289T
116,T25111A
116,G17721T
116,C11956T
116,A4624G
115,C6285T
115,C23185T
114,G21468T
113,G25429T
113,G16912T
113,C2488T
113,A26759T
112,T15276A
111,G28373T
110,G20580T
110,C11379T
109,G4300T
109,G10097A
108,G23593T
108,G18985T
108,C6445T
108,C12763T
107,C12880T
106,C5365T
106,C28849T
105,C29366Y
105,C22987T
105,C15720T
104,C3411T
104,C24904T
104,A2161G
103,T15354C
103,C29095T
103,C25350T
103,C24378T
102,T25902W
102,C4832T
101,T25908W
101,T1927C
101,G28727T
101,G24992C
101,G1401A
101,C4113T
100,T29867W
100,C6651T
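To look at how the substitution types are distributed (for example the share of C-->T), the "count,mutation" lines can be grouped by reference and replacement nucleotide. A minimal sketch, assuming the line format used above; entries whose replacement is an ambiguity code (M, Y, K, W) are skipped, and the function name is hypothetical.

```python
import re
from collections import Counter

def substitution_spectrum(lines):
    """Tally substitution types (e.g. 'C>T') from 'count,REFposALT' lines.

    Returns two Counters: distinct sites per type, and genome counts
    summed per type.
    """
    sites, weighted = Counter(), Counter()
    for line in lines:
        m = re.fullmatch(r"\s*(\d+)\s*,\s*([ACGT])(\d+)([ACGT])\s*", line)
        if m is None:
            continue  # ambiguity codes or malformed lines
        kind = f"{m.group(2)}>{m.group(4)}"
        sites[kind] += 1
        weighted[kind] += int(m.group(1))
    return sites, weighted

# a few lines copied from the list above
sample = ["36235,A23403G", "36105,C3037T", "35945,C14408T",
          "34533,C241T", "187,C29870M"]
sites, weighted = substitution_spectrum(sample)
print(sites.most_common())
print(weighted.most_common())
```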