COVID-19, SARS AND BATS CORONAVIRUSES GENOMES PECULIAR HOMOLOGOUS RNA SEQUENCES

Jean Claude Perez ¹*, Luc Montagnier ²

¹* PhD Maths & Computer Science, Bordeaux University; retired interdisciplinary researcher (IBM Emeritus, IBM European Research Center on Artificial Intelligence, Montpellier), Martignas sur Jalles, Bordeaux Métropole, France

² Fondation Luc Montagnier Quai Gustave-Ador 62 1207 Genève, Switzerland

DOI: https://doi.org/10.29121/granthaalayah.v8.i7.2020.678

Article Type: Research Article

Article Citation: Perez, J. C., Montagnier, L. (2020). COVID-19, SARS AND BATS CORONAVIRUSES GENOMES PECULIAR HOMOLOGOUS RNA SEQUENCES. International Journal of Research - GRANTHAALAYAH, 8(7), 217-263. https://doi.org/10.29121/granthaalayah.v8.i7.2020.678

Received Date: 07 July 2020

Accepted Date: 30 July 2020

Keywords:

COVID-19

Bats Coronaviruses

RNA Sequences

SARS

HIV

Plasmodium yoelii

Spike

ABSTRACT

We are facing the worldwide invasion of a new coronavirus. This follows several limited outbreaks of related viruses in various locations in the recent past (SARS, MERS). Although the main current objective of researchers is to bring efficient therapeutic and preventive solutions to the global population, we also need to better understand the origin of the new coronavirus-induced epidemic in order to avoid future outbreaks. The present molecular appraisal uses a bioinformatic approach to study the facts relating to the virus and its precursors.

This article shows how 16 fragments (from the Env, Pol and Integrase genes) of different strains, both diversified and very recent, of the HIV1, HIV2 and SIV retroviruses have a high percentage of homology with parts of the COVID_19 genome. Moreover, each of these elements is 18 or more nucleotides long and may therefore have a function. We call them Exogenous Informative Elements (EIE).

Among these EIE, 12 are concentrated in a very small region of the COVID-19 genome, less than 900 bases long, i.e. less than 3% of the total length of this genome. In addition, these EIE are positioned in two functional genes of COVID-19: the orf1ab and S spike genes.

Here are the two main facts that support our hypothesis of a partially synthetic genome: a contiguous region representing 2.49% of the whole COVID-19 genome is 40.99% made up of 12 diverse fragments originating from various strains of HIV/SIV retroviruses, and some of these 12 EIE appear concatenated. Notably, the retroviral part of this region, which consists of 8 elements from various strains of HIV1, HIV2 and SIV, covers 275 contiguous bases of COVID-19. The cumulative length of these 8 HIV/SIV elements is 200 bases. Consequently, the HIV/SIV density rate of this region of COVID-19 is 200/275 = 72.73%.

A major part of these 16 EIE already existed in the first SARS genomes, as early as 2003. However, we show that a new region including 4 HIV1/HIV2 Exogenous Informative Elements radically distinguishes all COVID-19 strains from all SARS and Bat strains, with the exception of Bat RaTG13.

We gather facts about the possible origins of COVID_19. In particular, we analyzed this small region of 225 bases common to COVID_19 and bat RaTG13.

We studied the most recent genetic evolution of the COVID_19 strains involved in the world epidemic and found a significant occurrence of mutations and deletions in the 225-base area.

On a sample of genomes (44 patient genomes from Seattle, Washington State, the original epicenter in the USA), we show that this key 225-base region, rich in EIE, and the 1770-base SPIKE region evolve much faster than the corresponding whole genome.

In the comparative analysis of the SPIKE genes of COVID_19 and Bat RaTG13, we note two abnormal facts:

1) the insertion of the 4 contiguous amino acids PRRA in the middle of SPIKE (we show that this site was already an optimal cleavage site BEFORE this insertion);

2) an abnormal distribution of synonymous codons in the second half of SPIKE.

Finally, we show the insertion, in this 1770-base SPIKE region, of a significant pair of EIEs from Plasmodium yoelii and of a possible HIV1 EIE with a crucial Spike mutation.

1. INTRODUCTION

We are facing the worldwide invasion of a new coronavirus. This follows several limited outbreaks of related viruses in various locations in the recent past (SARS, MERS) [1], [2]. Human civilization has been very successful in recent centuries in terms of demographic and economic growth. However, in our times, economic power is concentrated in the hands of a few individuals and, consequently, economic interests prevail over the well-being of humanity.

Although the main objective of researchers is to bring efficient therapeutic and preventive solutions to the global population, we also need to better understand the origin of the new coronavirus-induced epidemic in order to avoid future outbreaks. The present molecular appraisal uses a bioinformatic approach to study the facts relating to the virus and its precursors.

We previously analyzed the evolution of coronaviruses from the first SARS (2003) to the first genomes of COVID-19, when it was still called 2019-nCoV [3]. We were aware of the online article by J. Lyons-Weiler [4], according to which a region of around 1 kb is totally new in the COVID-19 genome.

Using our proprietary biomathematical approach, which evaluates the level of cohesion and organization of a genome, we discovered that deleting this new 1 kb region [4] by mutation would increase the level of «structural harmonization» of the genome.

This suggests a possible exogenous «addition» to the genome. After studying the publication of Pradhan et al. [15], we searched this genome for possible traces of HIV or even SIV. A first publication [5] reported the discovery of 6 HIV/SIV RNA pieces related to crucial retroviral genes such as Envelope and RT Pol. The present article confirms and extends these initial results.

2. MATERIALS AND METHODS

2.1. ACCESS TO DATA BANKS

Preliminary Note

The COVID-19 genome sequence initially studied for this article is NC_045512.2. More generally, we are interested in the first genomes published under the reference "Wuhan market". However, these sequences, published in January 2020, evolved somewhat during the first quarter of 2020. Thus, NC_045512.2 grew from 29866 to 29903 bases, and our GenBank NCBI reference changed accordingly.

All these "Wuhan market" genome sequences, each relating to an individual patient, were deposited on January 30, 2020 and then re-published on March 6, 2020. For these reasons, we specify and adjust here the addresses of the key regions "A" and "B" analyzed in this article.

The Wuhan market referenced genomes are presently:

https://www.ncbi.nlm.nih.gov/nuccore/LR757995.1

https://www.ncbi.nlm.nih.gov/nuccore/LR757996.1

https://www.ncbi.nlm.nih.gov/nuccore/LR757997.1

https://www.ncbi.nlm.nih.gov/nuccore/LR757998.1

and

https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2

Thus, the start address of the 330-base region called "region B" in this article, initially positioned at base 21673 in our previous article, is now shifted to base 21698 in NC_045512.2, 21683 in LR757995.1, 21678 in LR757996.1 and 21673 in LR757998.1. The sequence LR757997.1 is unusable because it contains more than 10,000 indeterminate « N » bases.

Finally, region « B » has the same starting address (21673) in LR757998.1 as in our previous article. We therefore use as reference the former "Wuhan market" reference genome, ID LR757998.1: https://www.ncbi.nlm.nih.gov/nuccore/LR757998.1

Validation of nucleotide fragments as «Exogenous Informative Elements» (EIE):

We have chosen a minimal length of 18 nucleotides (6 amino acids) for the support of information (thus as an antigenic motif). This is also the size of the primers used for PCR, which allows high specificity of sequence selection in DNA recognition.
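The selection rule can be written as a small filter. This is a minimal sketch, assuming that "18 nucleotides" refers to the BLASTn alignment length (as in Table 1, where the 17/18 Guinea hit is kept while the 16/16 France hit is marked non-significant), combined with the 80% homology threshold stated in Part I. The function name `is_significant_eie` and the constant names are hypothetical.

```python
MIN_LENGTH = 18        # 6 codons: minimal alignment length retained for an EIE
MIN_IDENTITY = 0.80    # homology threshold stated in Part I


def is_significant_eie(aligned_length: int, identities: int) -> bool:
    """Hypothetical filter deciding whether a BLASTn hit counts as an EIE.

    aligned_length -- number of alignment columns reported by BLASTn
    identities     -- number of identical bases in that alignment
    """
    if aligned_length < MIN_LENGTH:
        return False
    return identities / aligned_length >= MIN_IDENTITY


# (aligned length, identities) pairs taken from Table 1.
print(is_significant_eie(16, 16))  # HIV-2 France 2012: False (too short)
print(is_significant_eie(18, 17))  # HIV-2 Guinea 2012: True
print(is_significant_eie(32, 28))  # HIV-1 Kenya 2008: True
```

Filtering on alignment length rather than on identities is a design choice here: it keeps near-perfect 18-column hits such as Guinea, which has only 17 identical bases.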

Main COVID_19 genes involved

The two main genes involved in the COVID-19 genome are Orf1ab and «S» Spike. Their coordinates in our reference genome are:

266..21555 for Orf1ab

21563..25384 for S spike
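The « § » marks in Tables 1 and 2 amount to an interval-overlap test against these two gene coordinates. A minimal sketch, assuming 1-based inclusive coordinates; the function name `genes_hit` is hypothetical, and the test locations are the "real locations" listed in Table 1.

```python
# Gene coordinates in the reference genome, 1-based and inclusive (from the text).
GENES = {"Orf1ab": (266, 21555), "S spike": (21563, 25384)}


def genes_hit(start: int, end: int) -> list[str]:
    """Return the genes sharing at least one base with the interval [start, end]."""
    return [
        name
        for name, (gene_start, gene_end) in GENES.items()
        if start <= gene_end and end >= gene_start
    ]


print(genes_hit(21225, 21245))  # HIV-1 Sweden 2017: ['Orf1ab']
print(genes_hit(21542, 21572))  # HIV-1 Kenya 2008: ['Orf1ab', 'S spike']
print(genes_hit(21583, 21600))  # HIV-2 Cape Verde 2012: ['S spike']
```

An element can straddle the short gap between the two genes, which is why the Kenya 2008 element is marked in both the ORF1ab and S spike columns.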

The main analyzed regions

Region « A »: location of the 600 bases from the COVID_19 reference genome “Wuhan market” ID: LR757998.1.

It extends from nucleotide 21072 to 21672.

AGGGTTTTTTCACTTACATTTGTGGGTTTATACAACAAAAGCTAGCTCTTGGAGGTTCCGTGGCTATAAAGATAACAGAACATTCTTGGAATGCTGATCTTTATAAGCTCATGGGACACTTCGCATGGTGGACAGCCTTTGTTACTAATGTGAATGCGTCATCATCTGAAGCATTTTTAATTGGATGTAATTATCTTGGCAAACCACGCGAACAAATAGATGGTTATGTCATGCATGCAAATTACATATTTTGGAGGAATACAAATCCAATTCAGTTGTCTTCCTATTCTTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAAAGAAGGTCAAATCAATGATATGATTTTATCTCTTCTTAGTAAAGGTAGACTTATAATTAGAGAAAACAACAGAGTTGTTATTTCTAGTGATGTTCTTGTTAACAACTAAACGAACAATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGCATACACTAATTCTTTCACACGTGGTGTTTATTACCCTGACAAAGTTTTCAGATCC

See details alignment in supplementary materials « a ».

Region «B»: location of 330 bases from the COVID_19 reference genome “Wuhan market” ID: LR757998.1.

It extends from nucleotide 21672 to 22002, immediately following region «A»:

TCAGTTTTACATTCAACTCAGGACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTTCCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGGTACTACTTTAGATTCGAAGACCCAGTCCCTACTTATTGTTAATAACGCTACTAATGTTGTTATTAAAGTCTGTGAATTTCAATTTTGTAATGATCCATTTTTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGT

See details alignment in supplementary materials « b ».

We also analyzed a larger region, called the « Lyons-Weiler region » [4], which starts at the same address as our region "B".

It extends from nucleotide 21672 to 23050 (1378 nucleotides) of the reference genome “Wuhan market” ID: LR757998.1.

In the RESULTS AND DISCUSSION, we analyze in particular a small region of 225 nucleotides of the reference genome:

TGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGCATACACTAATTCTTTCACACGTGGTGTTTATTACCCTGACAAAGTTTTCAGATCCTCAGTTTTACATTCAACTCAGGACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTCTCTGGGACCAATGGTACTAA
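Sequences copied from PDFs or web pages often pick up stray spaces or line breaks. This is a minimal sketch for normalizing such text before any comparison: it strips whitespace, checks the alphabet (allowing « N », the indeterminate base mentioned above), and confirms that the 225-base region opens with the COVID_19 side of the HIV1 Kenya 2008 alignment shown in section 5, with the gap removed. The names `RAW_REGION_225` and `clean_sequence` are hypothetical.

```python
# The 225-base region given above, split into 10-base blocks for readability.
RAW_REGION_225 = """
TGTTTTTCTT GTTTTATTGC CACTAGTCTC TAGTCAGTGT GTTAATCTTA
CAACCAGAAC TCAATTACCC CCTGCATACA CTAATTCTTT CACACGTGGT
GTTTATTACC CTGACAAAGT TTTCAGATCC TCAGTTTTAC ATTCAACTCA
GGACTTGTTC TTACCTTTCT TTTCCAATGT TACTTGGTTC CATGCTATAC
ATGTCTCTGG GACCAATGGT ACTAA
"""


def clean_sequence(raw: str) -> str:
    """Remove all whitespace, upper-case, and reject non-nucleotide characters."""
    seq = "".join(raw.split()).upper()
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


region = clean_sequence(RAW_REGION_225)
print(len(region))  # 225

# COVID_19 side of the HIV1 Kenya 2008 BLASTn alignment (section 5), gap removed.
KENYA_SBJCT = "TGTTTTTCTTG-TTTTATTGCCACTAGTCTCT".replace("-", "")
print(region.startswith(KENYA_SBJCT))  # True
```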

Alignments: to analyze COVID-19 DNA sequences, we use the public NCBI BLAST tool (National Center for Biotechnology Information):

BLASTn - NIH

NCBI National Center for Biotechnology Information.

https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch
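The searches in this article were run through the NCBI web interface. For readers who want to repeat a single comparison locally, here is a minimal sketch that assumes the BLAST+ command-line tools are installed. `-task blastn-short` (intended for queries shorter than 50 bases) and the default 12-column tabular output `-outfmt 6` are standard BLAST+ options. The helper names and the FASTA file names are hypothetical.

```python
import subprocess
from typing import NamedTuple

# Column order of BLAST+ tabular output when "-outfmt 6" is given without field names.
OUTFMT6_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float   # percent identity over the alignment
    length: int     # alignment length, gap columns included
    sstart: int     # first matching position in the subject (genome)
    send: int       # last matching position in the subject (genome)


def blastn_short_command(query_fasta: str, subject_fasta: str) -> list[str]:
    """Build a BLAST+ command comparing a short EIE with a genome, both in FASTA."""
    return [
        "blastn",
        "-task", "blastn-short",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-outfmt", "6",
    ]


def parse_outfmt6(line: str) -> Hit:
    """Parse one tab-separated line of "-outfmt 6" output."""
    cols = dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
    return Hit(
        cols["qseqid"], cols["sseqid"], float(cols["pident"]),
        int(cols["length"]), int(cols["sstart"]), int(cols["send"]),
    )


def run_blastn(query_fasta: str, subject_fasta: str) -> list[Hit]:
    """Run blastn (must be on PATH) and return the parsed hits."""
    result = subprocess.run(
        blastn_short_command(query_fasta, subject_fasta),
        capture_output=True, text=True, check=True,
    )
    return [parse_outfmt6(row) for row in result.stdout.splitlines() if row.strip()]
```

For example, `run_blastn("hiv1_kenya_2008.fasta", "LR757998.1.fasta")` (hypothetical file names) would return one `Hit` per local alignment. Results can differ from the web tool, whose defaults and databases are not identical.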

Regarding the « DNA Master Code », a biomathematical method to analyze the cohesion/heterogeneity of a DNA/RNA sequence:

We must introduce and summarize this theoretical method, because it provides a powerful way to illustrate crucial differences between the COVID_19 and bat RaTG13 genomes (Figs 4, 5, 12 and 13).

Full details of this numerical method are given in [6], [7], [8] and [31], and the methods are recalled in Supplementary Materials « 9 ».

Starting from the atomic masses of the C, O, N, H, S and P bioatoms constituting RNA and DNA nucleotides and amino acids, a simple law of projection of these atomic masses leads to a UNIFICATION of GENOMICS and PROTEOMICS patterned images that can be calculated for any DNA/RNA codon sequence. This numerical projection of atomic masses produces an integer numerical code common to DNA and RNA codon triplets and to amino acids. A process of DIGITAL INTEGRATION at short, medium and very long distances then allows a globalization of genetic information, by a principle analogous to the HOLOGRAM.

« Thus, any codon radiates at long distance and vice versa ». The Master Code of a sequence then produces two signatures, one GENOMIC and the other PROTEOMIC, materialized by 2 very strongly correlated curves. It is this level of coupling that provides key information on the COHESION or the HETEROGENEITY [11] of the nucleotide sequence. In particular, the extreme regions (minima/maxima) would be associated with biological functions such as active sites, chromosome breakpoints, etc.

Dynamics of the COVID_19 sequences available:

Because this study was carried out over several weeks, while the number of available COVID_19 genomes was constantly changing, we specify each time the date of the BLASTn search and the number of sequences available at that exact moment.

3. RESULTS AND DISCUSSION

This RESULTS AND DISCUSSION section has 4 main parts:

Part I

18 RNA fragments with 80% or more homology to human or simian retroviruses have been found in the COVID-19 genome. These fragments are 18 to 30 nucleotides long and therefore have the potential to modify the gene expression of COVID-19. We have named them Exogenous Informative Elements, or EIE. These EIE are not dispersed randomly but are concentrated in a small part of the genome (§1 and 2).

Part II

A 225-nucleotide region is unique to COVID_19 and Bat RaTG13 and can also discriminate between these 2 genomes (§3, 4, 5, 6 and 7).

Part III

During the declining phase of the epidemic, this 225-base area exhibits an abnormally high rate of mutations/deletions, particularly in Seattle, Washington State, USA (§8, 9 and 10).

Part IV

Comparative analysis of the SPIKE genes of COVID_19 and Bat RaTG13 (§11, 12, 13 and 14).

Part I

18 RNA fragments with 80% or more homology to human or simian retroviruses have been found in the COVID_19 genome. These fragments are 18 to 30 nucleotides long and therefore have the potential to modify the gene expression of COVID-19. We have named them Exogenous Informative Elements, or EIE. These EIE are not dispersed randomly but are concentrated in a small part of the genome (§1 and 2).

Warning on the limits of bioinformatics tools such as BLASTn: the main criticism that this article will face concerns the relevance of our BLASTn analyses highlighting many small traces of HIV in the COVID_19 genome. We answer with the following 2 facts:

1) We only consider HIV fragments of at least 18 bases as relevant.

2) Today, technologies such as CRISPR-Cas13 RNA [23] make it possible to modify RNA sequences with a clockmaker's precision capable of placing exogenous sequence fragments "side by side", as we will demonstrate here.

1. A high density of HIV/SIV regions that are diverse both in their nature and in their collection dates: a concentration of 12 significant HIV/SIV EIE in only 744 bases.

We look here for possible traces of HIV1, HIV2 or SIV EIE in our Wuhan market reference genome LR757998.1.

We will only use as significant EIE those which have at least 18 nucleotides of homology, i.e. 6 codons.

Note: below we present 12 + 4 HIV/SIV EIE in the sequential order of their locations within the COVID_19 genome. Initially, by focusing on the genome region mentioned in [4], we found and published [5] the first 6 EIE, located at the very beginning of this region.

Through a more in-depth exploration of this region (region "B", 330 bases), and then of region "A" (600 bases), located immediately upstream of region "B", we discovered 12 HIV/SIV EIE concentrated in less than 930 bases. We complete them with the last 4 EIE, located further upstream in the genome. It is this set of 16 EIE that is detailed below.

Evidence for 12 HIV/SIV EIE sequences in regions “A” and “B” of the COVID-19 genome (plus two in the interface space, one merged and one overlapped):

The 14 HIV/SIV “Exogenous Informative Elements” are the following:

==> ==> Detailed BLASTn scans are given in Supplementary Materials (Ref 1).

Region A: 600 bases (21072 to 21672)

Details:

HIV-2 France (2012) 66-81

HIV-1 Sweden (2017) 154-174

HIV-2 Guinea (2012) 236-253

SIV Africa (2016) 366-386

Interface:

HIV-1 Kenya (2008) 471-501

HIV-2 Cape Verde (2012) 512-529

Region B: 330 bases (21672 to 22002)

Details:

HIV-2 Côte d'Ivoire (2014) 23-42 *

SIV Tanzania (2016) 29-50 (partial overlap)

SIV P18 Africa (2016) 77-96 *

HIV-1 Netherlands (2016) 85-112 and HIV-1 USA (2011) 85-108 (merged) *

HIV-2 UC1 Côte d'Ivoire (1993) 132-157 *

HIV-2 Senegal (2011) 179-194 *

HIV-1 Malawi (2013) 212-243 *

HIV-1 Russia (2010) 242-280 *

SIVagmTan-Cameroon (2015) 279-298 *

We consider only the 8 starred (*) HIV/SIV motifs; the 9th, SIV Tanzania (29-50), partially overlaps the HIV-2 Côte d'Ivoire (2014) element (23-42).

These 14 HIV/SIV EIE are detailed in SUPPLEMENTARY MATERIALS (Ref 1) and summarized in Table 1.
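The relative locations in the lists above map onto the reference genome by a fixed offset per region. This is a minimal sketch, assuming that relative base 1 is the first base of the region (21072 for A, 21672 for B). That assumption reproduces the real locations given in Table 1, e.g. HIV-1 Sweden 154-174 corresponds to 21225-21245. The function name is hypothetical.

```python
# First genome position (LR757998.1) of each analyzed region, from the text.
REGION_START = {"A": 21072, "B": 21672}


def to_genome_coordinates(region: str, rel_start: int, rel_end: int) -> tuple[int, int]:
    """Convert 1-based relative locations inside region A or B to genome positions."""
    offset = REGION_START[region] - 1  # relative base 1 is the region's first base
    return rel_start + offset, rel_end + offset


print(to_genome_coordinates("A", 154, 174))  # HIV-1 Sweden 2017: (21225, 21245)
print(to_genome_coordinates("B", 23, 42))    # HIV-2 Côte d'Ivoire 2014: (21694, 21713)
```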

Table 1: Synoptic table of 12 significant EIE from HIV SIV strains in the "A" and "B" regions of the COVID-19 genome (plus two in the interface).

Origins	HIV/SIV type	Relative location	« Exogenous Informative Element » label	GenBank accession	Homology	Base identities	ORF1ab	S spike	Real location
Region A: 600 bases (21072 to 21672)
Orf1ab: 266..21555 (the Orf1ab gene ends at relative location 484/600)
2012 France	HIV2	66-81	HIV-2 isolate 56 from France envelope glycoprotein (env) gene, partial cds	JN230738.1	100.00% (non-significant)	16/16 (non-significant)	§		21137-21152
2017 Sweden	HIV1	154-174	HIV-1 isolate 060SE from Sweden, partial genome	MF373163.1	100.00%	21/21	§		21225-21245
2012 Guinea	HIV2	236-253	HIV-2 isolate CA65410.13 from Guinea-Bissau envelope gene, partial cds	JN863831.1	94.00%	17/18	§		21307-21324
2016 Africa	SIV	366-386	Simian immunodeficiency virus isolate VSAA2001, complete genome	KR862351.1	95.00%	20/21	§		21437-21457
S spike: 21563..25384
2008 Kenya [9]	HIV1	471-501	HIV-1 clone ML1592n from Kenya nonfunctional vpu protein (vpu) gene, complete sequence; and nonfunctional envelope glycoprotein (env) gene, partial sequence	EU875177.1	88.00%	28/32	§	§	21542-21572
2012 Cape Verde	HIV2	512-529	HIV-2 isolate 05HANCV37 from Cape Verde envelope glycoprotein (env) gene, partial cds	JF267434.1	100.00%	18/18		§	21583-21600
Region B: 330 bases (21672 to 22002)
2014 Côte d'Ivoire	HIV2	23-42	HIV-2 isolate 106CP_RT from Cote d'Ivoire reverse transcriptase gene, partial cds	KJ131112.1	95.00%	19/20		§	21694-21713
2016 Tanzania (partial overlap)	SIV	29-50	Simian immunodeficiency virus isolate TAN5 from Tanzania, complete genome	AF003044.1	91.00%	20/22		§	21700-21721

Note: « § » indicates the location of each HIV/SIV EIE within the COVID_19 genome (gene identification).

Evidence for 4 other HIV/SIV EIE sequences in others areas of COVID-19 genome:

We also found 4 other, non-contiguous HIV/SIV regions, summarized in Table 2 below. Details of these searches are given in Supplementary Materials "d".

==> ==> These 4 HIV/SIV EIE are detailed in SUPPLEMENTARY MATERIALS (Ref 2). They are summarized in Table 2.

Table 2: Synoptic table of 4 EIE motifs from HIV/SIV strains in areas other than the "A" and "B" regions of the COVID-19 genome.

Origins	HIV/SIV type	Gene	« Exogenous Informative Element » label	GenBank accession	Homology	Base identities	ORF1ab	S spike	Real location
Orf1ab: 266..21555
2015 Germany	SIV	POL	Simian immunodeficiency virus isolate D4 from Germany gag protein (gag) gene, complete cds; pol protein (pol) gene, partial cds; vif protein (vif), vpx protein (vpx), vpr protein (vpr), tat protein (tat), rev protein (rev), and envelope glycoprotein (env) gene...	KM378564.1	100.00%	20/20	§		8751-8770
2016 China	HIV1	ENV	HIV-1 clone XJ47 from China envelope glycoprotein (env) gene, partial cds	EU184986.1	87.00%	33/38	§		14340-14378
2004 USA	HIV1	Integrase	Homo sapiens clone HIV1-H9-106 HIV-1 integration site	AY516986.1	93.00%	26/28	§		20373-20401
2011 USA	HIV1	ENV	HIV-1 isolate JACH1853_A5 from USA envelope glycoprotein (env) gene, complete cds; and vpu protein (vpu), rev protein (rev), and tat protein (tat) genes, partial cds	HQ217329.1	93.00%	28/30	§		20400-20430

Note: « § » indicates the location of each HIV/SIV EIE within the COVID_19 genome (gene identification).

Table 3: The 17 HIV/SIV EIE according to their homologies with COVID-19 sorted by decreasing % (the merged one from USA is excluded).

HIV/SIV strain	COVID-19 gene	Homology
HIV2 Env France 2012 (non-significant)	Orf1ab	100.00%
HIV1 Sweden 2017 (recombinant form in Sweden)	Orf1ab	100.00%
HIV2 Env Cape Verde 2012	S spike	100.00%
HIV2 Pol 2011 Senegal (non-significant)	S spike	100.00%
SIV Pol 2015 Germany	Orf1ab	100.00%
SIV 2016 African Monkey	Orf1ab	95.00%
HIV2 RT Pol 2014 Côte d'Ivoire	S spike	95.00%
SIV Env 2016 Africa	S spike	95.00%
HIV2 Env 2012 Guinea	Orf1ab	94.00%
HIV1 Integrase 2004 USA	Orf1ab	93.00%
HIV1 Env 2011 USA	Orf1ab	93.00%
SIV 2016 Tanzania	S spike	91.00%
HIV1 Env 2016 Netherlands	S spike	89.00%
HIV1 Env 2008 Kenya	Orf1ab and S spike	88.00%
HIV1 Env 2013 Malawi	S spike	88.00%
HIV1 Env 2016 China	Orf1ab	87.00%

Figure 1: The 18 HIV SIV EIE according to their homologies with COVID-19 sorted by decreasing %.

First, it is important to note that all the regions found here are included in one of the two main genes of COVID-19, so we consider them «Exogenous Informative Elements». A summary chart is given in Fig 1.

Some significant results relating to this analyzed region of 930 bases (600 + 330) are:

The entire genome has 29903 bases. At least 12 regions are located between bases 21225 and 21969, a span of exactly 744 bases.

This represents an average space of 744/12 = 62 bases per EIE, and 744/29903 = 2.49% of the whole genome.

As the cumulative length of the 12 EIE is 305 bases, we deduce that the average size of an insert is 305/12 = 25.4 bases.

Finally, we deduce that EIE from HIV/SIV occupy 305/744 = 40.99% of this 744-base space (about 25.4 of every 62 bases). This percentage is considerable.

To summarize: a contiguous region representing 2.49% of the whole COVID-19 genome is 40.99% made up of 12 diverse EIE originating from various strains of HIV/SIV retroviruses.
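The density figures above follow from four counts given in the text. This sketch simply recomputes them. Note that the 40.99% occupancy corresponds to the unrounded ratio 305/744; dividing the rounded values 25.4/62 gives 40.97%.

```python
GENOME_LENGTH = 29903   # bases in the reference genome
SPAN = 744              # bases 21225 to 21969, as counted in the text
N_EIE = 12              # significant EIE in that span
EIE_BASES = 305         # cumulative length of the 12 EIE

spacing = SPAN / N_EIE                          # average space per EIE
share_of_genome = 100 * SPAN / GENOME_LENGTH    # % of the whole genome
mean_eie_length = EIE_BASES / N_EIE             # average EIE length
occupancy = 100 * EIE_BASES / SPAN              # % of the span occupied by EIE
region_b_density = 100 * 200 / 275              # 8 EIE of region B: 200 of 275 bases

print(f"{spacing:.0f} bases per EIE")           # 62 bases per EIE
print(f"{share_of_genome:.2f}% of the genome")  # 2.49% of the genome
print(f"{mean_eie_length:.1f} bases per EIE")   # 25.4 bases per EIE
print(f"{occupancy:.2f}% occupancy")            # 40.99% occupancy
print(f"{region_b_density:.2f}% density")       # 72.73% density
```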

Figure 2: Summary chart of the 8 HIV/SIV EIE from region “B”, showing how 200 bases from various HIV/SIV retroviral strains, within a concentrated 275-base COVID-19 contig, give a density rate of 72.73%.

Figure 3: Comparative trends in HIV/SIV EIE densities and average cumulative homologies for 3 clusters.

Fig 3 compares HIV/SIV EIE densities (blue) and average cumulative homologies (red) for 3 clusters: the 3 side-by-side EIE of region B; these 3 joined by 5 more to complete the 8 EIE of region B; and finally all 14 EIE, after adding the remaining six (regions A and B combined).

2. Concatenations of HIV/SIV regions "placed" in sequence and side by side.

Table 2 shows that two very different EIE follow each other side by side in the RNA sequence of COVID-19:

The first, at location 20373 to 20401 comes from an HIV1 Integrase from a USA virus from 2004 ( Homo sapiens clone HIV1-H9-106 HIV-1 integration site, AY516986.1 ), while the second, at location 20400 to 20430 comes from an Envelope from another HIV1 virus from the USA from 2011 ( HIV-1 isolate JACH1853_A5 from USA envelope glycoprotein (env) gene, complete cds, HQ217329.1 ).

Even more surprisingly, in Table 1 we note the same phenomenon, this time involving not 2 but 3 EIE from radically different HIV/SIV viruses:

Here are these 3 EIE, concatenated with seemingly perfect "watchmaker's precision":

Malawi, year 2013.

HIV1 212-243: HIV-1 isolate 4045_Plasma_Visit1_amplicon9 Malawi envelope glycoprotein (approx), 88.00%, 28/32, location: 21883-21914

Russia, year 2010.

HIV1 242-280: HIV-1 isolate 07.RU.SP-R497.VI.F5 envelope glycoprotein Russia (env) gene, 82.00%, 32/39, location: 21913-21951

Cameroon year 2015.

SIV 279-298: partial simian immunodeficiency virus pol gene for Pol, 83.00%, 25/30, location: 21950-21969

Note that the cumulative length in COVID_19 of these 3 EIE is 126 bases, of which 120 are occupied by HIV. The HIV/COVID_19 ratio is thus 120/126 > 95%, which is remarkable and suggests an artificial arrangement.
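Whether two EIE sit "side by side" can be read directly from their reported locations. This is a minimal sketch (helper name hypothetical) that counts the bases shared by consecutive elements, using the locations given above for the Malawi, Russia and Cameroon elements and those in Table 2 for the two USA elements. Each junction overlaps by 2 bases and leaves no gap.

```python
def shared_bases(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of bases shared by two 1-based inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Locations in the COVID_19 reference genome.
malawi, russia, cameroon = (21883, 21914), (21913, 21951), (21950, 21969)
integrase_usa_2004, env_usa_2011 = (20373, 20401), (20400, 20430)

print(shared_bases(malawi, russia))                    # 2
print(shared_bases(russia, cameroon))                  # 2
print(shared_bases(integrase_usa_2004, env_usa_2011))  # 2
```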

Part II

In this part, we show that a 225-nucleotide region is unique to COVID_19 and Bat RaTG13, and can also discriminate between these 2 genomes (§3, 4, 5, 6 and 7).

The origin of COVID-19 remains an open question: see in particular [14-20] and [5], [27], [30], [33], [34].

In this second part of the RESULTS AND DISCUSSION, we will present two types of facts. On the one hand, we will show that the COVID_19 and Bat RaTG13 genomes together stand apart from all the other SARS, MERS and Bat genomes.

On the other hand, we will analyze several specific facts suggesting that COVID_19 does not originate from Bat RaTG13.

3. Evidence of the absence of 4 HIV/SIV « Exogenous Informative Elements » from COVID_19 within the SARS-2005 and MERS genomes.

Table 4 below shows that 14 of the 18 HIV/SIV EIE already existed in the first human SARS genomes, which appeared in China around 2003.

However, a novel long region of around 225 nucleotides, less than 1% of the genome, appears to us to have been inserted: this region is completely absent in all SARS genomes, whereas it is present and 100% homologous in all COVID-19 genomes listed in NCBI.

Table 4: Comparing 16 EIE from « A », « B » and remaining regions in COVID-19, HIV/SIV and SARS.

HIV/SIV « Exogenous Informative Element » (EIE)	Location within region «A» (600 bases) or «B» (330 bases)	Length in COVID_19 (nt)	Matching nucleotides in HIV/SIV EIE	% HIV/SIV / COVID_19	Matching nucleotides in SARS genomes	% SARS / COVID_19
Region « A »
HIV2 2012 France	66-81	16 (non-significant)	16	100%	13	81%
HIV1 2017 Sweden	154-174	21	21	100%	19	90%
HIV2 2012 Guinea	236-253	18	17	94%	11	61%
SIV 2016 Africa	366-386	21	20	95%	18	86%
Start of the 225-base zone including 4 « Exogenous Informative Elements »
HIV1 2008 Kenya	471-501	32	28	88%	0	0%
HIV2 2012 Cape Verde	512-529	18	18	100%	0	0%
Region « B »
HIV2 2014 Côte d'Ivoire	23-42	20	19	95%	0	0%
SIV 2016 Africa	77-96	20	19	95%	0	0%
End of the 225-base zone including 4 « Exogenous Informative Elements » (Note1)
HIV1 2016 Netherlands / variant HIV1 USA 2011	85-112 / 85-108	28	25	89%	13 / 9	46% / 32%
HIV2 1993 Côte d'Ivoire	132-157	26	22	85%	20	77%
HIV2 2011 Senegal	179-194	16 (non-significant)	16	100%	12	75%
HIV1 2013 Malawi	212-243	32	28	88%	22	69%
HIV1 2010 Russia	242-280	39	32	82%	15	38%
SIV 2015 Cameroon	279-298	30	25	83%	10	33%
Areas other than the "A" and "B" regions
SIV 2015 Germany	8751-8770	20	20	100%	9	45%
HIV1 2016 China	14340-14378	38	33	87%	34	89%
HIV1 2004 USA	20373-20401	28	26	93%	28	100%
HIV1 2011 USA	20400-20430	30	28	93%	21	70%

Note1: the HIV-1 USA 2011 element (85-108) is entirely contained within the HIV-1 2016 Netherlands variant (85-112); the frontier of the 225-base area lies in relative region “B”.
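The percentage columns of Table 4 are the matching-nucleotide counts divided by each element's length in COVID_19. This sketch covers a subset of the table's rows. It assumes round-half-up to whole percents, which reproduces, e.g., 28/32 as 88%. It also lists the elements that have no counterpart in SARS. The row labels and the function name are illustrative.

```python
import math

# (EIE, length in COVID_19, matches with HIV/SIV, matches in SARS): subset of Table 4.
ROWS = [
    ("HIV1 2017 Sweden", 21, 21, 19),
    ("HIV2 2012 Guinea", 18, 17, 11),
    ("HIV1 2008 Kenya", 32, 28, 0),
    ("HIV2 2012 Cape Verde", 18, 18, 0),
    ("HIV2 2014 Cote d'Ivoire", 20, 19, 0),
    ("SIV 2016 Africa (77-96)", 20, 19, 0),
    ("HIV1 2010 Russia", 39, 32, 15),
    ("HIV1 2016 China", 38, 33, 34),
]


def percent(matches: int, length: int) -> int:
    """Matches as a whole-number percentage of the COVID_19 length (halves round up)."""
    return math.floor(100 * matches / length + 0.5)


for name, length, hiv, sars in ROWS:
    print(f"{name}: HIV/SIV {percent(hiv, length)}%, SARS {percent(sars, length)}%")

absent_in_sars = [name for name, _, _, sars in ROWS if sars == 0]
print(absent_in_sars)  # the 4 elements of the 225-base zone
```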

Here we wanted to find out if the 16 EIE discovered in the COVID-19 genome already existed in the human SARS genomes that appeared in 2003.

Table 4 summarizes this research. In particular, 14 of the 18 HIV/SIV EIE already existed in the first human SARS genomes, which appeared in China around 2003.

However, a novel long region of around 225 nucleotides appears to us to be totally new: this region is completely absent in ALL SARS genomes, whereas it is present and 100% homologous in all COVID-19 genomes listed in the NCBI and GISAID COVID_19 genomic databases.

This region is located (in the COVID-19 reference genome) between addresses 21550 and 21772. It therefore spans the end of region "A" (relative bases 475 to 600) and the start of region "B" (relative bases 1 to 99).

A remarkable fact is also observed: the HIV/SIV EIEs which already existed in SARS have evolved a lot through numerous mutations. Thus, four EIEs have very weak homologies (near 30%) between their SARS version and their COVID-19 version. These homologies gradually improve in more recent SARS (2015 or 2017 for example, right column in Table 4).

The 4 « Exogenous Informative Elements » added in COVID_19 are respectively:

HIV1 Kenya 2008

HIV2 Cape Verde 2012

HIV2 Ivory Coast 2014

SIV Africa 2016.

The reader will note that these HIV1/HIV2/SIV strains are very recent and postdate the emergence of SARS. Yet most of the other HIV/SIV strains (HIV1 2017 Sweden, HIV2 2012 Guinea, etc.), whose EIE were already present in the first SARS genomes, also have dates later than the emergence of the first SARS. This fact remains to be explained.

The case of the MERS genome:

An analysis of the reference genome of the pathogenic RNA virus MERS (Middle East respiratory syndrome coronavirus, complete genome, NCBI Reference Sequence: NC_019843.3, https://www.ncbi.nlm.nih.gov/nuccore/NC_019843.3?report=genbank) shows that FOUR crucial regions of our article are totally ABSENT in MERS: the end of our "A" region, the whole key 225-base region, the "B" region and the "Lyons-Weiler" region.

4. Evidence for HIV/SIV sequences in this region, and their compaction in the 225 bases portion of both COVID_19 and Bat coronavirus RaTG13 genomes.

We now analyze the level of homology for the 4 HIV/SIV EIE that are always present in COVID-19 but always absent in SARS. The remarkable point is as follows: it is strange that the most significant "Bat" genome, the Bat coronavirus RaTG13 genome [12], dates from 2020, just like COVID-19. In particular, for the HIV1 Kenya 2008 sequence [9], [10], bat RaTG13 is the only strain in the "Bat" population to carry it, while for the three other EIE the "Bat" strains are very numerous but have non-significant HIV/SIV homologies.

Table 5: Comparing the 4 EIE from COVID-19, HIV/SIV and Bat coronavirus RaTG13 [12].

HIV/SIV « Exogenous Informative Element »	Location within region «A» (600 bases) or «B» (330 bases)	Length in COVID_19 (nt)	Matching nucleotides in HIV/SIV EIE (% HIV/SIV / COVID_19)	Matching nucleotides in Bat coronavirus RaTG13 genome (%)
Region « A »
2008 Kenya HIV1	471-501	32	28 (88%)	27 (84%) (Note1)
2012 Cape Verde HIV2	512-529	18	18 (100.00%)	16 (89%) (Note2)
Region « B »
2014 Côte d'Ivoire HIV2	23-42	20	19 (95%)	15 (79%) (Note3)

Note1

COVID-19 / HIV-1: 28/32 (88%). Present only in COVID_19 strains, Bat coronavirus RaTG13 and the Rhinolophus affinis coronavirus isolate LYRa3 spike protein gene; absent from all other Bat strains.

Note2

COVID-19 / HIV-2: 18/18 (100%); Bat RaTG13: 16/18 (89%); SARS Urbani: 10/10.

Various other Bat and SARS genomes show VERY low homologies, all < 10.

Note3

COVID-19 / HIV-2: 19/20 (95%); Bat RaTG13: 15/17 (88%); SARS Urbani: 9/9. Various other Bat and SARS genomes, all < 12.

Note4

COVID-19 / SIV: 19/20 (95%); Bat coronavirus: 10/10; Bat RaTG13: poor homology. Various other Bat and SARS genomes, all < 12.

We must explain why, for HIV1 Kenya, the homologies are nearly the same for COVID_19 and Bat RaTG13 (28/32 versus 27/32), in contrast to the 3 others (Cape Verde, Côte d'Ivoire, Africa), where the Bat RaTG13 homologies are lower than those of COVID_19.

Zooming in on the first HIV1 Kenya homologies:


Table 6: Comparing the 3 key regions « A », « B », and « Lyons-Weiler » region [4] in the cases of COVID-19, Bat RaTG13 coronavirus [12] and the best homologies for other Bat and SARS coronaviruses.

Coronavirus genome	Region « A »	Region « B »	Region « Lyons-weiler »
COVID_19	600/600 100%	330/330 100%	1378/1378 100%
Bat RaTG13	563/599 98%	309/330 94%	1209/1311 92%
Other Bat	518/605 86% (note1a)	158/212 75% (Note1b)	402/521 77% (Note1c)
Other SARS	400/474 84% (note2a)	144/177 73% (Note 2b)	297/376 79% (Note2c)

Note1a - Bat SARS-like coronavirus isolate bat-SL-CoVZC45

Note1b - BtRs-BetaCoV/YN2013, complete genome

Note 1c - Bat SARS-like coronavirus isolate bat-SL-CoVZC45, complete genome

Note2a - SARS coronavirus GZ0402, complete genome

Note 2b - SARS coronavirus isolate CFB/SZ/94/03, complete genome

Note2c - SARS coronavirus SZ3, complete genome
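Each percentage in Table 6 is the ratio of identical bases to aligned bases, as reported by BLASTn. The minimal Python sketch below recomputes every cell from its fraction; `percent_identity` and the `TABLE6` dictionary are our own illustrative names, with the fractions copied from the table.

```python
def percent_identity(matches: int, aligned: int) -> float:
    """Percentage of identical bases over the aligned length."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100.0 * matches / aligned


# (genome, region) -> (identical bases, aligned bases), copied from Table 6
TABLE6 = {
    ("Bat RaTG13", "A"): (563, 599),
    ("Bat RaTG13", "B"): (309, 330),
    ("Bat RaTG13", "Lyons-Weiler"): (1209, 1311),
    ("Other Bat", "A"): (518, 605),
    ("Other Bat", "B"): (158, 212),
    ("Other Bat", "Lyons-Weiler"): (402, 521),
    ("Other SARS", "A"): (400, 474),
    ("Other SARS", "B"): (144, 177),
    ("Other SARS", "Lyons-Weiler"): (297, 376),
}

if __name__ == "__main__":
    for (genome, region), (m, n) in TABLE6.items():
        # Rounded to whole percent, as in the table
        print(f"{genome:<11} {region:<13} {m}/{n} = {percent_identity(m, n):.0f}%")
```

For example, 563/599 gives 94% and 144/177 gives 81%.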

5. The decisive case of HIV1 Kenya 2008, absent from all coronaviruses other than COVID-19 and bat RaTG13.

==> ==> Complete data on this particular EIE, Kenya 2008, are given in Supplementary Materials (Ref 3). To summarize:

The case of HIV1 Kenya 2008

This important HIV1 genome was studied in particular, in the context of an HIV vaccine strategy, by the laboratory team of Canadian Professor Frank Plummer [9], [10].

Besides its strong homologies with all of the hundreds of COVID_19 strains of 2020, this region shows only two other homologies: with Bat coronavirus RaTG13 and, at a lower level, with the Rhinolophus affinis coronavirus isolate LYRa3 spike protein gene.

Recall the HIV1 Kenya 2008 fingerprint: TGTTTTTATTACTTTTATTGCCACTATTCTCT

Here is the detail of these two main homologies:

Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome Sequence ID: NC_045512.2 Length: 29903 Number of Matches: 1

Score: 37.4 bits (40), Expect: 8e-04, Identities: 28/32 (88%), Gaps: 1/32 (3%), Strand: Plus/Plus

Query 1      TGTTTTTATTACTTTTATTGCCACTATTCTCT 32
             ||||||| ||  |||||||||||||| |||||
Sbjct 21568  TGTTTTTCTTG-TTTTATTGCCACTAGTCTCT 21598

Bat coronavirus RaTG13, complete genome

Sequence ID: MN996532.1 Length: 29855 Number of Matches: 1

Score: 32.8 bits (35), Expect: 0.032, Identities: 27/32 (84%), Gaps: 1/32 (3%), Strand: Plus/Plus

Query 1      TGTTTTTATTACTTTTATTGCCACTATTCTCT 32
             ||||||| ||  |||||||||||||| | |||
Sbjct 21550  TGTTTTTCTTG-TTTTATTGCCACTAGTTTCT 21580
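The identity and gap counts in these two BLAST reports can be checked directly from the aligned strings. The Python sketch below (function names are ours, not part of BLAST) counts identical columns and gap columns and rebuilds the match line; it assumes both strings are already aligned and of equal length, and uses the three sequences exactly as printed above.

```python
# Aligned strings copied from the BLASTn reports above ('-' marks a gap).
KENYA_2008 = "TGTTTTTATTACTTTTATTGCCACTATTCTCT"
WUHAN_HU_1 = "TGTTTTTCTTG-TTTTATTGCCACTAGTCTCT"
BAT_RATG13 = "TGTTTTTCTTG-TTTTATTGCCACTAGTTTCT"


def alignment_stats(query: str, sbjct: str) -> dict:
    """Identities and gaps over two pre-aligned, equal-length strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have the same length")
    pairs = list(zip(query, sbjct))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if "-" in (q, s))
    return {"identities": identities, "gaps": gaps, "length": len(pairs)}


def match_line(query: str, sbjct: str) -> str:
    """BLAST-style match line: '|' where the bases are identical, blank otherwise."""
    return "".join("|" if q == s and q != "-" else " " for q, s in zip(query, sbjct))


if __name__ == "__main__":
    for name, sbjct in (("Wuhan-Hu-1", WUHAN_HU_1), ("RaTG13", BAT_RATG13)):
        stats = alignment_stats(KENYA_2008, sbjct)
        pct = 100 * stats["identities"] / stats["length"]
        print(f"{name}: {stats['identities']}/{stats['length']} ({pct:.1f}%), "
              f"gaps {stats['gaps']}/{stats['length']}")
        print(match_line(KENYA_2008, sbjct))
```

It returns 28/32 identities with 1 gap for Wuhan-Hu-1 and 27/32 with 1 gap for RaTG13, matching the two reports.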

==> ==> See the detailed Table 2.1 in Supplementary Materials Ref 4 (collection and deposit dates of the various Bat genomes involved in the 225 bases area).

This Table results from a BLASTn analysis run on April 10, 2020 with the option "SARS coronaviruses taxid 694009", which reports 386 occurrences: 16 bats, 2 Rhinolophus and 368 COVID_19.

This Table shows that none of the Bat genomes other than Bat RaTG13 contains the EIE Kenya 2008.

In ALL of these cases, the 225 bases region is reduced to small contiguous fragments between 17 and 96 bases long, and the Kenya 2008 EIE is totally absent.

We also note in this Table that the Bats closest to COVID_19 were collected between 2013 and 2017 but only sequenced in 2020: Bat RaTG13 (2013), Bat SARS-like coronavirus isolate bat-SL-CoVZXC21 (2015) and Bat SARS-like coronavirus isolate bat-SL-CoVZC45 (2017). Alina Chan found that RaTG13 is the same as the “4991” strain with which Shi Zhengli was working in 2017-18 (https://archive.vn/4Ot2j).

Location of the EIE HIV1 Kenya 2008 at the junction between the Orf1ab and Spike genes:

First, the EIE regions of the nonfunctional HIV1 Kenya 2008 strain (Sequence ID: EU875177.1) and of the real HIV1 Kenya strain (Sequence ID: FJ623481.1) are identical, whereas their respective Gp120 genes are only 82% homologous: 494/603 (82%).

HIV-1 isolate 06KECst_005 from Kenya, complete genome

Sequence ID: FJ623481.1 Length: 8766 Number of Matches: 1

Range 1: 5192 to 5794

Score: 595 bits (659), Expect: 6e-168, Identities: 494/603 (82%), Gaps: 3/603 (0%), Strand: Plus/Plus

In the COVID-19 genome, the region homologous to the nonfunctional HIV1 Kenya EIE overlaps the end of the "Orf1ab" gene and the start of the "S spike" gene:

COVID-19 genes: Orf1ab 266-21555; Spike 21563-25384.

HIV-1 Kenya 2008 EIE: 21542-21572.

COVID_19 Wuhan market ID: LR757998.1 reference genome: location of the EIE Kenya 2008 HIV1, bases 21542-21572.

Spike gene location: 21563-25384 bases.

So, in terms of nucleotides:

START location of HIV1 KENYA: 21 nucleotides (7 codons) before SPIKE begins.

END location of HIV1 KENYA: 9 nucleotides (3 codons) after the beginning of SPIKE.

What about the same question in the case of the bat RaTG13 genome?

The location of HIV-1 Kenya within Bat RaTG13 (Sequence ID: MN996532.1) is:

21550 TGTTTTTCTTG-TTTTATTGCCACTAGTTTCT 21580

(see RESULTS, ref 3).

Location of the Spike gene within Bat RaTG13: 21545-25354

/gene="S"
/codon_start=1
/product="spike glycoprotein"
/protein_id="QHR63300.2"

So, in terms of nucleotides:

START address of HIV1 KENYA: 6 nucleotides (2 codons) after SPIKE begins.

END address of HIV1 KENYA: 36 nucleotides (12 codons) after the beginning of SPIKE.

Notably, unlike COVID-19 where HIV-1 Kenya starts before the start of the SPIKE gene, here, in the case of bat RaTG13, HIV1 Kenya is entirely contained within the SPIKE gene.
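The distances above are differences between 1-based genome coordinates, so they count nucleotides; dividing by 3 gives codons. A minimal Python sketch of this conversion follows (the helper names are ours). It is only meaningful when the EIE position and the gene start are taken from the same genome record.

```python
def nt_offset(position: int, gene_start: int) -> int:
    """Signed distance in nucleotides from a gene's first base (1-based coordinates).

    Negative values lie upstream of the gene, positive values inside it.
    """
    return position - gene_start


def codon_number(position: int, gene_start: int) -> int:
    """1-based codon of the gene (frame set by gene_start) containing position."""
    if position < gene_start:
        raise ValueError("position lies upstream of the gene start")
    return (position - gene_start) // 3 + 1


# (label, EIE start, EIE end, Spike start), coordinates as quoted in this section
CASES = [
    ("LR757998.1 EIE with Spike start 21563", 21542, 21572, 21563),
    ("Bat RaTG13 MN996532.1", 21550, 21580, 21545),
    ("NC_045512.2 (Wuhan-Hu-1)", 21568, 21598, 21563),
]

if __name__ == "__main__":
    for label, start, end, spike in CASES:
        print(label, nt_offset(start, spike), nt_offset(end, spike))
```

The Orf1ab (266-21555) and Spike (21563-25384) coordinates quoted above are those of the NC_045512.2 (Wuhan-Hu-1) annotation, on which the Kenya 2008 EIE aligns at 21568-21598 (first alignment of this section). Pairing these two gives offsets of +5 and +35 nucleotides (codons 2 to 12), while pairing the LR757998.1 EIE coordinates with the same Spike start gives -21 and +9.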

6. The discovery of a new EIE from the HIV1 group «O» differentiating COVID-19 from the Bat RaTG13 genome.

HIV-1 group « O » is a subgroup of HIV retroviruses that is very different from the other HIV/SIV subgroups; it occurs mainly in Cameroon. However, little is known about group O, or about why this highly divergent retrovirus has not become pandemic [21].

We looked for possible traces of EIE coming from HIV group "O", more particularly in COVID_19 and in bat RaTG13.

We then found a POL (Integrase) homology with this HIV1 group "O" strain, referenced as AF422215.1, located around base 23800 of COVID_19.

==> On April 21, 2020, BLASTn reported 489 COVID_19 sequences (all the sequences available on that date), ALL of which share the following homology, 20/22 (90.91%), except two highly deleted strains reported below.

==> As of May 4, 2020, BLASTn returns 1578 COVID_19 sequences. All of them, except 3 that are very highly deleted at the whole genome scale (Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/CA-CZB-IX00017/2020, ID: MT385497.1; Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/UT-00087/2020, ID: MT334549.1; Wuhan seafood market pneumonia virus genome, ID: LR757997.1), contain this sequence completely preserved, with its homology of 20/22 bases, i.e. 90.91%.

Recall this homology:

Between HIV-1 strain group O isolate 98CMA010 from Cameroon integrase (pol) gene, partial cds

GenBank: AF422215.1 https://www.ncbi.nlm.nih.gov/nuccore/AF422215.1

and

Wuhan seafood market pneumonia virus genome assembly, chromosome: whole_genome

Sequence ID: LR757998.1 Length: 29866 Number of Matches: 1

Range 1: 23804 to 23825

Score: 31.9 bits (34), Expect: 3.0, Identities: 20/22 (91%), Gaps: 0/22 (0%), Strand: Plus/Plus

Query 532    ATGGCAGTATTTGTTCACAATT 553
             |||||||| ||||| |||||||
Sbjct 23804  ATGGCAGTTTTTGTACACAATT 23825

The same search applied to Bat RaTG13 (ID: MN996532.1) gives the results summarized in the Synthesis below:

Synthesis:
Strain	Start	Sequence	End	Collection / deposit	Homology with HIV1 gr O	Notes (see below)
HIV1 Group O	532	ATGGCAGTATTTGTTCACAATT	553
COVID_19	23804	ATGGCAGTTTTTGTACACAATT	23825		20/22
bat RaTG13	23799	ATGGTAGTTTTTGCACACAATT	23820		18/22
bat-SL-CoVZXC21	23665	ATGGCAGTTTTTGCACACAA	23684	jui 2015 / 5 Feb 2020	17/22	1 2 32 55
bat-SL-CoVZC45	23734	ATGGCAGTTTTTGCACACAA	23753	Feb 2017 / 5 Feb 2020	18/22	1 2 32 55
SARS strain BtKY72	23639	ATGGTAGTTTCTGTACACAA	23658	Aug 2007 / 8 Feb 2020	17/22	3 4 12 55
Differences (positions 1-22): COVID_19 vs HIV1 gr O at positions 9 and 15; COVID_19 vs bat RaTG13 at positions 5 and 14; bat RaTG13 vs HIV1 gr O at positions 5, 9, 14 and 15 (18/22).
Notes related to numbers under sequences i.e 1,2,3,4,5:

1) similar to HIV1 group O: the base T is identical in HIV1 group « O » and SARS strain BtKY72 (note 1)

2) similar to COVID_19 and bat RaTG13

3) similar to bat RaTG13

4) different from all (COVID_19 and bat RaTG13)

5) absent, unlike in HIV1 group O, COVID_19 and bat RaTG13

The following points are worth noting:

· It is well known that bats have been studied in particular in China in recent years (https://en.wikipedia.org/wiki/Shi_Zhengli).

· The respective collection dates of these Bat genomes are 2007, 2013, 2015, 2017 while all of them were only sequenced in 2020 (with the exception of BtRf-BetaCoV / HeB2013, sequenced in 2017).

· We observe that all these Bat SARS strains have COVID_19 homologies in this region quite close to that of Bat RaTG13.

· It is remarkable (note 1) that this base T is the only one present simultaneously in HIV1 group "O" and in SARS strain BtKY72.

· Finally, while COVID_19 has a homology of 20/22 bases with HIV1 group "O", Bat RaTG13 (2013) and bat-SL-CoVZC45 (2017) have a homology of 18/22 bases with HIV1 group "O".
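The difference positions and the 20/22 and 18/22 identities of the Synthesis can be recomputed base by base. A minimal Python sketch, with illustrative names and the three 22-base sequences copied from the Synthesis:

```python
# The three 22-base sequences of the Synthesis (HIV1 group O positions 532-553)
HIV1_GROUP_O = "ATGGCAGTATTTGTTCACAATT"
COVID_19 = "ATGGCAGTTTTTGTACACAATT"
BAT_RATG13 = "ATGGTAGTTTTTGCACACAATT"


def mismatch_positions(a: str, b: str) -> list[int]:
    """1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def identity(a: str, b: str) -> str:
    """Identity written as 'matches/length', as in BLAST output."""
    return f"{len(a) - len(mismatch_positions(a, b))}/{len(a)}"


if __name__ == "__main__":
    pairs = {
        "COVID_19 vs HIV1 gr O": (COVID_19, HIV1_GROUP_O),
        "COVID_19 vs bat RaTG13": (COVID_19, BAT_RATG13),
        "bat RaTG13 vs HIV1 gr O": (BAT_RATG13, HIV1_GROUP_O),
    }
    for label, (a, b) in pairs.items():
        print(label, mismatch_positions(a, b), identity(a, b))
```

It reports positions 9 and 15 (20/22) for COVID_19 vs HIV1 group O, and positions 5, 9, 14 and 15 (18/22) for bat RaTG13 vs HIV1 group O.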

7. Analysis of local and global cohesion and heterogeneity of the 225 bases area in the COVID_19, bat RaTG13 and SARS Urbani genomes.

Now, we demonstrate how a new region including 4 HIV/SIV EIE radically distinguishes all COVID-19 strains from all SARS and Bat strains.

We will then focus on the Bat RaTG13 strain, whose genomic proximity to COVID-19 is analyzed with the greatest attention and precision.

The theoretical method used here evaluates the overall level of cohesion, and hence also of heterogeneity, of a nucleotide sequence, independently of scale owing to the fractal nature of this numerical method.

Full details of this numerical method are given in [6-8], and the Methods are recalled in Supplementary Materials ref 9.

Here we analyze the Master Code of 3 characteristic genomes: COVID_19, bat RaTG13 and SARS Urbani.

For each of these 3 genomes, we study 5 successive scales, in the 3 codon reading frames and on both the main and the complementary strands:

· whole genomes.

· bases 15,000 to 25,000.

· region including "A", "B", "Lyons Weiler".

· regions of 425 bases including 100, 225, 100 bases.

· 225 bases area.
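The Master Code computation itself is described in [6-8] and is not reproduced here. The sketch below only illustrates the bookkeeping behind "3 reading frames on the 2 strands": it splits a sequence into complete codons for frames +1, +2, +3 on the given strand and -1, -2, -3 on the reverse complement. Function names are illustrative, and the sketch assumes an uppercase A/C/G/T sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq: str, frame: int) -> list[str]:
    """Complete codons starting at offset frame (0, 1 or 2); a trailing partial codon is dropped."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Codons of the 3 frames on the main strand (+) and on the complementary strand (-)."""
    rc = reverse_complement(seq)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = codons(seq, f)
        frames[f"-{f + 1}"] = codons(rc, f)
    return frames


if __name__ == "__main__":
    for name, cods in six_frames("ATGAAACCC").items():
        print(name, cods)
```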

Table 7: Global Genomics/Proteomics Master Code coupling (%). Note: in each case we select the codon reading frame giving the best coupling (%).

Genome	Selective Region 225 bases
Wuhan market ID: LR757998.1	69.47
Bat RaTG13 ID: MN996532.1	92.13
SARS Urbani ID: MK062180.1	Absent

The main result to discuss now is the comparison between the 225 bases area analyses of COVID_19 and Bat RaTG13.

Recall the 225 bases area within the Wuhan market (ID: LR757998.1) reference genome and within the bat RaTG13 genome:

Wuhan seafood market pneumonia virus genome assembly, chromosome: whole_genome

Sequence ID: LR757998.1 Length: 29866 Number of Matches: 1

Score: 407 bits (450), Expect: 7e-114, Identities: 225/225 (100%), Gaps: 0/225 (0%), Strand: Plus/Plus

Bat coronavirus RaTG13, complete genome

Sequence ID: MN996532.1 Length: 29855 Number of Matches: 1

Score: 312 bits (345), Expect: 4e-85, Identities: 204/225 (91%), Gaps: 0/225 (0%), Strand: Plus/Plus

The SARS Urbani sequence is totally absent when 1000 SARS-like genomes are selected in BLAST.

The homology of the 225 bases area between the Wuhan market ID: LR757998.1 reference and bat RaTG13 is very high: 204/225 bases (91% homology).

Locations of the 4 HIV1, HIV2 and SIV EIE within the 225 bases area:

Wuhan market ID: LR757998.1 start address: 21543. Bat start address: 21550. Nucleotide and amino acid positions within Wuhan market ID: LR757998.1:

EIE	Nucleotide addresses within region « A » (600 bases) or « B » (330 bases)	Nucleotide addresses within the 225 bases area	Amino acids within the 225 bases area
HIV1 Kenya 2008	471-501 (« A »)	1-31	1-10
HIV2 Cap verde 2012	512-529 (« A »)	42-59	14-20
HIV2 Cote d'ivoire 2014	66-85 (« B »)	195-214	65-71
SIV Africa 2016	76-97 (« B »)	205-226	68-75

Nucleotide homologies between Bat RaTG13 (225 bases from position 21549) and COVID_19 ID: LR757998.1 ref (225 bases from position 21542):

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 Kenya HIV1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 Cap verde HIV2

1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 0 1 0

1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1

1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1

0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 2 last HIV2 and SIV have a partial overlap.

1 1 1 1 1 1 1 1 1 1 1 0 1 1 1

Thus there are only 21 nucleotide differences over 225 bases (204/225 identical, in agreement with the BLASTn result above).

Note: The regions in bold correspond to the relative positions of the 4 EIEs HIV1 Kenya 2008, HIV2 Cape Verde 2012, HIV2 Cote d'ivoire 2014 and SIV Africa 2016. “1” means the nucleotide is the same in COVID_19 and RaTG13; “0” means it differs.

Wuhan market ID: LR757998.1 ref region 225 bases, Frame 1

TGTTTTTCTTGTTTTATTGCCACTAGTCTC

TAGTCAGTGTGTTAATCTTACAACCAGAAC

TCAATTACCCCCTGCATACACTAATTCTTT

CACACGTGGTGTTTATTACCCTGACAAAGT

TTTCAGATCCTCAGTTTTACATTCAACTCA

GGACTTGTTCTTACCTTTCTTTTCCAATGT

TACTTGGTTCCATGCTATACATGTCTCTGG

GACCAATGGTACTAA

bat RaTG13 region 225 bases Frame1

TGTTTTTCTTGTTTTATTGCCACTAGTTTC

TAGTCAGTGTGTTAATCTAACAACTAGAAC

TCAGTTACCTCCTGCATACACCAACTCATC

CACCCGTGGTGTCTATTACCCTGACAAAGT

TTTCAGATCTTCAGTTTTACATTTAACTCA

GGATTTGTTTTTACCTTTCTTCTCCAATGT

GACCTGGTTCCATGCTATACATGTTTCAGG

GACCAATGGTATTAA
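The 0/1 rows given earlier can be regenerated by comparing these two 225 bases frame 1 sequences position by position, without gaps (the BLASTn alignment above reports 0/225 gaps). A minimal Python sketch, with illustrative names and the sequences copied from the rows above:

```python
# 225 bases areas, frame 1, copied row by row (30 bases per row) from above
COVID19_225 = (
    "TGTTTTTCTTGTTTTATTGCCACTAGTCTC"
    "TAGTCAGTGTGTTAATCTTACAACCAGAAC"
    "TCAATTACCCCCTGCATACACTAATTCTTT"
    "CACACGTGGTGTTTATTACCCTGACAAAGT"
    "TTTCAGATCCTCAGTTTTACATTCAACTCA"
    "GGACTTGTTCTTACCTTTCTTTTCCAATGT"
    "TACTTGGTTCCATGCTATACATGTCTCTGG"
    "GACCAATGGTACTAA"
)
RATG13_225 = (
    "TGTTTTTCTTGTTTTATTGCCACTAGTTTC"
    "TAGTCAGTGTGTTAATCTAACAACTAGAAC"
    "TCAGTTACCTCCTGCATACACCAACTCATC"
    "CACCCGTGGTGTCTATTACCCTGACAAAGT"
    "TTTCAGATCTTCAGTTTTACATTTAACTCA"
    "GGATTTGTTTTTACCTTTCTTCTCCAATGT"
    "GACCTGGTTCCATGCTATACATGTTTCAGG"
    "GACCAATGGTATTAA"
)


def identity_profile(a: str, b: str) -> list[int]:
    """1 where the two ungapped sequences carry the same base, 0 where they differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [1 if x == y else 0 for x, y in zip(a, b)]


profile = identity_profile(COVID19_225, RATG13_225)
differences = [i for i, v in enumerate(profile, start=1) if v == 0]

if __name__ == "__main__":
    for start in range(0, len(profile), 30):
        print(" ".join(str(v) for v in profile[start:start + 30]))
    print(f"{sum(profile)}/{len(profile)} identical, {len(differences)} differences")
```

Run as is, it prints eight rows of 0s and 1s and reports 204/225 identical positions, i.e. 21 differences.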

COVID_19 Wuhan market ID: LR757998.1 region 225 bases FRAME1

=======

CYS PHE SER CYS PHE ILE ALA THR SER LEU Kenya HIV1

ARR SER VAL CYS ARR SER TYR ASN GLN ASN Cap verde HIV2

SER ILE THR PRO CYS ILE HIS ARR PHE PHE

HIS THR TRP CYS LEU LEU PRO ARR GLN SER

PHE GLN ILE LEU SER PHE THR PHE ASN SER

GLY LEU VAL LEU THR PHE LEU PHE GLN CYS

TYR LEU VAL PRO CYS TYR THR CYS LEU TRP 2 last HIV2 and SIV have a partial overlap

ASP GLN TRP TYR ARR

bat RaTG13 region 225 bases FRAME1

=======

CYS PHE SER CYS PHE ILE ALA THR SER PHE Kenya HIV1

ARR SER VAL CYS ARR SER ASN ASN ARR ASN Cap verde HIV2

SER VAL THR SER CYS ILE HIS GLN LEU ILE

HIS PRO TRP CYS LEU LEU PRO ARR GLN SER

PHE GLN ILE PHE SER PHE THR PHE ASN SER

GLY PHE VAL PHE THR PHE LEU LEU GLN CYS

ASP LEU VAL PRO CYS TYR THR CYS PHE ARG 2 last HIV2 and SIV have a partial overlap

ASP GLN TRP TYR ARR
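The amino acid rows above follow the standard genetic code, with "ARR" used for stop codons. A minimal Python sketch of this frame 1 translation, assuming the standard code (NCBI translation table 1) and the three-letter labels used in the text, reproduces them; the names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
TABLE1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    b1 + b2 + b3: TABLE1[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Three-letter labels as used in the text ("ARR" = stop codon)
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
    "*": "ARR",
}


def translate_frame1(seq: str) -> list[str]:
    """Translate the complete codons of frame 1 into three-letter labels."""
    seq = seq.upper()
    return [THREE_LETTER[CODON_TO_AA[seq[i:i + 3]]] for i in range(0, len(seq) - 2, 3)]


if __name__ == "__main__":
    row = "TGTTTTTCTTGTTTTATTGCCACTAGTCTC"  # first 30 bases of the COVID_19 225 bases area
    print(" ".join(translate_frame1(row)))
```

Applied to the first row of the COVID_19 area, it prints CYS PHE SER CYS PHE ILE ALA THR SER LEU.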

Note: The best nucleotide and amino acid matches must be analyzed over the 3 codon reading frames and both reading directions. In the translations above, “ARR” denotes a stop codon.

In other words, Table 5 above shows that, apart from HIV1 KENYA, the HIV/SIV EIE of the 225 bases area are more homologous in Wuhan market ID: LR757998.1 than in bat RaTG13.

Figure 4: High level of HETEROGENEITY within the 225 bases area in the Wuhan market reference genome. In this COVID_19 Wuhan market ID: LR757998.1 reference genome, the coupling between the Genomics pattern (red) and the Proteomics pattern (blue) appears highly disturbed, unstable and “chaotic”. Their correlation is poor (69.47%).

Figure 5: High level of COHESION in the 225 bases bat RaTG13 region, which includes the fingerprint of Kenya HIV1 but probably not the 3 other HIV/SIV signatures. Both the Genomics pattern (red) and the Proteomics pattern (blue) appear highly “harmonic” and correlated (92.13%).

We draw the reader's attention to Figs 4 and 5 above. The first (Fig 4) concerns the 225 bases area of COVID-19; it appears chaotic and poorly organized. By contrast, the same analysis of the same 225 bases region in bat RaTG13 (Fig 5) shows a smoother and more regular profile. Let us not forget that this sequence, although deposited in 2020, was collected in 2013, i.e. 7 years earlier.

Here is how we explain this difference. The “DNA master code” (see Supplementary Materials ref 9) measures a certain level of cohesion and homogeneity between the genomic pattern (double-stranded DNA) and its corresponding proteomic image (translation into amino acids). In Bat RaTG13, as pointed out above, the 3 EIEs Cap verde, Cote d'ivoire and Africa were probably integrated through natural evolution, whereas we assume that the EIE Kenya was integrated very recently (red line in Fig 5). By contrast (Fig 4), in COVID_19 all 4 EIEs would have been inserted very recently, which would explain the chaotic image of Fig 4.

Part III

As the epidemic declines, this 225 bases area exhibits an abnormally high rate of mutations/deletions, particularly in Seattle, WA state, USA (§8, 9 and 10).

8. First encouraging mutations in the 225 bases, « A » and « B » regions, particularly in USA WA state.

Recall that the BLASTn analysis of April 10, 2020 (option "SARS coronaviruses") reports 386 occurrences: 16 bats, 2 Rhinolophus and 368 COVID_19. The same search run on April 16, 2020 returns 523 sequences. The number of available COVID_19 sequences is therefore constantly changing, mainly because of new deposits from the USA.

We looked at the first cases of significant COVID_19 mutations in this key region of 225 bases (homologies of the order of 96%). We find 5 of them, located in the BLASTn output just before and near RaTG13; all come from the USA, were taken and sequenced in April 2020, and are pathogenic.

A BLASTn analysis dated April 11, 2020 gives 386 sequences in total, of which:

351 strains with full 100% homology with 225 bases area.

17 strains with mutations in 225 bases area.

18 bat strains.

Now let us look at these 17 cases of mutations in the 225 bases region.

Table 8: Mutations in region 225 bases

Strain number	Strain reference	Mutation relative addresses within 225 bases area	Homologies	HIV1/SIV EIE (note1)	Collection / deposit dates
1 USA	SARS-CoV-2/WA-UW381/human/2020/USA, partial genome Sequence ID: MT263460.1	C/T	224/225 99.6%	HIV1 Kenya 2008	30 mar 2020 / 6 apr 2020
2 USA	SARS-CoV-2/WA-UW334/human/2020/USA, complete genome Sequence ID: MT263414.1	C/T	224/225 99.6%	HIV1 Kenya 2008	mar 2020 / apr 2020
3 USA	SARS-CoV-2/WA-UW301/human/2020/USA, complete genome Sequence ID: MT263384.1	C/T	224/225 99.6%		mar 2020 / apr 2020
4 USA	SARS-CoV-2/WA-UW270/human/2020/USA, partial genome Sequence ID: MT259262.1	C/T	224/225 99.6%		mar 2020 / apr 2020
5 USA	SARS-CoV-2/WA-UW257/human/2020/USA, complete genome Sequence ID: MT259249.1	157 G/C	224/225 99.6%		13 mar 2020 / 6 apr 2020
6 USA	SARS-CoV-2/WA-UW231/human/2020/USA, complete genome Sequence ID: MT246488.1	C/T	224/225 99.6%	HIV1 Kenya 2008	mar 2020 / apr 2020
7 USA	SARS-CoV-2/WA-UW204/human/2020/USA, complete genome Sequence ID: MT246461.1	C/T	224/225 99.6%	HIV1 Kenya 2008	mar 2020 / apr 2020
8 China	SARS-CoV-2/KMS1/human/2020/CHN, complete genome Sequence ID: MT226610.1	217 T/A	224/225 99.6%	SIV Africa 2016	jan 2020 / apr 2020
9 Finland	CoV-FIN-29-Jan-2020, partial genome Sequence ID: MT020781.2	140 C/T	224/225 99.6%		jan 2020 / mar 2020
10 China	SARS-CoV-2/Yunnan-01/human/2020/CHN, complete genome Sequence ID: MT049951.1	T/A	224/225 99.6%		jan 2020 / apr 2020
11 USA	2019-nCoV/USA-CA5/2020, complete genome Sequence ID: MT027064.1	140 C/T	224/225 99.6%		24 mar 2020 / 06 apr 2020
12 USA	SARS-CoV-2/WA-UW302/human/2020/USA, partial genome Sequence ID: MT263385.1	175-176 CA/NN; 164-166 CCT/NNN	220/225 97.7%		23 mar 2020 / 6 apr 2020
13 USA	SARS-CoV-2/WA-UW356/human/2020/USA, complete genome Sequence ID: MT263436.1	188-196 TTCCATGCT/NNNNNNNNN	216/225 96%	HIV2 cote d'ivoire 2014	24 mar 2020 / 06 apr 2020
14 USA	SARS-CoV-2/WA-UW351/human/2020/USA, complete genome Sequence ID: MT263431.1	189-197 TCCATGCTA/NNNNNNNNN	216/225 96%	HIV2 cote d'ivoire 2014	24 mar 2020 / 06 apr 2020
15 USA	SARS-CoV-2/WA-UW287/human/2020/USA, complete genome Sequence ID: MT259277.1	189-197 TCCATGCTA/NNNNNNNNN	216/225 96%	HIV2 cote d'ivoire 2014	15 mar 2020 / 06 apr 2020
16 USA	SARS-CoV-2/WA-UW306/human/2020/USA, partial genome Sequence ID: MT263389.1	145-191 (46 del)	144/144 100%, then 34/34		23 mar 2020 / 06 apr 2020
17 China	Wuhan seafood market pneumonia virus genome assembly, chromosome: whole_genome Sequence ID: LR757997.1	106-225 (120 del)	1-105 100%	HIV2 cote d'ivoire 2014 and SIV Africa 2016	31 dec 2019 / 06 mar 2020

17 different COVID-19 strains ===> 5 different HIV/SIV « EIE »

Note 1: when the mutation lies within an HIV/SIV insert, the reference of the corresponding HIV/SIV strain is given.

We observe that out of these 17 cases of mutations, the majority (13/17) concern the USA, with dates later than the Chinese origin of the pandemic. Only 3 relate to China and one to Finland. This is probably the beginning of a mutation strategy by which the genome balances and integrates the exogenous HIV EIE.

9 of these 17 mutations directly affect an HIV/SIV region. The others affect the intermediate region separating the two pairs of HIV/SIV EIE.

It will also be noted that most of these strains come from recent samples (12/17 were collected in or after March 2020). These dates would therefore correspond to a "mature" period of the COVID_19 genomes, which have now entered a phase of diversified mutations.

Finally, we observe the repetition of several mutations, proof of a robust mutation strategy which eliminates the hypothesis of sequencing errors.

We note that 5 different HIV/SIV EIE and 5 mutations regions are matching within the 17 different COVID_19 strains.

We now consider Table 9, which compares the percentages of significant mutations and deletions in the 225 bases area with those of the whole genomes.

Table 9: Comparing the significant mutation and deletion percentages of the 225 bases area with the whole genome mutation and deletion percentages.

Strain number	Strain reference	Mutation relative addresses within 225 bases area	Homologies 225 bases area / same region in reference genome LR757998.1, and mutation rate %	Homologies whole genome / whole reference genome LR757998.1, and mutation rate %	HIV1/SIV EIE	Collection / deposit dates
12 USA	SARS-CoV-2/WA-UW302/human/2020/USA, partial genome Sequence ID: MT263385.1	175-176 CA/NN; 164-166 CCT/NNN	220/225, 97.7%; 2.222222%	29517/29598 (81), 99.726333%; 0.273667%		23 mar 2020 / 6 apr 2020
13 USA	SARS-CoV-2/WA-UW356/human/2020/USA, complete genome Sequence ID: MT263436.1	188-196 TTCCATGCT/NNNNNNNNN	225-9 = 216, 96%; 4.000000%	29828/29846 (18), 99.939690%; 0.060309%	HIV2 cote d'ivoire 2014	24 mar 2020 / 06 apr 2020
14 USA	SARS-CoV-2/WA-UW351/human/2020/USA, complete genome Sequence ID: MT263431.1	189-197 TCCATGCTA/NNNNNNNNN	225-9 = 216, 96%; 4.000000%	29834/29852 (18), 99.939702%; 0.060297%	HIV2 cote d'ivoire 2014	24 mar 2020 / 06 apr 2020
15 USA	SARS-CoV-2/WA-UW287/human/2020/USA, complete genome Sequence ID: MT259277.1	189-197 TCCATGCTA/NNNNNNNNN	225-9 = 216, 96%; 4.000000%	29843/29866 (23), 99.922989%; 0.077011%	HIV2 cote d'ivoire 2014	15 mar 2020 / 06 apr 2020
16 USA	SARS-CoV-2/WA-UW306/human/2020/USA, partial genome Sequence ID: MT263389.1	145-191 (46 del)	225-179 = 46, 79.5555%; 20.44444%	29517/29598 (81), 99.726332%; 0.273667%		23 mar 2020 / 06 apr 2020
17 China	Wuhan seafood market pneumonia virus genome assembly, chromosome: whole_genome Sequence ID: LR757997.1	106-225 (120 del)	225-105 = 120, 46.6666%; 53.333333%	19263/29388 (10125), 65.547162%; 34.452838%	HIV2 cote d'ivoire 2014 and SIV Africa 2016	31 dec 2019 / 06 mar 2020

In Table 9, the results for 6 significant genomes show a greater average mutation level in the 225 bases regions (13.5687%) than in the corresponding whole genomes (0.3496%), i.e. a ratio of 38.813 between the average mutation rate of the 225 bases regions and that of the whole genomes, due principally to the hyper-deleted Wuhan market genome LR757997.1.

Note: the last line (ref 17, China) has many deleted or « N » regions: 19263 T, C, A, G nucleotides over a length of 29470, i.e. 10207 deleted or undetermined nucleotides.
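The rates in Table 9 are the number of changed (mutated, deleted or "N") bases divided by the length considered. A minimal Python sketch of that arithmetic, with the counts copied from Table 9 and illustrative names, computes the per-row rates and an unweighted mean of each column:

```python
# Table 9 rows: (strain number, changed bases in the 225 bases area,
#                changed bases in the whole genome, whole-genome length)
TABLE9 = [
    (12, 5, 81, 29598),
    (13, 9, 18, 29846),
    (14, 9, 18, 29852),
    (15, 9, 23, 29866),
    (16, 46, 81, 29598),
    (17, 120, 10125, 29388),
]
AREA = 225


def rate_percent(changed: int, length: int) -> float:
    """Changed bases as a percentage of the length considered."""
    return 100.0 * changed / length


def mean(values) -> float:
    """Unweighted arithmetic mean."""
    values = list(values)
    return sum(values) / len(values)


area_rates = [rate_percent(a, AREA) for _, a, _, _ in TABLE9]
genome_rates = [rate_percent(g, n) for _, _, g, n in TABLE9]

if __name__ == "__main__":
    for (num, *_), a, g in zip(TABLE9, area_rates, genome_rates):
        print(f"{num}: 225 bases area {a:.6f}%  whole genome {g:.6f}%")
    print(f"means: {mean(area_rates):.4f}% vs {mean(genome_rates):.4f}%")
```

On the six rows as printed, the per-row rates match Table 9, and an unweighted mean gives about 14.67% for the 225 bases areas and 5.87% for the whole genomes; a different selection or weighting of rows gives different averages, and hence a different ratio.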

The following Fig 6 illustrates these results.

Figure 6: Comparative time evolution in WA mutations/deletions rates % at whole genome and 225 bases levels.

This chart illustrates, for 5 COVID_19 USA strains collected from NCBI data banks in April 2020, the mutation rate in the 225 bases regions and in the whole genomes. In all cases, the mutation rate is greater in the 225 bases region than at the whole genome scale.

We now carry out the same study for the high-density EIE regions « A » and « B »:

==> ==> The 2 Tables (Table Ref 6.1 and Table Ref 6.2) are available in Supplementary Materials Ref 6:

Table Ref 6.1 lists the notable mutations in region « A », and Table Ref 6.2 those in region « B ».

We obtain the same kind of results:

For region « A » analysis (Table Ref 6.1), we note that 5 different HIV/SIV EIE and 5 mutations regions are matching within the 8 different COVID_19 strains.

For region « B » analysis (Table Ref 6.2), we note that 20 different HIV/SIV EIE and 13 mutations regions are matching within the 13 different COVID_19 strains.

The following Fig 7 illustrates these highly significant results.

Fig 7 illustrates, for 5 COVID_19 USA strains collected from NCBI data banks in April 2020, the mutation rate in regions « A » + « B » (i.e. 600 + 330 bases) and in the whole genomes. In all cases, the mutation rate is greater in regions « A » + « B » than at the whole genome scale.

Figure 7: Comparative time evolution in WA / Minnesota regions “A” and “B”. This chart shows the first mutations of WA and Minnesota strains and their mutation/deletion rates (%) for the whole genome and for the 930 bases region = region « A » (600 bases) + region « B » (330 bases).

Some conclusions on the geographical evolution of the genome:

In China, the strains seem to have changed very little in mutations (with the exception of Wuhan seafood market pneumonia virus genome assembly, chromosome: whole_genome Sequence ID: LR757997.1).

In Italy and in France, we find no remarkable mutation vis-à-vis the Chinese reference genome.

It is in Spain and the USA that we detect the most significant traces of a notable evolution of the genome. In Spain, recent sequences (March 2020) show significant deletions and mutations in regions containing EIE. According to the first results of analyses [13], this genome would not have increased its pathogenicity and would seem to use new modes of transmission.

In the USA, the analysis of multiple sequences from the Seattle region (WA) and Minnesota shows a clear, growing progression from mutations to successive deletions in regions "A", "B" and the 225 bases area:

Table 8 (ref 1 to 7, then 11 to 13): we progress from single mutations to longer mutations spanning 3 codons, which affect HIV/SIV EIE.

Table Ref 6.1 (from Sup. Materials): also, there are grouped mutations (ref 4, 5) affecting EIE areas.

Table Ref 6.2 (from Sup. Materials): here we illustrate most clearly a sort of "shedding" of the EIE regions as these genomes progress: thus (ref 3, 5, 6, 7), the mutations affect 2 or 3, then 8 consecutive bases.

Then (9 10 11 12), in addition to other new mutations, it is whole pieces, on several tens of bases of the genome which are deleted. The most remarkable point is that in all these cases, it is indeed EIE regions which are targeted.

As of the most recent date, April 23, 2020, other COVID_19 strains from Seattle, WA show new deletions located in regions “A” and "B" of our article. These deletions partly "shed" the HIV/SIV EIE located in region “A” and also in region “B”, particularly the “side by side” EIE (see Table 1: HIV1 Malawi 2013, HIV1 Russia 2010, SIV Cameroon 2015). This is the case in particular for:

Sequence ID: MT188341.1, Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW386/2020, partial genome. Length: 29835, collected 5 mar 2020, sequenced 13 mar 2020.

Sequence ID: MT263466.1, Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW386/2020, partial genome. Length: 29634, collected 16 mar 2020, sequenced 15 apr 2020.

Sequence ID: MT263385.1, Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW302/2020, partial genome. Length: 29610, collected 23 mar 2020, sequenced 15 apr 2020.

Sequence ID: MT293224.1, Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1608/2020, complete genome. Length: 29847, collected 18 mar 2020, sequenced 15 apr 2020.

Sequence ID: MT293213.1, Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1574/2020, complete genome. Length: 29887, collected 19 mar 2020, sequenced 15 apr 2020.

9. Generalization of the analysis of 225 base regions in genomes of recent USA patients who have mutated.

In order to demonstrate formally the specificity of this 225 bases region starting at base 21542, we explore regions of the same size every 5000 bases throughout the COVID_19 genome, i.e. starting at bases 1542, 6542, 11542, 16542 and 26542. We can then confirm or refute that this 225 bases region tends to mutate, or even to be partially deleted, as appears to be the case for certain WA Seattle strains reported here (Fig 8). Table 10 below shows that the mutation rate of the 225 bases area is always much higher than that of the 5 other 225 bases regions explored every 5000 bases (34.82 times).

Table 10: This Table summarizes remarkable results, which demonstrate the exclusive specificity of the 225 bases area: it clearly appears to mutate preferentially.

Strain number	Strain reference	Mutation relative addresses within 225 bases area	Homologies 225 bases area / same region in reference genome LR757998.1, and mutation rate %	Homologies whole genome / whole reference genome LR757998.1, and mutation rate %	20kb upstream region 225	15kb upstream region 225	10kb upstream region 225	5kb upstream region 225	5kb downstream region 225	Ratio 225 bases area / average of 5 other 225 bases areas
12 USA WA 23mar2020	SARS-CoV-2/WA-UW302/human/2020/USA, partial genome Sequence ID: MT263385.1	175-176 CA/NN; 164-166 CCT/NNN	220/225, 97.7%; 2.222222%	29517/29598 = 81, 99.726333%; 0.273667%	0.00%	0.00%	197 A/T 0.44%	0.00%	183-185 CAC/NNN 1.33%	6.24 times
13 USA WA 24mar2020	SARS-CoV-2/WA-UW356/human/2020/USA, complete genome Sequence ID: MT263436.1	188-196 TTCCATGCT/NNNNNNNNN	225-9 = 216, 96%; 4.000000%	29828/29846 = 18, 99.939690%; 0.060309%	0.00%	0.00%	197 A/T 0.44%	0.00%	0.00%	45 times
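The control windows and the last column of Table 10 follow from simple arithmetic. A minimal Python sketch, assuming (as stated above) 225 bases windows placed 20, 15, 10 and 5 kb upstream and 5 kb downstream of base 21542, and counting every substituted or "N" base as one change; the helper names are hypothetical.

```python
REGION_START = 21542  # first base of the 225 bases area
WINDOW = 225
# Control windows: 20, 15, 10 and 5 kb upstream, 5 kb downstream (Table 10 column order)
OFFSETS = (-20000, -15000, -10000, -5000, 5000)


def control_starts(start: int = REGION_START, offsets=OFFSETS) -> list[int]:
    """Start positions of the control windows of the same size."""
    return [start + off for off in offsets]


def enrichment_ratio(area_changes: int, control_changes, window: int = WINDOW) -> float:
    """Change rate in the 225 bases area divided by the mean change rate of the control windows."""
    control_changes = list(control_changes)
    area_rate = area_changes / window
    mean_control = sum(c / window for c in control_changes) / len(control_changes)
    if mean_control == 0:
        return float("inf")  # no change at all in the control windows
    return area_rate / mean_control


if __name__ == "__main__":
    print(control_starts())
    # Table 10, rows 12 and 13: changed bases in the area and in the 5 control windows
    print(round(enrichment_ratio(5, [0, 0, 1, 0, 3]), 2))
    print(round(enrichment_ratio(9, [0, 0, 1, 0, 0]), 2))
```

With the row 12 counts (5 changed bases in the area; 1 and 3 in two control windows) it returns 6.25, and with row 13 (9 changed bases; 1 in one control window) it returns 45, in line with the last column of Table 10 (6.24 and 45).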

The following Fig 8 illustrates these strong results.

Figure 8: High level of deletions in the 225 bases area comparing to others 225 bases regions.

Horizontally: 5 patients from WA state with 225 bases area mutations. Vertically: proportional to the amount of mutations/deletions. The red surface corresponds to the real 225 bases area. The other four coloured areas correspond to the average mutation/deletion rates for the 5 other 225 bases regions and for the whole genome. The ratio (i.e. 32.86 times) is that between the red 225 bases area and the average mutation/deletion rate of the other regions. To summarize, these remarkable results demonstrate (red areas) the exclusive specificity of the 225 bases area, which clearly appears to mutate preferentially, probably in order to get rid of the exogenous EIE regions characterizing this region.

10. New evidence of increased deletions from region 225 bases in WA State in the USA.

As of May 2, 2020, we assessed whether the 225 bases area of the COVID-19 strains continued to mutate, in the WA state region in particular. Out of the 1578 COVID_19 strains accessible at that date, 32 presented significant mutations (more than 2 bases out of 225). Among them, 30 came from the USA (see table 12 below and Fig 9); the remaining 2, from Wuhan and the Czech Republic, are not considered here. Among these 30 USA strains, 22 came from the state of WA, 5 from CA, 2 from Utah and 1 from the state of New York.

The 3 most remarkable facts are:

First, there is a great diversity of locations and types of mutations and deletions in the 225-base region. It will be interesting to locate these mutations relative to the positions of the 4 EIEs in this region.

Second, new types of mutations are also appearing in states other than WA, in California in particular. We can conclude from this that the COVID_19 virus continues to shed this key 225-base region from its genome.

Third, there is a high variety and diversity of mutations and deletions: of these 30 USA cases, 20 show entirely different mutation/deletion configurations.

Table 11: Expansion and diversity of mutations/deletions in the 225-base area on 2 May 2020, particularly in Washington State (Seattle), USA.

Label	Reference	Strain description	Mutations/deletions	Mutation rate	Integrity Genomics/Proteomics % (Master Code)
USA0 WA	Reference genome WA Seattle	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW391/2020, complete genome GenBank: MT293156.1	0 del	No	88.4
USA0 UT	Reference genome Utah	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/UT-02025/2020, complete genome GenBank: MT536977.1	0 del	No	84.7
USA0 NY	Reference genome NY	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/NY-CDC-SURV0985NYC/2020, complete genome Sequence ID: MT434817.1	0 del	No	86.5
CA1	USA CA 28mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/CA-CZB-IX00112/2020, complete genome Sequence ID: MT385489.1	121 CAGAT/5N	2.22%	86.9
CA2	USA CA 28mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW302/2020, partial genome Sequence ID: MT263385.1	164-166 CCT/NNN; 175-176 CA/NN	2.22% (1/5)	51.8
WA1	USA WA 23mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-2225/2020 ORF1ab polyprotein (ORF1ab) and ORF1a polyprotein (ORF1ab) genes, partial cds; and surface glycoprotein (S), ORF3a protein (ORF3a), envelope protein (E), membrane glycoprotein (M), ORF6 protein (ORF6), ORF7a protein (ORF7a), ORF7b (ORF7b), ORF8 protein (ORF8), nucleocapsid phosphoprotein (N), and ORF10 protein (ORF10) genes, complete cds Sequence ID: MT345837.1	177 ATGTTA/6N	2.66%	62.9
CA3	USA CA 23mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/CA-CZB-EX00700/2020, complete genome Sequence ID: MT385494.1	137 TTACATTC/8N	3.55%	93.5 <==
WA2	USA WA 20mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1765/2020, complete genome Sequence ID: MT326134.1	189 TCCATGCTA/9N	4.00%	85.9
WA3	USA WA 20mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1698/2020, complete genome Sequence ID: MT326129.1	189 TCCATGCTA/9N	4.00%	85.4
WA4	USA WA 18mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1608/2020, complete genome Sequence ID: MT293224.1	188 TTCCATGCT/9N	4.00%	87.1
WA5	USA WA 19mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1574/2020, complete genome Sequence ID: MT293213.1	189 TCCATGCTA/9N	4.00%	86
WA6	USA WA 19mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1603/2020, complete genome Sequence ID: MT293200.1	189 TCCATGCTA/9N	4.00%	86.8
WA7	USA WA 19mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1583/2020, complete genome Sequence ID: MT293198.1	189 TCCATGCTA/9N	4.00%	86
WA8	USA WA 19mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1567/2020, complete genome Sequence ID: MT293196.1	189 TCCATGCTA/9N	4.00%	85.8
WA9	USA WA 24mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW356/2020, complete genome Sequence ID: MT263436.1	188 TTCCATGCT/9N	4.00% (2/5)	87.1
WA10	USA WA 24mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW351/2020, complete genome Sequence ID: MT263431.1	189 TCCATGCTA/9N	4.00% (3/5)	85.5
WA11	USA WA 15mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW287/2020, complete genome Sequence ID: MT259277.1	189 TCCATGCTA/9N	4.00% (4/5)	85.7
WA12	USA WA 21mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1758/2020 ORF1ab polyprotein (ORF1ab), ORF1a polyprotein (ORF1ab), surface glycoprotein (S), ORF3a protein (ORF3a), envelope protein (E), membrane glycoprotein (M), ORF6 protein (ORF6), ORF7a protein (ORF7a), ORF7b (ORF7b), ORF8 protein (ORF8), nucleocapsid phosphoprotein (N), and ORF10 protein (ORF10) genes, complete cds Sequence ID: MT326171.1	188 TTCCATGCT/9N	4.00%	57.5
WA13	USA WA 24mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1963/2020 ORF1ab polyprotein (ORF1ab) and ORF1a polyprotein (ORF1ab) genes, partial cds; surface glycoprotein (S), ORF3a protein (ORF3a), and envelope protein (E) genes, complete cds; M gene, partial sequence; ORF6 gene, complete sequence; and ORF7a protein (ORF7a), ORF7b (ORF7b), ORF8 protein (ORF8), nucleocapsid phosphoprotein (N), and ORF10 protein (ORF10) genes, complete cds Sequence ID: MT326080.1	106-118 TTACCCTGACAAA/13N	5.77%	59.7
WA14	USA WA 28mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-4749/2020, complete genome Sequence ID: MT375449.1	143-152 TCAACTCAGG/10N; 156 T/G; 158 T/A; 162 T/D; 165 C/T	6.22%	83.1
CA4	USA CA 8apr2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/CA-CZB-IX00141/2020, complete genome Sequence ID: MT385478.1	Del 32 bases 194-225	14.22%	77.2
NY1	USA NY 22mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/NY-PV09161/2020 ORF1ab polyprotein (ORF1ab) gene, partial cds; ORF1a polyprotein (ORF1ab) gene, complete cds; surface glycoprotein (S) gene, partial cds; and ORF3a protein (ORF3a), envelope protein (E), membrane glycoprotein (M), ORF6 protein (ORF6), ORF7a protein (ORF7a), ORF7b (ORF7b), ORF8 protein (ORF8), nucleocapsid phosphoprotein (N), and ORF10 protein (ORF10) genes, complete cds Sequence ID: MT371011.1	Del 32 bases 1-32	14.22%	63.1
WA15	USA WA 27mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-4744/2020, complete genome Sequence ID: MT375448.1	166-178 TTTCTTTTCCAAT/13N; Del 12 214-225	11.11%	71.7
CA5	USA CA 25mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/CA-CZB-IX00017/2020, complete genome Sequence ID: MT385497.1	125-144 AGATCCTCAGTTTTACATTC/20N	8.88%	84.6
WA16	USA WA 9mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW71/2020, complete genome Sequence ID: MT252799.1	Del 42 bases 184-225	18.66%	85.4
WA17	USA WA 6apr2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-4707/2020, complete genome Sequence ID: MT375462.1	107-128 TACCCTGACAAAGTTTTCAGAT/22N	9.77%	67.8
WA18	USA WA 16mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW306/2020, partial genome Sequence ID: MT263389.1	Del 47 bases 145-191	20.88% (5/5)	67.2
WA19	USA WA 20mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-1673/2020 ORF1ab polyprotein (ORF1ab) and ORF1a polyprotein (ORF1ab) genes, partial cds; and surface glycoprotein (S), ORF3a protein (ORF3a), envelope protein (E), membrane glycoprotein (M), ORF6 protein (ORF6), ORF7a protein (ORF7a), ORF7b (ORF7b), ORF8 protein (ORF8), nucleocapsid phosphoprotein (N), and ORF10 protein (ORF10) genes, complete cds Sequence ID: MT326131.1	Del 60 bases 132-191; 220 A/N	27.11%	85.2
WA20	USA WA 23mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-2220/2020 ORF1ab polyprotein (ORF1ab) and ORF1a polyprotein (ORF1ab) genes, partial cds; surface glycoprotein (S) and ORF3a protein (ORF3a) genes, complete cds; envelope protein (E) and membrane glycoprotein (M) genes, partial cds; and ORF6 protein (ORF6), ORF7a protein (ORF7a), ORF7b (ORF7b), ORF8 protein (ORF8), nucleocapsid phosphoprotein (N), and ORF10 protein (ORF10) genes, complete cds Sequence ID: MT345839.1	Del 53 bases 129-181	23.55%	69.8
UT1	USA UT 25mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/UT-00302/2020 ORF1ab polyprotein (ORF1ab) gene, partial cds; ORF1a polyprotein (ORF1ab) gene, complete cds; surface glycoprotein (S) gene, partial cds; ORF3a protein (ORF3a) gene, complete cds; envelope protein (E) and membrane glycoprotein (M) genes, partial cds; ORF6 protein (ORF6) gene, complete cds; ORF7a protein (ORF7a) and ORF7b (ORF7b) genes, partial cds; ORF8 protein (ORF8) gene, complete cds; nucleocapsid phosphoprotein (N) gene, partial cds; and ORF10 gene, complete sequence Sequence ID: MT334562.1	Del 99 bases 1-99	44.00%	74.4
CA6	USA CA 31mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/CA-CZB-EX00719/2020, complete genome Sequence ID: MT385496.1	Del 102 bases 124-225	45.33%	78.1
UT2	USA UT 12mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/UT-00087/2020 ORF1ab polyprotein (ORF1ab), ORF1a polyprotein (ORF1ab), surface glycoprotein (S), ORF3a protein (ORF3a), envelope protein (E), and membrane glycoprotein (M) genes, partial cds; ORF6 protein (ORF6) gene, complete cds; ORF7a protein (ORF7a) gene, partial cds; ORF7b gene, complete sequence; ORF8 protein (ORF8) gene, partial cds; and nucleocapsid phosphoprotein (N) and ORF10 protein (ORF10) genes, complete cds Sequence ID: MT334549.1	Del 103 bases 1-103	45.77%	93.3 <==
China1	China Wuhan 31dec2019	Wuhan seafood market pneumonia virus genome assembly, chromosome: whole_genome Sequence ID: LR757997.1	Del 120 bases 106-225	53.33% (5)	84.8
WA21	USA WA 31mar2020	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW-4582/2020, complete genome Sequence ID: MT375436.1	Del 190 bases 36-225	84.44%	74.4

Notes 1 to 5: these COVID_19 USA strains, selected in our April BLASTn scan (Table 9 and Fig 6), are re-used here in Table 11 and Fig 9. This allows us to compare the evolution of the 225-base region, and its increasing mutation rate, between the April and May BLASTn scans, particularly for the COVID_19 strains from USA WA state.

Remark: considering patients WA2 to WA12, we note 2 sets of common deletions: 3 cases starting at base 188 (collected 18 to 24 March 2020) and 8 other cases starting at base 189 (collected 15 to 24 March 2020). Table 11 thus demonstrates the expansion and diversity of changes in the 225-base area on 2 May 2020, particularly in Washington State (Seattle).
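The grouping in this remark is easy to check mechanically. A minimal sketch, assuming the start addresses and collection days (March 2020) transcribed from the WA2 to WA12 rows of Table 11; `group_by_start` is a hypothetical helper:

```python
from collections import defaultdict

# (label, start address of the 9-base masked run, collection day in March 2020),
# transcribed from the WA2 to WA12 rows of Table 11.
CASES = [
    ("WA2", 189, 20), ("WA3", 189, 20), ("WA4", 188, 18), ("WA5", 189, 19),
    ("WA6", 189, 19), ("WA7", 189, 19), ("WA8", 189, 19), ("WA9", 188, 24),
    ("WA10", 189, 24), ("WA11", 189, 15), ("WA12", 188, 21),
]


def group_by_start(cases):
    """Map each start address to (number of cases, first day, last day)."""
    days_by_start = defaultdict(list)
    for _label, start, day in cases:
        days_by_start[start].append(day)
    return {
        start: (len(days), min(days), max(days))
        for start, days in sorted(days_by_start.items())
    }


summary = group_by_start(CASES)
for start, (count, first, last) in summary.items():
    print(f"from base {start}: {count} cases, collected {first} to {last} March 2020")
```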

Figure 9: Mutations/deletions within the 225-base area of 32 COVID_19 genomes on 2 May 2020.

We compare the evolution of patients with mutations/deletions between 2 sets of NCBI GenBank genomes collected about 3 weeks apart. In red are the 5 "old" (11 April 2020) deletions from Table 10; in blue are the 25 "new" (2 May 2020) deletions from Table 11. We conclude that the COVID_19 genomes with deleted sequences available on 2 May 2020 have significantly increased both in number and in length of deletions. We can therefore conclude (blue) that USA COVID_19 genomes continue to undergo large deletions and mutations in the critical 225-base area. At the same time, both the amount and the diversity of these mutations are increasing and evolving.

In particular, the average mutation rate of these 30 individual COVID_19 patients is 14.49%, with a maximum of 84.44% for a WA state deletion case.
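The 14.49% average and the 84.44% maximum can be recomputed from the number of changed bases per strain. The sketch below assumes that each Table 11 rate is simply changed bases / 225 (e.g. 32/225 = 14.22% for CA4); the counts are our transcription of the Mutations/deletions column, with WA14 counted as 10 masked bases plus 4 substitutions, WA15 as 13 masked plus 12 deleted, and WA19 as 60 deleted plus 1 masked.

```python
# Changed bases (substituted, masked as N, or deleted) in the 225-base area for the
# 30 USA strains of Table 11; the China1 row is left out, as in the text.
CHANGED_BASES = {
    "CA1": 5, "CA2": 5, "WA1": 6, "CA3": 8,
    **{f"WA{i}": 9 for i in range(2, 13)},  # WA2 to WA12: one 9-base run each
    "WA13": 13, "WA14": 14, "CA4": 32, "NY1": 32, "WA15": 25, "CA5": 20,
    "WA16": 42, "WA17": 22, "WA18": 47, "WA19": 61, "WA20": 53, "UT1": 99,
    "CA6": 102, "UT2": 103, "WA21": 190,
}
WINDOW = 225

rates = {label: 100.0 * n / WINDOW for label, n in CHANGED_BASES.items()}
mean_rate = sum(rates.values()) / len(rates)
worst = max(rates, key=rates.get)

print(f"{len(rates)} strains, mean rate {mean_rate:.2f}%")
print(f"maximum {rates[worst]:.2f}% ({worst})")
```

The rates printed in Table 11 appear truncated rather than rounded to two decimals (103/225 = 45.78%, listed as 45.77%), but averaging either the table values or the exact rates gives 14.49%.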

Interestingly, some of these deletions/mutations affect the locations of the 4 EIEs present in this 225-base area:

HIV1 Kenya 2008: nucleotide addresses 1-31 within the 225-base region

HIV2 Cape Verde 2012: nucleotide addresses 42-59 within the 225-base region

HIV2 Côte d'Ivoire 2014: nucleotide addresses 195-214 within the 225-base region

SIV Africa 2016: nucleotide addresses 205-226 within the 225-base region

Locations of the 4 EIEs within the 225-base region (in bold in the original figure) of the Wuhan market genome ID: LR757998.1, ref [21542 on 225 bases]: HIV1 Kenya, HIV2 Cape Verde, HIV2 Côte d'Ivoire and SIV; the last two (HIV2 and SIV) partially overlap.

A detailed scan of Table 11 (Mutations/deletions column) reveals the following interesting data:

Eleven (11) repeated cases of 9-base mutations are located at 188-196 or 189-197; they therefore "cut" the final HIV/SIV region, which starts at base 195. Other large deletions systematically destroy the 2 starting EIE regions (1-59) or the 2 ending EIE regions (195-225), e.g. Del 32 bases 194-225 and Del 32 bases 1-32 (which destroys exactly the HIV1 Kenya EIE). Still larger deletions erase half (the beginning or the end) of the 225-base region, e.g. Del 99 bases 1-99, Del 102 bases 124-225, etc.

Finally, in 20 of the 30 USA patients analyzed, the mutations/deletions partially or totally affect one or more of the 4 HIV/SIV EIE regions.
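The count of 20 affected patients can be verified with a simple interval-overlap test. The sketch below relies on our transcription of Table 11 into inclusive intervals of relative addresses (single substitutions become one-base intervals, and each 9-base run is taken as start to start + 8); the EIE bounds are those listed above. Helper names are hypothetical.

```python
from __future__ import annotations

# EIE locations (relative addresses in the 225-base region), as listed in the text.
EIES = {
    "HIV1 Kenya 2008": (1, 31),
    "HIV2 Cape Verde 2012": (42, 59),
    "HIV2 Côte d'Ivoire 2014": (195, 214),
    "SIV Africa 2016": (205, 226),
}

# Changed intervals (inclusive) per strain, transcribed from Table 11.
STRAINS = {
    "CA1": [(121, 125)], "CA2": [(164, 166), (175, 176)], "WA1": [(177, 182)],
    "CA3": [(137, 144)], "WA13": [(106, 118)],
    "WA14": [(143, 152), (156, 156), (158, 158), (162, 162), (165, 165)],
    "CA4": [(194, 225)], "NY1": [(1, 32)], "WA15": [(166, 178), (214, 225)],
    "CA5": [(125, 144)], "WA16": [(184, 225)], "WA17": [(107, 128)],
    "WA18": [(145, 191)], "WA19": [(132, 191), (220, 220)], "WA20": [(129, 181)],
    "UT1": [(1, 99)], "CA6": [(124, 225)], "UT2": [(1, 103)], "WA21": [(36, 225)],
}
# WA2 to WA12: one 9-base masked run each, starting at base 188 or 189.
for i in (4, 9, 12):
    STRAINS[f"WA{i}"] = [(188, 196)]
for i in (2, 3, 5, 6, 7, 8, 10, 11):
    STRAINS[f"WA{i}"] = [(189, 197)]


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def touched_eies(intervals: list[tuple[int, int]]) -> list[str]:
    """Names of the EIEs that any changed interval overlaps."""
    return sorted({name for name, eie in EIES.items() for iv in intervals if overlaps(iv, eie)})


hits = {label: touched_eies(ivs) for label, ivs in STRAINS.items()}
hits = {label: names for label, names in hits.items() if names}
print(f"{len(hits)} of {len(STRAINS)} strains touch at least one EIE")
for label, names in sorted(hits.items()):
    print(f"  {label}: {', '.join(names)}")
```

Two inclusive intervals [a1, a2] and [b1, b2] intersect exactly when a1 <= b2 and b1 <= a2, which is the only test the sketch relies on.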

Part IV

Comparative analysis of the Spike genes of COVID_19 and Bat RaTG13 (§11, 12, 13 and 14).

11. The 1770-base region of the two Spike proteins of COVID_19 and Bat RaTG13.

We focus on the sequences of the two Spike proteins of COVID_19 (the reference genome used in this article) and Bat RaTG13. Their respective addresses are:

SPIKBAT: in Bat RaTG13, starting at address 21545, over 3810 bases.

SPIKCOV: in COVID_19 (ref 998), starting at address 21538, over 3822 bases.

The comparative analysis of these 2 Spike sequences highlights the following partition. First comes a region between bases 1 and 2040, common to COVID_19 and Bat RaTG13.

Then, for Spike COVID_19 only, an insertion of 12 bases (CCTCGGCGGGCA) corresponding to the 4 amino acids "PRRA" (Pro, Arg, Arg, and Ala).
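The codon assignment of this insertion can be checked against the standard genetic code. A minimal sketch, assuming the standard code (NCBI translation table 1, whose 64 codons are conventionally listed in TCAG order); `translate` is a hypothetical helper that reads frame 0 only:

```python
from itertools import product

# Standard genetic code (NCBI translation table 1): codons in TCAG order, first base
# varying slowest; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate a DNA string codon by codon from its first base (frame 0)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


insertion = "CCTCGGCGGGCA"
print(insertion, "->", translate(insertion))
```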

Then comes a second common region of 1770 bases, located from base 2041 (over 1770 bases) in Bat RaTG13 and from base 2053 (over 1770 bases) in COVID_19.

We are then confronted with two "anomalies" which are difficult to explain under natural biological conditions:

1) A short insertion of 4 amino acids, PRRA. This insertion is UNIQUE to COVID_19 and does not exist in Bat RaTG13.

2) When the synonymous and non-synonymous mutations of these 2 pairs of regions are compared, an abnormal fact appears for the second region.

The first region of 2040 bases (680 amino acids) common to the Spikes of COVID_19 and Bat RaTG13: the 2 sequences differ by 172 nucleotide mutations.

That is:

155 different codons,

101 synonymous codons,

54 non-synonymous codons.

Hence a ratio "synonymous codons" / "non-synonymous codons" = 101/54 = 1.8703.

Therefore,

"bases involved in synonymous codons" / "bases involved in non-synonymous codons" = 5.611. This value, close to the ratio "5", corresponds to the standard usually encountered in natural genetic sequences.

The second region of 1770 bases (590 amino acids) common to the Spikes of COVID_19 and Bat RaTG13: the 2 sequences differ by 90 nucleotide mutations.

That is:

89 different codons,

83 synonymous codons,

ONLY 6 non-synonymous codons.

Hence a ratio "synonymous codons" / "non-synonymous codons" = 83/6 = 13.8333.

Therefore,

"bases involved in synonymous codons" / "bases involved in non-synonymous codons" = 41.499.

Thus the ratio for the region downstream of PRRA (41.499) is 7.396 times greater than that of the region upstream of PRRA (5.611).
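The arithmetic of this comparison can be reproduced as below. Note that the reported "bases involved" ratios, 5.611 and 41.499, coincide with three times the codon ratios (3 × 101/54 = 5.611 and 3 × 83/6 = 41.5); this is our reading of how they were obtained, not something stated in the text. Because the same factor 3 appears in both regions, it cancels in the final 7.396 comparison. The helper name is hypothetical.

```python
def codon_and_base_ratios(synonymous: int, non_synonymous: int) -> tuple:
    """Codon ratio and, as the reported values suggest, three times that ratio."""
    codon_ratio = synonymous / non_synonymous
    return codon_ratio, 3 * codon_ratio


up_codon, up_bases = codon_and_base_ratios(101, 54)  # 2040-base region, upstream of PRRA
down_codon, down_bases = codon_and_base_ratios(83, 6)  # 1770-base region, downstream of PRRA

print(f"upstream:   {up_codon:.4f} -> {up_bases:.3f}")
print(f"downstream: {down_codon:.4f} -> {down_bases:.3f}")
print(f"downstream / upstream = {down_bases / up_bases:.3f}")
```

The script prints 1.8704 for the upstream codon ratio; the text gives 1.8703, i.e. 101/54 = 1.87037 truncated rather than rounded.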

This 1770b region therefore shows an "abnormal" level: a ratio of about 41 between bases involved in synonymous and non-synonymous codons is completely ABNORMAL. This suggests a possible manipulation of this region of the COVID_19 genome.

Fig 10 below illustrates these “abnormal” results.

The following section will bring us an unexpected answer to this question...

Figure 10: Comparison of all codon mutations differentiating the Spikes of COVID_19 and Bat RaTG13.

On the left, we represent the 2040b Spike region upstream of the 4-amino-acid insertion; on the right, the 1770b region downstream of it. Synonymous codons are in red, non-synonymous codons in blue. The right chart appears "unnatural".

It is generally accepted that COVID_19 would derive from Bat RaTG13. In that case, the codons of COVID_19 would have been modified from those of Bat RaTG13.

Most of these mutations would then have led to synonymous codons, whereas only 6 of the 590 amino acids in the 1770-base region would have changed, i.e. about 1%, which is very low. A question then remains open: why so few mutations in non-synonymous codons?

Let us try to explain this abnormal phenomenon. When mutations are natural, the ratio of synonymous to non-synonymous codon mutations is close to 5. This is the case for the 2040-base region located upstream of PRRA (left chart in Fig 10). What is abnormal in the right part of Fig 10 (region 1770b) is the very low number of non-synonymous codons (blue), because the rate of change of synonymous codons is normal: the slopes of the 2 red straight lines are similar. Paradoxically, however, it is in the variation of synonymous codons that an explanation of the anomaly must be sought. In Fig 11 of the next section (§12), we demonstrate that almost all of the nucleotide mutations of this 1770b region concern the third base of codons, precisely the base that generally does not change the amino acid and produces a synonymous codon. The only question we will not be able to answer is one of ANTERIORITY:

"were the abnormal synonymous-codon mutations of the 1770-base region carried out on COVID_19 or on RaTG13?"

An exhaustive inventory of synonymous mutations: « how did 89 codon mutations only lead to six amino acid changes? »

We sought, in particular, the distribution of mutations on the 3rd bases of the 84 synonymous codons: 77 of these 84 codons are divided into 3 classes:

1) class 1. 42 TC or CT.

2) class 2. 18 AG or GA.

3) class 3. 17 TA or AT.

Classes 1 and 2, i.e. 60 mutations, are transitions (a transition is any of the 4 nucleotide changes between two purines or between two pyrimidines: T <=> C or A <=> G).

Anyone who examines the table of the universal genetic code arranged in TCAG order will notice that the 60 codons of classes 1 and 2 fall in 2 adjoining vertical cells, and therefore encode the same amino acid. Likewise, certain amino acids such as GLY, VAL, PRO, LEU, SER, ALA, THR or ARG occupy 4 contiguous vertical cells, where the 17 class 3 mutations (TA/AT) produce the same amino acid.

This is how we show that 77 of the 84 mutations on the 3rd base of codons did not produce amino acid changes.
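The two ingredients of this argument, the substitution type and whether a third-base change keeps the amino acid, can be checked mechanically. A minimal sketch, assuming the standard genetic code (NCBI table 1); helper names are hypothetical. It also shows the limits of the rule: in the standard code, a third-position A/G transition is synonymous except for ATA/ATG (ILE/MET) and TGA/TGG (stop/TRP), and a T/A transversion is synonymous in four-fold boxes such as GLY but not in a two-fold box such as AAT (ASN) / AAA (LYS).

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES, PYRIMIDINES = set("AG"), set("CT")


def substitution_type(old: str, new: str) -> str:
    """'transition' for A<->G or C<->T, 'transversion' otherwise."""
    if old == new:
        raise ValueError("the two bases are identical")
    pair = {old, new}
    return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"


def third_base_change(codon: str, new_base: str) -> tuple:
    """Substitution type at codon position 3, and whether the amino acid is unchanged."""
    mutated = codon[:2] + new_base
    return substitution_type(codon[2], new_base), CODON_TABLE[codon] == CODON_TABLE[mutated]


for codon, new in [("CTT", "C"), ("AAA", "G"), ("GGT", "A"), ("AAT", "A"), ("ATA", "G")]:
    kind, same_aa = third_base_change(codon, new)
    effect = "synonymous" if same_aa else "non-synonymous"
    print(f"{codon} -> {codon[:2]}{new}: {kind}, {effect}")
```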

12. Evidence of a significant Plasmodium yoelii EIE in the Spike, and of a possible HIV1 EIE containing a crucial Spike mutation.

The search for possible EIEs in COVID_19 and Bat RaTG13, at the level of whole genomes, of the Spike protein, or of the critical 1770-base region, highlights different candidate EIEs (see supplementary materials ref 7). The analysis of the 1770-base region more particularly reveals an EIE with a high BLASTn probability; moreover, the Master Code analysis points to a very probable precise functional site in this same region, around relative address 300 (amino acid 100; see supplementary materials ref 7a):

Plasmodium yoelii strain 17X genome assembly, chromosome: 10

Sequence ID: LM993664.2 Length: 2065729 Number of Matches: 2

Score Expect Identities Gaps Strand

46.4 bits (50) 0.004 36/42 (86%) 1/42 (2%) Plus/Plus

Query  296   CACAAGTCAAACAAATTTACAAAACAC-CACCAATTAAAGAT  336
             ||||| |||||||||||||||||||||  |||||  ||| ||
Sbjct  5556  CACAAATCAAACAAATTTACAAAACACAAACCAAAAAAAAAT  5597
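The Identities and Gaps columns of such BLASTn reports can be recounted directly from the two aligned rows. A minimal sketch (hypothetical helper `alignment_stats`); it recounts identical and gapped columns only and does not reproduce BLAST's bit score or E-value, which depend on the scoring parameters and the database size:

```python
def alignment_stats(query: str, subject: str) -> tuple:
    """Return (identities, gap columns, alignment length) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(query, subject))
    identities = sum(1 for q, s in columns if q == s and q != "-")
    gaps = sum(1 for q, s in columns if q == "-" or s == "-")
    return identities, gaps, len(columns)


query = "CACAAGTCAAACAAATTTACAAAACAC-CACCAATTAAAGAT"    # COVID_19, 296 to 336
subject = "CACAAATCAAACAAATTTACAAAACACAAACCAAAAAAAAAT"  # P. yoelii chr 10, 5556 to 5597
ident, gaps, length = alignment_stats(query, subject)
print(f"Identities {ident}/{length} ({round(100 * ident / length)}%), Gaps {gaps}/{length}")
```

The same function applied to the SIV and chromosome 14 alignments given later in this section returns 28/33 with 1 gap and 42/54 with 2 gaps, matching their reported Identities and Gaps.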

This EIE appears in several chromosomes of Plasmodium yoelii. In particular, it was quickly identified as part of a protein named "Fam a": Plasmodium yoelii "fam-a" protein (PY17X_0018000), partial mRNA, Sequence ID: XM_022956016.1.

We should remember here that Plasmodium yoelii is studied in mice in malaria vaccine strategies [29].

An analysis of amino acid homologies confirms the very probable insertion of this EIE in COVID_19: indeed, 10 amino acids concentrated in a short sequence are homologous between COVID_19 and the Plasmodium yoelii protein "Fam a" (supplementary materials ref 7b).

Analysis of the COVID_19 Spike region located at address 2052 + 295, over 42 bases:

CAC AAG TCA AAC AAA TTT ACA AAA CAC CAC CAA TTA AAG ATT …/...

That is, in the first codon reading frame:

HIS LYS SER ASN LYS PHE THR LYS HIS HIS GLN LEU LYS ILE …/...

We can easily verify that this codon reading frame is indeed that of the "Fam a" protein, whose translation contains the segment HKSNKFTKHKPKKN:

/product="fam-a protein"

/protein_id="XP_022810934.1"

/db_xref="GeneID:3801450"

/translation="MNIFFVQIVLFLLIISLCVNKNTLATELIPKKDKKHKSNKFTKHKPKKNKKCYPTYDNTKEIYQKN.../...

The homologous region of Yoelii "Fam a" gives:

CAC AAA TCA AAC AAA TTT ACA AAA CAC AAA CCA AAA AAA AAT.../...

That is, in the first codon reading frame:

HIS LYS SER ASN LYS PHE THR LYS HIS LYS PRO LYS LYS ASN.../...

That is, an almost perfect amino acid homology, despite 2 synonymous codon differences (AAG / AAA and AAG / AAA).

For comparison, the same analysis conducted on Bat RaTG13 gives:

CTC AAG TTA AAC AAA TTT ATA AGA CAC CAC CAA TTA AAG ATT …/...

LEU LYS LEU ASN LYS PHE ILE ARG HIS HIS GLN LEU LYS ILE …/...

The remarkable fact is the following: the amino acid homology between COVID_19 and Yoelii "Fam a" (10/14) is greater than that between Bat RaTG13 and Yoelii "Fam a" (6/14, a much less obvious homology: 6 amino acids instead of 10), and equal to the homology between Bat RaTG13 and COVID_19 (10/14).
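These amino acid scores, and the 2 synonymous differences between COVID_19 and "Fam a", can be recomputed from the three 42-base segments quoted above. The sketch assumes the standard genetic code and the codon frame in which the segments are written in the text; `compare` is a hypothetical helper returning identical amino acids, synonymous codon differences and non-synonymous codon differences.

```python
from itertools import combinations, product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# The three 42-base segments, in the codon frame in which the text writes them.
SEGMENTS = {
    "COVID_19": "CACAAGTCAAACAAATTTACAAAACACCACCAATTAAAGATT",
    "Bat RaTG13": "CTCAAGTTAAACAAATTTATAAGACACCACCAATTAAAGATT",
    "Yoelii Fam a": "CACAAATCAAACAAATTTACAAAACACAAACCAAAAAAAAAT",
}


def codons(seq: str) -> list:
    """Split a sequence into consecutive codons from its first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def compare(a: str, b: str) -> tuple:
    """(identical amino acids, synonymous codon differences, non-synonymous codon differences)."""
    same_aa = synonymous = non_synonymous = 0
    for ca, cb in zip(codons(a), codons(b)):
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            same_aa += 1
            synonymous += ca != cb  # same amino acid, different codon
        else:
            non_synonymous += 1
    return same_aa, synonymous, non_synonymous


for x, y in combinations(SEGMENTS, 2):
    same, syn, non = compare(SEGMENTS[x], SEGMENTS[y])
    print(f"{x} / {y}: {same}/14 amino acids, {syn} synonymous, {non} non-synonymous")
```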

One question: did this Plasmodium yoelii EIE already exist in SARS? We analyze SARS Exon1, Sequence ID: FJ882956.1 (collected in 2008, sequenced and published in 2010). Curiously, another small homology, with SIV ENV, also appears (see supplementary materials ref 7c and ref 7d).

The following cross homologies with Plasmodium yoelii quickly appear:

SIV: 24/33 bases, 3/14 amino acids.

SARS: 31/42 bases, 8/14 amino acids (including a stop codon).

Bat RaTG13: 34/42 bases, 6/14 amino acids.

COVID_19: 36/42 bases, 10/14 amino acids.

Finally, the global homology between these 5 sequences is:

SARS     CTC AAG TCA AAC AAA TGT ACA AAA CCC CAA CTT TGA AAT ATT
RATG13   CTC AAG TTA AAC AAA TTT ATA AGA CAC CAC CAA TTA AAG ATT
COVID    CAC AAG TCA AAC AAA TTT ACA AAA CAC CAC CAA TTA AAG ATT
YOELII   CAC AAA TCA AAC AAA TTT ACA AAA CAC AAA CCA AAA AAA AAT
SIV       AC AAG gCA AA_ AgA gTT AgA AAA CAC CAC CAA T...

Meanwhile, the homology between COVID_19 and SIV is here:

SIV / COVID_19: 28/33 bases 5/14 amino acids.

In this array we underlined amino acids homologies. It can be seen in this table that the amino acids of COVID-19 homologous to those of Yoelii result from a sort of "fusion" between those of SARS and those of Bat RaTG13.

It is interesting to note that this EIE of Plasmodium Yoelii in Spike COVID_19 is not an isolated case. For example, in the region "B" of 330 bases, very rich in EIE HIV / SIV, we can demonstrate the presence of EIE of Plasmodium Yoelii proteins (see supplementary materials ref 7e).

Another homology is added: SIV (supplementary materials ref 7d):

Simian immunodeficiency virus isolate UG31 from Tanzania gag protein (gag) and pol polyprotein (pol) genes, partial cds; vif protein (vif) and vpr protein (vpr) genes, complete cds; and tat protein (tat), rev protein (rev), and envelope glycoprotein (env) genes, partial cds

Sequence ID: JN091692.1 Length: 5254 Number of Matches: 1

Score Expect Identities Gaps Strand

34.6 bits (37) 7.8 28/33 (85%) 1/33 (3%) Plus/Plus

Query  297   ACAAGTCAAACAAATTTACAAAACACCACCAAT  329
             ||||| |||| | | ||| ||||||||||||||
Sbjct  2232  ACAAGGCAAA-AGAGTTAGAAAACACCACCAAT  2263

Another question: does this homology between COVID_19 and "Fam a" continue beyond? Indeed, an apparent continuity of this protein located downstream would extend this homology over a length of more than 60 bases:

Plasmodium yoelii genome assembly PYYM01, chromosome: 14

Sequence ID: LK934642.1 Length: 2614191 Number of Matches: 1

Score Expect Identities Gaps Strand

41.9 bits (45) 0.16 42/54 (78%) 2/54 (3%) Plus/Minus

Query  309      AATTTA--CAAAACACCACCAATTAAAGATTTTGGTGGTTTTAATTTTTCACAA  360
                ||||||  ||||| |  |||||||| | |||||| |  | ||||||||||| ||
Sbjct  1561202  AATTTAGTCAAAATAAAACCAATTATATATTTTGATCATATTAATTTTTCAAAA  1561149

In [27], we had already demonstrated the presence of several Plasmodium yoelii EIEs in the "Lyons-Weiler" region of COVID_19. Indeed, using a method that detects heterogeneous, and therefore possibly exogenous, sequences, we had suspected the presence of such sequences in the "Lyons-Weiler" region (§7 and Figs 2 and 3 in [27]). Revisiting this region, we show the existence of at least 4 EIEs in the "Lyons-Weiler" region of the COVID_19 Spike, at addresses 219, 464, 689 and 1132 (see supplementary materials ref 7f). In June 2020, a Korean team confirmed our results by publishing a preprint demonstrating the presence of sequences homologous to Plasmodium in this same region [28].

Finally, here is the alignment of the nucleotides of these 3 respective sequences: COVID_19, Bat RaTG13 and Yoelii "Fam a":

COVID19  CACAAGTCAAACAAATTTACAAAACACCACCAATTAAAGATTTTGGTGGTTTTAATTTTTCAC
RATG13   CTCAAGTTAAACAAATTTATAAGACACCACCAATTAAAGATTTTGGTGGTTTCAATTTTTCAC
YOELII   CACAAATCAAAAATTTAGTCAAAATAAAACCAATTATATATTTTGATCATATTAATTTTTCAA

Note: the underlined part of the Yoelii sequence comes from the second Yoelii fragment, found in this second BLASTn.

COVID_19 on 63 bases:

CACAAGTCAAACAAATTTACAAAACACCACCAATTAAAGATTTTGGTGGTTTTAATTTTTCAC

HIS LYS SER ASN LYS PHE THR LYS HIS HIS GLN LEU LYS ILE LEU VAL VAL LEU ILE PHE HIS

RaTG13 on 63 bases:

CTCAAGTTAAACAAATTTATAAGACACCACCAATTAAAGATTTTGGTGGTTTCAATTTTTCAC

LEU LYS LEU ASN LYS PHE ILE ARG HIS HIS GLN LEU LYS ILE LEU VAL VAL SER ILE PHE HIS

Yoelii "Fam a" on 63 bases:

CACAAATCAAAAATTTAGTCAAAATAAAACCAATTATATATTTTGATCATATTAATTTTTCAA

HIS LYS SER LYS ILE STOP SER LYS STOP ASN GLN LEU TYR ILE LEU ILE ILE LEU ILE PHE GLN

Therefore, over this extended length of 63 bases, i.e. 21 amino acids, the relative nucleotide and amino acid homologies are:

COVID_19 / Bat RaTG13 = 58/63 bases and 16/21 amino acids

COVID_19 / Yoelii "Fam a" = 46/63 bases and 11/21 amino acids

Bat RaTG13 / Yoelii "Fam a" = 41/63 bases and 7/21 amino acids
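These figures can be recomputed from the three 63-base sequences as written in this section. The sketch assumes an ungapped, position-by-position comparison (as the rows of 63 bases suggest) and the standard genetic code, with stop codons shown as `*`; helper names are hypothetical.

```python
from itertools import combinations, product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SEQUENCES = {
    "COVID_19": "CACAAGTCAAACAAATTTACAAAACACCACCAATTAAAGATTTTGGTGGTTTTAATTTTTCAC",
    "Bat RaTG13": "CTCAAGTTAAACAAATTTATAAGACACCACCAATTAAAGATTTTGGTGGTTTCAATTTTTCAC",
    "Yoelii Fam a": "CACAAATCAAAAATTTAGTCAAAATAAAACCAATTATATATTTTGATCATATTAATTTTTCAA",
}


def translate(seq: str) -> str:
    """Translate codon by codon from the first base; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def identity(a: str, b: str) -> int:
    """Number of positions carrying the same letter (ungapped comparison)."""
    return sum(1 for x, y in zip(a, b) if x == y)


for x, y in combinations(SEQUENCES, 2):
    nt = identity(SEQUENCES[x], SEQUENCES[y])
    aa = identity(translate(SEQUENCES[x]), translate(SEQUENCES[y]))
    print(f"{x} / {y} = {nt}/63 bases and {aa}/21 amino acids")
```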

It is therefore clear that this second Yoelii region does not coincide with the downstream extension of the "Fam a" sequence: although it is concatenated with the Yoelii "Fam a" fragment in COVID_19, this region would come from another (functional?) region of Plasmodium yoelii...

Figure 11: Codon positions of the bases that differ between COVID_19 and Bat RaTG13 in the 1770-base Spike region.

Evidence that the majority of the 90 nucleotide mutations between COVID_19 and Bat RaTG13 SPIKE region 1770 bases are located on the third bases of the codons.

It is interesting to note this major fact: in [26] (Fig 1), Petrovski et al. demonstrate a whole region where the amino acids are massively changed between SARS and COVID_19. Very precisely, this region is the 1770-base region of the COVID_19 Spike, where the amino acids are almost ALL IDENTICAL between COVID_19 and Bat RaTG13 whereas, at the same time, almost all the codons are "changed" into synonymous codons.

The major conclusion of this demonstration of a Plasmodium yoelii EIE in COVID_19 is as follows: the very high amino acid homology score of 10/14 between COVID_19 and Yoelii "Fam a" results from a shift in the reading frame of the Spike codons. This explains the poorer score of Bat RaTG13 with respect to Yoelii, even though Bat RaTG13 is homologous in amino acids to COVID_19 in this region, which is very poor in amino acid mutations! So it is the small base mutations between COVID_19 and Bat RaTG13 that made the difference here!

With this Yoelii evidence, we obtain at the same time the explanation of the anomaly in the synonymous / non-synonymous codon ratio of the 1770b region highlighted previously. Indeed, as shown in Fig 11 above, the minor mutations do not change the COVID_19 / Bat RaTG13 amino acid values (they almost always affect the 3rd base of synonymous codons).

We believe that this strategy of shifting the codon reading frame was probably used throughout this 1770-base region, for example at this location (relative to the 1770-base region):

1464 TAATGCTTCAGTTGTAAA-CATTCAAAAA 1491, with 93% nucleotide homology and a good amino acid homology once the shift of the codon reading frame is taken into account. This other Plasmodium yoelii EIE also corresponds to a position shifted from the Spike codon reading frame (see supplementary materials).

But with a change of codon reading frame, a mutation that is "synonymous" in the Spike frame can become "non-synonymous" in a second reading frame. This is precisely what has just been demonstrated here, with this blatant proof that an EIE of the "Fam a" gene of Plasmodium yoelii would have been inserted using this "intelligent strategy": while the 2 Spike genes of COVID-19 and Bat RaTG13 are almost identical in their normal reading frame, a second reading frame radically differentiates the expression of the "Fam a" EIE between the 2 respective Spikes of COVID_19 and Bat RaTG13.
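Because this argument rests on reading the same bases in different codon frames, it may help to see the three forward frames of the 42-base COVID_19 segment side by side. The helper `translate_frame` is hypothetical and uses the standard genetic code. Which frame is the Spike frame depends on where the segment starts in the Spike coding sequence: if its first base is Spike base 2348 (address 2052 + 295, counted from 1), then (2348 - 1) mod 3 = 1, the segment starts on the second base of a codon, and the Spike frame is offset 2. This positional reading is our assumption, not a statement from the text.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frame(seq: str, offset: int) -> str:
    """Translate from `offset` (0, 1 or 2), dropping any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


# 42-base COVID_19 Spike segment quoted in the text (address 2052 + 295).
segment = "CACAAGTCAAACAAATTTACAAAACACCACCAATTAAAGATT"

# Assumed 1-based Spike position of the segment's first base (see the lead-in).
first_base = 2052 + 295 + 1
spike_offset = (3 - (first_base - 1) % 3) % 3  # shift needed to reach the next codon start
for offset in range(3):
    label = "  <- Spike frame, under the assumption" if offset == spike_offset else ""
    print(offset, translate_frame(segment, offset) + label)
```

Under that assumption, offset 0 gives the HIS LYS SER ASN ... reading compared with "Fam a" above, while offset 2 gives the Spike reading QVKQIYKTPPIKD.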

A possible HIV1 EIE contains a crucial Spike mutation.

Besides this EIE of plasmodium yoelii, it seems important to note this other smaller and hypothetical EIE in the region 2040b (S1) of the Spike.

We analyze bases 1801 to 1899 of the Spike; these 33 amino acids (codons 601 to 633) contain an important Spike mutation.

GGAACAAATACTTCTAACCAGGTTGCTGTTCTTTATCAGGATGTTAACTGCACAGAAGTCCCTGTTGCTATTCATGCAGATCAACTTACTCCTACTTGG

At the end of April 2020, Bette Korber, of the Los Alamos National Laboratory in New Mexico, claimed that a strain carrying a mutation called S-D614G seemed to take precedence over the others when competing in a given geographic territory.

In vitro studies at the Scripps Research Department of Immunology and Microbiology in Florida have just confirmed this theory: viruses carrying this mutation infected human cells more easily in vitro [32].

This mutation, identified in early March in Europe, Mexico, Brazil and China (Wuhan), modifies the structure of the Spike protein. In the S-D614G mutation, a glycine (GLY) replaces an aspartic acid (ASP) at codon 614 of the Spike protein.
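Codon n of a coding sequence occupies bases 3(n - 1) + 1 to 3n, so codon 614 of the Spike spans bases 1840 to 1842, inside the region 1801 to 1899 quoted above (codons 601 to 633). A minimal sketch, assuming the quoted 99 bases start exactly at Spike base 1801 and using the standard genetic code; helper names are hypothetical:

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

REGION_START = 1801  # assumed Spike base number of the first quoted base (codon 601)
REGION = (
    "GGAACAAATACTTCTAACCAGGTTGCTGTTCTTTATCAGGATGTTAACTGC"
    "ACAGAAGTCCCTGTTGCTATTCATGCAGATCAACTTACTCCTACTTGG"
)


def translate(seq: str) -> str:
    """Translate codon by codon from the first base."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def codon_at(fragment: str, fragment_start: int, codon_number: int) -> str:
    """Codon `codon_number` of the CDS, read from a fragment beginning at CDS base `fragment_start`."""
    first_base = 3 * (codon_number - 1) + 1  # 1-based CDS position of the codon
    index = first_base - fragment_start      # 0-based index inside the fragment
    if not 0 <= index <= len(fragment) - 3:
        raise ValueError("codon lies outside the fragment")
    return fragment[index:index + 3]


wild_type = codon_at(REGION, REGION_START, 614)
mutant = wild_type[0] + "G" + wild_type[2]  # A -> G at the second codon position
print(f"codon 614: {wild_type} ({CODON_TABLE[wild_type]}) -> {mutant} ({CODON_TABLE[mutant]})")
print("codons 601-633:", translate(REGION))
```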

HIV-1 M:08GQ267 partial pol gene for gag-pol fusion polyprotein precursor, isolate 08GQ267

Sequence ID: FN557340.1 Length: 1751 Number of Matches: 1

If we make the mutation GAT (ASP) ==> GGT (GLY), this EIE homology with HIV1 is lost.

COVID_19 becomes active when an enzyme cleaves protein S into S1 and S2, which then become functional without completely detaching from each other. This is where the mutation acts: it seems to make the bond linking S1 and S2 more "stable" after the action of this enzyme.

The mutation "stabilizes" the virus in its most effective form.

This would explain the predominance of this mutated strain. The mutation is present in 70% of the samples posted on GenBank in May 2020, and it now represents 60% of the strains present in GenBank. This strain has circulated widely in France, Italy and now the USA, but hardly at all in the State of WA studied in our article. Although we do not find deletions of this region in WA strains, GenBank contains strains in which this region is deleted elsewhere: Australia, India and, in the USA, Massachusetts, California, Utah and especially Florida.

As we have shown for other areas of the genome (WA state, Seattle), it seems that here too the genome tends to delete this region of the Spike.

13. Analysis of deletions in the critical 1770-base SPIKE region in USA WA state (Seattle).

As we did above for the 225-base region of COVID_19, we ask the same question here: "Do the 1770-base region of Spike, and more particularly the Plasmodium yoelii EIE, undergo strong deletions in genomes from USA patients in Washington State (WA, Seattle)?"

Table 12: 23 individual patient genomes from USA WA state with deletions in the 1770-base COVID_19 SPIKE region.

	23 USA WA individual patient partially deleted whole genomes	Deletions	Plasmodium yoelii deletions	Genomics/Proteomics % coupling
Reference Genome WUHAN998	Wuhan seafood market pneumonia virus genome assembly, chromosome: whole_genome, ref LR757998.1	0 del	No	86.2
Reference Genome WA Seattle	Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/WA-UW391/2020, complete genome, GenBank: MT293156.1	0 del	No	88.4
WA77	USA/WA-UW-5205/2020, complete genome, Sequence ID: MT412257.1	6 del	No	87.3
WA78	USA/WA-UW-5182/2020, complete genome, Sequence ID: MT412228.1	6 del	No	84.6
WA79	USA/WA-UW146/2020, complete genome, Sequence ID: MT252737.1	8 del	No	80.7
WA80	USA/WA-UW273/2020, partial genome, Sequence ID: MT259265.1	8 del	No	68.6
WA81	USA/WA-UW199/2020, complete genome, Sequence ID: MT246456.1	13 del	No	65.4
WA82	USA/WA-UW280/2020, partial genome, Sequence ID: MT259272.1	18 del	No	61.2
WA83	USA/WA-UW302/2020, partial genome, Sequence ID: MT263385.1	21 del	No	51.8
WA84	USA/WA-UW373/2020, complete genome, Sequence ID: MT263453.1	25 del	296-300	75.0
WA85	USA/WA-UW386/2020, partial genome, Sequence ID: MT263466.1	33 del	Close upstream Yoelii	74.3
WA86	USA/WA-UW278/2020, partial genome, Sequence ID: MT259270.1	38 del	No	66.2
WA87	USA/WA-UW306/2020, partial genome, Sequence ID: MT263389.1	39 del	No	67.2
WA88	USA/WA-UW206/2020, partial genome, Sequence ID: MT246463.1	44 del	301-313 and 322-336	82.0
WA89	USA/WA-UW289/2020, partial genome, Sequence ID: MT259279.1	45 del	301-313 and close downstream Yoelii	74.4
WA90	USA/WA-UW-6315/2020, complete genome, Sequence ID: MT412323.1	46 del	301-313 and 332-336	74.4
WA91	USA/WA-UW208/2020, partial genome, Sequence ID: MT246465.1	66 del	301-313 and 320-326 and 330-336	69.6
WA92	USA/WA-UW312/2020, partial genome, Sequence ID: MT263393.1	99 del	No	75.7
WA93	USA/WA-UW-4538/2020, complete genome, Sequence ID: MT375428.1	129 del	Totally deleted	81.6
WA94	USA/WA-UW347/2020, partial genome, Sequence ID: MT263427.1	198 del	Totally deleted	57.1
WA95	USA/WA-UW157/2020, complete genome, Sequence ID: MT252730.1	167 del	Totally deleted	85.8
WA96	USA/WA-UW-4707/2020, complete genome, Sequence ID: MT375462.1	180 del	No	67.8
WA97	USA/WA-UW379/2020, partial genome, Sequence ID: MT263409.1	361 del	Totally deleted	74.0
WA98	USA/WA-UW246/2020, partial genome, Sequence ID: MT259238.1	413 del	322-336	78.2
WA99	USA/WA-UW267/2020, partial genome, Sequence ID: MT259259.1	390 del	Totally deleted	84.2
	Summary	Deletions in 23 of 23 cases	12 undeleted, 6 partially deleted, 5 totally deleted	23 of 23 have a lower % than the WA state COVID_19 reference

Note: we selected here the last 23 WA (Seattle) genomes returned by a BLASTn search of the 1770-base region against the public GenBank COVID_19 sequence database on May 27, 2020.

Complete details are given in the supplementary materials (ref 8).

It appears clearly here that these genomes from USA WA state (Seattle) seem to "rid" themselves of these EIE regions: of the 23 genomes analyzed, almost half (11) have partially (6) or totally (5) eliminated this region suspected of containing a Plasmodium yoelii EIE.
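The Summary row of Table 12 and the counts quoted above can be re-derived from the Plasmodium yoelii column. In this sketch the column is transcribed by hand, and an entry describing a deletion only close upstream of the EIE is counted as undeleted; that rule is our reading of the table's own totals, not a stated criterion.

```python
from collections import Counter

# Plasmodium yoelii column of Table 12 (WA77 to WA99), transcribed by hand.
YOELII_COLUMN = {
    "WA77": "No", "WA78": "No", "WA79": "No", "WA80": "No", "WA81": "No",
    "WA82": "No", "WA83": "No", "WA84": "296-300",
    "WA85": "Close upstream Yoelii", "WA86": "No", "WA87": "No",
    "WA88": "301-313 and 322-336", "WA89": "301-313 and close downstream Yoelii",
    "WA90": "301-313 and 332-336", "WA91": "301-313 and 320-326 and 330-336",
    "WA92": "No", "WA93": "Totally deleted", "WA94": "Totally deleted",
    "WA95": "Totally deleted", "WA96": "No", "WA97": "Totally deleted",
    "WA98": "322-336", "WA99": "Totally deleted",
}


def classify(entry: str) -> str:
    """Map a table cell to one of the three categories of the Summary row."""
    if entry == "No" or entry.startswith("Close"):
        return "undeleted"          # assumption: deletion only next to the EIE
    if entry == "Totally deleted":
        return "totally deleted"
    return "partially deleted"      # one or more position ranges inside the EIE


counts = Counter(classify(cell) for cell in YOELII_COLUMN.values())
print(counts)
```

Run as written, this gives 12 undeleted, 6 partially deleted and 5 totally deleted entries, matching the Summary row.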

This second proof, together with the one concerning the 225-base area, demonstrates that the COVID_19 genome tends to eliminate exogenous regions preferentially. It can therefore be suggested that, as a result, the infectivity and pathogenicity of the virus gradually decrease over time.

The biomathematical “DNA Master Code” method makes it possible to assess the integrity and coherence of a genome at whole-genome scale. For the 23 USA WA patients of Table 12, whose genomes carry deletions in the 1770-base Spike region, we therefore used this tool to assess the possible impact of these deletions on each genome as a whole.

The right-hand column of Table 12 gives these results. We selected 2 reference genomes: the Wuhan reference genome and the non-mutated genome usually encountered in WA state. The results demonstrate that in ALL cases the global coupling is affected by the deletions: all 23 genomes have a lower coupling percentage than the WA reference (88.4%). Note, however, that while this results in part from the deletions in the 1770-base Spike region, deletions elsewhere in the genome may also contribute.
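The comparison of the right-hand column with the two reference genomes is simple arithmetic. The sketch below takes the published percentages of Table 12 as given (reading "74/4" as 74.4) and counts how many of the 23 genomes fall below each reference; it does not recompute the DNA Master Code coupling itself, and the helper name is hypothetical.

```python
# Right-hand column of Table 12 (WA77 to WA99), with "74/4" read as 74.4.
COUPLING = [87.3, 84.6, 80.7, 68.6, 65.4, 61.2, 51.8, 75.0, 74.3, 66.2, 67.2, 82.0,
            74.4, 74.4, 69.6, 75.7, 81.6, 57.1, 85.8, 67.8, 74.0, 78.2, 84.2]
REFERENCES = {"WA reference (MT293156.1)": 88.4, "Wuhan reference (LR757998.1)": 86.2}


def count_below(values, reference):
    """Number of genomes whose coupling is strictly lower than the reference."""
    return sum(v < reference for v in values)


summary = {name: count_below(COUPLING, ref) for name, ref in REFERENCES.items()}
# Largest gap, in percentage points, between a reference and the lowest genome.
gap = {name: round(ref - min(COUPLING), 1) for name, ref in REFERENCES.items()}
print(summary, gap)
```

Against the WA reference all 23 genomes are lower, as the Summary row states; against the Wuhan reference, WA77 (87.3%) is the one exception.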

Figure 12: ALL 44 WA state DELETIONS (1770b and 225 bases area) DESTROY INTEGRITY at WHOLE GENOME scale.

In all 23 individual patient cases where the 1770-base SPIKE region is partially deleted, the Master Code Genomics/Proteomics % coupling at whole-genome scale is partially destroyed (top chart of Fig. 12, Table 12 data). The same holds for all 21 individual patient cases where the 225-base area is partially deleted (bottom chart of Fig. 12, Table 11 data). In both charts, the volume of deletions increases from left to right.

The LINK demonstrated here between DELETIONS and degradation of the DNA Master Code coupling is a FACT. Its possible relation to the contagiousness of the virus, and perhaps to a reduction in its pathogenicity, remains to be demonstrated.

14. Is the COVID_19 Spike insertion site of the four-amino-acid cleavage sequence PRRA the result of chance?

F. Castro-Chavez observed that the PRRA sequence is very rich in CG (10 of its 12 bases) [30]. This led us to analyze the Spike region where PRRA is inserted with the “DNA Master Code” biomathematical method, which is based in particular on a (-1, 0) binary recoding of sequences that distinguishes CG from TA [31]. One of the properties of this method is that it highlights active sites, breakpoints and cleavage sites. The question such an analysis addresses is: "Was the PRRA insertion site chosen at random, or did it already have FAVORABLE properties for such an insertion?" Here is the result of this proof, obtained by "induction":
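The CG richness of the insert, and the kind of two-class recoding the method starts from, can be illustrated as follows. Two assumptions are made: the insert is taken to be CCTCGGCGGGCA (CCT CGG CGG GCA, encoding P R R A), the nucleotide sequence commonly reported for SARS-CoV-2, which the text does not quote; and C/G are mapped to -1 and A/T/U to 0, a sign convention chosen for illustration only. This is not an implementation of the DNA Master Code.

```python
# Assumption: nucleotide sequence commonly reported for the SARS-CoV-2 PRRA insert
# (CCT CGG CGG GCA -> Pro Arg Arg Ala); the text itself gives only the 10/12 ratio.
PRRA_INSERT = "CCTCGGCGGGCA"

# Hypothetical two-class recoding: strong (C/G) -> -1, weak (A/T/U) -> 0.
# The sign convention is an illustrative choice, not the published method.
RECODE = {"C": -1, "G": -1, "A": 0, "T": 0, "U": 0}


def gc_count(seq: str) -> int:
    """Count C and G bases."""
    return sum(base in "CG" for base in seq.upper())


def binary_recode(seq: str) -> list:
    """Recode each base into the two CG/TA classes; unknown symbols raise KeyError."""
    return [RECODE[base] for base in seq.upper()]


print(f"{gc_count(PRRA_INSERT)}/{len(PRRA_INSERT)}")  # 10/12
print(binary_recode(PRRA_INSERT))
```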

1) Even before the insertion, the precise address of the PRRA insert was a PRIVILEGED cleavage site of the Spike protein, both for bat RaTG13 and for COVID_19. It would therefore not have been chosen at random.

2) Inserting the CG-rich (10/12) PRRA fragment at this site should accentuate and AMPLIFY this cleavage property.

3) Progressively integrating larger portions of the Spike region located downstream of the PRRA insert PRESERVES the calculated address of the cleavage point ("DNA Master Code"). It can be suggested that the numerous synonymous codon changes distinguishing RaTG13 from COVID_19 could have contributed to this invariance of the active site.

We successively analyze 3 cases, for various regions framing the PRRA insert address, i.e. base 2040 of the respective Spikes of bat RaTG13 and COVID_19:

· Bat RaTG13.

· COVID_19 without PRRA.

· COVID_19 real, with PRRA.

The "DNA Master Code" "ranks" each codon with respect to the entire sequence studied. We successively study regions of 600, 900, 1200, 1500 and 1800 bases, progressively integrating larger parts of the 1770-base region located downstream of the PRRA insert. In all cases analyzed, we consider the Top 10, i.e. the 10 codons most likely to constitute an active cleavage site.

Table 13: Why was this site already an optimal cleavage site before the PRRA insertion?

Comparing PREDICTED CLEAVAGE SITE in Bat RaTG13 and COVID_19 without PRRA
Top10 codons	1 2 3 4 5 6 7 8 9 10
600b Bat RaTG13	86 74 85 87 73 75 77 88 99 70
600b COVID_19 without PRRA	85 87 99 103 74 84 86 88 98 100
900b Bat RaTG13	86 74 85 87 73 75 77 70 88 72
900b COVID_19 without PRRA	85 87 74 84 99 86 73 88 75 103
1200b Bat RaTG13	86 74 85 87 73 75 77 70 72 88
1200b COVID_19 without PRRA	85 74 87 84 73 86 99 75 88 77
1500b Bat RaTG13	86 74 85 87 73 75 77 70 72 88
1500b COVID_19 without PRRA	86 74 85 76 87 78 73 84 75 77
1800b Bat RaTG13	86 74 85 87 73 75 77 70 88 72
1800b COVID_19 without PRRA	86 74 85 76 87 78 73 84 75 77
Insert site relative codon 80	80 80 80 80 80 80 80 80 80 80
Comparing PREDICTED CLEAVAGE SITE with PRRA insert in COVID_19 real Spike
Top10 codons	1 2 3 4 5 6 7 8 9 10
600b COVID_19 with PRRA	99 89 91 88 103 92 107 87 102 104
900b COVID_19 with PRRA	90 89 91 88 92 103 107 87 102 93
1200b COVID_19 with PRRA	90 89 91 88 92 103 87 107 102 93
1500b COVID_19 with PRRA	90 89 91 88 92 103 87 107 93 102
1800b COVID_19 with PRRA	90 89 91 88 92 103 87 107 102 93
Insert PRRA Start (codon 81)	81 81 81 81 81 81 81 81 81 81
Insert PRRA End (codon 84)	84 84 84 84 84 84 84 84 84 84

The first part of Table 13 demonstrates the optimality of the "shear" form of the site at base 2040 (relative codon 80 with respect to the base 1800 reference). This holds for both Spike sequences, bat RaTG13 and COVID_19 without PRRA, and for the various lengths analyzed downstream of the PRRA point. The second part studies the effect of the PRRA insertion (codons 81-84) in the COVID_19 Spike.
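The relative addresses used in Table 13 follow from codon arithmetic. The sketch below, with hypothetical helper names, converts a 1-based Spike base position into its codon number, absolute or relative to the base 1800 reference. Under this convention base 2040 closes relative codon 80 (absolute codon 680), and PRRA at relative codons 81-84 corresponds to residues 681-684, the usual numbering of this insert in the SARS-CoV-2 Spike; base 1841, the D614G change discussed above, falls in codon 614.

```python
REFERENCE_BASE = 1800  # Table 13 addresses are counted from Spike base 1800 (a multiple of 3)


def codon_number(base: int) -> int:
    """1-based codon (residue) number containing a 1-based Spike base position."""
    return (base - 1) // 3 + 1


def relative_codon(base: int, reference: int = REFERENCE_BASE) -> int:
    """Codon number counted from the reference (relative codon 1 = bases 1801-1803)."""
    return codon_number(base - reference)


def absolute_residue(rel_codon: int, reference: int = REFERENCE_BASE) -> int:
    """Convert a relative codon back to the absolute Spike residue number."""
    return rel_codon + reference // 3


print(relative_codon(2040), codon_number(2040))       # 80 680
print([absolute_residue(c) for c in range(81, 85)])   # [681, 682, 683, 684]
print(codon_number(1841))                              # 614
```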

Figure 13: The PRRA insertion site was not chosen by chance.

The upper graph shows the optimality of relative codon 80 (base 2040 of Spike) as a theoretical optimal cleavage site, for bat RaTG13 as well as for COVID_19 without PRRA. It would seem that the synonymous codons within the 1770-base region located downstream of this site contribute to the conservation of this theoretical site all along the Spike. The lower graph shows the very slight offset from this theoretical site when the PRRA (codons 81-84) is inserted to constitute the real COVID_19 genome. (The blue 1200b and red 1800b curves of COVID_19 with PRRA are superimposed.)
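One simple way to put a number on how the Top 10 codons cluster around the insert site is to take the median address of each row of Table 13 and its offset from codon 80. The sketch below does this for the 600b and 1800b rows, transcribed by hand; the choice of the median as a summary is ours and is not part of the DNA Master Code analysis. For the rows with PRRA, addresses are in the coordinates of the real COVID_19 Spike, in which the insert occupies codons 81-84.

```python
from statistics import median

# Top 10 relative codon addresses from Table 13 (600b and 1800b rows only).
TOP10 = {
    "600b Bat RaTG13": [86, 74, 85, 87, 73, 75, 77, 88, 99, 70],
    "1800b Bat RaTG13": [86, 74, 85, 87, 73, 75, 77, 70, 88, 72],
    "600b COVID_19 without PRRA": [85, 87, 99, 103, 74, 84, 86, 88, 98, 100],
    "1800b COVID_19 without PRRA": [86, 74, 85, 76, 87, 78, 73, 84, 75, 77],
    "600b COVID_19 with PRRA": [99, 89, 91, 88, 103, 92, 107, 87, 102, 104],
    "1800b COVID_19 with PRRA": [90, 89, 91, 88, 92, 103, 87, 107, 102, 93],
}
INSERT_SITE = 80  # relative codon closed by base 2040


def summarize(rows, site=INSERT_SITE):
    """Median Top 10 address of each row and its signed offset from the site."""
    return {name: (median(codons), median(codons) - site) for name, codons in rows.items()}


for name, (mid, offset) in summarize(TOP10).items():
    print(f"{name:30s} median {mid:5.1f}  offset {offset:+5.1f}")
```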

Note that PRRA-like inserts could be manipulated using CRISPR RNA-editing technologies [23].

4. CONCLUSIONS

1) 18 RNA fragments with 80% or greater homology to human or simian retroviruses have been found in the COVID_19 genome.

2) These fragments are 18 to 30 nucleotides long and therefore have the potential to modify the gene expression of COVID_19. We have named them Exogenous Informative Elements (EIE).

3) These EIE are not dispersed randomly, but are concentrated in a small part of the COVID_19 genome.

4) Within this part, a 225-nucleotide region is unique to COVID_19 and bat RaTG13 and can discriminate and formally distinguish these 2 genomes.

5) During the declining phase of the epidemic, this 225-base area and the 1770-base Spike region exhibit an abnormally high rate of mutations/deletions (44 patient cases from WA state, Seattle, the original epicenter in the USA).

6) Comparing the SPIKE genes of COVID_19 and bat RaTG13, we note two abnormal facts:

· The insertion of the 4 contiguous amino acids PRRA in the middle of SPIKE (we show that this site was already an optimal cleavage site BEFORE this insertion).

· An abnormal ratio of synonymous to non-synonymous codons in the second half of SPIKE.

Finally, we show the insertion, in this 1770-base SPIKE region, of a significant Plasmodium yoelii EIE and of a possible HIV-1 EIE spanning the site of a crucial Spike mutation (D614G).

Through the 14 facts presented in the 14 sections of this article, everything converges towards possible laboratory manipulations (see the End Note below) that contributed to modifications of the COVID_19 genome, and very probably also of the much older SARS genome, perhaps with the dual objective of vaccine design and "gain of function" in terms of the virus's penetration into the cell.

This analysis, carried out in silico, is dedicated to the real authors of Coronavirus COVID_19. It is for them alone to describe their own experiments and why they turned into a world disaster: 650,000 lives lost (as of 26 July 2020), more than were taken by the two atomic bombs of Hiroshima and Nagasaki. We, the survivors, should learn from this serious warning for the future of humanity. We urge our fellow scientists and medical doctors to respect ethical rules as expressed by the Hippocratic Oath: do no harm, never, ever!

End Note: Why could COVID-19 come from Laboratory manipulations?

The following 4 proofs concern either differences from SARS that are common to COVID-19 and bat RaTG13, or facts radically differentiating these 2 sequences, the first of which (COVID-19) is claimed to result from the natural evolution of the second (bat RaTG13). We have ranked these 4 proofs in ascending order of importance, in our view.

1) Four EIE formally distinguish the COVID-19 and bat RaTG13 genomes from all other SARS or bat genomes. However, their level of HIV/SIV homology appears much stronger for COVID-19 than for bat RaTG13, as if these EIE fragments had recently been “re-injected” into the COVID-19 genome. ==> see §7 (Figures 4 and 5).

2) Natural deletions (USA WA state, Seattle) preferentially affect EIE inserts (HIV Kenya, etc.). ==> see all of Part III and Figure 12 in §13.

3) Synonymous codon mutations within the 1770-base region of the Spike, which simulate a natural evolution of bat RaTG13 towards COVID-19 while maintaining the optimality obtained in amino acid values, probably from “gain of function” laboratory experiments (optimality common to both RNA sequences, COVID-19 and bat RaTG13). ==> see Figure 10 in §11 and Figure 11 in §12.

4) The “PRRA” amino acids were inserted exactly at the Spike location that was already theoretically optimal in both COVID-19 and RaTG13 (this insert constitutes the main difference between them). ==> see Figure 13 in §14.

SOURCES OF FUNDING

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

CONFLICT OF INTEREST

The authors have declared that no competing interests exist.

ACKNOWLEDGMENT

For the many exchanges of information and key publications, we would like to thank Alain Bauer, Professor of Criminology at the Conservatoire National des Arts et Métiers, in New York and Shanghai, co-author of « Vivre au temps du Coronavirus », Cerf 2020 (ISBN: 978-2-204-14203-8, https://www.amazon.fr/Comment-vivre-temps-coronavirus-comprendre-ebook/dp/B08BFBS5QW), and Professor Fernando Castro-Chavez, PhD, Universidad de Guadalajara, MX, former Postdoc, Pharmacology, New York Medical College (NYMC), NY, USA: https://tinyurl.com/Anticovidian2.

SUPPLEMENTARY FILE

REFERENCES

[1] WHO-SARS, https://www.who.int/ith/diseases/sars/en/

[2] WHO-MERS, https://www.who.int/emergencies/mers-cov/en/

[3] Perez, J.C, 2020/02/13, Wuhan nCoV-2019 SARS Coronaviruses Genomics Fractal Metastructures Evolution and Origins, DOI: 10.20944/preprints202002.0025.v2, ResearchGate: https://www.researchgate.net/publication/339331507_Wuhan_nCoV-2019_SARS_Coronaviruses_Genomics_Fractal_Metastructures_Evolution_and_Origins

[4] Lyons Weiler J., 2020, 1-30-2020, On the origins of the 2019 ncov virus wuhan china, https://jameslyonsweiler.com/2020/01/30/on-the-origins-of-the-2019-ncov-virus-wuhan-china/

[5] Perez J.C, (2020). “WUHAN COVID-19 SYNTHETIC ORIGINS AND EVOLUTION.” International Journal of Research - Granthaalayah, 8(2), 285-324. https://doi.org/10.5281/zenodo.3724003.

[6] Perez J.C, Codex biogenesis - Les 13 codes de l'ADN (French Edition) [Jean-Claude ... 2009); Language: French; ISBN-10: 2874340448; ISBN-13: 978-2874340444 https://www.amazon.fr/Codex-Biogenesis-13-codes-lADN/dp/2874340448.

[7] Perez J.C, Deciphering Hidden DNA Meta-Codes - The Great Unification & Master Code of Biology. J Glycomics Lipidomics 5:131, 2015, doi: 10.4172/2153-0637.1000131, https://www.longdom.org/abstract/deciphering-hidden-dna-metacodes-the-great-unification-amp-master-code-of-biology-11590.html

[8] Perez, J.C. Six Fractal Codes of Biological Life: Perspectives in Exobiology, Cancers Basic Research and Artificial Intelligence Biomimetism Decisions Making. Preprints 2018, 2018090139 (doi: 10.20944/preprints201809.0139.v1). https://www.preprints.org/manuscript/201809.0139/v1

[9] Land A.M. et al., Human immunodeficiency virus (HIV) type 1 proviral hypermutation correlates with CD4 count in HIV-infected women from Kenya, J Virol. 2008 Aug;82(16):8172-82. Epub 2008 Jun 11. DOI: 10.1128/JVI.01115-08, https://www.ncbi.nlm.nih.gov/pubmed/18550667

[10] Venkatesan P, Franck Alla Plummer, The Lancet Infectious Diseases, April 2020, DOI: https://doi.org/10.1016/S1473-3099(20)30188-2, https://www.thelancet.com/pdfs/journals/laninf/PIIS1473-3099(20)30188-2.pdf

[11] Perez, J. Epigenetics Theoretical Limits of Synthetic Genomes: The Cases of Artificial Caulobacter (C. eth-2.0), Mycoplasma Mycoides (JCVI-Syn 1.0, JCVI-Syn 3.0 and JCVI_3A), E-coli and YEAST chr XII. Preprints 2019, 2019070120 (doi: 10.20944/preprints201907.0120.v1). https://www.preprints.org/manuscript/201907.0120/v1

[12] Zhou, P et al, 2020, A pneumonia outbreak associated with a new coronavirus of probable bat origin, Nature 579 (7798), 270-273 (2020), DOI: 10.1038/s41586-020-2012-7

[13] FISABIO, 2020, http://fisabio.san.gva.es/web/fisabio/noticia/-/asset_publisher/1vZL/content/secuenciacion-coronavirus.

[14] Andersen, K.G., Rambaut, A., Lipkin, W.I. et al. The proximal origin of SARS-CoV-2. Nat Med (2020). https://doi.org/10.1038/s41591-020-0820-9

[15] Prashant Pradhan et al., Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag, https://www.biorxiv.org/content/10.1101/2020.01.30.927871v1. This bioRxiv preprint was withdrawn by the authors.

[16] Yuanchen Ma et al., 2020-2-27, ACE2 shedding and furin abundance in target organs may influence the efficiency of SARS-CoV-2 , http://www.chinaxiv.org/abs/202002.00082

[17] Xiaolu Tang, Changcheng Wu, Xiang Li, Yuhe Song, Xinmin Yao, Xinkai Wu, Yuange Duan, Hong Zhang, Yirong Wang, Zhaohui Qian, Jie Cui, Jian Lu, On the origin and continuing evolution of SARS-CoV-2, National Science Review, nwaa036, https://doi.org/10.1093/nsr/nwaa036

[18] Lu, R et al., 2020. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet. https://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2820%2930251-8/fulltext

[19] Wei Ji, et al., Homologous recombination within the spike glycoprotein of the newly identified coronavirus 2019-nCoV may boost cross-species transmission from snake to human, https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/jmv.2568220.

[20] Peng Zhou et al, Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin, BioRxiv, January 2020, https://doi.org/10.1101/2020.01.22.914952

[21] Leoz M, Feyertag F, Kfutwah A, Mauclère P, Lachenal G, et al. (2015) The Two-Phase Emergence of Non Pandemic HIV-1 Group O in Cameroon. PLOS Pathogens 11(8): e1005029. https://doi.org/10.1371/journal.ppat.1005029

[22] Hangping Yao, et al., Patient-derived mutations impact pathogenicity of SARS-CoV-2, medRxiv 2020.04.14.20060160; doi: https://doi.org/10.1101/2020.04.14.20060160

[23] D. B. T. Cox et al., RNA editing with CRISPR-Cas13 , Science 24 Nov 2017: Vol. 358, Issue 6366, pp. 1019-1027, DOI: 10.1126/science.aaq0180

[24] LaRinda A. Holland et al, An 81 nucleotide deletion in SARS-CoV-2 ORF7a identified from sentinel surveillance in Arizona (Jan-Mar 2020), Journal of Virology (2020). DOI: 10.1128/JVI.00711-20

[25] ue Wu Zhang et al., Structural similarity between HIV1 gp41 and SARS-CoV S2 proteins suggests an analogous membrane fusion mechanism, May 2004, Journal of Molecular Structure: THEOCHEM 677(1):73-76, DOI: 10.1016/j.theochem.2004.02.018

[26] Pilani et al., In silico comparison of spike protein-ACE2 binding affinities across species; significance for the possible origin of the SARS-CoV-2 virus, https://arxiv.org/abs/2005.06199

[27] Perez, J., & Montagnier, L. (2020, April 25). COVID-19, SARS and Bats Coronaviruses Genomes unexpected Exogeneous RNA Sequences. https://doi.org/10.31219/osf.io/d9e5g

[28] Seong-Tshool Hong et al., The emergence of SARS-CoV-2 by an unusual genome reconstitution, DOI 10.21203/rs.3.rs-33201/v1, https://www.researchsquare.com/article/rs-33201/v1

[29] Zhang, M., Kaneko, I., Tsao, T. et al. A highly infectious Plasmodium yoelii parasite, bearing Plasmodium falciparum circumsporozoite protein. Malar J 15, 201 (2016).

[30] F. Castro-Chavez, (June 2020), Anticovidian v.2: COVID-19: Hypothesis of the Lab Origin versus a Zoonotic Event Which Can Also be of a Lab Origin, GJSFR (Submitted; to appear in: [https://pubmed.ncbi.nlm.nih.gov/?term=%22Castro-Chavez%20F%22])

[31] Perez JC (2018) The Optimal Multi-Isotopic Atomic Code of Life: Perspectives in Astrobiology. Astrobiol Outreach 6: 165. doi: 10.4172/2332-2519.1000165, https://www.longdom.org/open-access/the-optimal-multiisotopic-atomic-code-of-life-perspectives-in-astrobiology-2332-2519-1000166.pdf

[32] Zhang et al., The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity, doi: https://doi.org/10.1101/2020.06.12.148726

[33] A Bauer & R. Sachez, Vivre au temps du Coronavirus, Cerf 2020 (ISBN: 978-2-204-14203-8).

[34] Sorensen, B. et al., Biovacc-19: A Candidate Vaccine for Covid-19 (SARS-CoV-2) Developed from Analysis of its General Method of Action for Infectivity, DOI: https://doi.org/10.1017/qrd.2020.8, Published online by Cambridge University Press: 02 June 2020.
