CoV2D BrowserTM

Parikh vectors
5MMO_1 4FAY_1 9CRI_1 Letter Amino acid
5 8 62 Q Glutamine
18 10 18 H Histidine
7 9 77 F Phenylalanine
6 13 57 P Proline
17 17 100 S Serine
22 17 97 V Valine
16 16 60 D Aspartic acid
1 5 40 C Cysteine
15 15 77 I Isoleucine
12 15 96 T Threonine
1 1 12 W Tryptophan
18 12 47 E Glutamic acid
25 36 83 G Glycine
16 21 108 L Leucine
6 8 14 M Methionine
6 3 53 Y Tyrosine
12 22 79 A Alanine
12 10 41 R Arginine
7 8 90 N Asparagine
12 12 62 K Lysine
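
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch, not the site's own code; the name `parikh_vector` and the constant `AMINO_ORDER` (the row order of the table above) are illustrative assumptions.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
AMINO_ORDER = "QHFPSVDCITWEGLMYARNK"

def parikh_vector(seq, alphabet=AMINO_ORDER):
    """Return the letter counts of seq, listed in the order of alphabet."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]

# Usage: the first 15 residues of the 5MMO_1 sequence.
prefix = "MRGSHHHHHHGIDHM"
vec = parikh_vector(prefix)
print(dict(zip(AMINO_ORDER, vec)))
```

As a sanity check on the table, the entries of a Parikh vector sum to the length of the word; the 5MMO_1 column sums to 234.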

>5MMO_1|Chain A|DNA gyrase subunit B|Escherichia coli O157:H7 (83334)
>4FAY_1|Chains A, B, C|Microcompartments protein|Lactobacillus reuteri (557436)
>9CRI_1|Chains A, B, C|Spike glycoprotein|Severe acute respiratory syndrome coronavirus 2 (2697049)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5MMO , Knot 105 234 0.81 40 150 220
MRGSHHHHHHGIDHMSNSYDSSSIKVLKGLDAVRKRPGMYIGDTDDGTGLHHMVFEVVDNAIDEALAGHCKEIIVTIHADNSVSVQDDGRGIPTGIHPEEGVSAAEVIMTVLHAGGKFDDNSYKVSGGLHGVGVSVVNALSQKLELVIQREGKIHRQIYEHGVPQAPLAVTGETEKTGTMVRFWPSLETFTNVTEFEYEILAKRLRELSFLNSGVSIRLRDKRDGKEDHFHYEG
4FAY , Knot 111 258 0.79 40 165 251
MGSSHHHPPKSSGLVPRGSHMNDFLNSTSTVPEFVGASEIGDTIGMVIPRVDQQLLDKLHVTKQYKTLGILSDRTGAGPQIMAMDEGIKATNMECIDVEWPRDTKGGGGHGCLIIIGGDDPADARQAIRVALDNLHRTFGDVYNAKAGHLELQFTARAAGAAHLGLGAVEGKAFGLICGCPSGIGVVMGDKALKTAGVEPLNFTSPSHGTSFSNEGCLTITGDSGAVRQAVMAGREVGLKLLSQFGEEPVNDFPSYIK
9CRI , Knot 454 1273 0.85 40 339 1106
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASIEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVSNHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATKFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGNIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSHRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQNVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT
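
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below assumes only the formula in the table header; the function name `normalized_lz` is hypothetical.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ-complexity divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"5MMO": (105, 234), "4FAY": (111, 258), "9CRI": (454, 1273)}
for code, (lz, n) in rows.items():
    print(code, normalized_lz(lz, n))
```

The results agree with the tabulated 0.81, 0.79 and 0.85 if the table truncates to two decimals rather than rounding.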

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\); here \(n\) denotes the subword length, not \(|w|\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
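
As an illustration of these definitions, the sketch below collects the contiguous subwords of a fixed length and counts them. The function names `subwords` and `subword_complexity` are illustrative, not taken from the site.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

# Usage on a toy word: "abab" has subwords "ab" and "ba" of length 2.
print(subword_complexity("abab", 2))  # 2
```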

\(|P_{f(5MMO_1)}(2) \setminus P_{f(4FAY_1)}(2)|=57\), \(|P_{f(4FAY_1)}(2) \setminus P_{f(5MMO_1)}(2)|=72\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101000000011001000000001011011011000111011000010110011101100110011110000111010100010100010111011010011011011101101110100000010111011110110110001011100010100010001110111110100000101101110100100100100011100100101100110101000001000010001
Pair \(Z_2\) Length of longest common subword (contiguous)
5MMO_1,4FAY_1 129 4
5MMO_1,9CRI_1 209 4
4FAY_1,9CRI_1 198 4
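
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets; dividing it by the size of their union would give the Jaccard distance. A minimal sketch follows, with hypothetical function names.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Usage on toy words: only "bb" separates the two sets of 2-letter subwords.
print(z_distance("abab", "abba", 2))  # 1
```

For the pair 5MMO_1, 4FAY_1 the two one-sided counts quoted above, 57 and 72, add up to the tabulated \(Z_2 = 129\).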

Newick tree

 
(
	9CRI_1:11.47,
	(
		5MMO_1:64.5,4FAY_1:64.5
	):46.97
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{492 }{\log_{20} 492}-\frac{234}{\log_{20}234})=74.3\)
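
The arithmetic of this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the source as-is; 492 equals 234 + 258, presumably the length of the concatenation of the two sequences, which is an interpretation rather than something the page states.

```python
import math

def log20(x):
    """Logarithm to base 20 (the size of the amino acid alphabet)."""
    return math.log(x, 20)

n_x, n_y = 234, 258        # lengths of 5MMO_1 and 4FAY_1
n_xy = n_x + n_y           # 492, assumed to be the concatenated length
expected = 0.85 * 0.8 * (n_xy / log20(n_xy) - n_x / log20(n_x))
print(round(expected, 1))  # 74.3
```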
Status Protein1 Protein2 d d1/2
Query variables 5MMO_1 4FAY_1 90 85

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
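
The interpolation above can be sketched in code under two assumptions that the page does not spell out. The first is the definition from Otu and Sayood, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\). The second is that \(c\) is the number of phrases in the LZ76 exhaustive parsing. The site's own implementation may differ, and all function names here are hypothetical.

```python
def lz76(s):
    """Number of phrases in the LZ76 exhaustive parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from an earlier start
        # position (overlap with the phrase itself is allowed).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y, c=lz76):
    """The two one-sided complexity increments c(xy)-c(x) and c(yx)-c(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def otu_sayood(x, y, c=lz76):
    """Return (d, d1) under the assumed Otu-Sayood definitions."""
    a, b = increments(x, y, c)
    return max(a, b), a + b

def delta(x, y, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Usage on toy words: alpha = 0 recovers d, alpha = 1/2 recovers d1 / 2.
d, d1 = otu_sayood("abab", "aaaa")
print(d, d1, delta("abab", "aaaa", 0), delta("abab", "aaaa", 0.5))
```

With the tabulated values \(d=90\) and \(d_1/2=85\) for 5MMO_1 and 4FAY_1, the same identity implies that the smaller of the two increments is 80.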