CoV2D Browser™

Parikh vectors
8BSK_1 2ICP_1 2KVE_1 Letter Amino acid
7 7 3 R Arginine
18 4 1 N Asparagine
9 1 0 H Histidine
25 12 7 L Leucine
2 2 1 W Tryptophan
12 0 3 Y Tyrosine
17 3 7 D Aspartic acid
11 0 2 C Cysteine
13 5 2 M Methionine
20 1 0 F Phenylalanine
19 11 5 A Alanine
28 3 3 G Glycine
25 9 3 S Serine
12 5 3 T Threonine
27 6 2 V Valine
9 4 1 Q Glutamine
15 7 4 E Glutamic acid
12 5 4 I Isoleucine
19 4 12 K Lysine
15 5 2 P Proline
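
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch, not the site's own code; the function name `parikh_vector` and the constant `ALPHABET` are illustrative, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the Parikh vector table.
ALPHABET = "RNHLWYDCMFAGSTVQEIKP"


def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> list[int]:
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


# Sequence of 2KVE_1 as listed on this page.
seq_2kve = "MGKYDKQIDLSTVDLKKLRVKELKKILDDWGETCKGCAEKSDYIRKINELMPKYAPKAASARTDL"
vector_2kve = parikh_vector(seq_2kve)
print(dict(zip(ALPHABET, vector_2kve)))
```

The entries of the vector sum to the sequence length, which gives a quick consistency check against the Length column further down the page.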

>8BSK_1|Chain A|Glutaminase kidney isoform, mitochondrial 65 kDa chain|Homo sapiens (9606)
>2ICP_1|Chain A|antitoxin higa|Escherichia coli (199310)
>2KVE_1|Chain A|Mesencephalic astrocyte-derived neurotrophic factor|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8BSK , Knot 145 315 0.88 40 207 308
SMIPDFMSFTSHIDELYESAKKQSGGKVADYIPQLAKFSPDLWGVSVCTVDGQRHSTGDTKVPFCLQSCVKPLKYAIAVNDLGTEYVHRYVGKEPSGLRFNKLFLNEDDKPHNPMVNAGAIVVTSLIKQGVNNAEKFDYVMQFLNKMAGNEYVGFSNATFQSERESGDRNFAIGYYLKEKKCFPEGTDMVGILDFYFQLCSIEVTCESASVMAATLANGGFCPITGERVLSPEAVRNTLSLMHSCGMYDFSGQFAFHVGLPAKSGVAGGILLVVPNVMGMMCWSPPLDKMGNSVKGIHFCHDLVSLCNFHNYDNL
2ICP , Knot 50 94 0.80 36 82 91
MKMANHPRPGDIIQESLDELNVSLREFARAMEIAPSTASRLLTGKAALTPEMAIKLSVVIGSSPQMWLNLQNAWSLAEAEKTVDVSRLRRLVTQ
2KVE , Knot 36 65 0.77 36 56 62
MGKYDKQIDLSTVDLKKLRVKELKKILDDWGETCKGCAEKSDYIRKINELMPKYAPKAASARTDL
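
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below recomputes that ratio from the tabulated values. It also includes one common reading of LZ76 complexity (the number of components in the exhaustive history); whether the page uses exactly this variant is an assumption, so `lz76_complexity` is not claimed to reproduce the LZ column. Both function names are illustrative.

```python
import math


def lz76_complexity(word: str) -> int:
    """Count components of the LZ76 exhaustive history of `word`.

    Assumption: each component is the shortest prefix of the remaining text
    that cannot be copied from the text before its last symbol.
    """
    i, components, n = 0, 0, len(word)
    while i < n:
        length = 1
        # Extend the candidate while it still occurs in the earlier text.
        while i + length <= n and word[i:i + length] in word[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_A n) for a word of length n over A letters."""
    return lz / (n / math.log(n, alphabet_size))


# Classic binary example, parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))

# (code, LZ(w), n) triples copied from the table above.
for code, lz, n in [("8BSK", 145, 315), ("2ICP", 50, 94), ("2KVE", 36, 65)]:
    print(code, normalized_lz(lz, n))
```

The recomputed ratios come out near 0.884, 0.807 and 0.772, which suggests the table truncates to two decimals rather than rounding.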

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
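
As a minimal sketch of these definitions (the helper names `subwords` and `subword_complexity` are illustrative, not taken from the site), the set of length-\(k\) subwords and its cardinality can be computed with a sliding window:

```python
def subwords(word: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def subword_complexity(word: str, k: int) -> int:
    """Return the number of distinct subwords of length k."""
    return len(subwords(word, k))


# Sequence of 2KVE_1 as listed on this page.
seq_2kve = "MGKYDKQIDLSTVDLKKLRVKELKKILDDWGETCKGCAEKSDYIRKINELMPKYAPKAASARTDL"
print(subword_complexity(seq_2kve, 2), subword_complexity(seq_2kve, 3))
```

For 2KVE_1 this gives 56 subwords of length 2 and 62 of length 3, in agreement with the corresponding table columns.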

\(|P_{f(8BSK_1)}(2) \setminus P_{f(2ICP_1)}(2)|=162\), \(|P_{f(2ICP_1)}(2) \setminus P_{f(8BSK_1)}(2)|=37\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
011101101000100100010000110110011011010101111010010100000100011101000101100111100110001000110010110100111000001001110111111001100110010010011011001110001110010100000010001111001000001101001111101010100101000010111101101110110100110101100010110001100101011101111100111111111110111110101110011001011010001101001000001
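
The hydrophobic-polar string above can be reproduced with a simple two-class mapping. The page does not state which residues it treats as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the leading residues of 8BSK_1 with the leading bits of the string, and the names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic class, inferred from the page's own example;
# every other residue is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First 20 residues of 8BSK_1.
print(hp_encode("SMIPDFMSFTSHIDELYESA"))
```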
Pair \(Z_2\) Length of longest common subsequence
8BSK_1,2ICP_1 199 3
8BSK_1,2KVE_1 185 3
2ICP_1,2KVE_1 106 3
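
The \(Z_k\) column is the size of the symmetric difference of the two subword sets; for the first pair, \(162+37=199\). A minimal sketch under that reading, with illustrative helper names:

```python
def subwords(word: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: "ABBA" contains "BB", which "ABAB" lacks; nothing else differs.
print(z_distance("ABAB", "ABBA", 2))
```

Dividing by \(|P_x(k)\cup P_y(k)|\) would turn this numerator into a Jaccard distance proper.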

Newick tree

 
[
	8BSK_1:10.62,
	[
		2KVE_1:53,2ICP_1:53
	]:53.62
]
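
The bracketed tree above can be written as a standard Newick string. The sketch below assumes a hypothetical nested-tuple representation, `(label_or_children, branch_length)`, chosen only for illustration; the branch lengths are copied from the tree as shown.

```python
def to_newick(node) -> str:
    """Serialize a (label_or_children, branch_length) node to Newick text.

    A leaf carries a string label; an internal node carries a list of
    child nodes. A branch length of None is omitted.
    """
    label, length = node
    if isinstance(label, list):
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length}"


tree = ([("8BSK_1", 10.62), ([("2KVE_1", 53), ("2ICP_1", 53)], 53.62)], None)
newick = to_newick(tree) + ";"
print(newick)
```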

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{409 }{\log_{20} 409}-\frac{94}{\log_{20}94})=96.3\)
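
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the formula as given (the page does not explain them), and 409 coincides with \(315+94\), the combined length of 8BSK_1 and 2ICP_1; the function name `expected_distance` is illustrative.

```python
import math


def expected_distance(n_total: int, n_part: int,
                      factor_a: float = 0.85, factor_b: float = 0.8,
                      alphabet_size: int = 20) -> float:
    """Evaluate factor_a * factor_b * (g(n_total) - g(n_part)), g(n) = n / log_A n."""
    def g(n: int) -> float:
        return n / math.log(n, alphabet_size)

    return factor_a * factor_b * (g(n_total) - g(n_part))


print(expected_distance(409, 94))
```

In floating point this evaluates to about 96.4, which agrees with the quoted 96.3 up to rounding of intermediate values.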
Status Protein1 Protein2 d d1/2
Query variables 8BSK_1 2ICP_1 123 79.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
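
The interpolation can be illustrated numerically. The sketch assumes that min and max refer to the two one-sided quantities whose maximum is \(d\) and whose sum is \(d_1\). The inputs 36 and 123 are back-solved from the table row above (\(d=123\), \(d_1/2=79.5\), hence \(\min = 2\cdot 79.5-123=36\)), not read from the page.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# One-sided quantities back-solved from d = 123 and d1/2 = 79.5.
print(delta(0, 36, 123))    # alpha = 0 recovers d
print(delta(0.5, 36, 123))  # alpha = 1/2 recovers d1/2
```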