CoV2D BrowserTM

Parikh vectors
9IJM_1 5HRA_1 7DZZ_1 Letter Amino acid
19 5 15 P Proline
2 6 6 Y Tyrosine
20 15 18 R Arginine
25 11 9 Q Glutamine
2 2 6 W Tryptophan
17 19 11 I Isoleucine
26 25 23 L Leucine
16 6 5 M Methionine
10 4 10 N Asparagine
20 9 15 D Aspartic acid
3 6 1 C Cysteine
31 22 22 E Glutamic acid
9 9 10 H Histidine
13 9 14 F Phenylalanine
19 11 22 V Valine
17 26 21 A Alanine
21 21 22 G Glycine
15 5 8 K Lysine
23 9 14 S Serine
13 15 16 T Threonine
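
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of how one column of the table above could be computed from a FASTA sequence; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino acid letter."""
    counts = Counter(sequence)
    # Report every letter, including those that never occur.
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}


# Usage on the first ten residues of 9IJM_1.
vector = parikh_vector("MDDEDNKCDC")
print(vector["D"], vector["C"], vector["W"])
```

Applying the same function to a full sequence gives one count per letter, which is what each column of the table lists.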

>9IJM_1|Chains A, B|PomB|Vibrio alginolyticus (663)
>5HRA_1|Chains A, B|aspartate/glutamate racemase|Escherichia coli O157:H7 str. SS52 (1330457)
>7DZZ_1|Chains A, B|Phytanoyl-CoA dioxygenase|uncultured bacterium esnapd13 (1366593)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9IJM , Knot 138 321 0.82 40 193 302
MDDEDNKCDCPPPGLPLWMGTFADLMSLLMCFFVLLLSFSEMDVLKFKQIAGSMKFAFGVQNQLEVKDIPKGTSIIAQEFRPGRPEPTPIDVIMQQTMDITQQTLEFHEGESDRAGGTKRDEGKLTGGQSPATSTQNNESAEADMQQQQSKEMSQEMETLMESIKKALEREIEQGAIEVENLGQQIVIRMREKGAFPEGSAFLQPKFRPLVRQIAELVKDVPGIVRVSGHTDNRPLDSELYRSNWDLSSQRAVSVAQEMEKVRGFSHQRLRVRGMADTEPLLPNDSDDNRALNRRVEISIMQGEPLYSEEVPVIQHHHHHH
5HRA , Knot 106 235 0.82 40 145 224
MKTIGLLGGMSWESTIPYYRLINEGIKQRLGGLHSAQVLLHSVDFHEIEECQRRGEWDKTGDILAEAALGLQRAGAEGIVLCTNTMHKVADAIESRCTLPFLHIADATGRAITGAGMTRVALLGTRYTMEQDFYRGRLTEQFSINCLIPEADERAKINQIIFEELCLGQFTEASRAYYAQVIARLAEQGAQGVIFGCTEIGLLVPEERSVLPVFDTAAIHAEDAVAFMLSLEHHH
7DZZ , Knot 125 268 0.87 40 181 261
QIMEPHDTLSPAQVDEYRKNGFLVQEHVFDEEEIELLRAEAAQEFASGGERVTVEQNTGIVRGVHGCHLYSEVFGRLVRSPRLLPIARQLLRDDVYVHQFKINAKRAFKGEVWEWHQDYTFWHHEDGMPAPRALSAAIFLDEVTEFNGPLTFVPGGHGSGMIDADVKGEGWANTLTASLKYSLDVETMRGLIERNGMVAPKGPRGSVLWFDANIPHSSVPNISPFDRGLVLITYNSVENKTDVTRGTRPEWLAARDFTPLTALQATSF
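
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\); for 9IJM this is \(138/(321/\log_{20} 321)\approx 0.828\), which the table shows as 0.82. The sketch below illustrates that arithmetic together with one common reading of LZ76 complexity (exhaustive-history parsing, where each new component is the shortest block not yet seen as an earlier subword). The page does not state which LZ variant it uses, so `lz76_complexity` is an assumption and is not claimed to reproduce the tabulated values; both function names are hypothetical.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of components in an LZ76-style exhaustive-history parsing of w."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the block while it can be copied from a start position before i
        # (the copy may overlap the block itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet(n), as in the table's fourth column."""
    return lz / (n / math.log(n, alphabet_size))


# Normalization check using the tabulated LZ-complexity and length of 9IJM.
print(normalized_lz(138, 321))
```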

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
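
As a small illustration of these definitions, the sketch below collects the distinct contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` are hypothetical, and the code follows the definition literally; it has not been checked against the tabulated columns.

```python
def subwords(w: str, n: int) -> set[str]:
    """The set P_w(n) of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """The cardinality p_w(n) of P_w(n)."""
    return len(subwords(w, n))


# Toy example: ABAB has the length-2 subwords AB and BA.
print(sorted(subwords("ABAB", 2)))
print(subword_complexity("ABAB", 3))
```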

\(|P_{f(9IJM_1)}(2) \setminus P_{f(5HRA_1)}(2)|=108\), \(|P_{f(5HRA_1)}(2) \setminus P_{f(9IJM_1)}(2)|=60\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100000000011111111110110110111011111101001011010011101011111000101001101001110010110101011011100010100001010010000111000001010110011000000001010100000001000100110010011000100111010011001110100011110101110101011100110110011111010100000110001000010100001101100100101100001010111000111100000001100010101101011000011110000000
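
The hydrophobic-polar string can be reproduced by a per-residue lookup. The page does not list which residues it treats as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W mapped to 1, all others to 0) is an assumption inferred by comparing the start and end of the displayed string with the 9IJM_1 sequence. The names `HYDROPHOBIC` and `hp_string` are illustrative.

```python
# Assumed hydrophobic residues, inferred from the displayed string (not stated on the page).
HYDROPHOBIC = set("AFGILMPVW")


def hp_string(sequence: str) -> str:
    """Map each residue to '1' if assumed hydrophobic, otherwise '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)


# First twenty residues of 9IJM_1.
print(hp_string("MDDEDNKCDCPPPGLPLWMG"))  # prints 10000000001111111111
```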
Pair \(Z_2\) Length of longest common subword (contiguous)
9IJM_1,5HRA_1 168 4
9IJM_1,7DZZ_1 150 4
5HRA_1,7DZZ_1 150 4
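
The first row of this table follows from the two set differences quoted earlier: \(108+60=168\). A minimal sketch of \(Z_k\) as the size of the symmetric difference of the two subword sets, with hypothetical function names, could look like this:

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: only BB separates the two sets of length-2 subwords.
print(z_distance("ABAB", "ABBA", 2))  # prints 1
```

Dividing by the size of the union \(|P_x(k)\cup P_y(k)|\) would turn this numerator into a Jaccard distance.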

Newick tree

 
(5HRA_1:81.11,(9IJM_1:75,7DZZ_1:75):6.11);

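Let \(d\) be the Otu--Sayood distance \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\), where \(c\) denotes LZ-complexity and \(xy\) is the concatenation of \(x\) and \(y\).
Let \(d_1\) be the Otu--Sayood distance \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\). (This makes the 4TYN sequence AAAAAA a close match...)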
A rough expected distance is \((0.85)(0.8)\left(\frac{556}{\log_{20} 556}-\frac{235}{\log_{20}235}\right)=91.5\), where \(556=321+235\) is the combined length of the two query sequences 9IJM_1 and 5HRA_1.
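
The arithmetic behind this estimate can be checked directly. In the sketch below the function name and parameter names are illustrative; the factors 0.85 and 0.8 and the lengths 556 and 235 are taken from the formula as given.

```python
import math


def expected_distance(n_concat: int, n_single: int,
                      alphabet_size: int = 20,
                      factor: float = 0.85 * 0.8) -> float:
    """factor * (n_concat / log(n_concat) - n_single / log(n_single)), logs to base alphabet_size."""
    def scale(n: int) -> float:
        return n / math.log(n, alphabet_size)
    return factor * (scale(n_concat) - scale(n_single))


print(round(expected_distance(556, 235), 1))  # prints 91.5
```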
Status Protein1 Protein2 d d1/2
Query variables 9IJM_1 5HRA_1 115 99.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
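
As a worked check of this interpolation, the sketch below assumes that "min" and "max" refer to the smaller and larger of two one-sided quantities; that reading is an assumption, since the page does not spell it out. With the tabulated values \(d=115\) and \(d_1/2=99.5\), the larger quantity is 115 and the smaller one must be \(2(99.5)-115=84\).

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Values inferred from the table row for 9IJM_1 and 5HRA_1 (84 is derived, not tabulated).
smaller, larger = 84, 115
print(delta(0, smaller, larger))    # alpha = 0 gives the maximum, i.e. d
print(delta(0.5, smaller, larger))  # alpha = 1/2 gives the average, i.e. d1/2
```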