CoV2D Browser™

Parikh vectors
5UAW_1 2CMG_1 6DON_1 Letter Amino acid
28 9 10 G Glycine
17 21 11 I Isoleucine
3 8 5 Y Tyrosine
40 15 6 A Alanine
4 2 0 C Cysteine
35 33 10 L Leucine
15 26 14 K Lysine
0 1 6 W Tryptophan
5 12 6 N Asparagine
11 8 3 Q Glutamine
16 19 16 E Glutamic acid
16 13 2 H Histidine
10 19 2 F Phenylalanine
25 12 7 S Serine
16 17 7 D Aspartic acid
10 8 1 M Methionine
13 9 5 P Proline
18 9 10 T Threonine
27 13 9 V Valine
13 8 6 R Arginine
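The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of how such a row could be computed is given below; the function name `parikh_vector` and the toy input are illustrative assumptions, and the letter order is copied from the table above.

```python
from collections import Counter

# Letter order taken from the table above (one-letter amino-acid codes).
ALPHABET = "GIYACLKWNQEHFSDMPTVR"


def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> list[int]:
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # Toy sequence (hypothetical, not one of the proteins on this page).
    toy = "MKVLAAGG"
    for letter, count in zip(ALPHABET, parikh_vector(toy)):
        if count:
            print(letter, count)
```

The entries of a Parikh vector sum to the sequence length; for 5UAW_1 the first column above sums to 322.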

>5UAW_1|Chains A, B, C, D, E|Pyrroline-5-carboxylate reductase 1, mitochondrial|Homo sapiens (9606)
>2CMG_1|Chains A, B|SPERMIDINE SYNTHASE|HELICOBACTER PYLORI (85962)
>6DON_1|Chain A|Ribonuclease H|Bacillus halodurans (86665)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5UAW , Knot 130 322 0.77 38 176 297
MHHHHHHSSGVDLGTENLYFQSMSVGFIGAGQLAFALAKGFTAAGVLAAHKIMASSPDMDLATVSALRKMGVKLTPHNKETVQHSDVLFLAVKPHIIPFILDEIGADIEDRHIVVSCAAGVTISSIEKKLSAFRPAPRVIRCMTNTPVVVREGATVYATGTHAQVEDGRLMEQLLSSVGFCTEVEEDLIDAVTGLSGSGPAYAFTALDALADGGVKMGLPRRLAVRLGAQALLGAAKMLLHSEQHPGQLKDNVSSPGGATIHALHVLESGGFRSLLINAVEASCIRTRELQSMADQEQVSPAAIKKTILDKVKLDSPAGTAL
2CMG , Knot 115 262 0.81 40 159 244
MWITQEITPYLRKEYTIEAKLLDVRSEHNILEIFKSKDFGEIAMLNRQLLFKNFLHIESELLAHMGGCTKKELKEVLIVDGFDLELAHQLFKYDTHIDFVQADEKILDSFISFFPHFHEVKNNKNFTHAKQLLDLDIKKYDLIFCLQEPDIHRIDGLKRMLKEDGVFISVAKHPLLEHVSMQNALKNMGGVFSVAMPFVAPLRILSNKGYIYASFKTHPLKDLMTPKIEALTSVRYYNEDIHRAAFALPKNLQEVFKDNIKS
6DON , Knot 65 136 0.78 38 108 134
EEIIWESLSVDVGSQGNPGIVEYKGVDTKTGEVLFEREPIPIGTNNMGEFLAIVHGLRYLKERNSRKPIYSDSQTAIKWVKDKKAKSTLVRNEETALIWKLVDEAEEWLNTHTYETPILKWQTDKWGEIKADYGRK
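The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one way to compute both quantities. It assumes the exhaustive-history parsing of Lempel and Ziv (1976); the page does not state which LZ variant it uses, so `lz76_complexity` may not reproduce the LZ values in the table. Both function names are hypothetical.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from an occurrence
        # that starts before position i (overlap with the phrase is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Classic example from the LZ76 literature: 0|001|10|100|1000|101
    print(lz76_complexity("0001101001000101"))  # 6
    # Ratio for the 5UAW row, using the LZ value and length from the table.
    print(normalized_lz(130, 322))
```

With the table's values, the ratios come out slightly above the printed 0.77, 0.81 and 0.78, which suggests the page truncates to two decimals rather than rounding.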

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
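As a small illustration of these definitions, the following sketch collects the distinct subwords of a given length and counts them. The names `subwords` and `subword_complexity` are assumptions made for this example, not identifiers used by the site.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    toy = "ABABC"  # toy word, not a protein from this page
    print(sorted(subwords(toy, 2)))    # ['AB', 'BA', 'BC']
    print(subword_complexity(toy, 3))  # 3
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\); for example, the value 134 for 6DON in the last count column equals \(136-2\), so all of its length-3 subwords are distinct.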

\(|P_{f(5UAW_1)}(2) \setminus P_{f(2CMG_1)}(2)|=84\), \(|P_{f(2CMG_1)}(2) \setminus P_{f(5UAW_1)}(2)|=67\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 1000000001101100010100101111111011111101101111111001110010101101011001110101000001000011111101011111100111010000111001111010010001011011101100100011110011010101001010010110011001110001000110110110101110110110111011101111001110111011111101110000011010001001111010110110011100111011010010000100110000101111000110010100111011
Pair \(Z_2\) Length of longest common subword
5UAW_1,2CMG_1 151 4
5UAW_1,6DON_1 168 3
2CMG_1,6DON_1 167 3
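The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(84+67=151\). The last column is read here as the longest contiguous common subword, which is an assumption (values as small as 3 or 4 fit contiguous matches, since any two of these sequences share far longer non-contiguous subsequences). All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    x, y = "ABABC", "BABCA"  # toy words, not proteins from this page
    print(z_distance(x, y, 2))                  # 1  (only 'CA' differs)
    print(longest_common_subword_length(x, y))  # 4  ('BABC')
```

The dynamic program uses time proportional to \(|x|\cdot|y|\), which is small for sequences of a few hundred residues.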

Newick tree

 
[
	6DON_1:86.32,
	[
		5UAW_1:75.5,2CMG_1:75.5
	]:10.82
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{584 }{\log_{20} 584}-\frac{262}{\log_{20}262})=90.9\), where \(584=322+262\) is the combined length of the two sequences.
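The arithmetic in this estimate can be checked with a few lines of code. The factors 0.85 and 0.8 are taken from the formula as given; the page does not explain them, and the helper name `n_over_log` is an assumption made for this sketch.

```python
import math


def n_over_log(n: int, alphabet_size: int = 20) -> float:
    """n / log_alphabet_size(n), the normalizing term used on this page."""
    return n / math.log(n, alphabet_size)


if __name__ == "__main__":
    combined = 322 + 262  # lengths of 5UAW_1 and 2CMG_1, giving 584
    expected = 0.85 * 0.8 * (n_over_log(combined) - n_over_log(262))
    print(round(expected, 1))  # 90.9
```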
Status Protein1 Protein2 d d1/2
Query variables 5UAW_1 2CMG_1 109 100.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
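To make the interpolation concrete, the sketch below evaluates \(\delta\) for the query pair. It assumes that "min" and "max" are the smaller and larger of two underlying directed quantities, which this page does not define. Under that reading the table values \(d=109\) and \(d_1/2=100.5\) imply \(\max=109\) and \(\min=2(100.5)-109=92\). The function name `delta` is hypothetical.

```python
def delta(alpha: float, smaller: float, larger: float) -> float:
    """Convex combination alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger


if __name__ == "__main__":
    d, half_d1 = 109, 100.5   # values from the table for 5UAW_1, 2CMG_1
    larger = d                # alpha = 0 gives the max, i.e. d
    smaller = 2 * half_d1 - d # alpha = 1/2 gives (min + max) / 2 = d1 / 2
    print(smaller)                       # 92.0
    print(delta(0, smaller, larger))     # 109.0
    print(delta(0.5, smaller, larger))   # 100.5
```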