CoV2D Browser™

Parikh vectors
1VRE_1 1YWH_1 3TLK_1 Letter Amino acid
27 11 46 A Alanine
3 15 10 N Asparagine
6 13 11 H Histidine
2 12 15 P Proline
2 4 4 W Tryptophan
3 7 9 Y Tyrosine
6 23 14 E Glutamic acid
6 8 18 I Isoleucine
11 10 13 K Lysine
2 25 20 T Threonine
1 28 0 C Cysteine
5 7 4 M Methionine
12 25 20 S Serine
3 20 13 R Arginine
8 13 17 D Aspartic acid
4 15 26 Q Glutamine
20 29 21 G Glycine
10 31 46 L Leucine
5 5 7 F Phenylalanine
11 12 12 V Valine
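A Parikh vector simply records how many times each letter occurs in a word. The following is a minimal sketch of that count for a protein sequence; the function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not part of the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering chosen here for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}


# First ten residues of the 1VRE_1 sequence listed on this page.
vector = parikh_vector("GLSAAQRQVV")
print(vector["A"], vector["Q"], vector["C"])  # 2 2 0
```

Applying the same function to each full sequence would give one column of the table above.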

>1VRE_1|Chain A|PROTEIN (GLOBIN, MONOMERIC COMPONENT M-IV)|Glycera dibranchiata (6350)
>1YWH_1|Chains A, C, E, G, I, K, M, O|Urokinase plasminogen activator surface receptor|Homo sapiens (9606)
>3TLK_1|Chains A, B, C|Ferrienterobactin-binding periplasmic protein|Escherichia coli (83333)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1VRE , Knot 68 147 0.77 40 103 136
GLSAAQRQVVASTWKDIAGSDNGAGVGKECFTKFLSAHHDMAAVFGFSGASDPGVADLGAKVLAQIGVAVSHLGDEGKMVAEMKAVGVRHKGYGNKHIKAEYFEPLGASLLSAMEHRIGGKMNAAAKDAWAAAYADISGALISGLQS
1YWH , Knot 137 313 0.83 40 183 296
LRCMQCKTNGDCRVEECALGQDLCRTTIVRLWEEGEELELVEKSCTHSEKTNRTLSYRTGLKITSLTEVVCGLDLCNQGNSGRAVTYSRSRYLECISCGSSDMSCERGRHQSLQCRSPEEQCLDVVTHWIQEGEEGRPKDDRHLRGCGYLPGCPGSNGFHNNDTFHFLKCCNTTKCNEGPILELENLPQNGRQCYSCKGQSTHGCSSEETFLIDCRGPMNQCLVATGTHEPKNQSYMVRGCATASMCQHAHLGDAFSMNHIDVSCCTKSGCNHPDLDVQYRSGAAPQPGPAHLSLTITLLMTARLWGGTLLWT
3TLK , Knot 134 326 0.79 38 177 304
MRLAPLYRNALLLTGLLLSGIAAVQAADWPRQITDSRGTHTLESQPQRIVSTSVTLTGSLLAIDAPVIASGATTPNNRVADDQGFLRQWSKVAKERKLQRLYIGEPSAEAVAAQMPDLILISATGGDSALALYDQLSTIAPTLIINYDDKSWQSLLTQLGEITGHEKQAAERIAQFDKQLAAAKEQIKLPPQPVTAIVYTAAAHSANLWTPESAQGQMLEQLGFTLAKLPAGLNASQSQGKRHDIIQLGGENLAAGLNGESLFLFAGDQKDADAIYANPLLAHLPAVQNKQVYALGTETFRLDYYSAMQVLDRLKALFLEHHHHHH

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
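The definitions above can be sketched in a few lines, reading \(P_w(n)\) as the set of distinct contiguous length-\(n\) subwords of \(w\) (an assumption about the intended definition). The names `subwords` and `subword_complexity` are hypothetical, and the sketch is not claimed to reproduce the tabulated \(p_w\) values, which may follow a different convention.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))


print(sorted(subwords("abab", 2)))    # ['ab', 'ba']
print(subword_complexity("abab", 2))  # 2
```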

\(|P_{f(1VRE_1)}(2) \setminus P_{f(1YWH_1)}(2)|=56\), \(|P_{f(1YWH_1)}(2) \setminus P_{f(1VRE_1)}(2)|=136\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (1VRE_1):
110110001110010011100011111000100110100011111110110011110111011101111100110010111010111100010100010100101111011011000111010111001111101010111101100
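The hydrophobic-polar string above appears consistent with writing 1 for the residues G, A, V, L, I, M, F, W, P and 0 for all others. That mapping is inferred here by comparing the string with the 1VRE_1 sequence; it is an assumption rather than a documented rule of the site. A minimal sketch under that assumption:

```python
# Residues encoded as 1 (hydrophobic); inferred from the displayed string,
# not taken from any CoV2D documentation.
HYDROPHOBIC = set("GAVLIMFWP")


def hydrophobic_polar(sequence):
    """Encode each residue as '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First ten residues of 1VRE_1; matches the first ten digits shown above.
print(hydrophobic_polar("GLSAAQRQVV"))  # 1101100011
```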
Pair \(Z_2\) Length of longest common subsequence
1VRE_1,1YWH_1 192 4
1VRE_1,3TLK_1 158 4
1YWH_1,3TLK_1 184 3
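The quantity \(Z_k\) is the size of the symmetric difference of two subword sets. For the first pair, the one-sided counts 56 and 136 quoted earlier sum to the tabulated \(Z_2=192\). The sketch below implements the formula as written; the helper names are hypothetical.

```python
def subwords(w, k):
    """P_w(k): distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: P("abab", 2) = {ab, ba}, P("abba", 2) = {ab, bb, ba}.
print(z_distance("abab", "abba", 2))  # 1
```

Dividing by \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance, which is why \(Z_k\) is described as its numerator.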

Newick tree

 
[
	1YWH_1:98.52,
	[
		1VRE_1:79,3TLK_1:79
	]:19.52
]

Let d be the Otu-Sayood distance d.
Let d1 be the Otu-Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{460 }{\log_{20} 460}-\frac{147}{\log_{20}147})=92.8\)
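The arithmetic in this estimate can be checked directly. The sketch below evaluates the expression as given; the factors 0.85 and 0.8 are taken from the text without further interpretation, and 460 equals 147 + 313, which would be the combined length of the two query sequences. The helper name `lz_scale` is hypothetical.

```python
import math


def lz_scale(n, alphabet_size=20):
    """The normalising quantity n / log_20(n) used in the table above."""
    return n / math.log(n, alphabet_size)


# Factors 0.85 and 0.8 are copied from the text as given.
expected = 0.85 * 0.8 * (lz_scale(460) - lz_scale(147))
print(round(expected, 1))  # 92.8
```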
Status Protein1 Protein2 d d1/2
Query variables 1VRE_1 1YWH_1 121 86.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
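To make the interpolation concrete, the sketch below assumes that min and max range over the two Otu-Sayood terms \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is an LZ76 (exhaustive-history) complexity. With that reading, \(\alpha=0\) gives \(d\) and \(\alpha=1/2\) gives \(d_1/2\). The parsing variant and all function names are assumptions; the site may compute LZ-complexity differently, so the tabulated values are not claimed to be reproduced.

```python
def lz76_complexity(s):
    """Number of components in an exhaustive-history (LZ76-style) parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it can still be copied from earlier text;
        # the copy source may overlap the component, hence s[:i + length - 1].
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def otu_sayood_terms(x, y):
    """The two one-sided terms c(xy) - c(x) and c(yx) - c(y)."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))


def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max over the two one-sided terms."""
    a, b = otu_sayood_terms(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Parsing check on a classic example: 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76_complexity("0001101001000101"))  # 6
```

Calling `delta(x, y, 0)` returns the larger term (the distance d), and `2 * delta(x, y, 0.5)` returns the sum of both terms (the distance d1).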