CoV2D BrowserTM

Parikh vectors
6CIF_1 5DWR_1 6XHE_1 Letter Amino acid
26 12 7 Q Glutamine
17 19 3 I Isoleucine
12 12 10 K Lysine
14 7 6 Y Tyrosine
39 15 12 A Alanine
29 22 4 R Arginine
27 25 5 E Glutamic acid
20 8 10 T Threonine
12 7 0 W Tryptophan
43 20 4 P Proline
10 6 8 C Cysteine
28 23 3 G Glycine
11 19 4 H Histidine
6 4 4 M Methionine
17 15 3 F Phenylalanine
15 8 10 N Asparagine
18 20 5 D Aspartic acid
42 40 2 L Leucine
28 23 15 S Serine
26 23 9 V Valine
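
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch, assuming the sequence is a plain string of one-letter amino acid codes; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the page:

```python
from collections import Counter

# Letters in the row order used by the table above.
AMINO_ACIDS = "QIKYARETWPCGHMFNDLSV"


def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the list of letter counts of seq, ordered as in alphabet."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur (e.g. W in 6XHE_1).
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # Short illustrative fragment (the first seven residues of 6XHE_1).
    for letter, count in zip(AMINO_ACIDS, parikh_vector("KETAAAK")):
        if count:
            print(letter, count)
```

The entries of a Parikh vector sum to the sequence length, which gives a quick sanity check against the Length column further down.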

>6CIF_1|Chains A, B, C, D|Nitric oxide synthase, endothelial|Homo sapiens (9606)
>5DWR_1|Chain A|Serine/threonine-protein kinase pim-1|Homo sapiens (9606)
>6XHE_1|Chains A, B|Ribonuclease pancreatic|Bos taurus (9913)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6CIF , Knot 186 440 0.85 40 246 410
APASLLPPAPEHSPPSSPLTQPPEGPKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQGRPSPGPPAPEQLLSQARDFINQYYSSIKRSGSQAHEQRLQEVEAEVAATGTYQLRESELVFGAKQAWRNAPRCVGRIQWGKLQVFDARDCRSAQEMFTYICNHIKYATNRGNLRSAITVFPQRCPGRGDFRIWNSQLVRYAGYRQQDGSVRGDPANVEITELCIQHGWTPGNGRFDVLPLLLQAPDEPPELFLLPPELVLEVPLEHPTLEWFAALGLRWYALPAVSNMLLEIGGLEFPAAPFSGWYMSTEIGTRNLCDPHRYNILEDVAVCMDLDTRTTSSLWKDKAAVEINVAVLHSYQLAKVTIVDHHAATASFMKHLENEQKARGGCPADWAWIVPPISGSLTPVFHQEMVNYFLSPAFRYQPDPW
5DWR , Knot 140 328 0.82 40 197 306
MAHHHHHHLEVLFQGPLLSKINSLAHLRAAPCNDLHATKLAPGKEKEPLESQYQVGPLLGSGGFGSVYSGIRVSDNLPVAIKHVEKDRISDWGELPNGTRVPMEVVLLKKVSSGFSGVIRLLDWFERPDSFVLILERPEPVQDLFDFITERGALQEELARSFFWQVLEAVRHCHNCGVLHRDIKDENILIDLNRGELKLIDFGSGALLKDTVYTDFDGTRVYSPPEWIRYHRYHGRSAAVWSLGILLYDMVCGDIPFEHDEEIIRGQVFFRQRVSSECQHLIRWCLALRPSDRPTFEEIQNHPWMQDVLLPQETAEIHLHSLSPGPSK
6XHE , Knot 64 124 0.83 38 107 121
KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV
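
The page does not say which Lempel-Ziv variant produces the LZ-complexity column. The sketch below assumes the 1976 exhaustive-history parsing (each new phrase is the shortest prefix of the remaining text that has not occurred earlier, overlaps allowed) and shows the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. The function names are illustrative.

```python
import math


def lz76(w):
    """Number of phrases in an LZ76-style exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it still occurs in the text that ends
        # one symbol before the end of the candidate phrase.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76("0001101001000101"))          # a standard textbook example
    print(round(normalized_lz(186, 440), 3))  # values from the 6CIF row
```

For the 6CIF row this ratio comes out near 0.859, which agrees with the displayed 0.85 if the page truncates rather than rounds.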

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
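
As an illustration of these definitions, the following sketch collects the contiguous subwords of a fixed length and counts them. The function names are illustrative only.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))


if __name__ == "__main__":
    w = "abcab"
    for n in (1, 2, 3):
        print(n, sorted(subwords(w, n)), subword_complexity(w, n))
```

A word of length \(m\) has at most \(m-n+1\) subwords of length \(n\), so \(p_w(n)\) can never exceed that bound.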

\(|P_{f(6CIF_1)}(2) \setminus P_{f(5DWR_1)}(2)|=114\), \(|P_{f(5DWR_1)}(2) \setminus P_{f(6CIF_1)}(2)|=65\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of sequence 1 (6CIF_1):
11101111110001100110011011011010010110100001010100011001000110111100101010111111001100100110000001000100100001001010111010001000011111001100110011010110101101000001001100100010010001010011011100011010101100011001100000101010110101001010011011010101111110110011011111101110111001010111111101011111001110111101111110110100011000100100001100111010100000001100011101011110000110101100011010110010000010110110111111110101011100011001101110001011
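
The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not list its classification. The set below is inferred from the opening residues of the 6CIF_1 sequence and its binary string, and the inclusion of M (methionine) is an assumption that was not checked against the page.

```python
# Residues treated as hydrophobic (mapped to 1). Inferred from the start
# of the displayed binary string; "M" is an unverified assumption.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(seq):
    """Map a protein sequence to a hydrophobic(1)/polar(0) binary string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)


if __name__ == "__main__":
    # First 20 residues of the 6CIF_1 sequence shown on this page.
    print(hp_encode("APASLLPPAPEHSPPSSPLT"))
```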
Pair \(Z_2\) Length of longest common subword
6CIF_1,5DWR_1 179 4
6CIF_1,6XHE_1 225 3
5DWR_1,6XHE_1 218 3
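
The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, which is an assumption about how the page computes it. Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))  # ^ is symmetric difference


def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending just before both symbols.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z("abab", "abba", 2))
    print(longest_common_subword_length("abcde", "xbcdy"))
```

For the pair 6CIF_1, 5DWR_1 the two one-sided differences reported above are 114 and 65, and their sum is the tabulated \(Z_2 = 179\).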

Newick tree

 
[
	6XHE_1:11.99,
	[
		6CIF_1:89.5,5DWR_1:89.5
	]:27.49
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{768}{\log_{20} 768}-\frac{328}{\log_{20} 328}\right)\approx 120\), where \(768 = 440 + 328\) is the combined length of the two query sequences.
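
The arithmetic in this estimate can be checked directly. In the sketch below the constants 0.85 and 0.8 are copied from the formula as given (the page does not explain them), and the function and parameter names are hypothetical.

```python
import math


def lz_scale(n, base=20):
    """n / log_base(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, base)


def expected_distance(n_concat, n_single, c1=0.85, c2=0.8):
    """c1 * c2 * (scale of the concatenation minus scale of one sequence)."""
    return c1 * c2 * (lz_scale(n_concat) - lz_scale(n_single))


if __name__ == "__main__":
    # 768 = 440 + 328: lengths of 6CIF_1 and 5DWR_1 from the table above.
    print(round(expected_distance(440 + 328, 328)))
```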
Status Protein1 Protein2 d d1/2
Query variables 6CIF_1 5DWR_1 154 133

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
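
To make the interpolation concrete, the sketch below uses the unnormalized Otu-Sayood quantities \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is a Lempel-Ziv complexity: \(d\) is their maximum and \(d_1\) their sum. The LZ76-style parser, and the assumption that the page reports unnormalized values, are not confirmed by the page, and all function names are illustrative.

```python
def lz76(w):
    """Number of phrases in an LZ76-style exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def complexity_gains(x, y, c=lz76):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return c(x + y) - c(x), c(y + x) - c(y)


def otu_sayood(x, y, c=lz76):
    """Return (d, d1): the maximum and the sum of the two gains."""
    a, b = complexity_gains(x, y, c)
    return max(a, b), a + b


def delta(x, y, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two gains."""
    a, b = complexity_gains(x, y, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    x, y = "abababab", "bbaabbaa"
    d, d1 = otu_sayood(x, y)
    print(d, d1, delta(x, y, 0), delta(x, y, 0.5))
```

With \(\alpha=0\) the interpolation returns \(d\), and with \(\alpha=1/2\) it returns the average of the two gains, which is \(d_1/2\).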