CoV2D BrowserTM

Parikh vectors
8VFR_1 6RVA_1 8JDT_1 Letter Amino acid
3 0 6 W Tryptophan
8 0 17 H Histidine
41 6 43 L Leucine
13 3 15 K Lysine
6 1 8 M Methionine
29 3 44 V Valine
45 6 47 A Alanine
29 6 25 R Arginine
13 1 18 N Asparagine
6 6 11 C Cysteine
31 8 47 G Glycine
16 1 16 I Isoleucine
19 4 13 F Phenylalanine
9 3 6 Y Tyrosine
25 5 27 S Serine
21 3 31 T Threonine
26 5 22 D Aspartic acid
12 2 22 Q Glutamine
25 3 29 E Glutamic acid
33 5 24 P Proline
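A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch, not the site's own code: the function name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical, and the letter order is copied from the rows of the table above. The sequence is the 6RVA_1 chain listed further down the page.

```python
from collections import Counter

# Letter order taken from the rows of the table above (assumed, for display only).
AMINO_ACIDS = "WHLKMVARNCGIFYSTDQEP"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

seq_6rva = "GGPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLDMYCAPLKPAKSA"
vector = parikh_vector(seq_6rva)
for letter, count in vector.items():
    print(count, letter)
```

The counts sum to the sequence length, which gives a quick consistency check against the Length column reported for each protein.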

>8VFR_1|Chain A|Cytochrome P450|Rhodopseudomonas palustris HaA2 (316058)
>6RVA_1|Chain A[auth X]|Insulin-like growth factor I|Homo sapiens (9606)
>8JDT_1|Chain A|Probable D-lactate dehydrogenase, mitochondrial|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8VFR , Knot 172 410 0.84 40 214 389
MISNSSAESISAPPNDSTIPHLAIDPFSLDFFDDPYPDQQTLRDAGPVVYLDKWNVYGVARYAEVHAVLNDPTTFCSSRGVGLSDFKKEKPWRPPSLILEADPPAHTRPRAVLSKVLSPATMKTIRDGFAAAADAKVDELLQRGCIDAIADLAEAYPLSVFPDAMGLKQEGREHLLPYAGLVFNAFGPPNELRQTAIERSAPHQAYVNEQCQRPNLAPGGFGACIHAFTDTGEITPDEAPLLVRSLLSGALQETVNGIGAAVYCLARFPGELQRLRSDPTLARNAFEEAVRFESPVQTFFRTTTREVELGGAVIGEGEKVLMFLGSANRDPRRWSDPDLYDITRKTSGHVGFGSGVHMCVGQLVARLEGEVMLSALARKVAAIDIDGPVKRRFNNTLRGLESLPVKLTPA
6RVA , Knot 39 71 0.78 36 63 69
GGPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLDMYCAPLKPAKSA
8JDT , Knot 190 471 0.82 40 236 441
WSHPQFEKGSQGGLSQDFVEALKAVVGSPHVSTASAVREQHGHDESMHRCQPPDAVVWPQNVDQVSRVASLCYNQGVPIIPFGTGTGVEGGVCAVQGGVCINLTHMDQITELNTEDFSVVVEPGVTRKALNTHLRDSGLWFPVDPGADASLCGMAATGASGTNAVRYGTMRDNVINLEVVLPDGRLLHTAGRGRHYRKSAAGYNLTGLFVGSEGTLGIITSTTLRLHPAPEATVAATCAFPSVQAAVDSTVQILQAAVPVARIEFLDDVMMDACNRHSKLNCPVAPTLFLEFHGSQQTLAEQLQRTEAITQDNGGSHFSWAKEAEKRNELWAARHNAWYAALALSPGSKAYSTDVCVPISRLPEILVETKEEIKASKLTGAIVGHVGDGNFHCILLVDPDDAEEQRRVKAFAENLGRRALALGGTCTGEHGIGLGKRQLLQEEVGPVGVETMRQLKNTLDPRGLMNPGKVL
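The page does not say which Lempel-Ziv variant produces the LZ-complexity column. As a sketch under the assumption that it is the classical LZ76 exhaustive-history parsing, the code below counts components and applies the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. The function names are hypothetical, and the component count is not guaranteed to reproduce the values shown above.

```python
import math

def lz76_complexity(s):
    """Count components of the LZ76 exhaustive-history parsing of s."""
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the candidate while it can be copied from an earlier starting position.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_complexity(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_q n) for alphabet size q."""
    return lz / (n / math.log(n, alphabet_size))

# Classical binary example, parsed as 0 . 001 . 10 . 100 . 1000 . 101
example = lz76_complexity("0001101001000101")
ratio_8vfr = normalized_complexity(172, 410)
ratio_6rva = normalized_complexity(39, 71)
print(example, round(ratio_8vfr, 2), round(ratio_6rva, 2))
```

Feeding the tabulated pairs (172, 410) and (39, 71) into `normalized_complexity` gives values that agree with the ratio column to two decimals.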

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
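The definitions above can be sketched in a few lines, assuming \(P_w(n)\) collects the contiguous subwords of length \(n\). The function names `subwords` and `subword_complexity` are hypothetical. The example uses the 6RVA_1 sequence from the table.

```python
def subwords(w, n):
    """Return the set P_w(n) of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Return p_w(n), the number of distinct subwords of length n."""
    return len(subwords(w, n))

seq_6rva = "GGPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLDMYCAPLKPAKSA"
p2 = subword_complexity(seq_6rva, 2)
p3 = subword_complexity(seq_6rva, 3)
print(p2, p3)
```

For this sequence the sketch gives 63 and 69, matching the \(p_w(2)\) and \(p_w(3)\) entries of the 6RVA row.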

\(|P_{f(8VFR_1)}(2) \setminus P_{f(6RVA_1)}(2)|=174\), \(|P_{f(6RVA_1)}(2) \setminus P_{f(8VFR_1)}(2)|=23\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11000010010111000011011101101011001010000100111110100101011100101011100100100001111001000011011011101011100010111001101101001001111110101001100101011101101011011101111000100011101111101111100100011000110010100000010111111110101100010101001111100110111000101111110011011101001000101100110011010011001100000010111111101001111110100010010010100100000101111011010110111010101110111001111010111000100010110011101011
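The page does not state which residues it treats as hydrophobic. Comparing the start of the binary string with the start of the 8VFR_1 sequence suggests that A, F, G, I, L, M, P, V and W map to 1 and all other residues map to 0. The sketch below encodes a sequence under that inferred assumption; `HYDROPHOBIC` and `hp_encode` are hypothetical names.

```python
# Assumption: hydrophobic set inferred from the first residues of 8VFR_1
# and the corresponding bits of the string above; not stated by the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

prefix = hp_encode("MISNSSAESISAPPNDSTIPHLAIDPFSLD")
print(prefix)
```

The encoded prefix can be compared by eye with the first 30 bits of the string above.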
Pair \(Z_2\) Length of longest common subword (substring)
8VFR_1,6RVA_1 197 3
8VFR_1,8JDT_1 160 4
6RVA_1,8JDT_1 213 3
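Both columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets. For the last column, the code assumes that the reported length refers to a contiguous common subword, since values of 3 and 4 are far too small for a non-contiguous subsequence of sequences this long. All function names are hypothetical.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("ABAB", "ABBA", 2))
print(longest_common_substring_length("ABCDE", "XBCDY"))
```

The dynamic program runs in time proportional to the product of the two lengths, which is unproblematic for sequences of a few hundred residues.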

Newick tree

 
[
	6RVA_1:10.07,
	[
		8VFR_1:80,8JDT_1:80
	]:29.07
]
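The bracketed tree above can be written in standard Newick notation, which uses parentheses and a terminating semicolon. The sketch below serializes a nested structure with the same leaves and branch lengths; the tuple representation and the name `to_newick` are assumptions made for illustration.

```python
def to_newick(node):
    """Serialize (label_or_children, branch_length) into a Newick fragment."""
    label, length = node
    if isinstance(label, list):
        # Internal node: join the serialized children inside parentheses.
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length:g}"

# Same topology and branch lengths as the bracketed tree above; the root has no length.
tree = ([("6RVA_1", 10.07), ([("8VFR_1", 80), ("8JDT_1", 80)], 29.07)], None)
newick = to_newick(tree) + ";"
print(newick)  # (6RVA_1:10.07,(8VFR_1:80,8JDT_1:80):29.07);
```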

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{481}{\log_{20} 481}-\frac{71}{\log_{20} 71}\right)\approx 124\), where \(481=410+71\) is the combined length of the two sequences.
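The arithmetic behind this estimate can be checked directly. The sketch below evaluates the same expression; the function name and parameter names are hypothetical, and the factors 0.85 and 0.8 are taken from the formula as given, without any claim about their origin.

```python
import math

def expected_distance(n_total, n_small, alphabet_size=20, factors=(0.85, 0.8)):
    """Evaluate factors * (n_total/log_q n_total - n_small/log_q n_small)."""
    def scale(n):
        return n / math.log(n, alphabet_size)
    return factors[0] * factors[1] * (scale(n_total) - scale(n_small))

value = expected_distance(481, 71)
print(value)
```

The result lies between 124 and 125, so the figure of 124 quoted on the page corresponds to rounding down.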
Status Protein1 Protein2 d d1/2
Query variables 8VFR_1 6RVA_1 158 92
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
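A small numerical check of this interpolation, assuming that min and max refer to the smaller and larger of the two one-directional quantities whose maximum is \(d\) and whose sum is \(d_1\). With the tabulated values \(d=158\) and \(d_1/2=92\), the smaller quantity would be \(2\cdot 92-158=26\), provided the displayed values are not rounded. The function name `delta` is hypothetical.

```python
def delta(alpha, a, b):
    """Interpolate between min(a, b) and max(a, b) with weight alpha on the minimum."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumed one-directional quantities recovered from d = 158 and d1/2 = 92.
smaller, larger = 26, 158
d = delta(0, smaller, larger)
half_d1 = delta(0.5, smaller, larger)
print(d, half_d1)
```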