CoV2D Browser™

Parikh vectors
6FYC_1 1VMO_1 8CQX_1 Letter Amino acid
9 6 2 W Tryptophan
29 11 28 V Valine
59 7 50 A Alanine
13 2 3 H Histidine
10 8 5 K Lysine
30 5 22 P Proline
28 10 14 T Threonine
46 21 33 G Glycine
29 9 10 D Aspartic acid
5 8 0 C Cysteine
11 6 16 Q Glutamine
16 6 7 I Isoleucine
45 13 40 L Leucine
27 14 9 S Serine
15 3 1 Y Tyrosine
13 7 8 N Asparagine
20 6 20 E Glutamic acid
10 0 4 M Methionine
17 9 9 F Phenylalanine
32 12 19 R Arginine
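
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count, assuming the 20 standard one-letter amino acid codes as the alphabet; the function name `parikh_vector` and the toy sequence are hypothetical and not taken from the site.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in seq."""
    counts = Counter(seq)
    # Letters that never occur get an explicit 0, as in the table rows above.
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Hypothetical toy sequence, not one of the three proteins on this page.
toy = "MKVLAAGK"
vec = parikh_vector(toy)
print(vec["A"], vec["K"], vec["W"])
```

Applied to a full FASTA sequence, the entries of the returned dict sum to the sequence length.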

>6FYC_1|Chains A, B|Putative FAD-dependent oxygenase EncM|Streptomyces maritimus (115828)
>1VMO_1|Chains A, B|VITELLINE MEMBRANE OUTER LAYER PROTEIN I|Gallus gallus (9031)
>8CQX_1|Chains A, B, C, D|Ribokinase|Thermus sp. (275)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6FYC , Knot 187 464 0.82 40 226 433
MQFPQLDPATLAAFSAAFRGELIWPSDADYDEARRIWNGTIDRRPALIARCTSTPDVVAAVSFARKSGLLVAVRGGGHSMAGHSVCDGGIVIDLSLMNSIKVSRRLRRARAQGGCLLGAFDTATQAHMLATPAGVVSHTGLGGMVLGGGFGWLSRKYGLSIDNLTSVEIVTADGGVLTASDTENPDLFWAVRGGGGNFGVVTAFEFDLHRVGPVRFASTYYSLDEGPQVIRAWRDHMATAPDELTWALYLRLAPPLPELPADMHGKPVICAMSCWIGDPHEGERQLESILHAGKPHGLTKATLPYRALQAYSFPGAVVPDRIYTKSGYLNELSDEATDTVLEHAADIASPFTQLELLYLGGAVARVPDDATAYPNRQSPFVTNLAAAWMDPTEDARHTAWAREGYRALAGHLSGGYVNFMNPGEADRTREAYGAAKFERLQGVKAKYDPTNLFRLNQNIPPSSP
1VMO , Knot 75 163 0.78 38 109 148
RTREYTSVITVPNGGHWGKWGIRQFCHSGYANGFALKVEPSQFGRDDTALNGIRLRCLDGSVIESLVGKWGTWTSFLVCPTGYLVSFSLRSEKSQGGGDDTAANNIQFRCSDEAVLVGDGLSWGRFGPWSKRCKICGLQTKVESPQGLRDDTALNNVRFFCCK
8CQX , Knot 123 300 0.78 38 141 264
MILVVGSLNMDLVLRVKRLPRPGETVLGEDYQTHPGGKGANQAVAIARLGGKVRMLGRVGEDPFGQALKSGLAQEGVDVAWVLETPGPSGTGFILVDPEGQNQIAVAPGANARLVPEDLPATAFQGVGVVLLQLEIPLETVVRAAALGRKAGARILLNAAPAHALPSEILQSVDLLLVNEVEAAQLTEASPPRTPEEALALARQLRGRAPQAQVVLTLGAQGAVWSGTEESHFPAFPVRAVDTTAAGDAFAGALALGLAEGQNMRAALRFANAAGALATTRPGAQPSLPFRDEVEALLFG
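
The page does not say which Lempel-Ziv variant produces the LZ-complexity column, so the following is only a sketch under an assumption: it counts phrases in the 1976 Lempel-Ziv exhaustive-history parsing and may not reproduce the table values exactly. The normalization follows the column header \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\); the function names are hypothetical.

```python
import math

def lz76_phrases(s):
    """Split s into Lempel-Ziv (1976) phrases; overlap with the history is allowed."""
    phrases = []
    i, n = 0, len(s)
    while i < n:
        length = 1
        # Grow the candidate while it already occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases.append(s[i:i + length])  # the last phrase may be cut off by the end of s
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_q(n), with q the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example from the LZ76 literature: six phrases.
print(lz76_phrases("0001101001000101"))

# Normalized value for the first table row (LZ = 187, n = 464); close to the 0.82 shown.
print(normalized_lz(187, 464))
```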

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
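
As a small illustration of these definitions, the sketch below collects the contiguous subwords of a given length into a set and counts them. The names `subwords` and `subword_complexity` are hypothetical, and the input is a toy word rather than one of the proteins.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

toy = "ABAB"  # toy word
print(sorted(subwords(toy, 2)))  # ['AB', 'BA']
print([subword_complexity(toy, n) for n in (1, 2, 3)])  # [2, 2, 2]
```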

\(|P_{f(6FYC_1)}(2) \setminus P_{f(1VMO_1)}(2)|=153\), \(|P_{f(1VMO_1)}(2) \setminus P_{f(6FYC_1)}(2)|=36\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For this pair, \(Z_2=153+36=189\).

Hydrophobic-polar version of Sequence 1:
10110101101111011101011110010000100110101000111110000010111110110001111110111001110010011111010110010100010010101101111100100101110111110001111111111111100001101001001011010111101000001011111011110111101101010011110110000010011011011000110110010111010111111011101010111011001110100100010011011010110010110011010011111110010000101001000100011001101101100101101111110110010101000011100111111010001000111001001111010110101101101000001011101001011010001001101000111001

Pair \(Z_2\) Length of longest common subword
6FYC_1,1VMO_1 189 4
6FYC_1,8CQX_1 153 4
1VMO_1,8CQX_1 146 3
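
The two columns above can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets. The last column is read here as the length of the longest common contiguous subword, which is an assumption based on the small values shown. Function names and the toy inputs are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block ending at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy words, not the proteins from the table.
print(z("GATTACA", "ATTAC", 2))                          # 2
print(longest_common_subword_length("GATTACA", "ATTAC"))  # 5
```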

Newick tree

 
(6FYC_1:89.88,(8CQX_1:73,1VMO_1:73):16.88);

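Let \(d\) be the Otu–Sayood distance \(d\), the larger of the two one-sided increases in LZ-complexity under concatenation.
Let \(d_1\) be the Otu–Sayood distance \(d_1\), the sum of those two increases. (This makes the 4TYN sequence AAAAAA a close match...)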
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{627}{\log_{20} 627}-\frac{163}{\log_{20} 163}\right)\approx 133\), where \(627=464+163\) is the combined length of the two sequences.
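
The arithmetic in this estimate can be checked directly. The sketch below just evaluates the stated expression; the function and parameter names are hypothetical, and the constants 0.85 and 0.8 are taken from the formula as given, without interpretation.

```python
import math

def log20(n):
    """Logarithm of n in base 20 (the amino acid alphabet size)."""
    return math.log(n, 20)

def rough_expected_distance(n_concat, n_single, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (n_concat / log20(n_concat) - n_single / log20(n_single))."""
    return c1 * c2 * (n_concat / log20(n_concat) - n_single / log20(n_single))

# 627 = 464 + 163 (the two sequence lengths), 163 is the shorter sequence.
print(round(rough_expected_distance(627, 163)))  # 133
```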
Status Protein1 Protein2 d d1/2
Query variables 6FYC_1 1VMO_1 168 110.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
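
To make the interpolation concrete, the sketch below evaluates \(\delta\) for the query pair, reading min and max as the smaller and larger of the two quantities being combined (an assumption, since the page does not spell them out). The smaller value is inferred from the reported \(d=168\) and \(d_1/2=110.5\); all names are hypothetical.

```python
def delta(alpha, lo, hi):
    """Convex combination alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi

# Values reported for the pair 6FYC_1, 1VMO_1.
d = 168
half_d1 = 110.5

hi = d                   # alpha = 0 gives the max, which is d
lo = 2 * half_d1 - hi    # alpha = 1/2 gives (min + max) / 2, so min = 2 * 110.5 - 168

print(lo)                   # 53.0
print(delta(0, lo, hi))     # 168.0
print(delta(0.5, lo, hi))   # 110.5
```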