CoV2D Browser™

Parikh vectors
6VCF_1 5MIN_1 3GUB_1 Letter Amino acid
11 19 12 Q Glutamine
12 7 7 H Histidine
7 3 4 M Methionine
34 14 17 F Phenylalanine
18 12 13 R Arginine
26 37 12 N Asparagine
25 30 17 D Aspartic acid
2 2 1 C Cysteine
36 28 18 S Serine
4 6 3 W Tryptophan
34 13 29 E Glutamic acid
23 37 16 G Glycine
34 30 18 V Valine
32 27 17 A Alanine
30 34 15 T Threonine
24 26 10 Y Tyrosine
30 22 22 I Isoleucine
35 38 28 L Leucine
35 36 24 K Lysine
20 32 12 P Proline
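
A Parikh vector simply records how often each letter occurs in a word. The following is a minimal sketch of how one column of the table above could be recomputed; the function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not part of the CoV2D site.

```python
from collections import Counter

# Hypothetical helper (not from the CoV2D site): count each letter of a
# sequence over the 20 standard amino-acid letters.
def parikh_vector(seq, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# First ten residues of the 6VCF_1 sequence listed on this page.
prefix = "MAYTVTNKFQ"
vec = parikh_vector(prefix)

# Show only the letters that actually occur in the prefix.
print({letter: k for letter, k in vec.items() if k})
```

Applying the same function to a full sequence should reproduce the corresponding column of the table.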

>6VCF_1|Chains A, B, C, D, E, F|carotenoid cleavage dioxygenase|Candidatus Nitrosotalea devanaterra (1078905)
>5MIN_1|Chains A, B|Quinoprotein glucose dehydrogenase B|Acinetobacter calcoaceticus (471)
>3GUB_1|Chain A|Death-associated protein kinase 1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6VCF , Knot 191 472 0.83 40 237 443
MAYTVTNKFQLGFSTLSEELDLESLQVKGTIPKWLSGTLIRNGPAKFEVGKEKFQHWFDGLAMLHKFSFKEGKVSYANKFLESKAYQSARDTDKISYREFATDPCRSIFKRVSSMFSTKFTDNANVNVTKIAERFVAMTETPLPVEFDINTLKTVGVFAYDDKIESGLTTAHPHYDFVKNELVNYATKISRSSNYNVYKIADKTNHRNLIGSIPVEEPAYMHSFAMTENYVVLVEYPFVVKPLDLLLSGKPFIENFSWKPENGTRFIIVNRQNGNLVGTYKSDAFFAFHHVNAFEKQEEIFVDIIAYQDSSIVNALYLDILRGQKTDTIPTSHIRRYRIPLSGGQVEYEMLSSEAVELPRINYKQYNTKDYRFVYGISTYSASDFANQLVKIDILRKSSKIWSEKDCYPGEPVFVGAPDATKEDEGLILSAVLDATNAKSFLLILDATTFEEVARAEVPHHIPFGFHGNYFE
5MIN , Knot 186 453 0.83 40 233 429
DVPLTPSQFAKAKSENFDKKVILSNLNKPHALLWGPDNQIWLTERATGKILRVNPESGSVKTVFQVPEIVNDADGQNGLLGFAFHPDFKNNPYIYISGTFKNPKSTDKALPNQTIIRRYTYNKSTDTLEKPVDLLAGLPSSKDHQSGRLVIGPDQKIYYTIGDQGRNQLAYLFLPNQAQHTPTQQELNGKDYHTYMGKVLRLNLDGSIPKDNPSFNGVVSHIYTLGHRNPQGLAFTPNGKLLQSEQGPNSDDEINLIVKGGNYGWPNVAGYKDDSGYAYANYSAAANKSIKDLAQNGVKVAAGVPVTKESEWTGKNFVPPLKTLYTVQDTYNYNDPTCGEMTYICWPTVAPSSAYVYKGGKKAITGWENTLLVPSLKRGVIFRIKLDPTYSTTYDDAVPMFKSNNRYRDVIASPDGNVLYVLTDTAGNVQKDDGSVTNTLENPGSLIKFTYKA
3GUB , Knot 124 295 0.79 40 179 284
MTVFRQENVDDYYDTGEELGSGQFAVVKKCREKSTGLQYAAKFIKKRRTKSSRRGVSREDIEREVSILKEIQHPNVITLHEVYENKTDVILIGELVAGGELFDFLAEKESLTEEEATEFLKQILNGVYYLHSLQIAHFDLKPENIMLLDRNVPKPRIKIIDFGLAHKIDFGNEFKNIFGTPEFVAPEIVNYEPLGLEADMWSIGVITYILLSGASPFLGDTKQETLANVSAVNYEFEDEYFSNTSALAKDFIRRLLVKDPKKRMTIQDSLQHPWIKPKDTQQALSSAWSHPQFEK
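
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; the function name `normalized_lz` is an illustrative assumption, and the input numbers are copied from the table above.

```python
import math

# Hypothetical helper: LZ(w) divided by n / log_20(n).
def normalized_lz(lz, n, alphabet_size=20):
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs as listed in the table.
rows = {"6VCF": (191, 472), "5MIN": (186, 453), "3GUB": (124, 295)}

for code, (lz, n) in rows.items():
    print(code, f"{normalized_lz(lz, n):.3f}")
```

The printed values should come out close to the listed 0.83, 0.83 and 0.79.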

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
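
As a small illustration of these definitions, the sketch below collects the contiguous subwords of a given length into a set and counts them. The names `subwords` and `p` are chosen here for illustration only.

```python
# P_w(n): set of distinct contiguous subwords of length n in w.
def subwords(w, n):
    return {w[i:i + n] for i in range(len(w) - n + 1)}

# p_w(n): number of distinct subwords of length n.
def p(w, n):
    return len(subwords(w, n))

w = "abab"
print(subwords(w, 2), p(w, 2))
```

For the toy word `abab` the subwords of length 2 are `ab` and `ba`, so `p(w, 2)` is 2.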

\(|P_{f(6VCF_1)}(2) \setminus P_{f(5MIN_1)}(2)|=67\), \(|P_{f(5MIN_1)}(2) \setminus P_{f(6VCF_1)}(2)|=63\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1100100010111001000101001010101101101011001110101100010011011111001010010100100110001000100000100001100100011001001100010001010100110011110001111010100100111110000100110010100011000110010010000000100110000000111011100110100111000011110011110110111010111001010100100111100001011100000111110010110000011101110000011011010110100000110001000011101101000110001101101000000000001101100001001100110101100000110000001101111111010000011110111010010011111010010011010110011111010010
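
The page does not state which residues it treats as hydrophobic. Comparing the binary string with the start of the 6VCF_1 sequence suggests that A, F, G, I, L, M, P, V and W are mapped to 1 and everything else to 0; that set is an inference made here, not a documented rule. Under that assumption the conversion can be sketched as:

```python
# ASSUMPTION: hydrophobic set inferred by comparing the displayed binary
# string with the first residues of 6VCF_1; the page itself does not list it.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(seq):
    # 1 for residues in the assumed hydrophobic set, 0 otherwise.
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First twenty residues of the 6VCF_1 sequence listed on this page.
prefix = "MAYTVTNKFQLGFSTLSEEL"
print(hp_string(prefix))  # 11001000101110010001
```

The output matches the first twenty symbols of the hydrophobic-polar string shown above.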
Pair \(Z_2\) Length of longest common subword (contiguous)
6VCF_1,5MIN_1 130 4
6VCF_1,3GUB_1 132 4
5MIN_1,3GUB_1 152 5
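
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets. The sketch below computes it for toy words; the function names are illustrative, and no output for the actual protein sequences is claimed here.

```python
# Set of distinct contiguous subwords of length k.
def subwords(w, k):
    return {w[i:i + k] for i in range(len(w) - k + 1)}

# Z_k(x, y) = |P_x(k) \ P_y(k)| + |P_y(k) \ P_x(k)|
def Z(x, y, k):
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

print(Z("abab", "abba", 2))  # only "bb" is unshared, so this prints 1
```

For the pair 6VCF_1, 5MIN_1 the two one-sided counts given in the text are 67 and 63, which sum to the tabulated \(Z_2\) value of 130.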

Newick tree

 
(3GUB_1:73.11,(6VCF_1:65,5MIN_1:65):8.11);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{925}{\log_{20} 925}-\frac{453}{\log_{20} 453})\approx 125\).
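
The arithmetic behind this estimate can be checked directly. In the sketch below, 925 is read as the combined length 472 + 453 of the two query sequences, and the factors 0.85 and 0.8 are taken from the text as given; the helper name `lz_scale` is an illustrative assumption.

```python
import math

# n / log_20(n), the scale that appears in the normalized LZ column.
def lz_scale(n, alphabet_size=20):
    return n / math.log(n, alphabet_size)

n1, n2 = 472, 453  # lengths of 6VCF_1 and 5MIN_1 from the table
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))
print(round(expected))  # 125
```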
Status Protein1 Protein2 d d1/2
Query variables 6VCF_1 5MIN_1 155 154

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
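
A minimal sketch of how \(d\), \(d_1\) and the interpolation \(\delta\) fit together, assuming the usual Otu–Sayood definitions \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) the LZ76 complexity. These definitions, the straightforward quadratic-time `lz76` routine, and all function names are assumptions made here for illustration; the site's own implementation is not shown on this page.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the candidate while it still occurs in the prefix that
        # stops one symbol before the candidate's last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x, y, c=lz76):
    # ASSUMED definitions: d is the max, d1 the sum, of the two increments.
    a = c(x + y) - c(x)
    b = c(y + x) - c(y)
    return max(a, b), a + b

def delta(alpha, a, b):
    # alpha = 0 gives the max (d); alpha = 1/2 gives the mean (d1 / 2).
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

print(lz76("0001101001000101"))  # 6, parsed as 0.001.10.100.1000.101
print(otu_sayood("MAYTVTNKFQ", "DVPLTPSQFA"))
```

With \(\alpha=0\) the function `delta` returns the larger increment and with \(\alpha=1/2\) it returns their average, matching the two cases in the display above.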