CoV2D Browser™

Parikh vectors
8KGY_1 2KHV_1 3WCG_1 Letter Amino acid
12 3 8 Q Glutamine
37 6 25 E Glutamic acid
15 8 22 H Histidine
21 4 15 Y Tyrosine
53 6 15 G Glycine
38 6 14 I Isoleucine
14 2 15 M Methionine
23 1 15 F Phenylalanine
37 6 19 S Serine
53 7 32 A Alanine
21 3 10 N Asparagine
30 6 27 D Aspartic acid
6 2 10 C Cysteine
26 5 9 P Proline
26 8 20 T Threonine
33 6 31 V Valine
35 8 24 R Arginine
40 7 38 L Leucine
33 7 13 K Lysine
5 5 3 W Tryptophan
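
The table above can be reproduced from a sequence by counting letters. A minimal sketch, assuming the Parikh vector is simply the per-letter count over the 20 standard amino acids; the function name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the page.

```python
from collections import Counter

# 2KHV_1 sequence as displayed on this page (106 residues).
SEQ_2KHV = (
    "MTFSECAALYIKAHRSSWKNTKHADQWTNTIKTYCGPVIGPLSVQDVDTKLIMKVLDPIW"
    "EQKPETASRLRGRIESVLDWATVRGYREGDNPARWRGYLEHHHHHH"
)

def parikh_vector(word, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Return a dict mapping each letter of the alphabet to its count in word."""
    counts = Counter(word)
    return {letter: counts[letter] for letter in alphabet}

vec = parikh_vector(SEQ_2KHV)
# Compare a few entries with the 2KHV_1 column of the table.
print(vec["H"], vec["W"], vec["F"], sum(vec.values()))
```

The entries sum to the sequence length, which gives a quick consistency check against the Length column further down.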

>8KGY_1|Chains A, B, C, D, E, F|Glutamate dehydrogenase 1, mitochondrial|Homo sapiens (9606)
>2KHV_1|Chain A|Phage integrase|Nitrosospira multiformis ATCC 25196 (323848)
>3WCG_1|Chains A, B, C, D|Farnesyltransferase, putative|Trypanosoma cruzi (353153)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8KGY , Knot 222 558 0.83 40 261 519
MYRYLGEALLLSRAGPAALGSASADSAALLGWARGQPAAAPQPGLALAARRHYSEAVADREDDPNFFKMVEGFFDRGASIVEDKLVEDLRTRESEEQKRNRVRGILRIIKPCNHVLSLSFPIRRDDGSWEVIEGYRAQHSQHRTPCKGGIRYSTDVSVDEVKALASLMTYKCAVVDVPFGGAKAGVKINPKNYTDNELEKITRRFTMELAKKGFIGPGIDVPAPDMSTGEREMSWIADTYASTIGHYDINAHACVTGKPISQGGIHGRISATGRGVFHGIENFINEASYMSILGMTPGFGDKTFVVQGFGNVGLHSMRYLHRFGAKCIAVGESDGSIWNPDGIDPKELEDFKLQHGSILGFPKAKPYEGSILEADCDILIPAASEKQLTKSNAPRVKAKIIAEGANGPTTPEADKIFLERNIMVIPDLYLNAGGVTVSYFEWLKNLNHVSYGRLTFKYERDSNYHLLMSVQESLERKFGKHGGTIPIVPTAEFQDRISGASEKDIVHSGLAYTMERSARQIMRTAMKYNLGLDLRTAAYVNAIEKVFKVYNEAGVTFT
2KHV , Knot 58 106 0.85 40 91 99
MTFSECAALYIKAHRSSWKNTKHADQWTNTIKTYCGPVIGPLSVQDVDTKLIMKVLDPIWEQKPETASRLRGRIESVLDWATVRGYREGDNPARWRGYLEHHHHHH
3WCG , Knot 154 365 0.83 40 203 348
MGSSHHHHHHSSGLVPRGSHMACNDEDLRFCYDILQAVSRSFAVVIMELDEEMRDAVCIFYLVLRALDTVEDDMSIPVEFKLRELPKFHEHLHDTTWCMSGVGVGRERELLERYTHVTRAYSRLGKAYQDVISGICERMANGMCDFLTRKVETKADYDLYCHYVAGLVGHGLTLLYVSSGLEDVRLADDLTNANHMGLFLQKTNIIRDFYEDICEVPPRVFWPREIWEKYTDDLHAFKDELHEAKAVECLNAMVADALVHVPHVVEYLASLRDPSVFAFSAIPQVMAMATLSLVFNNKDVFHTKVKTTRGATARIFHYSTELQATLQMLKTYTLRLAARMNAQDACYDRIEHLVNDAIRAMESHQ
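
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. A short sketch follows; the function name `normalized_lz` is hypothetical, and the page's displayed values appear to be rounded or truncated to two decimals, so only approximate agreement should be expected.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) as listed in the table above.
rows = [("8KGY", 222, 558), ("2KHV", 58, 106), ("3WCG", 154, 365)]
for code, lz, n in rows:
    print(code, f"{normalized_lz(lz, n):.4f}")
```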

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
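
As an illustration of these definitions, the following sketch collects the contiguous subwords of a given length into a set and counts them. It assumes subwords are read with a sliding window; the names `subwords` and `p` are illustrative. The toy word is the first 13 residues of 3WCG_1.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w, n):
    """Subword complexity: number of distinct subwords of length n."""
    return len(subwords(w, n))

toy = "MGSSHHHHHHSSG"  # prefix of the 3WCG_1 sequence
print(sorted(subwords(toy, 2)))
print(p(toy, 1), p(toy, 2))
```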

\(|P_{f(8KGY_1)}(2) \setminus P_{f(2KHV_1)}(2)|=190\), \(|P_{f(2KHV_1)}(2) \setminus P_{f(8KGY_1)}(2)|=20\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100011011110011111110101001111111010111110111111100000011100000101101101110011011000110010000000000001011101101000110101110000101011010010000000100111000001010010111011000011101111110111010100000001001000101011001111111011110100100010111000100110001010101010110011101010101011101100110010010111101111000111011101110010010011100111100010110101101001001010010111110101001011010001111110000100001101010111011011001010011100011111010101111010010110010010010101000000000111010001000110011011111010100010110000110011100100010011001100011101001101011001101000111010
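
The hydrophobic-polar string can be sketched as a letter-by-letter recoding. The page does not state which residues it treats as hydrophobic; the set below (A, F, G, I, L, M, P, V, W mapped to 1, everything else to 0) is an assumption inferred by comparing the start of the displayed string with the start of Sequence 1, and the name `hp_encode` is hypothetical.

```python
# Assumed hydrophobic set, inferred from the displayed string; not stated on the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (assumed hydrophobic) or '0' (assumed polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

prefix = "MYRYLGEALLLSRAGPAALG"  # first 20 residues of 8KGY_1
print(hp_encode(prefix))
```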
Pair \(Z_2\) Length of longest common substring
8KGY_1,2KHV_1 210 3
8KGY_1,3WCG_1 172 4
2KHV_1,3WCG_1 192 6
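
Both columns of this table can be sketched in a few lines. The code assumes that \(Z_k\) is the size of the symmetric difference of the two subword sets, as defined above, and that the last column counts a longest common contiguous block; the latter reading is suggested by the value 6 for 2KHV_1 and 3WCG_1, which both contain the His tag HHHHHH. Function names are illustrative, and the test strings are short fragments of the displayed sequences rather than the full sequences.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

tail_2khv = "YLEHHHHHH"      # last 9 residues of 2KHV_1
head_3wcg = "MGSSHHHHHHSSG"  # first 13 residues of 3WCG_1
print(z(tail_2khv, head_3wcg, 2))
print(longest_common_substring_length(tail_2khv, head_3wcg))
```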

Newick tree

 
(
	2KHV_1:10.01,
	(
		8KGY_1:86,3WCG_1:86
	):19.01
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{664}{\log_{20} 664}-\frac{106}{\log_{20}106}\right)\approx 161\), where \(664=558+106\) is the combined length of 8KGY_1 and 2KHV_1.
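
The arithmetic behind this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the formula as written; the function and parameter names are hypothetical, and the interpretation of 664 as the combined length 558 + 106 is an assumption based on the Length column.

```python
import math

def expected_distance(n_combined, n_single, c1=0.85, c2=0.8, alphabet_size=20):
    """c1 * c2 * (g(n_combined) - g(n_single)) with g(n) = n / log_{alphabet_size}(n)."""
    def g(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (g(n_combined) - g(n_single))

value = expected_distance(558 + 106, 106)
print(f"{value:.2f}")
```

The result lies between 161 and 162, consistent with the figure quoted above.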
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 8KGY_1 2KHV_1 206 121
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
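
To make the interpolation concrete, here is a sketch under stated assumptions. The page does not say which LZ variant it uses; the code assumes the LZ76 exhaustive-history parse. It also assumes the usual Otu–Sayood construction, in which the two quantities being combined are the complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), so that \(d\) is their maximum and \(d_1\) their sum. All function names are illustrative.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive-history parse of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it can be copied from an earlier starting position.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y, c=lz76):
    """Complexity added by appending y to x, and by appending x to y."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "MGSSHHHHHHSSG", "YLEHHHHHH"
a, b = increments(x, y)
print(delta(x, y, 0), max(a, b))        # alpha = 0 gives d
print(delta(x, y, 0.5), (a + b) / 2)    # alpha = 1/2 gives d_1 / 2
```

Passing the complexity function as the parameter `c` keeps the sketch usable with a different LZ variant if the page's definition turns out to differ.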