CoV2D Browser™

Parikh vectors
3DRW_1 5GIN_1 3KYH_1 Letter Amino acid
1 2 3 W Tryptophan
21 14 11 Y Tyrosine
30 38 11 A Alanine
20 19 18 N Asparagine
19 14 17 P Proline
47 41 25 L Leucine
25 14 11 F Phenylalanine
20 11 20 T Threonine
23 22 21 D Aspartic acid
0 0 2 C Cysteine
42 34 29 I Isoleucine
6 6 4 M Methionine
46 32 20 E Glutamic acid
13 14 5 H Histidine
34 26 25 K Lysine
22 21 32 S Serine
28 21 19 V Valine
34 28 17 R Arginine
11 11 11 Q Glutamine
32 20 9 G Glycine
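A Parikh vector records how many times each letter occurs in a word. Each column above sums to the corresponding sequence length in the next table (474, 388 and 310). The following minimal Python sketch computes one column. It assumes the 20 standard one-letter amino-acid codes, listed in the row order of the table. The helper name `parikh_vector` is hypothetical. Letters outside these 20, such as X, would be silently ignored.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes, in the row order of the table above.
AMINO_ACIDS = "WYANPLFTDCIMEHKSVRQG"

def parikh_vector(w: str) -> list[int]:
    """Return the number of occurrences of each letter of AMINO_ACIDS in w."""
    counts = Counter(w)
    return [counts[a] for a in AMINO_ACIDS]

# Usage on a short fragment (the first 20 residues of 3DRW_1):
v = parikh_vector("MGSSHHHHHHSSGRENLYFQ")
print(dict(zip(AMINO_ACIDS, v)))
```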

>3DRW_1|Chains A, B|ADP-specific phosphofructokinase|Pyrococcus horikoshii (53953)
>5GIN_1|Chains A, D[auth B], M[auth K]|C/D box methylation guide ribonucleoprotein complex aNOP56 subunit|Sulfolobus solfataricus (2287)
>3KYH_1|Chains A, B|mRNA-capping enzyme subunit beta|Saccharomyces cerevisiae (4932)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3DRW , Knot 193 474 0.83 38 227 442
MGSSHHHHHHSSGRENLYFQGHMIPEHLSIYTAYNANIAAIVKLNQETIQNLINAFDPDEVKRRIEEYPREINEPIDFVARLVHTLKLGKPAAVPLVNEKMNEWFDKTFRYEEERLGGQAGIIANTLAGLKIRKVIAYTPFLPKRLAELFKKGVLYPVVENGELQFKPIQEAYREGDPLKINRIFEFRKGLKFKLGDETIEIPNSGRFIVSARFESISRIETREDIKPFLGEIGKEVDGAIFSGYQGLRTKYSDGKDANYYLRRAKEDIIEFKEKDVKIHVEFASVQDRKLRKKIITNILPFVDSVGIDEAEIAQILSVLGYRELADRIFTYNRLEDSILGGMIILDELNFEILQVHTTYYLMYITHRDNPLSEEELAKSLEFGTTLAAARASLGDIRGPDDYKVGLKVPFNERSEYVKLRFEEAKSRLRMREYKVVVIPTRLVQNPVLTVGLGDTISAGAFLTYLEFLKRHGS
5GIN , Knot 160 388 0.82 38 212 364
MVKIYLIEHVIGAVAYDENGNIVDYITNPRDLGKITEELLNNEKGIPFSATVELLKKVNPQEVVVENEAEVPKLQALGYRVSYEPYSKVSRIFRESLPKVAIDIKFASNEEDYYNFLHELSLEYTRRKLRSAAQKRDLLAIQAVRAMDDIDKTINLFSERLREWYSIHFPELDKLIEDHEEYATIVSRFGDRGFLTIDSLKELGFNEQRINRILDAAKKSIGADISEDDLSAMRMIANTILDLYNIRRNLNNYLEGVMKEVAPNVTALVGPALGARLLSIAGSLDELAKMPASTIQVLGAEKALFRALRSGGRPPKHGIIFQYPAIHTSPRWQRGKIARALAAKLAIAARVDAFSGRFIGDQLNEQLKKRIDEIKEKFAQHHHHHHHH
3KYH , Knot 135 310 0.83 40 191 299
MYKNVPIWAQKWKPTIKALQSINVKDLKIDPSFLNIIPDDDLTKSVQDWVYATIYSIAPELRSFIELEMKFGVIIDAKGPDRVNPPVSSQCVFTELDAHLTPNIDASLFKELSKYIRGISEVTENTGKFSIIESQTRDSVYRVGLSTQRPRFLRMSTDIKTGRVGQFIEKRHVAQLLLYSPKDSYDVKISLNLELPVPDNDPPEKYKSQSPISERTKDRVSYIHNDSCTRIDITKVENHNQNSKSRQSETTHEVELEINTPALLNAFDNITNDSKEYASLIRTFLNNGTIIRRKLSSLSYEIFEGSKKVM
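The LZ-complexity column counts phrases in a Lempel-Ziv (1976) style parsing, and the ratio column divides that count by \(n/\log_{20} n\). The Python sketch below assumes the exhaustive-history parsing. Implementations differ in small conventions, such as how the final phrase is counted, so this sketch is not guaranteed to reproduce the table's values exactly. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(w: str) -> int:
    """LZ76 complexity: the number of phrases in the exhaustive-history parsing of w.

    Each phrase w[i:j] is extended until it is no longer a substring of w[:j-1],
    i.e. until it can no longer be copied from the text before its last symbol.
    """
    n, i, c = len(w), 0, 0
    while i < n:
        j = i + 1
        # The copy source may overlap the phrase itself, as in Lempel and Ziv (1976).
        while j <= n and w[i:j] in w[:j - 1]:
            j += 1
        c += 1  # one phrase ends here (the last one may simply run to the end of w)
        i = j
    return c

def normalized_lz(c: int, n: int, alphabet: int = 20) -> float:
    """The ratio LZ(w) / (n / log_alphabet n) from the table header."""
    return c / (n / math.log(n, alphabet))

print(lz76("0001101001000101"))            # 6: 0|001|10|100|1000|101
print(round(normalized_lz(193, 474), 4))   # 0.8374 for the 3DRW_1 row (table: 0.83)
```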

Let \(P_w(k)\) be the set of distinct subwords of length \(k\) (contiguous intervals) of a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the amino-acid sequence, as given in FASTA format, of the Protein Data Bank entry with four-character code \(c\).
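The definitions of \(P_w(k)\) and \(p_w(k)\) translate directly into set comprehensions over sliding windows. The following Python sketch uses the hypothetical helper names `subwords` and `p`. It returns the empty set when \(k\) exceeds the word length.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w: str, k: int) -> int:
    """p_w(k) = |P_w(k)|, the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "MGSSHHHHHHSSG"   # first 13 residues of 3DRW_1
print([p(w, k) for k in (1, 2, 3)])  # [4, 7, 8]
```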

\(|P_{f(3DRW_1)}(2) \setminus P_{f(5GIN_1)}(2)|=64\) and \(|P_{f(5GIN_1)}(2) \setminus P_{f(3DRW_1)}(2)|=49\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(3DRW_1),f(5GIN_1))=64+49=113\).

Hydrophobic-polar (HP) version of sequence 1 (3DRW_1), with 1 for a hydrophobic residue and 0 for a polar residue:
110000000000100010101011100101001001011111010000100110110100100010001001001101110110010110111111100010011000100000011101111100111101001110011110011011001110111001010101100100010110100110100110101100010110010111010100100100000101111011001011110100110000001001000100100011010000101010110100001000110011111001110010110110111000110011000010001111111100101011010000011010000011000011001011001111010110101100001110111000000101010010001010000111110011001110111100101111100101100010
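The HP string can be produced with a letter-to-bit map. The hydrophobic set below, {A, F, G, I, L, M, P, V, W}, was inferred by reading the HP string against the 3DRW_1 sequence; it is not taken from a stated definition. 3DRW_1 contains no cysteine, so mapping C to 0 is an assumption. The name `hp_string` is hypothetical.

```python
# Residues that read as hydrophobic (1) in the 3DRW_1 HP string; every other letter maps to 0.
# Assumption: cysteine, absent from 3DRW_1, is treated as polar here.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_string(w: str) -> str:
    """Map an amino-acid sequence to its hydrophobic (1) / polar (0) string."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

print(hp_string("MGSSHHHHHHSSGRENLYFQ"))  # 11000000000010001010
```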
Pair \(Z_2\) Length of longest common substring (contiguous subword)
3DRW_1,5GIN_1 113 6
3DRW_1,3KYH_1 142 4
5GIN_1,3KYH_1 151 4
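Both columns of the pair table can be computed along the following lines. \(Z_k\) is the size of the symmetric difference of the two subword sets, and the last column is read here as the longest common contiguous subword. A common subsequence would be far longer: L occurs 47 times in 3DRW_1 and 41 times in 5GIN_1, so those two sequences already share a subsequence of length 41. The Python sketch below uses standard dynamic programming for the substring length. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y, in O(|x||y|) time."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("ABAB", "ABBA", 2), longest_common_substring("ABAB", "ABBA"))  # 1 2
```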

Newick tree

(
	3KYH_1:78.08,
	(
		3DRW_1:56.5,5GIN_1:56.5
	):21.58
);
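In standard Newick syntax this tree reads `(3KYH_1:78.08,(3DRW_1:56.5,5GIN_1:56.5):21.58);`. The Python sketch below parses that string and sums branch lengths from the root. The result shows that the tree is ultrametric, since \(56.5+21.58=78.08\) for every leaf. The recursive-descent parser is a hypothetical helper, not the browser's code. It assumes no whitespace and a terminating semicolon.

```python
def parse_newick(s: str):
    """Parse a Newick string with branch lengths into nested (name, length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1                      # skip ")"
        start = pos
        while s[pos] not in ":,);":       # optional node name
            pos += 1
        name, length = s[start:pos], 0.0
        if s[pos] == ":":                 # optional branch length
            pos += 1
            start = pos
            while s[pos] not in ",);":
                pos += 1
            length = float(s[start:pos])
        return (name, length, children)

    return node()

def leaf_depths(tree, depth=0.0):
    """Map each leaf name to the sum of branch lengths from the root."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    out = {}
    for child in children:
        out.update(leaf_depths(child, depth))
    return out

tree = parse_newick("(3KYH_1:78.08,(3DRW_1:56.5,5GIN_1:56.5):21.58);")
print({k: round(v, 2) for k, v in leaf_depths(tree).items()})
# {'3KYH_1': 78.08, '3DRW_1': 78.08, '5GIN_1': 78.08}
```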

Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{862}{\log_{20} 862}-\frac{388}{\log_{20}388}\right)\approx 127\).
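The estimate can be checked numerically. Here 862 = 474 + 388 is the length of the concatenation of 3DRW_1 and 5GIN_1, and \(n/\log_{20} n\) is the normalizer from the LZ table. The factors 0.85 and 0.8 are taken as given in the formula. The helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n: int, alphabet: int = 20) -> float:
    """n / log_alphabet(n), the normalizing length used in the LZ table."""
    return n / math.log(n, alphabet)

n_concat = 474 + 388  # |3DRW_1| + |5GIN_1| = 862
estimate = 0.85 * 0.8 * (lz_scale(n_concat) - lz_scale(388))
print(round(estimate))  # 127
```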
Status Protein1 Protein2 d d1/2
Query variables 3DRW_1 5GIN_1 150 137.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min\{a,b\} + (1-\alpha) \max\{a,b\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \]
where \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\).
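The interpolation \(\delta\) can be sketched in Python as follows, assuming \(\mathrm{LZ}\) is the LZ76 phrase count from the exhaustive-history parsing. At \(\alpha=0\) the function returns \(d\), and at \(\alpha=1/2\) it returns \(d_1/2\). The function names and the short test strings (10-residue prefixes of 3DRW_1 and 5GIN_1) are illustrative, not the browser's code. Because this sketch's phrase-counting convention may differ from the browser's, its values need not match the table.

```python
def lz76(w: str) -> int:
    """LZ76 phrase count: extend each phrase while it is a substring of w[:j-1]."""
    n, i, c = len(w), 0, 0
    while i < n:
        j = i + 1
        while j <= n and w[i:j] in w[:j - 1]:
            j += 1
        c += 1
        i = j
    return c

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b) for the two Otu-Sayood increments."""
    a = lz76(x + y) - lz76(x)   # extra phrases needed to describe y after x
    b = lz76(y + x) - lz76(y)   # extra phrases needed to describe x after y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "MGSSHHHHHH", "MVKIYLIEHV"   # 10-residue prefixes of 3DRW_1 and 5GIN_1
d = delta(x, y, 0.0)                # Otu-Sayood d
d1_half = delta(x, y, 0.5)          # Otu-Sayood d1, halved
print(d, d1_half)                   # 7.0 5.5
```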