CoV2D Browser™

Parikh vectors
8OQN_1 2JFY_1 6EHM_1 Letter Amino acid
89 20 53 A Alanine
31 4 33 R Arginine
24 10 53 Q Glutamine
45 18 38 K Lysine
41 14 42 P Proline
13 11 21 Y Tyrosine
53 14 42 V Valine
38 9 59 D Aspartic acid
81 16 41 G Glycine
20 4 20 M Methionine
18 7 33 N Asparagine
41 24 29 I Isoleucine
59 33 67 L Leucine
30 13 26 F Phenylalanine
41 12 38 T Threonine
5 2 4 W Tryptophan
2 4 3 C Cysteine
50 20 59 E Glutamic acid
12 7 30 H Histidine
43 13 48 S Serine
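A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch, not the site's own code: the function name `parikh_vector` is hypothetical, the letter order is copied from the table above, and the example input is the first ten residues of the 8OQN_1 sequence listed further down.

```python
from collections import Counter

# Letter order copied from the rows of the table above.
ALPHABET = "ARQKPYVDGMNILFTWCEHS"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the occurrence count of each letter of `alphabet` in `seq`."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# First ten residues of the 8OQN_1 sequence.
example = "MGSSHHHHHH"
vec = parikh_vector(example)
print(dict(zip(ALPHABET, vec)))
```

The entries of a Parikh vector always sum to the length of the word, which is a quick sanity check on any column of the table.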

>8OQN_1|Chains A, B|3-hydroxyacyl-CoA dehydrogenase|Mycobacterium tuberculosis H37Rv (83332)
>2JFY_1|Chains A, B|GLUTAMATE RACEMASE|HELICOBACTER PYLORI (210)
>6EHM_1|Chains A, B|Nucleoprotein|Zaire ebolavirus (strain Mayinga-76) (128952)
Columns, in order: protein code \(c\); LZ-complexity \(\mathrm{LZ}(w)\); length \(n=|w|\); \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\); \(p_w(1)\); \(p_w(2)\); \(p_w(3)\). The sequence \(w=f(c)\) follows each row on its own line.
8OQN , Knot 267 736 0.79 40 257 640
MGSSHHHHHHSQDPNSMPDNTIQWDKDADGIVTLTMDDPSGSTNVMNEAYIESMGKAVDRLVAEKDSITGVVVASAKKTFFAGGDVKTMIQARPEDAGDVFNTVETIKRQLRTLETLGKPVVAAINGAALGGGLEIALACHHRIAADVKGSQLGLPEVTLGLLPGGGGVTRTVRMFGIQNAFVSVLAQGTRFKPAKAKEIGLVDELVATVEELVPAAKAWIKEELKANPDGAGVQPWDKKGYKMPGGTPSSPGLAAILPSFPSNLRKQLKGAPMPAPRAILAAAVEGAQVDFDTASRIESRYFASLVTGQVAKNMMQAFFFDLQAINAGGSRPEGIGKTPIKRIGVLGAGMMGAGIAYVSAKAGYEVVLKDVSLEAAAKGKGYSEKLEAKALERGRTTQERSDALLARITPTADAADFKGVDFVIEAVFENQELKHKVFGEIEDIVEPNAILGSNTSTLPITGLATGVKRQEDFIGIHFFSPVDKMPLVEIIKGEKTSDEALARVFDYTLAIGKTPIVVNDSRGFFTSRVIGTFVNEALAMLGEGVEPASIEQAGSQAGYPAPPLQLSDELNLELMHKIAVATRKGVEDAGGTYQPHPAEAVVEKMIELGRSGRLKGAGFYEYADGKRSGLWPGLRETFKSGSSQPPLQDMIDRMLFAEALETQKCLDEGVLTSTADANIGSIMGIGFPPWTGGSAQFIVGYSGPAGTGKAAFVARARELAAAYGDRFLPPESLLS
2JFY , Knot 115 255 0.83 40 159 245
MKIGVFDSGVGGFSVLKSLLKARLFDEIIYYGDSARVPYGTKDPTTIKQFGLEALDFFKPHEIELLIVACNTASALALEEMQKYSKIPIVGVIEPSILAIKRQVEDKNAPILVLGTKATIQSNAYDNALKQQGYLNISHLATSLFVPLIEESILEGELLETCMHYYFTPLEILPEVIILGCTHFPLIAQKIEGYFMGHFALPTPPLLIHSGDAIVEYLQQKYALKNNACTFPKVEFHASGDVIWLERQAKEWLKL
6EHM , Knot 288 739 0.85 40 284 683
MDSRPQKIWMAPSLTESDMDYHKILTAGLSVQQGIVRQRVIPVYQVNNLEEICQLIIQAFEAGVDFQESADSFLLMLCLHHAYQGDYKLFLESGAVKYLEGHGFRFEVKKRDGVKRLEELLPAVSSGKNIKRTLAAMPEEETTEANAGQFLSFASLFLPKLVVGEKACLEKVQRQIQVHAEQGLIQYPTAWQSVGHMMVIFRLMRTNFLIKFLLIHQGMHMVAGHDANDAVISNSVAQARFSGLLIVKTVLDHILQKTERGVRLHPLARTAKVKNEVNSFKAALSSLAKHGEYAPFARLLNLSGVNNLEHGLFPQLSAIALGVATAHGSTLAGVNVGEQYQQLREAATEAEKQLQQYAESRELDHLGLDDQEKKILMNFHQKKNEISFQQTNAMVTLRKERLAKLTEAITAASLPKTSGHYDDDDDIPFPGPINDDDNPGHQDDDPTDSQDTTIPDVVVDPDDGSYGEYQSYSENGMNAPDDLVLFDLDEDDEDTKPVPNRSTKGGQQKNSQKGQHIEGRQTQSRPIQNVPGPHRTIHHASAPLTDNDRRNEPSGSTSPRMLTPINEEADPLDDADDETSSLPPLESDDEEQDRDGTSNRTPTVAPPAPVYRDHSEKKELPQDEQQDQDHTQEARNQDSDNTQSEHSFEEMYRHILRSQGPFDAVLYYHMMKDEPVVFSTSDGKEYTYPDSLEEEYPPWLTEKEAMNEENRFVTLDGQQFYWPVMNHKNKFMAILQHHQ
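The normalized column can be recomputed from the LZ-complexity and length columns. The sketch below assumes that \(\mathrm{LZ}(w)\) is the number of components in the Lempel-Ziv (1976) exhaustive history; the page does not state which variant it uses, so `lz76` is an illustrative guess, while `normalized` just evaluates the formula in the column header.

```python
import math

def lz76(s):
    """Number of components in the Lempel-Ziv (1976) exhaustive history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the candidate while it already occurs starting before position i
        # (the earlier occurrence may overlap the candidate itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_20 n), as in the column header."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the rows above.
for code, lz, n in [("8OQN", 267, 736), ("2JFY", 115, 255), ("6EHM", 288, 739)]:
    print(code, normalized(lz, n))
```

Truncating the printed ratios to two decimals reproduces the tabulated 0.79, 0.83 and 0.85.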

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
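As a small illustration of these definitions, the subword sets can be built by sliding a window of fixed length over the word. The helper names `subwords` and `p` are hypothetical, and the input is just the first ten residues of the 8OQN sequence.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "MGSSHHHHHH"
for k in (1, 2, 3):
    print(k, sorted(subwords(w, k)), p(w, k))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so repeated stretches such as the run of H's keep the counts small.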

\(|P_{f(8OQN_1)}(2) \setminus P_{f(2JFY_1)}(2)|=132\), \(|P_{f(2JFY_1)}(2) \setminus P_{f(8OQN_1)}(2)|=34\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For the pair above, \(Z_2 = 132 + 34 = 166\).

Hydrophobic-polar version of sequence 1 (8OQN_1):
1100000000000100110001010001011101010010100011001010011011001110000101111101000111110100110101001101100100100010010011011111101111111101111000011101010011110101111111111000101111001110111010010110100111100111010011111011100010101011110110001001111010011111111011001000101111111011111110110101001001000011011010110011011110101101110010111001100111111111111110101011001110010101110101000010101100100000000111101010101101011011101110000100011101001101011110000011101110110000011110110110011110110100000011101100011110011110000111000111011001111110110110100110011011111010001010110011110001100111000101101110011011001010111100010100011111100010010001110011001111011000001001110001010110111111111011010111100111101011111010011110100111100110
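The page does not say which residues count as hydrophobic. Comparing the start of the 8OQN_1 sequence with the start of the bit string suggests that A, F, G, I, L, M, P, V and W map to 1 and every other residue maps to 0; the sketch below encodes that inferred assignment, which should be treated as an assumption rather than the site's documented rule.

```python
# Inferred from the first residues of 8OQN_1 and the bit string above;
# this set is an assumption, not a rule stated on the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

# First 20 residues of the 8OQN_1 sequence.
prefix = "MGSSHHHHHHSQDPNSMPDN"
print(hp(prefix))
```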
Pair \(Z_2\) Length of longest common subword (contiguous)
8OQN_1,2JFY_1 166 4
8OQN_1,6EHM_1 113 5
2JFY_1,6EHM_1 181 4
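Both columns of this table can be sketched in a few lines. The values 4 and 5 in the last column are far too small for a gapped common subsequence of sequences this long, so the sketch assumes that the column reports the longest common contiguous subword. The function names and the two short toy inputs are hypothetical.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

x, y = "MKIGVF", "GVFDSG"
print(z(x, y, 2), longest_common_subword(x, y))
```

Here the two toy words share only the 2-subwords GV and VF, so three subwords remain on each side, and the longest shared block is GVF.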

Newick tree

 
[
	2JFY_1:94.80,
	[
		8OQN_1:56.5,6EHM_1:56.5
	]:38.30
]
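The bracketed tree above can be written in the usual parenthesized Newick syntax and checked for equal root-to-leaf distances. This is only a sketch: the nested-list encoding and the helper names `to_newick` and `leaf_depths` are hypothetical, and the branch lengths are copied from the tree as displayed.

```python
# Each node is (label_or_children, branch_length); the root is a list of nodes.
tree = [("2JFY_1", 94.80), ([("8OQN_1", 56.5), ("6EHM_1", 56.5)], 38.30)]

def to_newick(children):
    """Serialize a list of (label_or_children, length) pairs as Newick text."""
    parts = []
    for label, length in children:
        if isinstance(label, list):
            label = to_newick(label)
        parts.append(f"{label}:{length:g}")
    return "(" + ",".join(parts) + ")"

def leaf_depths(children, depth=0.0):
    """Distance from the root to every leaf."""
    depths = {}
    for label, length in children:
        if isinstance(label, list):
            depths.update(leaf_depths(label, depth + length))
        else:
            depths[label] = depth + length
    return depths

newick = to_newick(tree) + ";"
print(newick)
print(leaf_depths(tree))
```

All three leaves sit at distance 94.8 from the root (up to floating-point rounding), so the tree is ultrametric.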

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{991}{\log_{20} 991}-\frac{255}{\log_{20}255}\right)\approx 198\), where \(991 = 736 + 255\) is the length of the two query sequences concatenated.
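The arithmetic behind this estimate is easy to re-evaluate. The sketch below only plugs the numbers from the formula into Python; the helper name `scaled_length` is hypothetical, and the factors 0.85 and 0.8 are taken from the page as given.

```python
import math

def scaled_length(n, alphabet_size=20):
    """n / log_20(n), the length scale used in the estimate."""
    return n / math.log(n, alphabet_size)

# 991 = 736 + 255 is the length of the two query sequences concatenated.
expected = 0.85 * 0.8 * (scaled_length(736 + 255) - scaled_length(255))
print(expected)
```

The result is a little under 199, which agrees with the quoted 198 if the page truncates rather than rounds.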
Status Protein1 Protein2 d d1/2
Query variables 8OQN_1 2JFY_1 244 163
Was not able to put for d
Was not able to put for d1
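For orientation, here is a sketch of the two distances under two assumptions that the page itself does not spell out: that \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), as recalled from Otu and Sayood's definitions, and that \(c\) is the LZ76 component count. The toy inputs are hypothetical, not the query proteins.

```python
def lz76(s):
    """Number of components in the Lempel-Ziv (1976) exhaustive history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the candidate while it already occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(s, q, c=lz76):
    """Return (d, d1) for the assumed definitions stated in the lead-in."""
    a = c(s + q) - c(s)  # new components needed for q after seeing s
    b = c(q + s) - c(q)  # new components needed for s after seeing q
    return max(a, b), a + b

d, d1 = otu_sayood("AAAAAA", "MKIGVF")
print(d, d1, d1 / 2)
```

Under these assumptions, appending a low-complexity word such as AAAAAA adds few components, so one of the two terms in \(d_1\) is always small for it.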

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
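The interpolation can be checked against the query row of the status table. In this sketch the function name `delta` is hypothetical, the maximum is read off as \(d = 244\), and the minimum is not tabulated but derived from \(d_1/2 = 163\) being the average of minimum and maximum.

```python
def delta(alpha, lo, hi):
    """alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi

d, half_d1 = 244, 163      # values from the query row of the status table
hi = d                     # alpha = 0 gives the maximum
lo = 2 * half_d1 - d       # derived: (lo + hi) / 2 must equal d1 / 2

print(lo, delta(0, lo, hi), delta(0.5, lo, hi))
```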