CoV2D Browser™

Parikh vectors
8DDW_1 3PSE_1 6HZB_1 Letter Amino acid
89 6 14 K Lysine
54 5 6 M Methionine
72 13 45 A Alanine
69 12 31 D Aspartic acid
21 1 8 C Cysteine
61 6 18 Q Glutamine
36 5 21 H Histidine
24 3 11 W Tryptophan
41 8 17 Y Tyrosine
89 9 24 R Arginine
97 11 19 I Isoleucine
143 16 36 L Leucine
54 8 9 F Phenylalanine
63 11 38 T Threonine
56 6 21 P Proline
79 8 31 V Valine
51 6 29 N Asparagine
100 14 22 E Glutamic acid
80 11 47 G Glycine
92 12 35 S Serine
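
The table above lists, for each sequence, how many times each of the 20 amino-acid letters occurs. A minimal sketch of that count follows; the function name `parikh_vector`, the alphabet ordering, and the toy input are illustrative assumptions, not taken from the page.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (ordering chosen here for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Toy input, not one of the sequences on this page.
toy = "MKKLAK"
vector = parikh_vector(toy)
print(vector["K"], vector["M"], vector["C"])
```

When every letter of the sequence belongs to the alphabet, the entries sum to the sequence length; for example, the 8DDW_1 column above sums to 1371.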

>8DDW_1|Chains A, C[auth B], E[auth C], G[auth D]|Transient receptor potential cation channel, subfamily M, member 3|Mus musculus (10090)
>3PSE_1|Chain A|RNA polymerase|Crimean-Congo hemorrhagic fever virus (3052518)
>6HZB_1|Chain A|Furin|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8DDW , Knot 486 1371 0.85 40 353 1200
MGKKWRDAGELERGCSDREDSAESRRRSRSASRGRFAESWKRLSSKQGSTKRSGLPAQQTPAQKSWIERAFYKRECVHIIPSTKDPHRCCCGRLIGQHVGLTPSISVLQNEKNESRLSRNDIQSEKWSISKHTQLSPTDAFGTIEFQGGGHSNKAMYVRVSFDTKPDLLLHLMTKEWQLELPKLLISVHGGLQNFELQPKLKQVFGKGLIKAAMTTGAWIFTGGVNTGVIRHVGDALKDHASKSRGKICTIGIAPWGIVENQEDLIGRDVVRPYQTMSNPMSKLTVLNSMHSHFILADNGTTGKYGAEVKLRRQLEKHISLQKINTRIGQGVPVVALIVEGGPNVISIVLEYLRDTPPVPVVVCDGSGRASDILAFGHKYSEEGGLINESLRDQLLVTIQKTFTYTRTQAQHLFIILMECMKKKELITVFRMGSEGHQDIDLAILTALLKGANASAPDQLSLALAWNRVDIARSQIFIYGQQWPVGSLEQAMLDALVLDRVDFVKLLIENGVSMHRFLTISRLEELYNTRHGPSNTLYHLVRDVKKGNLPPDYRISLIDIGLVIEYLMGGAYRCNYTRKRFRTLYHNLFGPKRPKALKLLGMEDDIPLRRGRKTTKKREEEVDIDLDDPEINHFPFPFHELMVWAVLMKRQKMALFFWQHGEEAMAKALVACKLCKAMAHEASENDMVDDISQELNHNSRDFGQLAVELLDQSYKQDEQLAMKLLTYELKNWSNATCLQLAVAAKHRDFIAHTCSQMLLTDMWMGRLRMRKNSGLKVILGILLPPSILSLEFKNKDDMPYMTQAQEIHLQEKEPEEPEKPTKEKDEEDMELTAMLGRSNGESSRKKDEEEVQSRHRLIPVGRKIYEFYNAPIVKFWFYTLAYIGYLMLFNYIVLVKMERWPSTQEWIVISYIFTLGIEKMREILMSEPGKLLQKVKVWLQEYWNVTDLIAILLFSVGMILRLQDQPFRSDGRVIYCVNIIYWYIRLLDIFGVNKYLGPYVMMIGKMMIDMMYFVIIMLVVLMSFGVARQAILFPNEEPSWKLAKNIFYMPYWMIYGEVFADQIDPPCGQNETREDGKTIQLPPCKTGAWIVPAIMACYLLVANILLVNLLIAVFNNTFFEVKSISNQVWKFQRYQLIMTFHERPVLPPPLIIFSHMTMIFQHVCCRWRKHESDQDERDYGLKLFITDDELKKVHDFEEQCIEEYFREKDDRFNSSNDERIRVTSERVENMSMRLEEVNEREHSMKASLQTVDIRLAQLEDLIGRMATALERLTGLERAESNKIRSRTSSDCTDAAYIVRQSSFNSQEGNTFKLQESIDPAGEETISPTSPTLMPRMRSHSFYSVNVKDKGGIEKLESIFKERSLSLHRATS
3PSE , Knot 88 171 0.88 40 137 168
GPMDFLRSLDWTQVIAGQYVSNPRFNISDYFEIVRQPGDGNCFYHSIAELTMPNKTDHSYHYIKRLTESAARKYYQEEPEARLVGLSLEDYLKRMLSDNEWGSTLEASMLAKEMGITIIIWTVAASDEVEAGIKFGDGDVFTAVNLLHSGQTHFDALRILPQFETDTREAL
6HZB , Knot 198 482 0.84 40 245 450
DVYQEPTDPKFPQQWYLSGVTQRDLNVKAAWAQGYTGHGIVVSILDDGIEKNHPDLAGNYDPGASFDVNDQDPDPQPRYTQMNDNRHGTRCAGEVAAVANNGVCGVGVAYNARIGGVRMLDGEVTDAVEARSLGLNPNHIHIYSASWGPEDDGKTVDGPARLAEEAFFRGVSQGRGGLGSIFVWASGNGGREHDSCNCDGYTNSIYTLSISSATQFGNVPWYSEACSSTLATTYSSGNQNEKQIVTTDLRQKCTESHTGTSASAPLAAGIIALTLEANKNLTWRDMQHLVVQTSKPAHLNANDWATNGVGRKVSHSYGYGLLDAGAMVALAQNWTTVAPQRKCIIDILTEPKDIGKRLEVRKTVTACLGEPNHITRLEHAQARLTLSYNRRGDLAIHLVSPMGTRSTLLAARPHDYSADGFNDWAFMTTHSWDEDPSGEWVLEIENTSEANNYGTLTKFTLVLYGTASGSLVPRGSHHHHHH
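
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below only performs that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, and the helper name `normalized_lz` is an assumption made for illustration.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Divide an LZ-complexity value by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Values taken from the 8DDW row of the table: LZ(w) = 486, n = 1371.
ratio = normalized_lz(486, 1371)
print(f"{ratio:.4f}")
```

For the 8DDW row this gives a value between 0.85 and 0.86, in line with the 0.85 shown in the table.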

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
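
As a small illustration of these definitions, the sketch below collects the distinct subwords of a given length and counts them. It assumes that "subword" means a contiguous substring of that fixed length; the names `subwords` and `subword_count` and the toy word are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous substrings of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy word, not one of the sequences on this page.
toy = "ABABC"
print(sorted(subwords(toy, 2)))  # ['AB', 'BA', 'BC']
print(subword_count(toy, 2))     # 3
```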

\(|P_{f(8DDW_1)}(2) \setminus P_{f(3PSE_1)}(2)|=230\), \(|P_{f(3PSE_1)}(2) \setminus P_{f(8DDW_1)}(2)|=14\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of the two subword sets. For this pair, \(Z_2=230+14=244\).

Hydrophobic-polar version of Sequence 1:
110010011010010000000100000000100101100100100001000001111000110001100110000010111000010000010111001110101011000000001000010000101000001010011101010111000011010101000101110110001010110111010111001010101001110111011100111110111001110011011000100001010011111111100000111001101000100110010110010001111001001001101010001000101001000110111111111011101101110010001111111001010100111110000001111000100011101000100000010011111100100001101101100100010111101110110101100101111100101100011101001111010011101111001011011100110100110100100100000110001001100100101110001011011111001111100000000010010001111001011011110001110010000000000101010010100111110011111111000011111100100111011110010011100100001100100010000001101110110000000001110110001001001001011111000011100000111001111010100001101111111110110101000001101001001010000100100100000000101011110001000000000010000011111001001001111011100110110111100111101001100001111001101110010011100110110010111000101001111111011111010001100010110010110101011011110001110111110111011011111111110111100111110001010110011011011101011100101101000000010010111000111111111100111101111011111100011010010001101000011101000111111111100101110010001000000000000110111000010010010000100010000001000000010100001001010100100000010101001010110100111011011001011001000010000000000110110000100001001010001011100010100101110100001001010001110010011000010100100
Pair \(Z_2\) Length of longest common subsequence
8DDW_1,3PSE_1 244 4
8DDW_1,6HZB_1 152 4
3PSE_1,6HZB_1 186 4
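
The \(Z_2\) column can be reproduced in principle by comparing the sets of length-2 subwords of two sequences. The following is a minimal sketch on toy words, assuming subwords are contiguous substrings; the function names `subwords` and `z_distance` are illustrative and not part of the page.

```python
def subwords(w, k):
    """Set of distinct contiguous substrings of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): subwords of x missing from y, plus subwords of y missing from x."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy words, not sequences from this page.
x, y = "ABABC", "BABCA"
print(z_distance(x, y, 2))  # 1, since only "CA" is not shared
```

Because the two set differences are disjoint, the same value is the size of the symmetric difference `subwords(x, k) ^ subwords(y, k)`.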

Newick tree

 
(3PSE_1:11.31,(8DDW_1:76,6HZB_1:76):41.31);

Let \(d\) and \(d_1\) be the Otu–Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1542 }{\log_{20} 1542}-\frac{171}{\log_{20}171})\approx 360.\)
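
The arithmetic in the expected-distance estimate can be checked directly. The sketch below simply evaluates the expression as written, taking the factors 0.85 and 0.8 and the lengths 1542 and 171 from the text as given; the helper name `length_scale` is an assumption.

```python
import math

def length_scale(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Evaluate (0.85)(0.8)(1542/log_20 1542 - 171/log_20 171).
estimate = 0.85 * 0.8 * (length_scale(1542) - length_scale(171))
print(round(estimate))  # 360
```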
Status Protein1 Protein2 d d1/2
Query variables 8DDW_1 3PSE_1 459 257
Was not able to put for d
Was not able to put for d1

In notation analogous to Kjos-Hanssen, Niraula and Yoon (2022, Theorem 16),
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d, & \alpha=0,\\ d_1/2, & \alpha=1/2. \end{cases} \]
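
The interpolation can be sketched as a one-line function. This assumes that \(\min\) and \(\max\) refer to the smaller and larger of two directional quantities, so that \(d\) is their maximum and \(d_1\) their sum. The inputs 55 and 459 below are not stated on the page; they are inferred so that the outputs reproduce the table row \(d=459\), \(d_1/2=257\).

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical directional values, inferred from d = 459 and d1/2 = 257.
a, b = 55, 459
print(delta(a, b, 0))    # 459, the max, playing the role of d
print(delta(a, b, 0.5))  # 257.0, the average, playing the role of d1/2
```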