CoV2D Browser™

Parikh vectors
4FLN_1 1QWC_1 5MOX_1 Letter Amino acid
48 36 18 L Leucine
4 11 5 M Methionine
20 19 12 F Phenylalanine
27 23 8 P Proline
38 26 22 S Serine
34 20 18 A Alanine
37 25 23 E Glutamic acid
14 14 2 H Histidine
23 25 11 T Threonine
35 23 16 I Isoleucine
19 16 6 Y Tyrosine
37 26 6 D Aspartic acid
10 10 2 C Cysteine
42 25 18 G Glycine
51 24 20 V Valine
26 22 8 R Arginine
20 18 14 N Asparagine
21 18 10 Q Glutamine
30 26 21 K Lysine
3 13 7 W Tryptophan
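A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count, not the code used by the site; the names `ALPHABET` and `parikh_vector` are illustrative, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "LMFPSAEHTIYDCGVRNQKW"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in ALPHABET}


# First ten residues of the 5MOX_1 sequence, used as a small example.
example = "MGSITENTSW"
print(parikh_vector(example))
```

Applied to a full FASTA sequence, the same function would produce one column of the table.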

>4FLN_1|Chains A, B, C|Protease Do-like 2, chloroplastic|Arabidopsis thaliana (3702)
>1QWC_1|Chain A|Nitric-oxide synthase, brain|Rattus norvegicus (10116)
>5MOX_1|Chains A, B|Beta-lactamase OXA-10|Pseudomonas aeruginosa (287)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4FLN , Knot 219 539 0.85 40 255 507
NAESSNPPQKMAFKAFGSPKKEKKESLSDFSRDQQTDPAKIHDASFLNAVVKVYCTHTAPDYSLPWQKQRQFTSTGSAFMIGDGKLLTNAHCVEHDTQVKVKRRGDDRKYVAKVLVRGVDCDIALLSVESEDFWKGAEPLRLGHLPRLQDSVTVVGYPLGGDTISVTKGVVSRIEVTSYAHGSSDLLGIQIDAAINPGNSGGPAFNDQGECIGVAFQVYRSEETENIGYVIPTTVVSHFLTDYERNGKYTGYPCLGVLLQKLENPALRECLKVPTNEGVLVRRVEPTSDASKVLKEGDVIVSFDDLHVGCEGTVPFRSSERIAFRYLISQKFAGDIAEIGIIRAGEHKKVQVVLRPRVHLVPYHIDGGQPSYIIVAGLVFTPLSEPLIEEECEDTIGLKLLTKARYSVARFRGEQIVILSQVLANEVNIGYEDMNNQQVLKFNGIPIRNIHHLAHLIDMCKDKYLVFEFEDNYVAVLEREASNSASLCILKDYGIPSERSADLLEPYVDPIDDTQALDQGIGDSPVSNLEIGFDGLVWA
1QWC , Knot 185 420 0.88 40 263 409
GPRFLKVKNWETDVVLTDTLHLKSTLETGCTEHICMGSIMLPSQHTRKPEDVRTKDQLFPLAKEFLDQYYSSIKRFGSKAHMDRLEEVNKEIESTSTYQLKDTELIYGAKHAWRNASRCVGRIQWSKLQVFDARDCTTAHGMFNYICNHVKYATNKGNLRSAITIFPQRTDGKHDFRVWNSQLIRYAGYKQPDGSTLGDPANVQFTEICIQQGWKAPRGRFDVLPLLLQANGNDPELFQIPPELVLEVPIRHPKFDWFKDLGLKWYGLPAVSNMLLEIGGLEFSACPFSGWYMGTEIGVRDYCDNSRYNILEEVAKKMDLDMRKTSSLWKDQALVEINIAVLYSFQSDKVTIVDHHSATESFIKHMENEYRCRGGCPADWVWIVPPMSGSITPVFHQEMLNYRLTPSFEYQPDPWNTHVW
5MOX , Knot 115 247 0.85 40 168 239
MGSITENTSWNKEFSAEAVNGVFVLCKSSSKSCATNDLARASKEYLPASTFKIPNAIIGLETGVIKNEHQVFKWDGKPRAMKQWERDLTLRGAIQVSAVPVFQQIAREVGEVRMQKYLKKFSYGNQNISGGIDKFWLEGQLRISAVNQVEFLESLYLNKLSASKENQLIVKEALVTEAAPEYLVHSKTGFSGVGTESNPGVAWWVGWVEKETEVYFFAFNMDIDNESKLPLRKSIPTKIMESEGIIG
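The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below assumes that \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing; the page does not state which variant it uses, so `lz76_complexity` is an assumption, and both function names are illustrative.

```python
import math


def lz76_complexity(w: str) -> int:
    """Count phrases in the Lempel-Ziv (1976) exhaustive-history parsing of w."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        # The copy may overlap the phrase itself, except for its last symbol.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_complexity(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_k n) for an alphabet of k symbols."""
    return lz / (n / math.log(n, alphabet_size))


# Classic example: parsed as 0|001|10|100|1000|101, i.e. 6 phrases.
print(lz76_complexity("0001101001000101"))
# Values from the 4FLN row of the table (LZ = 219, n = 539): about 0.853.
print(normalized_complexity(219, 539))
```

With the table's values for 1QWC and 5MOX the ratio comes out near 0.888 and 0.856, so the listed 0.88 and 0.85 look truncated rather than rounded.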

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
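As a small illustration of these definitions, the sketch below collects the distinct contiguous subwords of a given length and counts them. The function names are hypothetical, and the example uses a toy word rather than one of the protein sequences.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


print(subwords("abab", 2))            # the two subwords "ab" and "ba"
print(subword_complexity("abab", 3))  # "aba" and "bab"
```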

\(|P_{f(4FLN_1)}(2) \setminus P_{f(1QWC_1)}(2)|=71\), \(|P_{f(1QWC_1)}(2) \setminus P_{f(4FLN_1)}(2)|=79\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
01000011001110111010000000100100000001101001011011101000001100011100000100010111110101100100100000101000100000110111011000111101000011011011011011010001011101111001010011100101000101000111101011101100111110001001111101000000001101110011001100000010001010111110010011100010110001111001010001001100101110100101100101110000011100110001110110111101100001011101010111001011010011111111011001110000000111011001000110101001111001110010110001000011010111100100110110100000111010000111100010001010110001110000101101010110000110011100110010111011111
Pair \(Z_2\) Length of longest common subsequence
4FLN_1,1QWC_1 150 4
4FLN_1,5MOX_1 177 3
1QWC_1,5MOX_1 195 4
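The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair, \(71+79=150\). A minimal sketch of that computation, with the hypothetical name `z_distance` and a toy input:

```python
def z_distance(x: str, y: str, k: int) -> int:
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px = {x[i:i + k] for i in range(len(x) - k + 1)}
    py = {y[i:i + k] for i in range(len(y) - k + 1)}
    return len(px - py) + len(py - px)


# "abab" has subwords {ab, ba}; "abba" has {ab, bb, ba}; they differ in "bb".
print(z_distance("abab", "abba", 2))  # 1
```

Dividing by the size of the union of the two sets would give the Jaccard distance itself.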

Newick tree

 
(5MOX_1:98.40,(4FLN_1:75,1QWC_1:75):23.40);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{959}{\log_{20} 959}-\frac{420}{\log_{20} 420}\right)\approx 142\), where \(959=539+420\) is the combined length of the two sequences.
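The arithmetic of this estimate can be checked directly. The sketch below only evaluates the displayed expression; the factors 0.85 and 0.8 are taken from the page as given, and the helper name `capacity` is illustrative.

```python
import math


def capacity(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_k(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


n_4fln, n_1qwc = 539, 420
estimate = 0.85 * 0.8 * (capacity(n_4fln + n_1qwc) - capacity(n_1qwc))
print(estimate)  # a little under 143
```

The result is close to 142.9, so the page's figure of 142 appears to be truncated.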
Status Protein1 Protein2 d d1/2
Query variables 4FLN_1 1QWC_1 179 162

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
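The interpolation can be made concrete under one assumption that the page does not spell out: that min and max range over the two directed quantities \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is the LZ-complexity and \(xy\) is concatenation, as in Otu and Sayood's definitions of \(d\) (maximum) and \(d_1\) (sum). The sketch takes the four complexities as inputs; the function name and the numbers in the example are made up for illustration.

```python
def delta(alpha: float, c_x: int, c_y: int, c_xy: int, c_yx: int) -> float:
    """Interpolate between the min and max of the two directed complexity gains.

    Assumption: the gains are c(xy) - c(x) and c(yx) - c(y).
    alpha = 0 gives d (the max); alpha = 1/2 gives d1 / 2 (half the sum).
    """
    gain_xy = c_xy - c_x
    gain_yx = c_yx - c_y
    low, high = min(gain_xy, gain_yx), max(gain_xy, gain_yx)
    return alpha * low + (1 - alpha) * high


# Toy complexities: gains are 1 and 2, so d = 2 and d1 / 2 = 1.5.
print(delta(0, 3, 2, 4, 4))    # 2
print(delta(0.5, 3, 2, 4, 4))  # 1.5
```

Since the average of two numbers never exceeds their maximum, \(d_1/2 \le d\) under this reading, which agrees with the query row above (162 and 179).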