CoV2D Browser™

Parikh vectors
9IVP_1 6YUN_1 7BIG_1 Letter Amino acid
14 4 0 Y Tyrosine
17 5 14 N Asparagine
24 8 21 D Aspartic acid
4 0 0 C Cysteine
8 7 7 F Phenylalanine
10 6 7 R Arginine
12 10 21 Q Glutamine
37 4 0 E Glutamic acid
47 5 28 L Leucine
3 2 7 W Tryptophan
26 12 14 K Lysine
8 8 7 P Proline
14 12 42 S Serine
10 11 35 T Threonine
7 2 1 M Methionine
18 3 21 V Valine
41 12 21 A Alanine
31 10 28 G Glycine
21 8 7 H Histidine
18 6 7 I Isoleucine
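The Parikh vector of a sequence is just its vector of letter counts, so the table above can be recomputed directly. The following is a minimal sketch, not the site's own code. The names `AMINO_ACIDS`, `parikh_vector`, and `REPEAT` are hypothetical, and it assumes that the 7BIG_1 sequence listed further down reads as `M` followed by seven copies of a 41-residue repeat.

```python
from collections import Counter

# Letters in the row order of the table above.
AMINO_ACIDS = "YNDCFRQELWKPSTMVAGHI"

def parikh_vector(sequence):
    """Return the count of each amino-acid letter, in table row order."""
    counts = Counter(sequence)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# Assumption: 7BIG_1 = "M" + 7 copies of this 41-residue unit.
REPEAT = "GQLLQTLTGHSSSVTGVAFSPDGQTIASASDDKTVKLWNRN"
seq_7big = "M" + REPEAT * 7

vector = parikh_vector(seq_7big)
for letter, count in vector.items():
    print(letter, count)
```

Under that assumption the counts agree with the 7BIG_1 column, for example 42 for S and 0 for Y.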

>9IVP_1|Chains A, AA[auth BA], C, CA[auth DA], E, EA[auth FA], G, GA[auth HA], I, IA[auth JA], K, KA[auth LA], M, MA[auth NA], O, OA[auth PA], Q, QA[auth RA], S, SA[auth TA], UA[auth VA], U[auth V], W[auth X], Y[auth Z]|DARPin,Ferritin heavy chain, N-terminally processed|synthetic construct (32630)
>6YUN_1|Chains A, B|Nucleoprotein|Severe acute respiratory syndrome coronavirus 2 (2697049)
>7BIG_1|Chains A, B|v13WRAP-T|synthetic construct (32630)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9IVP , Knot 137 370 0.73 40 183 305
MGHHHHHHGPGSAKEEILEEIKKAKQEIAGGGGGSELGKELLEAARAGQDDEVRILMARGAEVNAADNTGTTPLHLAAYSGHLEIVEVLLKYGAEVNAADVFGYTPLHLAAYWGHLEIVEVLLKNGADVNARDSDGMTPLHLAAKWGHLEIVEVLLRYGADVEAQDKFGKTPFDLAIDNGNEDIAEVLQALLAINRQINLELYASYVYLSMSYYFDRDDVALKNFAKYFLHQSHEEREHAEKLMKLQNQRGGAISLQDIKKPDCDDWESGLNAMECALHLEKNVNQSLLELHKLATDCNDPHLCDFIETHYLNEQVKAIKELGDHVTNLRKMGAPESGLAEYLFDKHTLGSGSGAEIEQAKKEIAYLIKK
6YUN , Knot 67 135 0.81 38 105 129
GSSHHHHHHSQDPNSSSTKKSAAEASKKPRQKRTATKAYNVTQAFGRRGPEQTQGNFGDQELIRQGTDYKHWPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFP
7BIG , Knot 27 288 0.17 34 37 42
MGQLLQTLTGHSSSVTGVAFSPDGQTIASASDDKTVKLWNRNGQLLQTLTGHSSSVTGVAFSPDGQTIASASDDKTVKLWNRNGQLLQTLTGHSSSVTGVAFSPDGQTIASASDDKTVKLWNRNGQLLQTLTGHSSSVTGVAFSPDGQTIASASDDKTVKLWNRNGQLLQTLTGHSSSVTGVAFSPDGQTIASASDDKTVKLWNRNGQLLQTLTGHSSSVTGVAFSPDGQTIASASDDKTVKLWNRNGQLLQTLTGHSSSVTGVAFSPDGQTIASASDDKTVKLWNRN

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence for the 4-symbol Protein Data Bank code \(c\).
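As an illustration of these definitions, here is a small sketch that collects the distinct length-\(n\) subwords of a word and counts them. The function names are hypothetical. The example word assumes that 7BIG_1 is `M` followed by seven copies of a 41-residue repeat, as the listed sequence suggests.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

# Assumption: 7BIG_1 = "M" + 7 copies of this 41-residue unit.
REPEAT = "GQLLQTLTGHSSSVTGVAFSPDGQTIASASDDKTVKLWNRN"
seq_7big = "M" + REPEAT * 7

p2 = subword_complexity(seq_7big, 2)
p3 = subword_complexity(seq_7big, 3)
print(p2, p3)
```

For this word the sketch gives 37 and 42, matching the \(p_w(2)\) and \(p_w(3)\) entries in the 7BIG row.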

\(|P_{f(9IVP_1)}(2) \setminus P_{f(6YUN_1)}(2)|=124\), \(|P_{f(6YUN_1)}(2) \setminus P_{f(9IVP_1)}(2)|=46\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (9IVP_1):
1100000011101000110010010001111111001100110110110000101111011010110001001101110010101101110011010110111001101110110101101110011010100001101101110110101101110011010100011001101110010001101101111100010101010010101000100001110011001100000000010011010000111101001001000010011011001101000100011010011000001010011000010001011001100100100111100111001100001101011010010001101100
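The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not state which residues count as hydrophobic. The set used below is an assumption inferred by aligning the binary string with the 9IVP_1 sequence, and `HYDROPHOBIC` and `hp_encode` are hypothetical names.

```python
# Assumption: residues encoded as 1, inferred by aligning the
# binary string with the 9IVP_1 sequence; all others are encoded as 0.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

# First 22 residues of 9IVP_1.
prefix = "MGHHHHHHGPGSAKEEILEEIK"
encoded = hp_encode(prefix)
print(encoded)
```

The result for this prefix coincides with the first 22 symbols of the binary string listed above.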
Pair \(Z_2\) Length of longest common substring
9IVP_1,6YUN_1 170 6
9IVP_1,7BIG_1 176 2
6YUN_1,7BIG_1 112 3
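The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column appears to be the length of the longest common contiguous substring: 9IVP_1 and 6YUN_1 share the His-tag `HHHHHH`, which has length 6. That reading is an assumption, as are the function names. The example uses only the first 12 residues of each sequence, so its \(Z_2\) value is not the one in the table.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# First 12 residues of 9IVP_1 and 6YUN_1 (prefixes only).
x = "MGHHHHHHGPGS"
y = "GSSHHHHHHSQD"
z2 = z_distance(x, y, 2)
lcs = longest_common_substring_length(x, y)
print(z2, lcs)
```

The dynamic program uses time proportional to the product of the two lengths, which is ample for sequences of a few hundred residues.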

Newick tree

 
(9IVP_1:94.51,(6YUN_1:56,7BIG_1:56):38.51);

Let \(d\) and \(d_1\) be the Otu--Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{505 }{\log_{20} 505}-\frac{135}{\log_{20}135})=109\), where \(505=370+135\) is the combined length of the two query sequences.
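The arithmetic behind this estimate, and behind the normalized column \(\mathrm{LZ}(w)/(n/\log_{20} n)\) in the earlier table, can be checked with a few lines. This is only a sketch. The factors 0.85 and 0.8 are taken from the formula as given, without interpretation, and `random_scale` and `normalized_lz` are hypothetical names.

```python
import math

def random_scale(n, alphabet_size=20):
    """n / log_20(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

def normalized_lz(lz, n):
    """LZ(w) divided by n / log_20(n)."""
    return lz / random_scale(n)

# Expected distance for the query pair, with 505 = 370 + 135.
expected = 0.85 * 0.8 * (random_scale(505) - random_scale(135))
print(round(expected))

# Normalized LZ-complexity of 9IVP (LZ = 137, n = 370).
print(normalized_lz(137, 370))
```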
Status Protein1 Protein2 d d1/2
Query variables 9IVP_1 6YUN_1 123 85
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
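To make the interpolation concrete, the sketch below takes min and max over the two complexity increments \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\), where \(c\) is an LZ-complexity, so that \(d\) is their maximum and \(d_1\) their sum. This reading follows the usual statement of the Otu--Sayood distances. The page does not say which LZ variant it uses, so the simple LZ76 parsing here and all function names are assumptions, and the values it produces need not match the table.

```python
def lz76_complexity(s):
    """Number of components in the LZ76 exhaustive history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it can be copied from earlier text
        # (the copy may overlap the component itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(s, q, c=lz76_complexity):
    """Return (c(sq) - c(s), c(qs) - c(q))."""
    return c(s + q) - c(s), c(q + s) - c(q)

def delta(s, q, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def otu_sayood(s, q):
    """Return (d, d1) = (max, sum) of the two increments."""
    a, b = increments(s, q)
    return max(a, b), a + b

print(lz76_complexity("0001101001000101"))
print(otu_sayood("ACGT", "ACGT"))
```

With the query values \(d=123\) and \(d_1/2=85\) reported above, the two increments would be 123 and 47, since \((123+47)/2=85\).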