CoV2D Browser™

Parikh vectors
5TDV_1 8OFR_1 6FAQ_1 Letter Amino acid
24 9 3 F Phenylalanine
23 8 2 P Proline
27 16 4 S Serine
4 2 0 C Cysteine
15 0 7 H Histidine
20 7 2 Q Glutamine
33 7 6 G Glycine
24 12 3 I Isoleucine
25 15 6 K Lysine
22 16 6 T Threonine
23 3 1 W Tryptophan
45 10 14 A Alanine
35 11 13 D Aspartic acid
22 8 9 V Valine
34 20 14 L Leucine
23 12 5 Y Tyrosine
26 4 9 R Arginine
38 5 16 E Glutamic acid
16 20 6 N Asparagine
21 1 1 M Methionine
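
A Parikh vector simply counts how often each letter occurs in a sequence. As a minimal sketch (the function name `parikh_vector` and the constant names are illustrative, not part of this page), the 6FAQ_1 column above can be reproduced from the 6FAQ_1 sequence listed further down:

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def parikh_vector(seq):
    """Return a dict mapping every amino-acid letter to its count (zeros included)."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# 6FAQ_1 sequence, copied from the sequence table on this page.
SEQ_6FAQ = "MSEAQPDARSDARDLTAFQKNILTVLGEEARYGLAIKRELEEYYGEEVNHGRLYPNLDDLVNKGLVEKSELDKRTNEYALTNEGFDAVVDDLEWTLSKFVADADRRERVETIVADDAAALEHHHHHH"

v = parikh_vector(SEQ_6FAQ)
print(v["H"], v["C"], v["W"])  # 7 0 1, matching the 6FAQ_1 column
print(sum(v.values()))         # 127, the sequence length
```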

>5TDV_1|Chains A, E[auth D]|Toluene-4-monooxygenase system protein A|Pseudomonas mendocina (300)
>8OFR_1|Chains A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X|Fiber|Human adenovirus 25 (46927)
>6FAQ_1|Chains A, B|DNA binding protein|Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1) (64091)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5TDV , Knot 210 500 0.87 40 264 484
MAMHPRKDWYELTRATNWTPSYVTEEQLFPERMSGHMGIPLEKWESYDEPYKTSYPEYVSIQREKDAGAYSVKAALERAKIYENSDPGWISTLKSHYGAIAVGEYAAVTGEGRMARFSKAPGNRNMATFGMMDELRHGQLQLFFPHEYCKKDRQFDWAWRAYHSNEWAAIAAKHFFDDIITGRDAISVAIMLTFSFETGFTNMQFLGLAADAAEAGDYTFANLISSIATDESRHAQQGGPALQLLIENGKREEAQKKVDMAIWRAWRLFAVLTGPVMDYYTPLEDRSQSFKEFMYEWIIGQFERSLIDLGLDKPWYWDLFLKDIDELHHSYHMGVWYWRTTAWWNPAAGVTPEERDWLEEKYPGWNKRWGRCWDVITENVLDDRMDLVSPETLPSVCNMSQIPLVGVPGDDWNIEVFSLEHNGRLYHFGSEVDRWVFQQDPVQYQNHMNIVDRFLAGQIQPMTLDGALKYMGFQSIEEMGKDAHDFAWADKCKPAMKKSA
8OFR , Knot 84 186 0.78 38 125 177
KLTLWTTLDPSPNCRIDVDKDSKLTLVLTKCGSQILANVSLLVVKGRFQNLNYKTNPNLPKTFTIKLLFDENGILKDSSNLDKNYWNYRNGNSILAEQYKNAVGFMPNLAAYPKSTTTQSKLYARNTIFGNIYLDSQAYNPVVIKITFNQEADSAYSITLNYSWGKDYENIPFDSTSFTFSYIAQE
6FAQ , Knot 61 127 0.77 38 86 117
MSEAQPDARSDARDLTAFQKNILTVLGEEARYGLAIKRELEEYYGEEVNHGRLYPNLDDLVNKGLVEKSELDKRTNEYALTNEGFDAVVDDLEWTLSKFVADADRRERVETIVADDAAALEHHHHHH
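
The page does not spell out how \(\mathrm{LZ}(w)\) is computed. A minimal sketch, assuming the phrase count of the Lempel-Ziv (1976) exhaustive parsing, together with the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the fourth column, could look as follows. The function names are hypothetical, and whether the page counts phrases in exactly this way is an assumption.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs starting at an earlier
        # position (overlap with the phrase itself is allowed, as in LZ76).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_A n) for alphabet size A (20 amino acids by default)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic example: 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_complexity("0001101001000101"))  # 6
# Ratio for the 5TDV row (LZ = 210, n = 500); the table lists 0.87.
print(normalized_lz(210, 500))
```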

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
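
As an illustration of these definitions, here is a small sketch that collects the distinct contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` and the toy word are illustrative only.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

toy = "MSEAQMSE"  # toy word, not a real protein
print(sorted(subwords(toy, 2)))    # ['AQ', 'EA', 'MS', 'QM', 'SE']
print(subword_complexity(toy, 2))  # 5
```

For a word of length \(n\) over 20 letters, the count for length \(k\) can never exceed \(\min(20^k, n-k+1)\).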

\(|P_{f(5TDV_1)}(2) \setminus P_{f(8OFR_1)}(2)|=172\), \(|P_{f(8OFR_1)}(2) \setminus P_{f(5TDV_1)}(2)|=33\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (5TDV_1):
11101000100100100101001000011100101011111001000001000001001010000011100101110010100000111100100001111110011101010110100111000110111100100101011110000000001011101000001111110011001101001101111101010011001011111101101100011011001100000010011111011100100001000101111011011111011110000110000001001100111101000110111001101011100100100000111101000111011111010000110000111000110010110001100010110100110100100111111110010101101000101001100100111000110000010110011110101101011100111001001100100111100001110001
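
The page does not state which residues are treated as hydrophobic. Comparing the start of the 5TDV_1 sequence with the binary string above suggests that A, F, G, I, L, M, P, V and W map to 1 and all other residues to 0. That set is inferred from the data shown here, not a documented definition, and the names below are illustrative.

```python
# Inferred from the binary string on this page; an assumption, not a stated rule.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 18 residues of 5TDV_1
print(hp_encode("MAMHPRKDWYELTRATNW"))  # 111010001001001001
```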
Pair \(Z_2\) Length of longest common subword (contiguous)
5TDV_1,8OFR_1 205 3
5TDV_1,6FAQ_1 214 4
8OFR_1,6FAQ_1 143 3
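
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. For the first pair this is \(172+33=205\), the value in the table. A minimal sketch with toy words (function names are illustrative):

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy words: P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}, so only BB differs.
print(z("ABAB", "ABBA", 2))  # 1
```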

Newick tree

 
(
	5TDV_1:11.72,
	(
		8OFR_1:71.5,6FAQ_1:71.5
	):42.22
);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{686}{\log_{20} 686}-\frac{186}{\log_{20}186})\approx 141.\)
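
The arithmetic behind this estimate can be checked directly. In the sketch below the helper name is illustrative; 686 equals 500 + 186, presumably the length of the two sequences concatenated, although the page does not say so explicitly.

```python
import math

def n_over_log(n, alphabet_size=20):
    """n / log_A(n), the rough LZ-complexity scale for a word of length n."""
    return n / math.log(n, alphabet_size)

estimate = 0.85 * 0.8 * (n_over_log(686) - n_over_log(186))
print(round(estimate))  # 141
```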
Status Protein1 Protein2 d d1/2
Query variables 5TDV_1 8OFR_1 186 125.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
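
To make the interpolation concrete, here is a small sketch. The two underlying quantities are not spelled out on this page, so `a` and `b` are placeholders. With the tabulated values \(d=186\) and \(d_1/2=125.5\), the larger quantity is 186 and the smaller one must be \(2(125.5)-186=65\); that 65 is back-solved from the table, not shown on the page.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

print(delta(0, 65, 186))    # 186, the value of d
print(delta(0.5, 65, 186))  # 125.5, the value of d1/2
```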