CoV2D Browser™

Parikh vectors
7NDU_1 4IQW_1 2RTP_1 Letter Amino acid
20 24 4 R Arginine
15 16 4 Q Glutamine
20 22 16 G Glycine
5 15 3 I Isoleucine
9 28 6 K Lysine
15 14 3 P Proline
12 10 6 Y Tyrosine
19 26 19 A Alanine
5 10 7 N Asparagine
5 7 0 C Cysteine
26 32 5 E Glutamic acid
21 38 7 L Leucine
4 11 0 M Methionine
8 22 2 F Phenylalanine
17 24 6 D Aspartic acid
14 9 2 H Histidine
16 26 12 S Serine
22 20 19 T Threonine
9 5 6 W Tryptophan
15 22 8 V Valine
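
The Parikh vector of a sequence records how often each letter occurs. A minimal sketch of how such a table could be computed is given here; the function name `parikh_vector` and the alphabetical ordering of the 20 amino-acid letters are assumptions for illustration, not the browser's own code.

```python
from collections import Counter

# Assumption: the 20 standard amino-acid letters, listed alphabetically.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return {letter: count} over `alphabet`, keeping zero counts."""
    counts = Counter(seq)
    return {a: counts.get(a, 0) for a in alphabet}


if __name__ == "__main__":
    prefix = "MGSHSLKYFH"  # first ten residues of the 7NDU_1 sequence
    v = parikh_vector(prefix)
    # Show only the letters that actually occur in the prefix.
    print({a: k for a, k in v.items() if k})
```

Keeping the zero entries makes vectors of different proteins directly comparable column by column, as in the table above.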

>7NDU_1|Chain A[auth AAA]|HLA class I histocompatibility antigen, alpha chain E|Homo sapiens (9606)
>4IQW_1|Chain A|DNA nucleotidylexotransferase|Mus musculus (10090)
>2RTP_1|Chains A[auth B], B[auth D]|STREPTAVIDIN|Streptomyces avidinii (1895)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7NDU , Knot 121 277 0.82 40 181 264
MGSHSLKYFHTSVSRPGRGEPRFISVGYVDDTQFVRFDNDAASPRMVPRAPWMEQEGSEYWDRETRSARDTAQIFRVNLRTLRGYYNQSEAGSHTLQWMHGCELGPDGRFLRGYEQCAYDGKDYLTLNEDLRSWTAVDTAAQISEQKSNDASEAEHQRAYLEDTCVEWLHKYLEKGKETLLHLEPPKTHVTHHPISDHEATLRCWALGFYPAEITLTWQQDGEGHTQDTELVETRPAGDGTFQKWAAVVVPSGEEQRYTCHVQHEGLPEPVTLRWKP
4IQW , Knot 162 381 0.84 40 224 363
HMSPSPVPGSQNVPAPAVKKISQYACQRRTTLNNYNQLFTDALDILAENDELRENEGSCLAFMRASSVLKSLPFPITSMKDTEGIPCLGDKVKSIIEGIIEDGESSEAKAVLNDERYKSFKLFTSVFGVGLKTAEKWFRMGFRTLSKIQSDKSLRFTQMQKAGFLYYEDLVSCVNRPEAEAVSMLVKEAVVTFLPDALVTMTGGFRRGKMTGHDVDFLITSPEATEDEEQQLLHKVTDFWKQQGLLLYCDILESTFEKFKQPSRKVDAADHFQKCFLILKLDHGRVHSEKSGQQEGKGWKAIRVDLVMCPYDRRAFALLGWTGSRQFERDLRRYATHERKMMLDNHALYDRTKRVFLEAESEEEIFAHLGLDYIEPWERNA
2RTP , Knot 64 135 0.77 36 94 128
DPSKDSKAQVSAAEAGITGTWYNQLGSTFIVTAGADGALTGTYESAVGNAESRYVLTGRYDSAPATDGSGTALGWTVAWKNNYRNAHSATTWSGQYVGGAEARINTQWLLTSGTTEANAWKSTLVGHDTFTKVKP
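
The columns \(\mathrm{LZ}(w)\) and \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be illustrated with a short sketch. It assumes the LZ76 (exhaustive history) parsing, in which each new phrase is the shortest prefix of the remaining text that cannot be copied from an earlier starting position; whether the browser uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import math


def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from a start position < i
        # (the copy may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Small binary example; its parsing is 0|001|10|100|1000|101.
    print(lz76_complexity("0001101001000101"))
    # Normalization applied to the 7NDU row of the table (LZ = 121, n = 277).
    print(normalized_lz(121, 277))
```

The substring search makes this quadratic or worse in the sequence length, which is acceptable for proteins of a few hundred residues.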

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
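
As a small illustration of these definitions, the sketch below computes \(P_w(k)\) and \(p_w(k)\) for a toy word. The names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords (intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    w = "ABAB"
    for k in (1, 2, 3):
        print(k, sorted(subwords(w, k)), subword_complexity(w, k))
```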

\(|P_{f(7NDU_1)}(2) \setminus P_{f(4IQW_1)}(2)|=66\), \(|P_{f(4IQW_1)}(2) \setminus P_{f(7NDU_1)}(2)|=109\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1100010010001001101010110110100001101000110101110111100010001000000100010110101001010000001100010110100111010110100001001000101000100101100110100000001001000010100001011000100100011010110001000110000101001111101101010100010100000011000111010100111111101000000001000111011010101

Pair \(Z_2\) Length of longest common subword (contiguous substring)
7NDU_1,4IQW_1 175 3
7NDU_1,2RTP_1 173 3
4IQW_1,2RTP_1 188 4
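
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. A minimal sketch follows, with hypothetical function names; the Jaccard distance shown divides \(Z_k\) by the size of the union, which is the standard definition rather than something stated on this page.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by |P_x(k) ∪ P_y(k)| (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy words: P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}.
    print(z_distance("ABAB", "ABBA", 2))
    print(jaccard_distance("ABAB", "ABBA", 2))
```

For the first pair in the table, the two one-sided counts quoted above give \(66+109=175\).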

Newick tree

 
(4IQW_1:92.19,(7NDU_1:86.5,2RTP_1:86.5):5.69);

Let \(d\) denote the Otu–Sayood distance d.
Let \(d_1\) denote the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{658}{\log_{20} 658}-\frac{277}{\log_{20}277})\approx 106\), where \(658=277+381\) is the combined length of the two sequences.
Status Protein1 Protein2 d d1/2
Query variables 7NDU_1 4IQW_1 139 118.5
Was not able to put for d
Was not able to put for d1
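
The rough estimate of the expected distance above can be checked numerically. In this sketch the factors 0.85 and 0.8 are taken from the text as given, and the helper name `lz_scale` is hypothetical.

```python
import math


def lz_scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing scale for LZ complexity."""
    return n / math.log(n, alphabet_size)


n1, n2 = 277, 381  # lengths of the 7NDU_1 and 4IQW_1 sequences
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))

if __name__ == "__main__":
    print(round(estimate))
```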

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
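
The interpolation \(\delta\) can be sketched as follows, assuming the usual Otu–Sayood definitions \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) taken to be an LZ76 phrase count. Both the choice of \(c\) and the function names are assumptions for illustration; the values produced need not match the browser's.

```python
def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Extend while the phrase can be copied from a start position < i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def increments(x, y, c=lz76_complexity):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return c(x + y) - c(x), c(y + x) - c(y)


def delta(x, y, alpha, c=lz76_complexity):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def otu_sayood(x, y, c=lz76_complexity):
    """Return (d, d1) under the assumed definitions."""
    a, b = increments(x, y, c)
    return max(a, b), a + b


if __name__ == "__main__":
    x, y = "MGSHSLKYFH", "DPSKDSKAQV"  # prefixes of two sequences on this page
    d, d1 = otu_sayood(x, y)
    # delta at alpha = 0 is d; delta at alpha = 1/2 is d1 / 2.
    print(d, delta(x, y, 0), d1 / 2, delta(x, y, 0.5))
```

Under these assumptions the tabulated values \(d=139\) and \(d_1/2=118.5\) would correspond to increments 139 and 98, since \((139+98)/2=118.5\).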