CoV2D Browser™

Parikh vectors
7RDM_1 4ONU_1 7EMW_1 Letter Amino acid
33 9 26 S Serine
17 39 9 V Valine
7 30 2 R Arginine
5 1 0 C Cysteine
12 21 1 E Glutamic acid
12 4 3 K Lysine
12 23 7 P Proline
18 19 12 T Threonine
2 4 3 W Tryptophan
11 37 21 A Alanine
8 23 16 D Aspartic acid
13 11 8 Q Glutamine
3 4 5 H Histidine
19 32 17 L Leucine
8 6 8 Y Tyrosine
8 6 6 N Asparagine
15 26 25 G Glycine
5 19 7 I Isoleucine
0 10 2 M Methionine
8 16 7 F Phenylalanine
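A Parikh vector counts how often each letter occurs in a word. Here is a minimal Python sketch using `collections.Counter`. The letter order in `ALPHABET` follows the table rows above, and the function name `parikh_vector` is illustrative.

```python
from collections import Counter

# One-letter amino-acid codes, in the same row order as the table above.
ALPHABET = "SVRCEKPTWADQHLYNGIMF"

def parikh_vector(w):
    """Return the number of occurrences in w of each letter of ALPHABET."""
    counts = Counter(w)
    return [counts[a] for a in ALPHABET]  # Counter returns 0 for absent letters

# Usage on a short fragment of the 7EMW_1 sequence
print(parikh_vector("GSMSIS"))
```

Applied to a full sequence, the entries sum to its length (216, 340 and 185 for the three columns).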

>7RDM_1|Chain A|PCDN-38B Fab light chain|Homo sapiens (9606)
>4ONU_1|Chain A|Acetyltransferase Pat|Mycobacterium smegmatis (246196)
>7EMW_1|Chain A|Heme acquisition protein HasAp|Pseudomonas aeruginosa str. PAO1 (208964)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7RDM , Knot 98 216 0.81 38 146 203
VHEIVLTQSPGTLSLSPGETASLSCRASQSVSDKNLAWYQQRPGLPPRLLIYGVSLKNTGVPDRFSGSGSGTNFTLTITSLESEDSAVYFCQQYGSSPTFGQGTKVEIKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC
4ONU , Knot 141 340 0.80 40 181 316
GAMDPGNVAELTEVRAADLAALEFFTGCRPSALEPLATQLRPLKAEPGQVLIRQGDPALTFMLIESGRVQVSHAVADGPPIVLDIEPGLIIGEIALLRDAPRTATVVAAEPVIGWVGDRDAFDTILHLPGMFDRLVRIARQRLAAFITPIPVQVRTGEWFYLRPVLPGDVERTLNGPVEFSSETLYRRFQSVRKPTRALLEYLFEVDYADHFVWVMTEGALGPVIADARFVREGHNATMAAVAFTVGDDYQGRGIGSFLMGALIVSANYVGVQRFNARVLTDNMAMRKIMDRLGAVWVREDLGVVMTEVDVPPVDTVPFEPELIDQIRDATRKVIRAVSQ
7EMW , Knot 84 185 0.79 38 122 175
GSMSISISYSTTYSGWTVADYLADWSAYFGDVNHRPGQGVDGSNTGGFNPGPFDGSQYALQSTASDAAFIAGGDLHYTLFSNPSHTLWGKLDSIALGDTLTGGASSGGYALDSQEVSFSNLGLDSPIAQGRDGTVHKVVYGLMSGDSSALQGQIDALLKAVDPSLSINSTFDQLAAAGVAHATPA
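The normalized column can be recomputed from the LZ(w) and n columns. The divisor n/log_20 n is the usual scale, because the LZ complexity of a typical random sequence over a 20-letter alphabet grows like n/log_20 n. The listed values match truncation to two decimal places, not rounding. For 4ONU the ratio is about 0.807, and the table lists 0.80. The helper names in this Python sketch are illustrative.

```python
import math

def lz_ratio(lz, n):
    """LZ(w) / (n / log_20 n)."""
    return lz / (n / math.log(n, 20))

def truncate2(x):
    """Truncate (not round) x to two decimal places."""
    return math.floor(x * 100) / 100

# (protein code, LZ(w), n) copied from the table above
ROWS = [("7RDM", 98, 216), ("4ONU", 141, 340), ("7EMW", 84, 185)]

for code, lz, n in ROWS:
    r = lz_ratio(lz, n)
    print(code, f"{r:.4f}", truncate2(r))
```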

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence of the entry with 4-character Protein Data Bank code \(c\).
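These definitions translate directly into code. The Python sketch below does not implement \(f(c)\), which would mean fetching a sequence from the PDB. Sequences are passed in as plain strings instead, and the names `subwords` and `p` are illustrative.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w, n):
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

# Usage on a short fragment of the 7RDM_1 sequence
w = "VHEIVLTQSP"
print([p(w, k) for k in (1, 2, 3)])
```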

\(|P_{f(7RDM_1)}(2) \setminus P_{f(4ONU_1)}(2)|=66\) and \(|P_{f(4ONU_1)}(2) \setminus P_{f(7RDM_1)}(2)|=101\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(7RDM_1),f(4ONU_1))=66+101=167\).

Hydrophobic-polar version of Sequence 1 (7RDM_1; 1 = hydrophobic, 0 = polar):
100111000110101011001010001000100001110000111110111011010001110010101010010101001000001101000010010110100101000111101111110000100101011011001010010101010011001000001000000000001000101001000000101001000110011000100100
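The hydrophobic-polar string above is consistent with a simple rule. A, F, G, I, L, P, V and W map to 1, and all other residues map to 0. Methionine does not occur in 7RDM_1, so placing M in the hydrophobic set is an assumption here, following a common nonpolar grouping. The names `HYDROPHOBIC` and `hp_encode` in this Python sketch are illustrative.

```python
# Residues mapped to 1 (hydrophobic). A, F, G, I, L, P, V, W are inferred from
# the 7RDM_1 string above; M is an assumption, since 7RDM_1 has no methionine.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(w):
    """Map each residue of w to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

print(hp_encode("VHEIVLTQSP"))  # 1001110001
```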
Pair \(Z_2\) Length of longest common substring
7RDM_1,4ONU_1 167 3
7RDM_1,7EMW_1 144 4
4ONU_1,7EMW_1 161 4
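Here is a Python sketch of both columns. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is computed as the length of the longest common contiguous subword (substring), using standard dynamic programming. For sequences of these lengths, values as small as 3 or 4 point to substrings rather than general subsequences. The function names are illustrative.

```python
def subwords(w, k):
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("ABAB", "ABBA", 2), longest_common_substring("ABAB", "BABA"))
```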

Newick tree

(4ONU_1:85.09,(7RDM_1:72,7EMW_1:72):13.09);

Let \(d\) and \(d_1\) denote the Otu–Sayood distances \(d\) and \(d_1\). (The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{556}{\log_{20} 556}-\frac{216}{\log_{20}216}\right)\approx 97.3\), where \(556=216+340\) is the combined length of 7RDM_1 and 4ONU_1.
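The arithmetic in this estimate can be checked directly. This Python sketch uses the factors 0.85 and 0.8 exactly as given and makes no claim about where they come from. The helper name `n_over_log` is illustrative.

```python
import math

def n_over_log(n):
    """n / log_20 n."""
    return n / math.log(n, 20)

# 556 = 216 + 340, the combined length of 7RDM_1 and 4ONU_1.
estimate = 0.85 * 0.8 * (n_over_log(556) - n_over_log(216))
print(round(estimate, 1))  # 97.3
```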
Status Protein1 Protein2 d d1/2
Query variables 7RDM_1 4ONU_1 124 101.5
Was not able to put for d
Was not able to put for d1

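In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(\min\) and \(\max\) taken over the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]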
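Here is a minimal Python sketch of \(d\), \(d_1\) and \(\delta\). It assumes that LZ is the Lempel–Ziv (1976) complexity, computed by exhaustive-history parsing. This page does not confirm that the site uses exactly this variant, so the sketch is not guaranteed to reproduce the values 124 and 101.5. The names `lz76`, `increments`, `delta`, `d` and `d1` are illustrative.

```python
def lz76(s):
    """Lempel-Ziv (1976) complexity: number of components in the exhaustive history of s.

    Each new component is the shortest word starting at position i that cannot be
    copied from earlier in s (the copy may overlap the component, minus its last symbol).
    """
    n, i, c = len(s), 0, 0
    while i < n:
        l = 1
        # Extend the component while s[i:i+l] already occurs in s[:i+l-1].
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def increments(x, y):
    """The two increments LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x, y, alpha):
    """delta = alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(x, y):
    """Otu-Sayood d: the larger increment (delta at alpha = 0)."""
    return max(increments(x, y))

def d1(x, y):
    """Otu-Sayood d1: the sum of both increments (d1/2 is delta at alpha = 1/2)."""
    return sum(increments(x, y))

print(lz76("0001101001000101"), increments("AAAA", "B"), d("AAAA", "B"), d1("AAAA", "B"))
```

The classic example 0001101001000101 parses as 0 · 001 · 10 · 100 · 1000 · 101, which gives complexity 6.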