CoV2D Browser™

Parikh vectors
4NHG_1 5YQL_1 4ZAO_1 Letter Amino acid
13 17 19 D Aspartic acid
23 22 22 G Glycine
4 3 7 W Tryptophan
26 13 18 V Valine
9 15 7 R Arginine
7 3 9 N Asparagine
4 11 12 Q Glutamine
4 7 12 H Histidine
17 16 17 P Proline
8 11 8 Y Tyrosine
5 14 8 I Isoleucine
18 39 26 L Leucine
3 8 2 M Methionine
7 17 11 F Phenylalanine
19 11 13 T Threonine
15 20 13 A Alanine
7 11 1 C Cysteine
8 24 13 E Glutamic acid
13 19 23 K Lysine
33 25 19 S Serine
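
As a minimal sketch of how a Parikh vector like the columns above can be computed, the following Python counts each of the 20 standard amino-acid letters in a sequence. The name `parikh_vector` and the alphabetical letter order are illustrative assumptions, not part of the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid one-letter codes (alphabetical order is an arbitrary choice here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(seq):
    """Return a dict mapping each amino-acid letter to its number of occurrences in seq."""
    counts = Counter(seq)
    # Letters absent from seq get 0; non-standard letters in seq are ignored.
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # First nine residues of the 5YQL_1 sequence, used only as a small example.
    print(parikh_vector("GASGSERLL"))
```

For a sequence made only of standard letters, the entries sum to the sequence length; the 4NHG_1 column above sums to 243, the length listed for that sequence.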

>4NHG_1|Chains A, D, E, H, M[auth I], P[auth M]|2G12 IgG dimer heavy chain|Homo sapiens (9606)
>5YQL_1|Chain A|NAD-dependent protein deacetylase sirtuin-2|Homo sapiens (9606)
>4ZAO_1|Chain A|Carbonic anhydrase 2|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4NHG , Knot 107 243 0.80 40 152 229
EVQLVESGGGLVKAGGSLILSCGVSNFRISAHTMNWVRRVPGGGLEWVASISTSSTYRDYADAVKGRFTVSRDDLEDFVYLQMHKMRVEDTAIYYCARKGSDRLSDNDPFDAWGPGTVVTVSPASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCDKTSTCPPCPAPELLGG
5YQL , Knot 135 306 0.84 40 192 290
GASGSERLLDELTLEGVARYMQSERCRRVICLVGAGISTSAGIPDFRSPSTGLYDNLEKYHLPYPEAIFEISYFKKHPEPFFALAKELYPGQFKPTICHYFMRLLKDKGLLLRCYTQNIDTLERIAGLEQEDLVEAHGTFYTSHCVSASCRHEYPLSWMKEKIFSEVTPKCEDCQSLVKPDIVFFGESLPARFFSCMQSDFLKVDLLLVMGTSLQVQPFASLISKAPLSTPRLLINKEKAGQSDPFLGMIMGLGGGMDFDSKKAYRDVAWLGECDQGCLALAELLGWKKELEDLVRREHASIDAQS
4ZAO , Knot 112 260 0.79 40 171 249
MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHSFQVTFDDSQDKAVLKGGPLDGTYRLLQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDVGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTEGKSADFTNFDPRGLLPESLDYWTYPGSLTTPPLAECVTWIVLKEPISVSSEQVLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFK
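
The LZ-complexity and normalized columns can be illustrated with a short sketch. It assumes that \(\mathrm{LZ}(w)\) counts the phrases of the Lempel-Ziv (1976) exhaustive parsing, which may differ in detail from the variant used to produce the table; `lz76` and `normalized_lz` are hypothetical names.

```python
import math


def lz76(s):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the current phrase while it can still be copied from a start
        # position before i (the copy may overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76("0001101001000101"))   # classic example parsed as 0|001|10|100|1000|101
    print(normalized_lz(107, 243))    # values taken from the 4NHG row
```

With the 4NHG row's values, `normalized_lz(107, 243)` evaluates to roughly 0.807, in line with the tabulated 0.80.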

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
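
A minimal Python sketch of these definitions, assuming subwords are contiguous substrings of length \(n\); the names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous substrings of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """p_w(n): the cardinality of P_w(n)."""
    return len(subwords(w, n))


if __name__ == "__main__":
    # "ABAB" has the length-2 subwords AB and BA.
    print(sorted(subwords("ABAB", 2)))
    print(subword_complexity("ABAB", 3))
```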

\(|P_{f(4NHG_1)}(2) \setminus P_{f(5YQL_1)}(2)|=64\), \(|P_{f(5YQL_1)}(2) \setminus P_{f(4NHG_1)}(2)|=104\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).
Hydrophobic-polar version of Sequence 1: 010110011111011101110011001010100101100111111011101000000000101101010100001001101010010100011000100100010000110111110110101100011011111100000011011110110001101101010011100110011111000110010011011000110000100100010000100010100000000011011101111
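
The hydrophobic-polar string can be sketched as a letter-by-letter recoding. The page does not state which residues count as hydrophobic; the set below is an assumption inferred by comparing the first 60 residues of Sequence 1 with the first 60 bits of the string above, and `hp_encode` is a hypothetical name.

```python
# Assumed hydrophobic residues (encoded as 1); inferred from the page's output,
# not taken from a documented classification.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(seq):
    """Map each residue to '1' if it is in HYDROPHOBIC and to '0' otherwise."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


if __name__ == "__main__":
    # First 20 residues of the 4NHG_1 sequence.
    print(hp_encode("EVQLVESGGGLVKAGGSLIL"))
```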
Pair \(Z_2\) Length of longest common subsequence
4NHG_1,5YQL_1 168 4
4NHG_1,4ZAO_1 165 4
5YQL_1,4ZAO_1 181 4
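
The \(Z_2\) column is the size of the symmetric difference of the two subword sets; for the first pair, \(64+104=168\) agrees with the table. A minimal sketch, with the hypothetical function names `subwords` and `z_distance`:

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous substrings of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    # The symmetric difference collects subwords found in exactly one of the two sets.
    return len(px ^ py)


if __name__ == "__main__":
    # P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}, so only BB is unshared.
    print(z_distance("ABAB", "ABBA", 2))
```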

Newick tree

 
[
	5YQL_1:88.85,
	[
		4NHG_1:82.5,4ZAO_1:82.5
	]:6.35
]

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{549}{\log_{20} 549}-\frac{243}{\log_{20}243})=87.1\)
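
The arithmetic in this estimate can be checked with a few lines of Python. Note that \(549 = 243 + 306\) is the combined length of the two query sequences; the helper name `lz_scale` is an assumption made for this sketch.

```python
import math


def lz_scale(n, alphabet_size=20):
    """n / log_20(n), the normalizing quantity used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)


# Factors 0.85 and 0.8 and the lengths 549 and 243 are copied from the formula above.
expected = 0.85 * 0.8 * (lz_scale(549) - lz_scale(243))

if __name__ == "__main__":
    print(round(expected, 2))
```

The expression evaluates to about 87.17, which the page reports as 87.1.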
Status Protein1 Protein2 d d1/2
Query variables 4NHG_1 5YQL_1 111 96.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
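
The interpolation can be sketched in Python under two assumptions that the page does not spell out: that min and max range over the two quantities \(c(xy)-c(x)\) and \(c(yx)-c(y)\) from Otu and Sayood's definitions (so that \(d\) is their maximum and \(d_1\) their sum), and that \(c\) is the LZ76 phrase count. The names `lz76` and `delta` are hypothetical.

```python
def lz76(s):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from a start position before i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided complexity increments."""
    a = lz76(x + y) - lz76(x)   # new phrases needed for y after seeing x
    b = lz76(y + x) - lz76(y)   # new phrases needed for x after seeing y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    x, y = "AAAAAA", "ACGT"
    print(delta(x, y, 0))     # corresponds to d under the stated assumptions
    print(delta(x, y, 0.5))   # corresponds to d_1 / 2 under the stated assumptions
```

Under these assumptions, \(\alpha=0\) selects the maximum and \(\alpha=1/2\) gives the average of the two increments, matching the two cases in the formula.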