CoV2D BrowserTM

Parikh vectors
1ITQ_1 1BTK_1 5EKJ_1 Letter Amino acid
26 11 7 R Arginine
20 16 13 E Glutamic acid
39 13 26 L Leucine
10 3 12 H Histidine
8 2 2 M Methionine
5 2 7 W Tryptophan
11 9 8 Y Tyrosine
16 10 11 Q Glutamine
13 8 12 F Phenylalanine
27 12 17 S Serine
20 6 12 T Threonine
10 15 24 K Lysine
16 11 17 P Proline
28 4 14 A Alanine
18 6 10 N Asparagine
26 5 19 D Aspartic acid
6 6 1 C Cysteine
26 7 22 G Glycine
12 12 9 I Isoleucine
32 11 17 V Valine
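
The rows above are letter counts, so a Parikh vector can be recomputed directly from a sequence. The following is a minimal sketch, not the site's own code; the names `LETTERS` and `parikh_vector` are introduced here for illustration, and the letter order simply follows the rows of the table.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
LETTERS = "RELHMWYQFSTKPANDCGIV"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count how often each of the 20 amino acid letters occurs in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in LETTERS}

# Toy input: the first 11 residues of the 1BTK_1 sequence listed on this page.
vector = parikh_vector("AAVILESIFLK")
print(vector)
```

The entries of a Parikh vector sum to the sequence length; for the three columns above the sums are 369, 169 and 260.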

>1ITQ_1|Chains A, B|RENAL DIPEPTIDASE|Homo sapiens (9606)
>1BTK_1|Chains A, B|BRUTON'S TYROSINE KINASE|Homo sapiens (9606)
>5EKJ_1|Chain A|Carbonic anhydrase 2|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1ITQ , Knot 160 369 0.85 40 214 353
DFFRDEAERIMRDSPVIDGHNDLPWQLLDMFNNRLQDERANLTTLAGTHTNIPKLRAGFVGGQFWSVYTPCDTQNKDAVRRTLEQMDVVHRMCRMYPETFLYVTSSAGIRQAFREGKVASLIGVEGGHSIDSSLGVLRALYQLGMRYLTLTHSCNTPWADNWLVDTGDSEPQSQGLSPFGQRVVKELNRLGVLIDLAHVSVATMKATLQLSRAPVIFSHSSAYSVCASRRNVPDDVLRLVKQTDSLVMVNFYNNYISCTNKANLSQVADHLDHIKEVAGARAVGFGGDFDGVPRVPEGLEDVSKYPDLIAELLRRNWTEAEVKGALADNLLRVFEAVEQASNLTQAPEEEPIPLDQLGGSCRTHYGYSS
1BTK , Knot 80 169 0.81 40 132 165
AAVILESIFLKRSQQKKKTSPLNFKKCLFLLTVHKLSYYEYDFERGRRGSKKGSIDVEKITCVETVVPEKNPPPERQIPRRGEESSEMEQISIIERFPYPFQVVYDEGPLYVFSPTEELRKRWIHQLKNVIRYNSDLVQKYHPCFWIDGQYLCCSQTAKNAMGCQILEN
5EKJ , Knot 112 260 0.79 40 176 249
MAHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPPLLECVTWIVLKEPISVSSEQVLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFK
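
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows that normalization together with one common way to compute an LZ76 complexity (the exhaustive-history parsing of Lempel and Ziv, 1976). Which LZ variant the page uses is not stated, so `lz76_complexity` is an assumption and is not guaranteed to reproduce the tabulated values; both function names are introduced here for illustration.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of `w`."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text
        # (the copy may overlap the component itself, except its last symbol).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Values taken from the 1ITQ row of the table above.
ratio = normalized_lz(160, 369)
print(ratio)
```

For the 1ITQ row this evaluates to about 0.856, which the table appears to truncate to 0.85.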

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
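
As a small illustration of these definitions, the set of distinct length-\(k\) subwords and its cardinality can be sketched as follows. The function names `subwords` and `subword_complexity` are hypothetical and chosen only for this example.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy word: its length-2 subwords are AB and BA.
print(sorted(subwords("ABAB", 2)))
print(subword_complexity("ABAB", 2))
```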

\(|P_{f(1ITQ_1)}(2) \setminus P_{f(1BTK_1)}(2)|=148\), \(|P_{f(1BTK_1)}(2) \setminus P_{f(1ITQ_1)}(2)|=66\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\). For example, \(Z_2(f(1ITQ_1), f(1BTK_1))=148+66=214\).

Hydrophobic-polar version of Sequence 1:
011000100110001110100011101101100010000101001110000110101111110110100100000001100010010110010010100110100011100110010110111101100100011110110011100101000000111001110010001000110111001100100111110110101101010101001111100001001010000110011011000001111010000100000101001100100100111101111110101110110110010001011101100010010101111001101101100100100110001111001110000001000
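
The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues count as hydrophobic; the set `AFGILMPVW` below is an assumption inferred by comparing the start of the 1ITQ sequence with the start of the binary string, and `hp_encode` is a name introduced for this sketch.

```python
# Assumed hydrophobic residues, inferred from the page's output rather than stated by it.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 25 residues of the 1ITQ sequence listed above.
print(hp_encode("DFFRDEAERIMRDSPVIDGHNDLPW"))  # 0110001001100011101000111
```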
Pair \(Z_2\) Length of longest common substring
1ITQ_1,1BTK_1 214 3
1ITQ_1,5EKJ_1 170 3
1BTK_1,5EKJ_1 172 4
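
Both columns of this table can be recomputed from a pair of sequences. The sketch below computes \(Z_k\) as the size of the symmetric difference of the two subword sets, and the last column under the assumption that it measures the longest common contiguous block (values of 3 or 4 would be far too small for a non-contiguous subsequence of proteins this long). All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): number of length-k subwords occurring in exactly one of x, y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy words: the length-2 subwords BA and BC each occur in only one word.
print(z_distance("ABAB", "ABC", 2))
print(longest_common_substring_length("ABCDE", "XBCDY"))
```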

Newick tree

 
(1BTK_1:10.77,(1ITQ_1:85,5EKJ_1:85):15.77);

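Let d and d1 be the Otu--Sayood distances. Writing min and max for the smaller and larger of the two LZ-complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), d is the max and d1 is the sum min + max. (The distance d1 makes the 4TYN sequence AAAAAA a close match...)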
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{538 }{\log_{20} 538}-\frac{169}{\log_{20}169})\approx 107.\)
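
The arithmetic in this estimate can be checked numerically. The snippet below only evaluates the formula as given; note that 538 = 369 + 169 is the combined length of the 1ITQ and 1BTK sequences, and the helper `log20` is introduced here for readability.

```python
import math

def log20(x: float) -> float:
    """Logarithm to base 20 (the amino acid alphabet size)."""
    return math.log(x, 20)

# Factors 0.85 and 0.8 and the lengths 538 and 169 are copied from the formula above.
expected = 0.85 * 0.8 * (538 / log20(538) - 169 / log20(169))
print(round(expected))  # 107
```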
Status Protein1 Protein2 d d1/2
Query variables 1ITQ_1 1BTK_1 141 102

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
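
The interpolation can be sketched in code. This assumes, following Otu and Sayood's definitions, that min and max are the smaller and larger of \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and it uses an exhaustive-history LZ76 parsing as the complexity measure, which may differ from the variant behind the table. The names `lz76_complexity`, `increments` and `delta` are hypothetical.

```python
def lz76_complexity(w: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x: str, y: str) -> tuple[int, int]:
    """(min, max) of the complexity increments LZ(xy)-LZ(x) and LZ(yx)-LZ(y)."""
    a = lz76_complexity(x + y) - lz76_complexity(x)
    b = lz76_complexity(y + x) - lz76_complexity(y)
    return min(a, b), max(a, b)

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max; alpha=0 gives d, alpha=1/2 gives d1/2."""
    lo, hi = increments(x, y)
    return alpha * lo + (1 - alpha) * hi

# Toy words, not protein data.
x, y = "ABABABCC", "AAAAAA"
print(delta(x, y, 0), delta(x, y, 0.5))
```

With \(\alpha=0\) the function returns the max (the distance d), and with \(\alpha=1/2\) it returns the average of min and max, that is d1/2.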