CoV2D Browser™

Parikh vectors
7OUP_1 3AZH_1 1GCP_1 Letter Amino acid
6 2 1 C Cysteine
58 8 7 G Glycine
27 7 2 I Isoleucine
8 3 2 M Methionine
39 4 4 F Phenylalanine
63 18 4 A Alanine
44 18 3 R Arginine
37 4 1 D Aspartic acid
52 6 1 S Serine
33 12 2 K Lysine
31 3 3 Y Tyrosine
19 1 5 N Asparagine
66 7 7 E Glutamic acid
10 3 4 H Histidine
43 6 5 V Valine
35 9 1 Q Glutamine
35 6 9 P Proline
10 0 3 W Tryptophan
84 12 3 L Leucine
37 10 3 T Threonine
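The counts in the table can be reproduced directly from a sequence. The following is a minimal sketch, not the page's own code: the function name `parikh_vector` and the alphabet ordering are illustrative choices, and the 1GCP_1 sequence is copied from the sequence table on this page.

```python
from collections import Counter

# Chain sequence of 1GCP_1, copied from the sequence table on this page.
SEQ_1GCP = "GSHMPKMEVFQEYYGIPPPPGAFGPFLRLNPGDIVELTKAEAEHNWWEGRNTATNEVGWFPCNRVHPYVH"

# The 20 standard amino-acid letters (ordering here is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences in word} for each alphabet letter."""
    counts = Counter(word)
    return {letter: counts.get(letter, 0) for letter in alphabet}


if __name__ == "__main__":
    vector = parikh_vector(SEQ_1GCP)
    for letter in "CGPW":
        print(letter, vector[letter])
```

The entries of a Parikh vector sum to the length of the word, which gives a quick consistency check against the Length column further down.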

>7OUP_1|Chain A|Dipeptidyl peptidase 3|Homo sapiens (9606)
>3AZH_1|Chains A, E|Histone H3.1|Homo sapiens (9606)
>1GCP_1|Chains A, B, C, D|VAV PROTO-ONCOGENE|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7OUP , Knot 285 737 0.85 40 277 664
MADTQYILPNDIGVSSLDCREAFRLLSPTERLYAYHLSRAAWYGGLAVLLQTSPEAPYIYALLSRLFRAQDPDQLRQHALAEGLTEEEYQAFLVYAAGVYSNMGNYKSFGDTKFVPNLPKEKLERVILGSEAAQQHPEEVRGLWQTCGELMFSLEPRLRHLGLGKEGITTYFSGNCTMEDAKLAQDFLDSQNLSAYNTRLFKEVDGEGKPYYEVRLASVLGSEPSLDSEVTSKLKSYEFRGSPFQVTRGDYAPILQKVVEQLEKAKAYAANSHQGQMLAQYIESFTQGSIEAHKRGSRFWIQDKGPIVESYIGFIESYRDPFGSRGEFEGFVAVVNKAMSAKFERLVASAEQLLKELPWPPTFEKDKFLTPDFTSLDVLTFAGSGIPAGINIPNYDDLRQTEGFKNVSLGNVLAVAYATQREKLTFLEEDDKDLYILWKGPSFDVQVGLHALLGHGSGKLFVQDEKGAFNFDQETVINPETGEQIQSWYRSGETWDSKFSTIASSYEECRAESVGLYLCLHPQVLEIFGFEGADAEDVIYVNWLNMVRAGLLALEFYTPEAFNWRQAHMQARFVILRVLLEAGEGLVTITPTTGSDGRPDARVRLDRSKIRSVGKPALERFLRRLQVLKSTGDVAGGRALYEGYATVTDAPPECFLTLRDTVLLRKESRKLIVQPNTRLEGSDVQLLEYEASAAGLIRSFSERFPEDGPELEEILTQLATADARFWKGPSEAPSGQA
3AZH , Knot 68 139 0.80 38 106 131
GSHMARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSSAVMALQEACEAYLVGLFEDTNLCAIHAKRVTIMPQDIQLARRIRGERA
1GCP , Knot 41 70 0.83 40 63 67
GSHMPKMEVFQEYYGIPPPPGAFGPFLRLNPGDIVELTKAEAEHNWWEGRNTATNEVGWFPCNRVHPYVH
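The fourth column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). As a small worked check, assuming nothing beyond the table values (the function name `normalised_lz` is hypothetical), the ratio can be recomputed from the LZ-complexity and length columns:

```python
import math


def normalised_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b(n)) with b = alphabet_size."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # (LZ-complexity, length) pairs taken from the table above.
    for code, lz, n in [("7OUP", 285, 737), ("1GCP", 41, 70)]:
        print(code, round(normalised_lz(lz, n), 2))
```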

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
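A set comprehension over sliding windows is enough to illustrate the subword sets and their cardinalities. This is a sketch under the assumption that subwords are contiguous windows of a fixed length (called `k` in the code); the names `subwords` and `subword_count` are illustrative.

```python
# Chain sequence of 1GCP_1, copied from the sequence table on this page.
SEQ_1GCP = "GSHMPKMEVFQEYYGIPPPPGAFGPFLRLNPGDIVELTKAEAEHNWWEGRNTATNEVGWFPCNRVHPYVH"


def subwords(word, k):
    """Set of distinct contiguous subwords of length k in word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def subword_count(word, k):
    """Cardinality of the set of length-k subwords."""
    return len(subwords(word, k))


if __name__ == "__main__":
    for k in (2, 3):
        print(k, subword_count(SEQ_1GCP, k))
```

For 1GCP_1 this gives the values listed in the table for lengths 2 and 3.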

\(|P_{f(7OUP_1)}(2) \setminus P_{f(3AZH_1)}(2)|=191\), \(|P_{f(3AZH_1)}(2) \setminus P_{f(7OUP_1)}(2)|=20\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) numerator of the Jaccard distance between \(P_x(k)\) and \(P_y(k)\); for this pair, \(Z_2=191+20=211\).

Hydrophobic-polar version of Sequence 1 (7OUP_1):
11000011100111001000011011010001010010011101111111000101101011100110100100100011101100000011110111100011000011000111011000100111100110001001011100010111010101001111001100010100010010110011000010100001100101010100010110111001010001000100001010110100100111100110010010101100001011100100100101010001001110001111000111100000111001010111111001101010011101001100111110100001101010010110111011111101100001000011001011011111010000010110000001011101101010111011110101011100001110100001101001001001000100100010011000000010011101010101101111011010011010110110111111010010110100101010111101110110111010100100101010101000010011011100110010110001011110110010101001110011010001110000001110100010100101100010111110010001100110100110011010101101100110101
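The page does not say which residues count as hydrophobic. Aligning the start of the 0/1 string with the start of the 7OUP_1 sequence suggests that A, F, G, I, L, M, P, V and W map to 1 and the remaining residues to 0. The sketch below encodes a sequence under that inferred assumption; `HYDROPHOBIC` and `hp_encode` are illustrative names.

```python
# Assumed hydrophobic residues, inferred from the first residues of 7OUP_1
# and the corresponding bits of the string above; not stated on the page.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First 24 residues of 7OUP_1.
    print(hp_encode("MADTQYILPNDIGVSSLDCREAFR"))
```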
Pair \(Z_2\) Length of longest common subword
7OUP_1,3AZH_1 211 4
7OUP_1,1GCP_1 248 3
3AZH_1,1GCP_1 129 4
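Both columns of this table can be sketched in a few lines. The code below assumes that \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets, as defined above, and reads the last column as the length of the longest common contiguous subword; that reading is an assumption, as are the function names.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def z(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common run ending at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z("AAB", "ABB", 2))
    # First eight residues of 3AZH_1 and 1GCP_1; both start with GSHM.
    print(longest_common_subword("GSHMARTK", "GSHMPKME"))
```

Passing the full sequences from the table above to `z(..., 2)` should give the \(Z_2\) column, provided the page uses the same definition.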

Newick tree

 
(7OUP_1:12.60,(3AZH_1:64.5,1GCP_1:64.5):63.10);

Let \(d\) and \(d_1\) be the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{876 }{\log_{20} 876}-\frac{139}{\log_{20}139})\approx 205\), where \(876=737+139\) is the combined length of the two sequences.
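The arithmetic in this estimate can be checked numerically. The sketch below plugs in the rounded factors 0.85 and 0.80 exactly as printed, so it lands within about one unit of the quoted 205 rather than on it; the helper name `capacity` is hypothetical.

```python
import math


def capacity(n, alphabet_size=20):
    """Return n / log_b(n), the normalising term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


# 876 = 737 + 139 is the combined length of 7OUP_1 and 3AZH_1.
expected = 0.85 * 0.80 * (capacity(876) - capacity(139))

if __name__ == "__main__":
    print(round(expected, 1))
```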
Status Protein1 Protein2 d d1/2
Query variables 7OUP_1 3AZH_1 261 153.5
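For orientation, here is a minimal sketch of how such distances can be computed. It assumes an LZ76-style exhaustive parse for the complexity \(c\), and the unnormalised Otu–Sayood measures \(d=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1=(c(SQ)-c(S))+(c(QS)-c(Q))\). Neither choice is confirmed by the page, so the numbers it produces need not match the table.

```python
def lz76(word):
    """Number of phrases in an LZ76-style exhaustive parse of word."""
    i, phrases, n = 0, 0, len(word)
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting before position i
        # (the earlier occurrence may overlap the phrase itself).
        while (i + length <= n
               and word.find(word[i:i + length], 0, i + length - 1) != -1):
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood(s, q):
    """Return (d, d1): the max and the sum of the two complexity increments."""
    inc_sq = lz76(s + q) - lz76(s)
    inc_qs = lz76(q + s) - lz76(q)
    return max(inc_sq, inc_qs), inc_sq + inc_qs


if __name__ == "__main__":
    print(lz76("0001101001000101"))
    print(otu_sayood("ABAB", "CDCD"))
```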

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
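The interpolation can be checked against the table row for 7OUP_1 and 3AZH_1. In this sketch the larger quantity is \(d=261\), and the smaller one, 46, is back-solved from \(d_1/2=153.5\); the value 46 is not printed on the page, and the function name `delta` is illustrative.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    low, high = min(a, b), max(a, b)
    return alpha * low + (1 - alpha) * high


if __name__ == "__main__":
    # 46 is inferred: 2 * 153.5 - 261 = 46.
    print(delta(0, 46, 261))
    print(delta(0.5, 46, 261))
```

With \(\alpha=0\) the formula returns the maximum, and with \(\alpha=1/2\) the average of the two quantities.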