CoV2D Browser™

Parikh vectors
5HZL_1 7WOO_1 1NKK_1 Letter Amino acid
12 36 26 V Valine
4 15 5 H Histidine
6 65 6 K Lysine
22 36 11 T Threonine
7 15 3 M Methionine
23 23 14 P Proline
3 8 3 W Tryptophan
9 33 14 Q Glutamine
19 32 20 G Glycine
29 69 3 I Isoleucine
0 5 5 C Cysteine
32 108 27 L Leucine
26 45 21 A Alanine
1 38 23 R Arginine
27 67 3 N Asparagine
17 72 25 S Serine
12 34 8 Y Tyrosine
18 49 19 D Aspartic acid
23 59 13 E Glutamic acid
13 30 7 F Phenylalanine
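
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of how one column of the table above could be recomputed from a sequence string; the function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not part of the browser:

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering chosen here for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in w."""
    counts = Counter(w)
    return {a: counts.get(a, 0) for a in alphabet}

# First ten residues of the 5HZL_1 sequence, used as a short example.
pv = parikh_vector("MLFAPTIKAQ")
print(pv["A"], pv["C"])
```

Letters that never occur, such as C in this short prefix, get an explicit count of 0, as in the table.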

>5HZL_1|Chain A[auth B]|Lmo2445 protein|Listeria monocytogenes EGD-e (169963)
>7WOO_1|Chains A, L[auth Z]|Nucleoporin NIC96|Saccharomyces cerevisiae (4932)
>1NKK_1|Chains A, B, C, D|Capsid protein P40|Human herpesvirus 5 (10359)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5HZL , Knot 125 303 0.78 38 168 283
MLFAPTIKAQADTVPLPAPIIEAFPVEAIAEAIAGELDKDSVNDTITQADLDTMTAIPLPSLGLTGEDLSVLNNEVFTNAIELAIWSNNIGELPDLSEALPALENIEANGANITVFPDANYPNLTNVDLSQNNFGFNIPKFVGMEGLVSINMENAGLSGYIAEDIWMNMPNLDSLILNENHLISIPEDIFLSQQLGTHSFANQTATYPPTTIKQGENLKVFVPFIYQALDFIAPSNHDLIIIKDNGRTLYEPPYPTYDGSYMYTIETAGLQPGEHLLEISLGYNSGEYTGWYDFPVTITESNA
7WOO , Knot 314 839 0.84 40 282 742
MLETLRGNKLHSGTSKGANKKLNELLESSDNLPSASSELGSIQVSINELRRRVFQLRSKNKASKDYTKAHYLLANSGLSFEDVDAFIKDLQTNQFLEPNPPKIIESEELEFYIRTKKEENILMSIEQLLNGATKDFDNFINHNLNLDWAQHKNEVMKNFGILIQDKKTVDHKKSISSLDPKLPSWGNKGNNILNSNESRLNVNENNILREKFENYARIVFQFNNSRQANGNFDIANEFISILSSANGTRNAQLLESWKILESMKSKDINIVEVGKQYLEQQFLQYTDNLYKKNMNEGLATNVNKIKSFIDTKLKKADKSWKISNLTVINGVPIWALIFYLLRAGLIKEALQVLVENKANIKKVEQSFLTYFKAYASSKDHGLPVEYSTKLHTEYNQHIKSSLDGDPYRLAVYKLIGRCDLSRKNIPAVTLSIEDWLWMHLMLIKEKDAENDPVYERYSLEDFQNIIISYGPSRFSNYYLQTLLLSGLYGLAIDYTYTFSEMDAVHLAIGLASLKLFKIDSSTRLTKKPKRDIRFANILANYTKSFRYSDPRVAVEYLVLITLNEGPTDVELCHEALRELVLETKEFTVLLGKIGRDGARIPGVIEERQPLLHVRDEKEFLHTITEQAARRADEDGRIYDSILLYQLAEEYDIVITLVNSLLSDTLSASDLDQPLVGPDDNSETNPVLLARRMASIYFDNAGISRQIHVKNKEICMLLLNISSIRELYFNKQWQETLSQMELLDLLPFSDELSARKKAQDFSNLDDNIVKNIPNLLIITLSCISNMIHILNESKYQSSTKGQQIDSLKNVARQCMIYAGMIQYRMPRETYSTLINIDVSL
1NKK , Knot 109 256 0.78 40 154 241
MTMDEQQSQAVAPVYVGGFLARYDQSPDEARLLLPRDVVEHWLHAQGQGQPSLSVALPLNINHDDTAVVGHVAAMQSVRDGLFCLGCVTSPRFLEIVRRASEKSELVSRGPVSPLQPDKVVEFLSGSYAGLSLSSRRCDDVEQATSLSGSETTPFKHVALCSVGRRRGTLAVYGRDPEWVTQRFPDLTAADRDGLRAQWQRCGSTAVDASGDPFRSDSYGLLGNSVDALYIRERLPKLRYDKQLVGVTERESYVKA
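
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked from the tabulated LZ-complexity and length alone. A small sketch, taking the LZ values as given rather than recomputing them; the helper name `normalized_lz` is an assumption made for this example:

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ-complexity divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) as listed in the table above.
rows = [("5HZL", 125, 303), ("7WOO", 314, 839), ("1NKK", 109, 256)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 3))
```

The computed ratios are about 0.787, 0.841 and 0.788, which agree with the tabulated 0.78, 0.84 and 0.78 if the table truncates to two decimals rather than rounding.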

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
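
As an illustration, reading the argument of \(P_w\) as the subword length, the set of subwords and its cardinality can be sketched as follows. The function names `subwords` and `subword_complexity` are chosen for this example only:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "ABAB"
print(sorted(subwords(w, 2)))          # ['AB', 'BA']
print([subword_complexity(w, k) for k in (1, 2, 3)])  # [2, 2, 2]
```

For \(k\) larger than the length of the word the set is empty, so the count is 0.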

\(|P_{f(5HZL_1)}(2) \setminus P_{f(7WOO_1)}(2)|=24\), \(|P_{f(7WOO_1)}(2) \setminus P_{f(5HZL_1)}(2)|=138\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111110101010011111111011110111011110100001000100101001011111011101001011000110011011110001101101001111100101011010111010010100101000011101101111011101010011101011001110110100111000011011001110001100011000100110010010010111111001101111000011110001001001101000100100100111011001101011000100011001110100001
Pair \(Z_2\) Length of longest common subword (contiguous)
5HZL_1,7WOO_1 162 4
5HZL_1,1NKK_1 168 4
7WOO_1,1NKK_1 196 4
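
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets. A minimal sketch, with the hypothetical names `subwords` and `z_k`, on toy strings rather than the protein sequences:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}: only BB is unshared.
print(z_k("ABAB", "ABBA", 2))  # 1
```

For the first pair in the table, the two one-sided counts 24 and 138 add up to the listed \(Z_2 = 162\).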

Newick tree

 
[
	1NKK_1:94.44,
	[
		5HZL_1:81,7WOO_1:81
	]:13.44
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
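
For orientation, here is a rough sketch of how such distances can be computed. It assumes that LZ is the number of components in the LZ76 exhaustive-history parsing, that d is the maximum of \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and that d1 is their sum. The page does not state which parsing variant it uses, so the numbers produced here need not match the table; `lz76` and `otu_sayood` are names invented for this example.

```python
def lz76(s):
    """Number of components in an LZ76-style exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the candidate while it already occurs in the text
        # that precedes its own last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x, y):
    """Return (d, d1): max and sum of the two one-sided complexity increases."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return max(a, b), a + b

print(lz76("0001101001000101"))   # 6, parsed as 0|001|10|100|1000|101
print(otu_sayood("abc", "abd"))   # (1, 2)
```

Both quantities are symmetric in the two arguments, since swapping x and y only swaps the two one-sided increases.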
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1142}{\log_{20} 1142}-\frac{303}{\log_{20}303}\right)\approx 222\).
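
The arithmetic in this estimate can be reproduced directly. In the sketch below the helper name `capacity` is an assumption; note that 1142 = 303 + 839 is the combined length of the two query sequences:

```python
import math

def capacity(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (capacity(1142) - capacity(303))
print(round(expected, 1))  # about 222.4
```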
Status Protein1 Protein2 d d1/2
Query variables 5HZL_1 7WOO_1 279 189.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
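
A short numerical check of this interpolation, under the assumption that min and max refer to the two one-sided quantities whose maximum is d and whose sum is d1. With d = 279 and d1/2 = 189.5 from the query row, the smaller quantity would then be 2(189.5) - 279 = 100; that value is inferred here, not reported by the page. The function name `delta` is illustrative:

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

lo, hi = 100, 279          # lo is inferred from d = 279 and d1/2 = 189.5
print(delta(0, lo, hi))    # 279, i.e. d
print(delta(0.5, lo, hi))  # 189.5, i.e. d1/2
```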