CoV2D BrowserTM

Parikh vectors
7ZET_1 5OVN_1 8TZR_1 Letter Amino acid
22 24 10 N Asparagine
27 50 15 K Lysine
18 8 14 F Phenylalanine
10 22 11 Y Tyrosine
16 36 28 A Alanine
26 34 11 Q Glutamine
12 6 13 H Histidine
12 46 14 I Isoleucine
30 21 30 S Serine
10 4 25 C Cysteine
11 29 30 G Glycine
13 11 6 M Methionine
15 36 16 P Proline
25 24 14 T Threonine
4 18 9 W Tryptophan
41 48 22 E Glutamic acid
24 23 14 D Aspartic acid
37 53 20 L Leucine
24 18 21 V Valine
25 21 29 R Arginine
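
The table above gives, for each sequence, the number of occurrences of each of the 20 amino-acid letters; each column sums to the length of its sequence (402, 532 and 352). A minimal Python sketch of such a count follows. The names `AMINO_ACIDS` and `parikh_vector` are illustrative assumptions and are not taken from the CoV2D code.

```python
from collections import Counter

# Letters in the row order of the table above (an illustrative choice).
AMINO_ACIDS = "NKFYAQHISCGMPTWEDLVR"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `seq`."""
    counts = Counter(seq)
    # Letters absent from the sequence are reported with count 0.
    return {letter: counts.get(letter, 0) for letter in alphabet}

if __name__ == "__main__":
    vector = parikh_vector("MKKA")
    print(vector["K"], vector["M"], vector["N"])
```

Applied to a full FASTA sequence, the values of the returned dict should add up to the sequence length, which gives a quick consistency check against the table.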

>7ZET_1|Chain A|Clusterin|Homo sapiens (9606)
>5OVN_1|Chain A|POL protein|Feline immunodeficiency virus (11673)
>8TZR_1|Chain A|Protein Wnt-3a|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7ZET , Knot 172 402 0.85 40 237 383
DQTVSDNELQEMSNQGSKYVNKEIQNAVNGVKQIKTLIEKTNEERKTLLSNLEEAKKKKEDALNETRESETKLKELPGVCNETMMALWEECKPCLKQTCMKFYARVCRSGSGLVGRQLEEFLNQSSPFYFWMNGDRIDSLLENDRQQTHMLDVMQDHFSRASSIIDELFQDRFFTREPQDTYHYLPFSLPHNFHAMFQPFLEMIHEAQQAMDIHFHSPAFQHPPTEFIREGDDDRTVCREIRHNSTGCLRMKDQCDKCREILSVDCSTNNPSQAKLRRELDESLQVAERLTRKYNELLKSYQWKMLNTSSLLEQLNEQFNWVSRLANLTQGEDQYYLRVTTVASHTSDSDVPSGVTEVVVKLFDSDPITVTVPVEVSRKNPKFMETVAEKALQEYRKKHREE
5OVN , Knot 214 532 0.84 40 252 494
QIKQWPLTNEKIEALTEIVERLEREGKVKRADPNNPWNTPVFAIKKKSGKWRMLIDFRELNKLTEKGAEVQLGLPHPAGLQIKKQVTVLDIGDAYFTIPLDPDYAPYTAFTLPRKNNAGPGRRFVWCSLPQGWILSPLIYQSTLDNIIQPFIRQNPQLDIYQYMDDIYIGSNLSKKEHKEKVEELRKLLLWWGFETPEDKLQEEPPYTWMGYELHPLTWTIQQKQLDIPEQPTLNELQKLAGKINWASQAIPDLSIKALTNMMRGNQNLNSTRQWTKEARLEVQKAKKAIEEQVQLGYYDPSKELYAKLSLVGPHQISYQVYQKDPEKILWYGKMSRQKKKAENTCDIALRACYKIREESIIRIGKEPRYEIPTSREAWESNLINSPYLKAPPPEVEYIHAALNIKRALSMIKDAPIPGAETWYIDGGRKLGKAAKAAYWTDTGKWQVMELEGSNQKAEIQALLLALKAGSEEMNIITDSQYVINIILQQPDMMEGIWQEVLEELEKKTAIFIDWVPGHKGIPGNEEVDKLC
8TZR , Knot 155 352 0.86 40 216 338
MAPLGYFLLLCSLKQALGSYPIWWSLAVGPQYSSLGSQPILCASIPGLVPKQLRFCRNYVEIMPSVAEGIKIGIQECQHQFRGRRWNCTTVHDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTAAICGCSSRHQGSPGKGWKWGGCSEDIEFGGMVSREFADARENRPDARSAMNRHNNEAGRQAIASHMHLKCKCHGLSGSCEVKTCWWSQPDFRAIGDFLKDKYDSASEMVVEKHRESRGWVETLRPRYTYFKVPTERDLVYYEASPNFCEPNPETGSFGTRDRTCNVSSHGIDGCDLLCCGRGHNARAERRREKCRCVFHWCCYVSCQECTRVYDVHTCK
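
The LZ-complexity and normalized-ratio columns above can be illustrated as follows. This is a sketch under the assumption that \(\mathrm{LZ}(w)\) is the number of components in the Lempel--Ziv (1976) exhaustive-history parse; the page does not state which variant it uses, so the function below need not reproduce the tabulated values exactly. The names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parse of w."""
    n = len(w)
    i = 0        # start index of the current component
    count = 0
    while i < n:
        length = 1
        # Grow the component while it can still be copied from text that
        # starts before position i (the copy may overlap the component).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # parse: 0|001|10|100|1000|101
    print(normalized_lz(172, 402))
```

With the tabulated values \(\mathrm{LZ}(w)=172\) and \(n=402\), `normalized_lz` gives approximately 0.856, in line with the 0.85 shown for 7ZET.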

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
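
The definitions above can be sketched in a few lines of Python. The function names `subwords` and `subword_complexity` are illustrative assumptions; the argument `k` plays the role of the subword length.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    # There are len(w) - k + 1 starting positions; none if k > len(w).
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    print(sorted(subwords("abab", 2)))
    print(subword_complexity("abab", 1))
```

For a word of length \(n\) over a 20-letter alphabet, the count of distinct length-\(k\) subwords is at most \(\min(20^k,\, n-k+1)\), which bounds the \(p_w(2)\) and \(p_w(3)\) columns.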

\(|P_{f(7ZET_1)}(2) \setminus P_{f(5OVN_1)}(2)|=80\) and \(|P_{f(5OVN_1)}(2) \setminus P_{f(7ZET_1)}(2)|=95\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
000100001001000100010001001101100100110000000001100100100000011000000000100111100001111100001010000101010100010111100100110000110111010010011000000001101100010010011001100011000100000011101100101110111011001001101010011100110011001000001000100000101010000000001101000000100101000100010110010000001100001011000011001000101100110100100000101001100000001101100111011000110101110100001011001100110000000000
Pair \(Z_2\) Length of longest common subsequence
7ZET_1,5OVN_1 175 4
7ZET_1,8TZR_1 199 4
5OVN_1,8TZR_1 192 3
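
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first row, \(175 = 80 + 95\). A minimal sketch follows, with hypothetical function names. The `jaccard_distance` helper is an addition for illustration only: it divides the numerator by the size of the union, which the page itself does not report.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): subwords of length k found in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k divided by the size of the union (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

if __name__ == "__main__":
    print(z_distance("abab", "abba", 2))  # only "bb" is unshared
```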

Newick tree

 
(8TZR_1:10.95,(7ZET_1:87.5,5OVN_1:87.5):13.45);

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{934}{\log_{20} 934}-\frac{402}{\log_{20}402})\approx 141.\)
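
The arithmetic in this estimate can be checked directly. In the sketch below, 934 is taken to be the combined length \(402 + 532\) of the two query sequences, and the factors 0.85 and 0.8 are copied from the page as given; the function names are illustrative assumptions.

```python
import math

def capacity(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizing term used in the table."""
    return n / math.log(n, alphabet_size)

def expected_distance(n1, n2, ratio=0.85, factor=0.8):
    """Evaluate ratio * factor * (capacity(n1 + n2) - capacity(n1))."""
    return ratio * factor * (capacity(n1 + n2) - capacity(n1))

if __name__ == "__main__":
    # Lengths of 7ZET_1 and 5OVN_1 from the table.
    print(expected_distance(402, 532))
```

The value comes out between 141 and 142, consistent with the rounded figure of 141.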
Status Protein1 Protein2 d d1/2
Query variables 7ZET_1 5OVN_1 179 158.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
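
The interpolation can be sketched as follows, under the assumption that \(\min\) and \(\max\) refer to the smaller and larger of the two one-sided terms \(c(xy)-c(x)\) and \(c(yx)-c(y)\), with \(c\) a complexity function such as LZ-complexity, so that \(d\) is their maximum and \(d_1\) their sum. The function names are hypothetical, and `c` is passed in as a parameter rather than fixed.

```python
def one_sided_terms(x, y, c):
    """Return (c(xy) - c(x), c(yx) - c(y)) for a complexity function c."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta_from_terms(alpha, a, b):
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def delta(alpha, x, y, c):
    """Interpolated distance: alpha = 0 gives d, alpha = 1/2 gives d_1 / 2."""
    a, b = one_sided_terms(x, y, c)
    return delta_from_terms(alpha, a, b)

if __name__ == "__main__":
    # Toy complexity (number of distinct letters), for illustration only.
    toy = lambda w: len(set(w))
    print(delta(0, "aa", "abc", toy), delta(0.5, "aa", "abc", toy))
```

If the page's values \(d = 179\) and \(d_1/2 = 158.5\) are read this way, the two one-sided terms would be 179 and 138, and `delta_from_terms(0.5, 138, 179)` returns 158.5.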