CoV2D Browser™

Parikh vectors
7UUR_1 2NSP_1 6EGA_1 Letter Amino acid
10 3 9 H Histidine
20 16 12 F Phenylalanine
14 5 1 W Tryptophan
25 15 9 Y Tyrosine
10 2 5 C Cysteine
15 9 11 Q Glutamine
33 6 27 E Glutamic acid
25 18 19 I Isoleucine
39 18 28 L Leucine
7 2 11 M Methionine
44 33 22 A Alanine
21 20 15 N Asparagine
33 26 21 D Aspartic acid
18 22 23 K Lysine
32 11 9 P Proline
30 39 22 S Serine
28 16 10 R Arginine
43 29 20 G Glycine
25 32 17 T Threonine
41 20 21 V Valine
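As a minimal sketch of how such a table can be produced: the Parikh vector of a sequence is just its per-letter count. The function name `parikh_vector` and the constant `ALPHABET` are illustrative choices, not names used by the CoV2D project; the letter order simply follows the rows of the table above.

```python
from collections import Counter

# The 20 amino-acid letters in the row order of the table above
# (an illustrative choice; any fixed order works).
ALPHABET = "HFWYCQEILMANDKPSRGTV"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in ALPHABET}


if __name__ == "__main__":
    # First ten residues of the 2NSP_1 sequence listed further down the page.
    vector = parikh_vector("ATTYNAVVSK")
    for letter in ALPHABET:
        print(letter, vector[letter])
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column of the complexity table.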

>7UUR_1|Chains A[auth C], C[auth F]|Hydrogenase-2, large subunit|Mycolicibacterium smegmatis MC2 155 (246196)
>2NSP_1|Chains A, B|Pectinesterase A|Erwinia chrysanthemi (198628)
>6EGA_1|Chains A, B|Interleukin-1 receptor-associated kinase 4|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7UUR 208 513 0.84 40 257 480
LDLFVSPLGRVEGDLDVRVTINDGVVTSAWTEAAMFRGFEIILRGKDPQAGLIVCPRICGICGGSHLYKSAYALDTAWRTHMPPNATLIRNICQACETLQSIPRYFYALFAIDLTNKNYAKSKLYDEAVRRFAPYVGTSYQPGVVLSAKPVEVYAIFGGQWPHSSFMVPGGVMSAPTLSDVTRAIAILEHWNDNWLEKQWLGCSVDRWLENKTWNDVLAWVDENESQYNSDCGFFIRYCLDVGLDKYGQGVGNYLATGTYFEPSLYENPTIEGRNAALIGRSGVFADGRYFEFDQANVTEDVTHSFYEGNRPLHPFEGETIPVNPEDGRRQGKYSWAKSPRYAVPGLGNVPLETGPLARRMAASAPDAETHQDDDPLFADIYNAIGPSVMVRQLARMHEGPKYYKWVRQWLDDLELKESFYTKPVEYAEGKGFGSTEAARGALSDWIVIEDSKIKNYQVVTPTAWNIGPRDASEVLGPIEQALVGSPIVDAEDPVELGHVARSFDSCLVCTVH
2NSP 145 342 0.82 40 185 321
ATTYNAVVSKSSSDGKTFKTIADAIASAPAGSTPFVILIKNGVYNERLTITRNNLHLKGESRNGAVIAAATAAGTLKSDGSKWGTAGSSTITISAKDFSAQSLTIRNDFDFPANQAKSDSDSSKIKDTQAVALYVTKSGDRAYFKDVSLVGYQATLYVSGGRSFFSDCRISGTVDFIFGDGTALFNNCDLVSRYRADVKSGNVSGYLTAPSTNINQKYGLVITNSRVIRESDSVPAKSYGLGRPWHPTTTFSDGRYADPNAIGQTVFLNTSMDNHIYGWDKMSGKDKNGNTIWFNPEDSRFFEYKSYGAGATVSKDRRQLTDAQAAEYTQSKVLGDWTPTLP
6EGA 135 312 0.82 40 193 299
GAMGSENKSLEVSDTRFHSFSFYELKNVTNNFDERPISVGGNKMGEGGFGVVYKGYVNNTTVAVKKLAAMVDITTEELKQQFDQEIKVMAKCQHENLVELLGFSSDGDDLCLVYVYMPNGSLLDRLSCLDGTPPLSWHMRCKIAQGAANGINFLHENHHIHRDIKSANILLDEAFTAKISDFGLARASEKFAQTVMTSRIVGTTAYMAPEALRGEITPKSDIYSFGVVLLEIITGLPAVDEHREPQLLLDIKEEIEDEEKTIEDYIDKKMNDADSTSVEAMYSVASQCLHEKKNKRPDIKKVQQLLQEMTAS

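Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. (Here \(n\) is a subword length, not the sequence length \(n=|w|\) used in the table.) Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).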
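The two quantities in the table that depend only on these definitions can be sketched as follows. The function names are hypothetical, and the LZ-complexity value is taken from the table as an input rather than recomputed, since the page does not state which LZ variant it uses.

```python
import math


def subword_complexity(w: str, k: int) -> int:
    """Number of distinct subwords (contiguous intervals) of length k in w."""
    return len({w[i:i + k] for i in range(len(w) - k + 1)})


def normalised_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_b(n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(subword_complexity("abab", 2))  # subwords of length 2: "ab", "ba"
    # Table values for 7UUR: LZ(w) = 208, n = 513.
    print(normalised_lz(208, 513))
```

For the 7UUR row the ratio evaluates to a value between 0.84 and 0.85, in line with the 0.84 shown in the table.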

\(|P_{f(7UUR_1)}(2) \setminus P_{f(2NSP_1)}(2)|=118\), \(|P_{f(2NSP_1)}(2) \setminus P_{f(7UUR_1)}(2)|=46\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(7UUR_1), f(2NSP_1)) = 118 + 46 = 164\).

Hydrophobic-polar version of Sequence 1:
101110111010101010101001110011001111011011101001011111010101101100100010110011000111010110010010001001100101111101000001000100011001110110000111110101101011111011000111111110110100100111110010001100011100100110000100111110000000000011110001011100010111001101001010100010101001111100111101001010010100010001001001101101001110100100010001100100111111011100111100111011010000000111101001111011100110100110000110011001010001000110010101110001101110011110000100001101011011100100111110011110111010011011011001000110010
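The hydrophobic-polar string can be reproduced with a simple letter mapping. The page does not state which residues it treats as hydrophobic; the set below is an assumption, inferred by comparing the first 70 residues of 7UUR_1 with the first 70 binary digits above (those residues happen to cover all 20 letters). The names `HYDROPHOBIC` and `hydrophobic_polar` are illustrative.

```python
# Assumed hydrophobic set, inferred from the leading residues of 7UUR_1 and
# the binary string shown on the page; not an official CoV2D definition.
HYDROPHOBIC = set("AFGILMPVW")


def hydrophobic_polar(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First ten residues of 7UUR_1.
    print(hydrophobic_polar("LDLFVSPLGR"))
```

Under this assumed mapping, the first ten residues `LDLFVSPLGR` give `1011101110`, which agrees with the start of the string above.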
Pair \(Z_2\) Length of longest common subword (contiguous)
7UUR_1,2NSP_1 164 4
7UUR_1,6EGA_1 176 4
2NSP_1,6EGA_1 176 4
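A minimal sketch of the \(Z_k\) computation, following the definition given above: collect the distinct length-\(k\) subwords of each sequence and count those that occur in exactly one of the two. The function names are hypothetical; short toy strings are used instead of the full protein sequences.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    # P_x(2) = {"ab", "ba"}, P_y(2) = {"ab", "bb", "ba"}: only "bb" is unshared.
    print(z_distance("abab", "abba", 2))
```

Since \(Z_k\) is the size of a symmetric difference, it is symmetric in its arguments and zero exactly when the two sequences have the same set of length-\(k\) subwords.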

Newick tree

(6EGA_1:89.91,(7UUR_1:82,2NSP_1:82):7.91);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
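Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{855}{\log_{20} 855}-\frac{342}{\log_{20} 342}\right)\approx 138\), where \(855 = 513 + 342\) is the length of the concatenation of the two query sequences.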
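The arithmetic behind this estimate can be checked with a few lines. The factors 0.85 and 0.8 are taken from the page as given; the function name `expected_distance` is illustrative.

```python
import math


def expected_distance(n_concat: int, n_single: int,
                      factor_a: float = 0.85, factor_b: float = 0.8,
                      alphabet_size: int = 20) -> float:
    """Evaluate factor_a * factor_b * (N / log_b N - m / log_b m)."""
    term_concat = n_concat / math.log(n_concat, alphabet_size)
    term_single = n_single / math.log(n_single, alphabet_size)
    return factor_a * factor_b * (term_concat - term_single)


if __name__ == "__main__":
    # 855 = 513 + 342 is the length of the two query sequences concatenated.
    print(expected_distance(513 + 342, 342))
```

The result lies between 138 and 139, matching the rough figure of 138 quoted above.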
Status Protein1 Protein2 d d1/2
Query variables 7UUR_1 2NSP_1 178 147

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
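The interpolation can be sketched in code under two assumptions that the page itself does not spell out. First, \(\min\) and \(\max\) are taken over the two differences \(c(xy)-c(x)\) and \(c(yx)-c(y)\), following Otu and Sayood's definitions \(d=\max\) and \(d_1=\) their sum. Second, \(c\) is the LZ76 exhaustive-history complexity, which may differ from the LZ variant the page actually uses. All function names are hypothetical.

```python
def lz76(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text that
        # precedes its last symbol (overlap with the phrase is allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood_terms(x: str, y: str) -> tuple[int, int]:
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two Otu-Sayood terms."""
    a, b = otu_sayood_terms(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    x, y = "aaaa", "abab"
    print(delta(x, y, 0.0))   # equals d
    print(delta(x, y, 0.5))   # equals d_1 / 2
```

With \(\alpha=0\) the function returns \(d\), and with \(\alpha=1/2\) it returns the average of the two terms, that is \(d_1/2\), as in the case distinction above.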