CoV2D BrowserTM

Parikh vectors
7NJH_1 6KOJ_1 6RUR_1 Letter Amino acid
8 9 5 H Histidine
4 8 3 F Phenylalanine
15 12 20 S Serine
12 4 14 A Alanine
2 2 26 C Cysteine
13 15 12 L Leucine
5 7 10 K Lysine
1 2 11 W Tryptophan
4 3 5 Y Tyrosine
11 11 14 R Arginine
3 5 4 N Asparagine
8 4 0 I Isoleucine
20 14 8 V Valine
12 4 4 D Aspartic acid
11 13 16 Q Glutamine
6 13 15 E Glutamic acid
16 6 27 G Glycine
6 1 1 M Methionine
7 5 25 P Proline
12 6 14 T Threonine
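Each column of the table above is a Parikh vector: the number of occurrences of each letter in one sequence. A minimal sketch of that count follows. The names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code; the letter order simply copies the row order of the table.

```python
from collections import Counter

# Row order of the table above (an illustrative choice, not a standard ordering).
AMINO_ACIDS = "HFSACLKWYRNIVDQEGMPT"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, one entry per letter of `alphabet`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur, e.g. I in 6RUR_1.
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # The last eight residues of 6KOJ_1 (a His-tag preceded by LE).
    print(parikh_vector("LEHHHHHH"))
```

Applied to a full sequence from the page, the function returns one column of the table in row order.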

>7NJH_1|Chain A|Woronin body major protein|Neurospora crassa (strain ATCC 24698 / 74-OR23-1A / CBS 708.71 / DSM 1257 / FGSC 987) (367110)
>6KOJ_1|Chains A, B|Sorting nexin-11|Homo sapiens (9606)
>6RUR_1|Chains A[auth U], C[auth X]|Properdin|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7NJH , Knot 84 176 0.82 40 132 171
MGYYDDDAHGHVEADAAPRATTGTGTGSASQTVTIPCHHIRLGDILILQGRPCQVIRISTSAATGQHRYLGVDLFTKQLHEESSFVSNPAPSVVVQTMLGPVFKQYRVLDMQDGSIVAMTETGDVKQNLPVIDQSSLWNRLQKAFESGRGSVRVLVVSDHGREMAVDMKVVHGSRL
6KOJ , Knot 70 144 0.80 40 106 135
MSENQEQEEVITVRVQDPRVQNEGSWNSYVDYKIFLHTNSKAFTAKTSCVRRRYREFVWLRKQLQRNAGLVPVPELPGKSTFFGTSDEFIEKRRQGLQHFLEKVLQSVVLLSDSQLHLFLQSQLSVPEIEACVQGRLEHHHHHH
6RUR , Knot 100 234 0.77 38 147 218
DPVLCFTQYEESSGKCKGLLGGGVSVEDCCLNTAFAYQKRSGGLCQPCRSPRWSLWSTWAPCSVTCSEGSQLRYRRCVGWNGQCSGKVAPGTLEWQLQACEDQQCCPEMGGWSGWGPWEPCSVTCSKGTRTRRRACNHPAPKCGGHCPGQAQESEACDTQQVCPTHGAWATWGPWTPCSASCHGGPHEPKETRSRKCSAPEPSQKPPGKPCPGLAYEQRRCTGLPPCPENLYFQ
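The columns \(\mathrm{LZ}(w)\) and \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be illustrated with a short sketch. It assumes the Lempel-Ziv 1976 exhaustive-history parsing, in which each new phrase is the shortest prefix of the remaining text that has not occurred earlier (overlaps allowed). The page does not state which parsing variant it uses, so phrase counts from this sketch may differ from the table. The function names are illustrative.

```python
import math


def lz76_complexity(w):
    """Number of phrases in an LZ76-style exhaustive-history parsing of w."""
    i, n, phrases = 0, len(w), 0
    while i < n:
        length = 1
        # Grow the phrase while w[i:i+length] already occurs starting before i.
        # The search window w[0:i+length-1] forces an earlier start position.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length  # the final phrase may be cut short by the end of w
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # parses as 0|001|10|100|1000|101
    for lz, n in [(84, 176), (70, 144), (100, 234)]:
        print(lz, n, normalized_lz(lz, n))
```

With the table's own values of \(\mathrm{LZ}(w)\) and \(n\), the ratios come out near 0.824, 0.806 and 0.778, which suggests that the listed 0.82, 0.80 and 0.77 are truncated rather than rounded.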

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence of the protein with 4-symbol Protein Data Bank code \(c\).
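As a small illustration of these definitions, the sketch below collects the distinct contiguous subwords of a given length and counts them. The function names `subwords` and `subword_count` are illustrative and do not come from the page.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_count(w, k):
    """p_w(k): the number of distinct subwords of length k in w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    w = "LEHHHHHH"  # tail of the 6KOJ_1 sequence
    print(sorted(subwords(w, 2)))  # ['EH', 'HH', 'LE']
    print(subword_count(w, 2))     # 3
```

For a word of length \(n\) there are \(n-k+1\) positions, so \(p_w(k) \le n-k+1\); for example \(p_w(3)=171\) for 7NJH is bounded by \(176-2=174\).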

\(|P_{f(7NJH_1)}(2) \setminus P_{f(6KOJ_1)}(2)|=89\), \(|P_{f(6KOJ_1)}(2) \setminus P_{f(7NJH_1)}(2)|=63\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11000001010101011101001010101000101100010110111101010011010001101000011101100010000011001110111001111110000110100101111000101000111100001100100110010101011110001001110101101001

Pair \(Z_2\) Length of longest common subword
7NJH_1,6KOJ_1 152 3
7NJH_1,6RUR_1 169 4
6KOJ_1,6RUR_1 171 3
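The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two sets of length-\(k\) subwords; for the first pair this is \(89+63=152\). The last column is computed here on the assumption that it measures the longest contiguous common subword, which the small values 3 and 4 suggest. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))  # ^ is set symmetric difference


def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (O(|x||y|) time)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1] and y[:j]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_distance("abab", "abba", 2))
    print(longest_common_subword_length("ILQGRPC", "CVQGRLE"))  # both contain QGR
```

The two fragments in the usage example are taken from 7NJH_1 and 6KOJ_1, which share the subword QGR.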

Newick tree

 
(6RUR_1:87.79,(7NJH_1:76,6KOJ_1:76):11.79);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{320 }{\log_{20} 320}-\frac{144}{\log_{20}144})=53.9\)
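The arithmetic in this estimate can be checked with a few lines. The sketch only evaluates the stated formula; the factors 0.85 and 0.8 are taken from the text as given, and reading 320 as the combined length \(176+144\) of the two query sequences is an assumption. The helper name `lz_scale` is illustrative.

```python
import math


def lz_scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


# 320 is assumed to be 176 + 144, the lengths of 7NJH_1 and 6KOJ_1.
expected = 0.85 * 0.8 * (lz_scale(320) - lz_scale(144))

if __name__ == "__main__":
    print(lz_scale(320), lz_scale(144), expected)
```

The two scale terms are about 166.2 and 86.8, and the product is about 53.98, which the page reports truncated as 53.9.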
Status Protein1 Protein2 d d1/2
Query variables 7NJH_1 6KOJ_1 71 62.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
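The interpolation can be made concrete with the query values listed above. The sketch assumes, as the two cases of the formula imply, that \(d\) is the larger of two one-sided quantities and \(d_1\) is their sum. Under that assumption, \(d=71\) and \(d_1/2=62.5\) give a smaller quantity of \(2(62.5)-71=54\); this value is inferred here and is not listed on the page.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    larger = 71               # d for the pair 7NJH_1, 6KOJ_1
    smaller = 2 * 62.5 - 71   # inferred from d1/2 = 62.5 (assumption: d1 = min + max)
    print(delta(larger, smaller, 0))    # alpha = 0 recovers d
    print(delta(larger, smaller, 0.5))  # alpha = 1/2 recovers d1/2
```

Intermediate values of \(\alpha\) move \(\delta\) linearly between the maximum and the average of the two quantities.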