CoV2D Browser™

Parikh vectors
4CIT_1 8OJN_1 2MEW_1 Letter Amino acid
33 14 7 S Serine
24 21 3 T Threonine
31 12 6 V Valine
3 0 0 C Cysteine
31 9 10 I Isoleucine
29 1 10 K Lysine
18 7 4 R Arginine
31 7 6 D Aspartic acid
22 5 0 F Phenylalanine
25 8 8 L Leucine
13 3 2 M Methionine
29 4 6 P Proline
18 4 1 Y Tyrosine
12 0 1 Q Glutamine
29 3 5 E Glutamic acid
16 2 2 H Histidine
8 2 0 W Tryptophan
46 19 5 A Alanine
19 12 2 N Asparagine
21 19 4 G Glycine
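
The counts in this table can be recomputed from the FASTA sequences. Below is a minimal sketch, assuming the Parikh vector is simply the number of occurrences of each amino-acid letter; the name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "STVCIKRDFLMPYQEHWANG"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count the occurrences of each amino-acid letter in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# 2MEW_1 sequence as listed on this page.
w = "SMGGQKIRIKLKAYDHELLDESAKKIVEVAKSTNSKVSGPIPLPTESRVHKRLIDIIDPSPKTIDALMRINLPAGVDVEIKL"
v = parikh_vector(w)
print(v["K"], v["C"], sum(v.values()))
```

The entries of the vector sum to the sequence length, which gives a quick consistency check against the length column further down the page.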

>4CIT_1|Chain A|VANADIUM-DEPENDENT HALOPEROXIDASE|ZOBELLIA GALACTANIVORANS (63186)
>8OJN_1|Chains A, B, C, D, E, F|Cell wall surface anchor family protein|Bdellovibrio bacteriovorus HD100 (264462)
>2MEW_1|Chain A|30S ribosomal protein S10|Thermotoga sp. (126740)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4CIT , Knot 189 458 0.84 40 251 432
HHHHHHGSMKKILIALISFAFAVSCKAPQKEEPINITPEELDASIDRVTEIMIHDIFSPPVASRIFAYPNVAAYEIVAATNDNYNSLAGQLNGLTAIPEPDTTKTINYELAAVVAHMELSKRLIFSEDRMESLRDSLYMVWEGKNPVLFSDSKAYGLQVADHIGEWMNKDNYAQTRTMPKFTVDADDPGRWQPTPPAYMDGIEPHWNKIRPFVLDSAAQFKPVPPPAYSLEEDSAFYKELKEVYDVRNKITEEGDSSEEIQIARFWDCNPYVSVTRGHLMFATKKITPGAHWMGIAKIAARKTNSDFAKTLFAYTKASVAMADAFISCWDEKYRSNLIRPETVINQHIDDSWKPVLQTPPFPEYTSGHSVVSGAASVVLTEVFGDNFSFDDDTEVPYGLPIRSFKSFKQAADEAAISRMYGGIHYRAAIEVGVKQGRDLGTFVVNKLHMLSDKKVAQN
8OJN , Knot 72 152 0.79 36 101 147
GGTNDAFSLSFETNNTPRMTIDDVGRVGVGTTAPTSALHVIGTGEVARFVTSATGGVVIDSTALNYNPSLIYRKTNINRWSMMVNAASETGGNAGSNLSILRYDDTGATLGAAVTIDRASGFFGINTAAPAYNIHVTGTAGLSTGSAWTVAS
2MEW , Knot 45 82 0.80 34 71 79
SMGGQKIRIKLKAYDHELLDESAKKIVEVAKSTNSKVSGPIPLPTESRVHKRLIDIIDPSPKTIDALMRINLPAGVDVEIKL
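
The normalized ratio in the fourth column can be checked directly from the LZ-complexity and length columns. The sketch below also includes one common way of counting LZ76 phrases (the exhaustive-history parsing of Lempel and Ziv, 1976). Which LZ variant this page uses is not stated, so `lz76_complexity` is an assumption and may not reproduce the listed values exactly; both function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76-style exhaustive-history parsing of `w`."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs in the text preceding
        # its last symbol (overlapping copies are allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_b n) with b = alphabet_size."""
    return lz / (n / math.log(n, alphabet_size))

# Classic example: 0|001|10|100|1000|101 has 6 phrases.
print(lz76_complexity("0001101001000101"))
# Ratio for 4CIT, using the table's LZ(w) = 189 and n = 458.
print(round(normalized_lz(189, 458), 3))
```

With the table's values, `normalized_lz(189, 458)` comes out at roughly 0.844, in line with the 0.84 listed for 4CIT.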

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(4CIT_1)}(2) \setminus P_{f(8OJN_1)}(2)|=170\), \(|P_{f(8OJN_1)}(2) \setminus P_{f(4CIT_1)}(2)|=20\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
00000010100111111011111000110000110101001010100100111001101111001110101110011110000000111010110111010000010001111110101000111000010010001011101001111000010110110011011000001000011010101001101010111010110101001011110011010111111001000011000100100100010001000001011011000101010010111100010111011111011100000011001110001011110111001000000011010011000100010111001111000010011011101110011100101000001101111001001001100111001011100011101110010011011100101100001100

Pair \(Z_2\) Length of longest common subword (contiguous)
4CIT_1,8OJN_1 190 4
4CIT_1,2MEW_1 204 4
8OJN_1,2MEW_1 122 3
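
Both columns of this table can be sketched in a few lines. The code below is an illustration only: `subwords`, `z_distance` and `longest_common_subword` are hypothetical names, and the last column is read here as the longest common contiguous subword. That reading is an assumption, based on values of 3 or 4 being too small for a gapped subsequence of sequences this long.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        # cur[j] = length of the common block ending at a and y[j-1].
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: P_x(2) = {AA, AB}, P_y(2) = {AB, BB}.
print(z_distance("AAB", "ABB", 2), longest_common_subword("AAB", "ABB"))
```

For the first pair, the two set differences reported above (170 and 20) add up to the listed \(Z_2\) of 190.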

Newick tree

 
(4CIT_1:10.22,(8OJN_1:61,2MEW_1:61):47.22);

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{610}{\log_{20} 610}-\frac{152}{\log_{20}152})\approx 132\), where \(610=458+152\) is the combined length of the two query sequences.
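
The arithmetic in this estimate can be checked with a short sketch; the helper name `n_over_log` is illustrative, and the constants 0.85 and 0.8 are taken from the expression above without further interpretation.

```python
import math

def n_over_log(n: int, base: int = 20) -> float:
    """n / log_base(n), the normalizer used for LZ-complexity on this page."""
    return n / math.log(n, base)

expected = 0.85 * 0.8 * (n_over_log(610) - n_over_log(152))
print(round(expected))  # 132
```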
Status Protein1 Protein2 d d1/2
Query variables 4CIT_1 8OJN_1 169 110.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
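
A minimal sketch of how \(d\), \(d_1\) and \(\delta\) relate. It assumes the Otu--Sayood definitions \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1=(c(xy)-c(x))+(c(yx)-c(y))\), where \(xy\) is concatenation and \(c\) is an LZ-complexity count. The page does not spell these out, so the definitions, the LZ76-style helper `c`, and all function names here are assumptions.

```python
def c(w: str) -> int:
    """LZ76-style phrase count (assumed variant; see lead-in)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def terms(x: str, y: str) -> tuple[int, int]:
    """The two one-sided quantities c(xy) - c(x) and c(yx) - c(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    a, b = terms(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(x: str, y: str) -> float:
    return delta(x, y, 0.0)          # alpha = 0 gives the max

def d1(x: str, y: str) -> int:
    return sum(terms(x, y))          # equals 2 * delta(x, y, 0.5)

x, y = "ACGTACGTTGCA", "AAAAAA"
print(d(x, y), d1(x, y) / 2, delta(x, y, 0.5))
```

Under these assumed definitions, the query row \(d=169\), \(d_1/2=110.5\) would correspond to one-sided terms 169 and 52, since \(169+52=221=2\cdot 110.5\).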