CoV2D Browser™

Parikh vectors
3ZIK_1 3CQV_1 5HCH_1 Letter Amino acid
15 13 5 F Phenylalanine
18 8 2 R Arginine
17 10 5 D Aspartic acid
8 8 0 H Histidine
18 6 3 I Isoleucine
22 16 17 V Valine
12 7 11 G Glycine
60 27 8 L Leucine
19 13 1 K Lysine
11 8 3 P Proline
27 20 11 S Serine
13 2 1 Y Tyrosine
31 13 10 A Alanine
20 9 14 N Asparagine
13 4 7 Q Glutamine
8 8 0 M Methionine
7 1 0 C Cysteine
27 17 3 E Glutamic acid
27 8 12 T Threonine
5 1 1 W Tryptophan
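
A Parikh vector records how many times each letter occurs in a word. The following is a minimal Python sketch of that count, assuming the letter order used in the table above; the name `parikh_vector` is hypothetical and not taken from the CoV2D code.

```python
from collections import Counter

# Letter order copied from the table above.
ALPHABET = "FRDHIVGLKPSYANQMCETW"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count the occurrences of each amino-acid letter in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in ALPHABET}

# Last nine residues of the 5HCH_1 sequence listed further down the page.
tail = parikh_vector("VVVINWPLG")
print(tail["V"], tail["W"], tail["H"])
```

Applied to a full sequence, the values of this dictionary would form one column of the table.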

>3ZIK_1|Chains A, B|WPL1|EREMOTHECIUM GOSSYPII (33169)
>3CQV_1|Chain A|Nuclear receptor subfamily 1 group D member 2|Homo sapiens (9606)
>5HCH_1|Chain A|Fucose-binding lectin|Pseudomonas aeruginosa (287)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3ZIK , Knot 157 378 0.82 40 203 354
DEALSRHFNELRTMGETLKYSEDLDFILSDNSMTTPEHRRNNMLRLCLDMMNNEDLCQYIVKYRHREVWEWCFQGTDPKQKVTSLLQCFIADKIPLLRHDKRWAMLSLENFILPLATDEVFPKKIAGSRLVKLNYQDLLRKLKFTNTCEYALYIWATYLLYTEAVYGAVPALARLISRGQLKDWDTACSLLENNIVAAPSGSDIEEYAQAFQTLAGLSREKLTNEGVLKCLIKLTNHTTVLELSADLLPSLVRSLAMSVQLHQNNIVSSISEIKTNLLILQLGLLLNIVSEATTAASTEELTNFGAVFRSVFVKKPTEMSFVLQLFLLVYAYSAGAAGVQLPPAEADFLKSELEAFATDVSSYNHNIHTRITRVLETL
3CQV , Knot 93 199 0.82 40 138 192
HLVCPMSKSPYVDPHKSGHEIWEEFSMSFTPAVKEVVEFAKRIPGFRDLSQHDQVNLLKAGTFEVLMVRFASLFDAKERTVTFLSGKKYSVDDLHSMGAGDLLNSMFEFSEKLNALQLSDEEMSLFTAVVLVSADRSGIENVNSVEALQETLIRALRTLIMKNHPNEASIFTKLLLKLPDLRSLNNMHSEELLAFKVHP
5HCH , Knot 53 114 0.73 34 79 109
ATQGVFTLPANTRFGVTAFANSSGTQTVNVLVNNETAATFSGQSTNNAVIGTQVLNSGSSGKVQVQVSVNGRPSDLVSAQVILTNELNFALVGSEDGTDNDYNDAVVVINWPLG
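
The page does not say which Lempel-Ziv variant \(\mathrm{LZ}(w)\) denotes. The sketch below assumes the 1976 exhaustive parsing (the text further down mentions "LZ76") and shows how the normalized ratio in the fourth column can be recomputed from the tabulated values. The function names are hypothetical.

```python
import math

def lz76(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it still occurs starting at an earlier
        # position (overlap with the phrase itself is allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_alphabet_size(n)), the ratio in the fourth column."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0|001|10|100|1000|101.
print(lz76("0001101001000101"))

# (LZ(w), n) pairs copied from the table above.
for lz, n in [(157, 378), (93, 199), (53, 114)]:
    print(math.floor(100 * normalized(lz, n)) / 100)
```

Truncated to two decimals, the three ratios come out as 0.82, 0.82 and 0.73, which agrees with the table. Whether `lz76` reproduces the tabulated complexities depends on the variant the page actually uses.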

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
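
As an illustration of these definitions, here is a short Python sketch that builds \(P_w(n)\) as a set of slices and takes its size. The helper names are hypothetical, and the sketch assumes subwords are contiguous.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

w = "ABAAB"
print([subword_complexity(w, n) for n in (1, 2, 3)])  # [2, 3, 3]
```

For a word of length \(m\) there are at most \(m-n+1\) subwords of length \(n\), so \(p_w(n)\le m-n+1\).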

\(|P_{f(3ZIK_1)}(2) \setminus P_{f(3CQV_1)}(2)|=124\), \(|P_{f(3CQV_1)}(2) \setminus P_{f(3ZIK_1)}(2)|=59\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
001100010010011001000001011100001001000000110101011000010001100000011010101001000100110011100111100000111101001111110001110011100110100001100101000000110111001100011011111110110010100100100110001111101001000101100111100001000111001101000001101010111011001110101000011001001000111101111101100100110000100111110011100100101110111110100111111011110101100010111001000000100010011001
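
The page does not list which residues count as hydrophobic. The sketch below assumes the set A, F, G, I, L, M, P, V, W, which was inferred by comparing the start of the 3ZIK_1 sequence with the binary string above. It is a guess at the encoding, not the page's documented rule.

```python
# Assumption: these residues map to 1 (hydrophobic); all others map to 0 (polar).
# The set is inferred from the data on this page, not stated by it.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Return the hydrophobic-polar (1/0) string for an amino-acid sequence."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of the 3ZIK_1 sequence.
print(hp_encode("DEALSRHFNELRTMGETLKY"))  # 00110001001001100100
```

The printed prefix agrees with the first 20 digits of the binary string above.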
Pair \(Z_2\) Length of longest common subword
3ZIK_1,3CQV_1 183 3
3ZIK_1,5HCH_1 174 4
3CQV_1,5HCH_1 141 4
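
Both columns of this table can be sketched with plain set operations in Python. \(Z_k\) is the size of the symmetric difference of the two subword sets. The third column holds small values (3 or 4), which is what one would expect for a longest common contiguous subword of unrelated proteins, so the sketch assumes that reading. Function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous word occurring in both x and y."""
    k = 0
    # If a common subword of length k + 1 exists, so does one of length k,
    # so the first length with an empty intersection ends the search.
    while k < min(len(x), len(y)) and subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

print(z("ABCAB", "CABD", 2), longest_common_subword("ABCAB", "CABD"))  # 2 3
```

For the first row, the two one-sided differences given earlier, 124 and 59, add up to the listed \(Z_2=183\).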

Newick tree

 
[
	3ZIK_1:94.71,
	[
		5HCH_1:70.5,3CQV_1:70.5
	]:24.21
]
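
The tree above is printed with square brackets, whereas standard Newick syntax uses parentheses and a terminating semicolon. The following Python sketch converts one form to the other, under the assumption that the brackets are only a display convention of this page.

```python
import re

# The bracketed tree as shown on the page.
bracket_tree = """
[
    3ZIK_1:94.71,
    [
        5HCH_1:70.5,3CQV_1:70.5
    ]:24.21
]
"""

def to_newick(text: str) -> str:
    """Drop whitespace, map [ ] to ( ), and append the closing semicolon."""
    compact = re.sub(r"\s+", "", text)
    return compact.translate(str.maketrans("[]", "()")) + ";"

print(to_newick(bracket_tree))  # (3ZIK_1:94.71,(5HCH_1:70.5,3CQV_1:70.5):24.21);
```

The branch lengths are consistent with an ultrametric tree: 70.5 + 24.21 = 94.71, so every leaf is at the same distance from the root.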

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{577 }{\log_{20} 577}-\frac{199}{\log_{20}199})\approx 108.\)
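
The arithmetic in this estimate can be checked with a few lines of Python. The sketch assumes that 577 is the combined length 378 + 199 of the two query sequences, and takes the factors 0.85 and 0.8 from the formula as given, since the page does not explain them.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_alphabet_size(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet_size)

n_x, n_y = 378, 199  # lengths of 3ZIK_1 and 3CQV_1
expected = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_y))
print(round(expected))  # 108
```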
Status Protein1 Protein2 d d1/2
Query variables 3ZIK_1 3CQV_1 136 103.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d, &\alpha=0,\\ d_1/2, &\alpha=1/2. \end{cases} \]
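
A small Python sketch makes the interpolation concrete. It assumes that "min" and "max" are the smaller and larger of two one-directional quantities being combined; the page does not spell them out. Under that reading, the tabulated \(d=136\) and \(d_1/2=103.5\) for the pair 3ZIK_1, 3CQV_1 determine both quantities.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Table values for the pair 3ZIK_1, 3CQV_1.
d, half_d1 = 136, 103.5
larger = d                   # delta at alpha = 0 is the max
smaller = 2 * half_d1 - d    # because d_1/2 = (min + max) / 2

print(smaller, delta(0, smaller, larger), delta(0.5, smaller, larger))
# 71.0 136.0 103.5
```

Values of \(\alpha\) between 0 and 1/2 interpolate linearly between \(d\) and \(d_1/2\).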