CoV2D Browser™

Parikh vectors
9JXX_1 1NPY_1 9HKX_1 Letter Amino acid
26 18 41 G Glycine
11 6 12 H Histidine
11 8 16 P Proline
21 33 46 A Alanine
27 9 17 R Arginine
26 16 21 N Asparagine
18 14 27 T Threonine
2 0 7 W Tryptophan
2 3 8 C Cysteine
10 11 18 Q Glutamine
23 13 19 S Serine
22 13 12 Y Tyrosine
24 19 35 V Valine
23 11 30 D Aspartic acid
6 8 15 M Methionine
13 16 11 F Phenylalanine
32 19 34 K Lysine
40 14 31 E Glutamic acid
45 21 31 I Isoleucine
47 19 41 L Leucine
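
Each column of the table above is a Parikh vector: the number of occurrences of each letter in one sequence. A minimal sketch of that count follows; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code, and the letter order simply copies the row order of the table.

```python
from collections import Counter

# Row order of the table above (an illustrative choice, not a standard order).
AMINO_ACIDS = "GHPARNTWCQSYVDMFKEIL"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the letter counts of seq, one entry per letter of alphabet."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# First ten residues of 9JXX_1.
print(parikh_vector("MLILATLGSD"))
# [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 3]
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check on any column.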

>9JXX_1|Chains A, B, D[auth C], E[auth D]|PIN domain-containing protein|Saccharolobus islandicus REY15A (930945)
>1NPY_1|Chains A, B, C, D|Hypothetical shikimate 5-dehydrogenase-like protein HI0607|Haemophilus influenzae (727)
>9HKX_1|Chains A, B, C, D|Adenosylhomocysteinase|Pseudomonas aeruginosa PAO1 (208964)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9JXX , Knot 171 429 0.80 40 206 400
MLILATLGSDKSVTTINAILTEIFTGLNPNKIIIFREDPQKKDIKGMEKALEYLGVNTLIEEKVIGEGIKLWREKIRNEEIDIFDITPGRKYMALSATYYSRAEEIRYVYLKDEREGYNIFGYVPFEQLKVINVRIGDEIPYDPPLTQNVNEAESLLDVDSLRAFINILGLHGKVEINGIDLENPDQVEEICLFRSGKYKYEEEKDIIKEAERGSLFLADTNVYIRLGNRLRSLVYNRKYGFRLLSSKNTFNELYNHTAQDTQKIDENKVKFILGMLSYRSLHVPPITSQVRSSGDMGLINEALEIKKNVEDNVVLITADKALGLTAQSKGLRTIILSKVRKEIGEWDIGELLFCLSFYNDYRNGIRRMIEISLNGSKIAELHSYYHLQERRVKVRVVDKRYNYPKILEILSEILATADQAAAHHHHHH
1NPY , Knot 122 271 0.84 38 174 262
MINKDTQLCMSLSGRPSNFGTTFHNYLYDKLGLNFIYKAFTTQDIEHAIKGVRALGIRGCAVSMPFKETCMPFLDEIHPSAQAIESVNTIVNDNGFLRAYNTDYIAIVKLIEKYHLNKNAKVIVHGSGGMAKAVVAAFKNSGFEKLKIYARNVKTGQYLAALYGYAYINSLENQQADILVNVTSIGMKGGKEEMDLAFPKAFIDNASVAFDVVAMPVETPFIRYAQARGKQTISGAAVIVLQAVEQFELYTHQRPSDELIAEAAAFARTKF
9HKX , Knot 195 472 0.84 40 247 448
SNAMSAVMTPAGFTDYKVADITLAAWGRRELIIAESEMPALMGLRRKYAGQQPLKGAKILGCIHMTIQTGVLIETLVALGAEVRWSSCNIFSTQDQAAAAIAAAGIPVFAWKGETEEEYEWCIEQTILKDGQPWDANMVLDDGGDLTEILHKKYPQMLERIHGITEETTTGVHRLLDMLKNGTLKVPAINVNDSVTKSKNDNKYGCRHSLNDAIKRGTDHLLSGKQALVIGYGDVGKGSSQSLRQEGMIVKVAEVDPICAMQACMDGFEVVSPYKNGINDGTEASIDAALLGKIDLIVTTTGNVNVCDANMLKALKKRAVVCNIGHFDNEIDTAFMRKNWAWEEVKPQVHKIHRTGKDGFDAHNDDYLILLAEGRLVNLGNATGHPSRIMDGSFANQVLAQIHLFEQKYADLPAAEKAKRLSVEVLPKKLDEEVALEMVKGFGGVVTQLTPKQAEYIGVSVEGPFKPDTYRY
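
The fourth column of the table above divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below assumes that \(\mathrm{LZ}\) is the Lempel-Ziv (1976) exhaustive-history complexity; the page does not state its parsing variant, so `lz76` may not reproduce the tabulated counts. Both function names are illustrative.

```python
import math

def lz76(s):
    """Number of components in the LZ76 exhaustive-history parsing of s.

    Assumed variant: each component is the shortest prefix of the remaining
    text that does not occur earlier (overlap with the component allowed).
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the candidate already occurs in the text before its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n) with b = alphabet_size, as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Classic example from the literature: 0.001.10.100.1000.101 has 6 components.
print(lz76("0001101001000101"))
# Using the tabulated values for 1NPY (LZ = 122, n = 271):
print(normalized_lz(122, 271))
```

With the tabulated values for 1NPY the ratio comes out at about 0.84, in agreement with the table.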

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
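
As a minimal sketch of these definitions, the set of distinct length-\(k\) subwords and its cardinality can be computed with a sliding window. The function names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(subwords("abab", 2))            # the two subwords 'ab' and 'ba'
print(subword_complexity("abab", 2))  # 2
```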

\(|P_{f(9JXX_1)}(2) \setminus P_{f(1NPY_1)}(2)|=89\), \(|P_{f(1NPY_1)}(2) \setminus P_{f(9JXX_1)}(2)|=57\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), i.e. the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For the pair above, \(Z_2=89+57=146\).

Hydrophobic-polar version of Sequence 1:
111110110000100101110011011010011110001000010110011001110011000111011011000100001011010110001110100000100100101000001001110111001011010110011001110001001001101001011101111010101011010010010010110010000000001100100101111000101011001001100000110110000010010000100000100001011111100001011110001000101111001101000100011110100111101000110011100100011010110111010100000011001101010100110100000100001010110000001011011001110100111000000
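
The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not list its classification, so the set `HYDROPHOBIC` below is an assumption: it agrees with the opening residues of Sequence 1, but the treatment of C (taken as hydrophobic) and H (taken as polar) in particular has not been checked against the page.

```python
# Assumed hydrophobic residues; every other letter is treated as polar.
HYDROPHOBIC = set("ACFGILMPVW")

def hydrophobic_polar(seq, hydrophobic=HYDROPHOBIC):
    """Map a protein sequence to a binary string: 1 = hydrophobic, 0 = polar."""
    return "".join("1" if residue in hydrophobic else "0" for residue in seq)

# First twenty residues of 9JXX_1.
print(hydrophobic_polar("MLILATLGSDKSVTTINAIL"))
# 11111011000010010111
```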
Pair \(Z_2\) Length of longest common subword (contiguous)
9JXX_1,1NPY_1 146 3
9JXX_1,9HKX_1 139 5
1NPY_1,9HKX_1 155 3
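
Both columns of the table above can be sketched in a few lines. The code assumes that the last column counts a contiguous common block, which is a reading suggested by the small values (3 to 5) rather than something the page states. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at x[i-1], y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("abab", "abba", 2))                       # 1 (only 'bb' is unshared)
print(longest_common_subword("MLILAT", "GILAS"))  # 3 (the block 'ILA')
```

The dynamic program takes time proportional to the product of the two lengths, which is small for sequences of a few hundred residues.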

Newick tree

 
(1NPY_1:77.11,(9JXX_1:69.5,9HKX_1:69.5):7.61);

Let d denote the Otu–Sayood distance of the same name.
Let d1 denote the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{700 }{\log_{20} 700}-\frac{271}{\log_{20}271})\approx 119.\)
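
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the formula as given; the helper name `lz_scale` is illustrative, and reading 700 as the combined length 429 + 271 of the two query sequences is an assumption.

```python
import math

def lz_scale(n, base=20):
    """n / log_base(n), the growth scale used to normalize LZ-complexity."""
    return n / math.log(n, base)

# Factors 0.85 and 0.8 are copied from the formula in the text.
expected = 0.85 * 0.8 * (lz_scale(700) - lz_scale(271))
print(round(expected))  # 119
```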
Status Protein1 Protein2 d d1/2
Query variables 9JXX_1 1NPY_1 149 121
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
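
The interpolation can be sketched as follows. This assumes the usual definitions from Otu and Sayood, where \(d\) is the maximum and \(d_1\) the sum of the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and it assumes an LZ76 exhaustive-history parsing. The page's own implementation may differ, and all function names are illustrative.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive-history parsing (assumed variant)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the candidate already occurs in the text before its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y, c=lz76):
    """The two complexity increments LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "MLILATLGSDKSV", "MINKDTQLCMSLS"  # short prefixes of 9JXX_1 and 1NPY_1
a, b = increments(x, y)
print(delta(x, y, 0) == max(a, b))        # alpha = 0 gives d
print(delta(x, y, 0.5) == (a + b) / 2)    # alpha = 1/2 gives d1 / 2
```

Setting \(\alpha=0\) selects the larger increment, and \(\alpha=1/2\) averages the two, which is why the table reports d1/2 rather than d1.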