CoV2D Browser™

Parikh vectors
1QKP_1 9MQF_1 1SIQ_1 Letter Amino acid
3 13 16 N Asparagine
0 4 9 C Cysteine
9 27 27 E Glutamic acid
11 9 13 Y Tyrosine
18 22 19 T Threonine
21 21 18 V Valine
4 12 16 Q Glutamine
25 30 38 G Glycine
0 13 9 H Histidine
7 16 14 K Lysine
29 32 33 A Alanine
15 7 22 I Isoleucine
11 21 14 P Proline
13 11 11 F Phenylalanine
13 21 26 S Serine
8 6 4 W Tryptophan
7 24 26 R Arginine
9 20 17 D Aspartic acid
36 30 46 L Leucine
9 12 14 M Methionine
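A Parikh vector counts how often each letter occurs. As a result, each column above sums to the sequence length: 248 for 1QKP_1, 351 for 9MQF_1 and 392 for 1SIQ_1. The following is a minimal Python sketch. The function name `parikh_vector` is our own, used only for illustration, and the letters follow the table's row order.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes, in the row order of the table above.
AMINO_ACIDS = "NCEYTVQGHKAIPFSWRDLM"


def parikh_vector(seq: str) -> list[int]:
    """Return the Parikh vector of seq: the count of each amino-acid letter."""
    counts = Counter(seq)
    return [counts[a] for a in AMINO_ACIDS]


v = parikh_vector("QAQITGRPEW")
print(dict(zip(AMINO_ACIDS, v)))
print(sum(v))  # equals len(seq) when seq uses only the 20 standard letters
```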

>1QKP_1|Chain A|BACTERIORHODOPSIN|HALOBACTERIUM SALINARIUM (2242)
>9MQF_1|Chain A|Acyl-[acyl-carrier-protein] hydrolase|Chlamydomonas reinhardtii (3055)
>1SIQ_1|Chain A|Glutaryl-CoA dehydrogenase|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1QKP , Knot 113 248 0.83 36 153 239
QAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPFGGEQNPIYWARYADWLFTTPLLLLDLALLVDADQGTILALVGADGIMIGTGLVGALTKVYSYRFVWWAISTAAMLYILYVLFFGFTSKAESMRPEVASTFKVLRNVTVVLWSAYPVVWLIGSEGAGIVPLNIETLLFMVLDVSAKVGFGLILLRSRAIFGEAEAPEPSAGDGAAATS
9MQF , Knot 148 351 0.82 40 197 332
MVPPDKRSFREEHRIRGYEVSPDQRATIVTVANLLQEVAGNHAVGMWGRTDEGFASLPSMKDLLFVMTRLQVRMYEYPKWGDVVAVETYFTEEGRLAFRREWKLMDVATGKLLGAGTSTWVTINTATRRLSKLPEDVRKRFLRFAPPSSVHILPPEETKKKLQDMELPGQVQSAQQVARRADMDMNGHINNVTYLAWTLESLPERVMSGGYKMQEIELDFKAECTAGNAIEAHCNPLDDHSASFVGPAPANGNGNGHAAEPAADSAPLYFLSMLQKCDENGCTELVRARTTWSRTLEGAKPAPPPLSELSAAQGTGENLYFQGSGGGGSDYKDDDDKGTGRSRLEHHHHHH
1SIQ , Knot 166 392 0.84 40 236 374
EFDWQDPLVLEEQLTTDEILIRDTFRTYCQERLMPRILLANRNEVFHREIISEMGELGVLGPTIKGYGCAGVSSVAYGLLARELERVDSGYRSAMSVQSSLVMHPIYAYGSEEQRQKYLPQLAKGELLGCFGLTEPNSGSDPSSMETRAHYNSSNKSYTLNGTKTWITNSPMADLFVVWARCEDGCIRGFLLEKGMRGLSAPRIQGKFSLRASATGMIIMDGVEVPEENVLPGASSLGGPFGCLNNARYGIAWGVLGASEFCLHTARQYALDRMQFGVPLARNQLIQKKLADMLTEITLGLHACLQLGRLKDQDKAAPEMVSLLKRNNCGKALDIARQARDMLGGNGISDEYHVIRHAMNLEAVNTYEGTHDIHALILGRAITGIQAFTASK
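The LZ column can be sketched in Python as follows. This sketch assumes that LZ(w) is the number of phrases in the Lempel–Ziv (1976) exhaustive history of w. The names `lz76` and `lz_ratio` are illustrative. We have not checked that `lz76` reproduces the site's values of 113, 148 and 166.

The normalized column does appear to be LZ(w)/(n/log_20 n), truncated to two decimals. For 1QKP, 113/(248/log_20 248) ≈ 0.8386, which is listed as 0.83.

```python
import math


def lz76(w: str) -> int:
    """Count the phrases in the LZ76 exhaustive history of w.

    Each new phrase is the shortest prefix of the unparsed suffix that cannot
    be copied from earlier text. The copy source may overlap the phrase
    itself, and the final phrase may be a pure copy.
    """
    n = len(w)
    i = 0
    phrases = 0
    while i < n:
        length = 1
        # Grow the phrase while w[i:i+length] already occurs in w[:i+length-1].
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def lz_ratio(c: int, n: int) -> float:
    """Normalize an LZ complexity c of a length-n word by n / log_20 n."""
    return c / (n / math.log(n, 20))
```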

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be its cardinality. Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
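For example, \(P_{ABAB}(2)=\{AB, BA\}\), so \(p_{ABAB}(2)=2\). In general, \(p_w(n)\le |w|-n+1\). The Python sketch below implements these definitions directly. The function names are illustrative only.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def complexity(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|, the subword complexity of w at length n."""
    return len(subwords(w, n))
```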

\(|P_{f(1QKP_1)}(2) \setminus P_{f(9MQF_1)}(2)|=62\) and \(|P_{f(9MQF_1)}(2) \setminus P_{f(1QKP_1)}(2)|=106\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(1QKP_1),f(9MQF_1))=62+106=168\).

Hydrophobic-polar version of Sequence 1 (1QKP_1):
01010101011111110111111010111011110010100101100111111101010111101101111110001101100101110011111011111010010111111101111101111110010000111111001111011011111100010010101100101100101111010111111100111111101001111110101011111111000111101011010110111100
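The hydrophobic-polar string above matches a simple coding at the first 80 positions we checked: A, F, G, I, L, M, P, V and W become 1, and every other letter becomes 0. The sketch below assumes that partition. We inferred it from this one string; the source does not state it. The classes of C and H are guesses, since neither letter occurs in 1QKP_1.

```python
# ASSUMPTION: hydrophobic set inferred from the 1QKP_1 string, not stated in the source.
# C and H do not occur in 1QKP_1, so their classification here is a guess.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(seq: str) -> str:
    """Map each residue to '1' (assumed hydrophobic) or '0' (assumed polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


print(hp_encode("QAQITGRPEW"))
```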
Pair \(Z_2\) Length of longest common substring
1QKP_1,9MQF_1 168 4
1QKP_1,1SIQ_1 179 3
9MQF_1,1SIQ_1 165 3
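Both columns of this table can be computed directly, as in the Python sketch below, which uses illustrative names:

- `z_k` computes the symmetric-difference size \(Z_k\) defined above.
- `longest_common_substring` is the standard dynamic program for the longest shared contiguous subword. We assume this is what the last column reports.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|) time)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```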

Newick tree

(
	1QKP_1:88.17,
	(
		9MQF_1:82.5,1SIQ_1:82.5
	):5.67
);

Let \(d\) and \(d_1\) be the Otu–Sayood distances of those names. (The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{599}{\log_{20} 599}-\frac{248}{\log_{20} 248}\right)\approx 99.1\), where \(599=248+351\) is the combined length of 1QKP_1 and 9MQF_1.
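The arithmetic can be checked in a few lines of Python. The factors 0.85 and 0.8 are copied from the formula as given. `lz_scale` is an illustrative name for \(n/\log_{20} n\).

```python
import math


def lz_scale(n: int) -> float:
    """n / log_20 n, the normalizing scale used for LZ complexity."""
    return n / math.log(n, 20)


# Factors 0.85 and 0.8 are taken verbatim from the formula; 599 = 248 + 351.
expected = 0.85 * 0.8 * (lz_scale(248 + 351) - lz_scale(248))
print(f"{expected:.2f}")
```

This gives about 99.17, which the text above reports as 99.1.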
Status Protein1 Protein2 d d1/2
Query variables 1QKP_1 9MQF_1 127 106
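The text does not spell out \(d\) and \(d_1\). The Python sketch below assumes Otu and Sayood's definitions as we recall them, with \(c\) the LZ76 complexity:

- \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\)
- \(d_1(S,Q)=[c(SQ)-c(S)]+[c(QS)-c(Q)]\)

These definitions agree with the \(\delta\) formula below, whose \(\alpha=1/2\) case is \(d_1/2\). `lz76` and `otu_sayood` are illustrative names. This LZ76 parser may differ from the site's implementation, so it is not guaranteed to reproduce 127 and 106.

```python
def lz76(w: str) -> int:
    """Count the phrases in the LZ76 exhaustive history of w (overlapping copies allowed)."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) under the ASSUMED definitions d = max and d1 = sum of the increments."""
    gain_sq = lz76(s + q) - lz76(s)  # extra phrases needed for q after s
    gain_qs = lz76(q + s) - lz76(q)  # extra phrases needed for s after q
    return max(gain_sq, gain_qs), gain_sq + gain_qs
```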

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
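The following Python sketch computes \(\delta\). It assumes that min and max range over the two one-sided increments \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\). Under that assumption, the table's \(d=127\) and \(d_1/2=106\) correspond to increments of 127 and \(2\cdot 106-127=85\). The function name `delta` is illustrative.

```python
def delta(gain_sq: float, gain_qs: float, alpha: float) -> float:
    """delta = alpha * min + (1 - alpha) * max of the two one-sided increments."""
    lo, hi = min(gain_sq, gain_qs), max(gain_sq, gain_qs)
    return alpha * lo + (1 - alpha) * hi


print(delta(85, 127, 0.0))  # alpha = 0 recovers d
print(delta(85, 127, 0.5))  # alpha = 1/2 recovers d_1 / 2
```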