CoV2D Browser™

Parikh vectors
6QSI_1 2BSM_1 5NQZ_1 Letter Amino acid
25 16 19 D Aspartic acid
6 0 0 C Cysteine
23 27 15 E Glutamic acid
54 18 24 L Leucine
20 19 16 T Threonine
23 11 15 Q Glutamine
50 15 29 G Glycine
9 17 24 K Lysine
47 6 4 P Proline
25 16 17 S Serine
14 7 8 Y Tyrosine
29 8 9 R Arginine
12 8 10 N Asparagine
12 4 13 H Histidine
18 10 9 F Phenylalanine
46 11 13 V Valine
76 15 28 A Alanine
22 20 14 I Isoleucine
9 6 2 M Methionine
8 1 0 W Tryptophan
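
A Parikh vector simply counts how often each letter occurs in a word. As a minimal sketch (the function name `parikh_vector` and the constant `ALPHABET` are illustrative, not taken from the CoV2D code), the columns of the table above could be computed like this, with the letters listed in the same order as the table rows:

```python
from collections import Counter

# Amino-acid letters in the row order used by the table (an assumption made
# only so that the output lines up with the table).
ALPHABET = "DCELTQGKPSYRNHFVAIMW"

def parikh_vector(w: str, alphabet: str = ALPHABET) -> dict:
    """Return the number of occurrences of each alphabet letter in w."""
    counts = Counter(w)
    # Counter returns 0 for letters that do not occur in w.
    return {letter: counts[letter] for letter in alphabet}

# First ten residues of the 6QSI_1 sequence shown on this page.
vector = parikh_vector("MKTVHSASYD")
print(vector["S"], vector["C"])
```

Applied to a full FASTA sequence, the same call should reproduce one column of the table.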

>6QSI_1|Chains A, B|Benzoylformate decarboxylase|Pseudomonas protegens Pf-5 (220664)
>2BSM_1|Chain A|HEAT SHOCK PROTEIN HSP90-ALPHA|HOMO SAPIENS (9606)
>5NQZ_1|Chains A, B|Factor H binding protein,Major outer membrane protein P.IA,Factor H binding protein| Neisseria meningitidis MC58 (122586)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6QSI , Knot 211 528 0.83 40 236 473
MKTVHSASYDILRQQGLTTVFGNPGSNELPFLKGFPEDFRYILGLHEGAVVGMADGFALASGQPAFVNLHAAAGTGNGMGALTNAWYSHSPLVITAGQQVRSMIGVEAMLANVDAPQLPKPLVKWSAEPACAEDVPRALSQAIHMANQAPKGPVYLSIPYDDWARPAPAGVEHLARRQVATAGLPSAAQLRSLVQRLAAARNPVLVLGPDVDGSRSNHLAVQLAEKLRMPAWVAPSASRCPFPTRHPSFRGVLPAAIAGISRCLADHDLILVVGAPVFRYHQFAPGDYLPAGTELLHITCDPGEAARAPMGDALVGDIVETLQALVWALPDCDRPQPQALPPAAPVEELGGLLRPETVFDVIDELAPKDAIYVKESTSTVGAFWQRVEMREPGSYYFPAAGGLGFGLPAAVGVQLARPERRVIGVIGDGSANYGITALWTAAQYQIPVVFIILKNGTYGALRWFAGVLQVSDAPGLDVPGLDFCAIGRGYGVHSVQANTREAFAQALSEALAGDRPVLIEVPTLTIEP
2BSM , Knot 104 235 0.80 38 156 226
PEETQTQDQPMEEEEVETFAFQAEIAQLMSLIINTFYSNKEIFLRELISNSSDALDKIRYESLTDPSKLDSGKELHINLIPNKQDRTLTIVDTGIGMTKADLINNLGTIAKSGTKAFMEALQAGADISMIGQFGVGFYSAYLVAEKVTVITKHNDDEQYAWESSAGGSFTVRTDTGEPMGRGTKVILHLKEDQTEYLEERRIKEIVKKHSQFIGYPITLFVEKERDKEVSDDEAE
5NQZ , Knot 117 269 0.81 36 156 247
MVAADIGAGLADALTAPLDHKDKGLQSLTLDQSVRKNEKLKLAAQGAEKTYGNGDSLNTGKLKNDKVSRFDFIRQIEVDGQLITLESGEFQVYKQSHSALTAFQTEQIQDSEHSGKMVAKRQFRIGDIAGEHTSFDKLPEGGRATYRGTAFGSDDAGGKLTYTIDFAAKQGNGKIEHLKSPELNVDLAAADIKPDGKRHAVISGSVLYNQAEKGSYSLGIFGGKAQEVAGSAEVKTVYYTKDTNNNLTLVGIRHIGLAAKQLEHHHHHH
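
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below assumes that \(\mathrm{LZ}(w)\) is the number of components in the Lempel-Ziv (1976) exhaustive-history parsing; the page does not state which variant it uses, so `lz76_complexity` is an illustration rather than the page's own routine. The function names are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from a start position
        # before i (the copy is allowed to overlap the component itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Textbook example: 0 | 001 | 10 | 100 | 1000 | 101 has six components.
print(lz76_complexity("0001101001000101"))
# Values from the 6QSI row; the result is close to the 0.83 listed there.
print(normalized_lz(211, 528))
```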

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
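
As a small sketch of these definitions (the names `subwords` and `subword_complexity` are illustrative), the set \(P_w(k)\) and its cardinality \(p_w(k)\) for subwords of length \(k\) can be computed with a sliding window:

```python
def subwords(w: str, k: int) -> set:
    """P_w(k): the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy word: the length-2 subwords of ABAB are AB and BA.
print(sorted(subwords("ABAB", 2)), subword_complexity("ABAB", 2))
```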

\(|P_{f(6QSI_1)}(2) \setminus P_{f(2BSM_1)}(2)|=134\), \(|P_{f(2BSM_1)}(2) \setminus P_{f(6QSI_1)}(2)|=54\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100100100011000110011101100011110111001001111001111111011111010111101011110101111100110000111101100100111101111010110110111010101101001101100110110011011101011000110111111001100011011110110100110011110011111110101000001110110010111111101000111000101011111111110001100011111111110000111100111100110100011011011110111101100101111111000010101111111100111110100110110011100110100000011111001010011000111111111111111110110100011111101010011011101100011111111001001110111111010011110111101011101011001010000111011001111001111011010101
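
The page does not say which residues it treats as hydrophobic. The sketch below assumes the set A, F, G, I, L, M, P, V, W, which was inferred by comparing the start of the 6QSI sequence with the start of the binary string above; both the set and the name `hydrophobic_polar` are assumptions, not documented behaviour.

```python
# Assumed hydrophobic residues, inferred from the displayed binary string.
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(w: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in w)

# First twenty residues of the 6QSI_1 sequence.
print(hydrophobic_polar("MKTVHSASYDILRQQGLTTV"))
```

The output for this prefix agrees with the first twenty digits of the string shown on the page.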
Pair \(Z_2\) Length of longest common subsequence
6QSI_1,2BSM_1 188 4
6QSI_1,5NQZ_1 162 4
2BSM_1,5NQZ_1 136 3
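
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets, so the first row of the table is \(134+54=188\). A minimal sketch follows; the helper names are illustrative, and the division by the size of the union is the usual Jaccard normalization, which the page does not itself display.

```python
def subwords(w: str, k: int) -> set:
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by |P_x(k) union P_y(k)| (standard Jaccard distance)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

# Toy words: P_x(2) = {AB, BA}, P_y(2) = {BA, AB, BB}, so Z_2 = 0 + 1.
print(z_distance("ABAB", "BABB", 2), jaccard_distance("ABAB", "BABB", 2))
```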

Newick tree

 
(6QSI_1:93.39,(5NQZ_1:68,2BSM_1:68):25.39);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{763}{\log_{20} 763}-\frac{235}{\log_{20}235}\right)\approx 146\), where \(763=528+235\) is the combined length of the two sequences.
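
The arithmetic behind this estimate can be checked directly. In the sketch below the function name and parameter names are illustrative; the constants 0.85 and 0.8 are taken from the formula as given, and 763 is assumed to be the combined length \(528+235\) of the two sequences.

```python
import math

def expected_distance(n_combined: int, n_single: int,
                      c1: float = 0.85, c2: float = 0.8) -> float:
    """c1 * c2 * (N / log_20 N - n / log_20 n), as in the formula above."""
    def scaled(n: int) -> float:
        return n / math.log(n, 20)
    return c1 * c2 * (scaled(n_combined) - scaled(n_single))

# 528 + 235 = 763 residues combined, 235 residues for 2BSM_1 alone.
print(expected_distance(528 + 235, 235))
```

The result lands close to the 146 quoted on the page.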
Status Protein1 Protein2 d d1/2
Query variables 6QSI_1 2BSM_1 182 131.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
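
The interpolation can be illustrated numerically. The formula implies \(d=\max\) and \(d_1=\min+\max\), so the table values \(d=182\) and \(d_1/2=131.5\) correspond to \(\max=182\) and \(\min=81\); these two inputs are derived from the table under that reading, not listed on the page. The function name `delta` is illustrative.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Derived inputs: max = d = 182 and min = d_1 - d = 263 - 182 = 81.
print(delta(81, 182, 0.0))   # the alpha = 0 case, equal to d
print(delta(81, 182, 0.5))   # the alpha = 1/2 case, equal to d_1 / 2
```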