CoV2D Browser™

Parikh vectors
3RBE_1 3NQF_1 5TBN_1 Letter Amino acid
27 16 2 L Leucine
12 9 2 F Phenylalanine
15 4 3 Y Tyrosine
30 17 6 V Valine
13 7 2 N Asparagine
2 2 2 H Histidine
11 10 1 P Proline
19 11 0 S Serine
32 16 9 E Glutamic acid
18 21 3 G Glycine
37 22 2 I Isoleucine
38 7 1 K Lysine
0 0 1 W Tryptophan
24 20 2 R Arginine
4 3 5 Q Glutamine
1 3 8 C Cysteine
5 10 3 M Methionine
10 7 1 T Threonine
24 25 0 A Alanine
19 18 4 D Aspartic acid
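
The Parikh vector of a sequence is simply its vector of letter counts. A minimal sketch in Python, assuming the row order of the table above as the alphabet order (the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the CoV2D code):

```python
from collections import Counter

# Letters in the row order of the table above (an arbitrary but fixed order).
AMINO_ACIDS = "LFYVNHPSEGIKWRQCMTAD"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, ordered as in `alphabet`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Sequence of 5TBN_1 as listed on this page.
seq_5tbn = "GHMDRYDFEVVRCICEVQEENDFMIQCEECQCWQHGVCMGLLEENVPEKYTCYVCQD"
vector = parikh_vector(seq_5tbn)
print(dict(zip(AMINO_ACIDS, vector)))
print(sum(vector) == len(seq_5tbn))  # the counts add up to the length, 57
```

The printed counts should agree with the 5TBN_1 column, for example 8 for C and 0 for both S and A.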

>3RBE_1|Chains A, D[auth B]|DNA polymerase IV|Sulfolobus solfataricus (273057)
>3NQF_1|Chains A, B|Orotidine 5'-phosphate decarboxylase|Methanothermobacter thermautotrophicus (145262)
>5TBN_1|Chain A|PHD finger protein 20|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3RBE , Knot 145 341 0.82 38 180 322
GIVLFVDFDYFYAQVEEVLNPSLKGKPVVVCVFSGRFEDSGAVATANYEARKFGVKAGIPIVEAKKILPNAVYLPMRKEVYQQVSSRIMNLLREYSEKIEIASIDEAYLDISDKVRDYREAYNLGLEIKNKILEKEKITVTVGISKNKVFAKIAADMAKPNGIKVIDDEEVKRLIRELDIADVPGIGNITAEKLKKLGINKLVDTLSIEFDKLKGMIGEAKAKYLISLARDEYNEPIRTRVRKSIGRIVTMKRNSRNLEEIKPYLFRAIEESYYKLDKRIPKAIHVVAVTEDLDIVSRGRTFPHGISKETAYSESVKLLQKILEEDERKIRRIGVRFSKFI
3NQF , Knot 97 228 0.77 38 144 215
MRSRRVDVMDVMNRLILAMDLMNRDDALRVTGEVREYIDTVKIGYPLVLSEGMDIIAEFRKRFGCRIIADFKVADIPETNEKICRATFKAGADAIIVHGFPGADSVRACLNVAEEMGREVFLSTEMSHPGAEMFIQGAADEIARMGVDLGVKNYVGPSTRPERLSRLREIIGQDSFLISPGVGAQGGDPGETLRFADAIIVGRSIYLADNPAAAAAGIIESIKDLLNP
5TBN , Knot 33 57 0.78 36 47 54
GHMDRYDFEVVRCICEVQEENDFMIQCEECQCWQHGVCMGLLEENVPEKYTCYVCQD

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
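
These definitions translate directly into a few lines of Python. The sketch below reads "subword" as a contiguous block of length \(n\); the function names are illustrative, and the toy word is used only to keep the output easy to check by hand.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

w = "abab"
print(sorted(subwords(w, 2)))                             # ['ab', 'ba']
print([subword_complexity(w, n) for n in (1, 2, 3)])      # [2, 2, 2]
```

A word of length \(m\) has at most \(m-n+1\) distinct subwords of length \(n\), which bounds the \(p_w(2)\) and \(p_w(3)\) columns above.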

\(|P_{f(3RBE_1)}(2) \setminus P_{f(3NQF_1)}(2)|=87\), \(|P_{f(3NQF_1)}(2) \setminus P_{f(3RBE_1)}(2)|=51\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For this pair, \(Z_2=87+51=138\).

Hydrophobic-polar version of Sequence 1:
11111101001010100110101010111101101010001111010001001110111111010011101101110001000100011011000000101101001010100010000010011101000110000101011100001110111011010110110000100110010110111110101001001110011001010100101111010100110110000001100010001101101000000100101011011000000100011011011110001011001001101100001000010110011000000100111010011
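
A hydrophobic-polar encoding of this kind can be sketched as a per-residue lookup. The page does not state which residues it treats as hydrophobic, so the set `HYDROPHOBIC` below is an assumption: it was chosen to agree with the start of the binary string above, and the class of residues that do not occur in that prefix (W and H, for instance) is a guess.

```python
# Assumed hydrophobic class (1); every other residue is treated as polar (0).
# This set is inferred from the displayed string, not taken from the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 40 residues of 3RBE_1 as listed on this page.
prefix = "GIVLFVDFDYFYAQVEEVLNPSLKGKPVVVCVFSGRFEDS"
print(hp_encode(prefix))  # 1111110100101010011010101011110110101000
```

Under this assumed classification, the output matches the first 40 symbols of the string shown for Sequence 1.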
Pair \(Z_2\) Length of longest common subword
3RBE_1,3NQF_1 138 3
3RBE_1,5TBN_1 195 3
3NQF_1,5TBN_1 159 2
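
Both columns of this table can be computed with short routines. The sketch below implements \(Z_k\) as the size of the symmetric difference of the two subword sets, and reads the last column as the length of the longest common contiguous subword; that reading is an assumption, suggested by the small values reported for sequences of this length. Function names are illustrative.

```python
def subwords(w, k):
    """Distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                # Extend the common block ending at the previous positions.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

print(z_distance("abab", "abba", 2))                  # 1  (only "bb" is unshared)
print(longest_common_subword_length("abab", "abba"))  # 2  ("ab" or "ba")
```

The dynamic program takes time proportional to the product of the two lengths, which is negligible for sequences of a few hundred residues.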

Newick tree

 
[
	5TBN_1:94.67,
	[
		3RBE_1:69,3NQF_1:69
	]:25.67
]

Let \(d\) and \(d_1\) be the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{569 }{\log_{20} 569}-\frac{228}{\log_{20}228})=97.1\)
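
The arithmetic in this estimate can be checked numerically. In the sketch below, 569 is taken to be the combined length 341 + 228 of the two sequences; the factors 0.85 and 0.8 are copied from the formula as given, and `lz_scale` is an illustrative name for \(n/\log_{20} n\).

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b(n), the normalizer used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

combined = 341 + 228   # length of the concatenation of 3RBE_1 and 3NQF_1
shorter = 228          # length of 3NQF_1

expected = 0.85 * 0.8 * (lz_scale(combined) - lz_scale(shorter))
print(round(expected, 2))
```

The result agrees with the quoted value of 97.1 to within rounding.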
Status Protein1 Protein2 d d1/2
Query variables 3RBE_1 3NQF_1 121 100

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
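
A minimal sketch of how \(\delta\) interpolates between the two distances, under several assumptions that are not confirmed by this page: `lz76` counts phrases of the exhaustive-history (Lempel-Ziv 1976) parsing, which may differ from the LZ variant used for the table above; \(d\) and \(d_1\) follow the unnormalized Otu–Sayood form, the maximum and the sum of the two directed complexity increments; and the min and max in the formula are taken over those same two increments. All function names are illustrative.

```python
def lz76(s):
    """Number of phrases in the exhaustive-history (LZ76) parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs starting before position i
        # (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y):
    """Directed complexity increments c(xy) - c(x) and c(yx) - c(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def otu_sayood(x, y):
    """Return (d, d1): the max and the sum of the two increments (assumed form)."""
    a, b = increments(x, y)
    return max(a, b), a + b

def delta(x, y, alpha):
    """delta = alpha * min + (1 - alpha) * max over the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "GHMDRYDFEVVRCICEVQEE", "MRSRRVDVMDVMNRLILAMD"  # short prefixes, for illustration
d, d1 = otu_sayood(x, y)
print(delta(x, y, 0) == d, delta(x, y, 0.5) == d1 / 2)  # True True
```

With these definitions the two cases of the formula hold by construction: \(\alpha=0\) leaves only the maximum, and \(\alpha=1/2\) averages the minimum and maximum, which is half their sum.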