CoV2D Browser™

Parikh vectors
2MXC_1 6ELL_1 3DAY_1 Letter Amino acid
8 33 45 S Serine
11 21 39 V Valine
8 10 17 N Asparagine
1 5 10 C Cysteine
9 3 16 H Histidine
10 4 33 I Isoleucine
15 8 22 R Arginine
13 6 32 E Glutamic acid
10 10 37 K Lysine
10 6 17 F Phenylalanine
10 19 29 T Threonine
4 14 15 Y Tyrosine
10 8 30 D Aspartic acid
7 8 24 Q Glutamine
10 20 46 G Glycine
14 18 54 L Leucine
8 12 39 A Alanine
2 3 20 M Methionine
11 14 30 P Proline
1 5 15 W Tryptophan
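A Parikh vector records how many times each letter occurs in a word, ignoring order. The Python sketch below computes one. The names `TABLE_ORDER` and `parikh_vector` are hypothetical, and the letter order is copied from the table above.

```python
from collections import Counter

# Letter order used in the Parikh vector table above.
TABLE_ORDER = "SVNCHIREKFTYDQGLAMPW"


def parikh_vector(seq: str, alphabet: str = TABLE_ORDER) -> list[int]:
    """Return the number of occurrences in `seq` of each letter of `alphabet`."""
    counts = Counter(seq)  # missing letters count as 0
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # Histidine, cysteine, tryptophan and methionine counts for a toy sequence.
    print(parikh_vector("MAHHHHHHC", alphabet="HCWM"))  # [6, 1, 0, 1]
```

Each column of the table sums to the corresponding sequence length (172, 227 and 570). For sequences over the 20 standard letters, `sum(parikh_vector(seq)) == len(seq)` is therefore a quick consistency check.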

>2MXC_1|Chain A|Sorting nexin-3|Homo sapiens (9606)
>6ELL_1|Chains A, C[auth H]|fAB heavy chain|Homo sapiens (9606)
>3DAY_1|Chain A|Acyl-coenzyme A synthetase ACSM2A, mitochondrial precursor|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2MXC, Knot 83 172 0.82 40 130 167
MAHHHHHHVGTAETVADTRRLITKPQNLNDAYGPPSNFLEIDVSNPQTVGVGRGRFTTYEIRVKTNLPIFKLKESTVRRRYSDFEWLRSELERESKVVVPPLPGKAFLRQLPFRGDDGIFDDNFIEERKQGLEQFINKVAGHPLAQNERCLHMFLQDEIIDKSYTPSKIRHA
6ELL, Knot 103 227 0.82 40 151 214
EVQLVESGGGLVQPGGSLRLSCATSGFDFSRYWMSWVRQAPGKGLVWIGEVNPDSTSINYTPSLKDQFTISRDNAKNTLYLQMNSLRAEDTAVYYCTRPNYYGSRYHYYAMDYWGQGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPKSC
3DAY, Knot 226 570 0.83 40 263 528
MGHHHHHHSSGVDLGTENLYFQSMSLQWGHQEVPAKFNFASDVLDHWADMEKAGKRPPSPALWWVNGKGKELMWNFRELSENSQQAANVLSGACGLQRGDRVAVVLPRVPEWWLVILGCIRAGLIFMPGTIQMKSTDILYRLQMSKAKAIVAGDEVIQEVDTVASECPSLRIKLLVSEKSCDGWLNFKKLLNEASTTHHCVETGSQEASAIYFTSGTSGLPKMAEHSYSSLGLKAKMDAGWTGLQASDIMWTISDTGWILNILCSLMEPWALGACTFVHLLPKFDPLVILKTLSSYPIKSMMGAPIVYRMLLQQDLSSYKFPHLQNCVTVGESLLPETLENWRAQTGLDIRESYGQTETGLTCMVSKTMKIKPGYMGTAASCYDVQIIDDKGNVLPPGTEGDIGIRVKPIRPIGIFSGYVDNPDKTAANIRGDFWLLGDRGIKDEDGYFQFMGRADDIINSSGYRIGPSEVENALMEHPAVVETAVISSPDPVRGEVVKAFVVLASQFLSHDPEQLTKELQQHVKSVTAPYKYPRKIEFVLNLPKTVTGKIQRAKLRDKEWKMSGKARAQ
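The LZ column counts phrases in a Lempel–Ziv (1976) parsing. The last ratio column divides that count by \(n/\log_{20} n\). The following is a minimal Python sketch that assumes the Kaspar–Schuster formulation of LZ76 complexity. Whether it reproduces the site's LZ(w) column exactly depends on the variant the site uses. The function names are hypothetical.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster scan)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1   # candidate copy start, match length, start of current phrase
    c, k_max = 1, 1     # phrase count (first letter is a phrase), longest match so far
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:          # current phrase runs to the end of s
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:             # no earlier copy extends further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:      # s is fully parsed
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), the ratio shown in the table above."""
    return lz / (n / math.log(n, 20))


if __name__ == "__main__":
    # Classic example, parsed as 0|001|10|100|1000|101.
    print(lz76_complexity("0001101001000101"))  # 6
```

Applying `math.floor(100 * normalized_lz(lz, n))` to the tabulated pairs gives 82, 82 and 83 for 2MXC, 6ELL and 3DAY. The ratios therefore appear to be truncated rather than rounded: for 3DAY the unrounded value is about 0.8399.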

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence of the entry with four-character Protein Data Bank code \(c\).
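These definitions translate directly into a set comprehension over all windows of length \(n\). Here is a minimal Python sketch, with the hypothetical names `subwords` and `subword_complexity`:

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct subwords (contiguous intervals) of length n in w."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # For n > len(w) the range is empty, so P_w(n) is the empty set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))                         # ['AB', 'BA']
    print([subword_complexity("ABAB", n) for n in range(1, 6)])  # [2, 2, 2, 1, 0]
```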

\(|P_{f(2MXC_1)}(2) \setminus P_{f(6ELL_1)}(2)|=79\) and \(|P_{f(6ELL_1)}(2) \setminus P_{f(2MXC_1)}(2)|=100\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for instance, \(Z_2(f(2MXC_1),f(6ELL_1))=79+100=179\).

Hydrophobic-polar version of Sequence 1 (2MXC_1):

1100000011010011000011001001001011100110101001001111010100001010001111010000100000010110001000001111111101110011101001110001100000110011001110111000001011100011000001001001
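The binary string for 2MXC_1 is reproduced position by position by writing 1 for A, F, G, I, L, M, P, V, W and 0 for the other eleven standard residues, including C and Y. This grouping was inferred by comparing the string with the sequence. It is not taken from the site's documentation. The names `HYDROPHOBIC` and `hp_string` are hypothetical. A minimal Python sketch:

```python
# Residues written as 1 (hydrophobic); every other residue is written as 0 (polar).
# Grouping inferred by matching the 2MXC_1 string, not taken from the site itself.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(seq: str) -> str:
    """Map a protein sequence to its hydrophobic-polar (1/0) string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)


if __name__ == "__main__":
    print(hp_string("MAHHHHHHVGTAETV"))  # 110000001101001
```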
Pair \(Z_2\) Length of longest common subword (contiguous substring)
2MXC_1,6ELL_1 179 4
2MXC_1,3DAY_1 201 6
6ELL_1,3DAY_1 200 4
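Both columns of this table can be computed from subword sets and a standard dynamic program. The following is a minimal Python sketch with hypothetical function names. `jaccard_distance` divides \(Z_k\) by the size of the union, which is the usual Jaccard normalization. The longest common subword is found with the classic \(O(|x||y|)\) longest-common-substring recurrence.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z("ABAB", "BABC", 2))                    # 1
    print(longest_common_subword("ABAB", "BABC"))  # 3
```

With the set sizes reported on this page, \(|P_{f(2MXC_1)}(2)\cup P_{f(6ELL_1)}(2)|=130+100=230\). The corresponding Jaccard distance is therefore \(179/230\approx 0.78\).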

Newick tree

(3DAY_1:10.58,(2MXC_1:89.5,6ELL_1:89.5):14.08);
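Standard Newick notation writes each group in parentheses, appends `:length` to each branch, and ends the tree with a semicolon. A small Python sketch serializes a nested-tuple tree into that form. The representation and the names `to_newick`, `newick` and `TREE` are hypothetical, not the site's.

```python
# A node is (label, branch_length) for a leaf or (children, branch_length) for an
# internal node; branch_length is None where no length is written (e.g. the root).


def to_newick(node) -> str:
    payload, length = node
    if isinstance(payload, str):
        text = payload  # leaf label
    else:
        text = "(" + ",".join(to_newick(child) for child in payload) + ")"
    return text if length is None else f"{text}:{length}"


def newick(root) -> str:
    """Serialize a whole tree; only the root gets the terminating semicolon."""
    return to_newick(root) + ";"


TREE = ([("3DAY_1", 10.58), ([("2MXC_1", 89.5), ("6ELL_1", 89.5)], 14.08)], None)

if __name__ == "__main__":
    print(newick(TREE))  # (3DAY_1:10.58,(2MXC_1:89.5,6ELL_1:89.5):14.08);
```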

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance, where \(399=172+227\) is the combined length of the two sequences, is \((0.85)(0.8)\left(\frac{399}{\log_{20} 399}-\frac{172}{\log_{20}172}\right)=67.6\).
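The estimate can be recomputed as shown below. This is a sketch only. The helper names are hypothetical, and the factors 0.85 and 0.8 are copied from the text without interpretation. Writing the formula as \(ab\,(N(m+n)-N(m))\) with \(N(n)=n/\log_{20} n\) is an assumed generalization of the single worked instance above.

```python
import math


def lz_scale(n: int) -> float:
    """N(n) = n / log_20 n, the normalizing length used on this page."""
    return n / math.log(n, 20)


def rough_expected_distance(m: int, n: int, a: float = 0.85, b: float = 0.8) -> float:
    """a * b * (N(m + n) - N(m)); a and b default to the factors in the text."""
    return a * b * (lz_scale(m + n) - lz_scale(m))


if __name__ == "__main__":
    print(round(rough_expected_distance(172, 227), 1))  # 67.6
```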
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 2MXC_1 6ELL_1 84 74
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
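Here is a minimal Python sketch of \(\delta\). It assumes the min and max are taken over the two LZ complexity increments \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\) of Otu and Sayood's construction, with \(c\) the LZ76 phrase count from the Kaspar–Schuster scan. Under that assumption \(\alpha=0\) gives the larger increment (\(d\)), and \(\alpha=1/2\) gives half their sum (\(d_1/2\)). The function names are hypothetical. Whether this reproduces the values 84 and 74 above depends on the site's exact LZ variant.

```python
def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster scan)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l, c, k_max = 0, 1, 1, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz_increments(s: str, q: str) -> tuple[int, int]:
    """(c(SQ) - c(S), c(QS) - c(Q)): extra phrases needed when appending."""
    return (lz76_complexity(s + q) - lz76_complexity(s),
            lz76_complexity(q + s) - lz76_complexity(q))


def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max over the two increments."""
    a, b = lz_increments(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    s, q = "MAHHHHHHVGTAETV", "EVQLVESGGGLVQPG"
    print("d    =", delta(s, q, 0))
    print("d1/2 =", delta(s, q, 0.5))
```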