CoV2D Browser™

Parikh vectors
4XUF_1 5JDN_1 3FDY_1 Letter Amino acid
10 6 44 P Proline
21 22 50 S Serine
12 6 28 R Arginine
26 8 34 E Glutamic acid
20 32 42 G Glycine
10 9 17 M Methionine
18 15 36 F Phenylalanine
9 6 44 T Threonine
5 4 9 W Tryptophan
14 10 15 Y Tyrosine
14 15 31 N Asparagine
8 3 29 Q Glutamine
32 46 44 L Leucine
19 29 40 V Valine
8 2 10 C Cysteine
3 3 14 H Histidine
17 39 26 I Isoleucine
17 28 40 A Alanine
14 9 39 D Aspartic acid
20 10 31 K Lysine
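A Parikh vector simply counts how often each letter occurs in a word, ignoring order. The following is a minimal sketch of how a column of the table above could be computed; the function name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical, and the letter order is copied from the rows of the table.

```python
from collections import Counter

# One-letter amino acid codes, in the row order of the table above.
AMINO_ACIDS = "PSREGMFTWYNQLVCHIADK"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the number of occurrences in w of each letter of alphabet."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# First 15 residues of the 4XUF_1 sequence, used only as a small example.
print(parikh_vector("DLKWEFPRENLEFGK"))
```

The entries of a Parikh vector sum to the length of the word; the three columns of the table sum to 297, 302 and 623 respectively.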

>4XUF_1|Chains A, B|Receptor-type tyrosine-protein kinase FLT3|Homo sapiens (9606)
>5JDN_1|Chain A|sodium-calcium exchanger NCX_Mj|Methanocaldococcus jannaschii (strain ATCC 43067 / DSM 2661 / JAL-1 / JCM 10045 / NBRC 100440) (243232)
>3FDY_1|Chain A|Pyranose oxidase (Pyranose 2-oxidase)|Trametes multicolor (230624)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4XUF , Knot 129 297 0.82 40 194 284
DLKWEFPRENLEFGKVLGSGAFGKVMNATAYGISKTGVSIQVAVKMLKEKADSSEREALMSELKMMTQLGSHENIVNLLGACTLSGPIYLIFEYCCYGDLLNYLRSKREKFSEDEIEYENQKRLEEEEDLNVLTFEDLLCFAYQVAKGMEFLEFKSCVHRDLAARNVLVTHGKVVKICDFGLARDIMSDSNYVVRGNARLPVKWMAPESLFEGIYTIKSDVWSYGILLWEIFSLGVNPYPGIPVDANFYKLIQNGFKMDQPFYATEEIYIIMQSCWAFDSRKRPSFPNLTSFLGCQL
5JDN , Knot 125 302 0.78 40 146 273
MVILGVGYFLLGLILLYYGSDWFVLGSERIARHFNVSNFVIGATVMAIGTSLPEILTSAYASYMHAPGISIGNAIGSCICNIGLVLGLSAIISPIIVDKNLQKNILVYLLFVIFAAVIGIDGFSWIDGVVLLILFIIYLRWTVKNGSAEIEENNDKNNPSVVFSLVLLIIGLIGVLVGAELFVDGAKKIALALDISDKVIGFTLVAFGTSLPELMVSLAAAKRNLGGMVLGNVIGSNIADIGGALAVGSLFMHLPAENVQMAVLVIMSLLLYLFAKYSKIGRWQGILFLALYIIAIASLRMG
3FDY , Knot 246 623 0.84 40 285 585
MSTSSSDPFFNFAKSSFRSAAAQKASASSLPPLPGPDKKVPGMDIKYDVVIVGSGPIGCTYARELVGAGYKVAMFDIGEIDSGLKIGAHKKNTVEYQKNIDKFVNVIQGQLMSVSVPVNTLVVDTLSPTSWQASTFFVRNGSNPEQDPLRNLSGQAVTRVVGGMSTHWGCATPRFDREQRPLLVKDDADADDAEWDRLYTKAESYFQTGTDQFKESIRHNLVLNKLTEEYKGQRDFQQIPLAATRRSPTFVEWSSANTVFDLQNRPNTDAPEERFNLFPAVACERVVRNALNSEIESLHIHDLISGDRFEIKADVYVLTAGAVHNTQLLVNSGFGQLGRPNPANPPELLPSLGSYITEQSLVFCQTVMSTELIDSVKSDMTIRGTPGELTYSVTYTPGASTNKHPDWWNEKVKNHMMQHQEDPLPIPFEDPEPQVTTLFQPSHPWHTQIHRDAFSYGAVQQSIDSRLIVDWRFFGRTEPKEENKLWFSDKITDAYNMPQPTFDFRFPAGRTSKEAEDMMTDMCVMSAKIGGFLPGSLPQFMKPGLCLHLGGTHRMGFDEKEDNCCVNTDSRVFGFKNLFLGGCGNIPTAYGANPTLTAMSLAIKSCEYIKQNFTPSPFTSEAQ
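The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below also includes a naive phrase counter for the LZ76 exhaustive-history parse; whether the page uses exactly this convention for \(\mathrm{LZ}(w)\) is an assumption, and the names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count phrases in the LZ76 parse of w (naive implementation)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from an earlier start
        # position (the copy is allowed to overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """Return lz / (n / log_alphabet_size(n))."""
    return lz / (n / math.log(n, alphabet_size))

# Values taken from the 4XUF row of the table: LZ(w) = 129, n = 297.
print(round(normalized_lz(129, 297), 4))
```

For the three rows this gives roughly 0.8255, 0.7890 and 0.8481, which agrees with the tabulated 0.82, 0.78 and 0.84 if those are truncated rather than rounded.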

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
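As an illustration of these definitions, here is a short sketch that collects the distinct length-\(k\) subwords of a word with a sliding window. The function names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("abab", 2)))    # ['ab', 'ba']
print(subword_complexity("abab", 3))  # 2
```

For a word of length \(|w|\) over a 20-letter alphabet, the count for length \(k\) is at most \(\min(20^k, |w|-k+1)\).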

\(|P_{f(4XUF_1)}(2) \setminus P_{f(5JDN_1)}(2)|=107\), \(|P_{f(5JDN_1)}(2) \setminus P_{f(4XUF_1)}(2)|=59\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
010101100010110111011110110101011000110101110110001000000111001011001100001101111001011101110000010110010000001000010000000100000101101001101100110110110100010001110011100101101001111001100000110101011101111001101100100011001111101101110101111101010011001101001101000101110001110000010110100111001
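The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state its classification; the set used below is inferred by comparing the first 90 residues of the 4XUF_1 sequence with the 0/1 string, so it is an assumption, as are the names `HYDROPHOBIC` and `hp_string`.

```python
# Residues mapped to 1, inferred from the displayed string (assumption).
# Every other residue, including C, H and Y, is mapped to 0.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(w):
    """Return the hydrophobic-polar (1/0) encoding of a protein sequence."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 20 residues of the 4XUF_1 sequence.
print(hp_string("DLKWEFPRENLEFGKVLGSG"))  # 01010110001011011101
```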
Pair \(Z_2\) Length of longest common subword
4XUF_1,5JDN_1 166 4
4XUF_1,3FDY_1 169 4
5JDN_1,3FDY_1 199 4
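Both columns of this table can be sketched in a few lines. \(Z_k\) follows the definition given earlier; for example, \(Z_2\) for the first pair is \(107+59=166\). The second column is read here as the length of the longest common contiguous block, which is an assumption: 4XUF_1 and 5JDN_1 each contain at least 32 leucines, so a non-contiguous common subsequence would be far longer than 4. All function names are hypothetical.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_block(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, 1):
            if cx == cy:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("abab", "abba", 2))                  # 1
print(longest_common_block("abab", "abba"))  # 2
```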

Newick tree

 
[
	3FDY_1:95.20,
	[
		4XUF_1:83,5JDN_1:83
	]:12.20
]
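The bracketed tree above can be written in standard Newick notation, which uses parentheses and a terminating semicolon. The sketch below encodes the tree as nested `(label_or_children, branch_length)` pairs, a hypothetical representation chosen only for this example, and checks that every leaf lies at the same distance from the root.

```python
# The tree above; the root has no branch length.
tree = ([("3FDY_1", 95.20),
         ([("4XUF_1", 83), ("5JDN_1", 83)], 12.20)], None)

def to_newick(node):
    """Serialize a nested (label_or_children, length) node."""
    label, length = node
    if isinstance(label, list):
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length:g}"

def leaf_depths(node, depth=0.0):
    """Map each leaf name to its distance from the root."""
    label, length = node
    depth += length or 0.0
    if isinstance(label, list):
        out = {}
        for child in label:
            out.update(leaf_depths(child, depth))
        return out
    return {label: depth}

print(to_newick(tree) + ";")  # (3FDY_1:95.2,(4XUF_1:83,5JDN_1:83):12.2);
print(leaf_depths(tree))
```

All three leaves are at depth 95.2, since \(83+12.20=95.20\).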

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{599}{\log_{20} 599}-\frac{297}{\log_{20} 297}\right)\approx 84.5\).
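The arithmetic in the expected-distance estimate can be checked directly. The constants 0.85 and 0.8 are taken from the formula as given; reading 599 as \(297+302\), the combined length of the two query sequences, is an assumption. The function name `expected_distance` is hypothetical.

```python
import math

def expected_distance(n_combined, n_first, c1=0.85, c2=0.8, alphabet_size=20):
    """Evaluate c1 * c2 * (N / log_A N - n / log_A n)."""
    def scale(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (scale(n_combined) - scale(n_first))

print(round(expected_distance(599, 297), 1))  # 84.5
```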
Status Protein1 Protein2 d d1/2
Query variables 4XUF_1 5JDN_1 105 103
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
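The interpolation can be illustrated numerically. In this sketch `lo` and `hi` stand for the two quantities whose minimum and maximum appear in the formula; the page does not name them here, so they are hypothetical parameters. With the tabulated values \(d=105\) and \(d_1/2=103\), the formula implies a minimum of 101, since \((101+105)/2=103\).

```python
def delta(alpha, lo, hi):
    """Return alpha * min + (1 - alpha) * max of the two given quantities."""
    small, large = min(lo, hi), max(lo, hi)
    return alpha * small + (1 - alpha) * large

print(delta(0, 101, 105))    # 105, the value of d
print(delta(0.5, 101, 105))  # 103.0, the value of d1/2
```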