CoV2D BrowserTM

Parikh vectors
2KXK_1 8ESE_1 5UOW_1 Letter Amino acid
2 3 54 E Glutamic acid
2 1 59 I Isoleucine
1 1 57 V Valine
0 3 36 P Proline
2 1 27 Y Tyrosine
0 1 38 D Aspartic acid
2 1 50 G Glycine
0 2 39 F Phenylalanine
0 1 18 H Histidine
2 3 72 L Leucine
0 2 66 A Alanine
0 1 45 R Arginine
2 0 35 N Asparagine
0 0 24 M Methionine
2 1 56 S Serine
1 0 45 T Threonine
0 0 13 W Tryptophan
4 1 9 C Cysteine
2 0 31 Q Glutamine
0 1 40 K Lysine
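A column of the table above can be recomputed directly from a sequence. The following is a minimal sketch, assuming the Parikh vector is simply the count of each of the 20 amino-acid letters; the names `ALPHABET` and `parikh_vector` are illustrative and not part of the CoV2D Browser.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "EIVPYDGFHLARNMSTWCQK"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in ALPHABET}

insulin_a = "GIVEQCCTSICSLYQLENYCNG"  # sequence listed for 2KXK_1
vec = parikh_vector(insulin_a)
print(vec["C"], vec["E"], vec["A"])  # 4 2 0
```

The entries sum to the sequence length, which is 22 for 2KXK_1.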

>2KXK_1|Chain A|Insulin A chain|Homo sapiens (9606)
>8ESE_1|Chain A[auth X]|VPS35 endosomal protein-sorting factor-like|Homo sapiens (9606)
>5UOW_1|Chains A, C|N-methyl-D-aspartate receptor subunit NR1-8a|Xenopus laevis (8355)
Protein code \(c\)   LZ-complexity \(\mathrm{LZ}(w)\)   Length \(n=|w|\)   \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\)   \(p_w(1)\)   \(p_w(2)\)   \(p_w(3)\)   Sequence \(w=f(c)\)
2KXK   15   22   0.70   22   21   20
GIVEQCCTSICSLYQLENYCNG
8ESE   17   23   0.77   30   19   21
EFASCRLEAVPLEFGDYHPLKPI
5UOW   318   814   0.87   40   319   768
DPKIVNIGAVLSTKKHEQIFREAVNQANKRHFTRKIQLNATSVTHRPNAIQMALSVCEDLISSQVYAILVSHPPAPTDHLTPTPISYTAGFYRIPVIGLTTRMSIYSDKSIHLSFLRTVPPYSHQALVWFEMMRLFNWNHVILIVSDDHEGRAAQKKLETLLEEKESKADKVLQFEPGTKNLTALLLEAKELEARVIILSASEDDATAVYKSAAMLDMTGAGYVWLVGEREISGSALRYAPDGIIGLQLINGKNESAHISDAVAVVAQAIHELFEMEQITDPPRGCVGNTNIWKTGPLFKRVLMSSKYPDGVTGRIEFNEDGDRKFAQYSIMNLQNRKLVQVGIFDGSYIIQNDRKIIWPGGETERPQGYQMSTRLKIVTIHQEPFVYVRPTTSDGTCREEYTINGDPIKKVICNGPDETIPGRPTVPQCCYGFCVDLLIKLAREMDFTYEVHLVADGKFGTQERVNNSNAAAWNGMMGELLSGQADMIVAPLTINNERAQYIEFSKPFKYQGLTILVKKEIPRSTLDSFMQPFQSTLWLLVGLSVHVVAVMLYLLDRFSPFGRFKVNSAAAEEDALTLSSAMWFSWRVLLNSGLGEGAPRSFSARILGMVWALFAMIIVASYTANLAAFLVLRRPEERITGINDPRLRNPSDKFIYATVKQSSVDIYFRRQVELSTMYRHMEKHNYESAAEAIQAVRDNKLHAFIWDSAVLEFEASQDCDLVTTGELFFRSGFGIGMRKDSPWKQEVSLNILKSHENGFMEELDKTWVRYQECDSRSNAPATLTFENMAGVFYLVAGGIVAGIFLIFIEIAYK
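The LZ-complexity and normalized ratio columns can be sketched as follows. This assumes that \(\mathrm{LZ}(w)\) is the number of components in the Lempel-Ziv (1976) exhaustive-history parsing of \(w\), where each component is the shortest prefix of the remaining text that does not already occur starting at an earlier position. The function names `lz76` and `normalized_lz` are illustrative, and the page's own implementation may differ in detail.

```python
import math

def lz76(s: str) -> int:
    """Count components of the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the candidate while it also occurs starting before position i
        # (the earlier occurrence may overlap the candidate itself).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(s: str, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_20(n), as in the table header."""
    n = len(s)
    return lz76(s) / (n / math.log(n, alphabet_size))

for name, seq in [("2KXK", "GIVEQCCTSICSLYQLENYCNG"),
                  ("8ESE", "EFASCRLEAVPLEFGDYHPLKPI")]:
    print(name, lz76(seq), len(seq), round(normalized_lz(seq), 2))
```

Under this assumption the sketch gives LZ-complexity 15 and 17 and ratios 0.70 and 0.77 for the two short sequences, matching the table.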

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
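The subword sets defined above are straightforward to compute. A minimal sketch, with the illustrative names `subwords` and `subword_complexity` (not taken from the page), where the second argument is the subword length:

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

for name, seq in [("2KXK", "GIVEQCCTSICSLYQLENYCNG"),
                  ("8ESE", "EFASCRLEAVPLEFGDYHPLKPI")]:
    print(name, [subword_complexity(seq, k) for k in (2, 3)])
# 2KXK [21, 20]
# 8ESE [19, 21]
```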

\(|P_{f(2KXK_1)}(2) \setminus P_{f(8ESE_1)}(2)|=20\), \(|P_{f(8ESE_1)}(2) \setminus P_{f(2KXK_1)}(2)|=18\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 1110000001001001000001
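The hydrophobic-polar string can be reproduced with a simple letter map. The page does not state which residues it treats as hydrophobic. The sketch below assumes the common nonpolar class G, A, V, L, I, M, F, W, P is mapped to 1 and everything else to 0, which agrees with the string shown for Sequence 1 but is otherwise a guess; `HYDROPHOBIC` and `hp_string` are illustrative names.

```python
# Assumed hydrophobic (nonpolar) residues; the page does not list its classification.
HYDROPHOBIC = set("GAVLIMFWP")

def hp_string(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(hp_string("GIVEQCCTSICSLYQLENYCNG"))  # 1110000001001001000001
```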
Pair \(Z_2\) Length of longest common subword
2KXK_1,8ESE_1 38 2
2KXK_1,5UOW_1 304 3
8ESE_1,5UOW_1 306 3
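Both columns of this table can be sketched from the subword sets. The code assumes \(Z_k\) is the size of the symmetric difference of the two sets of length-\(k\) subwords, and that the last column is the length of the longest contiguous subword shared by both sequences; the function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, i.e. the symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x: str, y: str) -> int:
    """Largest k such that x and y share a contiguous subword of length k."""
    k = 0
    # A shared subword of length k+1 implies one of length k, so a linear scan suffices.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

a = "GIVEQCCTSICSLYQLENYCNG"   # 2KXK_1
b = "EFASCRLEAVPLEFGDYHPLKPI"  # 8ESE_1
print(z(a, b, 2), longest_common_subword(a, b))  # 38 2
```

For this pair the only shared 2-letter subword is LE, which gives 20 + 18 = 38.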

Newick tree

 
(5UOW_1:17.75,(2KXK_1:19,8ESE_1:19):15.75);

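Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\), where \(c\) denotes LZ-complexity.
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\). (This makes the 4TYN sequence AAAAAA a close match...)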
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{45 }{\log_{20} 45}-\frac{22}{\log_{20}22})=9.58\).
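The arithmetic in this estimate can be checked numerically. The factors 0.85 and 0.8 are copied from the formula as given; reading 45 as the combined length 22 + 23 of the two query sequences is an assumption. The helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The quantity n / log_20(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 45 is assumed to be the concatenated length 22 + 23; 22 is the length of 2KXK_1.
expected = 0.85 * 0.8 * (lz_scale(45) - lz_scale(22))
print(round(expected, 2))  # 9.58
```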
Status Protein1 Protein2 d d1/2
Query variables 2KXK_1 8ESE_1 12 11.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
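The two tabulated values can be reproduced from this interpolation. The sketch assumes that min and max range over the two LZ-complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), and that \(c\) is the LZ76 component count; `lz76`, `increments` and `delta` are illustrative names, not the page's code.

```python
def lz76(s: str) -> int:
    """Count components of the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the candidate while it also occurs starting before position i.
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def increments(x: str, y: str) -> tuple[int, int]:
    """Complexity added by appending y to x, and by appending x to y."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x = "GIVEQCCTSICSLYQLENYCNG"   # 2KXK_1
y = "EFASCRLEAVPLEFGDYHPLKPI"  # 8ESE_1
print(delta(x, y, 0), delta(x, y, 0.5))  # 12 11.5
```

With these assumptions the increments for the query pair are 11 and 12, so \(\alpha=0\) gives \(d=12\) and \(\alpha=1/2\) gives \(d_1/2=11.5\), in agreement with the table.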
