CoV2D Browser™

Parikh vectors
3UTP_1 5NDZ_1 7QOS_1 Letter Amino acid
10 30 15 N Asparagine
9 19 7 Q Glutamine
8 24 28 G Glycine
12 74 39 L Leucine
12 41 37 K Lysine
31 32 22 S Serine
19 29 26 D Aspartic acid
2 9 8 W Tryptophan
11 25 21 Y Tyrosine
11 74 6 A Alanine
9 25 24 R Arginine
6 23 36 E Glutamic acid
5 15 13 M Methionine
12 30 26 F Phenylalanine
5 8 2 C Cysteine
0 21 9 H Histidine
8 42 27 I Isoleucine
11 18 11 P Proline
9 33 16 T Threonine
11 47 27 V Valine
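A Parikh vector is the vector of letter counts of a sequence, so each column above sums to the sequence length (201, 619 and 400). Below is a minimal Python sketch that assumes the 20-letter alphabet in the row order of the table. The names `ALPHABET` and `parikh_vector` are hypothetical and are not taken from the site's code.

```python
from collections import Counter

# Amino acids in the row order of the Parikh-vector table above.
ALPHABET = "NQGLKSDWYAREMFCHIPTV"

def parikh_vector(seq: str, alphabet: str = ALPHABET) -> list[int]:
    """Count how often each letter of `alphabet` occurs in `seq`."""
    counts = Counter(seq)  # missing letters count as 0
    return [counts[letter] for letter in alphabet]

# The first ten residues of 3UTP_1; the counts sum to the length, 10.
v = parikh_vector("KEVEQDPGPL")
print(sum(v))  # 10
```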

>3UTP_1|Chains A[auth D], C[auth K]|1E6 TCR alpha chain|Homo sapiens (9606)
>5NDZ_1|Chain A|Lysozyme,Proteinase-activated receptor 2,Soluble cytochrome b562,Proteinase-activated receptor 2|Enterobacteria phage T4 (10665)
>7QOS_1|Chains A, B|Cyclopropane-fatty-acyl-phospholipid synthase|Aquifex aeolicus VF5 (224324)
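Each FASTA header above has four `|`-separated fields: entity ID, chains, molecule name, and organism followed by its NCBI taxonomy ID in parentheses. The Python sketch below parses one header and assumes exactly those four fields. The function `parse_header` and its dictionary keys are hypothetical names chosen for illustration.

```python
import re

def parse_header(line: str) -> dict:
    """Split a header of the form >ID|chains|name|organism (taxid)."""
    fields = line.strip().lstrip(">").split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(fields)}")
    entity, chains, name, organism = fields
    # The organism field ends with the NCBI taxonomy id in parentheses.
    m = re.fullmatch(r"(.*) \((\d+)\)", organism)
    species, taxid = (m.group(1), int(m.group(2))) if m else (organism, None)
    return {"entity": entity, "chains": chains, "name": name,
            "species": species, "taxid": taxid}

h = parse_header(">7QOS_1|Chains A, B|Cyclopropane-fatty-acyl-phospholipid synthase|Aquifex aeolicus VF5 (224324)")
print(h["species"], h["taxid"])  # Aquifex aeolicus VF5 224324
```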
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3UTP , Knot 94 201 0.82 38 142 192
KEVEQDPGPLSVPEGAIVSLNCTYSNSAFQYFMWYRQYSRKGPELLMYTYSSGNKEDGRFTAQVDKSSKYISLFIRDSQPSDSATYLCAMRGDSSYKLIFGSGTRLLVRPDIQNPDPAVYQLRDSKSSDKSVCLFTDFDSQTNVSQSKDSDVYITDKCVLDMRSMDFKSNSAVAWSNKSDFACANAFNNSIIPEDTFFPSP
5NDZ , Knot 239 619 0.82 40 267 555
MVSAIVLYVLLAAAAHSAFAAASNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYIYEFFSVDEFSASVLTGKLTTVFLPIVYTIVFVVALPSNGMALWVFLFRTKKKAPAVIYMANLALADLLSVIWFPLKIAYHIHGNNWIYGEALCNVLIGFFYANMYCSILFLTCLSVQRAWEIVNPMGHSRKKANIAIGISLAIWLLILLVTIPLYVVKQTIFIPALQITTCHDVLPEQLLVGDMFNYFLSLAIGVFLFPAFLTASAYVLMIRALADLEDNWETLNDNLKVIEKADNAAQVKDALTKMRAAALDAQKATPPKLEDKSPDSPEMKDFRHGFDILVGQIDDALKLANEGKVKEAQAAAEQLKTTRNAYIQKYLENSEKKRKRAIKLAVTVAAMYLICFTPSNLLLVVHYFLIKSQGQSHVYALYIVALCLSTLNSCIDPFVYYFVSHDFRDHAKNALLCRSVRTVKQMQVSLTSKAAAHHHHHHHHHH
7QOS , Knot 166 400 0.83 40 219 382
MIKEAIVERIVNKLNENQKEKIGVELPSGKRIPEFPVSHLIRFKTWKSLDYVLKDPEMGFGEGYMNGDIEVEGDLEEVIKRGMTLFKDTRKFEKLFGILRHVPLFRTIRDERNVKHHYDLGNDFYRLWLDKSMTYSCAFFEDPSMSIDEAQSLKRRMIYEKLQLKEGDTLLDIGCGWGSIILESAELYNVKSVGITLSDNQYEYVKEEIKKRGLQDKVEVYKLHYVDLPKLGRKFNKVVSVGMFEHVGKENYETFFNTVYRVMEEGGLFLLHTIGKLHPDTQSRWIRKYIFPGGYLPSISEIVESFRDMDFTLIDFDNWRMHYYWTLKKWKERFYENLDKIRNMFDDRFIRMWELYLTASAVSFLIGSNYVFQTLLSKGVKDDYPVIKREFSGVLFKEGT
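The sketch below assumes that \(\mathrm{LZ}(w)\) is the classical Lempel–Ziv (1976) complexity, meaning the number of components in the exhaustive history of \(w\), with a final, possibly repeated, component counted. The ratio column then divides this count by \(n/\log_{20} n\). The site does not state its exact edge conventions, so this is an illustration and not a reproduction of its code. Both function names are hypothetical.

```python
import math

def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive history of s."""
    n, i, c = len(s), 0, 0
    while i < n:
        l = 1
        # Extend the current component while it can still be copied from
        # earlier text (the copy source may overlap the component itself).
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1  # component ends with one new symbol (or at the end of s)
        i += l
    return c

def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), as in the table's ratio column."""
    return lz / (n / math.log(n, 20))

# Classic example: 0 . 001 . 10 . 100 . 1000 . 101 has 6 components.
print(lz76_complexity("0001101001000101"))  # 6
# The 3UTP row: LZ = 94, n = 201 gives about 0.828, shown truncated as 0.82.
print(round(normalized_lz(94, 201), 3))  # 0.828
```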

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
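Below is a direct Python transcription of \(P_w(n)\) and \(p_w(n)\). It assumes that subwords are contiguous and have length exactly \(n\). The names `subwords` and `p` are hypothetical.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of w of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

print(sorted(subwords("KEVEQ", 2)))  # ['EQ', 'EV', 'KE', 'VE']
```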

\(|P_{f(3UTP_1)}(2) \setminus P_{f(5NDZ_1)}(2)|=40\) and \(|P_{f(5NDZ_1)}(2) \setminus P_{f(3UTP_1)}(2)|=165\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(3UTP_1),f(5NDZ_1))=40+165=205\).

Hydrophobic-polar version of Sequence 1: 001000111101101111010000000110011100000001101110000010000101010100000010111000010001001011010000011110100111010100101110010000000001011001000001000000010100001101001010000111100000110101100011100011101
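\(Z_k\) is the size of the symmetric difference of the two subword sets, so Python set operations compute it directly. The hydrophobic set used below is an assumption. `AVLIMFWPG` reproduces the first 110 positions of the binary string for 3UTP_1, but the page does not state its classification. Because 3UTP_1 contains no H, the class of H cannot be inferred from this string. The names `z`, `hp` and `HYDROPHOBIC` are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): subwords in exactly one of P_x(k), P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Assumed hydrophobic set (inferred, see above); every other letter maps to 0.
HYDROPHOBIC = frozenset("AVLIMFWPG")

def hp(seq: str) -> str:
    """Hydrophobic-polar encoding: 1 = hydrophobic, 0 = polar."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(z("abab", "abba", 2))  # 1: only "bb" is not shared
print(hp("KEVEQDPGPL"))      # 0010001111
```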
Pair \(Z_2\) Length of longest common substring (contiguous)
3UTP_1,5NDZ_1 205 3
3UTP_1,7QOS_1 175 3
5NDZ_1,7QOS_1 156 5
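The last column (3, 3, 5) can only be a longest common contiguous substring. A longest common subsequence would be far longer: for example, 3UTP_1 has 10 N's and 5NDZ_1 has 30, so "N" repeated ten times is already a common subsequence. Below is a standard dynamic-programming sketch for the substring version. The function name is hypothetical.

```python
def longest_common_substring_length(x: str, y: str) -> int:
    """Length of a longest contiguous string occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # cur[j]: longest common suffix of x[:i] and y[:j]
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Substring vs subsequence: "ACE" is a subsequence of "ABCDE",
# but the two share no contiguous substring longer than 1.
print(longest_common_substring_length("ACE", "ABCDE"))  # 1
```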

Newick tree

 
(
	3UTP_1:10.40,
	(
		7QOS_1:78,5NDZ_1:78
	):22.40
);

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
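For 3UTP_1 and 5NDZ_1, a rough estimate of the expected distance, using the length \(820=201+619\) of the concatenated sequences, is \((0.85)(0.8)\left(\frac{820}{\log_{20} 820}-\frac{201}{\log_{20}201}\right)\approx 171.\)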
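The arithmetic of this estimate can be checked directly. Here \(n/\log_{20} n\) is the usual normalizing value for the LZ76 complexity of a long random sequence over 20 letters. The factors 0.85 and 0.8 are taken from the formula as given, since their derivation is not shown on this page. The unrounded value is about 171.76, which is reported as 171. The helper name `random_lz` is hypothetical.

```python
import math

def random_lz(n: int) -> float:
    """n / log_20 n, the normalizing LZ76 value for length n over 20 letters."""
    return n / math.log(n, 20)

n1, n2 = 201, 619   # lengths of 3UTP_1 and 5NDZ_1
n = n1 + n2         # 820, length of the concatenation
estimate = 0.85 * 0.8 * (random_lz(n) - random_lz(n1))
print(int(estimate))  # 171
```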
Status Protein 1 Protein 2 \(d\) \(d_1/2\)
Query variables 3UTP_1 5NDZ_1 217 143.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \text{if } \alpha=0,\\ d_1/2 & \text{if } \alpha=1/2. \end{cases} \]
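The Python sketch below computes \(d\), \(d_1\) and \(\delta\). It assumes Otu and Sayood's definitions in terms of the LZ76 complexity \(c\) of the concatenations \(SQ\) and \(QS\). The two increments are \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\), \(d\) is their maximum, and \(d_1\) is their sum. With these definitions, \(\alpha=0\) gives \(d\) and \(\alpha=1/2\) gives \(d_1/2\), as in the case analysis above. The LZ76 routine is a textbook phrase count and may differ from the site's implementation in edge conventions. All function names are hypothetical.

```python
def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive history of s."""
    n, i, c = len(s), 0, 0
    while i < n:
        l = 1
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def increments(s: str, q: str) -> tuple[int, int]:
    """c(SQ) - c(S) and c(QS) - c(Q)."""
    c = lz76_complexity
    return c(s + q) - c(s), c(q + s) - c(q)

def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """(d, d1): the maximum and the sum of the two increments."""
    a, b = increments(s, q)
    return max(a, b), a + b

def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

s, q = "KEVEQDPGPLSVPEGAIVSL", "MIKEAIVERIVNKLNENQKE"
d, d1 = otu_sayood(s, q)
print(d, d1 / 2, delta(s, q, 0), delta(s, q, 0.5))
```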