CoV2D Browser™

Parikh vectors
8WQM_1 2EQB_1 8DUB_1 Letter Amino acid
12 3 5 Y Tyrosine
24 13 13 D Aspartic acid
11 6 9 Q Glutamine
12 10 9 N Asparagine
11 1 13 H Histidine
21 17 12 I Isoleucine
35 15 47 L Leucine
23 15 13 K Lysine
9 3 16 M Methionine
29 10 15 A Alanine
21 6 11 R Arginine
27 12 13 V Valine
26 10 7 F Phenylalanine
2 2 3 W Tryptophan
21 14 11 T Threonine
5 1 1 C Cysteine
20 3 9 P Proline
38 9 23 S Serine
32 11 15 E Glutamic acid
38 13 10 G Glycine
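
Each column above is the Parikh vector of one sequence, that is, the number of occurrences of each amino-acid letter. A minimal Python sketch, assuming the sequence is available as a plain string of one-letter codes; the names `LETTERS` and `parikh_vector` are illustrative and do not come from the CoV2D site:

```python
from collections import Counter

# Letter order copied from the rows of the table above.
LETTERS = "YDQNHILKMARVFWTCPSEG"

def parikh_vector(seq):
    """Return the per-letter counts of seq, listed in the order of LETTERS."""
    counts = Counter(seq)
    return [counts[letter] for letter in LETTERS]

# Example on the first ten residues of the 8WQM_1 sequence.
vec = parikh_vector("GSHMVRKNPK")
print(dict(zip(LETTERS, vec)))
```

The entries of a Parikh vector sum to the sequence length; the three columns above sum to 417, 174 and 255, matching the lengths listed in the next table.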

>8WQM_1|Chain A|4-hydroxyphenylpyruvate dioxygenase|Arabidopsis thaliana (3702)
>2EQB_1|Chain A|Ras-related protein SEC4|Saccharomyces cerevisiae (4932)
>8DUB_1|Chains A, B|Estrogen receptor|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8WQM , Knot 175 417 0.84 40 224 401
GSHMVRKNPKSDKFKVKRFHHIEFWCGDATNVARRFSWGLGMRFSAKSDLSTGNMVHASYLLTSGDLRFLFTAPYSPSLSAGEIKPTTTASIPSFDHGSCRSFFSSHGLGVRAVAIEVEDAESAFSISVANGAIPSSPPIVLNEAVTIAEVKLYGDVVLRYVSYKAEDTEKSEFLPGFERVEDASSFPLDYGIRRLDHAVGNVPELGPALTYVAGFTGFHQFAEFTADDVGTAESGLNSAVLASNDEMVLLPINEPVHGTKRKSQIQTYLEHNEGAGLQHLALMSEDIFRTLREMRKRSSIGGFDFMPSPPPTYYQNLKKRVGDVLSDDQIKECEELGILVDRDDQGTLLQIFTKPLGDRPTIFIEIIQRVGCMMKDEEGKAYQSGGCGGFGKGNFSELFKSIEEYEKTLEAKQLVG
2EQB , Knot 83 174 0.82 40 128 167
GPLGSSIMKILLIGDSGVGKSCLLVRFVEDKFNPSFITTIGIDFKIKTVDINGKKVKLQLWDTAGQERFRTITTAYYRGAMGIILVYDVTDERTFTNIKQWFKTVNEHANDEAQLLLVGNKSDMETRVVTADQGEALAKELGIPFIESSAKNDDNVNEIFFTLAKLIQEKIDSN
8DUB , Knot 113 255 0.81 40 161 244
MSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQVHLLESAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKSVEGMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKSKNVVPSYDLLLEMLDAHRLHAPTS
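
The fourth column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below computes an LZ76-style phrase count and that ratio. It assumes the exhaustive-history parsing of Lempel and Ziv (1976); the page does not say which variant it uses, so the counts produced here may differ from the table. The function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of phrases in an LZ76-style exhaustive-history parsing of w."""
    n = len(w)
    i = 0      # start of the current phrase
    count = 0
    while i < n:
        k = 1
        # Extend the phrase while it already occurs in the text that
        # precedes its own last symbol (overlapping copies are allowed).
        while i + k <= n and w[i:i + k] in w[:i + k - 1]:
            k += 1
        count += 1
        i += k
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_alphabet n), as in the table header."""
    return lz * math.log(n, alphabet_size) / n

# Classic binary example, parsed as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76_complexity("0001101001000101"))
# Ratio for the 8WQM row (LZ = 175, n = 417).
print(normalized_lz(175, 417))
```

The substring search makes this quadratic or worse in the sequence length, which is acceptable for sequences of a few hundred residues.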

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
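
The subword sets and their cardinalities can be sketched in a few lines of Python. This assumes that subwords are contiguous and that the argument of \(P_w\) is the subword length, which is how the columns \(p_w(1)\), \(p_w(2)\), \(p_w(3)\) appear to be used; the function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct contiguous subwords of length k in w."""
    return len(subwords(w, k))

print(sorted(subwords("abab", 2)))    # the two distinct 2-subwords of "abab"
print(subword_complexity("abab", 2))
```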

\(|P_{f(8WQM_1)}(2) \setminus P_{f(2EQB_1)}(2)|=130\), \(|P_{f(2EQB_1)}(2) \setminus P_{f(8WQM_1)}(2)|=34\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100110001000010100100101101010011001011111010100010010110100110010101110110010101101010001011010010000110001111011110100100110101101111001111100110110101010111001000100000001111100100100111001100100111011011111001111011001101010011010011001111000011111100110100000010001000011110011110001100100100000111101110111000001000110110000100000111110000010110110011100101110110011011000010100011011110101001100100000010100111
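
The page does not state which residues count as hydrophobic. Comparing the start of the 8WQM_1 sequence with the start of the binary string suggests that A, F, G, I, L, M, P, V and W map to 1 and the other residues to 0. The sketch below encodes a sequence under that inferred assumption; the set `HYDROPHOBIC` and the function name are hypothetical, and the treatment of Q in particular was not checked against the string.

```python
# Assumed hydrophobic residues, inferred from the first residues of
# Sequence 1 and the binary string above (not stated by the page).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

# First 33 residues of the 8WQM_1 sequence.
print(hp_encode("GSHMVRKNPKSDKFKVKRFHHIEFWCGDATNVA"))
```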
Pair \(Z_2\) Length of longest common substring
8WQM_1,2EQB_1 164 4
8WQM_1,8DUB_1 155 3
2EQB_1,8DUB_1 159 5
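
Both columns can be sketched in Python. \(Z_k\) is the size of the symmetric difference of the two subword sets; for the first pair, \(130+34=164\). The last column is read here as the length of the longest common contiguous subword, an assumption based on how small the values are. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-1], y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))
print(longest_common_substring_length("abcde", "xbcdy"))
```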

Newick tree

 
(2EQB_1:81.81,(8WQM_1:77.5,8DUB_1:77.5):4.31);
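
As a consistency check on the tree, the sketch below parses the same topology written in standard parenthesized Newick form and sums branch lengths from the root to each leaf. It is a minimal parser for this simple case only (no quoted labels, comments or whitespace), and the function name is illustrative.

```python
def leaf_depths(newick):
    """Return {leaf name: root-to-leaf distance} for a simple Newick string."""
    s = newick.strip().rstrip(";")
    pos = 0

    def parse_length():
        nonlocal pos
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            return float(s[start:pos])
        return 0.0  # no branch length given

    def parse_node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1
            leaves = []
            while True:
                leaves.extend(parse_node())
                if pos < len(s) and s[pos] == ",":
                    pos += 1
                else:
                    break
            pos += 1  # skip the closing ")"
            while pos < len(s) and s[pos] not in ":,()":
                pos += 1  # skip an optional internal label
            length = parse_length()
            return [(name, d + length) for name, d in leaves]
        start = pos
        while pos < len(s) and s[pos] not in ":,()":
            pos += 1
        name = s[start:pos]
        return [(name, parse_length())]

    return dict(parse_node())

tree = "(2EQB_1:81.81,(8WQM_1:77.5,8DUB_1:77.5):4.31);"
print(leaf_depths(tree))
```

All three leaves sit at the same distance from the root, since \(77.5+4.31=81.81\).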

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{591}{\log_{20} 591}-\frac{174}{\log_{20}174})\approx 119\), where \(591=417+174\) is the combined length of the two sequences.
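
The arithmetic in this estimate can be reproduced with a short Python sketch. The function and parameter names are illustrative; the constants 0.85 and 0.8 are taken from the formula as given, and 591 is assumed to be the combined length \(417+174\) of the two query sequences.

```python
import math

def expected_distance(n_concat, n_single, alphabet_size=20, a=0.85, b=0.8):
    """a * b * (n_concat / log n_concat - n_single / log n_single),
    with logarithms taken to base alphabet_size."""
    def scale(n):
        return n / math.log(n, alphabet_size)
    return a * b * (scale(n_concat) - scale(n_single))

value = expected_distance(417 + 174, 174)
print(value)  # between 119 and 120
```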
Status Protein1 Protein2 d d1/2
Query variables 8WQM_1 2EQB_1 152 104.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
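
The interpolation can be sketched in Python. This assumes that min and max refer to the two complexity increments \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\) used by Otu and Sayood, where \(SQ\) and \(QS\) are concatenations; the page does not spell this out, and the function names are illustrative.

```python
def increments(c_s, c_q, c_sq, c_qs):
    """The two increments c(SQ) - c(S) and c(QS) - c(Q)."""
    return c_sq - c_s, c_qs - c_q

def delta(inc_a, inc_b, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    lo, hi = min(inc_a, inc_b), max(inc_a, inc_b)
    return alpha * lo + (1 - alpha) * hi

# With d = 152 and d1/2 = 104.5 from the table, the smaller increment
# would be 2 * 104.5 - 152 = 57 under the assumption above.
print(delta(57, 152, 0))    # alpha = 0 gives d
print(delta(57, 152, 0.5))  # alpha = 1/2 gives d1/2
```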