CoV2D Browser™

Parikh vectors
5YWH_1 1SOJ_1 1MGX_1 Letter Amino acid
9 9 1 M Methionine
29 18 4 F Phenylalanine
2 7 1 W Tryptophan
12 15 2 Y Tyrosine
13 14 2 Q Glutamine
25 17 3 K Lysine
5 7 2 C Cysteine
36 46 2 L Leucine
21 20 3 R Arginine
26 28 1 D Aspartic acid
41 22 2 G Glycine
21 24 0 I Isoleucine
21 19 0 P Proline
41 25 2 S Serine
21 16 3 T Threonine
29 19 3 V Valine
33 28 1 A Alanine
15 31 3 N Asparagine
33 38 12 E Glutamic acid
12 17 0 H Histidine
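
A Parikh vector simply counts how often each letter occurs in a word. As a minimal sketch (the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices, not part of the page), the 1MGX_1 column can be recomputed from the sequence listed further down:

```python
from collections import Counter

# Letters in the same row order as the table above (illustrative constant).
AMINO_ACIDS = "MFWYQKCLRDGIPSTVANEH"

def parikh_vector(sequence):
    """Return a dict letter -> number of occurrences, keeping zero counts."""
    counts = Counter(sequence)
    return {letter: counts[letter] for letter in AMINO_ACIDS}

# Sequence of 1MGX_1 as listed on this page.
seq_1mgx = "YNSGKLEEFVQGNLERECMEEKCSFEEAREVFENTERTTEFWKQYVD"
vector_1mgx = parikh_vector(seq_1mgx)

for letter in AMINO_ACIDS:
    print(letter, vector_1mgx[letter])
```

The counts sum to the sequence length, which is a quick sanity check on any column of the table.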

>5YWH_1|Chains A, B|4-hydroxyphenylpyruvate dioxygenase|Arabidopsis thaliana (3702)
>1SOJ_1|Chains A, B, C, D, E, F, G, H, I, J, K, L|cGMP-inhibited 3',5'-cyclic phosphodiesterase B|Homo sapiens (9606)
>1MGX_1|Chain A|COAGULATION FACTOR IX|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5YWH , Knot 187 445 0.85 40 234 425
MGHQNAAVSENQNHDDGAASSPGFKLVGFSKFVRKNPKSDKFKVKRFHHIEFWCGDATNVARRFSWGLGMRFSAKSDLSTGNMVHASYLLTSGDLRFLFTAPYSPSLSAGEIKPTTTASIPSFDHGSCRSFFSSHGLGVRAVAIEVEDAESAFSISVANGAIPSSPPIVLNEAVTIAEVKLYGDVVLRYVSYKAEDTEKSEFLPGFERVEDASSFPLDYGIRRLDHAVGNVPELGPALTYVAGFTGFHQFAEFTADDVGTAESGLNSAVLASNDEMVLLPINEPVHGTKRKSQIQTYLEHNEGAGLQHLALMSEDIFRTLREMRKRSSIGGFDFMPSPPPTYYQNLKKRVGDVLSDDQIKECEELGILVDRDDQGTLLQIFTKPLGDRPTIFIEIIQRVGCMMKDEEGKAYQSGGCGGFGKGNFSELFKSIEEYEKTLEAKQLVG
1SOJ , Knot 177 420 0.84 40 235 402
EQEVSLDLILVEEYDSLIEKMSNWNFPIFELVEKMGEKSGRILSQVMYTLFQDTGLLEIFKIPTQQFMNYFRALENGYRDIPYHNRIHATDVLHAVWYLTTRPVPGLQQIHNGCGTGNETDSDGRINHGRIAYISSKSCSNPDESYGCLSSNIPALELMALYVAAAMHDYDHPGRTNAFLVATNAPQAVLYNDRSVLENHHAASAWNLYLSRPEYNFLLHLDHVEFKRFRFLVIEAILATDLKKHFDFLAEFNAKANDVNSNGIEWSNENDRLLVCQVCIKLADINGPAKVRDLHLKWTEGIVNEFYEQGDEEANLGLPISPFMDRSSPQLAKLQESFITHIVGPLCNSYDAAGLLPGQWLEAEEDNDTESGDDEDGEELDTEDEEMENNLNPKPPRRKSRRRIFCQLMHHLTENHKIWK
1MGX , Knot 27 47 0.73 34 38 45
YNSGKLEEFVQGNLERECMEEKCSFEEAREVFENTERTTEFWKQYVD
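
The page does not state which Lempel-Ziv variant produces \(\mathrm{LZ}(w)\). The sketch below assumes the exhaustive-history parsing of Lempel and Ziv (1976), in the form popularized by Kaspar and Schuster, and may not reproduce the table's values exactly. The normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) is taken from the table header; the function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of phrases in an LZ76-style exhaustive-history parsing of w."""
    n = len(w)
    i = 0
    phrases = 0
    while i < n:
        length = 1
        # Extend the candidate phrase while it already occurs starting at an
        # earlier position (overlap with the current phrase is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76_complexity("0001101001000101"))
# Normalization applied to the 5YWH row of the table (LZ = 187, n = 445).
print(normalized_lz(187, 445))
```

With the table's own numbers the ratio for 5YWH comes out near 0.855, which suggests the page truncates rather than rounds to two decimals.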

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(5YWH_1)}(2) \setminus P_{f(1SOJ_1)}(2)|=77\), \(|P_{f(1SOJ_1)}(2) \setminus P_{f(5YWH_1)}(2)|=78\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) numerator of the Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 1100011100000000111001110111100110001000010100100101101010011001011111010100010010110100110010101110110010101101010001011010010000110001111011110100100110101101111001111100110110101010111001000100000001111100100100111001100100111011011111001111011001101010011010011001111000011111100110100000010001000011110011110001100100100000111101110111000001000110110000100000111110000010110110011100101110110011011000010100011011110101001100100000010100111
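
The page does not say which residues count as hydrophobic. The sketch below assumes the set G, A, V, L, I, P, F, M, W, which was inferred here by comparing the opening residues of 5YWH_1 with the binary string above and is not a documented choice of the page. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic set, inferred from the start of the string above.
HYDROPHOBIC = set("GAVLIPFMW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First ten residues of 5YWH_1 as listed on this page.
prefix = "MGHQNAAVSE"
print(hp_encode(prefix))
```

Other hydrophobic-polar classifications exist (some treat C or Y as hydrophobic, for instance), so the set should be adjusted if a different convention is intended.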
Pair \(Z_2\) Length of longest common subword
5YWH_1,1SOJ_1 155 3
5YWH_1,1MGX_1 214 3
1SOJ_1,1MGX_1 227 3
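
The quantity \(Z_k\) is the size of the symmetric difference of two subword sets, so the first row is \(77+78=155\). A minimal sketch of the definition, with illustrative function names and toy inputs rather than the full protein sequences:

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {AA, AB}, P_y(2) = {AB, BB}.
print(z_distance("AAB", "ABB", 2))
```

Dividing `z_distance(x, y, k)` by the size of the union of the two subword sets would give the Jaccard distance itself.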

Newick tree

 
[
	1MGX_1:11.24,
	[
		5YWH_1:77.5,1SOJ_1:77.5
	]:41.74
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{865 }{\log_{20} 865}-\frac{420}{\log_{20}420})\approx 118\).
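
The estimate can be checked numerically. In this sketch 865 is read as \(445+420\), the length of the two sequences concatenated; the factors 0.85 and 0.8 are copied from the formula as given, and the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing scale used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 445, 420  # lengths of 5YWH_1 and 1SOJ_1
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))
print(expected)
```

The result is a little under 119, so the stated 118 corresponds to dropping the fractional part.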
Status Protein1 Protein2 d d1/2
Query variables 5YWH_1 1SOJ_1 149 146.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
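
The interpolation can be made concrete with the query row above. As an assumption based on Otu and Sayood's definitions, the two quantities being combined are taken to be \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). From \(d=149\) (the maximum) and \(d_1/2=146.5\) (the mean), the minimum must be \(2\cdot 146.5-149=144\). The function name `delta` is illustrative.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Values recovered from the query row: max = d = 149, mean = d1/2 = 146.5.
larger = 149
smaller = 2 * 146.5 - larger  # 144.0

print(delta(smaller, larger, 0))    # alpha = 0 gives d
print(delta(smaller, larger, 0.5))  # alpha = 1/2 gives d1/2
```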