CoV2D Browser™

Parikh vectors
2FTX_1 5YQZ_1 8OIN_1 Letter Amino acid
5 37 12 G Glycine
3 34 3 F Phenylalanine
5 31 8 S Serine
1 19 1 Y Tyrosine
1 12 3 C Cysteine
2 23 9 Q Glutamine
5 14 4 H Histidine
2 24 10 I Isoleucine
13 73 24 L Leucine
1 19 1 W Tryptophan
6 35 9 R Arginine
9 25 6 D Aspartic acid
2 29 6 T Threonine
11 45 14 V Valine
11 45 29 A Alanine
2 26 4 N Asparagine
4 25 16 E Glutamic acid
1 29 19 K Lysine
3 12 4 M Methionine
3 18 16 P Proline

>2FTX_1|Chain A|Hypothetical 25.2 kDa protein in AFG3-SEB2 intergenic region|Saccharomyces cerevisiae (4932)
>5YQZ_1|Chain A[auth R]|Glucagon receptor,Endolysin,Glucagon receptor|Homo sapiens (9606)
>8OIN_1|Chains A[auth B1], B[auth B2], C[auth B3], D[auth B4], E[auth B5], F[auth B6]|Mitochondrial ribosomal protein L12|Sus scrofa (9823)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2FTX , Knot 48 90 0.80 40 75 87
MNDAAEVALYERLLQLRVLPGASDVHDVRFVFGDDSRCWIEVAMHGDHVIGNSHPALDPKSRATLEHVLTVQGDLAAFLVVARDMLLASL
5YQZ , Knot 233 575 0.85 40 273 534
GAPQVMDFLFEKWKLYGDQCHHNLSLLPPPTELVCNRTFDKYSCWPDTPANTTANISCPWYLPWHHKVQHRFVFKRCGPDGQWVRGPRGQPWRDASQCQMDGEEIEVQKEVAKMYSSFQVMYTVGYSLSLGALLLALAILGGLSKLHCTANAIHANLFASFVLKASSVLVIDGLLRTRYSQKIGDDLSVSTWLSDGAVAGCRVAAVFMQYGIVANYCWLLVEGLYLHNLLGLATNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYERSFFSLYLGIGWGAPMLFVVPWAVVKCLFENVQCWTSNDNMGFWWILRFPVFLAILINFFIFVRIVQLLVAKLRARQMHHTDYKFRLAKSTLTLIPLLGVHEVVFAFVTDEHAQGTLRSAKLFFDLFLSSFQGLLVAVLYCFLNKEVQSELRRRWHRWRLGKVLWEERNTSNEFLEVLFQ
8OIN , Knot 89 198 0.79 40 125 185
MLPAAASSLWGPCFGLRAAALRVARHQGPRLCGVRLMRCSSHRKGEALAGAPLDNAPKEYPPKIQQLVQDIASLTLLEISDLNELLKKTLKIQDVGFMPMGAVAPGAPPAAAAPEAAEEDLPKRKEQTHFTVRLTEAKPVDKVKLIKEIKSHIQGINLVQAKKLVESLPQEIKANVPKAEAEKIKAALEAVGGTVVLE
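
The LZ-complexity column and its normalization by \(n/\log_{20} n\) can be sketched as follows. This is one common reading of the Lempel-Ziv (1976) parsing, in which each new phrase is the shortest prefix of the remaining text that cannot be copied from an earlier starting position. It is an assumption that the page uses this variant, so the counts need not reproduce the table exactly; the function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in a Lempel-Ziv (1976) style parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting at an earlier position
        # (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n); requires n > 1."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))
# Ratio for the 2FTX row, using the tabulated values LZ = 48 and n = 90.
print(round(normalized_lz(48, 90), 2))
```

With the tabulated values for 2FTX the second line prints 0.8, matching the ratio column.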

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
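
As an illustration of these definitions, here is a short sketch that collects the distinct length-\(n\) subwords of a word and counts them. The names `subwords` and `subword_complexity` are hypothetical, and the toy word is not taken from the page.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Cardinality of the set of distinct length-n subwords of w."""
    return len(subwords(w, n))

# Toy word: the length-2 subwords of ABABA are AB and BA.
print(sorted(subwords("ABABA", 2)))
print(subword_complexity("ABABA", 3))
```

For a word of length \(m\) the count is at most \(m-n+1\), since that is the number of positions at which a length-\(n\) subword can start.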

\(|P_{f(\text{2FTX\_1})}(2) \setminus P_{f(\text{5YQZ\_1})}(2)|=14\), \(|P_{f(\text{5YQZ\_1})}(2) \setminus P_{f(\text{2FTX\_1})}(2)|=212\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 100110111000110101111100100101111000001101110100111000111010001010011010101111111100111101
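
The hydrophobic-polar string can be reproduced from Sequence 1 with a two-class letter mapping. The page does not state which residues it treats as hydrophobic; the set below is inferred by aligning the printed string with the 2FTX_1 sequence, so it is an assumption, and the names `HYDROPHOBIC` and `hp_string` are hypothetical.

```python
# Residues mapped to 1; inferred from the printed string, not stated by the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(seq):
    """Encode a protein sequence as 1 (hydrophobic) / 0 (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

seq_2ftx = ("MNDAAEVALYERLLQLRVLPGASDVHDVRFVFGDDSRCWIEVAMHGDHVIGNSHPALDPK"
            "SRATLEHVLTVQGDLAAFLVVARDMLLASL")

print(hp_string(seq_2ftx))
```

Because all 20 amino acids occur in 2FTX_1, the printed string determines the class of every letter under this reading.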
Pair \(Z_2\) Length of longest common subword
2FTX_1,5YQZ_1 226 4
2FTX_1,8OIN_1 124 3
5YQZ_1,8OIN_1 210 4
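
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, since values of 3 or 4 are far too small for a subsequence of sequences this long; that reading is an assumption. The function names and the toy words are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block that ended just before both letters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("ABCAB", "BCABD", 2))
print(longest_common_subword_length("ABCAB", "BCABD"))
```

On the toy pair the only unshared length-2 subword is BD, so \(Z_2=1\), and the longest shared block is BCAB, of length 4.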

Newick tree

 
(
	5YQZ_1:12.75,
	(
		2FTX_1:62,8OIN_1:62
	):58.75
);

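Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)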
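Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{665}{\log_{20} 665}-\frac{90}{\log_{20} 90}\right)\approx 167\), where \(665=575+90\) is the length of the concatenation of the two sequences and 0.85 and 0.8 are their normalized LZ ratios from the table above.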
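
The arithmetic behind this estimate can be checked with a short sketch. The lengths 90 and 575 and the ratios 0.85 and 0.80 are the values tabulated on this page; the helper name `size_term` is hypothetical.

```python
import math

def size_term(n, alphabet_size=20):
    """The quantity n / log_b(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

ratio_long, ratio_short = 0.85, 0.80   # normalized LZ ratios of 5YQZ and 2FTX
n_short, n_long = 90, 575              # sequence lengths of 2FTX and 5YQZ

expected = ratio_long * ratio_short * (size_term(n_short + n_long) - size_term(n_short))
print(round(expected, 1))
```

The result is a little under 168, in line with the value 167 reported on the page.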
Status Protein1 Protein2 d d1/2
Query variables 2FTX_1 5YQZ_1 210 119.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
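
The interpolation can be illustrated numerically. In the sketch below, the two inputs stand for the two quantities whose minimum and maximum appear in the formula; reading them as the two directional LZ-complexity increments is an assumption. The value 29 is not printed on the page: it is back-calculated from the query row, since \(d=210\) and \(d_1/2=119.5\) give \(d_1=239\) and hence a smaller term of \(239-210=29\). The function name `delta` is hypothetical.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

# Larger term 210 is d from the query row; smaller term 29 is back-calculated.
print(delta(0, 210, 29))
print(delta(0.5, 210, 29))
```

With \(\alpha=0\) this returns 210, the tabulated \(d\), and with \(\alpha=1/2\) it returns 119.5, the tabulated \(d_1/2\).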
