CoV2D BrowserTM

Parikh vectors
7DCR_1 7APT_1 5GWL_1 Letter Amino acid
27 4 0 Y Tyrosine
47 2 0 Q Glutamine
24 2 0 M Methionine
29 6 0 F Phenylalanine
38 6 0 P Proline
56 8 0 S Serine
44 7 0 D Aspartic acid
13 2 4 C Cysteine
45 15 2 G Glycine
70 16 0 K Lysine
50 3 0 R Arginine
43 4 0 N Asparagine
14 3 0 H Histidine
45 9 0 V Valine
66 6 2 T Threonine
5 1 0 W Tryptophan
42 7 0 A Alanine
68 10 0 E Glutamic acid
63 8 0 I Isoleucine
87 9 0 L Leucine
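
As a minimal sketch of how one column of the table above can be obtained, the following counts each letter of a sequence in the row order used above. The names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the site.

```python
from collections import Counter

# Row order of the table above (Y, Q, M, ..., L).
AMINO_ACIDS = "YQMFPSDCGKRNHVTWAEIL"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# The 5GWL_1 sequence: only C, G and T are nonzero (4, 2 and 2).
print(parikh_vector("CCTGCCTG"))
```

Letters outside the alphabet are ignored by this sketch, so the entries need not sum to the sequence length for arbitrary input.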

>7DCR_1|Chain A[auth x]|PRP2 isoform 1|Saccharomyces cerevisiae (4932)
>7APT_1|Chain A|Peptidyl-prolyl cis-trans isomerase FKBP5|Homo sapiens (9606)
>5GWL_1|Chain A|DNA (5'-D(*CP*CP*TP*GP*CP*CP*TP*G)-3')|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7DCR , Knot 328 876 0.84 40 297 798
MSSITSETGKRRVKRTYEVTRQNDNAVRIEPSSLGEEEDKEAKDKNSALQLKRSRYDPNKVFSNTNQGPEKNNLKGEQLGSQKKSSKYDEKITSNNELTTKKGLLGDSENETKYASSNSKFNVEVTHKIKNAKEIDKINRQRMWEEQQLRNAMAGQSDHPDDITLEGSDKYDYVFDTDAMIDYTNEEDDLLPEEKLQYEARLAQALETEEKRILTIQEARKLLPVHQYKDELLQEIKKNQVLIIMGETGSGKTTQLPQYLVEDGFTDQGKLQIAITQPRRVAATSVAARVADEMNVVLGKEVGYQIRFEDKTTPNKTVLKYMTDGMLLREFLTDSKLSKYSCIMIDEAHERTLATDILIGLLKDILPQRPTLKLLISSATMNAKKFSEFFDNCPIFNVPGRRYPVDIHYTLQPEANYIHAAITTIFQIHTTQSLPGDILVFLTGQEEIERTKTKLEEIMSKLGSRTKQMIITPIYANLPQEQQLKIFQPTPENCRKVVLATNIAETSLTIDGIRYVIDPGFVKENSYVPSTGMTQLLTVPCSRASVDQRAGRAGRVGPGKCFRIFTKWSYLHELELMPKPEITRTNLSNTVLLLLSLGVTDLIKFPLMDKPSIPTLRKSLENLYILGALNSKGTITRLGKMMCEFPCEPEFAKVLYTAATHEQCQGVLEECLTIVSMLHETPSLFIGQKRDAAASVLSEVESDHILYLEIFNQWRNSKFSRSWCQDHKIQFKTMLRVRNIRNQLFRCSEKVGLVEKNDQARMKIGNIAGYINARITRCFISGFPMNIVQLGPTGYQTMGRSSGGLNVSVHPTSILFVNHKEKAQRPSKYVLYQQLMLTSKEFIRDCLVIPKEEWLIDMVPQIFKDLIDDKTNRGRR
7APT , Knot 65 128 0.82 40 104 125
GAPATVTEQGEDITSKKDRGVLKIVKRVGNGEETPMIGDKVYVHYKGKLSNGKKFDSSHDRNEPFVFSLGKGQVIKAWDIGVATMKKGEICHLLCKPEYAYGSAGSLPKIPSNATLFFEIELLDFKGE
5GWL , Knot 4 8 0.34 6 4 4
CCTGCCTG
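
The LZ-complexity and normalized columns can be illustrated with a short sketch. It assumes the exhaustive-history LZ76 parsing, in which each phrase is extended until it no longer occurs earlier in the word (overlaps allowed); whether the site uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import math

def lz76(w):
    """Number of phrases in an exhaustive-history LZ76 parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs starting before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size (needs n >= 2)."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))

w = "CCTGCCTG"
print(lz76(w))                     # 4, parsed as C | CT | G | CCTG
print(round(normalized_lz(w), 3))  # 0.347
```

The tabulated ratios (0.34 here) appear to be truncated rather than rounded to two decimals.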

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(7DCR_1)}(2) \setminus P_{f(7APT_1)}(2)|=213\), \(|P_{f(7APT_1)}(2) \setminus P_{f(7DCR_1)}(2)|=20\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100100001000100000100000011010100110000001000001101000000100110000011000010100110000000000010000010000111100000000100000101010001001001001000011000010011110000100101010000001100011100000000111000100010110110000001101001001111000000110010000111111001010000110011001100010101110010011100111011001011110011001010000010001100100111100110000100000111001000011001111110011100101011100101010010011000111011100011010001010100101110011010000011101111101000100000010011001100000111011010110000101101010000011110011000101011001101111000001100110011011000101000110110111100101100100100101110101000010001111101110011011110010110100010010111110001010011011001100101101100110000001110001011011000101111000011101100100001101011001000010001000001010011010010001100000111100000101011011101010100011011110110111010001100011101010100111100000100100011000111000011000111100011101110110011000000100

Pair \(Z_2\) Length of longest common subword
7DCR_1,7APT_1 233 3
7DCR_1,5GWL_1 299 2
7APT_1,5GWL_1 108 1
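
The quantities in this table can be sketched directly from the definitions of \(P_w(k)\) and \(Z_k\). The function names below are illustrative, and the last function measures the longest contiguous block shared by two words, which is an assumption about how the final column is computed.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords (intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

w = "CCTGCCTG"                      # the 5GWL_1 sequence
print(len(subwords(w, 2)))          # 4: CC, CT, TG, GC
print(len(subwords(w, 3)))          # 4: CCT, CTG, TGC, GCC
print(z(w, "CCTG", 2))              # 1: only GC is not shared
```

Since \(Z_k\) is a symmetric difference, it equals \(p_x(k)+p_y(k)\) whenever the two words share no subword of length \(k\).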

Newick tree

 
[
	7DCR_1:15.57,
	[
		7APT_1:54,5GWL_1:54
	]:97.57
]

Let \(d\) be the Otu–Sayood distance \(d\), and let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{1004 }{\log_{20} 1004}-\frac{128}{\log_{20}128})\approx 242.\)
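
The arithmetic in this estimate can be checked with a few lines. The constants 0.85 and 0.8 and the lengths 1004 and 128 are taken from the formula as written; the helper name `expected_lz` is hypothetical.

```python
import math

def expected_lz(n, alphabet_size=20):
    """The normalizing quantity n / log_b(n) for a word of length n."""
    return n / math.log(n, alphabet_size)

estimate = 0.85 * 0.8 * (expected_lz(1004) - expected_lz(128))
print(round(estimate))  # 242
```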
Status Protein1 Protein2 d d1/2
Query variables 7DCR_1 7APT_1 306 174
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
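
A minimal sketch of this interpolation follows. It assumes that min and max range over the two LZ-complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), as in the unnormalized Otu–Sayood distances, and that \(c\) is an exhaustive-history LZ76 phrase count; both the complexity variant and the function names are assumptions, not the site's code.

```python
def lz76(w):
    """Number of phrases in an exhaustive-history LZ76 parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y):
    """The pair (c(xy) - c(x), c(yx) - c(y))."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "CCTGCCTG", "AAAAAA"
print(increments(x, y))   # (1, 3)
print(delta(x, y, 0))     # 3, the max, playing the role of d
print(delta(x, y, 0.5))   # 2.0, half the sum, playing the role of d1/2
```

With \(\alpha=0\) only the larger increment counts, while \(\alpha=1/2\) averages the two, which is why the table reports \(d_1/2\) rather than \(d_1\).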