CoV2D Browser™

Parikh vectors
7OIJ_1 4FCQ_1 4TXP_1 Letter Amino acid
53 15 6 G Glycine
67 16 6 S Serine
34 7 8 Y Tyrosine
66 11 6 V Valine
69 15 16 A Alanine
47 16 9 D Aspartic acid
67 17 15 K Lysine
36 7 6 M Methionine
37 20 8 I Isoleucine
137 18 17 L Leucine
52 6 7 P Proline
43 19 9 T Threonine
26 0 4 C Cysteine
51 11 7 Q Glutamine
87 27 10 E Glutamic acid
38 4 7 H Histidine
50 10 6 F Phenylalanine
18 1 1 W Tryptophan
60 8 8 R Arginine
46 8 7 N Asparagine
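A Parikh vector records how many times each letter occurs in a word and ignores the order of the letters. Each column of the table above is one such vector, and its entries add up to the sequence length (for example, 1084 for 7OIJ_1 and 236 for 4FCQ_1). The sketch below is a minimal version that assumes the sequence contains only the 20 standard one-letter codes. The names `parikh_vector` and `AMINO_ACIDS` are hypothetical and do not come from the site.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes, in the row order of the table above.
AMINO_ACIDS = "GSYVADKMILPTCQEHFWRN"


def parikh_vector(seq: str) -> dict[str, int]:
    """Count how often each standard amino acid occurs in seq (its Parikh vector)."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    v = parikh_vector("GAGA")
    print(v["G"], v["A"], v["W"])  # 2 2 0
    # The entries add up to the length when only standard residues occur.
    print(sum(v.values()))  # 4
```

Run on the 4TXP_1 sequence, this should give the table's third column, for example C = 4, W = 1, and M = 6.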

>7OIJ_1|Chain A[auth AAA]|Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform|Mus musculus (10090)
>4FCQ_1|Chain A|Heat shock protein HSP 90-alpha|Homo sapiens (9606)
>4TXP_1|Chains A, B, C|Vacuolar protein sorting-associated protein VTA1 homolog|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7OIJ , Knot 396 1084 0.85 40 336 954
MSYHNHNHNHNHNHNDYDIPTTENLYFQGAMDLMPPGVDCPMEFWTKEESQSVVVDFLLPTGVYLNFPVSRNANLSTIKQVLWHRAQYEPLFHMLSDPEAYVFTCVNQTAEQQELEDEQRRLCDIQPFLPVLRLVAREENLYFQGGDRVKKLINSQISLLIGKGLHEFDSLRDPEVNDFRTKMRQFCEEAAAHRQQLGWVEWLQYSFPLQLEPSARGWRAGLLRVSNRALLVNVKFEGSEESFTFQVSTKDMPLALMACALRKKATVFRQPLVEQPEEYALQVNGRHEYLYGNYPLCHFQYICSCLHSGLTPHLTMVHSSSILAMRDEQSNPAPQVQKPRAKPPPIPAKKPSSVSLWSLEQPFSIELIEGRKVNADERMKLVVQAGLFHGNEMLCKTVSSSEVNVCSEPVWKQRLEFDISVCDLPRMARLCFALYAVVEKAKKARSTKKKSKKADCPIAWANLMLFDYKDQLKTGERCLYMWPSVPDEKGELLNPAGTVRGNPNTESAAALVIYLPEVAPHPVYFPALEKILELGRHGERGRITEEEQLQLREILERRGSGELYEHEKDLVWKMRHEVQEHFPEALARLLLVTKWNKHEDVAQMLYLLCSWPELPVLSALELLDFSFPDCYVGSFAIKSLRKLTDDELFQYLLQLVQVLKYESYLDCELTKFLLGRALANRKIGHFLFWHLRSEMHVPSVALRFGLIMEAYCRGSTHHMKVLMKQGEALSKLKALNDFVKVSSQKTTKPQTKEMMHMCMRQETYMEALSHLQSPLDPSTLLEEVCVEQCTFMDSKMKPLWIMYSSEEAGSAGNVGIIFKNGDDLRQDMLTLQMIQLMDVLWKQEGLDLRMTPYGCLPTGDRTGLIEVVLHSDTIANIQLNKSNMAATAAFNKDALLNWLKSKNPGEALDRAIEEFTLSCAGYCVATYVLGIGDRHSDNIMIRESGQLFHIDFGHFLGNFKTKFGINRERVPFILTYDFVHVIQQGKTNNSEKFERFRGYCERAYTILRRHGLLFLHLFALMRAAGLPELSCSKDIQYLKDSLALGKTEEEALKHFRVKFNEALRESWKTKVNWLAHNVSKDNRQ
4FCQ , Knot 104 236 0.80 38 157 227
MPEETQTQDQPMEEEEVETFAFQAEIAQLMSLIINTFYSNKEIFLRELISNSSDALDKIRYESLTDPSKLDSGKELHINLIPNKQDRTLTIVDTGIGMTKADLINNLGTIAKSGTKAFMEALQAGADISMIGQFGVGFYSAYLVAEKVTVITKHNDDEQYAWESSAGGSFTVRTDTGEPMGRGTKVILHLKEDQTEYLEERRIKEIVKKHSQFIGYPITLFVEKERDKEVSDDEAE
4TXP , Knot 79 163 0.82 40 132 157
SMAALAPLPPLPAQFKSIQHHLRTAQEHDKRDPVVAYYCRLYAMQTGMKIDSKTPECRKFLSKLMDQLEALKKQLGDNEAITQEIVGCAHLENYALKMFLYADNEDRAGRFHKNMIKSFYTASLLIDVITVFGELTDENVKHRKYARWKATYIHNCLKNGETP
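The page does not say how \(\mathrm{LZ}(w)\) is computed. The sketch below makes an assumption: it uses the Kaspar–Schuster scanning algorithm, which counts the components of the LZ76 exhaustive history. It also recomputes the ratio column \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the LZ and length columns. Both function names are hypothetical.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive history of s
    (Kaspar-Schuster scanning algorithm; an assumed choice of LZ variant)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            # The current component can still be copied from start position i.
            k += 1
            if l + k > n:
                c += 1  # count the final, possibly incomplete, component
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                # No earlier start extends further: close this component.
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_alphabet_size(n)), the ratio column of the table."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # 6: 0.001.10.100.1000.101
    print(round(normalized_lz(396, 1084), 2))  # 0.85
```

With the table's values, the ratios round to 0.85, 0.80, and 0.82. This matches the column, so the ratio is the raw LZ count divided by \(n/\log_{20} n\).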

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
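The definition above translates directly into code. \(P_w(n)\) is the set of contiguous slices of length \(n\), and \(p_w(n)\) is the size of that set. This is a minimal sketch that uses hypothetical helper names.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    print(sorted(subwords("GAGAG", 2)))  # ['AG', 'GA']
    print(p("GAGAG", 3))  # 2 ("GAG" and "AGA")
```

When \(n > |w|\), the range is empty, so the sketch returns the empty set and \(p_w(n)=0\).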

\(|P_{f(7OIJ_1)}(2) \setminus P_{f(4FCQ_1)}(2)|=189\), \(|P_{f(4FCQ_1)}(2) \setminus P_{f(7OIJ_1)}(2)|=10\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1 (7OIJ_1):
1000000000000000001100001010111011111100110110000000111011110110101110001010010011100100011101100101011001000100001000000100101111110111000010101100100110001011110110010010010100100010010001110000111101100011101010101101111010001111010101000010101000011111110110001011001110010001101010000101001100100100010011010101100001111000000111010010101111110010010110100110101101001010001011101111010011000100001010001110001010101001101101011101110010010000000001001111101111000001001000101110110001011011101010100001111110110111011011110011011001001010000010100110001010100000011101000100011011101111001000001101101100110111101101101011000110111001001000011001101101100000100010011110111000110111101000101101110111110100010000101110010110010110011010000000100001101010000010110010011010011001010000110001011111000001101101111100100100011010110110111000110101010101101000111011100001101010000111011100011101100001101100110010100110011001111100000011100010110101101110100011100001111100011011001000000010010100001001100011111011111011111010000010010001111000001100101010011000100010111001000000
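The page does not state which residues count as hydrophobic. We lined the 0/1 string up against the first 90 residues of 7OIJ_1. Over that stretch, the string is consistent with a specific reading: A, F, G, I, L, M, P, V, and W are hydrophobic (1), and every other residue, including C, H, T, and Y, is polar (0). The sketch below encodes that inferred set. It is an assumption, not the site's documented scale, and `hp_encode` is a hypothetical name.

```python
# ASSUMPTION: this hydrophobic set is inferred by matching the page's 0/1 string
# against the 7OIJ_1 sequence; it is not taken from the site's documentation.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


if __name__ == "__main__":
    print(hp_encode("MSYHNDYDIP"))  # 1000000011
```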
Pair \(Z_2\) Length of longest common subword (contiguous)
7OIJ_1,4FCQ_1 199 4
7OIJ_1,4TXP_1 240 4
4FCQ_1,4TXP_1 167 3
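Two checks on the first row use the stated set differences. First, \(Z_2 = 189 + 10 = 199\). Second, the shared 2-subwords number \(336 - 189 = 157 - 10 = 147\), which agrees with the \(p_w(2)\) values 336 and 157 in the LZ table. The sketch below computes both columns of this table. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is the length of the longest common *contiguous* subword, found with standard dynamic programming. All names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous string occurring in both x and y.

    cur[j] holds the length of the longest common suffix of x[:i] and y[:j].
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z("GAGA", "GATT", 2))  # 3: {AG} vs {AT, TT}
    print(longest_common_subword("GAGA", "TGAGT"))  # 3 ("GAG")
```

The dynamic program takes \(O(|x|\,|y|)\) time. That is fine for sequences of this size (about \(1084 \times 236\) cells for the first pair).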

Newick tree

 
(7OIJ_1:11.79,(4FCQ_1:83.5,4TXP_1:83.5):34.29);

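Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)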
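Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1320}{\log_{20} 1320}-\frac{236}{\log_{20}236}\right)\approx 286\). Here \(1320=1084+236\) is the combined length of 7OIJ_1 and 4FCQ_1, and 0.85 and 0.8 are their normalized LZ ratios from the table above.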
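The arithmetic in this estimate is easy to reproduce. The sketch below plugs in the lengths and normalized LZ ratios from the tables above. The helper name `n_over_log` is hypothetical.

```python
import math


def n_over_log(n: int, base: int = 20) -> float:
    """n / log_base(n), the normalizer used in the LZ ratio column."""
    return n / math.log(n, base)


# Values from the tables above: ratios 0.85 (7OIJ_1) and 0.8 (4FCQ_1),
# lengths 1084 (7OIJ_1) and 236 (4FCQ_1).
combined = 1084 + 236  # 1320
expected = 0.85 * 0.8 * (n_over_log(combined) - n_over_log(236))

if __name__ == "__main__":
    print(round(expected))  # 286
```

The two normalizers come to about 550.3 and 129.4. Their difference, about 420.9, is scaled by \(0.85 \times 0.8 = 0.68\).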
Status Protein1 Protein2 d d1/2
Query variables 7OIJ_1 4FCQ_1 365 220
Was not able to put for d
Was not able to put for d1

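In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with the minimum and maximum taken over the two differences \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]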
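The interpolation can be written as a small function of the two differences \(a\) and \(b\). Setting \(\alpha=0\) returns \(\max(a,b)\), which is \(d\). Setting \(\alpha=1/2\) returns \((a+b)/2\), which is \(d_1/2\). The table reports \(d=365\) and \(d_1/2=220\) for 7OIJ_1 and 4FCQ_1. If those values are exact, the smaller difference is \(2\cdot 220-365=75\). This is a minimal sketch with hypothetical names. It takes precomputed LZ values as input rather than assuming any particular LZ implementation.

```python
def lz_differences(lz_x: int, lz_y: int, lz_xy: int, lz_yx: int) -> tuple[int, int]:
    """Return (LZ(xy) - LZ(x), LZ(yx) - LZ(y)) from precomputed complexities."""
    return lz_xy - lz_x, lz_yx - lz_y


def delta(a: int, b: int, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    # Differences 75 and 365 reproduce the table's d and d1/2.
    print(delta(75, 365, 0.0))  # 365.0  (d)
    print(delta(75, 365, 0.5))  # 220.0  (d1/2)
```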