CoV2D Browser™

Parikh vectors
7RFZ_1 2KYU_1 3OLF_1 Letter Amino acid
36 3 9 V Valine
49 2 13 A Alanine
6 8 3 C Cysteine
35 3 7 G Glycine
15 2 11 H Histidine
24 5 15 K Lysine
8 3 9 M Methionine
15 1 12 F Phenylalanine
28 7 11 D Aspartic acid
25 1 14 Q Glutamine
38 1 15 I Isoleucine
55 4 31 L Leucine
6 4 5 Y Tyrosine
13 2 9 R Arginine
33 6 22 E Glutamic acid
25 2 10 P Proline
33 6 13 S Serine
27 4 9 N Asparagine
32 2 13 T Threonine
7 1 2 W Tryptophan
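
As a minimal sketch of how a column of the table above could be recomputed, the following counts each one-letter residue code in a sequence. The names `ALPHABET` and `parikh_vector` are illustrative assumptions, not part of the CoV2D code; the letter order simply follows the rows of the table, and the sequence is the 2KYU_1 entry listed on this page.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
ALPHABET = "VACGHKMFDQILYREPSNTW"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the letter counts of `sequence`, in the order given by `alphabet`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# FASTA sequence listed for 2KYU_1 on this page.
SEQ_2KYU = "GSAKGNFCPLCDKCYDDDDYESKMMQCGKCDRWVHSKCENLSDEMYEILSNLPESVAYTCVNCTERH"

vector = dict(zip(ALPHABET, parikh_vector(SEQ_2KYU)))
print(vector["V"], vector["C"], vector["W"], sum(vector.values()))
```

The final sum equals the sequence length whenever every residue belongs to the 20-letter alphabet.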

>7RFZ_1|Chain A|Importin subunit alpha-1|Mus musculus (10090)
>2KYU_1|Chain A|Histone-lysine N-methyltransferase MLL|Homo sapiens (9606)
>3OLF_1|Chains A, C|Bile acid receptor|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7RFZ , Knot 206 510 0.84 40 248 467
MHHHHHHSSGLVPRGSGMLETAAALFERNHMDSPDLGTDDDDLAMADIGSNQGTVNWSVEDIVKGINSNNLESQLQATQAARKLLSREKQPPIDNIIRAGLIPKFVSFLGKTDCSPIQFESAWALTNIASGTSEQTKAVVDGGAIPAFISLLASPHAHISEQAVWALGNIAGDGSAFRDLVIKHGAIDPLLALLAVPDLSTLACGYLRNLTWTLSNLCRNKNPAPPLDAVEQILPTLVRLLHHNDPEVLADSCWAISYLTDGPNERIEMVVKKGVVPQLVKLLGATELPIVTPALRAIGNIVTGTDEQTQKVIDAGALAVFPSLLTNPKTNIQKEATWTMSNITAGRQDQIQQVVNHGLVPFLVGVLSKADFKTQKEAAWAITNYTSGGTVEQIVYLVHCGIIEPLMNLLSAKDTKIIQVILDAISNIFQAAEKLGETEKLSIMIEECGGLDKIEALQRHENESVYKASLNLIEKYFSVEEEEDQNVVPETTSEGFAFQVQDGAPGTFNF
2KYU , Knot 39 67 0.81 40 56 64
GSAKGNFCPLCDKCYDDDDYESKMMQCGKCDRWVHSKCENLSDEMYEILSNLPESVAYTCVNCTERH
3OLF , Knot 105 233 0.81 40 158 223
GSHMELTPDQQTLLHFIMDSYNKQRMPQEITNKILKEAFSAEENFLILTEMATNHVQVLVEFTKKLPGFQTLDHEDQIALLKGSAVEAMFLRSAEIFNKKLPSGHSDLLEARIRNSGISDEYITPMFSFYKSIGELKMTQEEYALLTAIVILSPDRQYIKDREAVEKLQEPLLDVLQKLCKIHQPENPQHFACLLGRLTELRTFNHHHAEMLMSWRVNDHKFTPLLCEIWDVQ
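
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked with a few lines of arithmetic. The sketch below also includes one common way to count LZ76 phrases (exhaustive-history parsing, with overlapping earlier occurrences allowed); whether the page uses exactly this variant is an assumption, so the phrase counts it returns are not guaranteed to match the LZ-complexity column. Both function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count phrases in an exhaustive-history (Lempel-Ziv 1976) parsing of w.

    Assumed variant: each phrase is the shortest prefix of the remaining
    text that does not occur starting at an earlier position.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while a copy of it starts before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Divide an LZ-complexity by n / log_A(n), where A is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Values taken from the 7RFZ row: LZ(w) = 206, n = 510.
print(normalized_lz(206, 510))
```

For the 7RFZ row this gives a value close to the tabulated 0.84.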

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
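
Reading \(P_w(n)\) as the set of distinct contiguous subwords of length \(n\), a minimal sketch of the definition is given below. The function names `subwords` and `subword_complexity` are illustrative assumptions and are not taken from the CoV2D code.

```python
def subwords(w, k):
    """Return the set P_w(k) of distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return p_w(k), the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Small illustration on a toy word.
print(sorted(subwords("abcab", 2)), subword_complexity("abcab", 2))
```

If \(k\) exceeds the length of the word, the set is empty and the count is 0.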

\(|P_{f(7RFZ_1)}(2) \setminus P_{f(2KYU_1)}(2)|=217\), \(|P_{f(2KYU_1)}(2) \setminus P_{f(7RFZ_1)}(2)|=25\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(7RFZ_1),f(2KYU_1))=217+25=242\).

Hydrophobic-polar version of Sequence 1:
100000000111101011100111110000100101100000111101100010101010011011000010001010011001100000111001101111101101110000011010011110011010000001110111111110111010101000111111011101011001110011101111111110100110101001010100100000111110110011101101100001011100011100100110001011100111101101111001111011101110110100000001101111111101100100010001010100101100001001100111111111100101000001111100000110100110110011101110110100001101110110011011001100001011100011100101100000001001010110001010000000111000001111010011110101
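
The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the set used below (G, A, V, L, I, P, F, M, W) is an assumption inferred by aligning the start of the 7RFZ_1 sequence with the binary string above, and the names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic set, inferred from the page's own encoding of 7RFZ_1.
HYDROPHOBIC = frozenset("GAVLIPFMW")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if hydrophobic and '0' otherwise."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 20 residues of the 7RFZ_1 sequence listed on this page.
print(hp_encode("MHHHHHHSSGLVPRGSGMLE"))
```

Other hydrophobicity scales would classify some residues differently, so the set is passed as a parameter rather than fixed.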
Pair \(Z_2\) Length of longest common subsequence
7RFZ_1,2KYU_1 242 4
7RFZ_1,3OLF_1 172 3
2KYU_1,3OLF_1 172 3
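
The \(Z_k\) column is the size of the symmetric difference of two subword sets. A minimal sketch, assuming subwords are contiguous and using hypothetical function names, is shown here together with the Jaccard distance that \(Z_k\) is the numerator of.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Return Z_k(x, y) divided by the size of the union of the two sets."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

# Toy example: P_2("abab") = {ab, ba}, P_2("abba") = {ab, bb, ba}.
print(z_distance("abab", "abba", 2), jaccard_distance("abab", "abba", 2))
```

Applied to two FASTA sequences with `k=2`, `z_distance` corresponds to the \(Z_2\) column of the table above.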

Newick tree

 
[
	2KYU_1:11.57,
	[
		7RFZ_1:86,3OLF_1:86
	]:24.57
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{577}{\log_{20} 577}-\frac{67}{\log_{20}67})\approx 152.\)
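
The arithmetic in the expected-distance estimate can be checked directly. In this sketch the factors 0.85 and 0.8 are taken from the formula as given, without interpretation; the function names are hypothetical, and 577 is the combined length 510 + 67 of the two query sequences.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_A(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_concat, n_single, a=0.85, b=0.8):
    """Evaluate a * b * (n_concat / log_20 n_concat - n_single / log_20 n_single)."""
    return a * b * (lz_scale(n_concat) - lz_scale(n_single))

print(round(expected_distance(577, 67), 1))
```

The result rounds to the 152 quoted in the text.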
Status Protein1 Protein2 d d1/2
Query variables 7RFZ_1 2KYU_1 192 108

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max = \begin{cases} d, &\alpha=0,\\ d_1/2, &\alpha=1/2. \end{cases} \]
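
The interpolation \(\delta\) can be sketched as a one-line function of two non-negative quantities. The page does not list the two quantities whose minimum and maximum are taken; the inputs 24 and 192 below are hypothetical values back-solved so that \(\alpha=0\) and \(\alpha=1/2\) reproduce the query row above (d = 192, d1/2 = 108). The function name `delta` is likewise an assumption.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical inputs chosen to match the row d = 192, d1/2 = 108.
print(delta(0, 24, 192), delta(0.5, 24, 192))
```

With \(\alpha=0\) the function returns the maximum, and with \(\alpha=1/2\) it returns the average of the two inputs, which is half their sum.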