CoV2D Browser™

Parikh vectors
7YLM_1 5IXL_1 6ANS_1 Letter Amino acid
62 6 28 R Arginine
9 0 7 C Cysteine
80 5 17 S Serine
3 2 10 W Tryptophan
28 4 12 Y Tyrosine
102 11 33 L Leucine
58 5 28 T Threonine
57 7 46 A Alanine
56 6 12 N Asparagine
74 9 30 D Aspartic acid
98 8 17 E Glutamic acid
33 9 29 G Glycine
22 9 14 H Histidine
86 10 12 I Isoleucine
55 5 19 Q Glutamine
116 9 10 K Lysine
24 1 9 M Methionine
37 4 12 F Phenylalanine
32 3 22 P Proline
61 5 23 V Valine
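A Parikh vector just counts how often each letter occurs. As a minimal sketch (the names `ORDER` and `parikh_vector` are hypothetical, not part of the CoV2D Browser), the 5IXL_1 column can be recomputed from its sequence:

```python
from collections import Counter

# One-letter amino-acid codes, in the row order of the table above.
ORDER = "RCSWYLTANDEGHIQKMFPV"

def parikh_vector(seq: str) -> list[int]:
    """Return the number of occurrences of each amino acid, in ORDER."""
    counts = Counter(seq)
    return [counts[aa] for aa in ORDER]

SEQ_5IXL = ("MGIKSFKHKGLKLLFEKGVTSGVPAQDVDRINDRLQAIDTATEIGELNRQIYKLHPLKGD"
            "REGYWSITVRANWRITFQFINGDAYILNYEDAHKLGPEQKLISEEDLNSAVDHHHHHH")

print(dict(zip(ORDER, parikh_vector(SEQ_5IXL))))
```

The entries sum to the sequence length (118 for 5IXL_1), which is a quick consistency check on any column.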

>7YLM_1|Chain A|Structural maintenance of chromosomes protein 5|Saccharomyces cerevisiae (4932)
>5IXL_1|Chains A, B, C, D, E, F, G, H|Endoribonuclease HigB|Proteus vulgaris (585)
>6ANS_1|Chains A, B, C, D|Uncharacterized protein|Burkholderia cenocepacia (strain ATCC BAA-245 / DSM 16553 / LMG 16656 / NCTC 13227 / J2315 / CF5610) (216591)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7YLM, Knot 398 1093 0.85 20 311 951
MTSLIDLGRYVERTHHGEDTEPRSKRVKIAKPDLSSFQPGSIIKIRLQDFVTYTLTEFNLSPSLNMIIGPNGSGKSTFVCAVCLGLAGKPEYIGRSKKVEDFIKNGQDVSKIEITLKNSPNVTDIEYIDARDETIKITRIITRSKRRSDYLINDYQVSESVVKTLVAQLNIQLDNLCQFLSQERVEEFARLKSVKLLVETIRSIDASLLDVLDELRELQGNEQSLQKDLDFKKAKIVHLRQESDKLRKSVESLRDFQNKKGEIELHSQLLPYVKVKDHKEKLNIYKEEYERAKANLRAILKDKKPFANTKKTLENQVEELTEKCSLKTDEFLKAKEKINEIFEKLNTIRDEVIKKKNQNEYYRGRTKKLQATIISTKEDFLRSQEILAQTHLPEKSVFEDIDIKRKEIINKEGEIRDLISEIDAKANAINHEMRSIQRQAESKTKSLTTTDKIGILNQDQDLKEVRDAVLMVREHPEMKDKILEPPIMTVSAINAQFAAYLAQCVDYNTSKALTVVDSDSYKLFANPILDKFKVNLRELSSADTTPPVPAETVRDLGFEGYLSDFITGDKRVMKMLCQTSKIHTIPVSRRELTPAQIKKLITPRPNGKILFKRIIHGNRLVDIKQSAYGSKQVFPTDVSIKQTNFYQGSIMSNEQKIRIENEIINLKNEYNDRKSTLDALSNQKSGYRHELSELASKNDDINREAHQLNEIRKKYTMRKSTIETLREKLDQLKREARKDVSQKIKDIDDQIQQLLLKQRHLLSKMASSMKSLKNCQKELISTQILQFEAQNMDVSMNDVIGFFNEREADLKSQYEDKKKFVKEMRDTPEFQSWMREIRSYDQDTKEKLNKVAEKYEEEGNFNLSFVQDVLDKLESEIAMVNHDESAVTILDQVTAELRELEHTVPQQSKDLETIKAKLKEDHAVLEPKLDDIVSKISARFARLFNNVGSAGAVRLEKPKDYAEWKIEIMVKFRDNAPLKKLDSHTQSGGERAVSTVLYMIALQEFTSAPFRVVDEINQGMDSRNERIVHKAMVENACAENTSQYFLITPKLLTGLHYHEKMRIHCVMAGSWIPNPSEDPKMIHFGETSNYSFD
5IXL, Knot 61 118 0.82 19 94 113
MGIKSFKHKGLKLLFEKGVTSGVPAQDVDRINDRLQAIDTATEIGELNRQIYKLHPLKGDREGYWSITVRANWRITFQFINGDAYILNYEDAHKLGPEQKLISEEDLNSAVDHHHHHH
6ANS, Knot 162 390 0.82 20 219 372
MAHHHHHHMGTLEAQTQGPGSMTDDTRATQLLSGQTWADFCDTLKRSGEQILRTDAPDDPLTRAEGFRYLSRLMRIALEMHVEFADGAWPGFFSPSHETAKIGADNPDNLYQYARVDGRCEYRVTGRRGTVAYLSFGTQKGGYETDGKMLQTGFLDAKQLEIAPDGSVEIVLSATPRAGNWVRMEPDTNALLVRQTFLDRRTETPAQLKIERIDAQARPAPLDPLALQGGLMRAAQFVEQTSKLFADWAASYRPHVNALPPADQALCQSVGGDPNIYYYHSCWSLAADEALVIDVDTVPDCDFWNVQLNNYWMESLDYRHFDICVNKHSARPNADGGVTVIVAATRPGSANWLDTAGHRTGTICWRWVGAAQPVHPRTRVVKLAALKEAA
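The LZ-complexity column counts phrases in a Lempel–Ziv (1976) parsing. The sketch below assumes the common exhaustive-history variant, in which each phrase is the longest prefix of the remaining text that can be copied from an earlier starting position, plus one new letter. The page does not say which implementation it uses, so `lz76` and `normalized_lz` are illustrative names only, and the site's counts may differ in detail.

```python
import math

def lz76(w: str) -> int:
    """Number of phrases in the exhaustive-history LZ76 parsing of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        l = 1
        # Grow the phrase while w[i:i+l] also starts somewhere before i
        # (overlap with the current phrase is allowed).
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        c += 1  # close the phrase; the last one may simply run off the end
        i += l
    return c

def normalized_lz(lz: int, n: int, alphabet: int = 20) -> float:
    """LZ(w) / (n / log_alphabet n), the ratio shown in the fourth column."""
    return lz / (n / math.log(n, alphabet))

print(lz76("0001101001000101"))          # 6: phrases 0|001|10|100|1000|101
print(round(normalized_lz(61, 118), 2))  # 0.82 (5IXL row)
```

For 6ANS the ratio is about 0.827, so the 0.82 shown in the table appears to be truncated rather than rounded.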

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the amino-acid sequence, in FASTA format, of the entry with 4-symbol Protein Data Bank code \(c\).
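The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into a set comprehension. A minimal sketch, with hypothetical function names `subwords` and `p`:

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

print(sorted(subwords("ABAB", 2)))        # ['AB', 'BA']
print([p("ABAB", n) for n in (1, 2, 3)])  # [2, 2, 2]
```

Since \(p_w(1)\) is the number of distinct letters in \(w\), it is at most 20 for a protein sequence.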

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(|P_{f(7YLM_1)}(2) \setminus P_{f(5IXL_1)}(2)|=224\) and \(|P_{f(5IXL_1)}(2) \setminus P_{f(7YLM_1)}(2)|=7\), so \(Z_2(f(7YLM_1),f(5IXL_1))=224+7=231\).
Hydrophobic-polar version of Sequence 1 (7YLM_1): 1001101100100000100001000010110101001011011010100110001001010101011111010100011011011111010011000010011001001001010100010100100101000010100110000000001100001000110011101010100100110000100110100101110010010101101100100101000010001010010110100000010001001001000010101000111010100000010100000001010101110000111000001000100100000100001101000100110010010001100000000001000010101100000110000111000110001100101000011000101001100101010110001001000100000010000011110000010010011111000101000110111101011010111011001000000110110000001110111001010100100100011111001001110101001101000110110000010011100001011010011010101011100110100110100010100011100101000010010110000010100011010000000000101100000100001001100000100010010010000010000100100010010001000100010010001001110000110011001001000000110001101010010101001111100001010000000001100100010100110010000000000100110000001010101100110010001111000001101100101010010001100000100101010000111010100110010101101100110111101001000101010111010001110010000001100110011011110010011101100100110000001100111001010000001110101101100000101001111011101000101101100000010
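The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not list its hydrophobic set; the set `AFGILMPVW` below is an assumption, chosen because it reproduces the positions we checked at the start and end of the 7YLM_1 string (note that C and Y come out as polar). `hp` is a hypothetical helper name.

```python
# Assumption: hydrophobic residues inferred by matching the 0/1 string
# against the 7YLM_1 sequence; the page itself does not state this set.
HYDROPHOBIC = set("AFGILMPVW")

def hp(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(hp("MTSLIDLGRYVERTHHGEDTE"))        # first 21 residues of 7YLM_1
print(hp("MAGSWIPNPSEDPKMIHFGETSNYSFD"))  # last 27 residues of 7YLM_1
```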
Pair \(Z_2\) Length of longest common subword (contiguous substring)
7YLM_1,5IXL_1 231 4
7YLM_1,6ANS_1 164 4
5IXL_1,6ANS_1 191 6
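Both columns of this table can be sketched in a few lines. This assumes the last column measures contiguous common subwords, which fits the value 6 for 5IXL_1 and 6ANS_1: 5IXL_1 ends in VDHHHHHH and 6ANS_1 begins with MAHHHHHHM, sharing the tag HHHHHH. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    # |A - B| + |B - A| equals |A ^ B| because the two differences are disjoint.
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y.

    Dynamic programming: cur[j] is the length of the longest common suffix
    of x[:i] and y[:j]; the answer is the largest value seen.
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("ABAB", "ABBA", 2))                             # 1 (only "BB" differs)
print(longest_common_subword("VDHHHHHH", "MAHHHHHHM"))  # 6
```

The dynamic program takes time proportional to \(|x|\cdot|y|\), which is fine for sequences of a few thousand residues.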

Newick tree

(
	5IXL_1:11.83,
	(
		7YLM_1:82,6ANS_1:82
	):30.83
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (Under \(d_1\), the 4TYN sequence AAAAAA becomes a close match...)
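The page names the two Otu–Sayood distances without defining them. The sketch below assumes the usual definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\), where \(c\) is LZ76 complexity. This reading agrees with the interpolation formula at the end of this section, where \(\alpha=0\) gives the max and \(\alpha=1/2\) gives half the sum. `lz76` uses the exhaustive-history parsing and may differ in detail from the site's implementation; `otu_sayood` is a hypothetical name.

```python
def lz76(w: str) -> int:
    """LZ76 complexity: phrases in the exhaustive-history parsing of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        l = 1
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1), assuming d = max(a, b) and d1 = a + b."""
    a = lz76(x + y) - lz76(x)  # complexity added by appending y to x
    b = lz76(y + x) - lz76(y)  # complexity added by appending x to y
    return max(a, b), a + b

print(otu_sayood("AB", "AB"))  # (1, 2)
```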
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{1211}{\log_{20} 1211}-\frac{118}{\log_{20}118}\right)\approx 297\), where \(1211=1093+118\) is the combined length of the two sequences.
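The estimate is plain arithmetic, and the snippet below just re-evaluates it. We read 1211 as 1093 + 118, the combined length of 7YLM_1 and 5IXL_1; the factors 0.85 and 0.8 are copied as given, since the page does not explain them. `n_over_log20` is a hypothetical helper name.

```python
import math

def n_over_log20(n: int) -> float:
    """n / log_20(n), the same normalizer as in the LZ-ratio column."""
    return n / math.log(n, 20)

# 1211 = 1093 + 118; the factors 0.85 and 0.8 are taken as given on the page.
estimate = 0.85 * 0.8 * (n_over_log20(1093 + 118) - n_over_log20(118))
print(round(estimate))  # 297
```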
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 7YLM_1 5IXL_1 372 205
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
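With \(\alpha=0\) the formula gives \(\delta=\max=d\), and with \(\alpha=1/2\) it gives \(\delta=(\min+\max)/2\), so \(d_1=\min+\max\). Applied to the query pair 7YLM_1, 5IXL_1 in the table above (\(d=372\), \(d_1/2=205\)), this recovers the smaller of the two terms and the whole family \(\delta(\alpha)\):

\[ \max = d = 372,\qquad \min+\max = d_1 = 2\cdot 205 = 410,\qquad \min = 410-372 = 38, \]
\[ \delta(\alpha) = 38\,\alpha + 372\,(1-\alpha) = 372 - 334\,\alpha,\qquad \delta(1/2) = 372-167 = 205. \]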