CoV2D Browser™

Parikh vectors
4XMW_1 7LGN_1 8GJX_1 Letter Amino acid
37 40 8 P Proline
49 51 6 T Threonine
88 96 15 A Alanine
60 55 14 E Glutamic acid
19 29 2 M Methionine
40 18 5 F Phenylalanine
93 86 27 L Leucine
41 37 14 S Serine
53 56 15 R Arginine
8 10 4 C Cysteine
38 78 9 G Glycine
41 55 9 I Isoleucine
35 34 4 K Lysine
52 84 7 V Valine
44 34 10 N Asparagine
62 63 11 D Aspartic acid
41 31 14 Q Glutamine
22 25 3 H Histidine
13 3 1 W Tryptophan
31 24 11 Y Tyrosine
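
As a rough illustration of how a table like the one above can be produced, the following sketch counts each amino-acid letter in a sequence. The names `ALPHABET` and `parikh_vector` are hypothetical and chosen only for this example; the letter order simply follows the rows of the table.

```python
from collections import Counter

# One-letter codes in the row order of the table above (an illustrative choice).
ALPHABET = "PTAEMFLSRCGIKVNDQHWY"


def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    toy = "PPTAAA"  # toy input, not one of the proteins on this page
    for letter, count in zip(ALPHABET, parikh_vector(toy)):
        if count:
            print(letter, count)
```

The entries of a Parikh vector sum to the sequence length; for example, the 4XMW_1 column above sums to 867.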

>4XMW_1|Chain A|Aminopeptidase N|Escherichia coli K-12 (83333)
>7LGN_1|Chains A, B|Cyanophycin synthase|Tatumella morbirosei (642227)
>8GJX_1|Chains A, B|Stimulator of interferon genes protein|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4XMW , Knot 324 867 0.84 40 309 789
QPQAKYRHDYRAPDYQITDIDLTFDLDAQKTVVTAVSQAVRHGASDAPLRLNGEDLKLVSVHINDEPWTAWKEEEGALVISNLPERFTLKIINEISPAANTALEGLYQSGDALCTQCEAEGFRHITYYLDRPDVLARFTTKIIADKIKYPFLLSNGNRVAQGELENGRHWVQWQDPFPKPCYLFALVAGDFDVLRDTFTTRSGREVALELYVDRGNLDRAPWAMTSLKNSMKWDEERFGLEYDLDIYMIVAVDFFNAGAMENKGLNIFNSKYVLARTDTATDKDYLDIERVIGHEYFHNWTGNRVTCRDWFQLSLKEGLTVFRDQEFSSDLGSRAVNRINNVRTMRGLQFAEDASPMAHPIRPDMVIEMNNFYTLTVYEKGAEVIRMIHTLLGEENFQKGMQLYFERHDGSAATCDDFVQAMEDASNVDLSHFRRWYSQSGTPIVTVKDDYNPETEQYTLTISQRTPATPDQAEKQPLHIPFAIELYDNEGKVIPLQKGGHPVNSVLNVTQAEQTFVFDNVYFQPVPALLCEFSAPVKLEYKWSDQQLTFLMRHARNDFSRWDAAQSLLATYIKLNVARHQQGQPLSLPVHVADAFRAVLLDEKIDPALAAEILTLPSVNEMAELFDIIDPIAIAEVREALTRTLATELADELLAIYNANYQSEYRVEHEDIAKRTLRNACLRFLAFGETHLADVLVSKQFHEANNMTDALAALSAAVAAQLPCRDALMQEYDDKWHQNGLVMDKWFILQATSPAANVLETVRGLLQHRSFTMSNPNRIRSLIGAFAGSNPAAFHAEDGSGYLFLVEMLTDLNSRNPQVASRLIEPLIRLKRYDAKRQEKMRAALEQLKGLENLSGDLYEKITKALA
7LGN , Knot 338 909 0.84 40 296 805
MNIISTSVFVGPNTFARTPLIRLTVDIDPHYAEKLNTLGSEVYQALDQVVPGMSSDPVEQAPGMLIARLALKLQHLAGMEGGIAFTSTSQADDEAEVLYSYETEDIGLEAGEVACDMLVALARAEADVRAVDLSHHIARYLRYADKRTLGPSAMELVKAAQERDIPWYRMNDASLIQVGQGKYQKRIEAALTSKTSHIAVEIAADKNMCNQLLGDLGLPVPKQRVVYDEDEAVSAANRIGYPVVVKPLDGNHGRGVSVSLTDEQAVKKAYGLAEPEGSAVIVESMIRGDDHRLLVVNGELVAAARRVPGHVAGDGIHTIRELIALVNQDPRRGVGHENVLTRLELDEQAIRLLQSYGYTADSIPPSGEEVYLRKTANISTGGTAVDVTDVIHPDNKLMAERAILAVGLDVGAVDFLTTDITKSYRETLGAICEINAGPGLRMHISPSEGKPRDVGGKIMDMLFPAGSQCRVPIAALTGTNGKTTCARMLSHILKMAGHVVGQTSTDAVLIDGNVTVKGDMTGPVSAKMVLRDPSVDIAVLETARGGIVRSGLGYMFCDVGAVLNVTSDHLGLGGVDTLDELAKVKRVIAEVTRDTVVLNADNEYTLKMAAHSPAKHIMYVTRNPEHTLVREHIRLGKRAVVLEQGLNGEQIVIYDNGMQIPLTWTHLIPATLEGKALHNVENAMFAAGMAYALGKTLDQIRSGLRTFDNTFFQSPGRMNVFDGHGFRVILDYGHNEAAIGAMVELVGRLNPQGRRLVAVTCPGDRRDEDVAAIAAKVAGHFDSYICHRDDDLRDRGPDEMPRLMKQALMDRGVKEEAIQIVEQEVDALSTLLKMANRNDLVLFFCENITRCWKQIINFKPAFGEALPAAAPEVLPVQTTDIPAGYQITQGERGVLIIPNDAAAENLYFQ
8GJX , Knot 88 189 0.81 40 131 185
SNVAHGLAWSYYIGYLRLILPELQARIRTYNQHYNNLLRGAVSQRLYILLPLDCGVPDNLSMADPNIRFLDKLPQQTGDRAGIKDRVYSNSIYELLENGQRAGTCVLEYATPLQTLFAMSQYSQAGFSREDRLEQAKLFCRTLEDILADAPESQNNCRLIAYQEPADDSSFSLSQEVLRHLRQEEKEEV
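
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked from the listed values of \(\mathrm{LZ}(w)\) and \(n\). The sketch below also includes one common formulation of the Lempel-Ziv (1976) phrase count; whether the page computes \(\mathrm{LZ}(w)\) in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def lz76_complexity(w):
    """Number of phrases in an LZ76-style exhaustive parsing of `w`.

    Each phrase is the shortest block starting at position i that does not
    already occur in the text preceding its own last character.
    (Assumed variant; the page may use a different convention.)
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_k(n), for an alphabet of size k (requires n > 1)."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # (LZ(w), n) pairs copied from the table above.
    for code, lz, n in [("4XMW", 324, 867), ("7LGN", 338, 909), ("8GJX", 88, 189)]:
        print(code, normalized_lz(lz, n))
```

The printed ratios agree with the 0.84, 0.84 and 0.81 shown in the table to within rounding.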

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
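
A minimal sketch of these definitions, reading \(P_w(n)\) as the set of distinct length-\(n\) subwords of \(w\); the function names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous blocks of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))


if __name__ == "__main__":
    toy = "ABAB"  # toy word, not one of the proteins on this page
    print(sorted(subwords(toy, 2)))
    print([subword_complexity(toy, n) for n in (1, 2, 3)])
```

For a word of length \(m\), \(p_w(n)\) is at most \(m-n+1\), which is consistent with the values 131 and 185 listed for the length-189 sequence.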

\(|P_{f(4XMW_1)}(2) \setminus P_{f(7LGN_1)}(2)|=57\), \(|P_{f(7LGN_1)}(2) \setminus P_{f(4XMW_1)}(2)|=44\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
010100000001100010010101010100011011001100110011101010010110101000110110000111110011001010110010111001101100010110000010110010001001011101000111001001111001001101010010011010011101001111111010110001000010011101010010100111110010001010000111000101011111011011110001101100001110000100000101001110001001010010000110101001101100001000110011001001001011011001011101101011101001001010001101101100111000100110101000010110000110110010010100100100001011101000001000000101000011010010001101111101000010111100110110011010010001110010101111110010111010001000010111001000100101100111001010110000101101110110110111100010111110110110100110110110111110100110001100110011110010000000100001100010010101111100011011100010010010011111011111011000111000000100011110011110100111011001011100001010010010011111110011110100101011110110010000101100110111010000100000101110010110010101000100111
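
The hydrophobic-polar string can be reproduced, at least on the opening stretch of Sequence 1, by mapping each residue to 1 or 0. The page does not state which residues count as hydrophobic; the set `AFGILMPVW` below is an assumption inferred by comparing the start of the binary string with the start of the sequence, and the names are hypothetical.

```python
# Assumed hydrophobic set, inferred from the displayed string rather than
# taken from any stated definition on the page.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)


if __name__ == "__main__":
    prefix = "QPQAKYRHDYRAPDYQITDIDLTFDLDA"  # first 28 residues of 4XMW_1
    print(hp_encode(prefix))
```

Other hydrophobicity scales classify residues such as C or Y differently, so the set should be treated as a parameter.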
Pair \(Z_2\) Length of longest common subword
4XMW_1,7LGN_1 101 5
4XMW_1,8GJX_1 208 4
7LGN_1,8GJX_1 211 3
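
The \(Z_k\) column is the size of the symmetric difference of the two subword sets, so the first row is \(57+44=101\). A minimal sketch, assuming \(P_x(k)\) is the set of distinct length-\(k\) subwords of \(x\); the function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous blocks of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, the symmetric-difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    # Toy words, not the proteins on this page.
    print(z_k("ABAB", "ABBA", 2))
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance; the table reports only the numerator.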

Newick tree

 
(
	8GJX_1:11.39,
	(
		4XMW_1:50.5,7LGN_1:50.5
	):66.89
);

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1776}{\log_{20} 1776}-\frac{867}{\log_{20}867})=222\), where \(1776=867+909\) is the combined length of the two sequences.
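
The arithmetic in this estimate can be checked directly. The sketch below treats 0.85 and 0.8 as given constants from the formula above (their origin is not stated on the page) and uses the fact that 1776 is the sum of the two sequence lengths, 867 and 909; the helper name `lz_scale` is hypothetical.

```python
import math


def lz_scale(n, alphabet_size=20):
    """n / log_k(n), the normalizing quantity used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)


n1, n2 = 867, 909  # lengths of 4XMW_1 and 7LGN_1
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))

if __name__ == "__main__":
    print(n1 + n2, round(estimate, 1))
```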
Status Protein1 Protein2 d d1/2
Query variables 4XMW_1 7LGN_1 281 274.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
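
The interpolation can be illustrated numerically. In the sketch below the two inputs stand for the quantities whose minimum and maximum appear in the formula; the value 268 is not shown on the page and is inferred from the table row \(d=281\), \(d_1/2=274.5\) under the assumption that both come from the same pair of values (\(2\cdot 274.5-281=268\)). The function name `delta` is hypothetical.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    a, b = 268, 281  # 268 is inferred, 281 is the tabulated d
    print(delta(0, a, b))    # corresponds to d
    print(delta(0.5, a, b))  # corresponds to d1 / 2
```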