CoV2D Browser™

Parikh vectors
9FIV_1 1VZT_1 6DDC_1 Letter Amino acid
27 20 28 E Glutamic acid
12 18 23 I Isoleucine
13 15 24 A Alanine
14 12 18 N Asparagine
2 8 5 W Tryptophan
17 15 31 Y Tyrosine
27 24 30 V Valine
22 11 15 Q Glutamine
22 15 34 G Glycine
31 26 36 K Lysine
9 9 16 M Methionine
13 18 33 F Phenylalanine
17 11 46 S Serine
16 10 32 R Arginine
30 17 38 D Aspartic acid
7 3 7 C Cysteine
13 10 22 H Histidine
34 20 60 L Leucine
13 13 23 P Proline
16 14 33 T Threonine
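A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count, not the site's own code; the names `AMINO_ACIDS` and `parikh_vector` are hypothetical, and the example input is just the first ten residues of the 9FIV_1 sequence listed further down.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "EIANWYVQGKMFSRDCHLPT"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return {letter: counts[letter] for letter in AMINO_ACIDS}

# Illustration on the first ten residues of the 9FIV_1 sequence.
prefix = "GSKKHTGYVG"
vector = parikh_vector(prefix)
print(vector["G"], vector["K"], vector["W"])  # 3 2 0
```

Applying the same function to a full sequence would give one column of the table.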

>9FIV_1|Chains A, B|Ubiquitin carboxyl-terminal hydrolase 7|Homo sapiens (9606)
>1VZT_1|Chains A, B|N-ACETYLLACTOSAMINIDE ALPHA-1,3-GALACTOSYLTRANSFERASE|BOS TAURUS (9913)
>6DDC_1|Chains A, B|Cytosolic purine 5'-nucleotidase|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9FIV , Knot 151 355 0.83 40 212 347
GSKKHTGYVGLKNQGATCYMNSLLQTLFFTNQLRKAVYMMPTEGDDSSKSVPLALQRVFYELQHSDKPVGTKKLTKSFGWETLDSFMQHDVQELCRVLLDNVENKMKGTCVEGTIPKLFRGKMVSYIQCKEVDYRSDRREDYYDIQLSIKGKKNIFESFVDYVAVEQLDGDNKYDAGEHGLQEAEKGVKFLTLPPVLHLQLMRAMYDPQTDQNIKINDRFEFPEQLPLDEFLQKTDPKDPANYILHAVLVHSGDNHGGHYVVYLNPKGDGKWCKFDDDVVSRCTKEEAIEHNYGGHDDDLSVRHCTNAYMLVYIRESKLSEVLQAVTDHDIPQQLVERLQEEKRIEAQKRKERQE
1VZT , Knot 136 289 0.89 40 200 286
ESKLKLSDWFNPFKRPEVVTMTKWKAPVVWEGTYNRAVLDNYYAKQKITVGLTVFAVGRYIEHYLEEFLTSANKHFMVGHPVIFYIMVDDVSRMPLIELGPLRSFKVFKIKPEKRWQDISMMRMKTIGEHIVAHIQHEVDFLFCMDVDQVFQDKFGVETLGESVAQLQAGWYKADPNDFTYERRKESAAYIPFGEGDFYYHAAIFGGTPTQVLNITQECFKGILKDKKNDIEAQWHDESHLNKYFLLNKPTKILSPEYCWDYHIGLPADIKLVKMSWQTKEYNVVRNNV
6DDC , Knot 224 554 0.85 40 276 516
GSSHHHHHHSSGLVPRGSMSTSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFNMDYTLAVYKSPEYESLGFELTVERLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERFYILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKYVVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGTVLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPELAQELHVWTDKSSLFEELQSLDIFLAELYKHLDSSSNERPDISSIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLFASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHESTVEHTHVDINEMESPLATRNRTSVDFKDTDYKRHQLTRSISEIKPPNL
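The fourth column of the table divides the LZ-complexity by \(n/\log_{20} n\). As a rough check, the sketch below recomputes that ratio from the tabulated LZ-complexities and lengths; the function name `normalized_lz` is hypothetical, and the LZ values are taken from the table rather than recomputed.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_a n), where a is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) as listed in the table above.
rows = [("9FIV", 151, 355), ("1VZT", 136, 289), ("6DDC", 224, 554)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 2))
# 9FIV 0.83
# 1VZT 0.89
# 6DDC 0.85
```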

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
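Reading "subwords (intervals)" as contiguous substrings of a fixed length, the definitions can be sketched as follows. The function names are hypothetical, and the example uses only a ten-residue prefix, so it is not meant to reproduce the tabulated values.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

prefix = "GSKKHTGYVG"  # first ten residues of the 9FIV sequence
print(subword_complexity(prefix, 1))  # 7
print(subword_complexity(prefix, 2))  # 9
```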

\(|P_{f(9FIV_1)}(2) \setminus P_{f(1VZT_1)}(2)|=86\), \(|P_{f(1VZT_1)}(2) \setminus P_{f(9FIV_1)}(2)|=74\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1000001011100011000100110011100010011011100100000011111001100100000111000100011100100110001001001110010001010010101101101011001000010000000000001010101000110011001110010100000110011001001101101111101011011001000001010001011001110011000010011001101111001000110011010101010100100011000000011000011000010100000101110100001001101100001100110010000010100000000
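The page does not say which residues it treats as hydrophobic. The sketch below assumes the set G, A, V, L, I, M, F, P, W, chosen only because it agrees with the start of the binary string shown above; both the set and the name `hydrophobic_polar` are assumptions, not the site's documented classification.

```python
# ASSUMPTION: residues coded as 1 (hydrophobic). This set is inferred from
# the beginning of the displayed string, not taken from the site itself.
HYDROPHOBIC = set("GAVLIMFPW")

def hydrophobic_polar(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First twenty residues of the 9FIV sequence.
print(hydrophobic_polar("GSKKHTGYVGLKNQGATCYM"))  # 10000010111000110001
```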
Pair \(Z_2\) Length of longest common subsequence
9FIV_1,1VZT_1 160 4
9FIV_1,6DDC_1 156 4
1VZT_1,6DDC_1 174 4
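Since \(Z_k(x,y)\) adds the sizes of the two set differences, it equals the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\); for the first pair, \(86+74=160\). A minimal sketch with hypothetical function names, shown on short toy words rather than the full sequences:

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    # The symmetric difference contains exactly the subwords in one set only.
    return len(px ^ py)

# Toy example: {GS, SK, KK} versus {SK, KK, KH} differ in GS and KH.
print(z_distance("GSKK", "SKKH", 2))  # 2
```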

Newick tree

 
[
	1VZT_1:85.35,
	[
		9FIV_1:78,6DDC_1:78
	]:7.35
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{644}{\log_{20} 644}-\frac{289}{\log_{20}289})\approx 98.9\).
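The arithmetic in this estimate can be checked directly. Note that \(644 = 355 + 289\), the combined length of the two query sequences. The factors 0.85 and 0.8 are taken from the formula as given and are not explained on the page; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_a n, the normalizing quantity used in the formula."""
    return n / math.log(n, alphabet_size)

# Factors 0.85 and 0.8 are copied from the formula above, not derived here.
estimate = 0.85 * 0.8 * (lz_scale(644) - lz_scale(289))
print(round(estimate, 1))  # 98.9
```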
Status Protein1 Protein2 d d1/2
Query variables 9FIV_1 1VZT_1 125 114
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
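The interpolation can be sketched numerically. Here `a` and `b` are hypothetical names for the two quantities whose maximum and minimum enter the formula. The value 125 is the tabulated \(d\); the value 103 is not on the page and is back-computed as \(2\cdot 114-125\), assuming the tabulated \(d\) and \(d_1/2\) are exact.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# ASSUMPTION: 103 is inferred from the table (2 * 114 - 125), not listed there.
a, b = 125, 103
print(delta(0, a, b))    # 125  (alpha = 0 gives the maximum, d)
print(delta(0.5, a, b))  # 114.0  (alpha = 1/2 gives the average, d1/2)
```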