CoV2D Browser™

Parikh vectors
8IHJ_1 3LZI_1 1TTC_1 Letter Amino acid
32 55 11 S Serine
16 31 0 Q Glutamine
38 48 5 F Phenylalanine
26 44 3 N Asparagine
16 8 1 C Cysteine
21 70 12 E Glutamic acid
23 14 4 H Histidine
40 74 5 I Isoleucine
33 70 8 K Lysine
39 53 12 A Alanine
24 46 4 R Arginine
11 25 2 M Methionine
21 43 8 P Proline
59 64 7 L Leucine
23 38 12 T Threonine
13 52 5 Y Tyrosine
32 48 11 V Valine
31 57 5 D Aspartic acid
22 51 10 G Glycine
10 12 2 W Tryptophan
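
The Parikh vector of a sequence records how many times each letter occurs. A minimal Python sketch follows; the helper name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the CoV2D code. It is applied to the 1TTC_1 sequence listed further down this page.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering is arbitrary here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in w."""
    counts = Counter(w)
    return {letter: counts.get(letter, 0) for letter in alphabet}


# 1TTC_1 (transthyretin), copied from the sequence table on this page.
TTC = (
    "GPTGTGESKCPLMVKVLDAVRGSPAINVAMHVFRKAADDTWEPFASGKTSESGELHGLTTEEEFVEGIYK"
    "VEIDTKSYWKALGISPFHEHAEVVFTANDSGPRRYTIAALLSPYSYSTTAVVTNPKE"
)

if __name__ == "__main__":
    for letter, count in parikh_vector(TTC).items():
        print(letter, count)
```

The counts always sum to the sequence length, which gives a quick consistency check on each column of the table.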

>8IHJ_1|Chain A[auth R]|Soluble cytochrome b562,Hydroxycarboxylic acid receptor 3|Escherichia coli (562)
>3LZI_1|Chain A|DNA polymerase|Enterobacteria phage RB69 (12353)
>1TTC_1|Chains A, B|Transthyretin|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8IHJ 220 530 0.86 40 268 509
MKTIIALSYIFCLVFADYKDDDDKADLEDNWETLNDNLKVIEKADNAAQVKDALTKMRAAALDAQKATPPKLEDKSPDSPEMKDFRHGFDILVGQIDDALKLANEGKVKEAQAAAEQLKTTRNAYIQKYLNRHHLQDHFLEIDKKNCCVFRDDFIAKVLPPVLGLEFIFGLLGNGLALWIFCFHLKSWKSSRIFLFNLAVADFLLIICLPFVMDYYVRRSDWKFGDIPCRLVLFMFAMNRQGSIIFLTVVAVDRYFRVVHPHHALNKISNWTAAIISCLLWGITVGLTVHLLKKKLLIQNGTANVCISFSICHTFRWHEAMFLLEFFLPLGIILFCSARIIWSLRQRQMDRHAKIKRAITFIMVVAIVFVICFLPSVVVRIHIFWLLHTSGTQNCEVYRSVDLAFFITLSFTYMNSMLDPVVYYFSSPSFPNFFSTLINRCLQRKITGEPDNNRSTSVELTGDPNKTRGAPEALIANSGEPWSPSYLGPTSNNHSKKGHCHQEPASLEKQLGCCIEENLYFQGSHHHHHH
3LZI 337 903 0.84 40 319 826
MKEFYLTVEQIGDSIFERYIDSNGRERTREVEYKPSLFAHCPESQATKYFDIYGKPCTRKLFANMRDASQWIKRMEDIGLEALGMDDFKLAYLSDTYNYEIKYDHTKIRVANFDIEVTSPDGFPEPSQAKHPIDAITHYDSIDDRFYVFDLLNSPYGNVEEWSIEIAAKLQEQGGDEVPSEIIDKIIYMPFDNEKELLMEYLNFWQQKTPVILTGWNVESFAIPYVYNRIKNIFGESTAKRLSPHRKTRVKVIENMYGSREIITLFGISVLDYIDLYKKFSFTNQPSYSLDYISEFELNVGKLKYDGPISKLRESNHQRYISYNIIAVYRVLQIDAKRQFINLSLDMGYYAKIQIQSVFSPIKTWDAIIFNSLKEQNKVIPQGRSHPVQPYPGAFVKEPIPNRYKYVMSFDLTSLYPSIIRQVNISPETIAGTFKVAPLHDYINAVAERPSDVYSCSPNGMMYYKDRDGVVPTEITKVFNQRKEHKGYMLAAQRNGEIIKEALHNPNLSVDEPLDVDYRFDFSDEIKEKIKKLSAKSLNEMLFRAQRTEVAGMTAQINRKLLINSLAGALGNVWFRYYDLRNATAITTFGQMALQWIERKVNEYLNEVCGTEGEAFVLYGDTDSIYVSADKIIDKVGESKFRDTNHWVDFLDKFARERMEPAIDRGFREMCEYMNNKQHLMFMDREAIAGPPLGSKGIGGFWTGKKRYALNVWDMEGTRYAEPKLKIMGLETQKSSTPKAVQKALKECIRRMLQEGEESLQEYFKEFEKEFRQLNYISIASVSSANNIAKYDVGGFPGPKCPFHIRGILTYNRAIKGNIDAPQVVEGEKVYVLPLREGNPFGDKCIAWPSGTEITDLIKDDVLHWMDYTVLLEKTFIKPLEGFTSAAKLDYEKKASLFDMFDF
1TTC 65 127 0.82 38 102 125
GPTGTGESKCPLMVKVLDAVRGSPAINVAMHVFRKAADDTWEPFASGKTSESGELHGLTTEEEFVEGIYKVEIDTKSYWKALGISPFHEHAEVVFTANDSGPRRYTIAALLSPYSYSTTAVVTNPKE
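
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; the function name `normalized_lz` is hypothetical, and truncating (rather than rounding) to two decimals is an assumption that happens to reproduce the three values shown.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_q(n), with q the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))


# (LZ-complexity, length) pairs copied from the table above.
ROWS = {"8IHJ": (220, 530), "3LZI": (337, 903), "1TTC": (65, 127)}

if __name__ == "__main__":
    for code, (lz, n) in ROWS.items():
        value = normalized_lz(lz, n)
        # Truncate to two decimals (assumed display convention).
        print(code, math.floor(100 * value) / 100)
```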

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
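
As a small illustration of these definitions, the following Python sketch builds the set of length-\(k\) subwords with a sliding window and counts it. The names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w (P_w(k))."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Number of distinct length-k subwords of w (p_w(k))."""
    return len(subwords(w, k))


if __name__ == "__main__":
    w = "abab"
    print(sorted(subwords(w, 2)))    # distinct subwords of length 2
    print(subword_complexity(w, 2))  # their number
```

Since a word of length \(|w|\) has only \(|w|-k+1\) windows of length \(k\), the count can never exceed that number.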

\(|P_{f(8IHJ_1)}(2) \setminus P_{f(3LZI_1)}(2)|=45\), \(|P_{f(3LZI_1)}(2) \setminus P_{f(8IHJ_1)}(2)|=96\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10011110011011110000000010100010010001011001001101001100101111010010110100001001010010011011110100110110010100101110010000010100010000100011010000001100011101111111101111111011111110101001000011110111101111101111100010000101101100111111110001011110111100010110100110010010111100111110111010110001110010101010101000101001111101111111111001011101000010001010011011111111111011101110101111100010000010001011111010100100110111001001011011001100010001010100000001010101000011101111001011010011100000000100000110100011001000101010000000
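
The 0/1 string above is consistent with writing 1 for a hydrophobic residue and 0 for a polar one. The page does not state which residues count as hydrophobic; the set used in this sketch (A, F, G, I, L, M, P, V, W) is an assumption inferred by aligning the opening residues of 8IHJ_1 with the opening bits, so treat it as a guess rather than the site's definition.

```python
# ASSUMPTION: hydrophobic set inferred from the first residues of 8IHJ_1
# and the first bits of the string above; not documented on the page.
HYDROPHOBIC = set("AFGILMPVW")


def hydrophobic_polar(w):
    """Encode a protein sequence as a 0/1 string (1 = hydrophobic)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in w)


if __name__ == "__main__":
    # First 24 residues of 8IHJ_1.
    print(hydrophobic_polar("MKTIIALSYIFCLVFADYKDDDDK"))
```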
Pair \(Z_2\) Length of longest common substring (subword)
8IHJ_1,3LZI_1 141 4
8IHJ_1,1TTC_1 206 3
3LZI_1,1TTC_1 243 4
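
Both columns of this table can be sketched in a few lines of Python. \(Z_k\) is the size of the symmetric difference of the two subword sets; for the first pair, 45 + 96 = 141. The lengths 3 and 4 in the last column are what one would expect for the longest contiguous block shared by two unrelated proteins, so the second function computes that quantity; this reading is an assumption. The function names are illustrative.

```python
def z_k(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)| for length-k subwords."""
    px = {x[i:i + k] for i in range(len(x) - k + 1)}
    py = {y[i:i + k] for i in range(len(y) - k + 1)}
    return len(px ^ py)  # size of the symmetric difference


def longest_common_substring_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_k("abab", "abba", 2))
    print(longest_common_substring_length("abab", "abba"))
```

The dynamic program uses time proportional to the product of the two lengths, which is unproblematic for sequences of a few hundred residues.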

Newick tree

 
(1TTC_1:12.52,(8IHJ_1:70.5,3LZI_1:70.5):53.02);

Here \(d\) denotes the Otu–Sayood distance of the same name.
Likewise, \(d_1\) denotes the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1433}{\log_{20} 1433}-\frac{530}{\log_{20} 530}\right)\approx 229\), where \(1433=530+903\) is the combined length of the two sequences.
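
The estimate can be checked numerically. The sketch below only reproduces the arithmetic of the displayed formula; the factors 0.85 and 0.8 are taken from the page as given, and the function names are hypothetical.

```python
import math


def lz_scale(n, alphabet_size=20):
    """The quantity n / log_q(n) appearing in the formula."""
    return n / math.log(n, alphabet_size)


def expected_distance(n1, n2, c1=0.85, c2=0.8):
    """c1 * c2 * (scale of the concatenation minus scale of the first word)."""
    return c1 * c2 * (lz_scale(n1 + n2) - lz_scale(n1))


if __name__ == "__main__":
    # Lengths of 8IHJ_1 and 3LZI_1 from the sequence table.
    print(expected_distance(530, 903))
```

The result falls between 229 and 230, so the page's 229 corresponds to dropping the fractional part.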
Status Protein1 Protein2 d d1/2
Query variables 8IHJ_1 3LZI_1 297 234.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
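
For readers who want to experiment, here is a hedged Python sketch of how such distances could be computed. It assumes the usual Otu–Sayood definitions, \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=(c(SQ)-c(S))+(c(QS)-c(Q))\), with \(c\) the number of phrases in a Lempel–Ziv (1976) parse in which each phrase is extended as long as it can be copied from an earlier starting position. Whether the CoV2D Browser uses exactly this parse is not stated on the page, and all function names are illustrative.

```python
def lz76_complexity(w):
    """Number of phrases in a Lempel-Ziv (1976) style parse of w."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from a start position
        # before i (the copy is allowed to overlap the phrase itself).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood(s, q):
    """Return (d, d1) for sequences s and q under the assumed definitions."""
    a = lz76_complexity(s + q) - lz76_complexity(s)
    b = lz76_complexity(q + s) - lz76_complexity(q)
    return max(a, b), a + b


def delta(smaller, larger, alpha):
    """alpha * min + (1 - alpha) * max, as in the displayed formula."""
    return alpha * smaller + (1 - alpha) * larger


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))
    print(otu_sayood("AAAAAA", "ACGT"))
    # The table reports d = 297 and d1/2 = 234.5, so the smaller of the two
    # differences is 2 * 234.5 - 297 = 172 (derived here, not on the page).
    print(delta(172, 297, 0), delta(172, 297, 0.5))
```

With \(\alpha=0\) the interpolation returns the maximum, which is \(d\), and with \(\alpha=1/2\) it returns the average of the two differences, which is \(d_1/2\).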