CoV2D Browser™

Parikh vectors
8ZJF_1 4UOH_1 6UJM_1 Letter Amino acid
53 8 17 R Arginine
49 4 19 Q Glutamine
28 2 13 H Histidine
23 12 19 I Isoleucine
25 6 28 F Phenylalanine
44 7 30 P Proline
72 9 44 S Serine
7 4 3 W Tryptophan
49 11 37 V Valine
58 13 33 E Glutamic acid
7 6 16 M Methionine
37 7 25 T Threonine
56 10 31 A Alanine
22 3 22 N Asparagine
41 7 29 D Aspartic acid
75 14 45 G Glycine
20 12 38 K Lysine
16 5 15 Y Tyrosine
55 3 13 C Cysteine
89 8 50 L Leucine
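
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that count follows; the names `AMINO_ACIDS` and `parikh_vector` are illustrative assumptions, not identifiers used by the CoV2D Browser, and the example input is only the first 14 residues of 4UOH_1.

```python
from collections import Counter

# The 20 standard amino-acid letters (assumed alphabet for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}


# First 14 residues of 4UOH_1, used only as a short illustration.
prefix = "MVRERTFIAVKPDG"
vec = parikh_vector(prefix)
print(vec["R"], vec["V"], sum(vec.values()))
```

Summing a full column of the table should give the sequence length; for example, the 4UOH_1 column sums to 151.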

>8ZJF_1|Chain A[auth B]|Integrin beta-7|Homo sapiens (9606)
>4UOH_1|Chains A, B, C|NUCLEOSIDE DIPHOSPHATE KINASE|LITOPENAEUS VANNAMEI (6689)
>6UJM_1|Chains A, B, C, D|Glutaminase kidney isoform, mitochondrial|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8ZJF , Knot 307 826 0.83 40 282 727
VALPMVLVLLLVLSRGESELDAKIPSTGDATEWRNPHLSMLGSCQPAPSCQKCILSHPSCAWCKQLNFTASGEAEARRCARREELLARGCPLEELEEPRGQQEVLQDQPLSQGARGEGATQLAPQRVRVTLRPGEPQQLQVRFLRAEGYPVDLYYLMDLSYSMKDDLERVRQLGHALLVRLQEVTHSVRIGFGSFVDKTVLPFVSTVPSKLRHPCPTRLERCQSPFSFHHVLSLTGDAQAFEREVGRQSVSGNLDSPEGGFDAILQAALCQEQIGWRNVSRLLVFTSDDTFHTAGDGKLGGIFMPSDGHCHLDSNGLYSRSTEFDYPSVGQVAQALSAANIQPIFAVTSAALPVYQELSKLIPKSAVGELSEDSSNVVQLIMDAYNSLSSTVTLEHSSLPPGVHISYESQCEGPEKREGKAEDRGQCNHVRINQTVTFWVSLQATHCLPEPHLLRLRALGFSEELIVELHTLCDCNCSDTQPQAPHCSDGQGHLQCGVCSCAPGRLGRLCECSVAELSSPDLESGCRAPNGTGPLCSGKGHCQCGRCSCSGQSSGHLCECDDASCERHEGILCGGFGRCQCGVCHCHANRTGRACECSGDMDSCISPEGGLCSGHGRCKCNRCQCLDGYYGALCDQCPGCKTPCERHRDCAECGAFRTGPLATNCSTACAHTNVTLALAPILDDGWCKERTLDNQLFFFLVEDDARGTVVLRVRPQEKGADHTQAIVLGCVGGIVAVGLGLVLAYRLSVEIYDRREYSRFEKEQQQLNWKQDSNPLYKSAITTTINPRFQEADSPTLGSGSGLNDIFEAQKIEWHEGSGSENLYFQ
4UOH , Knot 74 151 0.82 40 127 149
MVRERTFIAVKPDGVQRGLIGEIIKRFEAKGFKLAGMKYIQASEDLLKQHYIDLADKPFYPGLCKYMSSGPVVAMCWEGTGVVKTARVMMGETRPADSKPGTIRGDFCIEVGRNIIHGSDSVESANKEIALWFKPEELVSWTQTNESWIYE
6UJM , Knot 216 527 0.85 40 261 493
LSSSPSEILQELGKGSTHPQPGVSPPAAPAAPGPKDGPGETDAFGNSEGKELVASGENKIKQGLLPSLEDLLFYTIAEGQEKIPVHKFITALKSTGLRTSDPRLKECMDMLRLTLQTTSDGVMLDKDLFKKCVQSNIVLLTQAFRRKFVIPDFMSFTSHIDELYESAKKQSGGKVADYIPQLAKFSPDLWGVSVCTADGQRHSTGDTKVPFCLQSCVKPLKYAIAVNDLGTEYVHRYVGKEPSGLRFNKLFLNEDDKPHNPMVNAGAIVVTSLIKQGVNNAEKFDYVMQFLNKMAGNEYVGFSNATFQSERESGDRNFAIGYYLKEKKCFPEGTDMVGILDFYFQLCSIEVTCESASVMAATLANGGFCPITGERVLSPEAVRNTLSLMHSCGMYDFSGQFAFHVGLPAKSGVAGGILLVVPNVMGMMCWSPPLDKMGNSVKGIHFCHDLVSLCNFHNYDNLRHFAKKLDPRREGGDQRHSFGPLDYESLQQELALKETVWKKVSPESNEDISTTVVYRMESLGEKS
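
The table reports \(\mathrm{LZ}(w)\) together with the ratio \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\). The sketch below shows one common way to obtain such numbers, assuming the exhaustive-history parsing of Lempel and Ziv (1976). The page does not state which variant it uses, so `lz76` is an illustrative assumption and is not guaranteed to reproduce the tabulated complexities; `normalized_ratio` only restates the column formula.

```python
import math


def lz76(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of s.

    Each phrase is the shortest block starting at position i that does not
    occur in the text preceding the block's last character.
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the block while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_ratio(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))


print(lz76("0001101001000101"))  # 6 (parsing 0|001|10|100|1000|101)
print(normalized_ratio(307, 826))
```

The tabulated ratios (0.83, 0.82, 0.85) appear to be truncated rather than rounded to two decimals.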

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence for the entry with 4-symbol Protein Data Bank code \(c\).
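
As a small illustration of these definitions, the sketch below enumerates the distinct length-\(k\) subwords of a word and counts them. The function names are hypothetical, and the counts reported in the table may follow a slightly different convention.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


word = "ABAB"
print(sorted(subwords(word, 2)))    # ['AB', 'BA']
print(subword_complexity(word, 1))  # 2
```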

\(|P_{f(8ZJF_1)}(2) \setminus P_{f(4UOH_1)}(2)|=191\), \(|P_{f(4UOH_1)}(2) \setminus P_{f(8ZJF_1)}(2)|=36\).
Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1111111111111001000101011001010010010101110001110000011001001100010101010101000100001110101100100101000110001100110101100111001010101101001010110101011010011010001000100100110111101001000101111011000111110011001001010010000011010011010101011000110001010100101110111011100001110010011110000010011010111111100100010001100000010010110110110110101111100111110001001110011101000000110111010001000101000011111010000000110000101000100001010001011101010001101011010111100011101001000000000101100001010100110001110110100001101001010010011010111001010000100000100010100000100000011101111000011000010001010000101000101011100101000000000101001110000110001000000010011100111100000101000101111111001100000100011111100010101110101000110000111110111111111111110010101000000001000000101000001100011000101010010010110101100110100101001010001010

Pair \(Z_2\) Length of longest common substring
8ZJF_1,4UOH_1 227 4
8ZJF_1,6UJM_1 151 4
4UOH_1,6UJM_1 196 4
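
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets, so the first table row is \(191+36=227\). A minimal sketch follows, with hypothetical function names; the `jaccard_distance` helper assumes the usual denominator \(|P_x(k)\cup P_y(k)|\), which the page does not state explicitly.

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union (assumed normalization)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0


print(z_distance("ABAB", "BABB", 2))  # 1 (only 'BB' is not shared)
print(jaccard_distance("ABAB", "BABB", 2))
```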

Newick tree

 
(4UOH_1:11.41,(8ZJF_1:75.5,6UJM_1:75.5):38.91);

Let d and d1 be the Otu--Sayood distances \(d\) and \(d_1\), respectively. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{977}{\log_{20} 977}-\frac{151}{\log_{20} 151}\right)\approx 227\), where \(977=826+151\) is the combined length of 8ZJF_1 and 4UOH_1.
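
The arithmetic in this estimate can be checked directly. The sketch below evaluates the expression as written, taking the factors 0.85 and 0.8 from the page without interpreting them; the helper name `scaled_length` is an assumption made for illustration.

```python
import math


def scaled_length(n: int, alphabet_size: int = 20) -> float:
    """n / log_alphabet_size(n), the length scale used in the estimate."""
    return n / math.log(n, alphabet_size)


len_8zjf, len_4uoh = 826, 151  # sequence lengths from the table
expected = 0.85 * 0.8 * (scaled_length(len_8zjf + len_4uoh) - scaled_length(len_4uoh))
print(expected)       # about 227.8
print(int(expected))  # 227 after truncation
```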
Status Protein1 Protein2 d d1/2
Query variables 8ZJF_1 4UOH_1 287 170

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
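
To make the interpolation concrete, here is a small sketch of \(\delta\) as a function of \(\alpha\). The arguments `a` and `b` are hypothetical names for the two quantities whose minimum and maximum enter the formula. The example values 287 and 53 are back-solved from the table row (\(d=287\), \(d_1/2=170\)) and are not stated on the page.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi


# Values inferred from the table: max = 287 and (min + max) / 2 = 170.
a, b = 53, 287
print(delta(0.0, a, b))  # 287.0, the value listed as d
print(delta(0.5, a, b))  # 170.0, the value listed as d1/2
```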