CoV2D Browser™

Parikh vectors
2INE_1 1SQB_1 4PDB_1 Letter Amino acid
13 22 3 Q Glutamine
9 15 8 H Histidine
18 17 11 I Isoleucine
34 50 13 L Leucine
25 14 12 K Lysine
20 19 5 P Proline
19 51 11 A Alanine
15 25 7 D Aspartic acid
11 17 4 Y Tyrosine
15 17 6 N Asparagine
11 14 2 F Phenylalanine
17 37 10 S Serine
15 26 6 T Threonine
6 5 1 W Tryptophan
16 35 13 G Glycine
5 9 7 M Methionine
23 30 10 E Glutamic acid
25 32 14 V Valine
11 32 12 R Arginine
7 13 0 C Cysteine
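
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count, assuming the 20 standard one-letter amino acid codes; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and are not taken from the CoV2D code.

```python
from collections import Counter

# Assumption: the alphabet is the 20 standard one-letter amino acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence):
    """Return a dict mapping each amino acid letter to its number of occurrences."""
    counts = Counter(sequence)
    # Letters that never occur are reported with count 0 (like C for 4PDB_1).
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


if __name__ == "__main__":
    # The first 13 residues of the 4PDB_1 sequence, used as a toy input.
    vector = parikh_vector("MGSSHHHHHHSSG")
    print({k: v for k, v in vector.items() if v > 0})
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the Length column further down the page.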

>2INE_1|Chain A|Aldose reductase|Homo sapiens (9606)
>1SQB_1|Chain A|Ubiquinol-cytochrome C reductase complex core protein I, mitochondrial|Bos taurus (9913)
>4PDB_1|Chain A|30S ribosomal protein S8|Bacillus anthracis (260799)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2INE , Knot 141 315 0.85 40 209 303
ASRLLLNNGAKMPILGLGTWKSPPGQVTEAVKVAIDVGYRHIDCAHVYQNENEVGVAIQEKLREQVVKREELFIVSKLWCTYHEKGLVKGACQKTLSDLKLDYLDLYLIHWPTGFKPGKEFFPLDESGNVVPSDTNILDTWAAMEELVDEGLVKAIGISNFNHLQVEMILNKPGLKYKPAVNQIECHPYLTQEKLIQYCQSKGIVVTAYSPLGSPDRPWAKPEDPSLLEDPRIKAIAAKHNKTTAQVLIRFPMQRNLVVIPKSVTPERIAENFKVFDFELSSQDMTTLLSYNRNWRVCALLSCTSHKDYPFHEEF
1SQB , Knot 194 480 0.83 40 242 453
MAASAVCRAAGAGTRVLLRTRRSPALLRSSDLRGTATYAQALQSVPETQVSQLDNGLRVASEQSSQPTCTVGVWIDAGSRYESEKNNGAGYFVEHLAFKGTKNRPGNALEKEVESMGAHLNAYSTREHTAYYIKALSKDLPKAVELLADIVQNCSLEDSQIEKERDVILQELQENDTSMRDVVFNYLHATAFQGTPLAQSVEGPSENVRKLSRADLTEYLSRHYKAPRMVLAAAGGLEHRQLLDLAQKHFSGLSGTYDEDAVPTLSPCRFTGSQICHREDGLPLAHVAIAVEGPGWAHPDNVALQVANAIIGHYDCTYGGGAHLSSPLASIAATNKLCQSFQTFNICYADTGLLGAHFVCDHMSIDDMMFVLQGQWMRLCTSATESEVLRGKNLLRNALVSHLDGTTPVCEDIGRSLLTYGRRIPLAEWESRIAEVDARVVREVCSKYFYDQCPAVAGFGPIEQLPDYNRIRSGMFWLRF
4PDB , Knot 72 155 0.78 38 116 145
MGSSHHHHHHSSGLVPRGSHMASMVMTDPIADMLTRIRNANMVRHEKLEVPASKIKKEIAELLKREGFIRDVEYIEDNKQGILRIFLKYGANNERVITGLKRISKPGLRVYAKADEVPRVLNGLGIALVSTSKGVMTDKDARQLQTGGEVVAYVW
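
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; it does not compute the LZ-complexity itself, and the function name `normalized_lz` is an assumption made for illustration.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), with b the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # (LZ-complexity, length) pairs copied from the table above.
    rows = {"2INE": (141, 315), "1SQB": (194, 480), "4PDB": (72, 155)}
    for code, (lz, n) in rows.items():
        print(code, normalized_lz(lz, n))
```

For 2INE the value is about 0.8596, so the tabulated 0.85 is consistent with truncating, rather than rounding, to two decimals.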

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
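
As an illustration of these definitions, here is a minimal sketch that collects the distinct contiguous subwords of a fixed length and counts them. The helper names `subwords` and `subword_complexity` are hypothetical, and the sketch has not been checked against the \(p_w\) columns in the table above.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    # range() is empty when k > len(w), so the result is then the empty set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))
    print(subword_complexity("AAAAAA", 1))
```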

\(|P_{f(2INE_1)}(2) \setminus P_{f(1SQB_1)}(2)|=73\), \(|P_{f(1SQB_1)}(2) \setminus P_{f(2INE_1)}(2)|=106\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100111001101111111010011101001101110110001001010000001111100010001100001111001100000011101100001001010010101101101101100111100010111000011001111001100111011110010010101110011100011100100010100001100000011110100111010011101001011001010111100000010111011100011111001010011001011010100001001100000101011100000000110001
Pair \(Z_2\) Length of longest common subsequence
2INE_1,1SQB_1 179 4
2INE_1,4PDB_1 177 4
1SQB_1,4PDB_1 182 4
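
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets. A minimal sketch, with illustrative function names that are not taken from the CoV2D code:

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    # The two set differences are disjoint, so their sizes can be added.
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    # Toy strings: "BB" is the only 2-subword not shared by both.
    print(z_distance("ABAB", "ABBA", 2))
```

For the pair 2INE_1, 1SQB_1 the two one-sided counts quoted above, 73 and 106, add up to the tabulated \(Z_2 = 179\).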

Newick tree

 
[
	1SQB_1:90.82,
	[
		2INE_1:88.5,4PDB_1:88.5
	]:2.32
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{795 }{\log_{20} 795}-\frac{315}{\log_{20}315})\approx 130.\)
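
The arithmetic in this estimate can be checked with a few lines of Python. This is only a sketch of the displayed formula: the factors 0.85 and 0.8 are taken as given, 795 is read as 315 + 480 (the two sequence lengths), and the function names are hypothetical.

```python
import math


def capacity(n, alphabet_size=20):
    """Return n / log_b(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


def expected_distance(n1, n2, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (capacity(n1 + n2) - capacity(n1))."""
    return c1 * c2 * (capacity(n1 + n2) - capacity(n1))


if __name__ == "__main__":
    # Lengths of 2INE_1 and 1SQB_1 from the table above.
    print(expected_distance(315, 480))
```

The result is a little under 131, so the value 130 quoted above appears to be truncated rather than rounded.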
Status Protein1 Protein2 d d1/2
Query variables 2INE_1 1SQB_1 167 137
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]