CoV2D Browser™

Parikh vectors
1HQZ_1 6MYT_1 7XUJ_1 Letter Amino acid
11 36 51 S Serine
3 24 23 Q Glutamine
11 41 22 E Glutamic acid
7 61 46 G Glycine
3 21 11 H Histidine
11 50 82 L Leucine
1 17 10 C Cysteine
5 24 20 Y Tyrosine
7 41 68 V Valine
15 54 53 A Alanine
5 39 28 R Arginine
10 30 68 I Isoleucine
8 25 42 K Lysine
6 22 49 F Phenylalanine
2 5 5 W Tryptophan
6 19 26 N Asparagine
12 32 38 D Aspartic acid
2 12 10 M Methionine
8 25 32 P Proline
8 43 34 T Threonine
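
A Parikh vector simply records how many times each letter occurs in a word. The following is a minimal sketch of how a column of the table above could be computed; the names `ALPHABET` and `parikh_vector` are illustrative and not taken from the CoV2D code, and the example uses only the first ten residues of 1HQZ_1.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "SQEGHLCYVARIKFWNDMPT"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur.
    return {letter: counts[letter] for letter in ALPHABET}

prefix = "MALEPIDYTT"  # first ten residues of 1HQZ_1
vec = parikh_vector(prefix)
print(vec["T"], vec["M"], vec["W"])  # 2 1 0
```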

>1HQZ_1|Chains A[auth 1], B[auth 2], C[auth 3], D[auth 4], E[auth 5], F[auth 6], G[auth 7], H[auth 8], I[auth 9]|ACTIN-BINDING PROTEIN|Saccharomyces cerevisiae (4932)
>6MYT_1|Chain A|Succinate dehydrogenase [ubiquinone] flavoprotein subunit, mitochondrial|Gallus gallus (9031)
>7XUJ_1|Chains A, B|Chloride anion exchanger|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1HQZ 69 141 0.80 40 113 136
MALEPIDYTTHSREIDAEYLKIVRGSDPDTTWLIISPNAKKEYEPESTGSSFHDFLQLFDETKVQYGLARVSPPGSDVEKIIIIGWCPDSAPLKTRASFAANFAAVANNLFKGYHVQVTARDEDDLDENELLMKISNAAGA
6MYT 249 621 0.86 40 280 584
STKVSDSISTQYPVVDHEFDAVVVGAGGAGLRAAFGLSEAGFNTACVTKLFPTRSHTVAAQGGINAALGNMEDDNWRWHFYDTVKGSDWLGDQDAIHYMTEQAPAAVIELENYGMPFSRTEEGKIYQRAFGGQSLQFGKGGQAHRCCCVADRTGHSLLHTLYGRSLRYDTSYFVEYFALDLLMENGECRGVIALCIEDGTIHRFRAKNTVIATGGYGRTYFSCTSAHTSTGDGTAMVTRAGLPCQDLEFVQFHPTGIYGAGCLITEGCRGEGGILINSQGERFMERYAPVAKDLASRDVVSRSMTIEIREGRGCGPEKDHVYLQLHHLPPQQLATRLPGISETAMIFAGVDVTKEPIPVLPTVHYNMGGIPTNYKGQVITHVNGEDKVVPGLYACGEAASASVHGANRLGANSLLDLVVFGRACALTIAETCKPGEPVPSIKPNAGEESVANLDKLRFADGTIRTSEARLNMQKTMQSHAAVFRTGSILQEGCEKLSQIYRDLAHLKTFDRGIVWNTDLVETLELQNLMLCALQTIYGAEARKESRGAHAREDYKLRIDEFDYSKPLQGQQKRPFEEHWRKHTLSYVDVKSGKVTLKYRPVIDRTLNEEDCSSVPPAIRSY
7XUJ 272 718 0.83 40 273 636
QYIVARPVYSTNAFEENHKKTGRHHKTFLDHLKVCCSCSPQKAKRIVLSLFPIASWLPAYRLKEWLLSDIVSGISTGIVAVLQGLAFALLVDIPPVYGLYASFFPAIIYLFFGTSRHISVGPFPILSMMVGLAVSGAVSKAVPDRNATTLGLPNNSNNSSLLDDERVRVAAAASVTVLSGIIQLAFGILRIGFVVIYLSESLISGFTTAAAVHVLVSQLKFIFQLTVPSHTDPVSIFKVLYSVFSQIEKTNIADLVTALIVLLVVSIVKEINQRFKDKLPVPIPIEFIMTVIAAGVSYGCDFKNRFKVAVVGDMNPGFQPPITPDVETFQNTVGDCFGIAMVAFAVAFSVASVYSLKYDYPLDGNQELIALGLGNIVCGVFRGFAGSTALSRSAVQESTGGKTQIAGLIGAIIVLIVVLAIGFLLAPLQKSVLAALALGNLKGMLMQFAEIGRLWRKDKYDCLIWIMTFIFTIVLGLGLGLAASVAFQLLTIVFRTQFPKCSTLANIGRTNIYKNKKDYYDMYEPEGVKIFRCPSPIYFANIGFFRRKLIDAVGFSPLRILRKRNKALRKIRKLQKQGLLQVTPKGFICTVDTIKDSDEELDNNQIEVLDQPINTTDLPFHIDWNDDLPLNIEVPKISLHSLILDFSAVSFLDVSSVRGLKSILQEFIRIKVDVYIVGTDDDFIEKLNRYEFFDGEVKSSIFFLTIHDAVLHILMKKD
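
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below shows one common way to count Lempel-Ziv (1976) phrases and to form that ratio. Whether the page uses exactly this parsing variant is an assumption, and the function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from an earlier
        # starting position (overlap with the phrase itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_ratio(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic textbook example: 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_complexity("0001101001000101"))  # 6
# Table row for 1HQZ: LZ(w) = 69, n = 141
print(round(normalized_ratio(69, 141), 3))
```

With the table values for 1HQZ the ratio comes out slightly above 0.80, consistent with the truncated figure shown in the table.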

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
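
As a small illustration of these definitions, the subword sets and their sizes can be computed directly with a sliding window. The names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k subwords (contiguous factors) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example: "banana" has the length-2 subwords ba, an, na.
print(sorted(subwords("banana", 2)))    # ['an', 'ba', 'na']
print(subword_complexity("banana", 3))  # 3  (ban, ana, nan)
```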

\(|P_{f(1HQZ_1)}(2) \setminus P_{f(6MYT_1)}(2)|=26\), \(|P_{f(6MYT_1)}(2) \setminus P_{f(1HQZ_1)}(2)|=193\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (1HQZ_1):
111011000000001010010110100100011110101000001000100100110110000100111010111001001111110100111000101110111110011010010101000001000011101001111
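
The hydrophobic-polar string above can be reproduced by mapping each residue to 1 or 0. In the sketch below, the hydrophobic set `AFGILMPVW` is an assumption inferred by comparing the displayed binary string with the 1HQZ_1 sequence; it is not stated on the page, and the function name is illustrative.

```python
# Assumption: residues encoded as 1, inferred from the displayed string.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Encode hydrophobic residues as '1' and all other residues as '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

# First 14 residues of 1HQZ_1 against the first 14 bits shown above.
print(hp_encode("MALEPIDYTTHSRE"))  # 11101100000000
```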
Pair \(Z_2\) Length of longest common subsequence
1HQZ_1,6MYT_1 219 3
1HQZ_1,7XUJ_1 196 4
6MYT_1,7XUJ_1 129 5
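
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets. A minimal sketch on toy words (the function names are illustrative, not from the CoV2D code):

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# "abab" has {ab, ba}; "abba" has {ab, bb, ba}; only "bb" is unshared.
print(z_distance("abab", "abba", 2))  # 1
```

For the pair 1HQZ_1, 6MYT_1 the two one-sided counts reported above, 26 and 193, add up to the tabulated \(Z_2 = 219\).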

Newick tree

 
[
	1HQZ_1:11.05,
	[
		7XUJ_1:64.5,6MYT_1:64.5
	]:49.55
]
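
The tree above is displayed with square brackets, whereas standard Newick strings use parentheses and a terminating semicolon. A small sketch of the conversion, assuming the bracketed layout shown here (the helper name `to_newick` is hypothetical):

```python
# The tree as displayed above, with whitespace collapsed to single spaces.
bracketed = "[ 1HQZ_1:11.05, [ 7XUJ_1:64.5,6MYT_1:64.5 ]:49.55 ]"

def to_newick(text: str) -> str:
    """Strip whitespace, swap [] for (), and append the Newick terminator."""
    compact = "".join(text.split())
    return compact.translate(str.maketrans("[]", "()")) + ";"

print(to_newick(bracketed))
# (1HQZ_1:11.05,(7XUJ_1:64.5,6MYT_1:64.5):49.55);
```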

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{762}{\log_{20} 762}-\frac{141}{\log_{20}141}\right)\approx 175\), where \(762 = 141 + 621\) is the combined length of the two sequences.
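
The arithmetic in the expected-distance estimate above can be checked numerically. In this sketch the factors 0.85 and 0.8 are taken from the formula as given, and the function names are illustrative.

```python
import math

def size_term(n: int, alphabet_size: int = 20) -> float:
    """n / log_{alphabet_size}(n), the normalizing term for a word of length n."""
    return n / math.log(n, alphabet_size)

def rough_expected_distance(n_x: int, n_y: int,
                            f1: float = 0.85, f2: float = 0.8) -> float:
    """f1 * f2 * (size_term(n_x + n_y) - size_term(n_x)), as in the text."""
    return f1 * f2 * (size_term(n_x + n_y) - size_term(n_x))

# 1HQZ_1 has length 141 and 6MYT_1 has length 621, so n_x + n_y = 762.
value = rough_expected_distance(141, 621)
print(value)
```

The result lies between 175 and 176, which agrees with the quoted 175 if the page truncates rather than rounds.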
Status Protein1 Protein2 d d1/2
Query variables 1HQZ_1 6MYT_1 225 136.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
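
As a worked instance of this interpolation, take the tabulated values \(d = 225\) and \(d_1/2 = 136.5\) for the pair 1HQZ_1, 6MYT_1. Assuming \(\max\) and \(\min\) denote the larger and smaller of the two one-sided terms in the Otu–Sayood construction (the page does not spell this out), the larger term is 225 and the smaller is \(2 \cdot 136.5 - 225 = 48\). The value 48 is derived here and is not reported on the page.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

larger = 225.0                   # d from the table
smaller = 2 * 136.5 - larger     # derived from d1/2 = 136.5, gives 48.0

print(delta(0.0, smaller, larger))  # 225.0  (alpha = 0 recovers d)
print(delta(0.5, smaller, larger))  # 136.5  (alpha = 1/2 recovers d1/2)
```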