CoV2D Browser™

Parikh vectors
1QTZ_1 7CLT_1 1URJ_1 Letter Amino acid
1 2 26 H Histidine
3 4 61 P Proline
9 4 84 V Valine
13 7 75 R Arginine
12 2 44 N Asparagine
1 0 19 C Cysteine
11 13 97 G Glycine
6 2 27 Y Tyrosine
8 13 64 E Glutamic acid
13 13 30 K Lysine
6 7 72 S Serine
3 0 8 W Tryptophan
12 4 60 T Threonine
9 9 49 D Aspartic acid
5 3 37 Q Glutamine
10 4 32 I Isoleucine
5 6 28 M Methionine
16 14 152 A Alanine
16 17 115 L Leucine
5 12 56 F Phenylalanine
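
As a minimal sketch of how one column of the table above could be recomputed: the Parikh vector of a sequence counts the occurrences of each letter. The function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions, not the browser's own code.

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "HPVRNCGYEKSWTDQIMALF"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count how often each amino-acid letter occurs in `sequence`."""
    counts = Counter(sequence)
    # Letters that never occur get an explicit 0, as in the table.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Small example: the first ten residues of 1QTZ_1.
vector = parikh_vector("MNIFEMLRID")
print(vector["M"], vector["I"], vector["C"])
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check on any column.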

>1QTZ_1|Chain A|PROTEIN (T4 LYSOZYME)|Enterobacteria phage T4 (10665)
>7CLT_1|Chain A|EF-hand domain-containing protein D1|Mus musculus (10090)
>1URJ_1|Chains A, B|MAJOR DNA-BINDING PROTEIN|HUMAN HERPESVIRUS 1 (10298)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1QTZ , Knot 77 164 0.79 40 119 157
MNIFEMLRIDEGLRLKIYKCTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
7CLT , Knot 66 136 0.79 36 104 134
GAMGSGTARPGRSKVFNPYTEFPEFSRRLLKDLEKMFKTYDAGRDGFIDLMELKLMMEKLGAPQTHLGLKSMIKEVDEDFDGKLSFREFLLIFHKAAAGELQEDSGLLALAKFSEIDVALEGVRGAKNFFEAKAQA
1URJ , Knot 406 1136 0.83 40 319 962
METKPKTATTIKVPPGPLGYVYARACPSEGIELLALLSARSGDSDVAVAPLVVGLTVESGFEANVAVVVGSRTTGLGGTAVSLKLTPSHYSSSVYVFHGGRHLDPSTQAPNLTRLCERARRHFGFSDYTPRPGDLKHETTGEALCERLGLDPDRALLYLVVTEGFKEAVCINNTFLHLGGSDKVTIGGAEVHRIPVYPLQLFMPDFSRVIAEPFNANHRSIGEKFTYPLPFFNRPLNRLLFEAVVGPAAVALRSRNVDAVARAAAHLAFDENHEGAALPADITFTAFEASQGKTPRGGRDGGGKGAAGGFEQRLASVMAGDAALALESIVSMAVFDEPPTDISAWPLFEGQDTAAARANAVGAYLARAAGLVGAMVFSTNSALHLTEVDDAGPADPKDHSKPSFYRFFLVPGTHVAANPQVDREGHVVPGFEGRPTAPLVGGTQEFAGEHLAMLSGFSPALLAKMLFYLERCDGAVIVGRQEMDVFRYVADSNQTDVPCNLCTFDTRHACVHTTLMRLRARHPKFASAARGAIGVFGTMNSMYSDCDVLGNYAAFSALKRADGSETARTIMQETYRAATERVMAELETLQYVDQAVPTAMGRLETIITNREALHTVVNNVRQVVDREVEQLMRNLVEGRNFKFRDGLGEANHAMSLTLDPYACGPCPLLQLLGRRSNLAVYQDLALSQCHGVFAGQSVEGRNFRNQFQPVLRRRVMDMFNNGFLSAKTLTVALSEGAAICAPSLTAGQTAPAESSFEGDVARVTLGFPKELRVKSRVLFAGASANASEAAKARVASLQSAYQKPDKRVDILLGPLGFLLKQFHAAIFPNGKPPGSNQPNPQWFWTALQRNQLPARLLSREDIETIAFIKKFSLDYGAINFINLAPNNVSELAMYYMANQILRYCDHSTYFINTLTAIIAGSRRPPSVQAAAAWSAQGGAGLEAGARALMDAVDAHPGAWTSMFASCNLLRPVMAARPMVVLGLSISKYYGMAGNDRVFQAGNWASLMGGKNACPLLIFDRTRKFVLACPRAGFVCAASSLGGGAHESSLCEQLRGIISEGGAAVASSVFVATVKSLGPRTQQLQIEDWLALLEDEYLSEEMMELTARALERGNGEWSTDAALEVAHEAEALVSQLG
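
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be rechecked from the LZ-complexity and length columns alone. The sketch below is only that arithmetic; the helper name `normalized_lz` is a hypothetical choice, and the LZ values are taken from the table rather than recomputed.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) as listed in the table above.
for code, lz, n in [("1QTZ", 77, 164), ("7CLT", 66, 136), ("1URJ", 406, 1136)]:
    print(code, normalized_lz(lz, n))
```

The three ratios come out at roughly 0.799, 0.796 and 0.839, which matches the listed 0.79, 0.79 and 0.83 if the page truncates to two decimals instead of rounding.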

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
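
A minimal sketch of these definitions, assuming a subword means a contiguous window of fixed length. The names `subwords` and `subword_complexity` and the parameter `k` are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k occurring in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("abab", 2)))      # ['ab', 'ba']
print(subword_complexity("abab", 2))    # 2
```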

\(|P_{f(1QTZ_1)}(2) \setminus P_{f(7CLT_1)}(2)|=71\), \(|P_{f(7CLT_1)}(2) \setminus P_{f(1QTZ_1)}(2)|=56\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10110110100110101000001000111101100010101100010011100001110000100110001011101110010101100010110011110111011001111100010110000100111011000100001001001100100101010001
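
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not state which residues it treats as hydrophobic. The set `AFGILMPVW` used below is an assumption, inferred by aligning the displayed string with the start and end of Sequence 1, and the function name `hp_encode` is hypothetical.

```python
# Assumed hydrophobic residues (inferred from the displayed string, not documented).
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 18 residues of 1QTZ_1.
print(hp_encode("MNIFEMLRIDEGLRLKIY"))  # 101101101001101010
```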
Pair \(Z_2\) Length of longest common subword
1QTZ_1,7CLT_1 127 3
1QTZ_1,1URJ_1 230 3
7CLT_1,1URJ_1 227 4
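
The two columns of this table can be sketched as follows. `z_distance` adds the sizes of the two set differences of length-\(k\) subword sets, so for the first pair \(71+56=127\). The second function computes the longest common contiguous subword; that reading of the last column is an assumption, suggested by the small values 3 and 4. All function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k occurring in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for cx in x:
        curr = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

print(z_distance("abab", "abba", 2))                    # 1
print(longest_common_subword_length("abcde", "xbcdy"))  # 3
```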

Newick tree

 
[
	1URJ_1:12.73,
	[
		1QTZ_1:63.5,7CLT_1:63.5
	]:63.23
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{300}{\log_{20} 300}-\frac{136}{\log_{20} 136}\right)\approx 50.7\)
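
The arithmetic behind this estimate can be checked directly. The sketch takes the constants 0.85 and 0.8 as given. Note that \(300 = 164 + 136\) is the combined length of the two query sequences; reading it as the length of their concatenation is an interpretation, not something the page states. The helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b n, the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(300) - lz_scale(136))
print(expected)
```

The result is about 50.75, consistent with the quoted 50.7 when truncated to one decimal.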
Status Protein1 Protein2 d d1/2
Query variables 1QTZ_1 7CLT_1 62 56.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
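
A hedged sketch of how \(\delta\) interpolates between the two distances. It assumes that the underlying complexity is the LZ76 exhaustive-history count \(c\), and that the Otu–Sayood distances are \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), so that \(\min\) and \(\max\) in the formula range over the two differences. The function names are hypothetical and the code has not been checked against the values the page reports.

```python
def lz76(s: str) -> int:
    """Number of components in the LZ76 exhaustive history of s (assumed variant)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it already occurs in the text before its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    a = lz76(s + q) - lz76(s)
    b = lz76(q + s) - lz76(q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# alpha = 0 gives d; alpha = 1/2 gives d_1 / 2.
print(lz76("0001101001000101"))   # 6
print(delta("ab", "ba", 0.0))     # 1.0
print(2 * delta("ab", "ba", 0.5)) # 2.0
```

The substring test makes this version cubic in the worst case, which is acceptable for short sequences but not a tuned implementation.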