CoV2D BrowserTM

Parikh vectors
9GQV_1 7YTX_1 3CJE_1 Letter Amino acid
24 43 12 D Aspartic acid
53 30 13 G Glycine
7 21 1 Y Tyrosine
24 28 13 V Valine
15 80 3 N Asparagine
6 29 6 Q Glutamine
19 42 11 T Threonine
11 22 3 H Histidine
36 132 13 L Leucine
35 76 3 S Serine
2 6 2 W Tryptophan
53 26 21 A Alanine
4 14 4 C Cysteine
22 43 11 E Glutamic acid
15 6 5 M Methionine
18 49 9 F Phenylalanine
17 35 10 P Proline
29 38 8 R Arginine
21 51 9 I Isoleucine
8 40 10 K Lysine
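The counts in the table above are plain letter frequencies, so they can be checked by counting. The following is a minimal sketch in Python, assuming the sequence is given as a string of one-letter amino acid codes; the function name `parikh_vector` and the constant `ALPHABET` are illustrative names, not part of this page.

```python
from collections import Counter

# One-letter codes in the same order as the rows of the table above.
ALPHABET = "DGYVNQTHLSWACEMFPRIK"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino acid letter in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in ALPHABET}

# Sequence of 3CJE_1 as listed on this page, split only for readability.
seq_3cje = (
    "GMALTPDQMPDRAEVVFTCNGKAVGKMRNELDVAMVKPFEER"
    "FALATDEGAFHGGDASAPPPLALFIAGLTGCVMTQIRAFAKR"
    "LKVTVTDLDVECRVVWDWAKAGPVYETGPKSFEIDIILHSPD"
    "PIEAQQALIEAAKKGCFLEQTLGQANTIRHRLKVGDTFIDA"
)

vector = parikh_vector(seq_3cje)
for letter in ALPHABET:
    print(letter, vector[letter])
```

The entries of a Parikh vector sum to the length of the sequence whenever every letter of the sequence belongs to `ALPHABET`, which gives a quick sanity check against the length column further down the page.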

>9GQV_1|Chains A, B|3-oxoacyl-[acyl-carrier-protein] synthase 2|Pseudomonas aeruginosa (287)
>7YTX_1|Chains A, B|Toll-like receptor 8|Homo sapiens (9606)
>3CJE_1|Chain A|OsmC-like protein|Jannaschia sp. (290400)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9GQV , Knot 173 419 0.83 40 211 388
GHMASMSRRRVVITGMGMLSPLGLDVPSSWEGILAGRSGIAPIEHMDLSAYSTRFGGSVKGFNVEEYLSAKEARKLDLFIQYGLAASFQAVRDSGLEVTDANRERIGVSMGSGIGGLTNIENNCRSLFEQGPRRISPFFVPGSIINMVSGFLSIHLGLQGPNYALTTAATTGTHSIGMAARNIAYGEADVMVAGGSEMAACGLGLGGFGAARALSTRNDEPTRASRPWDRDRDGFVLSDGSGALVLEELEHARARGARIYAELVGFGMSGDAFHMTAPPEDGAGAARCMKNALRDAGLDPRQVDYINAHGTSTPAGDIAEIAAVKSVFGEHAHALSMSSTKSMTGHLLGAAGAVEAIFSVLALRDQVAPPTINLDNPDEGCDLDLVAHEAKPRKIDVALSNSFGFGGTNGTLVFRRFAD
7YTX , Knot 287 811 0.79 40 270 664
RSPWEENFSRSYPCDEKKQNDSVIAECSNRRLQEVPQTVGKYVTELDLSDNFITHITNESFQGLQNLTKINLNHNPNVQHQNGNPGIQSNGLNITDGAFLNLKNLRELLLEDNQLPQIPSGLPESLTELSLIQNNIYNITKEGISRLINLKNLYLAWNCYFNKVCEKTNIEDGVFETLTNLELLSLSFNSLSHVPPKLPSSLRKLFLSNTQIKYISEEDFKGLINLTLLDLSGNCPRCFNAPFPCVPCDGGASINIDRFAFQNLTQLRYLNLSSTSLRKINAAWFKNMPHLKVLDLEFNYLVGEIASGAFLTMLPRLEILDLSFNYIKGSYPQHINISRNFSKLLSLRALHLRGYVFQELREDDFQPLMQLPNLSTINLGINFIKQIDFKLFQNFSNLEIIYLSENRISPLVKDTRQSYANSSSFQRHIRKRRSTDFEFDPHSNFYHFTRPLIKPQCAAYGKALDLSLNSIFFIGPNQFENLPDIACLNLSANSNAQVLSGTEFSAIPHVKYLDLTNNRLDFDNASALTELSDLEVLDLSYNSHYFRIAGVTHHLEFIQNFTNLKVLNLSHNNIYTLTDKYNLESKSLVELVFSGNRLDILWNDDDNRYISIFKGLKNLTRLDLSLNRLKHIPNEAFLNLPASLTELHINDNMLKFFNWTLLQQFPRLELLDLRGNKLLFLTDSLSDFTSSLRTLLLSHNRISHLPSGFLSEVSSLKHLDLSSNLLKTINKSALETKTTTKLSMLELHGNPFECTCDIGDFRRWMDEHLNVKIPRLVDVICASPGDQRGKSIVSLELTTCVSDVTEFLVPR
3CJE , Knot 81 167 0.82 40 129 161
GMALTPDQMPDRAEVVFTCNGKAVGKMRNELDVAMVKPFEERFALATDEGAFHGGDASAPPPLALFIAGLTGCVMTQIRAFAKRLKVTVTDLDVECRVVWDWAKAGPVYETGPKSFEIDIILHSPDPIEAQQALIEAAKKGCFLEQTLGQANTIRHRLKVGDTFIDA
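The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, and the function name `normalized_lz` is an illustrative choice.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"9GQV": (173, 419), "7YTX": (287, 811), "3CJE": (81, 167)}

for code, (lz, n) in rows.items():
    print(code, round(normalized_lz(lz, n), 3))
```

The recomputed values agree with the listed 0.83, 0.79 and 0.82 to within 0.01; whether the page rounds or truncates the last digit is not stated.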

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
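The set of distinct subwords of a given length and its cardinality can be computed with a sliding window. This is a minimal sketch; the names `subwords` and `subword_complexity` are illustrative, and the example word is a toy string rather than one of the proteins on this page.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

word = "abab"  # toy example
for k in (1, 2, 3):
    print(k, sorted(subwords(word, k)), subword_complexity(word, k))
```

For a word of length \(n\) there are at most \(n-k+1\) windows of length \(k\), so the count can never exceed that bound.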

\(|P_{f(9GQV_1)}(2) \setminus P_{f(7YTX_1)}(2)|=49\), \(|P_{f(7YTX_1)}(2) \setminus P_{f(9GQV_1)}(2)|=108\).
Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10110100001110111110111101100101111100111110010101000011101011010001010010010111001111010110001101001000011101101111100100000011001100101111110110110111010111011001100110010001111100110101011111100111011111111110110000001001001100000111100101111100100101011010101111110101101011100111110010011001110100100101010001110110111100111001011010000010101111111101110111100011110101001001001011100101001011100011111001011100110
Pair \(Z_2\) Length of longest common subword (contiguous)
9GQV_1,7YTX_1 157 5
9GQV_1,3CJE_1 166 3
7YTX_1,3CJE_1 213 3
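\(Z_k\) is the size of the symmetric difference of the two subword sets, so the first row of the table is consistent with the two one-sided counts given earlier: \(49+108=157\). A minimal sketch of the computation follows, with illustrative function names and toy inputs.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: "abab" has subwords {ab, ba}; "abba" has {ab, bb, ba}.
print(z_distance("abab", "abba", 2))
```

Dividing by the size of the union of the two sets would give the Jaccard distance itself; the page reports only the numerator.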

Newick tree

 
(
	3CJE_1:10.49,
	(
		9GQV_1:78.5,7YTX_1:78.5
	):21.99
);

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
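The page does not say which Lempel--Ziv parsing it uses, so the following is only a sketch under stated assumptions: `lz76` counts the components of the exhaustive-history parsing of Lempel and Ziv (1976), and the two distances are taken to be \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1=(c(xy)-c(x))+(c(yx)-c(y))\), which is the usual reading of the Otu--Sayood definitions. All function names are illustrative.

```python
def lz76(w: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from the text
        # preceding its own last symbol.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) under the assumed definitions stated above."""
    a = lz76(x + y) - lz76(x)  # new components needed for y after x
    b = lz76(y + x) - lz76(y)  # new components needed for x after y
    return max(a, b), a + b

print(lz76("0001101001000101"))
print(otu_sayood("aaaa", "bbbb"))
```

This quadratic-time parser is meant for checking small examples; it is not claimed to reproduce the LZ-complexity values listed on this page.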
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{1230}{\log_{20} 1230}-\frac{419}{\log_{20} 419})\approx 210\), where \(1230=419+811\) is the combined length of the two sequences.
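The arithmetic in the estimate above can be checked directly. In this sketch the factors 0.85 and 0.8 are copied from the formula as given; the page does not explain their origin, and the helper name `capacity` is illustrative.

```python
import math

def capacity(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b n, the normalizer used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

n_x = 419          # length of 9GQV_1
n_xy = 419 + 811   # combined length of 9GQV_1 and 7YTX_1

# Factors 0.85 and 0.8 are taken verbatim from the formula above.
expected = 0.85 * 0.8 * (capacity(n_xy) - capacity(n_x))
print(round(expected, 1))
```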
Status Protein1 Protein2 d d1/2
Query variables 9GQV_1 7YTX_1 254 195.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
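As a worked instance of this interpolation, take the query row for 9GQV_1 and 7YTX_1, which lists \(d=254\) and \(d_1/2=195.5\). Assuming \(d\) is the max term and \(d_1\) is the sum of the min and max terms, the min term would be \(2(195.5)-254=137\). That value is derived here, not stated on the page, and the function name `delta` is illustrative.

```python
def delta(alpha: float, lo: float, hi: float) -> float:
    """Return alpha * min + (1 - alpha) * max for the two one-sided terms."""
    return alpha * lo + (1 - alpha) * hi

d = 254.0                   # from the query row
half_d1 = 195.5             # from the query row
lo = 2 * half_d1 - d        # derived min term (assumption, see above)

print(delta(0.0, lo, d))    # alpha = 0 gives the max term, i.e. d
print(delta(0.5, lo, d))    # alpha = 1/2 gives the average, i.e. d1/2
```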