CoV2D Browser™

Parikh vectors
6QVP_1 6SKF_1 8XRH_1 Letter Amino acid
7 0 23 E Glutamic acid
15 0 36 I Isoleucine
2 0 17 P Proline
4 0 12 R Arginine
3 0 11 M Methionine
2 0 16 F Phenylalanine
12 0 29 S Serine
9 0 29 T Threonine
4 0 12 Y Tyrosine
1 0 12 H Histidine
0 416 10 C Cysteine
8 0 29 L Leucine
1 0 4 W Tryptophan
5 20 15 N Asparagine
7 0 20 D Aspartic acid
1 0 14 Q Glutamine
3 553 21 G Glycine
10 0 38 K Lysine
5 0 27 V Valine
5 291 33 A Alanine
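
The Parikh vector of a sequence records how many times each letter of an alphabet occurs in it. As a minimal sketch (the function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not the Browser's own code), one column of the table above could be computed as follows:

```python
from collections import Counter

# The 20 standard amino-acid letters; this ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Toy input: the first eight residues of the 6QVP sequence.
vector = parikh_vector("KTDITSTK")
print({letter: count for letter, count in vector.items() if count})
```

Letters outside the chosen alphabet, such as U in an RNA sequence, are simply not reported by this sketch, which is consistent with the 6SKF_1 column showing nonzero counts only for C, N, G and A.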

>6QVP_1|Chains A, B, C, D, E, F|Inner membrane protein|Salmonella typhimurium (90371)
>6SKF_1|Chain A[auth Aa]|16S rRNA|Thermococcus kodakarensis (311400)
>8XRH_1|Chain A|DNA topoisomerase 2|African swine fever virus BA71V (10498)
Protein code \(c\); LZ-complexity \(\mathrm{LZ}(w)\); Length \(n=|w|\); \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\); \(p_w(1)\); \(p_w(2)\); \(p_w(3)\); Sequence \(w=f(c)\) (on the line below each row)
6QVP , Knot 55 104 0.81 38 82 101
KTDITSTKNELVITYHGRLRSFSEEDTYKIKAWLEDKINSNLLIEMVIPQADISFSDSLRLGYERGIILMKEIKKIYPDVVIDMSVNSAASSTTSKAIITTINK
6SKF , Knot 275 1498 0.44 10 18 73
AUUCNGGUUGAUCCUGCCGGAGGCCACUGCUAUGGGGGUCNGACUAAGCCAUGCGAGUCAUGGGGCGCGCUCUGCGCGCACCGGCGGACGGCUCAGUAACACGUCGGUAACCUACCCUCGGGAGGGGGAUAACCCNGGGAAACUGGGGCUAAUCCCCCAUAGGCCUGAGGUACUGGAAGGUCCUCAGGCCGAAAGGGGCAUCUGCCCGCCCGAGGAUGGGCCGGCGGCCGAUUAGGUAGUUGGUGGGGUAACGGCCCACCAAGCCGAAGAUCGGUACGGGCCAUGAGAGUGGGAGCCNGGAGAUGGACACUGAGACACGGGUCCAGGCCCUACGGGGCGCAGCAGGCGCGAAACCUCNGCAAUGCGGGCAACNGCGACGGGGGGACCCCCAGUGCCGUGGCAUAGCCACGGCUUUUCCGGAGUGUAAAAAGCUCCGGGAAUAAGGGCUGGGCAAGGCNGGUGGCAGCCGCCGCGGUAAUACCGGCGGCCCGAGUGGUGGCNGCUAUUAUUGGGCCUAAAGCGUCNGUAGCCGGGCCCGUAAGUCCCUGGCGAAAUCCCACGGCUCAACNGUGGGGCUUGCUGGGGAUACUGCGGGCCUUGGGACNGGGAGAGGCCGGGGGUACCCCUGGGGUAGGGGUGAAAUCCUAUAAUCCCAGGGGGACCGCCAGUGGCGAAGGCGCCNGGCUGGAACGGGUCNGACGGUGAGGGACGAAGGCCAGGGGAGCGAACNGGAUUAGAUACCCGGGUAGUCCUGGCUGUAAAGGAUGCGGGCUAGGUGUCGGGCGAGCUUCGAGCUCGCCCGGUGCCGGAGGGAAGCNGUUAAGCCNGCCGCCUGGGGAGUACGGCNGCAAGGCUGAAACUUAAAGGAAUUGGCGGGGGAGCACUACAAGGGGUGGAGCGUGCGGUUUAAUUGGAUUCAACGCCGGGAACCUCACCGGGGGCGACGGCAGGAUGAAGGCCAGGCUGAAGGUCUUGCCGGACACGCCGAGAGGAGGUGCAUGGCCGCNGUCAGCUCGUACCGUGAGGCGUCCACUUAAGUGUGGUAACGAGCGAGACCCGCGCCCCCAGUUGCCAGUCCUCCCCGCUGGGGAGGAGGCACUCUGGGGGGACCGCCGGCGAUAAGCCGGAGGAAGGAGCGGGCGACGGUAGGUCAGUAUGCCCCGAAACCCCCGGGCUACACGCGCGCUACAAUGGGCGGGACAAUGGGAUCCGACCCNGAAAGGGGAAGGGAAUCCCCUAAACCCGCCCUCAGUUCGGAUCGCGGGCUGCAACUCGCCCGCGUGAAGCUGGAAUCCCUAGUACCCGCGUGUCAUCAUCGCGCGGCGAAUACGUCCCUGCUCCUUGCACACACCGCCCGUCACUCCACCCGAGCGGGGUCCGGGUGAGGCCUGGUCUCCCUUCGGGGAGGCCGGGUCGAGCCUGGGCUCCGUGAGGGGGGAGAAGUCGUAACAAGGUAGCNGUAGGGGAACCUACGGCUCGAUCACCUCCUAUCGCCGGA
8XRH , Knot 174 408 0.85 40 236 391
MEAFEISDFKEHAKKKSMWAGALNKVTISGLMGVFTEDEDLMALPIHRDHCPALLKIFDELIVNATDHERACHSKTKKVTYIKISFDKGVFSCENDGPGIPIAKHEQASLIAKRDVYVPEVASCFFLAGTNINKAKDCIKGGTNGVGLKLAMVHSQWAILTTADGAQKYVQQINQRLDIIEPPTITPSREMFTRIELMPVYQELGYAEPLSETEQADLSAWIYLRACQCAAYVGKGTTIYYNDKPCRTGSVMALAKMYTLLSAPNSTIHTATIKADAKPYSLHPLQVAAVVSPKFKKFEHVSIINGVNCVKGEHVTFLKKTINEMVIKKFQQTIKDKNRKTTLRDSCSNIFVVIVGSIPGIEWTGQRKDELSIAENVFKTHYSIPSSFLTSMTRSIVDILLQSISKKD
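
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below is illustrative only: the helper name `normalized_lz` is hypothetical, and it assumes the tabulated values are truncated (not rounded) to two decimals, which is what the three rows suggest.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"6QVP": (55, 104), "6SKF": (275, 1498), "8XRH": (174, 408)}
ratios = {code: normalized_lz(lz, n) for code, (lz, n) in rows.items()}

for code, ratio in ratios.items():
    # Truncate to two decimals for comparison with the table.
    print(code, math.floor(ratio * 100) / 100)
```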

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
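
A minimal sketch of these definitions, assuming \(P_w\) collects contiguous subwords of one fixed length; the function names `subwords` and `subword_complexity` are illustrative and not taken from the Browser:

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy input: the first eight residues of the 6QVP sequence.
w = "KTDITSTK"
print(sorted(subwords(w, 2)))
print(subword_complexity(w, 2))
```

A word of length \(|w|\) has at most \(|w|-k+1\) subwords of length \(k\), so the count saturates quickly for short sequences.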

\(|P_{f(6QVP_1)}(2) \setminus P_{f(6SKF_1)}(2)|=81\), \(|P_{f(6SKF_1)}(2) \setminus P_{f(6QVP_1)}(2)|=17\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
00010000001110001010010000000101110001000111011110101010001011000111110010010101110101001100000011100100
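
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets. A short sketch, with hypothetical function names and assuming subwords are contiguous and of length \(k\):

```python
def subwords(x, k):
    """Return the set of distinct contiguous subwords of length k in x."""
    return {x[i:i + k] for i in range(len(x) - k + 1)}

def z(k, x, y):
    """Count subwords of length k that occur in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: "BB" is the only 2-letter subword not shared by both words.
print(z(2, "ABAB", "ABBA"))
```

For the pair 6QVP_1, 6SKF_1 the two one-sided counts quoted above are 81 and 17, giving \(Z_2 = 81 + 17 = 98\).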
Pair \(Z_2\) Length of longest common subsequence
6QVP_1,6SKF_1 98 2
6QVP_1,8XRH_1 192 4
6SKF_1,8XRH_1 240 3

Newick tree

 
[
	8XRH_1:12.24,
	[
		6QVP_1:49,6SKF_1:49
	]:73.24
]

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1602}{\log_{20} 1602}-\frac{104}{\log_{20}104}\right)\approx 396.\)
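
The arithmetic in this estimate can be checked directly. In the sketch below the variable names are illustrative, and reading 1602 as 104 + 1498 (the combined length of the two query sequences) is an inference from the table, not something the page states.

```python
import math

def log20(x):
    """Logarithm to base 20."""
    return math.log(x, 20)

n_total = 1602   # equals 104 + 1498; assumed to be the concatenated length
n_first = 104    # length of the 6QVP sequence

estimate = 0.85 * 0.8 * (n_total / log20(n_total) - n_first / log20(n_first))
print(round(estimate, 2))
```

The result lies between 396 and 397, so the figure 396 quoted above corresponds to dropping the fractional part.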
Status Protein1 Protein2 d d1/2
Query variables 6QVP_1 6SKF_1 273 162
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
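
The interpolation \(\delta\) can be sketched as follows. The page does not state which LZ variant or normalization the Browser uses, so two assumptions are made here: `lz76` counts the components of the Lempel-Ziv (1976) exhaustive-history parsing, and `delta` combines the two unnormalized increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). Both function names are hypothetical.

```python
def lz76(s):
    """Return the number of components in the LZ76 parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it can still be copied from earlier text
        # (the copy may overlap the component itself, but not its last letter).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def delta(x, y, alpha):
    """Return alpha*min + (1-alpha)*max of the two LZ increments."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "ABABABAB", "ABBAABBA"
print(delta(x, y, 0), delta(x, y, 0.5))
```

With \(\alpha=0\) this returns the larger increment (the role played by \(d\)), and with \(\alpha=1/2\) it returns the mean of the two increments (the role played by \(d_1/2\)). The quadratic-time substring search is adequate for short sequences only.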