Analytic Pattern Matching: From DNA to Twitter

Philippe Jacquet; Wojciech Szpankowski

doi:10.1017/CBO9780511843204

References

Bibliography

Abramowitz, M. and Stegun, I. (1964). Handbook of Mathematical Functions. Dover, New York.

Adebiyi, E. F., Jiang, T., and Kaufmann, M. (2001). An efficient algorithm for finding short approximate non-tandem repeats, In Intelligent Systems for Molecular Biology (ISMB), pp. 5–12.

Aldous, D. and Shields, P. (1988). A diffusion limit for a class of random-growing binary trees, Probab. Th. Rel. Fields, 79, 509–542.

Alon, N. and Spencer, J. (1992). The Probabilistic Method. John Wiley and Sons.

Apostolico, A. (1985). The myriad virtues of suffix trees, In Apostolico, A. and Galil, Z. (Eds.), Combinatorial Algorithms on Words, Vol. 12 of NATO Advanced Science Institutes, Series F, pp. 85–96. Springer-Verlag.

Apostolico, A. and Szpankowski, W. (1992). Self-alignments in words and their applications, J. Algorithms, 13, 446–467.

Arratia, R., Goldstein, L., and Gordon, L. (1990). Poisson approximation and the Chen-Stein method, Statist. Sci., 5, 403–434.

Arratia, R. and Waterman, M. (1994). A phase transition for the score in matching random sequences allowing deletions, Ann. Appl. Probab., 4, 200–225.

Arratia, R. and Waterman, M. S. (1989). The Erdös-Rényi strong law for pattern matching with a given proportion of mismatches, Ann. Probab., 17, 1152–1169.

Banderier, C., Bodini, O., Ponty, Y., and Bouzid, H. (2012). On the diversity of pattern distributions in rational language, In ANALCO, pp. 107–116.

Bassino, F., Clement, J., Fayolle, J., and Nicodeme, P. (2007). Counting occurrences for a finite set of words: an inclusion-exclusion approach, In Conference on Analysis ofAlgorithms, pp. 29–44.

Bassino, F., Clement, J., and Nicodeme, P. (2012). Counting occurrences for a finite set of words: an inclusion-exclusion approach, ACM Trans. Algorithms, 8(3), Article No. 31.

Bender, E. (1973). Central and local limit theorems applied to asymptotic enumeration, J. Combin. Th. A, 15, 91–111.

Bender, E. and Kochman, F. (1993). The distribution of subword counts is usually normal, European J. Combin., 14, 265–275.

Bieganski, P., Riedl, J., Carlis, J. V., and Retzel, E. (1994). Generalized suffix trees for biological sequence data: applications and implementations, In 27th Hawaii Int. Conf. on Systems Science, pp. 35–44. IEEE Computer Society Press.

Billingsley, P. (1961). Statistical methods in Markov chains, Ann. Math. Statist., 2, 12–40.

Billingsley, P. (1968). Convergence of Probability Measures. John Wiley and Sons.

Blumer, A., Blumer, J., Ehrenfeucht, A., Haussler, D., Chen, M. T., and Seiferas, J. (1985). The smallest automaton recognizing the subwords of a text, Theoret. Comput. Sci., 40(1), 31–55.

Blumer, A., Ehrenfeucht, A., and Haussler, D. (1989). Average size of suffix trees and DAWGS, Discr. Appl. Math., 24, 37–45.

Bourdon, J. and Vallee, B. (2002). Generalized pattern matching statistics, In Mathematics and Computer Science II (Versailles, 2002), Trends. Math., pp. 249–265. Birkhauser.

Bradley, R. (1986). Basic properties of strong mixing conditions, In Eberlein, E. and Taqqu, M. (Eds.), Dependence in Probability and Statistics, pp. 165–192.

Breen, S., Waterman, M. S., and Zhang, N. (1985). Renewal theory for several patterns, J. Appl. Probab., 22, 228–234.

Briandais, R.de la (1959). File searching using variable length keys, In Pro-ceedings of the AFIPS Spring Joint Computer Conference, pp. 295–298.

Bucklew, J. A. (1990). Large Deviation Techniques in Decision, Simulation, and Estimation. John Wiley and Sons.

Burrows, M. and Wheeler, D. J. (1994). A block sorting data compression algorithm, Tech. rep., Digital System Research Center.

Choi, Y., Knessl, C., and Szpankowski, W. (2012). On a Recurrence Arising in Graph Compression, Electronic J. Combinatorics, 15(3), P15.

Choi, Y. and Szpankowski, W. (2011). Pattern Matching in Constrained Sequences, ACM Trans. Algorithms, 7(2), 25:1-25:19.

Chryssaphinou, O. and Papastavridis, S. (1988a). A limit theorem for the number of non-overlapping occurrences of a pattern in a sequence of independent trials, J. Appl. Probab., 25, 428–431.

Chryssaphinou, O. and Papastavridis, S. (1988b). A limit theorem on the number of overlapping appearances of a pattern in a sequence of independent trials, Probab. Theory Related Fields, 79, 129–143.

Chryssaphinou, O., Papastavridis, S., and Vaggelatou, E. (2001). Poisson approximation for the non-overlapping appearances of several words in Markov chains, Combin. Probab. Comput., 10, 293–308.

Clément, J., Flajolet, P., and Vallee, B. (2001). Dynamical sources in information theory: a general analysis of trie structures, Algorithmica, 29, 307–369.

Cobbs, A. L. (1995). Fast identification of approximately matching substrings, In Galil, Z. and Ukkonen, E. (Eds.), Combinatorial Pattern Matching, Vol. 937 of Lect. Notes Comp. Sci., pp. 41–54. Springer-Verlag.

Coffman, E. and Eve, J. (1970). File structures using hashing functions, Commun. ACM, 13(7), 427–432.

Cowan, R. (1991). Expected frequencies of DNA patterns using Whittle's formula, J. Appl. Probab., 28, 886–892.

Crochemore, M., Mignosi, F., Restivo, A., and Salemi, S. (2000). Data compression using antidictionaries, Proceedings of the IEEE, 88(11), 1756–1768. Special issue Lossless data compression edited by J. Storer.

Crochemore, M. and Rytter, W. (1994). Text Algorithms. The Clarendon Press Oxford University Press.

Dembo, A. and Kontoyiannis, I. (2002). Source coding, large deviations, and approximate pattern matching, IEEE Trans. Information Theory, 48(6), 1590–1615.

Dembo, A. and Zeitouni, O. (1993). Large Deviation Techniques and Applica-tions. Jones and Bartlett.

Denise, A. and Regnier, M. (2004). Rare events and conditional events on random strings, Discrete Math. Theor. Comput. Sci., 6, 191–214.

Denise, A., Regnier, M., and Vandenbogaert, M. (2001). Assessing the statistical significance of overrepresented oligonucleotides, In Gascuel, O. and Moret, B. M. E. (Eds.), Algorithms in Bioinformatics (WABI 2001), Vol. 2149 of Lect. Notes Comp. Sci., pp. 85–97. Springer-Verlag.

Devroye, L. (1982). A note on the average depth of tries, Computing, 28, 367–371.

Devroye, L. (1984). A probabilistic analysis of the height of tries and of the complexity of triesort, Acta Informatica, 21, 229–237.

Devroye, L. (1992). A study of trie-like structures under the density model, Ann. Appl. Probab., 2, 402–434.

Devroye, L. (1999). Universal limit laws for depths in random trees, SIAM Journal on Computing, 28, 409–432.

Devroye, L. (2002). Laws of large numbers and tail inequalities for random tries and PATRICIA trees, Journal ofComputational and Applied Mathematics, 142, 27–37.

Devroye, L., Lugosi, G., Park, G., and Szpankowski, W. (2009). Multiple Choice Tries, Rand Structures & Algorithms, 34, 337–367.

Devroye, L., Szpankowski, W., and Rais, B. (1992). A note of the height of suffix trees, SIAM J. Comput., 21, 48–53.

Drmota, M. (2009). Random Trees. Springer, Wien-New York.

Drmota, M., Reznik, Y., and Szpankowski, W. (2010). Tunstall Code, Khodak Variations, and Random Walks, IEEE Trans. Information Theory, 56, 2928–2937.

Drmota, M. and Szpankowski, W. (2009). (Un)Expected Behavior of Digital Search Tree Profile, In ACM-SIAM Symposium on Discrete Algorithms, pp. 130–138.

Durrett, R. (1991). Probability: Theory and Examples. Wadsworth, Belmont.

Farach, M. (1997). Optimal suffix tree construction with large alphabets, In 38th Foundations of Computer Science (FOCS), pp. 137-143, Miami Beach, FL.

Fayolle, J. (2003). Parameters des arbes suffixes dans le cas de sources simples, Tech. rep., INRIA.

Fayolle, J. and Ward, M. (2005). Analysis of the average depth in a suffix tree under a Markov model, In 2005 International Conference on Analysis of Algorithms, pp. 95–104.

Feller, W. (1970). An Introduction to Probability Theory and its Applications, Vol. I. John Wiley and Sons.

Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. II. John Wiley and Sons. Second edition.

Flajolet, P. (1983). On the performance evaluation of extendible hashing and trie searching, Acta Informatica, 20, 345–369.

Flajolet, P., Guivarc'h, Y., Szpankowski, W., and Vallée, B. (2001). Hidden pattern statistics, In Oreijas, F., Spirakis, P., and Leeuwen, J. van (Eds.), Automata, Languages and Programming (ICALP 2001), Vol. 2076 of Lect. Notes Comp. Sci., pp. 152-165. Springer-Verlag.

Flajolet, P. and Sedgewick, R. (2009). Analytic Combinatorics. Cambridge University Press.

Flajolet, P. and Steyaert, J.-M. (1982). A branching process arising in dynamic hashing, trie searching and polynomial factorization, Lecture Notes in Computer Science, 140, 239–251.

Flajolet, P., Szpankowski, W., and Vallee, B. (2006). Hidden word statistics, J. ACM, 53(1), 147–183.

Fredkin, E. (1960). Trie memory, Communications ofthe ACM, 3, 490–499.

Fudos, I., Pitoura, E., and Szpankowski, W. (1996). On Pattern Occurrences in a Random Text, Information Processing Letters, 57, 307–312.

Gaither, J. and Ward, M. (2013). The Asymptotic Distribution of the Multiplicity Matching Parameter in Tries and Suffix Trees Tech. rep., Purdue University.

Gantmacher, F. R. (1959). The Theory ofMatrices, Vols 1, 2. Chelsea. Translated from the Russian original.

Gentleman, J. (1994). The distribution of the frequency of subsequences in alphabetic sequences, as exemplified by deoxyribonucleic acid, Appl. Statist., 43, 404–414.

Gentleman, J. and Mullin, R. (1989). The distribution of the frequency of occurrence of nucleotide subsequences, based on their overlap capability, Biometrics, 45, 35–52.

Geske, M. X., Godbole, A. P., Schaffner, A. A., Skolnick, A. M., and Wallstrom, G. L. (1995). Compound Poisson approximations for word patterns under Markovian hypotheses, J. Appl. Probab., 32, 877–892.

Gheorghiciuc, I. and Ward, M. D. (2007). On correlation polynomials and sub-word complexity, Discrete Mathematics and Theoretical Computer Science, AH, 1–18.

Giegerich, R., Kurtz, S., and Stoye, J. (1999). Efficient implementation of lazy suffix trees, In 3rd Workshop on Algorithmic Engineering (WAE99), Vol. 1668 of Lect. Notes Comp. Sci., pp. 30-42. Springer-Verlag.

Gilbert, E. and Kadota, T. (1992). The Lempel-Ziv Algorithm and Message Complexity, IEEE Trans. Information Theory, 38, 1839–1842.

Godbole, A. P. (1991). Poisson approximations for runs and patterns of rare events, Adv. in Appl. Probab., 23, 851–865.

Godbole, A. P. and Schaffner, A. A. (1993). Improved Poisson approximations for word patterns, Adv. in Appl. Probab., 25, 334–347.

Goulden, I. P. and Jackson, D. M. (1983). Combinatorial Enumeration. John Wiley and Sons.

Guibas, L. and Odlyzko, A. (1981a). Periods in strings, J. Combin. Th. A, 30, 19–43.

Guibas, L. and Odlyzko, A. (1981b). String overlaps, pattern matching, and nontransitive games, J. Combin. Th. A, 30, 183–208.

Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences. Cambridge University Press.

Gwadera, R., Atallah, M., and Szpankowski, W. (2003). Reliable detection of episodes in event sequences, In 3rd IEEE Conf. on Data Mining, pp. 67-74. IEEE Computer Soc.

Henrici, P. (1997). Applied and Computational Complex Analysis. John Wiley.

Hwang, H.-K. (1994). Théorèmes limites pour les structures combinatoires et les fonctions arithmétiques,. Thèse de Doctorat, Ecole Polytechnique.

Hwang, H.-K. (1996). Large deviations for combinatorial distributions I: Central limit theorems, Ann. in Appl. Probab., 6, 297–319.

Jacquet, P. (2007). Common words between two random strings, In IEEE Intl. Symposium on Information Theory, pp. 1495–1499.

Jacquet, P., Milioris, D., and Szpankowski, W. (2013). Classification of Markov Sources Through Joint String Complexity: Theory and Experiments, In IEEE Intl. Symposium on Information Theory, pp. 2289–2293.

Jacquet, P. and Régnier, M. (1986). Trie partitioning process: limiting distri-butions, In CAAP '86 (Nice, 1986), Vol. 214 of Lect. Notes Comp. Sci., pp. 196-210. Springer-Verlag.

Jacquet, P. and Regnier, M. (1998). Normal limiting distribution of the size of tries, In Proceedings of Performance'87, pp. 209–223.

Jacquet, P. and Szpankowski, W. (1991). Analysis of digital tries with Markovian dependency, IEEE Trans. Inform. Theory, 37, 1470–1475.

Jacquet, P. and Szpankowski, W. (1994). Autocorrelation on words and its applications: analysis of suffix trees by string-ruler approach, J. Combin. Th. A, 66, 237–269.

Jacquet, P. and Szpankowski, W. (1995). Asymptotic behavior of the LempelZiv parsing scheme and digital search trees, Theoret. Comput. Sci., 144, 161–197.

Jacquet, P. and Szpankowski, W. (1998). Analytical de-Poissonization and its applications, Theoret. Comput. Sci., 201, 1–62.

Jacquet, P. and Szpankowski, W. (2011). Limiting Distribution of Lempel Ziv'78 Redundancy, In IEEE Intl. Symposium on Information Theory, pp. 1424-1428.

Jacquet, P. and Szpankowski, W. (2012). Joint String Complexity for Markov Sources, In 23rd International Meeting on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms, pp. 1–12.

Jacquet, P. and Szpankowski, W. (2014). On the Limiting Distribution of Lempel Ziv'78 Redundancy for Memoryless Sources, IEEE Trans. Information Theory, 60, 1–14.

Jacquet, P., Szpankowski, W., and Apostol, I. (2002). A universal predictor based on pattern matching, IEEE Trans. Inform. Theory, 48, 1462–1472.

Jacquet, P., Szpankowski, W., and Tang, J. (2001). Average profile of the Lempel-Ziv parsing scheme for a Markovian source, Algorithmica, 31, 318–360.

Janson, S. (2004). Large deviations for sums of partially dependent random variables, Random Structures & Algorithms, 24, 234–248.

Janson, S., Lonardi, S., and Szpankowski, W. (2004). On the Average Sequence Complexity, Theoretical Computer Science, 326, 213–227.

Jolivet, R., Rauch, A., Luscher, H. R., and Gerstner, W. (2006). Predicting spike timing of neocortical pyramidal neurons by simple threshold models, J. Computational Neuroscience, 21, 35–49.

Kärkkäinen, J. and Sanders, P. (2003). Simple linear work suffix array constructio, In Baeten, J. C. M., Lenstra, J. K., Parrow, J., and Woeginger, G. J. (Eds.), 30th Automata, Languages and Programming (ICALP '03), Vol. 2719 of Lect. Notes Comp. Sci., pp. 943-955. Springer-Verlag.

Karlin, S. and Ost, F. (1987). Counts of long aligned word matches among random letter sequences, Adv. in Appl. Probab., 19, 293–351.

Karlin, S. and Taylor, H. (1975). A First Course in Stochastic Processes (Second edition). Academic Press.

Kato, T. (1980). Perturbation Theory for Linear Operators. Springer-Verlag.

Khodak, G. (1969). Connection Between Redundancy and Average Delay of Fixed-Length Coding, In All-Union Conference on Problems of Theoretical Cybernetics.

Kirschenhofer, P. and Prodinger, H. (1988). Further results on digital search trees, Theoretical Computer Science, 58, 143–154.

Kirschenhofer, P. and Prodinger, H. (1991). On some applications of formulae of Ramanujan in the analysis of algorithms, Mathematika, 38, 14–33.

Kirschenhofer, P., Prodinger, H., and Szpankowski, W. (1989a). On the Balance Property of PATRICIA Tries: External Path Length Viewpoint, Theoretical Computer Science, 68, 1–17.

Kirschenhofer, P., Prodinger, H., and Szpankowski, W. (1989b). On the variance of the external path in a symmetric digital trie, Discrete Applied Mathematics, 25, 129–143.

Kirschenhofer, P., Prodinger, H., and Szpankowski, W. (1994). Digital Search Trees Again Revisited: The Internal Path Length Perspective, SIAM J. Computing, 23, 598–616.

Kleffe, J. and Borodovsky, M. (1992). First and second moment of counts of words in random texts generated by Markov chains, Comput. Appl. Biosci., 8, 433–441.

Knessl, C. and Szpankowski, W. (2004). On the number of full levels in tries, Random Structures and Algorithms, 25, 247–276.

Knessl, C. and Szpankowski, W. (2005). Enumeration of Binary Trees and Universal Types, Discrete Mathematics and Theoretical Computer Science, 7, 313–400.

Knessl, C. and Szpankowski, W. (2009). On the Average Profile of Symmetric Digital Search Trees, Analytic Combinatorics, 4.

Knuth, D. E. (1998). The Art of Computer Programming. Sorting and Searching (Second edition)., Vol. 3. Addison-Wesley.

Kolesnik, V. D. and Krachkovsky, V. Y. (1991). Generating functions and lower bounds on rates for limited error-correcting codes, IEEE Trans. Informa-tion Theory, 37(3), 778–788.

Kucherov, G. and Rusinowitch, M. (1997). Matching a set of strings with variable length don't cares, Theoret. Comput. Sci., 178, 129–154.

Landau, G. and Schmidt, J. (1993). An algorithm for approximate tandem repeats, In Apostolico, A., Crochemore, M., Galil, Z., and Manber, U. (Eds.), 4th Combinatorial Pattern Matching (Padova), Vol. 684 of Lect. Notes Comp. Sci., pp. 120–133. Springer-Verlag.

Li, M. and Vitanyi, P. (1993). Introduction to Kolmogorov Complexity and its Applications. Springer Verlag.

Li, S.-Y. (1980). A martingale approach to the study of occurrence of sequence patterns in repeated experiments, Ann. Probab., 8, 1171–1176.

Lonardi, S., Szpankowski, W., and Ward, M. (2007). Error Resilient LZ'77 Data Compression: Algorithms, Analysis, and Experiments, IEEE Trans. Information Theory, 53, 1799–1813.

Lothaire, M. (2005). Applied Combinatorics on Words, Vol. 105 of Encyclopedia of Mathematics and its Applications. Cambridge University Press.

Louchard, G. (1994). Trie size in a dynamic list structure, Random Structures and Algorithms, 5, 665–702.

Louchard, G. and Szpankowski, W. (1995). Average profile and Limiting distribution for a phrase size in the Lempel-Ziv parsing algorithm, IEEE Trans. Information Theory, 41, 478–488.

Louchard, G. and Szpankowski, W. (1997). On the Average Redundancy Rate of the Lempel-Ziv Code, IEEE Trans. Information Theory, 43, 2–8.

Louchard, G., Szpankowski, W., and Tang, J. (1999). Average Profile of Generalized Digital Search Trees and the Generalized Lempel-Ziv Algorithm, SIAM J. Computing, 28, 935–954.

Luczak, T. and Szpankowski, W. (1997). A suboptimal lossy data compression based on approximate pattern matching, IEEE Trans. Inform. Theory, 43, 1439–1451.

Magner, A., Knessl, C., and Szpankowski, W. (2014). Expected External Profile of PATRICIA Tries, In Analytic Algorithmics and Combinatorics (ANALCO).

Mahmoud, H. (1992). Evolution of Random Search Trees. John Wiley and Sons.

Marcus, B., Roth, R., and Siegel, P. (1988). Constrained Systems and Coding for Recording Channels in Handbook of Coding Theory,. chap. 20. Elsevier Science.

Marsan, L. and Sagot, M.-F. (2000). Extracting structured motifs using a suffix tree. Algorithms and application to consensus identification, In 4th Research in Computational Molecular Biology (RECOMB), pp. 210–219. ACM Press.

McCreight, E. M. (1976). A space-economical suffix tree construction algorithm, J. Assoc. Comput. Mach., 23(2), 262–272.

Merhav, N. (1991). Universal Coding with Minimum Probability of Codeword Length Overflow, IEEE Trans. Information Theory, 37, 556–563.

Moision, B. E., Orlitsky, A., and Siegel, P. H. (2001). On codes that avoid specified differences, IEEE Trans. Information Theory, 47(1), 433–442.

Naor, M. and Wieder, U. (2003). Novel architectures for P2P applications: The continuous-discrete approach, In Proc. 15th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 50–59.

Neininger, R. and Rüschendorf, L. (2004). A general limit theorem for recursive algorithms and combinatorial structures, Ann. Appl. Probab., 14, 378–418.

Nicodème, P., Salvy, B., and Flajolet, P. (1999). Motif Statistics, In European Symposium on Algorithms, Lecture Notes in Computer Science, pp. No. 1643, 194–211.

Nicodème, P., Salvy, B., and Flajolet, P. (2002). Motif statistics, Theoret. Comput. Sci., 287(2), 593–617. Algorithms (Prague, 1999).

Niederreiter, H. (1999). Some computable complexity measures for binary sequences, In Sequences and Their Applications, pp. 67–78. Springer Verlag.

Olver, F. W. J. (1974). Asymptotics and Special Functions. Academic Press.

Paninski, L. (2003). Estimation of Entropy and Mutual Information, Neural Comp., 15(6), 1191–1253.

Park, G. (2006). Profile of Tries. Ph.d. thesis, Purdue University.

Park, G., Hwang, H., Nicodeme, P., and Szpankowski, W. (2009). Profile in Tries, SIAM J. Computing, 38, 1821–1880.

Pittel, B. (1985). Asymptotic growth of a class of random trees, Ann. Probab., 18, 414–427.

Pittel, B. (1986). Paths in a random digital tree: limiting distributions, Adv. Appl. Probab., 18, 139–155.

Plotnik, E., Weinberger, M., and Ziv, J. (1992). Upper Bounds on the Probability of Sequences Emitted by Finite-State Sources and on the Redundancy of the Lempel-Ziv Algorithm, IEEE Trans. Information Theory, 38, 66–72.

Prum, B., Rodolphe, F., and Turckheim, È. (1995). Finding words with unexpected frequencies in deoxyribonucleic acid sequences, J. Roy. Statist. Soc. Ser. B, 57, 205–220.

Rabiner, L. (1989). A tutorial on hidden Markov models, Proceedings of the IEEE, 77(2), 257–286.

Rachev, S. T. and Ruschendorf, L. (1995). Probability metrics and recursive algorithms, Adv. Appl. Probab., 27, 770–799.

Régnier, M. (2000). A unified approach to word occurrence probabilities, Discr. Appl. Math., 104, 259–280.

Régnier, M. and Jacquet, P. (1989). New results on the size of tries, IEEE Trans. Information Theory, 35, 203–205.

Régnier, M. and Szpankowski, W. (1998a). On pattern frequency occurrences in a Markovian sequence, Algorithmica, 22, 631–649.

Régnier, M. and Szpankowski, W. (1998b). On the approximate pattern occurrences in a text, In Compression and Complexity of Sequences, pp. 253–264. IEEE Computer Society Press.

Reinert, G. and Schbath, S. (1998). Compound Poisson and Poisson process approximations for occurrences of multiple words in Markov chains, J. Comput. Biol., 5, 223–253.

Reinert, G., Schbath, S., and Waterman, M. (2000). Probabilistic and statistical properties of words: an overview, J. Comput. Biol., 7, 1–46.

Robin, S. (2002). A compound Poisson model for word occurrences in DNA sequences, J. Roy. Statist. Soc. Ser. C, 51, 437–451.

Robin, S. and Daudin, J.-J. (1999). Exact distribution of word occurrences in a random sequence of letters, J. Appl. Probab., 36, 179–193.

Robin, S. and Daudin, J.-J. (2001). Exact distribution of the distances between any occurrences of a set of words, Ann. Inst. Statist. Math., 36 (4), 895905.

Sagot, M.-F. (1998). Spelling approximate repeated or common motifs using a suffix tree, In Lucchesi, C. and Moura, A. (Eds.), LATIN'98: Theoretical Informatics: Third Latin American Symposium, Vol. 1380 of Lect. Notes Comp. Sci., pp. 111–127. Springer-Verlag.

Salvy, B., Sedgewick, B., Soria, M., Szpankowski, W., and Vallee, B. (2011). Philippe Flajolet, the Father of Analytic Combinatorics, ACM Transac-tions on Algorithms, 7.

Savari, S. (1997). Redundancy of the Lempel-Ziv Incremental Parsing Rule, IEEE Trans. Information Theory, 43, 9–21.

Savari, S. and Gallager, R. (1997). Generalized Tunstall codes for sources with memory, IEEE Trans. Information Theory, 43, 658–668.

Schachinger, W. (1995). On the variance of a class of inductive valuations of data structures for digital search, Theoretical Computer Science, 144, 251–275.

Schachinger, W. (2000). Limiting distributions for the costs of partial match retrievals in multidimensional tries, Random Structures & Algorithms, 17, 428–459.

Schachinger, W. (2001). Asymptotic normality of recursive algorithms via mar-tingale difference arrays, Discrete Mathematics and Theoretical Computer Science, 4, 363–397.

Schachinger, W. (2004). Concentration of size and path length of tries, Combi-natorics, Probability and Computing, 13, 763–793.

Schbath, S. (1995). Étude asymptotique du nombre d'occurrences d'un mot dans une chaîne de Markov et application à la recherche de mots de fréquence exceptionnelle dans les séquences d'ADN. Ph.D. thesis, Université René Descartes, Paris V.

Sedgewick, R. (1983). Algorithms. Addison-Wesley.

Sedgewick, R. and Flajolet, P. (1995). An Introduction to the Analysis ofAlgo-rithms. Addison-Wesley.

Seroussi, G. (2006a). On the Number of t-Ary Trees with a Given Path Length, Algorithmica, 46, 557–565.

Seroussi, G. (2006b). On Universal Types, IEEE Trans. Information Theory, 52, 171–189.

Shallit, J. (1993). On the maximum number of distinct factors in a binary string, Graphs and Combinatorics, 9, 197–200.

Shields, P. C. (1969). The Ergodic Theory of Discrete Sample Paths. Amer. Math. Soc.

Stefanov, V. (2003). The intersite distances between pattern occurrences in strings generated by general discrete- and continuous-time models: an algorithmic approach, J. Appl. Probab., 40.

Szpankowski, W. (1987). Average complexity of additive properties for multiway tries: a unified approach, Lecture Notes in Computer Science, 249, 13-25.

Szpankowski, W. (1988a). The Evaluation of an Alternating Sum with Applications to the Analysis of Some Data Structures, Information Processing Letters, 28, 13-19.

Szpankowski, W. (1988b). Some results on V -ary asymmetric tries, Journal of Algorithms, 9, 224-244.

Szpankowski, W. (1991). On the height of digital trees and related problems, Algorithmica, 6, 256-277.

Szpankowski, W. (1993a). Asymptotic properties of data compression and suffix trees, IEEE Trans. Inform. Theory, 39, 1647-1659.

Szpankowski, W. (1993b). A generalized suffix tree and its (un)expected asymptotic behaviors, SIAM J. Comput., 22, 1176-1198.

Szpankowski, W. (2001). Average Case Analysis of Algorithms on Sequences. John Wiley and Sons.

Temme, N. (1996). Special Functions. John Wiley & Sons, New York.

Titchmarsh, E. C. and Heath-Brown, D. (1988). The Theory of the Riemann Zeta-Functions. Oxford University Press.

Tunstall, B. (1967). Synthesis of Noiseless Compression Codes. Ph.d. thesis, Georgia Inst. Tech.

Ukkonen, E. (1995). On-line construction of suffix trees, Algorithmica, 14(3), 249-260.

Vallée, B. (2001). Dynamical sources in information theory: fundamental intervals and word prefixes, Algorithmica, 29, 262-306.

Vitter, J. and Krishnan, P. (1996). Optimal Prefetching via Data Compression, Journal of the ACM, 43, 771-793.

Wang, M. and Shallit, J. (1998). On minimal words with given subword complexity, Electronic Journal of Combinatorics, 5.

Ward, M. (2005). Analysis of the Multiplicity Matching Parameter in Suffix Trees. Ph.d. thesis, Purdue University.

Ward, M. and Szpankowski, W. (2005). Analysis of the Multiplicity Matching Parameter in Suffix Trees, 2005 Conference on Analysis of Algorithms -DMTCS Proc., pp. 307–322.

Waterman, M. (1995a). Introduction to Computational Biology. Chapman & Hall.

Waterman, M. S. (1995b). Introduction to Computational Biology. Maps, Sequences and Genomes. Chapman and Hall.

Weiner, P. (1973). Linear pattern matching algorithm, In 14th IEEE Symposium on Switching and Automata Theory (SWAT), pp. 1–11.

Wu, S. and Manber, U. (1995). Fast text searching allowing errors, Comm. Assoc. Comput. Mach., 35, 983–991.

Wyner, A. and Ziv, J. (1989). Some asymptotic properties of the entropy of a stationary ergodic data source with applications to data compression, IEEE Trans. Inform. Theory, 35, 1250–1258.

Wyner, A. J. (1997). The redundancy and distribution of the phrase lengths of the fixed-database Lempel-Ziv algorithm, IEEE Trans. Inform. Theory, 43, 1439–1465.

Yang, E. and Kieffer, J. (1998). On the performance of data compression algorithms based upon string matching, IEEE Trans. Inform. Theory, 44, 47–65.

Zehavi, E. and Wolf, J. K. (1988). On runlength codes, IEEE Trans. Information Theory, 34(1), 45–54.

Ziv, J. (1988). On classification with empirically observed statistics and universal data compression, IEEE Trans. Information Theory, 34, 278–286.

Ziv, J. and Lempel, A. (1977). A universal algorithm for sequential data compression, IEEE Trans. Inform. Theory, IT-23, 337–343.

Ziv, J. and Lempel, A. (1978). Compression of individual sequences via variable-rate coding, IEEE Trans. Inform. Theory, 24, 530–536.

Ziv, J. and Merhav, N. (1993). A measure of relative entropy between individual sequences with application to universal classification, IEEE Trans. Inform. Theory, 39, 1270–1279.

Analytic Pattern Matching

From DNA to Twitter

This Book has been cited by the following publications. This list is generated based on data provided by Crossref.

Book description

Refine List

Actions for selected content:

Contents

Frontmatter
pp i-iv

Dedication
pp v-vi

Contents
pp vii-x

Foreword
pp xi-xii

Preface
pp xiii-xviii

Acknowledgments
pp xix-xx

About the sketches
pp xxi-xxii

I - ANALYSIS
pp 1-2

Chapter 1 - Probabilistic Models
pp 3-22

Chapter 2 - Exact String Matching
pp 23-46

Chapter 3 - Constrained Exact String Matching
pp 47-74

Chapter 4 - Generalized String Matching
pp 75-108

Chapter 5 - Subsequence String Matching
pp 109-132

II - APPLICATIONS
pp 133-134

Chapter 6 - Algorithms and Data Structures
pp 135-154

Chapter 7 - Digital Trees
pp 155-218

Chapter 8 - Suffix Trees and Lempel–Ziv'77
pp 219-266

Chapter 9 - Lempel–Ziv'78 Compression Algorithm
pp 267-310

Chapter 10 - String Complexity
pp 311-346

Bibliography
pp 347-362

Index
pp 363-366

Metrics

Altmetric attention score

Full text views

Book summary page views