Some publications

[2025] [2024] [2023] [2022] [2021] [2020] [2019] [2018] [2017] [2016] [2015] [2014] [2013] [2012] [2011] [2010] [2009] [2008] [2007] [2006] [2005] [2004] [2003] [2002] [2001] [2000] [1999] [1998] [1997] [1996] [1994] [1993] [1991]

Note: the years below correspond to the first preprint, not to the final publication dates.

2025:

Correctly-rounded evaluation of a function: why, how, and at what cost?, with Nicolas Brisebarre, Guillaume Hanrot and Jean-Michel Muller, preprint, 35 pages, 2025.

Note: the hardness to round upper bounds from Table 3 can be reproduced with this SageMath program.

Accuracy of Mathematical Functions in Single, Double, Extended Double and Quadruple Precision, with Brian Gladman, Vincenzo Innocente, John Mather, 27 pages, February 2025. [previous versions from August 2024, February 2024, September 2023, February 2023, August 2022, February 2022, September 2021, February 2021, December 2020, September 17, 2020, September 15, 2020, August 28, 2020, August 25, 2020, February 2020] [HAL]

For single, double, and quadruple precision (IEEE-754 binary32, binary64 and binary128 formats), we give the largest known error in ulps (units in last place) of mathematical functions for several mathematical libraries. For single precision and univariate functions, these values were obtained by exhaustive search over the full binary32 range, and are thus rigorous upper bounds. The input values that exhibit these largest known errors are also given.
Note: the material for this paper (LaTeX source and programs) is available here.
Errata: AMD LibM does not provide the Bessel functions (j0, j1, y0, y1) and erfc, both in single and double precision. The corresponding entries in all versions up to August 2022 should thus read "NA" (Not Available). In version from February 2024, replace "We also get large errors for expm1f in AMD LibM, and for cospif, tanpif, powf in FreeBSD" by "We also get large errors for expm1f in AMD LibM".

FastTwoSum revisited, with Claude-Pierre Jeannerod, 8 pages, March 2025. Final version published in the proceedings of the 32nd IEEE International Symposium on Computer Arithmetic (ARITH 2025).

The FastTwoSum algorithm is a classical way to evaluate the rounding error that occurs when adding two numbers in finite precision arithmetic. Starting with Dekker in the early 1970s, numerous floating-point analyses have been made of this algorithm, that are aimed at identifying sufficient conditions for the error to be computed exactly and, otherwise, at quantifying the quality of the error estimate thus produced. In this paper we revisit these two aspects of FastTwoSum. We first provide new, less restrictive conditions for exactness, and show that FastTwoSum performs an error-free transform in more general situations than those found so far in the literature. Second, when exactness cannot be guaranteed we give several error analyses of the output of FastTwoSum and show that the bounds obtained are tight. In particular, this provides further insight into how the algorithm behaves when roundings other than 'to nearest' are used, or when the operands are reversed.

Correct Rounding in Double Extended Precision, with Sélène Corbineau, 8 pages, January 2025. Final version published in the proceedings of the 32nd IEEE International Symposium on Computer Arithmetic (ARITH 2025).

The double extended precision format is an 80-bit floating-point format introduced in the 80x87 series of floating- point processors by Intel. Since the introduction of vector instructions in the x86 processors, its use has fallen due to speed concerns. We implement the first correctly-rounded routines for double extended precision. These implementations use modern microprocessor features and double-double arithmetic, avoiding x87-specific features, and achieve up to 2x speedup over state-of- the-art implementations which are not correctly rounded. This demonstrates that double extended precision could be viable as a large computational format.

2024:

Computing hard-to-round cases of Gamma for binary64, 6 pages, January 2025.

This note explains how we computed the hard-to-round cases of the Gamma function for the binary64 format. We are looking for hard-to-round cases with at least 42 identical bits after the round bit.

Accuracy of Complex Mathematical Operations and Functions in Single and Double Precision, with Paul Caprioli and Vincenzo Innocente, 8 pages, September 2024.

The IEEE Standard for Floating-Point Arithmetic requires correct rounding for basic arithmetic operations (addition, subtraction, multiplication, division, and square root) on real floating-point numbers, but there is no such requirement for the corresponding operations on complex floating- point numbers. Furthermore, while the accuracy of mathematical functions has been studied in various libraries for different IEEE real floating-point formats, we are unaware of similar studies for complex-valued functions

Quadratic Short Division, with Juraj Sukop, May 2024.

In Modern Computer Arithmetic, the authors describe a quadratic division with remainder, and mention a factor of two speedup can be obtained when only the quotient is needed. We give an explicit quadratic algorithm that computes an approximate quotient.

Note on the Veltkamp/Dekker Algorithms with Directed Roundings, preprint, February 2024.

The Veltkamp/Dekker algorithms are very useful for double-double arithmetic, when no fused multiply-add is available in hardware. Their analysis is well-known for rounding to nearest-even. We study how they behave with directed roundings in radix 2.

2023:

Cipher key used by king Charles IX to his ambassador in Spain, Fourquevaux, reconstructed on November 9, 2023, and completed on April 8, 2025, thanks to a letter found by Camille Desenclos.

Towards a correctly-rounded and fast power function in binary64 arithmetic, with Tom Hubrecht and Claude-Pierre Jeannerod, extended version (with proofs) of an article published in the proceedings of Arith 2023, 23 pages, July 2023 [extended version with full proofs]

We design algorithms for the correct rounding of the power function x^y in the binary64 IEEE 754 format, for all rounding modes, modulo the knowledge of hardest-to-round cases. Our implementation of these algorithms largely outperforms previous correctly-rounded implementations and is not far from the efficiency of current mathematical libraries, which are not correctly-rounded. Still, we expect our algorithms can be further improved for speed. The proofs of correctness are fully detailed, with the goal to enable a formal proof of these algorithms. We hope this work will motivate the next IEEE 754 revision committee to require correct rounding for mathematical functions.

Deciphering Charles Quint (A diplomatic letter from 1547), with Cécile Pierrot, Camille Desenclos and Pierrick Gaudry, Proceedings of Histocrypt 2023, pages 148-159, 2023.

An unknown and almost fully encrypted letter written in 1547 by Emperor Charles V to his ambassador at the French Court, Jean de Saint-Mauris, was identified in a public library, the Bibliothèque Stanislas (Nancy, France). As no decryption of this letter was previously published or even known, a team of cryptographers and historians gathered together to study the letter and its encryption system. First, multiple approaches and methods were tested in order to decipher the letter without any other specimen. Then, the letter has now been inserted within the whole correspondence between Charles and Saint-Mauris, and the key has been consolidated thanks to previous key reconstructions. Finally, the decryption effort enabled us to uncover the content of the letter and investigate more deeply both cryptanalysis challenges and encryption methods.

Note on FastTwoSum with Directed Rounding, preprint, 3 pages, 2023, revised July 2024 with Sélène Corbineau.

In [1], Graillat and Jézéquel prove a bound on the maximal error for the FastTwoSum algorithm with directed roundings. We improve that bound by a factor 2, and show that FastTwoSum is exact when the exponent difference of the inputs does not exceed the current precision.

2022:

Déchiffrement de la lettre de Stanislas au comte d'Heudicourt datée du 23 décembre 1724, avec Clément Dallé, 2022 [in french].

This letter is located at the Bibliothèque Stanislas in Nancy. The cipher used is a variant of Cesar's cipher, but with a shift of 2 instead of 3, and an encoding of letter and digrams by 1- and 2-digit numbers.

The CORE-MATH Project, with Alexei Sibidanov and Stéphane Glondu, Proceedings of the 29th IEEE Symposium on Computer Arithmetic (ARITH 2022), 2022. Best paper award.

The CORE-MATH project aims at providing open-source mathematical functions with correct rounding that can be integrated into current mathematical libraries. This article demonstrates the CORE-MATH methodology on two functions: the binary32 power function (powf) and the binary64 cube root function (cbrt). CORE-MATH already provides a full set of correctly rounded C99 functions for single precision (binary32). These functions provide similar or in some cases up to threefold speedups with respect to the GNU libc, which is not correctly rounded. This work offers a prospect of the mandatory requirement of correct rounding for mathematical functions in the next revision of the IEEE-754 standard.

The State of the Art in Integer Factoring and Breaking Public-Key Cryptography, with Fabrice Boudot, Pierrick Gaudry, Aurore Guillevic, Nadia Heninger, Emmanuel Thomé, IEEE Security and Privacy, volume 20, number 2, pages 80-86, 2022. [HAL]

In this column, we will review the current state of the art of cryptanalysis for three number-theoretic problems using classical (nonquantum) computers, including, in particular, our most recent computational records for integer factoring and prime-field discrete logarithms.

2021:

Three Cousins of Recamán’s Sequence, with Max A. Alekseyev, Joseph Samuel Myers, Richard Schroeppel, S. R. Shannon, and N. J. A. Sloane, The Fibonacci Quarterly, volume 60, number 3, pages 201-219, August 2022. [HAL]

Although 10²³⁰ terms of Recamán's sequence have been computed, it remains a mystery. Here three distant cousins of that sequence are described, one of which is also mysterious. (i) {A(n), n ≥ 3} is defined as follows. Start with n, and add n+1, n+2, n+3, ..., stopping after adding n+k if the sum n + (n+1) + ... + (n+k) is divisible by n+k+1. Then A(n)=k. We determine A(n) and show that A(n) ≤ n² - 2n - 1. (ii) {B(n), n ≥ 1} is a multiplicative analog of {A(n)}. Start with n, and successively multiply by n+1, n+2, ..., stopping after multiplying by n+k if the product n(n+1) ... (n+k) is divisible by n+k+1. Then B(n)=k. We conjecture that log² B(n) = (1/2+o(1)) log n log log n. (iii) The third sequence, {C(n), n ≥ 1}, is the most interesting, because the most mysterious. Concatenate the decimal digits of n, n+1, n+2, ... until the concatenation n || n+1 || ... || n+k is divisible by n+k+1. Then C(n)=k. If no such k exists we set C(n) = -1. We have found C(n) for all n ≥ 1000 except for two cases. Some of the numbers involved are quite large. For example, C(92) = 218128159460, and the concatenation 92 || 93 || ... || (92+C(92)) is a number with about 2 * 10¹² digits. We have only a probabilistic argument that such a k exists for all n.

Nouveaux records de factorisation et de calcul de logarithme discret, with F. Boudot, P. Gaudry, A. Guillevic, N. Heninger and E. Thomé, Techniques de l'ingénieur, 17 pages, 2021 [in french]. [HAL]

Cet article décrit deux nouveaux records établis fin 2019 : un record de factorisation d’entier avec la factorisation du nombre RSA-240, et un record de calcul de logarithme discret de même taille. Ces deux records correspondent à des nombres de 795 bits, soit 240 chiffres décimaux, et ont été établis avec le même logiciel libre (CADO-NFS), sur le même type de processeurs. Ces records servent de référence pour les recommandations en termes de taille de clé pour les protocoles cryptographiques.

2020:

Parallel Structured Gaussian Elimination for the Number Field Sieve, with Charles Bouillaguet, Mathematical Cryptology, volume 0, number 1, pages 22-39, 2020.

This article describes a parallel algorithm for the Structured Gaussian Elimination step of the Number Field Sieve (NFS). NFS is the best known method for factoring large integers and computing discrete logarithms. State-of-the-art algorithms for this kind of partial sparse elimination, as implemented in the CADO-NFS software tool, were unamenable to parallel implementations. We therefore designed a new algorithm from scratch with this objective and implemented it using OpenMP. The result is not only faster sequentially, but scales reasonably well: using 32 cores, the time needed to process two landmark instances went down from 38 minutes to 20 seconds and from 6.7 hours to 2.3 minutes, respectively.

CORE-MATH, research project submitted as Advanced Grant Proposal to the European Research Council, August 2020. This project was judged ``too narrowly focused and [that it] would have limited impact'' by the ERC evaluation panel.

Comparing the difficulty of factorization and discrete logarithm: a 240-digit experiment, with F. Boudot, P. Gaudry, A. Guillevic, N. Heninger and E. Thomé, proceedings of Crypto 2020, LNCS 12171, 30 pages. [HAL]

We report on two new records: the factorization of RSA-240, a 795-bit number, and a discrete logarithm computation over a 795-bit prime field. Previous records were the factorization of RSA-768 in 2009 and a 768-bit discrete logarithm computation in 2016. Our two computations at the 795-bit level were done using the same hardware and software, and show that computing a discrete logarithm is not much harder than a factorization of the same size. Moreover, thanks to algorithmic variants and well-chosen parameters, our computations were significantly less expensive than anticipated based on previous records.
This work was partially supported by a PRACE grant, and was published in the PRACE success stories, and in the Results of the Gauss Center for Supercomputing.

2019:

Recovering Hidden SNFS Polynomials, note, 2 pages, October 2019.

Given an integer N constructed with an SNFS trapdoor, i.e., such that N = |Res(f,g)| with f = a_dx^d + a_d-1x^d-1 + ... + a₀ having small coefficients a_i = O(B), and g = lx - m, we can recover f and g in O(B F(l)) arithmetic operations, assuming B² l² ≪ a_d m, where F(l) is the number of arithmetic operations to find a prime factor l. This partially answers an open problem from [1].

A New Ranking Function for Polynomial Selection in the Number Field Sieve, with Nicolas David, Contemporary Mathematics, volume 754, pages 315-325, special issue "75 Years of Mathematics of Computation", Susanne C. Brenner, Igor Shparlinski, Chi-Wang Shu, Daniel B. Szyld, eds., American Mathematical Society, 2020. [DOI]

This article explains why the classical Murphy-E ranking function might fail to correctly rank polynomial pairs in the Number Field Sieve, and proposes a new ranking function.

Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice, with David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, J. Alex Halderman, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, and Santiago Zanella-Béguelin, Research Highlights of Communications of the ACM, volume 62, number 1, pages 106-114, January 2019. [CACM page] [HAL]

We investigate the security of Diffie-Hellman key exchange as used in popular Internet protocols and find it to be less secure than widely believed. First, we present Logjam, a novel flaw in TLS that lets a man-in-the-middle downgrade connections to "export-grade" Diffie-Hellman. To carry out this attack, we implement the number field sieve discrete logarithm algorithm. After a week-long precomputation for a specified 512-bit group, we can compute arbitrary discrete logarithms in that group in about a minute. We find that 82% of vulnerable servers use a single 512-bit group, and that 8.4% of Alexa Top Million HTTPS sites are vulnerable to the attack.a In response, major browsers have changed to reject short groups. We go on to consider Diffie-Hellman with 768- and 1024- bit groups. We estimate that even in the 1024-bit case, the computations are plausible given nation-state resources. A small number of fixed or standardized groups are used by millions of servers; performing precomputation for a single 1024-bit group would allow passive eavesdropping on 18% of popular HTTPS sites, and a second group would allow decryption of traffic to 66% of IPsec VPNs and 26% of SSH servers. A close reading of published NSA leaks shows that the agency's attacks on VPNs are consistent with having achieved such a break. We conclude that moving to stronger key exchange methods should be a priority for the Internet community.

2018:

On various ways to split a floating-point number, with Claude-Pierre Jeannerod and Jean-Michel Muller, proceedings of ARITH'25, June 2018. [HAL]

We review several ways to split a floating-point number, that is, to decompose it into the exact sum of two floating-point numbers of smaller precision. All the methods considered here involve only a few floating-point operations with rounding to nearest, including possibly the fused multiply-add (FMA). Applications range from the implementation of integer functions such as floor or round to ``scaling'' of calculations (such as sqrt(a²+b²)) to avoid spurious underflow and overflow.
C programs used for the "experimental results" section: simul1.c, simul2.c, simul3.c.

Computational Mathematics with SageMath, with Alexandre Casamayou, Nathann Cohen, Guillaume Connan, Thierry Dumont, Laurent Fousse, François Maltey, Matthias Meulien, Marc Mezzarobba, Clément Pernet, Nicolas M. Thiéry, Erik Bray, John Cremona, Marcelo Forets, Alexandru Ghitza, Hugh Thomas, SIAM textbook, 2018. [HAL]

This is the english translation of the book "Calcul mathématique avec Sage" which was published in 2013. The examples were updated from Sage 5.9 to Sage 8.3.

2017:

FFT extension for algebraic-group factorization algorithms, with Richard P. Brent and Alexander Kruppa, chapter of the book Topics in Computational Number Theory Inspired by Peter L. Montgomery, Cambridge University Press, 2017 [HAL]. Other chapters of the book are available online on this page.

Optimized Binary64 and Binary128 Arithmetic with GNU MPFR, with Vincent Lefèvre, proceedings of the 24th IEEE Symposium on Computer Arithmetic (ARITH 24), London, UK, July 24-26, 2017 [HAL].

We describe algorithms used to optimize the GNU MPFR library when the operands fit into one or two words. On modern processors, this gives a speedup for a correctly rounded addition, subtraction, multiplication, division or square root in the standard binary64 format (resp. binary128) between 1.8 and 3.5 (resp. between 1.6 and 3.2). We also introduce a new faithful rounding mode, which enables even faster computations. Those optimizations will be available in version 4 of MPFR.

2016:

Computing the ρ constant, with Jérémie Detrey and Pierre-Jean Spaenlehauer, preprint, 3 pages, October 2016.

At the 4th Number Theory Down Under conference (Newcastle, Australia) in September 2016, Timothy Trudgian gave a talk entitled ``A Tale of Two Omegas'' during which he challenged the audience to give the best possible approximation of a constant ρ. The winner of the bet was Shi Bai who proposed ρ = 0.75 and won 9 Australian dollars. We give a closed formula for ρ, and a 1000-digit approximation.
The program using GMP and MPFR we used to compute ρ is available here.

Factorisation of RSA-220 with CADO-NFS, with Shi Bai, Pierrick Gaudry, Alexander Kruppa and Emmanuel Thomé, 3 pages, May 2016. [HAL]. RSA-220 is part of the RSA Factoring Challenge.

We report on the factorization of RSA-220 (220 decimal digits), which is the 3rd largest integer factorization with the General Number Field Sieve (GNFS), after the factorization of RSA-768 (232 digits) in December 2009, and that of 3⁶⁹⁷+1 (221 digits) in February 2015 by NFS@home.

Twelve New Primitive Binary Trinomials, with Richard P. Brent, preprint, 2 pages, 2016 [arxiv, HAL]

We exhibit twelve new primitive trinomials over GF(2) of record degrees 42,643,801, 43,112,609, and 74,207,281. In addition we report the first Mersenne exponent not ruled out by Swan's theorem --- namely 57,885,161 --- for which none primitive trinomial exists. This completes the search for the currently known Mersenne prime exponents.

2015:

Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice, with David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, J. Alex Halderman, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, Santiago Zanella-Beguelin, May 2015, to appear in the proceedings of CCS 2015. [HAL] [talk slides]. Best paper award.

We investigate the security of Diffie-Hellman key exchange as used in popular Internet protocols and find it to be less secure than widely believed. First, we present a novel flaw in TLS that allows a man-in-the-middle to downgrade connections to export-grade Diffie-Hellman. To carry out this attack, we implement the number field sieve discrete log algorithm. After a week-long precomputation for a specified 512-bit group, we can compute arbitrary discrete logs in this group in minutes. We find that 82% of vulnerable servers use a single 512-bit group, allowing us to compromise connections to 7% of Alexa Top Million HTTPS sites. In response, major browsers are being changed to reject short groups. More details here. See also the CNRS press releases in english and french.
Note added on June 21st, 2016: our estimation for DH-768 in Table 2 was pessimistic, since Thorsten Kleinjung did such a computation with 4000 core-years of sieving (instead of 8000), 900 core years for linear algebra (instead of 28,500) for a matrix of 24M rows (instead of 150M). The DH-1024 estimation is probably pessimistic too.

Automatic Analysis, 5 pages. This is a preliminary version of a chapter for a book about collected works of Philippe Flajolet. I wrote that chapter in March to July 2012. While the book is not yet published, I make this text available here.

Magic Squares of Squares, with Paul Pierrat and François Thiriet, 2015.

We give modular properties and new classes of potential solutions for magic squares of squares and similar problems.

2014:

Beyond Double Precision, research project submitted as Advanced Grant Proposal to the European Research Council, October 2014. This project was judged ``of high quality but not sufficient to pass to Step 2 of the evaluation''.

Better Polynomials for GNFS, with Shi Bai, Cyril Bouvier, and Alexander Kruppa, Mathematics of Computation, volume 85, pages 861-873, 2016 (preprint from September 2014). [HAL]

The general number field sieve (GNFS) is the most efficient algorithm known for factoring large integers. It consists of several stages, the first one being polynomial selection. The quality of the selected polynomials can be modelled in terms of size and root properties. We propose a new kind of polynomials for GNFS: with a new degree of freedom, we further improve the size property. We demonstrate the efficiency of our algorithm by exhibiting a better polynomial than the one used for the factorization of RSA-768, and a polynomial for RSA-1024 that outperforms the best published one.

Note: the size-optimized RSA-1024 polynomial B₁₀₂₄ can be reproduced using CADO-NFS using the command sopt -n 135...563 -f rsa1024.poly1 -d 6 with that rsa1024.poly1 file.

2013:

Calcul mathématique avec Sage [in french] with Alexandre Casamayou, Guillaume Connan, Thierry Dumont, Laurent Fousse, François Maltey, Matthias Meulien, Marc Mezzarobba, Clément Pernet and Nicolas M. Thiéry, 2010 [HAL].

This book shows how to solve several problems in mathematics, and gives examples using the Sage computer algebra system.
Note added December 21st, 2011: a preliminary revised version (1.0.9) is available here.

Division-Free Binary-to-Decimal Conversion, with Cyril Bouvier, IEEE Transactions on Computers, volume 63, number 8, pages 1895-1901, August 2014. [HAL]

This article presents algorithms that convert multiple precision integer or floating-point numbers from radix 2 to radix 10 (or to any radix b > 2). Those algorithms, based on the ``scaled remainder tree'' technique, use multiplications instead of divisions in their critical part. Both quadratic and subquadratic algorithms are detailed, with proofs of correctness. Experimental results show that our implementation of those algorithms outperforms the GMP library by up to 50% (using the same low-level routines).
Errata: in Algorithm 3, the formula for g at step 20 is wrong. It should be g = max(k_t, ceil(log2((k-3)/(k_t-3)))) (reported by Juraj Sukop, April 8, 2024).

Discrete logarithm in GF(2⁸⁰⁹) with FFS, with Razvan Barbulescu, Cyril Bouvier, Jérémie Detrey, Pierrick Gaudry, Hamza Jeljeli, Emmanuel Thomé and Marion Videau, proceedings of PKC 2014, Lecture Notes in Computer Science Volume 8383, pages 221-238, 2014. [HAL]

We give details on solving the discrete logarithm problem in the 202-bit prime order subgroup of GF(2⁸⁰⁹) using the Function Field Sieve algorithm (FFS).

2012:

Factorization of RSA-704 with CADO-NFS, with Shi Bai and Emmanuel Thomé, preprint, July 2012. [HAL]

We give details of the factorization of RSA-704 with CADO-NFS. This is a record computation with publicly available software tools. The aim of this experiment was to stress CADO-NFS --- which was originally designed for 512-bit factorizations --- for larger inputs, and to identify possible rooms of improvement.

Size Optimization of Sextic Polynomials in the Number Field Sieve, with Shi Bai, preprint, March 2012, revised June 2013. [HAL]

The general number field sieve (GNFS) is the most efficient algorithm known for factoring large integers. It consists of several stages, the first one being polynomial selection. The quality of the chosen polynomials in polynomial selection can be modelled in terms of size and root properties. In this paper, we describe some methods to optimize the size properties of sextic polynomials.
To reproduce the example on page 12, see this Sage file.
Note added January 23, 2015: part of the results of this preprint are given in Better Polynomials for GNFS (see above).

Avoiding adjustments in modular computations, preprint, March 2012.

We consider a sequence of operations (additions, subtractions, multiplications) modulo a fixed integer N, where only the final value is needed, therefore intermediate computations might use any representation. This kind of computation appears for example in number theoretic transforms (NTT), in stage 1 of the elliptic curve method for integer factorization, in modular exponentiation, ... Our aim is to avoid as much as possible adjustment steps, which consist in adding or subtracting N, since those steps are useless in the mathematical sense.
Note added July 17, 2012: the fact of using residues larger than N in Montgomery multiplication is well known. See for example the article "Software Implementation of Modular Exponentiation Using Advanced Vector Instructions Architectures" by Shay Gueron and Vlad Krasnov in the proceedings of WAIFI 2012, where it is called "Non Reduced Montgomery Multiplication" (NRMM).
Note added June 26, 2018: for Montgomery multiplication, earlier papers are by H. Orup ("Simplifying quotient determination in high-radix modular multiplication", Arith 12, 1999) and C. D. Walter ("Montgomery exponentiation needs no final subtraction", Electronics Letters, 1999).

Finding Optimal Formulae for Bilinear Maps, with Razvan Barbulescu, Jérémie Detrey and Nicolas Estibals, proceedings of WAIFI 2012, Bochum, Germany, July 16-19, LNCS 7369, pages 168-186, 2012.

We describe a unified framework to search for optimal formulae evaluating bilinear --- or quadratic --- maps. This framework applies to polynomial multiplication and squaring, finite field arithmetic, matrix multiplication, etc. We then propose a new algorithm to solve problems in this unified framework. With an implementation of this algorithm, we prove the optimality of various published upper bounds, and find improved upper bounds.
The file bilinear.sage is an implementation of Algorithm 1 in Sage with some examples (updated version for Sage 9.5).

2011:

Maximal Determinants and Saturated D-optimal Designs of Orders 19 and 37, with Richard P. Brent, William Orrick, and Judy-anne Osborn, 28 pages.

A saturated D-optimal design is a {+1,-1} square matrix of given order with maximal determinant. We search for saturated D-optimal designs of orders 19 and 37, and find that known matrices due to Smith, Cohn, Orrick and Solomon are optimal. For order 19 we find all inequivalent saturated D-optimal designs with maximal determinant, 2³⁰ * 7² * 17, and confirm that the three known designs comprise a complete set. For order 37 we prove that the maximal determinant is 2³⁹ * 3³⁶, and find a sample of inequivalent saturated D-optimal designs. Our method is an extension of that used by Orrick to resolve the previously smallest unknown order of 15; and by Chadjipantelis, Kounias and Moyssiadis to resolve orders 17 and 21. The method is a two-step computation which first searches for candidate Gram matrices and then attempts to decompose them. Using a similar method, we also find the complete spectrum of determinant values for {+1,-1} matrices of order 13.
Will Orrick compiles known results about The Hadamard maximal determinant problem. Related integer sequences are A003432 and A003433.
Note: Richard Brent wrote a paper showing how to generate many Hadamard equivalence classes of solutions from a given Gram matrix.

Numerical Approximation of the Masser-Gramain Constant to Four Decimal Digits: delta=1.819..., with Guillaume Melquiond and W. Georg Nowak, Mathematics of Computation, volume 82, number 282, pages 1235-1246, 2013. [HAL entry]

We prove that the constant studied by Masser, Gramain, and Weber, satisfies 1.819776 < delta < 1.819833, and disprove a conjecture of Gramain. This constant is a two-dimensional analogue of the Euler-Mascheroni constant; it is obtained by computing the radius r_k of the smallest disk of the plane containing k Gaussian integers. While we have used the original algorithm for smaller values of k, the bounds above come from methods we developed to obtain guaranteed enclosures for larger values of k.

Ballot stuffing in a postal voting system, with Véronique Cortier, Jérémie Detrey, Pierrick Gaudry, Frédéric Sur, Emmanuel Thomé and Mathieu Turuani, proceedings of REVOTE 2011, International Workshop on Requirements Engineering for Electronic Voting Systems, Trento, Italy, August 29, 2011, pages 27-36.

We review a postal voting system used in spring 2011 by the French research institute CNRS and designed by a French company (Tagg Informatique). We explain how the structure of the material can be easily understood out of a few samples of voting material (distributed to the voters), without any prior knowledge of the system. Taking advantage of some flaws in the design of the system, we show how to perform major ballot stuffing, making possible to change the outcome of the election. Our attack has been tested and confirmed by the CNRS. A fixed postal voting system has been quickly proposed by Tagg Informatique in collaboration with the CNRS, preventing this attack for the next elections.

Short Division of Long Integers, with David Harvey, proceedings of the 20th IEEE Symposium on Computer Arithmetic (ARITH 20), Tuebingen, July 25-27, 2011, pages 7-14. [HAL entry, DOI]

We consider the problem of short division --- i.e., approximate quotient --- of multiple-precision integers. We present ready-to-implement algorithms that yield an approximation of the quotient, with tight and rigorous error bounds. We exhibit speedups of up to 30% with respect to GMP division with remainder, and up to 10% with respect to GMP short division, with room for further improvements. This work enables one to implement fast correctly rounded division routines in multiple-precision software tools.

Note added July 27, 2011: the algorithm used by GMP 5 (implemented in the mpn_div_q function) was presented by Torbjörn Granlund in his invited talk at the ICMS 2006 conference.

2010:

Non-Linear Polynomial Selection for the Number Field Sieve [doi], with Thomas Prest, Journal of Symbolic Computation, special issue in the honour of Joachim von zur Gathen, volume 47, number 4, pages 401-409, 2012. [HAL]

We present an algorithm to find two non-linear polynomials for the Number Field Sieve integer factorization method. This algorithm extends Montgomery's "two quadratics" method; for degree 3, it gives two skewed polynomials with resultant O(N^5/4), which improves on Williams O(N^4/3) result.

Note added in June 2011: Namhun Koo, Gooc Hwa Jo and Soonhak Kwon extended our algorithm in this preprint.
Note added on September 30, 2011: Nicholas Coxon extended and analysed our algorithm in this preprint.
Note added on April 5, 2018: as noticed by Georgina Canny, the polynomials in the example for Montgomery's two quadratics method should be reversed. However, the reverse polynomials also work, since they have as common root 1/m modulo N.

Modern Computer Arithmetic, with Richard Brent, Cambridge University Press, 2010, our page of the book, [HAL entry].

This book collects in the same document all state-of-the-art algorithms in multiple precision arithmetic (integers, integers modulo n, floating-point numbers). The best current reference on that topic is volume 2 from Knuth's The art of computer programming, which misses some new important algorithms (divide and conquer division, other variants of FFT multiplication, floating-point algorithms, ...) Our aim is to give detailed algorithms:
- for all operations (not just multiplication as many text books),
- for all size ranges (not just schoolbook methods or FFT-based methods),
- and including all details (for example how to properly deal with carries for integer algorithms, or a rigorous analysis of roundoff errors for floating-point algorithms).
The book would be useful for graduate students in computer science and mathematics (perhaps too specialized for most undergraduates, at least in its present state), researchers in discrete mathematics, computer algebra, number theory, cryptography, and developers of multiple-precision libraries.

Reliable Computing with GNU MPFR, proceedings of the 3rd International Congress on Mathematical Software (ICMS 2010), June 2010, pages 42-45, LNCS 6327, Springer. The original publication is (or will be) available on www.springerlink.com.

This article presents a few applications where reliable computations are obtained using the GNU MPFR library.

Why and how to use arbitrary precision, with Kaveh R. Ghazi, Vincent Lefèvre and Philippe Théveny, March 2010, Computing in Science and Engineering, volume 12, number 3, pages 62-65, 2010 (© IEEE).

Most nowadays floating-point computations are done in double precision, i.e., with a significand (or mantissa) of 53 bits. However, some applications require more precision: double-extended (64 bits or more), quadruple precision (113 bits) or even more. In an article published in The Astronomical Journal in 2001, Toshio Fukushima says: In the days of powerful computers, the errors of numerical integration are the main limitation in the research of complex dynamical systems, such as the long-term stability of our solar system and of some exoplanets [...] and gives an example where using double precision leads to an accumulated round-off error of more than 1 radian for the solar system! Another example where arbitrary precision is useful is static analysis of floating-point programs running in electronic control units of aircrafts or in nuclear reactors.

An O(M(n) log n) algorithm for the Jacobi symbol, with Richard Brent, January 2010, Proceedings of the Ninth Algorithmic Number Theory Symposium (ANTS-IX), Nancy, France, July 19-23, 2010, LNCS 6197, pages 83-95, Springer Verlag [the original publication is or will be available at www.springerlink.com].

The best known algorithm to compute the Jacobi symbol of two n-bit integers runs in time O(M(n) log n), using Schönhage's fast continued fraction algorithm combined with an identity due to Gauss. We give a different O(M(n) log n) algorithm based on the binary recursive gcd algorithm of Stehlé and Zimmermann. Our implementation -- which to our knowledge is the first to run in time O(M(n) log n) -- is faster than GMP's quadratic implementation for inputs larger than about 10000 decimal digits.
Note: the subquadratic code mentioned in the paper is available here.
Note added July 19, 2019: Niels Möller published on arxiv a description of the algorithm he implemented in GMP in 2010.

Factorization of a 768-bit RSA modulus, with Thorsten Kleinjung, Kazumaro Aoki, Jens Franke, Arjen K. Lenstra, Emmanuel Thomé, Joppe W. Bos, Pierrick Gaudry, Alexander Kruppa, Peter L. Montgomery, Dag Arne Osvik, Herman te Riele and Andrey Timofeev, Proceedings of Crypto'2010, Santa Barbara, USA, LNCS 6223, pages 333-350, 2010 [technical announcement].

This paper reports on the factorization of the 768-bit number RSA-768 by the number field sieve factoring method and discusses some implications for RSA [more details here]
Note added 21 September 2012: in Section 2.3 (Sieving) we report 47 762 243 404 unique relations (including free relations). It appears the correct number should be about 10⁹ less, i.e., 46 762 246 508 including free relations, and 46 705 023 046 without free relations.
Note added 09 October 2012: in Section 2.3 (Sieving) the number of remaining prime ideals we report during the filtering (initially 35.3G, then 14.5G and 10G) are most probably underestimated by about 5G.

The Great Trinomial Hunt, with Richard Brent, Notices of the American Mathematical Society, volume 58, number 2, pages 233-239, February 2011.

2009:

The glibc bug #10709, September 2009. [bugzilla entry]

On computers without double-extended precision, the GNU libc 2.10.1 incorrectly rounds the sine of (the double-precision closest to) 0.2522464. This is a bug in IBM's Accurate Mathematical Library, which claims correct rounding, as recommended by IEEE 754-2008. We analyze this bug and propose a fix.

Calcul formel : mode d'emploi. Exemples en Maple, with Philippe Dumas, Claude Gomez, Bruno Salvy, March 2009 (in french) [HAL entry].

This book is a free version of the book of the same name published by Masson in 1995. The examples use an obsolete version of Maple (V.3), but most of the text still applies to Maple and other modern computer algebra systems.

Computing predecessor and successor in rounding to nearest, with Siegfried Rump, Sylvie Boldo and Guillaume Melquiond, BIT Numerical Mathematics, volume 49, number 2, pages 419-431, 2009.

We give simple and efficient methods to compute and/or estimate the predecessor and successor of a floating-point number using only floating-point operations in rounding to nearest. This may be used to simulate interval operations, in which case the quality in terms of the diameter of the result is significantly improved compared to existing approaches.
Note added January 31, 2018: Jean-Michel Muller found a error in Remark 1 following Theorem 2.2, where we say that in the range [1/2,2]*eta/u, Algorithm 1 returns csup = succ(succ(c)). This is wrong. For example, in the IEEE 754 binary32 format, this range is [2^-126,2^-124]. For c = 2^-125 - 2^-149, Algorithm 1 returns csup = succ(c).

2008:

Worst Cases for the Exponential Function in the IEEE 754r decimal64 Format, with Vincent Lefèvre and Damien Stehlé, LNCS volume 5045, pages 114-126, special LNCS issue following the Dagstuhl seminar 06021: Reliable Implementation of Real Number Algorithms: Theory and Practice, August 2008,

We searched for the worst cases for correct rounding of the exponential function in the IEEE 754r decimal64 format, and computed all the bad cases whose distance from a breakpoint (for all rounding modes) is less than 10^-15 ulp, and we give the worst ones. In particular, the worst case for |x| ≥ 3 * 10^-11 is exp(9.407822313572878 * 10^-2) = 1.09864568206633850000000000000000278... This work can be extended to other elementary functions in the decimal64 format and allows the design of reasonably fast routines that will evaluate these functions with correct rounding, at least in some domains.
[Complete lists of worst cases for the exponential are available for the IEEE 754r decimal32 and decimal64 formats.]

Ten New Primitive Binary Trinomials, with Richard Brent, Mathematics of Computation 78 (2009), pages 1197-1199 [Brent's web page].

We exhibit ten new primitive trinomials over GF(2) of record degrees 24036583, 25964951, 30402457, and 32582657. This completes the search for the currently known Mersenne prime exponents.

Implementation of the reciprocal square root in MPFR, March 2008 (extended abstract), Dagstuhl Seminar Proceedings following Dagstuhl seminar 08021 (Numerical validation in current hardware architectures), January 06-11, 2008.

We describe the implementation of the reciprocal square root --- also called inverse square root --- as a native function in the MPFR library. The difficulty is to implement Newton's iteration for the reciprocal square root on top's of GNU MP's mpn layer, while guaranteeing a rigorous 1/2 ulp bound on the roundoff error.

Landau's function for one million billions, with Marc Deléglise and Jean-Louis Nicolas, Journal de Théorie des Nombres de Bordeaux, volume 20, number 3, pages 625-671, 2008. A Maple program implementing the algorithm described in this paper is available from Jean-Louis Nicolas web page.

Let S_n denote the symmetric group with n letters, and g(n) the maximal order of an element of S_n. If the standard factorization of M into primes is M=q₁^a₁ q₂^a₂ ... q_k^a_k, we define l(M) to be q₁^a₁ + q₂^a₂ + ... + q_k^a_k; one century ago, E. Landau proved that g(n)=max_{l(M) ≤ n} M and that, when n goes to infinity, log g(n) ~ sqrt(n log(n)). There exists a basic algorithm to compute g(n) for 1 ≤ n ≤ N; its running time is O(N^3/2 / sqrt(log N)) and the needed memory is O(N); it allows computing g(n) up to, say, one million. We describe an algorithm to calculate g(n) for n up to 10¹⁵. The main idea is to use the so-called l-superchampion numbers. Similar numbers, the superior highly composite numbers, were introduced by S. Ramanujan to study large values of the divisor function tau(n)=sum_{d divides n} 1.

Faster Multiplication in GF(2)[x], with Richard P. Brent, Pierrick Gaudry and Emmanuel Thomé, Proceedings of the Eighth Algorithmic Number Theory Symposium (ANTS-VIII), May 17-22, 2008, Banff Centre, Banff, Alberta (Canada), A. J. van der Poorten and A. Stein, editors, pages 153--166, LNCS 5011, 2008. A preliminary version appeared as INRIA Research Report, November 2007.

In this paper, we discuss an implementation of various algorithms for multiplying polynomials in GF(2)[x]: variants of the window methods, Karatsuba's, Toom-Cook's, Schönhage's and Cantor's algorithms. For most of them, we propose improvements that lead to practical speedups.
The code that we developed for this paper is contained in the gf2x package, available under the GNU General Public License from https://gitlab.inria.fr/gf2x/gf2x.
Note (added 20 May 2008): part of this paper will be soon obsolete, namely the base case section, with the PCLMULQDQ instruction from Intel's new AVX instruction-set (compiler intrinsics, simulator, compiler support).
Note (added 26 Jan 2009): Gao and Mateer have found a theoretical speedup in Cantor multiplication, see http://cr.yp.to/f2mult.html.
Note (added 29 Nov 2011): Su and Fan analyze the use of PCLMULQDQ in http://eprint.iacr.org/2011/589.

2007:

Arithmétique entière, cours aux JNCF 2007, [in french].

A Multi-level Blocking Distinct Degree Factorization Algorithm, INRIA Research Report 6331, with Richard P. Brent, 16 pages, October 2007. This paper describes in detail the algorithm presented at the 8th International Conference on Finite Fields and Applications (Fq8), July 9-13, 2007, Melbourne, Australia [extended abstract], [Richard's slides]. A revised version appeared in a special issue of Contemporary Mathematics, volume 461, pages 47-58, 2008.

We give a new algorithm for performing the distinct-degree factorization of a polynomial P(x) over GF(2), using a multi-level blocking strategy. The coarsest level of blocking replaces GCD computations by multiplications, as suggested by Pollard (1975), von zur Gathen and Shoup (1992), and others. The novelty of our approach is that a finer level of blocking replaces multiplications by squarings, which speeds up the computation in GF(2)[x]/P(x) of certain interval polynomials when P(x) is sparse. As an application we give a fast algorithm to search for all irreducible trinomials x^r + x^s + 1 of degree r over GF(2), while producing a certificate that can be checked in less time than the full search. Naive algorithms cost O(r²) per trinomial, thus O(r³) to search over all trinomials of given degree r. Under a plausible assumption about the distribution of factors of trinomials, the new algorithm has complexity O(r² (log r)^3/2 (log log r)^1/2) for the search over all trinomials of degree r. Our implementation achieves a speedup of greater than a factor of 560 over the naive algorithm in the case r = 24036583 (a Mersenne exponent). Using our program, we have found two new primitive trinomials of degree 24036583 over GF(2) (the previous record degree was 6972593).

A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm, with Pierrick Gaudry and Alexander Kruppa, Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC 2007), Waterloo, Ontario, Canada, pages 167-174, editor C.W.Brown, 2007.

Schönhage-Strassen's algorithm is one of the best known algorithms for multiplying large integers. Implementing it efficiently is of utmost importance, since many other algorithms rely on it as a subroutine. We present here an improved implementation, based on the one distributed within the GMP library. The following ideas and techniques were used or tried: faster arithmetic modulo 2ⁿ+1, improved cache locality, Mersenne transforms, Chinese Remainder Reconstruction, the sqrt(2) trick, Harley's and Granlund's tricks, improved tuning. We also discuss some ideas we plan to try in the future.
Note: this paper was motivated by Allan Steel, and the corresponding code is available from http://www.loria.fr/~zimmerma/software/.
Andrew Sutherland used our FFT code to set a new record for elliptic curve point counting.
Note added on April 29, 2011: Tsz-Wo Sze multiplied integers of 2⁴⁰ bits in about 2000 seconds using a cluster of 1350 cores : preprint.

Time- and Space-Efficient Evaluation of Some Hypergeometric Constants, with Howard Cheng, Guillaume Hanrot, Emmanuel Thomé and Eugene Zima, Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC 2007), Waterloo, Ontario, Canada, pages 85-91, editor C.W.Brown, 2007.

The currently best known algorithms for the numerical evaluation of hypergeometric constants such as zeta(3) to d decimal digits have time complexity O(M(d) log² d) and space complexity of O(d log d) or O(d). Following work from Cheng, Gergel, Kim and Zima, we present a new algorithm with the same asymptotic complexity, but more efficient in practice. Our implementation of this algorithm improves over existing programs for the computation of Pi, and we announce a new record of 2 billion digits for zeta(3).

Worst Cases of a Periodic Function for Large Arguments, with Guillaume Hanrot, Vincent Lefèvre and Damien Stehlé, Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH'18), pages 133-140, Montpellier, France, 2007. A preliminary version appeared as INRIA Research Report 6106, January 2007.

One considers the problem of finding hard to round cases of a periodic function for large floating-point inputs, more precisely when the function cannot be efficiently approximated by a polynomial. This is one of the last few issues that prevents from guaranteeing an efficient computation of correctly rounded transcendentals for the whole IEEE-754 double precision format. The first non-naive algorithm for that problem is presented, with an heuristic complexity of O(2^{0.676 p}) for a precision of p bits. The efficiency of the algorithm is shown on the largest IEEE-754 double precision binade for the sine function, and some corresponding bad cases are given. We can hope that all the worst cases of the trigonometric functions in their whole domain will be found within a few years, a task that was considered out of reach until now.

2006:

Asymptotically Fast Division for GMP, October 2005, revised August 2006, October 2006 and February 2015.

Until version 4.2.1, GNU MP (GMP for short) division has complexity O(M(n) log(n)), which is not asymptotically optimal. We propose here some division algorithms that achieve O(M(n)) with small constants, with corresponding GMP code.
Code is available too: invert.c computes an approximate inverse within 1 ulp in 3M(n) [revised in February and May 2015]. A patch for the 2-limb case was proposed by Marco Bodrato: invert.diff.

Errors Bounds on Complex Floating-Point Multiplication, with Richard Brent and Colin Percival, Mathematics of Computation volume 76 (2007), pages 1469-1481. Some technical details are given in INRIA Research Report 6068, December 2006. [DOI]

Given floating-point arithmetic with t-digit base-β significands in which all arithmetic operations are performed as if calculated to infinite precision and rounded to a nearest representable value, we prove that the product of complex values z₀ and z₁ can be computed with maximum absolute error |z₀| |z₁| (1/2) β^1-t sqrt(5). In particular, this provides relative error bounds of 2^-24 sqrt(5) and 2^-53 sqrt(5) for IEEE 754 single and double precision arithmetic respectively, provided that overflow, underflow, and denormals do not occur. We also provide the numerical worst cases for IEEE 754 single and double precision arithmetic.

20 years of ECM (© Springer-Verlag), with Bruce Dodson, Proceedings of ANTS VII, July 2006. A preliminary version appeared as INRIA Research Report 5834, February 2006.

The Elliptic Curve Method for integer factorization (ECM) was invented by H. W. Lenstra, Jr., in 1985 [Lenstra87]. In the past 20 years, many improvements of ECM were proposed on the mathematical, algorithmic, and implementation sides. This paper summarizes the current state-of-the-art, as implemented in the GMP-ECM software.
Erratum: on page 541 we write ``Computer experiments indicate that these curves have, on average, 3.49 powers of 2 and 0.78 powers of 3, while Suyama's family has 3.46 powers of 2 and 1.45 powers of 3''. As noticed by Romain Cosset, those experiments done by the first author are wrong, since he ran 1000 random curves with the same prime input p=10¹⁰+19, and the results differ according on the congruence of p mod powers of 2 and 3. With 10000 random curves on the 10000 primes just above 10²⁰, we get an average of 2^3.34*3^1.68 = 63.9 for Suyama's family, and 2^3.36*3^0.67 = 21.5 for curves of the form (16d + 18) y² = x³ + (4d + 2)x² + x.
Erratum: on page 536, section 3.5, one should read j = pi/d2 mod d1 instead of j = -pi/d2 mod d1 (reported by Alberto Zanoni, 3 Oct 2023).

2005:

MPFR: A Multiple-Precision Binary Floating-Point Library With Correct Rounding, with Laurent Fousse, Guillaume Hanrot, Vincent Lefèvre, Patrick Pélissier, INRIA Research Report RR-5753, November 2005. A revised version appeared in ACM TOMS (Transactions on Mathematical Software), volume 33, number 2, article 13, 2007.

This paper presents a multiple-precision binary floating-point library, written in the ISO C language, and based on the GNU MP library. Its particularity is to extend ideas from the IEEE-754 standard to arbitrary precision, by providing correct rounding and exceptions. We demonstrate how these strong semantics are achieved --- with no significant slowdown with respect to other tools --- and discuss a few applications where such a library can be useful.

Techniques algorithmiques et méthodes de programmation (in french), 11 pages, July 2005, appeared in Encyclopédie de l'informatique et des systèmes d'information, pages 929-935, Vuibert, 2006.

5,341,321, June 2005.

This short note shows the nasty effects of patents for the development of free software, even for patents that were not written with software applications in mind.

The Elliptic Curve Method, November 2002, revised April 2003 and September 2010, appeared in the Encyclopedia of Cryptography and Security, Springer, 2005 (old link).

Describes in two pages the history of ECM, how it works at high level, improvements to the method, and some applications.

MPFR : vers un calcul flottant correct ? (in french), Interstices, 2005.

Obtenir un seul résultat pour un calcul donné : à première vue, cela semble une évidence ; c'est en fait un vaste sujet de recherche auquel les chercheurs apportent petit à petit leurs contributions. Une nouvelle étape est franchie aujourd'hui grâce à MPFR, une bibliothèque de calcul multi-précision sur les nombres flottants.

A primitive trinomial of degree 6972593, with Richard Brent and Samuli Larvala, Mathematics of Computation, volume 74, number 250, pages 1001-1002, 2005.

The only primitive trinomials of degree 6972593 over GF(2) are x^6972593 + x^3037958 + 1 and its reciprocal.

An elementary digital plane recognition algorithm, with Yan Gerard and Isabelle Debled-Rennesson, appeared in Discrete Applied Mathematics, volume 151, issue 1-3, pages 169-183, 2005.

A naive digital plane is a subset of points (x,y,z) in Z³ verifying h ≤ ax+by+cz < h+max{ | a|,|b|, |c| } where (a,b,c,h) in Z⁴. Given a finite unstructured subset of Z³, determine whether there exists a naive digital plane containing it is called digital plane recognition. This question is rather classical in the field of digital geometry (also called discrete geometry). We suggest in this paper a new algorithm to solve it. Its asymptotic complexity is bounded by O(n⁷) but its behavior seems to be linear in practice. It uses an original strategy of optimization in a set of triangular facets (triangles). The code is short and elementary (less than 300 lines) and available on http://www.loria.fr/~debled/plane and here.

Searching Worst Cases of a One-Variable Function Using Lattice Reduction, with Damien Stehlé and Vincent Lefèvre, IEEE Transactions on Computers, volume 54, number 3, pages 340-346, 2005. A preliminary version appeared as INRIA Research Report 4586. Some results for the 2^x function double-extended precision are available here.

We propose a new algorithm to find worst cases for the correct rounding of a mathematical function of one variable. We first reduce this problem to the real small value problem--i.e., for polynomials with real coefficients. Then, we show that this second problem can be solved efficiently by extending Coppersmith's work on the integer small value problem--for polynomials with integer coefficients--using lattice reduction. For floating-point numbers with a mantissa less than N and a polynomial approximation of degree d, our algorithm finds all worst cases at distance less than N^{-d^2/(2d+1)} from a machine number in time O(N^{(d+1)/(2d+1)+epsilon}). For d=2, a detailed study improves on the O(N^2/3+epsilon) complexity from Lefèvre's algorithm to O(N^4/7+epsilon). For larger d, our algorithm can be used to check that there exist no worst cases at distance less than N^-k in time O(N^1/2+epsilon).
Note added on February 10, 2020: in the IEEE TC version, page 344, Section 5,1, line 6 of the matrix: in column 1, the coefficient should read Ca instead of C; the coefficient Cb should be in the second column, and the coefficient Cc should be in the third column.

2004:

Gal's Accurate Tables Method Revisited, with Damien Stehlé, INRIA Research Report RR-5359, October 2004. An improved version appeared in the Proceedings of Arith'17. Those ideas are demonstrated by an implementation of the exp2 function in double precision. Erratum in the final version of the paper: in Section 4, the simultaneous worst case for sin and cos is t0=1f09c0c6cde5e3 and not t0=31a93fddd45e3. See also my coauthor page.

Gal's accurate tables algorithm aims at providing an efficient implementation of elementary functions with correct rounding as often as possible. This method requires an expensive pre-computation of a table made of the values taken by the function or by several related functions at some distinguished points. Our improvements of Gal's method are two-fold: on the one hand we describe what is the arguably best set of distinguished values and how it improves the efficiency and correctness of the implementation of the function, and on the other hand we give an algorithm which drastically decreases the cost of the pre-computation. These improvements are related to the worst cases for the correct rounding of mathematical functions and to the algorithms for finding them. We show that the whole method can be turned into practice by giving complete tables for 2^x and sin(x) for x in [1/2,1[, in double precision.

Newton iteration revisited, with Guillaume Hanrot, March 2004.

On March 10, 2004, Dan Bernstein announced a revised draft of his paper Removing redundancy in high-precision Newton iteration, with algorithms that compute a reciprocal of order n over C[[x]] 1.5+o(1) times longer than a product; a quotient or logarithm 2.16666...+o(1) times longer; a square root 1.83333...+o(1) times longer; an exponential 2.83333...+o(1) times longer. We give better algorithms.
Note added on March 24, 2004: the 1.5+o(1) reciprocal algorithm was already published by Schönhage (Information Processing Letters 74, 2000, p. 41-46)
Note added on July 24, 2006: in a preprint Newton's method and FFT trading, Joris van der Hoeven announces better constants for the exponential (2.333...) and the quotient (1.666...). [October 27, 2009: those constants need yet to be confirmed.]
Note added on April 20, 2009: as noticed by David Harvey, in Section 3, in the Divide algorithm, Step 4 should read q <- q₀ + g₀ (h₁ - ε) xⁿ, where h = h₀ + xⁿ h₁. Indeed, after Step 3 we have q₀ f = h₀ + ε xⁿ + O(x²ⁿ), i.e., q₀ = h₀/f + ε/f xⁿ + O(x²ⁿ). Thus h/f = q₀ + (h₁ - ε)/f xⁿ + O(x²ⁿ) = q₀ + g₀ (h₁ - ε) xⁿ + O(x²ⁿ).
Note added on September 11, 2009: as noticed by David Harvey, the 1.91666...M(n) cost for the square root (Section 4) is incorrect; it should be 1.8333...M(n) instead. Indeed, Step 3 of Algorithm SquareRoot costs only M(n)/3 to compute f₀² mod xⁿ-1 instead of M(n)/2, since we only need two FFT transforms of size n. David Harvey has improved the reciprocal to 1.444...M(n) and the square root to 1.333...M(n) [preprint].

Arithmétique flottante, with Vincent Lefèvre, INRIA Research Report RR-5105, February 2004 (in french).

Ce document rassemble des notes d'un cours donné en 2003 dans la filière Algorithmique Numérique et Symbolique du DEA d'Informatique de l'Université Henri Poincaré Nancy 1. Ces notes sont basées en grande partie sur le livre Elementary Functions. Algorithms and Implementation de Jean-Michel Muller. Aussi disponible sur LibreCours.

A Formal Proof of Demmel and Hida's Accurate Summation Algorithm, with Laurent Fousse, January 2004.

A new proof of the ``accurate summation'' algorithm proposed by Demmel and Hida is presented. The main part of that proof has been written in the Coq language and verified by the Coq proof assistant.

The Middle Product Algorithm, I. Speeding up the division and square root of power series, with Guillaume Hanrot and Michel Quercia, AAECC, volume 14, number 6, pages 415-438, 2004. A preliminary version appeared as INRIA Resarch Report 3973.

We present new algorithms for the inverse, division, and square root of power series. The key trick is a new algorithm --- MiddleProduct or, for short, MP --- computing the n middle coefficients of a (2n-1) * n full product in the same number of multiplications as a full n * n product. This improves previous work of Brent, Mulders, Karp and Markstein, Burnikel and Ziegler. These results apply both to series and polynomials.
Note added June 10, 2009: Part II of this work was planned to deal with integer entries, but David Harvey was faster than us, see The Karatsuba middle product for integers.

Proposal for a Standardization of Mathematical Function Implementation in Floating-Point Arithmetic, with David Defour, Guillaume Hanrot, Vincent Lefèvre, Jean-Michel Muller and Nathalie Revol, January 2003, Numerical Algorithms, volume 37, number 1-4, pages 367-375, 2004. Extended version appeared as INRIA Research Report RR-5406.

Some aspects of what a standard for the implementation of the elementary functions could be are presented. Firstly, the need for such a standard is motivated. Then the proposed standard is given. The question of roundings constitutes an important part of this paper: three levels are proposed, ranging from a level relatively easy to attain (with fixed maximal relative error) up to the best quality one, with correct rounding on the whole range of every function. We do not claim that we always suggest the right choices, or that we have thought about all relevant issues. The mere goal of this paper is to raise questions and to launch the discussion towards a standard.

A long note on Mulders' short product, with Guillaume Hanrot, INRIA Research Report RR-4654, November 2002. A revised version appeared in the Journal of Symbolic Computation, volume 37, pages 391-401, 2004. A corrigendum appeared in 2014 (pdf).

The short product of two power series is the meaningful part of the product of these objects, i.e., sum(a[i] b[j] x^i+j, i+j < n). In [Mulders00], Mulders gives an algorithm to compute a short product faster than the full product in the case of Karatsuba's multiplication [KaOf62]. This algorithm works by selecting a cutoff point k and performing a full k x k product and two (n-k) x (n-k) short products recursively. Mulders also gives a heuristically optimal cutoff point beta n. In this paper, we determine the optimal cutoff point in Mulders' algorithm. We also give a slightly more general description of Mulders' method.
Note added November 24, 2011: Murat Cenk and Ferruh Ozbudak published a paper "Multiplication of polynomials modulo xⁿ" in Theoretical Computer Science, vol. 412, pages 3451-3462, 2011. The main result of this paper (Theorem 3.1) is a direct consequence of Mulders' short product, which is not cited in this paper. Blame to the authors, to the anonymous reviewers and to the editor in charge, Victor Y. Pan. Moreover the results in Table 1 of this paper are not optimal: M̂(14) <= M(10) + 2*M̂(4) <= 39+2*8 = 55 (instead of 56); M̂(16) <= M(12) + 2*M̂(4) <= 51+2*8 = 67 (instead of 70); M̂(17) <= M(12) + 2*M̂(5) <= 51+2*11 = 73 (instead of 76); M̂(18) <= M(12) + 2*M̂(6) <= 51+2*15 = 81 (instead of 85).
Note added February 10, 2014: Karim Belabas pointed out an error in Algorithm ShortProduct (and in Theorem 2). Indeed, the result is not necessarily reduced modulo xⁿ, since for example with f=2*x+1 and g=3*x+4 with n=2, we have l=1*4=4, h=2*3=6, m=(1+2)*(4+3)-4-6=11, thus the result is 4 + 11*x + 6*x². To fix this, it suffices to zero out the coefficient of xⁿ in the final result.

2003:

Random number generators with period divisible by a Mersenne prime, with Richard Brent, Proceedings of Computational Science and its Applications (ICCSA), LNCS 2667, pages 1-10, 2003. [HAL]

Pseudo-random numbers with long periods and good statistical properties are often required for applications in computational finance. We consider the requirements for good uniform random number generators, and describe a class of generators whose period is a Mersenne prime or a small multiple of a Mersenne prime. These generators are based on "almost primitive" trinomials, that is trinomials having a large primitive factor. They have very fast vector/parallel implementations and excellent statistical properties.

A Binary Recursive Gcd Algorithm, with Damien Stehlé, INRIA Research Report RR-5050, December 2003. A revised version (© Springer-Verlag) is published in the Proceedings of the Algorithmic Number Theory Symposium (ANTS VI). [Damien's page with erratum] [implementation in GMP]

The binary algorithm is a variant of the Euclidean algorithm that performs well in practice. We present a quasi-linear time recursive algorithm that computes the greatest common divisor of two integers by simulating a slightly modified version of the binary algorithm. The structure of the recursive algorithm is very close to the one of the well-known Knuth-Schönhage fast gcd algorithm, but the description and the proof of correctness are significantly simpler in our case. This leads to a simplification of the implementation and to better running times.
Note (added 14 Oct 2009): since that work, Niels Möller has designed a classical left-to-right/MSB fast gcd algorithm, cf his paper On Schönhage's algorithm and subquadratic integer gcd computation (Mathematics of Computation, volume 77, number 261, pages 589--607, 2008) and his slides.
Note (added 1st April 2010): Robert Harley had published in his ECDL code a similar extended binary gcd algorithm (however only in the quadratic case).
Note (added 26 February 2019): as reported by Bo-Yin Yang and Dan Bernstein, the sequence G_n = 0,1,-1,5,-9,29,-65,181,-441,1165,..., which is the worst-case for Theorem 2, is not necessarily the worst-case in practice (as claimed in Section 6.2). See their paper.

Algorithms for finding almost irreducible and almost primitive trinomials, with Richard Brent, April 2003, Proceedings of a Conference in Honour of Professor H. C. Williams, Banff, Canada (May 2003), The Fields Institute, Toronto. [HAL] [arXiv]

Consider polynomials over GF(2). We describe efficient algorithms for finding trinomials with large irreducible (and possibly primitive) factors, and give examples of trinomials having a primitive factor of degree r for all Mersenne exponents r = + 3 mod 8 in the range 5 < r < 2976221, although there is no irreducible trinomial of degree r. We also give trinomials with a primitive factor of degree r = 2^k for 3 <= k <= 12. These trinomials enable efficient representations of the finite field GF(2^r). We show how trinomials with large primitive factors can be used efficiently in applications where primitive trinomials would normally be used.
Note added April 22, 2009: this paper is mentioned in Divisibility of Trinomials by Irreducible Polynomials over F₂, by Ryul Kim and Wolfram Koepf, International Journal of Algebra, Vol. 3, 2009, no. 4, 189-197 (arxiv).

Accurate Summation: Towards a Simpler and Formal Proof, with Laurent Fousse, March 2003, in Proc. of RNC'5, pages 97-108.

This paper provides a simpler proof of the ``accurate summation'' algorithm proposed by Demmel and Hida in DeHi02. It also gives improved bounds in some cases, and examples showing that those new bounds are optimal. This simpler proof will be used to obtain a computer-generated proof of Demmel-Hida's algorithm, using a proof assistant like HOL, PVS or Coq.

10^2098959 [in french], décembre 2002, paru dans la Gazette du Cines, numéro 14, janvier 2003.

Cet article décrit les premiers résultats de la recherche (avec Richard Brent et Samuli Larvala) de trinômes primitifs de degré 6972593 sur GF(2), et indique quelques conséquences amusantes de la loi de Moore.

A Fast Algorithm for Testing Reducibility of Trinomials mod 2 and Some New Primitive Trinomials of Degree 3021377, with Richard Brent and Samuli Larvala, Mathematics of Computation, volume 72, number 243, pages 1443-1452, 2003. A preliminary version appeared as Report PRG TR-13-00, Oxford University Computing Laboratory, December 2000.

The standard algorithm for testing irreducibility of a trinomial of prime degree r over GF(2) requires 2r + O(1) bits of memory and of order r² bit-operations. We describe an algorithm which requires only 3r/2 + O(1) bits of memory and less bit-operations than the standard algorithm. Using the algorithm, we have found several new irreducible trinomials of high degree. If r is a Mersenne exponent (i.e. 2^r - 1 is a Mersenne prime), then an irreducible trinomial of degree r is necessarily primitive and can be used to give a pseudo-random number generator with period at least 2^r - 1. We give examples of primitive trinomials for r = 756839, 859433, and 3021377. The results for r = 859433 extend and correct some computations of Kumada et al [Mathematics of Computation 69 (2000), 811-814]. The two results for r = 3021377 are primitive trinomials of the highest known degree.

2002:

Ten Consecutive Primes In Arithmetic Progression, with Harvey Dubner, Tony Forbes, Nik Lygeros, Michel Mizony and Harry Nelson, Mathematics of Computation, volume 71, number 239, pages 1323-1328, 2002. [HAL]

In 1967 the first set of 6 consecutive primes in arithmetic progression was found. In 1995 the first set of 7 consecutive primes in arithmetic progression was found. Between November, 1997 and March, 1998, we succeeded in finding sets of 8, 9 and 10 consecutive primes in arithmetic progression. This was made possible because of the increase in computer capability and availability, and the ability to obtain computational help via the Internet. Although it is conjectured that there exist arbitrarily long sequences of consecutive primes in arithmetic progression, it is very likely that 10 primes will remain the record for a long time.

Worst Cases and Lattice Reduction, with Damien Stehlé and Vincent Lefèvre, INRIA Research Report RR-4586, October 2002. Appeared in the proceedings of the 16th IEEE Symposium on Computer Arithmetic (Arith'16), IEEE Computer Society, pages 142-147, 2003.

We propose a new algorithm to find worst cases for correct rounding of an analytic function. We first reduce this problem to the real small value problem --- i.e. for polynomials with real coefficients. Then we show that this second problem can be solved efficiently, by extending Coppersmith's work on the integer small value problem --- for polynomials with integer coefficients --- using lattice reduction [Coppersmith96a,Coppersmith96b,Coppersmith01]. For floating-point numbers with a mantissa less than $N$, and a polynomial approximation of degree $d$, our algorithm finds all worst cases at distance $< N^{\frac{-d^2}{2d+1}}$ from a machine number in time $O(N^{\frac{d+1}{2d+1}+\varepsilon})$. For $d=2$, this improves on the $O(N^{2/3+\varepsilon})$ complexity from Lef\`evre's algorithm [Lefevre00,LeMu01] to $O(N^{3/5+\varepsilon})$. We exhibit some new worst cases found using our algorithm, for double-extended and quadruple precision. For larger $d$, our algorithm can be used to check that there exist no worst cases at distance $< N^{-k}$ in time $O(N^{\frac{1}{2}+O(\frac{1}{k})})$.

A Proof of GMP Square Root, with Yves Bertot and Nicolas Magaud, Journal of Automated Reasoning, volume 29, 2002, pages 225--252, Special Issue on Automating and Mechanising Mathematics: In honour of N.G. de Bruijn. A preliminary version appeared as INRIA Research Report 4475.

We present a formal proof (at the implementation level) of an efficient algorithm proposed by Paul Zimmermann [Zimmermann00] to compute square roots of arbitrary large integers. This program, which is part of the GNU Multiple Precision Arithmetic Library (GMP) is completely proven within the Coq system. Proofs are developed using the Correctness tool to deal with imperative features of the program. The formalization is rather large (more than 13000 lines) and requires some advanced techniques for proof management and reuse.
Note: Vincent Lefèvre found a potential problem in the GMP implementation, which is fixed by the following patch. This does not contradicts our proof: the problem is due to the different C data types (signed or not, different width), whereas our proof assumed a unique type.

Aliquot Sequence 3630 Ends After Reaching 100 Digits, with M. Benito, W. Creyaufmüller and J. L. Varona, Experimental Mathematics, volume 11, number 2, pages 201-206.

In this paper we present a new computational record: the aliquot sequence starting at 3630 converges to 1 after reaching a hundred decimal digits. Also, we show the current status of all the aliquot sequences starting with a number smaller than 10,000; we have reached at leat 95 digits for all of them. In particular, we have reached at least 112 digits for the so-called "Lehmer five sequences," and 101 digits for the "Godwin twelve sequences." Finally, we give a summary showing the number of aliquot sequences of unknown end starting with a number less than or equal 10⁶.

Note added 29 July 2008: in this paper, we say (page 203, middle of right column) "It is curious to note that the driver 2⁹ * 3 * 11 * 31 has appeared in no place in any of the sequences given in Table 1"; Clifford Stern notes that this driver appears at index 215 of sequence 165744, which gives 2⁹ * 3 * 7 * 11 * 31² * 37 * 10594304241173.

2001:

De l'algorithmique à l'arithmétique via le calcul formel, Habilitation à diriger des recherches, novembre 2001. (Transparents de la soutenance.)

This document presents my research contributions from 1988 to 2001, performed first at INRIA Rocquencourt within the Algo project (1988 to 1992), then at INRIA Lorraine and LORIA within the projects Euréca (1993-1997), PolKA (1998-2000), and Spaces (2001). Three main periods can be roughly distinguished: from 1988 to 1992 where my research focused on analysis of algorithms and random generation, from 1993 to 1997 where I worked on computer algebra and related algorithms, finally from 1998 to 2001 where I was interested in arbitrary precision floating-point arithmetic with well-defined semantics.

Arithmétique en précision arbitraire, rapport de recherche INRIA 4272, septembre 2001, paru dans la revue "Réseaux et Systèmes Répartis, Calculateurs parallèles", volume 13, numéro 4-5, pages 357-386, 2001 [in french].

This paper surveys the available algorithms for integer or floating-point arbitrary precision calculations. After a brief discussion about possible memory representations, known algorithms for multiplication, division, square root, greatest common divisor, input and output, are presented, together with their complexity and usage. For each operation, we present the naïve algorithm, the asymptotically optimal one, and also intermediate "divide and conquer" algorithms, which often are very useful. For floating-points computations, some general-purpose methods are presented for algebraic, elementary, hypergeometric and special functions.

Tuning and Generalizing Van Hoeij's Algorithm, with Karim Belabas and Guillaume Hanrot, INRIA Research report 4124, February 2001.

Recently, van Hoeij's published a new algorithm for factoring polynomials over the rational integers. This algorithms rests on the same principle as Berlekamp-Zassenhaus, but uses lattice basis reduction to improve drastically on the recombination phase. The efficiency of the LLL algorithm is very dependent on fine tuning; in this paper, we present such tuning to achieve better performance. Simultaneously, we describe a generalization of van Hoeij's algorithm to factor polynomials over number fields.

Efficient isolation of a polynomial real roots, with Fabrice Rouillier, INRIA Research report 4113, February 2001. Appeared in Journal of Computational and Applied Mathematics, volume 162, number 1, pages 33-50, 2004.

This paper gives new results for the isolation of real roots of a univariate polynomial using Descartes' rule of signs, following work of Vincent, Uspensky, Collins and Akritas, Johnson, Krandick. The first contribution is a generic algorithm which enables one to describe all the existing strategies in a unified framework. Using that framework, a new algorithm is presented, which is optimal in terms of memory usage, while doing no more computations than other algorithms based on Descartes' rule of signs. We show that these critical optimizations have important consequences by proposing a full efficient solution for isolating the real roots of zero-dimensional polynomial systems.

Density results on floating-point invertible numbers, with Guillaume Hanrot, Joël Rivat and Gérald Tenenbaum, Theoretical Computer Science, volume 291, number 2, 2003, pages 135-141. (The slides of a related talk I gave in January 2002 at the workshop "Number Theory and Applications" in Luminy are here.)

Let F_k denote the k-bit mantissa floating-point (FP) numbers. We prove a conjecture of J.-M. Muller according to which the proportion of numbers in F_k with no FP-reciprocal (for rounding to the nearest element) approaches 1/2-3/2 log(4/3) i.e. about 0.06847689 as k goes to infinity. We investigate a similar question for the inverse square root.

2000:

Factorization of a 512-bit RSA Modulus, with Stefania Cavallar, Bruce Dodson, Arjen K. Lenstra, Walter Lioen, Peter L. Montgomery, Brian Murphy, Herman te Riele, Karen Aardal, Jeff Gilchrist, Gérard Guillerm, Paul Leyland, Joël Marchand, François Morain, Alec Muffett, Chris Putnam, Craig Putnam, Proceedings of Eurocrypt'2000, LNCS 1807, pages 1-18, 2000.

On August 22, 1999, we completed the factorization of the 512-bit 155-digit number RSA-155 with the help of the Number Field Sieve factoring method (NFS). This is a new record for factoring general numbers. Moreover, 512-bit RSA keys are frequently used for the protection of electronic commerce -- at least outside the USA -- so this factorization represents a breakthrough in research on RSA-based systems. The previous record, factoring the 140-digit number RSA-140, was established on February 2, 1999, also with the help of NFS, by a subset of the team which factored RSA-155. The amount of computing time spent on RSA-155 was about 8400 MIPS years, roughly four times that needed for RSA-140; this is about half of what could be expected from a straightforward extrapolation of the computing time spent on factoring RSA-140 and about a quarter of what would be expected from a straightforward extrapolation from the computing time spent on RSA-130. The speed-up is due to a new polynomial selection method for NFS of Murphy and Montgomery which was applied for the first time to RSA-140 and now, with improvements, to RSA-155.

A proof of GMP fast division and square root implementations, September 2000.

This short note gives a detailed correctness proof of fast (i.e. subquadratic) versions of the GNU MP mpn_bz_divrem_n and mpn_sqrtrem functions, together with complete GMP code. The mpn_bz_divrem_n function divides (with remainder) a number of 2n limbs by a divisor of n limbs in 2K(n), where K(n) is the time spent in a (n times n) multiplication, using the Moenck-Borodin-Jebelean-Burnikel-Ziegler algorithm. The mpn_sqrtrem computes the square root and the remainder of a number of 2n limbs (square root and remainder have about n limbs each) in time 3K(n)/2; it uses Karatsuba Square Root.

Speeding up the Division and Square Root of Power Series, with Guillaume Hanrot and Michel Quercia, INRIA Research Report 3973, July 2000.

We present new algorithms for the inverse, quotient, or square root of power series. The key trick is a new algorithm --- RecursiveMiddleProduct or RMP --- computing the n middle coefficients of a (2n * n) product in essentially the same number of operations --- K(n) --- than a full (n * n) product with Karatsuba's method. This improves previous work of Mulders, Karp and Markstein, Burnikel and Ziegler. These results apply both to series, polynomials, and multiple precision floating-point numbers.
A Maple implementation is available here, together with slides of a talk given at ENS Paris in January 2004.

Factorization in Z[x]: the searching phase, with John Abbott and Victor Shoup, April 2000, Proceedings of ISSAC'2000 [HAL entry].

In this paper we describe ideas used to accelerate the Searching Phase of the Berlekamp-Zassenhaus algorithm, the algorithm most widely used for computing factorizations in Z[x]. Our ideas do not alter the theoretical worst-case complexity, but they do have a significant effect in practice: especially in those cases where the cost of the Searching Phase completely dominates the rest of the algorithm. A complete implementation of the ideas in this paper is publicly available in the library NTL. We give timings of this implementation on some difficult factorization problems.

1999:

Karatsuba Square Root, INRIA Research Report 3905, November 1999.

We exhibit an algorithm to compute the square-root with remainder of a n-word number in (3/2) K(n) word operations, where K(n) is the number of words operations to multiply two n-word numbers using Karatsuba's algorithm. If the remainder is not needed, the cost can be reduced to K(n) on average. This algorithm can be used for floating-point or polynomial computations too; although not optimal asymptotically, its simplicity gives a wide range of use, from about 50 to 1,000,000 digits, as shown by computer experiments.

On Sums of Seven Cubes, with Francois Bertault and Olivier Ramaré, Mathematics of Computation, volume 68, number 227, pages 1303-1310, 1999.

We show that every integer between 1290741 and 3.375 * 10¹² is a sum of 5 nonnegative cubes, from which we deduce that every integer which is a cubic residue modulo 9 and an invertible cubic residue modulo 37 is a sum of 7 nonnegative cubes.

Uniform Random Generation of Decomposable Structures Using Floating-Point Arithmetic with Alain Denise, Theoretical Computer Science, volume 218, number 2, 219--232, 1999. A preliminary version appeared as INRIA Research Report 3242, September 1997.

The recursive method formalized by Nijenhuis and Wilf [NiWi78] and systematized by Flajolet, Van Cutsem and Zimmermann [FlZiVa94], is extended here to floating-point arithmetic. The resulting ADZ method enables one to generate decomposable data structures --- both labelled or unlabelled --- uniformly at random, in expected O(n^1+ε) time and space, after a preprocessing phase of O(n^2+ε) time, which reduces to O(n^1+ε) for context-free grammars.

1998:

Estimations asymptotiques du nombre de chemins Nord-Est de pente fixée et de largeur bornée, avec Isabelle Dutour et Laurent Habsieger, INRIA Research Report RR-3585, décembre 1998 [in french].

We study here a quantity related to the number of walks with North and East steps staying under the line of slope d starting from the origin. We give an asymptotic analysis of this quantity with respect to both the width n and the slope d, answering to a question asked by Bernard Mourrain.

1997:

Calcul formel : ce qu'il y a dans la boîte, journées X-UPS, octobre 1997.

Cinq algorithmes de calcul symbolique, INRIA Technical Report RT-0206, notes de cours d'un module de spécialisation du DEA d'informatique de l'Université Henri Poincaré Nancy 1, 1997 [in french].

These are lecture notes of a course entitled "Some computer algebra algorithms" given by the author at the University of Nancy 1 in 1997. Five fundamental algorithms used by computer algebra systems are briefly described: Gosper's algorithm for computing indefinite sums, Zeilberger's algorithm for definite sums, Berlekamp's algorithm for factoring polynomials over finite fields, Zassenhaus' algorithm for factoring polynomials with integer coefficients, and Lenstra's integer factorization algorithm using elliptic curves. All these algorithms were implemented --- or improved --- by the author in the computer algebra system MuPAD.

Progress Report on Parallelism in MuPAD, with Christian Heckler and Torsten Metzner, Inria Research Report 3154, April 1997.

MuPAD is a general purpose computer algebra system with two programming concepts for parallel processing: micro-parallelism for shared-memory machines and macro-parallelism for distributed architectures. This article describes language instructions for both concepts, the current state of implementation, together with some examples.

1996:

Polynomial Factorization Challenges, with L. Bernardin and M. Monagan, poster presented at the International Symposium on Symbolic and Algebraic Computation (ISSAC), July 1996, 4 pages.

Joachim von zur Gathen has proposed a challenge for factoring univariate polynomials over finite fields to evaluate the practicability of current factorization algorithms ("A Factorization Challenge", SIGSAM Bulletin 26(2):22-24, 1992). More recently, Victor Shoup has proposed an alternate family of polynomials with a similar goal in mind.
Our effort is to take these challenges on using the general purpose computer algebra systems Maple and MuPAD.
The result of our work are the factorizations of the von zur Gathen polynomials f_n and of the Shoup polynomials F_n for n from 1 to 500. We also present the factorization of the degree 1000 von zur Gathen polynomial f₁₀₀₀.

1994:

GFUN: a Maple package for the manipulation of generating and holonomic functions in one variable, with Bruno Salvy, ACM Transactions on Mathematical Software, volume 20, number 2, 1994. A preliminary version appeared as INRIA Technical Report 143, October 1992.

We describe the GFUN package which contains functions for manipulating sequences, linear recurrences or differential equations and generating functions of various types. This document is intended both as an elementary introduction to the subject and as a reference manual for the package.

Random walks, heat equation and distributed algorithms, with Guy Louchard, René Schott and Michael Tolley, Journal of Computational and Applied Mathematics, volume 53, pages 243-274, 1994.

New results are obtained concerning the analysis of the storage allocation algorithm which permits one to maintain two stacks inside a shared (continuous) memory area of fixed size m and of the banker's algorithm (a deadlock avoidance policy). The formulation of these problems is in terms of random walks inside polygonal domains in a two-dimensional lattice space with several reflecting barriers and one absorbing barrier. For the two-stacks problem, the return time to the origin, the time to absorption, the last leaving time from the origin and the number of returns to the origin before absorption are investigated. For the banker's algorithm, the trend-free absorbed random walk is analyses with numerical methods.
We finally analyse the average excursion along one axis for the classical random walk: an analytic method enables us to deduce asymptotic results for this average excursion.

The n-Queens Problem, with Igor Rivin and Ilan Vardi, American Mathematical Monthly, volume 101, number 7, pages 629-639.

We give several lower and upper bounds for the number Q(n) of ways to put n queens on an nxn chessboard, and the number T(n) to put n queens on a toroidal chessboard (i.e., with n upper diagonals instead of 2n-1). We also conjecture that (log T(n))/(n log n) and (log Q(n))/(n log n) tend to two positive constants.
Note added 11 September 2012: in Remark 1 page 636, the primes p = 2q+1 are called safe primes (and the primes q are called Sophie-Germain primes). Moreover Warren Smith pointed the paper "The n-queens problem" by Bruen and Dixon (Discrete Mathematics, 1975) which gives a construction yielding (if I am correct) at least 8 * binomial(floor((p-3)/8),2) * p different toroidal solutions for any prime p >= 13. Also on page 635 we say "Remark. Corollary 1 gives the first example of a set of n's for which Q(n) grows faster than a polynomial in n". This is wrong, since T. Kløve gives in "The modular n-queens problem" (Discrete Mathematics, 1977) a construction yielding for n=p*p at least n*(p-3)*p^p-1 (thanks again to Warren Smith for pointing us this).

1993:

A Calculus of Random Generation, with Philippe Flajolet and Bernard Van Cutsem, Proceedings of European Symposium on Algorithms (ESA'93), LNCS 726, pages 169-180, 1993.

A systematic approach to the random generation of labelled combinatorial objects is presented. It applies to structures that are decomposable, i.e., formally specifiable by grammars involving union, product, set, sequence, and cycle constructions. A general strategy is developed for solving the random generation problem with two closely related types of methods: for structures of size n, the boustrophedonic algorithms exhibit a worst-case behaviour of the form O(n log n); the sequential algorithms have worst case O(n²), while offering good potential for optimizations in the average case. (Both methods appeal to precomputed numerical tables of linear size).
A companion calculus permits to systematically compute the average case cost of the sequential generation algorithm associated to a given specification. Using optimizations dictated by the cost calculus, several random generation algorithms are developed, based on the sequential principle; most of them have expected complexity 1/2 n log n, thus being only slightly superlinear. The approach is exemplified by the random generation of a number of classical combinatorial structures including Cayley trees, hierarchies, the cycle decomposition of permutations, binary trees, functional graphs, surjections, and set partitions.

Epelle : un logiciel de détection de fautes d'orthographe, INRIA Research Report 2030, September 1993.

This report describes the algorithm used by the epelle program, together with its implementation in the C language. This program is able to check about 30.000 words every second on modern computers, without any error, contrary to the Unix spell program which makes use of hashing methods and could thus accept wrong words. The main principle of epelle is to use digital trees (also called dictionary trees), which in addition reduces the space needed to store the list of words (by a factor of about 5 for the french dictionary). Creating a new digital tree for the franch language (about 240.000 words) takes only a dozen of seconds. The same program is directly usable for other languages and more generally for any list of alphanumeric keys.

1991:

Automatic Average-case Analysis of Algorithms, with Ph. Flajolet and B. Salvy, Theoretical Computer Science, volume 79, number 1, pages 37-109, 1991.

Many probabilistic properties of elementary discrete combinatorial structures of interest for the average-case analysis of algorithms prove to be decidable. This paper presents a general framework in which such decision procedures can be developed. It is based on a combination of generating function techniques for counting, and complex analysis techniques for asymptotic estimations.
We expose here the theory of exact analysis in terms of generating functions for four different domains: the iterative/recursive and unlabelled/labelled data type domains. We then present some major components of the associated asymptotic theory and exhibit a class of naturally arising functions that can be automatically analyzed.
A fair fragment of this theory is also incorporated into a system called Lambda-Upsilon-Omega. In this way, using computer algebra, one can produce automatically non-trivial average-case analyses of algorithms operating over a variety of decomposable combinatorial structures.
At a fundamental level, this paper is part of a global attempt at understanding why so many elementary combinatorial problems tend to have elementary asymptotic solutions. In several cases, it proves possible to relate entire classes of elementary combinatorial problems whose structure is well defined with classes of elementary special functions and classes of asymptotic forms relative to counting, probabilities, or average-case complexity.

Séries génératrices et analyse automatique d'algorithmes, PhD thesis (in french), École Polytechnique, Palaiseau, 1991 [in french]. Also available in postscript.

This thesis studies systematic methods to determine automatically the average-case cost of an algorithm. Those methods apply generally to descent schemes in decomposable data structures, which enables one to model a large class of problems. More precisely, this thesis focuses on the first stage of the analysis of an algorithm, namely the algebraic analysis, which translates the program into mathematical objects, whereas the second stage extracts from those mathematical objects the desired information about the average-case cost. We define a language to describe decomposable data-structures and descent schemes on them. When one uses generating functions as mathematical objects (counting generating functions for data-structures, and cost generating functions for programs), we show that the algorithms described in this language directly translate into systems of equations for the corresponding generating functions, moreover using simple rules. From those equations, we can then determine in polynomial time the exact average-case cost for a given size of the input data-structures. We can also use those equations to compute using asymptotic analysis the average-case cost when the input data size tends to infinity, since we know that the asymptotic average cost is directly related to the behaviour of those generating functions around their singularities. Therefore, we show that to a given class of algorithms corresponds a well-defined class of generating functions, and in turn a given class of formulae for the asymptotic average cost. Those algebraic analysis rules were included in a software tool for the average-case analysis of algorithms, called Lambda-Upsilon-Omega, which proved useful for experiments and research.