""" Implementation of the inverse of the logarithm of the CDF of the standard normal distribution. Copyright: Albert Steppi Distributed under the same license as SciPy Implementation Overview The inverse of the CDF of the standard normal distribution is available in scipy through the Cephes Math Library where it is called ndtri. We call our implementation of the inverse of the log CDF ndtri_exp. For -2 <= y <= log(1 - exp(-2)), ndtri_exp is computed as ndtri(exp(y)). For 0 < p < exp(-2), the cephes implementation of ndtri uses an approximation for ndtri(p) which is a function of z = sqrt(-2.0 * log(p)). Letting y = log(p), for y < -2, ndtri_exp uses this approximation in log(p) directly. This allows the implementation to achieve high precision for very small y, whereas ndtri(exp(y)) evaluates to infinity. This is because exp(y) underflows for y < ~ -745.1. When p > 1 - exp(-2), the Cephes implementation of ndtri uses the symmetry of the normal distribution and calculates ndtri(p) as -ndtri(1 - p) allowing for the use of the same approximation. When y > log(1 - exp(-2)) this implementation calculates ndtri_exp as -ndtri(-exp1m(y)). Accuracy Cephes provides the following relative error estimates for ndtri Relative error: arithmetic domain # trials peak rms IEEE 0.125, 1 20000 7.2e-16 1.3e-16 IEEE 3e-308, 0.135 50000 4.6e-16 9.8e-17 When y < -2, ndtri_exp must have relative error at least as small as the Cephes implementation of ndtri for p < exp(-2). It relies on the same approximation but does not have to lose precision by passing from p to log(p) before applying the approximation. Relative error of ndtri for values of the argument p near 1 can be much higher than claimed by the above chart. For p near 1, symmetry is exploited to replace the calculation of ndtri(p) with -ndtri(1 - p). The inverse of the normal CDF increases so rapidly near the endpoints of [0, 1] that the loss of precision incurred by the subtraction 1 - p due to limitations in binary approximation can make a significant difference in the results. Using version 9.3.0 targeting x86_64-linux-gnu we've observed the following Estimated Relative Error ndtri(1e-8) = -5.612001244174789 '' -ndtri(1 - 1e-8) = -5.612001243305505 1.55e-10 ndtri(1e-16) = -8.222082216130435 '' -ndtri(1 - 1e-16) = -8.209536151601387 0.0015 If expm1 is correctly rounded for y in [log(1 - exp(-2), 0), then ndtri_exp(y) should have the same relative error as ndtri(p) for p > 1 - exp(-2). As seen above, this error may be higher than desired. IEEE-754 provides no guarantee on the accuracy of expm1 however, therefore accuracy of ndtri_exp in this range is platform dependent. The case -2 <= y <= log(1 - exp(-2)) ~ -0.1454 corresponds to ~ 0.135 <= p <= ~ 0.865 The derivative of ndtri is sqrt(2 * pi) * exp(ndtri(x)**2 / 2). It is ~4.597 at x ~ 0.135, decreases monotonically to sqrt(2 * pi) ~ 2.507 at x = 0 and increases monotonically again to ~4.597 at x ~ 0.865. It can be checked that all higher derivatives follow a similar pattern. Their absolute value takes on a maximum (for this interval) at x ~ 0.135, decrease to a minimum at x = 0 and increases to the same maximum at x ~ 0.865. Derivatives of all orders are positive at x=log(1 - exp(-2)). Thus the worst possible loss of precision of ndtri(exp(x)) in the interval [0, log(1 - exp(-2))] due to error in calculating exp(x) must occur near x=log(1 - exp(-2)). By symmetry, the worst possible loss of precision in [-2, log(1 - exp(-2)] must occur near the endpoints. We may observe empirically that error at the endpoints due to exp is not substantial. Assuming that exp is accurate within +-ULP (unit of least precision), we observed a value of at most ~6.0474e-16 for abs(ndtri(x + epsilon) - ndtri(x)) if x is near exp(-2) or 1 - exp(-2) and epsilon is equal to the unit of least precision of x. (IEEE-754 provides no guarantee on the accuracy of exp, but for most compilers on most architectures an assumption of +-ULP should be reasonable.) The error here is on the order of the error in the Cephes implementation of ndtri itself, leading to an error profile that is still favorable. """ import cython from libc.float cimport DBL_MAX from libc.math cimport exp, expm1, log, log1p, sqrt, M_SQRT2 cdef extern from "cephes/polevl.h": double polevl(double x, const double coef[], int N) nogil double p1evl(double x, const double coef[], int N) nogil cdef extern from "numpy/npy_math.h": double NPY_INFINITY from ._cephes cimport ndtri @cython.cdivision(True) cdef inline double _ndtri_exp_small_y(double y) nogil: """Return inverse of log CDF of normal distribution for very small y For p sufficiently small, the inverse of the CDF of the normal distribution can be approximated to high precision as a rational function in sqrt(-2.0 * log(p)). """ cdef: double *P1 = [ 4.05544892305962419923, 3.15251094599893866154e1, 5.71628192246421288162e1, 4.40805073893200834700e1, 1.46849561928858024014e1, 2.18663306850790267539, -1.40256079171354495875e-1, -3.50424626827848203418e-2, -8.57456785154685413611e-4 ] double *Q1 = [ 1.57799883256466749731e1, 4.53907635128879210584e1, 4.13172038254672030440e1, 1.50425385692907503408e1, 2.50464946208309415979, -1.42182922854787788574e-1, -3.80806407691578277194e-2, -9.33259480895457427372e-4 ] double *P2 = [ 3.23774891776946035970, 6.91522889068984211695, 3.93881025292474443415, 1.33303460815807542389, 2.01485389549179081538e-1, 1.23716634817820021358e-2, 3.01581553508235416007e-4, 2.65806974686737550832e-6, 6.23974539184983293730e-9 ] double *Q2 = [ 6.02427039364742014255, 3.67983563856160859403, 1.37702099489081330271, 2.16236993594496635890e-1, 1.34204006088543189037e-2, 3.28014464682127739104e-4, 2.89247864745380683936e-6, 6.79019408009981274425e-9 ] double x, x0, x1, z # sqrt(-2 * y) is faster and has more precision but overflows when # y < -DBL_MAX * 0.5 if y >= -DBL_MAX * 0.5: x = sqrt(-2 * y) else: x = M_SQRT2 * sqrt(-y) x0 = x - log(x) / x z = 1 / x if x < 8.0: x1 = z * polevl(z, P1, 8) / p1evl(z, Q1, 8) else: x1 = z * polevl(z, P2, 8) / p1evl(z, Q2, 8) return x1 - x0 cdef inline double ndtri_exp(double y) nogil: """Return inverse of logarithm of Normal CDF evaluated at y.""" if y < -DBL_MAX: return -NPY_INFINITY elif y < - 2.0: return _ndtri_exp_small_y(y) elif y > -0.14541345786885906: # log1p(-exp(-2)) return -ndtri(-expm1(y)) else: return ndtri(exp(y))