

Python isalpha is buggy

This code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
ml_string = u"സന്തോഷ്  हिन्दी"
for ch in ml_string:
    # print only the characters that isalpha() reports as alphabetic
    if ch.isalpha():
        print(ch)



gives this output

സ
ന
ത
ഷ
ह
न
द

It fails for all the matra (vowel sign) characters of Indian languages, and for the virama as well. This is a known bug in glibc.
Does anybody know whether Python internally uses glibc functions for these basic string operations, or a separate character database like Qt does?
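One way to see which characters are being rejected, and why, is to print each character's Unicode general category. Python 3's documentation defines str.isalpha() as true for characters in the general categories Lm, Lt, Lu, Ll and Lo; the matras are spacing combining marks (Mc) and the viramas are nonspacing marks (Mn), so they fall outside that set. The sketch below assumes Python 3 and assumes the vowel sign OO is stored precomposed (U+0D4B). letter_or_mark is a hypothetical helper, not part of Python, showing one possible workaround that also accepts combining marks.

```python
import unicodedata

# The sample string from the post, written with escapes so the code points are explicit.
# Assumption: the Malayalam vowel sign OO is the precomposed U+0D4B.
ml_string = u"\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d  \u0939\u093f\u0928\u094d\u0926\u0940"


def letter_or_mark(ch):
    """Hypothetical helper: True for Unicode letters (L*) and combining marks (M*),
    so matras and viramas are accepted along with consonants."""
    return unicodedata.category(ch)[0] in ("L", "M")


for ch in ml_string:
    if ch.isspace():
        continue
    # Code point, general category, isalpha() result and character name (all ASCII output).
    print("U+%04X  %-2s  isalpha=%-5s  %s" % (
        ord(ch), unicodedata.category(ch), ch.isalpha(), unicodedata.name(ch)))

# isalpha() keeps only the L* characters; the hypothetical helper also keeps the marks.
alpha_only = u"".join(ch for ch in ml_string if ch.isalpha())
with_marks = u"".join(ch for ch in ml_string if letter_or_mark(ch))
print("isalpha() keeps %d characters; letter_or_mark() keeps %d"
      % (len(alpha_only), len(with_marks)))
```

On the sample string the last line reports 7 versus 13; the six characters that isalpha() drops are exactly the vowel signs and viramas.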

Posted in Bugs.



3 Responses


  1. unmadindu says

    The Python source code seems to suggest so

    From stringobject.c,


    .......
    /* Shortcut for single character strings */
    if (PyString_GET_SIZE(self) == 1 &&
        isalpha(*p))
        return PyBool_FromLong(1);
    .......
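The quoted shortcut belongs to the 8-bit string type and hands a single byte to C's isalpha(), while the post's u"..." literal is a unicode object. The byte/character distinction matters because in UTF-8 each Malayalam or Devanagari character occupies three bytes, so a byte-at-a-time check never sees a whole letter. A minimal illustration, assuming Python 3, where bytes.isalpha() accepts only ASCII letters (so it is a stand-in for, not a copy of, the locale-dependent C check):

```python
# The Malayalam letter SA (U+0D38) as one code point and as its UTF-8 bytes.
sa = u"\u0d38"
sa_utf8 = sa.encode("utf-8")

print(len(sa), len(sa_utf8))       # one code point, three bytes
print(sa.isalpha())                # str method: classifies the whole code point
print(sa_utf8.isalpha())           # bytes method: checks each byte for ASCII letters only
print([hex(b) for b in sa_utf8])   # the individual bytes a byte-wise check would examine
```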

    • admin says

      Re: The Python source code seems to suggest so

      Thanks, Sayamindu.
      So once the glibc patches get into the distros, let us hope these problems will disappear.
      But the Qt problem remains.

  2. Anonymous says

    looks like a feature

    That seems to be a feature: it's removing the matras perfectly.


