Malayalam Wikipedia releases selected articles on CD

As part of Malayalam Wikipedia Meetup 2010, Malayalam Wikipedia today released 500 selected articles on a CD-ROM. This is the first time in India that a local-language Wikipedia has released its articles for offline use. I handled the technology part of the project.

The idea was to get the selected articles onto the CD in static form. But this is not as easy as one might imagine: it is not just a matter of saving each page from the browser to the local machine. These were the challenges:

  • Automate fetching each page and the images in it. Wikipedia articles change frequently, so the program needs to fetch the latest version of each article from the wiki whenever it is run.
  • Fix all the links and the CSS, JavaScript and image references so that everything resolves within the CD itself.
  • Provide a categorized index of the articles so that an article is easy to locate.
  • Provide a search over the article titles.
  • The ISO 9660 filesystem used on CDs/DVDs has lots of limitations. There are restrictions on Unicode file names, file name length, directory depth, special characters in file names and so on. Wikipedia article and image names contain Unicode and special characters, and most of the time they exceed the file name length limit. To avoid all these problems, most of the files have to be renamed and the cross-references in all files fixed accordingly.
  • It should work on all operating systems, so all the content should be presented with HTML, JavaScript and CSS. Since the content is in Malayalam, it should be readable even if the user does not have the required fonts on her/his machine (font embedding is required).
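The renaming challenge can be illustrated with a minimal sketch. This is not necessarily how wiki2cd does it; it assumes that short, hash-derived ASCII names are acceptable on the target disc. The helpers `safe_name` and `build_rename_map` are hypothetical names made up for this example.

```python
import hashlib


def safe_name(original: str, ext: str) -> str:
    """Return a short ASCII file name derived from a (possibly Unicode) name.

    The name is built from a SHA-1 digest of the original, so the same input
    always maps to the same output, and the result contains only the
    characters 0-9 and a-f, a dot, and the given extension.
    """
    digest = hashlib.sha1(original.encode("utf-8")).hexdigest()[:16]
    return f"{digest}.{ext}"


def build_rename_map(names, ext: str) -> dict:
    """Map every original name to its safe replacement.

    The map is what a later pass would use to fix cross-references.
    """
    return {name: safe_name(name, ext) for name in names}


if __name__ == "__main__":
    for old, new in build_rename_map(["കേരളം", "ഇന്ത്യ"], "html").items():
        print(old, "->", new)
```

Hashing keeps the names short and free of special characters whatever the title looks like, at the cost of file names that are no longer human-readable. Truncating the digest to 16 hex characters makes collisions unlikely for a few hundred files, but a real tool should still check for them.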

Solving all these challenges manually is not the way to go. So I wrote a program that takes just the article titles, does all of the above tasks, and finally creates a repository ready for burning to a CD-ROM.
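To give an idea of what one of these automated steps might look like, here is a small sketch of rewriting `href` and `src` references in a saved page once the files have been renamed. It is an illustration under assumptions, not the actual wiki2cd code: the function `rewrite_references` and the `rename_map` dictionary (old reference to new file name) are hypothetical, and a regular expression is used for brevity where a real tool might prefer an HTML parser.

```python
import re

# Matches href="..." or src='...' and captures the attribute name,
# the quote character, and the referenced URL.
ATTR_RE = re.compile(
    r'''(?P<attr>\b(?:href|src))=(?P<q>["'])(?P<url>.*?)(?P=q)''',
    re.IGNORECASE,
)


def rewrite_references(html: str, rename_map: dict) -> str:
    """Replace every href/src value found in rename_map with its new name.

    References that are not in the map are left untouched.
    """
    def replace(match):
        url = match.group("url")
        new_url = rename_map.get(url, url)
        quote_char = match.group("q")
        return f'{match.group("attr")}={quote_char}{new_url}{quote_char}'

    return ATTR_RE.sub(replace, html)


if __name__ == "__main__":
    page = '<a href="കേരളം.html">Kerala</a> <img src="Map of Kerala.png">'
    print(rewrite_references(page, {"കേരളം.html": "a1.html",
                                    "Map of Kerala.png": "b2.png"}))
```

The same pass would have to run over every HTML file on the disc, since any article can link to any other.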

Wget disappointed me when it came to fetching content from the wiki. There is an open bug in wget which makes downloading non-Latin URLs impossible.
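One way to sidestep this kind of problem from Python is to percent-encode the article title before requesting it, so that the URL on the wire is plain ASCII. The sketch below only illustrates that idea and is not taken from the wiki2cd source: the function names, the default base URL and the User-Agent string are assumptions made for the example.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def article_url(title: str, base: str = "https://ml.wikipedia.org/wiki/") -> str:
    """Build an ASCII-only URL for a wiki page title.

    Spaces become underscores (the MediaWiki convention) and every
    non-ASCII character is percent-encoded as UTF-8.
    """
    return base + quote(title.replace(" ", "_"))


def fetch_article(title: str, base: str = "https://ml.wikipedia.org/wiki/") -> bytes:
    """Download the current HTML of a page (requires network access)."""
    request = Request(
        article_url(title, base),
        headers={"User-Agent": "wiki-offline-sketch/0.1"},  # hypothetical UA
    )
    with urlopen(request, timeout=30) as response:
        return response.read()


if __name__ == "__main__":
    print(article_url("കേരളം"))
```

Because the page is fetched at run time, rerunning the program always picks up the latest revision of each article.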

Have a look at the CD content we created: Malayalam Wikipedia Selected 500 Articles. Hiran helped me with the artwork.

The CD cover image designed by Hiran

Since the entire process is automated, the program can be used for any other language. I am releasing the program for the benefit of everybody. You can get the program from here. It is written in Python; jQuery was used for the UI. For details on usage, customization, etc., read the wiki page of the project.

For those who can’t read Malayalam, here is a sample wiki created by the wiki2cd program from 10 selected articles of the English Wikipedia.

The Malayalam Wikipedia community hopes that this is a big step towards reaching the majority of people, who do not have internet access. If printed, these 500 articles would run to at least 5000 pages. The CD-ROM also includes information about commonly used free-software tools for Malayalam computing. Some writing tools and fonts are distributed on the same CD-ROM.

Thanks to Malayalam Wikipedia for giving me this great opportunity to work on this project.

The ISO image of the CD is available here for download.

20 Responses to “Malayalam Wikipedia releases selected articles on CD”

  1. irfan says:

    i appreciate your work ..

  2. Evin James says:

    Santhu,

    Your hard work and commitment to build up a strong base for the e-Malayalam will never go unnoticed. Initiatives like this would require tremendous amount of hard work, technical knowledge and a good vision.
    You are wonderful when you dont stay for the credits but when you move on to the next projects once things like these are done
    All the best buddy!

  3. Pranava Swaroop says:

    Woah!

    That is amazing! Good work, keep it up!

  4. Great work. A note on comparison with other solutions for offline-cd for wikipedia content like http://moulinwiki.org/wordpress will be helpful.

  5. Sreenadh says:

    Great work. You have done a wonderful job. Congratulations :)

  6. Anoop P says:

    Santhosh. Congratulations

  7. Ajith says:

    Thanks….

  8. Basil Kurian says:

    Congratz

    earlier there was similar project called Webaroo

  9. naveen says:

    Congrats santhosh :) Gr8 work

  10. jafa says:

    its wonderful!

  11. jafar says:

    hai! wiki cd is excelent. how to run the programme(win2cd)? through which software i can run it?python?

  12. sts says:

    You post informative posts. Bookmarked !

  13. Vipin George says:

    congrats, keep it up !

  14. Jijo says:

    Congrats.. Gr8 work..

  15. robin paul says:

    thanks for your great work….you bring technology & knowledge closer to and empower our people..thanks once again….

  16. Thomas Rohde says:

    Hello,

    I’d like to express my respect for your work. Sadly, my Malayalam is ~46 years old and I can’t read or speak it anymore. Yet I still hope to have the opportunity to learn it again.

    Greetings from Germany,

    Tom

  17. […] this is the first time a wikisource project release its offline version. Malayalam wiki community had released the first version of Malayalam wikipedia one year […]
