Are you a student, do you want to have an awesome job during the holidays, and do you want to make an impact on the world? Then definitely check out the amazing Google Summer of Code program! In a nutshell: Google Summer of Code (or GSoC for short) is a program in which Google sponsors students to work on open source projects during the summer holiday. This allows students to get real-world experience while being mentored by experts in the field, and gets the open source projects potential long-term contributors.
One of the open source projects I’m involved with, Semantic MediaWiki, is applying this year as a mentoring organization. Although no organizations have been officially accepted into GSoC yet, we are confident we’ll be able to mentor at least two students and already have over a dozen high quality proposals. Check them out!
You can contact me or any of the other people who listed themselves as mentors on that page for questions and advice.
I just moved to Berlin to work on the WikiData project for a year at Wikimedia Germany.
The WikiData project will start on April 1st (yes, really) and will tackle many data re-usability issues by having structured storage much like done by Semantic MediaWiki (although it will not use SMW). The basic idea is to have a “data-wiki”, which will hold common data, much like Wikimedia Commons holds common media (mainly images). Initially this wiki will serve as an entity base for the different (language) Wikipedias, enabling the removal of inter-language links where people want to use the new data wiki instead. After that we’ll allow for adding properties to the entities, which articles will be able to pull into factboxes and have some editing UI on the Wikipedias to edit this data in place and send it back to the data-wiki. The final phase will be allowing for (limited) queries on the structured data of the data wiki from the Wikipedias to automatically generate lists and tables.
This is a real game changer, and if you ask me the most fundamental change Wikipedia will have seen since its inception. The goal of the Wikimedia Foundation is to make all human knowledge freely available to everyone. Having it in a structured format that can be queried makes this data available to so many more people, and much easier to actually make sense of. External applications will be able to build on top of Wikipedia like never before. And of course the win for Wikipedia itself is huge as well. Data will be nicely consolidated instead of duplicated across wikis, and so be more consistent. And data that is now available on any Wikipedia will be available on all Wikipedias, vastly increasing the content on all non-English Wikipedias. The world encyclopedia will turn into the world’s database (of course its function as encyclopedia will not get lost).
I can’t wait to get started on this!
A high level overview of the project and its phases can be found here. You can also follow me and Denny on Twitter for updates. We might get a dedicated account later on, but for now @WikiData is not under our control :/
Three days ago I moved from Gent (in Belgium) to Berlin for at least a year. This is because I will be working on WikiData for Wikimedia Germany, an awesome project that will bring semantic capabilities to Wikipedia.
Although I have moved to Berlin, I do not have my own apartment yet and am currently staying in a private room in a shared apartment for a month, by which time I hope to have found something more permanent. I have not found anything so far, mainly due to looking for something that’s fully furnished, in a quiet neighborhood, not over 900 EUR/month with everything included (i.e. heating, interwebs, commission, etc.) and within 15 to 20 minutes’ walking distance from where the office will be. Turns out not a lot of stuff matches all these criteria.
So far I’m definitely enjoying Berlin, which I think is a great city. I became a member of c-base, the huge and awesome hackerspace of Berlin and already met quite some interesting people there, including some of my soon to be colleagues on WikiData.
It’s been a while since I last wrote a blog post, and this is in part because the last extension I wrote is, sadly enough, not available under an open license due to client demands :/ This blog post is about what I’m currently working on: new infrastructure for a project that I hope you’ll agree is totally awesome.
Open knowledge and open learning tools go hand in hand, thus Wikipedia can be an amazing resource for education beyond simply looking things up. The Wikipedia Education Program strives to involve university students with the Wikipedia community, by having them create and edit articles as part of their coursework. This is a real win-win for everyone involved. The students get to do something which is actually useful, as their work won’t disappear into some dusty archive of their university, and they learn to work collaboratively with other people in the Wikipedia community. The gains for Wikipedia and its community are probably pretty obvious: more contributors, more content, more people to help increase the quality of the encyclopaedia as a whole.
If you are interested in bringing this program to your university, definitely check out the Wikipedia Education Program (WEP). Also, you can find more comprehensive posts about the program itself, as well as many success stories, on the Wikimedia blog.
The WEP started several years ago with a handful of universities participating. Participation and enrolment were kept track of manually on wiki pages. Obviously, this does not scale well, and now that the program has two or three orders of magnitude more students, new tools are urgently needed. The main goal is having a way for students to enrol themselves, and then be tracked by their mentors and by people from the community wanting to provide assistance where needed. Beyond that, a lot of nice things can be done, such as keeping track of contributions, drop-out rates, etc., so that problems can be spotted and measures can be taken to avoid these in the future. You can find an initial draft of the requirements here.
I’m implementing this as a new MediaWiki extension, which will, when completed, be deployed on Wikipedia. The extension is, probably unsurprisingly, called Education Program. It’s currently in early alpha stage, so not much to see there yet. However, a draft of its functionality can be found here. Management wise it’ll be somewhat similar to Contest, but more extensive (since there is more stuff to manage), and hopefully improved along multiple dimensions. I still need to put more thought into the exact flow for students though, and discuss this with the other people involved.
This is a screenshot of one of the many special pages making up the management interface. It lists all courses, allows you to browse through them (paged) and filter on criteria. Program administrators and mentors also get to see a control to add new courses, which then takes them to a new page with a form. Very similar pages exist for other types of objects, such as Institutions, Terms, Mentors and Students (although the latter two are somewhat different, since they cannot be modified by people other than the students or mentors themselves).
I will likely be working on this for another 2 months, after which I will start work on an even more awesome project (don’t get me wrong, the WEP is definitely awesome), on which I’ll post more later on.
During the last 3 months I’ve been doing Stanford’s online machine learning class. This was a great experience, and I now at least have a solid feel for the subjects covered in the course.
I actually started off doing the Artificial Intelligence class, and then found that the Machine Learning one was more interesting for me, and even of higher quality. So I decided to do both classes. After a few weeks I found this was really too much to do on top of my regular work, and decided to drop the AI class, so I could focus on the ML class and get good results there, rather than mediocre results on both. The ML class is made up of 18 lessons, each consisting of a set of videos with in-video mini-quizzes, review questions and programming exercises (in GNU Octave, similar to MATLAB). Although I don’t have the official score yet, by my own counting I have 800 of 890 points, 70 of which I lost by not completing the last set of programming exercises due to being sick.
Stanford offered 3 such online classes during Q4 of 2011 (AI, ML and databases), and is tripling this number in Q1 2012. As a response, MIT is going to extend its OpenCourseWare platform. This is great news for online education, which has made huge strides in the last few years with things such as Khan Academy, these online courses by universities and the Wikipedia Education Program (more on which in my next blog post). If you want to teach yourself some new things, definitely check out these awesome programs.
As it’s been 2 months since my last blog post, I figured it was time for another one. There are quite a few things I could write about (SMWCon, my new awesome laptop, Stanford’s AI and ML classes, me moving to Berlin, …), but I decided to give some introduction to my most recent MediaWiki extension: Contest.
Contest is an extension that allows users to participate in admin-defined contest challenges. Via a judging interface, judges can discuss and vote on submissions. I created it for the Wikimedia October coding challenge, so it got a nice amount of review, uncovering some minor misconceptions I had about some core MW code, and it got deployed on MediaWiki.org. The coding challenge is quite awesome, but I won’t discuss it any further here, so check out the linked blog post if you’re curious/interested.
Feature overview:
Requirements
Some screenshots
Some background
When starting with this extension, it was clear pretty quickly that it could really use the awesome DBObject class I’ve been incrementally creating over my last few extensions and mostly finished in Survey. This class is a wrapper for objects of a certain type, each of which is equivalent to a row in some db table. Even though it’s in essence very simple (it just has a field that is an associative array with field => value), it’s also very powerful and flexible. When I started with this, I had no idea it would turn out to be so neat. The bad news was that I could not use PHP 5.3 or later for Contest, while the DBObject class uses late static binding (LSB), which was introduced in PHP 5.3. I came up with a simple hack: all static methods in the base class have been made non-static, but are marked as “should be static”. Then, every deriving class has a public static function s(), which returns a (cached) instance of the class. So for every “static” method you need to call, instead of ClassName::methodName(), you do ClassName::s()->methodName(). If you know this, and do not misuse the non-static-but-should-be-static methods in the base class, it retains all its niceness, at the cost of a tiny bit of syntactic sugar. And it’s quite obvious how easy it will be to replace this with actual LSB usage once this becomes possible.
Download
What’s next?
There are various small additions that could be made, but one thing really stands out: contest configuration with version history. Right now, you can create contests and challenges, modify them and delete them. But once you make a change, the previous version is lost. You cannot revert. You cannot compare. You can’t even see who made the change or when. Implementing such a thing is not trivial, especially if you want to have a generic system that can be used by any extension that wants to store data and have version history for it. And if you think about it, quite a few extensions could use this. Let’s have a look at my extensions, latest first, and see if they can use it:
That’s 6 out of 10 just for the extensions I wrote. What it comes down to is that pretty much any extension that has some sort of settings interface that is not user-specific could use this. And maybe even the user-specific ones, which would obviously include the user preferences in MediaWiki core. So why not store this data in wiki pages, such as done by Maps for layers? You could even store it as JSON or serialized PHP objects if you need more complexity… The thing with this is that it only works for simple use cases (such as the layers in Maps), and even then is limited. You cannot query over the data as you do not have it in relational form. And you cannot have fine-grained control over read and write access to the data, which in a lot of cases is quite important. So a generic solution here would be an awesome addition to MediaWiki if you ask me.
More info on Contest can be found on its documentation page.
Over the past 3 weeks I’ve been working on a new MediaWiki extension that allows creation of on-wiki surveys by admins. It comes with a whole bunch of neat features, and is the most awesome (code wise) extension I’ve created so far. It’s aptly titled Survey.
Feature overview
Requirements
Survey makes use of many new features introduced in MediaWiki 1.17, and therefore requires this version or later. It even makes use of MW 1.18 features, with fallback code for MW 1.17.
It also makes use of PHP 5.3 features, these being late static binding and anonymous functions, so it won’t work with PHP 5.2.x and earlier.
Some screenshots
Downloads
Some background
I developed the Survey extension as a WikiWorks consultant for the IEEE, with some help from Yaron Koren.
What’s next?
There are many many features that can be added to this extension to make it even more awesome. I’d like to get some initial feedback on version 0.1, so the usability issues and bugs that might be there can be ironed out. Please place any feedback you might have on the discussion page. This initial release contains all the features my client needed, so if you want to have new capabilities added and can fund the work, definitely contact me
Yesterday I released version 1.0.3 of the Maps and Semantic Maps MediaWiki extensions. This release re-introduces Google Earth support, this time for Google Maps v3, and enhances the KML/KMZ support for this mapping service as well. Many thanks go to Jon Povey for funding the implementation of these features! Since I didn’t make any release announcements for 1.0.1 or 1.0.2, I’ll just include changes made in these versions as well, effectively treating this as the release after 1.0.
KML/KMZ support
The Google Maps v3 service now supports 2 new KML related parameters: kml and gkml. Both accept a URL pointing to a KML (or KMZ) file. The first one uses a KML parsing library (geoxml3) included in the extension to translate the features described in the KML file into elements to place onto the map. This is very nice for people that do not want to be dependent on third party services, but sadly enough, the library is somewhat limited. It lacks support for more advanced KML features such as polygons and paths. The gkml parameter uses Google’s KML service, which pulls the KML file to some Google server, and then decides if it should be sent to the client as is (for simple and small files), or if it should be rendered server side and sent as tiles to the client (for big files or files containing advanced features such as polygons).
You can now also choose if you want the map to rezoom after the KML layers have been loaded or not using the kmlrezoom parameter. KML layers will load a bit after the map, since they require extra resources to be loaded, and there is no need to let the user wait to see the normal map until those are done loading.
Google Earth support
Maps has supported Google Earth since one of its earliest versions. This was quite easy to achieve as Google Maps v2 natively supported it. Now with the switch to Google Maps v3 in Maps 1.0, people asked for Google Earth support in that as well. Unfortunately Google Earth is not natively supported here. Maps now provides support for GE using the Google Maps utility library v3. The earth type can be enabled by adding “earth” to the types parameter, or by setting it as the default type using “type”. Do however note that due to this not being officially supported by Google, it has some deficiencies. For one, when switching to GE, the map controls won’t be displayed any more, preventing you from switching back. Also, the GE plugin is only supported on Windows and Mac, so it won’t be usable for mobile or Linux users.
A completely new thing added in 1.0.3 is the tilt parameter, which, as you can probably guess, allows you to set the initial tilt of the GE layer.
Full list of changes since 1.0:
What’s next?
For now, I have no specific plans for changes or additions to either of the mapping extensions, other than some minor script loading improvements, as I’m working on several other projects. However it’s likely that people will have suggestions for new features at SMWCon Fall 2011, which is next week.
Download
As this particular project is coming to an end, I figured I’d do a quick blog post on it.
Wiki Loves Monuments (WLM) is a photo contest for European monuments, organized by Wikimedia this September. Last year some JavaScript hacks on the regular Wikimedia Commons (the media repository for Wikipedia and other Wikimedia Foundation projects) upload interface were used for this contest. This year the new and completely awesome Upload Wizard (UW) will be used, with configuration optimized for WLM. I created a campaign-based configuration system for the UW and also added a bunch of new settings.
2 new special pages were added. One lists all campaigns and their status, with edit and delete links. This is at Special:UploadCampaigns.
The other special page handles the edit action and displays a list of all available settings that can be modified for the campaign. This is at Special:UploadCampaign/name.
A campaign can be applied to the UW by adding the “campaign” URL parameter with the campaign name as its value, i.e. ?campaign=wlm-be.
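As a small illustration of the URL parameter, here is a sketch in JavaScript that builds such a link. The helper name buildCampaignUrl is made up for this example and is not part of the Upload Wizard; the Commons URL is only used as sample input.

```javascript
// Hypothetical helper: appends the "campaign" parameter to an Upload Wizard URL.
function buildCampaignUrl(uploadWizardUrl, campaignName) {
  const url = new URL(uploadWizardUrl);
  // set() replaces an existing "campaign" value instead of adding a second one.
  url.searchParams.set('campaign', campaignName);
  return url.toString();
}

const wlmBelgium = buildCampaignUrl(
  'https://commons.wikimedia.org/wiki/Special:UploadWizard',
  'wlm-be'
);
console.log(wlmBelgium);
```

Using the URL class rather than string concatenation keeps the result valid when the base URL already carries other parameters.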
One fun thing about the architecture of the campaign system is that the setting support is very generic. I created a new settings class that pulls in the default settings and overrides these with, in order, the wiki’s config (i.e. PHP vars in LocalSettings.php), passed URL arguments and finally the upload campaign settings if a campaign is specified. I like this kind of setup, as it’s a lot nicer than dealing with over 9000 global variables, and in the meantime I have already applied some variation of it in Semantic Signup and in my new Survey extension. And I wrote up a more general and powerful version of such setting handling in the Maps extension. Unfortunately this code uses late static bindings and thus requires PHP 5.3, making it not usable in actual code for quite a while.
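The layering can be sketched in a few lines. This is a minimal illustration in JavaScript rather than the actual PHP settings class; resolveSettings and all setting names are hypothetical, and ignoring unknown keys is an assumption of this sketch, not something the extension is known to do.

```javascript
// Hypothetical sketch of layered settings resolution; none of these names
// come from the Upload Wizard code base.
function resolveSettings(defaults, ...layers) {
  const settings = { ...defaults };
  for (const layer of layers) {
    for (const [name, value] of Object.entries(layer || {})) {
      // Assumption: only settings that have a default are valid, so unknown
      // keys (for example unrelated URL arguments) are ignored.
      if (Object.prototype.hasOwnProperty.call(defaults, name) && value !== undefined) {
        settings[name] = value;
      }
    }
  }
  return settings;
}

// Made-up sample values, ordered from lowest to highest priority.
const defaults = { skipTutorial: false, defaultCategory: '', autoCategory: '' };
const wikiConfig = { defaultCategory: 'Uploads' };
const urlArgs = { skipTutorial: true, notASetting: 'x' };
const campaign = { autoCategory: 'Images from Wiki Loves Monuments' };

const effective = resolveSettings(defaults, wikiConfig, urlArgs, campaign);
console.log(effective);
```

Later layers win, so a campaign can override anything set in the wiki config or the URL, while everything not mentioned falls back to the defaults.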
Another neat thing is that the upload campaign class only specifies a list of settings that should be configurable for upload campaigns, together with the kind of HTML form input each should be displayed with. That info is then merged with the settings obtained from the settings class and put into a FormSpecialPage, which uses HTMLForm to display everything without any further hassle.
Not used the Upload Wizard before and curious how it works? Go upload some nice stuff to Commons then.
A few days ago I released version 1.2 of the Live Translate MediaWiki extension, which is a major update bringing mainly under-the-hood improvements. I’ve worked on this for about 3 days in my free time, mainly to try out some JavaScript techniques I had not utilized yet.
These are the changes for 1.2:
This post is about the first two.
Some background
The Live Translate (LT) extension allows live translation of content in wiki pages. For this it uses translation services such as Google Translate or Microsoft Translator. It also allows specifying your own translations for certain words within the wiki, which will then be left alone by the (remote) translation services. Such specifications of translations are called translation memories (TM), and are typically written in a special XML-based format called Translation Memory eXchange (TMX). LT also supports a custom, more wiki-friendly format, which is DSV-based. Translation memories in both these formats can be embedded in wiki pages designated as TM, or you can point to files hosted somewhere else. What it comes down to is that there is a set of local translations, which require special handling: these words need to be translated locally and be ignored by the remote translation service.
On every translation, the JavaScript needs to know which are the special words that have a local translation, so translations for these can be requested, and measures can be taken to not send them to the remote translation services. This means doing a call to the wiki’s API to obtain these words. In case of big translation memories, this requires several calls to obtain all words, often resulting in a few seconds’ wait before local translations are even requested. If there are words that have local translations on the page, a single request is sent to another part of the API to obtain these translations for the language currently translated to. This usually brings the total time to complete local translation to somewhere between 2 and 5 seconds, after which, in version 1.1 and earlier, remote translation is kicked off.
The idea
Translation memories do not tend to change all the time, so it’s very inefficient to request all special words for every translation, and to a somewhat lesser degree to always request the translations. The obvious answer to this is local caching, and since I wanted to play around with HTML5 localStorage a bit, this is exactly what I did. I also wanted to make use of JavaScript capabilities I was not really aware of back in December, when writing Live Translate, so I took this as an opportunity to also do some JavaScript refactoring, primarily around prototypes, closure scopes and callbacks.
The realization
I made a whole bunch of client side changes (and some server side changes to the API), but the most significant ones are the creation of a translation memory object which takes care of all caching, and the rewrite of the translation control to a jQuery plugin.
Translation memory object
In file includes/ext.lt.tm.js.
The translation memory object class is named simply “memory” and resides in the “lt” namespace. It acts as an abstraction layer via which special words and translations of those special words can be accessed. It takes care of all API interaction and caching and exposes 2 simple functions, getSpecialWords and getTranslations, which are called by the translation control.
When the cache is empty, the memory will request a new hash via the API, which indicates the “version” of the translation memories on the server, and is later used for cache invalidation. It then proceeds to fetch the requested special words or translations of special words and returns these via a callback passed to either getSpecialWords or getTranslations. Before this last step is done, the obtained data is cached in memory (the words and translations fields, on lines 26 and 30, respectively), and, when available, also in HTML5 localStorage. The in-memory caching only yields advantages when doing multiple translations on a single page, which is rather rare, so it is not that much of a win. The data stored in localStorage, on the other hand, persists when navigating to other pages, and even when closing the browser and re-opening it. localStorage really isn’t a cache on its own, but the lt.memory class uses it as one.
When the cache is not empty, a single request to the API is made to compare the earlier obtained hash with the current one and see if any changes to the TMs have been made, and thus if the cache should be invalidated. If changes have been made, the stored data is discarded and pretty much the same happens as when the cache was empty. If no changes have been made, locally stored data is used where possible. In case of the list of special words, no requests will have to be made at all, since all such words are already known. For the translations of these words it’s a little trickier, since the needed data here varies from page to page, and also depends on both the source and destination language. The lt.memory class checks which of the needed data is available, and in case there is a remainder of non-known translations, requests these. The newly obtained translations are then of course also added to the cache.
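The hash-based invalidation and the “only request the remainder” logic can be sketched as follows. This is not the actual lt.memory code: createMemory, the api methods, the storage keys and the sample data are all made up for illustration, the storage and API are injected so the sketch runs outside a browser, and translations are keyed by destination language only to keep it short.

```javascript
// Hypothetical sketch: createMemory, the api methods and the storage keys are
// illustrative names, not the ones used by lt.memory.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function createMemory(api, storage) {
  const load = (key) => {
    const raw = storage.getItem(key);
    return raw === null ? null : JSON.parse(raw);
  };
  const save = (key, value) => storage.setItem(key, JSON.stringify(value));

  // Compare the server-side hash with the stored one; on a mismatch (or an
  // empty cache) discard everything and remember the new hash.
  function validate(done) {
    api.getHash((hash) => {
      if (load('hash') !== hash) {
        storage.removeItem('words');
        storage.removeItem('translations');
        save('hash', hash);
      }
      done();
    });
  }

  // Return only the requested words that have a known translation.
  function pick(known, words) {
    const result = {};
    for (const word of words) {
      if (has(known, word)) result[word] = known[word];
    }
    return result;
  }

  return {
    getSpecialWords(callback) {
      validate(() => {
        const cached = load('words');
        if (cached !== null) return callback(cached);
        api.fetchSpecialWords((words) => {
          save('words', words);
          callback(words);
        });
      });
    },
    getTranslations(words, lang, callback) {
      validate(() => {
        const all = load('translations') || {};
        const known = all[lang] || {};
        const missing = words.filter((word) => !has(known, word));
        if (missing.length === 0) return callback(pick(known, words));
        // Only the remainder of unknown translations is requested.
        api.fetchTranslations(missing, lang, (fetched) => {
          all[lang] = { ...known, ...fetched };
          save('translations', all);
          callback(pick(all[lang], words));
        });
      });
    },
  };
}

// In-memory stand-ins for localStorage and the wiki API, with call counters.
function createFakeStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => { items.set(key, String(value)); },
    removeItem: (key) => { items.delete(key); },
  };
}

const server = {
  hash: 'v1',
  words: ['wiki', 'extension'],
  translations: { nl: { wiki: 'wiki', extension: 'uitbreiding' } },
};
const calls = { hash: 0, words: 0, translations: 0 };
const fakeApi = {
  getHash(cb) { calls.hash += 1; cb(server.hash); },
  fetchSpecialWords(cb) { calls.words += 1; cb(server.words.slice()); },
  fetchTranslations(words, lang, cb) {
    calls.translations += 1;
    const table = server.translations[lang] || {};
    const found = {};
    for (const word of words) {
      if (has(table, word)) found[word] = table[word];
    }
    cb(found);
  },
};

const memory = createMemory(fakeApi, createFakeStorage());
memory.getSpecialWords((words) => console.log('cold cache:', words));
memory.getSpecialWords((words) => console.log('warm cache:', words));
console.log(calls);
```

In a browser, window.localStorage could be passed in place of the fake storage, since it offers the same getItem, setItem and removeItem methods.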
jQuery plugin
In file jquery.liveTranslate.js.
This plugin contains a lot of already existing code from Live Translate 1.1, but is structured a lot better. It takes care of creating all the HTML needed for the control in its setup function (line 147), while in 1.1 the HTML was provided and only events were bound to it. The click event handler for the translation button calls obatinAndInsetSpecialWords, which uses the getSpecialWords function of the lt.memory class to obtain the words with local translations, and then inserts them. Inserting means that occurrences of these words are wrapped into notranslate spans, which makes it possible to find all words that should be translated locally, and makes the remote translation services ignore them. The click handler passes doTranslations as completion callback to obatinAndInsetSpecialWords, which starts both local and remote translation in parallel. Local translation is done, as you can undoubtedly guess, by calling the getTranslations function.
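To make the wrapping step concrete, here is a rough sketch that works on plain strings. The actual plugin works on the page’s DOM, so this is only an approximation; wrapSpecialWords, obtainAndWrap and the stub memory object are names invented for this example.

```javascript
// Hypothetical sketch working on plain strings; the real plugin works on DOM nodes.

// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wraps every whole-word occurrence of a special word in a notranslate span.
// The input is assumed to be plain text; no HTML escaping is done here.
function wrapSpecialWords(text, specialWords) {
  if (specialWords.length === 0) {
    return text;
  }
  // Longest words first, so a longer special word wins over a shorter one
  // that starts at the same position.
  const pattern = specialWords
    .slice()
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join('|');
  const regex = new RegExp('\\b(' + pattern + ')\\b', 'g');
  return text.replace(regex, '<span class="notranslate">$1</span>');
}

// Mirrors the flow described above: fetch the special words first, wrap them,
// then hand over to a completion callback that would start the translations.
function obtainAndWrap(memory, text, onComplete) {
  memory.getSpecialWords((words) => {
    onComplete(wrapSpecialWords(text, words), words);
  });
}

// Stub standing in for the lt.memory object.
const stubMemory = {
  getSpecialWords: (callback) => callback(['wiki', 'MediaWiki']),
};

let wrapped = null;
obtainAndWrap(stubMemory, 'Semantic MediaWiki is a wiki extension.', (html) => {
  wrapped = html;
  console.log(html);
});
```

The word-boundary match is a simplification: special words that start or end with punctuation would need a different boundary check.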
The results
Once the cache is warm (the user made a translation before) and valid (the TMs have not changed), local translation is practically instant (~0.4 seconds in my tests). Since remote translation now starts as soon as the special words are known, this can take as little as ~0.2 seconds, a huge difference compared to the earlier wait of up to 5 seconds and possibly longer. All assuming you are using a modern browser of course.
Not to forget, the code is a lot better structured now. It should be a lot easier to track the order of execution, and it’s now possible (JavaScript wise) to place multiple translation controls onto a single page (which has little practical value, but indicates a better design).
And, maybe most importantly for me, I now have a much better grasp of the earlier mentioned prototypes, callbacks and closure scopes. Perhaps most of the time I spent on this version went into figuring out how to properly use these and debugging misconceptions I had about how they worked.