This is a description of my proposal for Google Summer of Code -- Richard Szopa

Tool for computer aided vocabulary learning

An important part of learning a foreign language is improving your vocabulary: learning new words and consolidating the vocabulary from earlier lessons. The paper-and-pencil method, in which you write down each new word in a notebook, has some serious drawbacks. You cannot easily link words from different lessons, and you never know whether a word already appeared in a different context four lessons ago. If you aren't disciplined enough, your notes may easily become a mess (dysgraphia doesn't help). Every correction you make makes the whole thing a little less readable. If you didn't manage to note down the meaning of a word, you have to consult a separate dictionary.

Also, there is no obvious way of checking whether you have memorized a group of words, at least if you don't have someone to help you.

On the other hand, computer aided vocabulary learning could be much more pleasant. You don't have to worry about organizing your lexicon graphically. You could change the order in which the words appear, group them, assign tags to them, link them... All of this without making the readability worse. Finally, the computer could perform a test in order to check how well you have memorized the last lesson. Computer aided vocabulary learning could be less tedious, more fun, and much more efficient than the traditional paper-and-pencil method.

I think it would be very nice if Edubuntu (or even Ubuntu) incorporated such a tool for computer aided vocabulary learning.

Firstly, it is crucial that we realize that a program for enhancing one's vocabulary has very little to do with a dictionary. The main profit for the student comes not from storing lexical information, but from the very process of *building* his or her vocabulary. Categorizing words, linking them and attaching pictures allows the student to remember each word in a specific context and, effectively, to design a kind of mind-map of the vocabulary he or she is learning.

The program would have four main modules. The first would help the student to organize his or her vocabulary. The second one would allow him or her to explore it. The third one would perform some quiz-like tests to check how well the student has memorized a lesson. The fourth would give the student feedback on how well he or she is doing and suggest lessons to review.

1. Organizing the vocabulary

The organization works on several levels. The basic unit for grouping is a _lesson_: a set of words that have been presented during one learning session. Secondly, a number of tags could be attached to each word (think “people”, “animals”, “vehicles” and the like). Thirdly, words can be linked (for example, words connected with cooking). Finally, each entry has a field for grammatical information. For example, if the language being learnt is Dutch, a noun could carry an annotation saying whether it is a “het” or a “de” word. The program would recognize when a word has already been defined in an earlier lesson and offer to copy the information from that entry (*not* just do it, because of homonyms and words whose meaning depends heavily on the context). An option to search the Internet for an appropriate picture may also be considered.
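A minimal sketch of how such entries might be represented, assuming Python dataclasses. The class names, field names and the `earlier_entries` helper are illustrative only and are not part of any existing code. The lookup returns candidates and leaves the decision to copy with the student, as described above.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One vocabulary entry (all field names are assumptions for this sketch)."""
    word: str
    translation: str
    lesson: str
    tags: set = field(default_factory=set)
    links: set = field(default_factory=set)  # other words this entry is linked to
    grammar: str = ""                        # e.g. "het" or "de" for a Dutch noun


class Vocabulary:
    def __init__(self):
        self.entries = []

    def earlier_entries(self, word):
        """Return existing entries for `word`, so the program can offer to copy one."""
        return [e for e in self.entries if e.word == word]

    def add(self, entry):
        self.entries.append(entry)


vocab = Vocabulary()
vocab.add(Entry("bank", "bench", "lesson 1", tags={"furniture"}, grammar="de"))

# Before adding "bank" again in a later lesson, look for earlier definitions.
# The student decides whether to reuse one: here the word is a homonym.
earlier = vocab.earlier_entries("bank")
vocab.add(Entry("bank", "bank (financial)", "lesson 4", tags={"money"}, grammar="de"))
```

Keeping both “bank” entries side by side is the reason the lookup only offers a copy instead of merging automatically.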

2. Exploring the vocabulary

The words would be displayed on a lesson basis. Links to other words would be clickable, but always with an easy way of returning to the former location (the links shouldn't distract the student from the lesson he or she is trying to learn).
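One way the “easy return” could work is a simple back stack, much like in a web browser. The `History` class below is a hypothetical helper, written independently of any GUI toolkit, and only a sketch of the idea.

```python
class History:
    """Back stack for link navigation (hypothetical helper, not tied to GTK+)."""

    def __init__(self, start):
        self.current = start
        self._back = []

    def follow(self, target):
        """Follow a link, remembering where we came from."""
        self._back.append(self.current)
        self.current = target

    def back(self):
        """Return to the previous location; stay put if there is none."""
        if self._back:
            self.current = self._back.pop()
        return self.current


history = History("lesson 3")
history.follow("koken")
history.follow("pan")
```

Calling `history.back()` twice would bring the student back to the lesson view he or she started from.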

3. Testing module

There are a number of ways in which the program may assess the student's proficiency in a given lesson. For example:

* The program presents a word in language A and the student has to input its translation in language B. Or vice versa. A variation would be that the student can ask for hints: then the computer reveals some metainformation attached to the word (what it is linked to, etc.). The input is tested not only for exact matching, but also for being similar to the correct answer (for example, by Levenshtein distance).

* A quiz: the program presents a word in language A and three words from language B, of which only one is a correct translation. The student, of course, has to find which of the three matches the word from language A.
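Both test types above can be sketched in a few lines of Python. The function names, the tolerance of one edit and the dictionary-based word list are assumptions made for illustration. The distance function is the standard dynamic-programming formulation of Levenshtein distance.

```python
import random


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def grade(answer, correct, tolerance=1):
    """Return 'exact', 'close' or 'wrong'; the tolerance is an assumed setting."""
    answer, correct = answer.strip().lower(), correct.strip().lower()
    if answer == correct:
        return "exact"
    if levenshtein(answer, correct) <= tolerance:
        return "close"
    return "wrong"


def make_quiz(word, translations, rng=random):
    """Build a three-option question from a word -> translation dictionary."""
    correct = translations[word]
    # Distractors: distinct translations of other words, never the right answer.
    others = sorted({t for w, t in translations.items() if w != word and t != correct})
    options = rng.sample(others, 2) + [correct]
    rng.shuffle(options)
    return options, correct


words = {"huis": "house", "fiets": "bicycle", "boek": "book", "kaas": "cheese"}
options, correct = make_quiz("huis", words, random.Random(0))
```

A fixed tolerance is a crude choice: with a tolerance of one, “horse” would count as close to “house”. Scaling the tolerance with word length, or showing the student the near miss, may work better in practice.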

4. Feedback

First of all, the most obvious: how well the student has done in the exercises. This could also be plotted against time, together with how much time the student has spent exploring each lesson. This module would also keep track of which word sets are explored when and, for example, suggest a lesson that hasn't been reviewed for a long time or the lessons in which the student has the lowest scores.
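The two suggestion rules mentioned above could look roughly like this, assuming the program keeps a last-review date and an average score per lesson. The dictionary layout, the function names and the sample figures are made up for this sketch.

```python
from datetime import date


def stalest_lesson(stats):
    """Lesson that has gone longest without a review."""
    return min(stats, key=lambda name: stats[name]["last_reviewed"])


def weakest_lesson(stats):
    """Lesson with the lowest average test score."""
    return min(stats, key=lambda name: stats[name]["average_score"])


# Illustrative data only; real figures would come from the testing module.
stats = {
    "lesson 1": {"last_reviewed": date(2008, 5, 1), "average_score": 0.90},
    "lesson 2": {"last_reviewed": date(2008, 6, 20), "average_score": 0.55},
    "lesson 3": {"last_reviewed": date(2008, 6, 2), "average_score": 0.70},
}

suggestions = {stalest_lesson(stats), weakest_lesson(stats)}
```

With this data the program would suggest lesson 1 (not reviewed for the longest time) and lesson 2 (lowest score).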


The program would be written in Python, with a GTK+ GUI. I'd like the design to be very “gnomish”---clean and easy to use.

There are at least two possibilities for storing the vocabulary and other data. The first would be an SQLite database; the second, an XML file. The first option would be good if the vocabulary gets rather large. The second one is good because it is a human-readable text format and can be easily manipulated (a nice option would be the possibility to export the whole vocabulary as a densely hyperlinked HTML page).

(It is also possible to combine these two approaches... The program should collect quite a lot of information about how much time the user spends on each lesson, the scores and so on. This could be stored in the database, while keeping the vocabulary as XML.)
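As a rough illustration of the combined approach, the sketch below keeps a lesson as XML and the usage statistics in SQLite, using only the Python standard library. The element names, attribute names and table layout are assumptions, not a fixed format, and an in-memory database is used to keep the example self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Vocabulary as XML (element and attribute names are invented for this sketch).
lesson = ET.Element("lesson", name="lesson 1")
entry = ET.SubElement(lesson, "entry", word="huis", grammar="het")
ET.SubElement(entry, "translation").text = "house"
ET.SubElement(entry, "tag").text = "buildings"
xml_text = ET.tostring(lesson, encoding="unicode")

# Scores and time spent in SQLite.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE attempts (
        lesson        TEXT    NOT NULL,
        taken_on      TEXT    NOT NULL,  -- ISO date
        score         REAL    NOT NULL,  -- fraction of correct answers
        seconds_spent INTEGER NOT NULL
    )
""")
db.executemany(
    "INSERT INTO attempts VALUES (?, ?, ?, ?)",
    [
        ("lesson 1", "2008-06-01", 0.5, 300),
        ("lesson 1", "2008-06-08", 1.0, 240),
    ],
)
average = db.execute(
    "SELECT AVG(score) FROM attempts WHERE lesson = ?", ("lesson 1",)
).fetchone()[0]
```

The XML part stays readable and easy to export, while queries such as the average score per lesson are left to the database.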

The program should enable the user to create different profiles. For example, one person may be learning Russian and Norwegian at the same time, and these two vocabularies should be kept separate (at least we are not aiming for some kind of Russenorsk (see

There are also some possibilities for integrating the program with other Gnome components. For example, I think it would be nice if the fields in the “foreign” (learnt) language were spellchecked by gnome-spell. It could also somehow make use of gnome-dict, but I wouldn't overemphasize it (the main purpose of using this program would be to create a vocabulary, not a custom dictionary).

I have already written a program that implements some of the described functionality (namely testing), and it can be found at Part of it was written before a German exam, and the rest before a Hungarian exam my girlfriend was taking. It was fun and it was the first GUI program I wrote in Python.

ComputerAidedVocabularyLearning (last edited 2008-08-06 16:17:56 by localhost)