Annotation graphs and the IPython notebook

I recently became obsessed with annotation graphs and linguistic graphs in general. Since September I have been working with the graf-python library, which implements the ISO 24612 standard, Linguistic annotation framework (LAF). I planned to publish the corpora of our research center in an XML format that is easy to use, has some kind of official recognition, and supports stand-off annotation, so that I can share the data and sets of annotations independently. GrAF/XML looked like an ideal solution, so I gave it a try.
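Stand-off annotation means that annotations do not wrap the text inline but point into it, so the text and each annotation layer can be distributed separately. The following minimal sketch in plain Python illustrates the idea. It does not use graf-python, and the names `Region`, `Annotation`, and `covered_text` as well as the sample text are hypothetical. It is also simplified: GrAF places graph nodes and edges between regions and annotations, which this sketch leaves out.

```python
from dataclasses import dataclass, field

# Primary data: the raw text is stored once and never modified.
# "xyz" is a placeholder head word, not a real dictionary entry.
PRIMARY_TEXT = "xyz comer"


@dataclass(frozen=True)
class Region:
    """A span of the primary text, addressed by character offsets."""
    start: int
    end: int


@dataclass
class Annotation:
    """A label on a region; it points into the text instead of copying it."""
    label: str
    region: Region
    features: dict = field(default_factory=dict)


# Two independent annotation layers over the same text. Each layer could be
# stored in its own file and shared without touching the text or the other layer.
HEAD_LAYER = [Annotation("head", Region(0, 3))]
GLOSS_LAYER = [Annotation("translation", Region(4, 9), {"lang": "es"})]


def covered_text(text, annotation):
    """Resolve an annotation back to the substring it refers to."""
    return text[annotation.region.start:annotation.region.end]


for layer in (HEAD_LAYER, GLOSS_LAYER):
    for ann in layer:
        print(ann.label, repr(covered_text(PRIMARY_TEXT, ann)), ann.features)
```

Because the gloss layer only stores offsets, a colleague could add a third layer (say, part-of-speech tags) without ever editing the primary text.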

First, I published a subset of the data of the QuantHistLing project as GrAF/XML files. I wrote a tutorial on how to access the data in Python, which is available as part of the graf-python documentation. I also wanted to make the project's dictionary data more accessible to researchers. One tool we commonly use in language comparison is the Swadesh list, a fixed list of words for 200 concepts that are assumed to be basic and to exist in most languages. As most of our dictionaries contain Spanish translations, my idea was to connect the dictionaries via those translations, but only for the words on the Spanish Swadesh list. I developed an IPython notebook that connects the dictionaries of the Witotoan language family via the stem of the Spanish word comer (“to eat”). Essentially, I transformed the annotation graphs from GrAF into a NetworkX graph that I could easily visualize using the D3 JavaScript library:

Published at http://bl.ocks.org/4250342
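The core idea of linking dictionaries through their Spanish glosses can be sketched in a few lines of plain Python. This is a hypothetical reconstruction, not the code from the notebook: the toy dictionaries, the `crude_stem` suffix-stripping rule, and the output format are all assumptions. Instead of NetworkX, it builds the `nodes`/`links` dictionary that D3 force-layout examples typically load, which keeps the sketch dependency-free; with NetworkX, `networkx.readwrite.json_graph.node_link_data` produces a similar node-link structure from a graph object.

```python
import json
import re

# Hypothetical toy dictionaries: head word -> Spanish gloss.
# Language names and head words are placeholders, not project data.
DICTIONARIES = {
    "lang_a": {"aa1": "comer", "aa2": "dormir", "aa3": "comida"},
    "lang_b": {"bb1": "comer algo", "bb2": "beber"},
}

# A small slice of a Spanish Swadesh list (the full list has 200 concepts).
SWADESH_ES = ["comer", "beber", "dormir"]


def crude_stem(verb):
    """Rough Spanish stemmer: strip an infinitive ending (-ar, -er, -ir)."""
    return re.sub(r"(ar|er|ir)$", "", verb)


def build_concept_graph(dictionaries, concepts):
    """Link dictionary entries to Swadesh concepts via shared Spanish stems.

    Returns a dict with "nodes" and "links" lists, the shape used by many
    D3 force-layout examples. Links refer to nodes by list index.
    """
    nodes = [{"id": c, "group": "concept"} for c in concepts]
    index = {c: i for i, c in enumerate(concepts)}
    stems = {c: crude_stem(c) for c in concepts}
    links = []
    for lang, entries in dictionaries.items():
        for head, gloss in entries.items():
            tokens = gloss.split()
            matched = [c for c in concepts
                       if any(t.startswith(stems[c]) for t in tokens)]
            if not matched:
                continue  # entry is not related to any Swadesh concept
            nodes.append({"id": f"{lang}:{head}", "group": lang})
            for c in matched:
                links.append({"source": len(nodes) - 1, "target": index[c]})
    return {"nodes": nodes, "links": links}


graph = build_concept_graph(DICTIONARIES, SWADESH_ES)
print(json.dumps(graph, indent=2))
```

Stripping suffixes is deliberately naive: it connects *comida* to *comer*, which is arguably what you want, but it would also connect *comprar*. Anything beyond a demo would need a real stemmer or manual curation.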

You can find an in-depth description of the transformation and the complete notebook in another tutorial for graf-python. In my opinion it already looks quite nice. I am now trying to make the visualization more useful to linguists by adding interactivity and perhaps search options to query for words or sets of words.

As an interim summary, I can already say that IPython helped a lot in working interactively with linguistic data in this case. I could easily try out different ways to combine and visualize graphs, and then use a JSON representation of the Python data structures to publish the results in an HTML/JavaScript application. Right now I save the JSON data to files, and I am still looking for a way to stream the data directly from IPython to JavaScript. There are different ways for Python and JavaScript to interact in an IPython notebook; as far as I understand, the “official” solution is to provide an HTML representation of the Python objects. My goal is to decouple the two completely, if possible.
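Both routes, writing JSON files for a separate page and providing an HTML representation, can be sketched briefly. This is illustrative only, not the notebook's code: `save_graph_json`, `GraphView`, and the `graph-data` element id are hypothetical names. The `_repr_html_` method is IPython's rich display hook: when an object with this method is the result of a cell (or is passed to `display()`), the notebook renders the returned HTML.

```python
import html
import json
import os
import tempfile

# Hypothetical minimal graph in the nodes/links shape used by D3 examples.
GRAPH = {
    "nodes": [{"id": "comer"}, {"id": "lang_a:aa1"}],
    "links": [{"source": 1, "target": 0}],
}


def save_graph_json(graph, path):
    """Write the graph to a JSON file that a static D3 page can fetch."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(graph, f, ensure_ascii=False, indent=2)


class GraphView:
    """Wrapper that IPython renders via the rich display protocol.

    The HTML only embeds the JSON in an escaped data attribute, so a
    script on the page can read it; the element id is a hypothetical choice.
    """

    def __init__(self, graph):
        self.graph = graph

    def _repr_html_(self):
        payload = html.escape(json.dumps(self.graph))
        return f'<div id="graph-data" data-graph="{payload}"></div>'


# Route 1: decoupled, the page loads the file on its own.
path = os.path.join(tempfile.gettempdir(), "graph.json")
save_graph_json(GRAPH, path)

# Route 2: in a notebook, evaluating GraphView(GRAPH) in a cell renders this HTML.
print(GraphView(GRAPH)._repr_html_())
```

Putting the JSON into an HTML-escaped `data-` attribute rather than inline in a `<script>` tag avoids breaking the page if a gloss happens to contain `</script>`; page code would read it back with `JSON.parse(element.dataset.graph)`. The file route keeps Python and JavaScript fully decoupled, at the cost of an extra save step.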

About me

I have been working for more than 20 years as a developer, product manager and AI lead in language technologies. Starting with speech recognition and machine translation, I now focus on education in semantic technologies and LLMs.

pbouda@outlook.com
+351 917403181
Lisbon, Portugal