Computational Linguistics for Metafused
My latest project has been for Metafused Limited, whom I contacted via LinkedIn. They are building a system to extract semantic data from social media, so the research they needed was a good fit for my experience. Unfortunately, their budget only stretched to a two-month engagement, but we've had a good experience working together and hope to do so again at a later stage.
The tasks I've undertaken on this project have included selecting the best database system for it and integrating a number of existing datasets into that database. These datasets will form the basis of the toolset that Metafused will use to extract semantics from free text. One tool I've created to help them do this is a Word Sense Disambiguation component that chooses the most likely WordNet sense for each word in a given sentence (taking multi-word expressions into account). In three weeks I managed to create an algorithm that reached 73% precision and 72% recall. Given more time I'd be able to improve on that, but it's a good result under the time constraint, and should I return to Metafused in the future it will be something I can build on.
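To give a flavour of what a component like this does and how precision and recall are scored, here is a minimal sketch. It is not the Metafused algorithm: it's a toy most-frequent-sense baseline, with a made-up sense inventory (`SENSES`) standing in for WordNet and a greedy longest-match rule for multi-word expressions. All names in it are hypothetical. Precision and recall differ whenever the tagger declines to pick a sense for some words, since precision only counts the words it attempted.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy inventory standing in for WordNet:
# lemma -> sense ids, ordered most frequent first.
SENSES: Dict[str, List[str]] = {
    "bank": ["bank.n.01", "bank.n.02"],
    "post": ["post.v.01", "post.n.01"],
    "social_media": ["social_media.n.01"],
}
MAX_MWE_LEN = 3  # assumed longest multi-word expression, in tokens


def tag(tokens: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Greedy longest-match tagging; the sense is None for unknown lemmas."""
    tagged: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate expression first, then shorter ones.
        for n in range(min(MAX_MWE_LEN, len(tokens) - i), 0, -1):
            lemma = "_".join(t.lower() for t in tokens[i:i + n])
            if lemma in SENSES:
                tagged.append((lemma, SENSES[lemma][0]))  # most frequent sense
                i += n
                break
        else:
            # No inventory entry starts here: abstain on this token.
            tagged.append((tokens[i].lower(), None))
            i += 1
    return tagged


def precision_recall(predicted: List[Optional[str]],
                     gold: List[str]) -> Tuple[float, float]:
    """Precision = correct / attempted; recall = correct / all gold items."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned")
    attempted = sum(1 for p in predicted if p is not None)
    correct = sum(1 for p, g in zip(predicted, gold) if p is not None and p == g)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

Calling `tag("I post on social media".split())` treats "social media" as a single unit rather than two separate words, and leaves "I" and "on" untagged. A real system would replace the most-frequent-sense choice with something context-sensitive, which is where the work goes.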
The project comes to an end on 18th March, so I'm now seeking a new contract for when it finishes. I had been contacted by a company working on recommendation technology, who had seen an article on my personal blog about the bootstrap problem and were interested in working with me on the basis of that. It's still a possibility for the future, but right now they need an architect more than an algorithm developer.