Our main research contribution is a novel mathematical model that incorporates the semantics of similarity links in the entity graph into the ranking mechanism in a scalable way. Part of this work is presented in the 4-page poster paper "Improving Entity Search over Linked Data by Modeling Latent Semantics" by Zhiltsov, N. and Agichtein, E., accepted for CIKM 2013. The rest will be published (hopefully!) as a full paper early next year.
From the engineering perspective, this project gave birth to a bunch of open-source spin-offs that might be interesting for researchers and developers from related communities (machine learning, information retrieval, Semantic Web):
- Ext-RESCAL is a memory-efficient and scalable implementation of RESCAL, a sparse tensor factorization algorithm
- Anduin processes RDF/N-Quads data on Apache Hadoop
- Lucene-MLM adds support for mixtures of language models to Apache Lucene 4.0
- A large data set of labeled entity search queries and entities from the Yahoo! SemSearch Challenge.
P.S. I would like to thank all the people who made my first visit to the United States not only possible but wonderful (I am deliberately leaving out most of their names to protect their privacy): the Fulbright representatives in Moscow and Atlanta/Georgia, Professor Eugene Agichtein and his colleagues from the Intelligent Information Access lab at Emory University, the people from the Math & CS department and the ISSS office at Emory University, my new friends in Atlanta, my kickball teammates, and the Fulbrighters in Georgia I met during our terrific Fall Trip. You're great!
Have you thought about applying for an internship at LinkedIn? Dasha told me they are actively looking for interns to work on ranking.
We'll see... they'll be recruiting at CIKM, won't they? :)