
Tuesday, October 15, 2013

Suggestions on How to Amplify Research Productivity


Our lab at Kazan Federal University is the leading Russian research group in the area of the Semantic Web, and one of the leading Russian academic research groups in natural language processing and information retrieval.

Recently, we had a fruitful in-house discussion about ways to boost our research productivity. Here I would like to share some of the insights (after generalizing them properly) that may be useful for emerging, not yet well-established computer science labs elsewhere (not only in Russia). The principles are as follows (in no particular order):

1) Aligning the research schedule with leading conferences' deadlines

Computer science is a heavily conference-oriented field (unlike mathematics, biology or medicine, AFAIK). As our experience has shown, the benefit of participating in middle-tier conferences (especially those hosted within the country) is questionable. One may spend a comparable effort finishing papers, preparing slides and attending such events, and ... get almost nothing in terms of helpful feedback or a notable success under the belt. Submitting only to leading conferences is much more beneficial: you get invaluable feedback about your work and prove your professional status by surviving strong peer review. But, one may argue, moving from the second world to the first world is challenging (and that's true!), and there's no way a developing lab can leap across the line. Interestingly, leading conferences provide a lot of opportunities to make this happen. For example, there are co-located workshops and multiple tracks covering different aspects (research track, demo track, Ph.D. student track), each with its own selection process and paper formats (full/short/poster) and, therefore, its own acceptance rate.
Next, a fair question is how to come up with a list of prominent conferences in your particular field. In general, there is a consensus about the pantheon of such conferences in every field. For example, in the information retrieval community, everyone knows that the ACM SIGIR conference (e.g. the latest one) is the most prestigious, followed by CIKM, WWW, WSDM etc. I agree that sometimes it's harder to reach agreement. In that case, I refer you to the Google Scholar Metrics tool (e.g. the results for databases and information systems), which ranks venues by their h-index over the last 5 years and looks instructive for the purpose. Alternatively, the CORE ranks are based on qualitative judgments.
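For readers unfamiliar with the h-index behind such rankings, here is a minimal Python sketch of the standard definition: the largest h such that h papers have at least h citations each. The function name and the citation counts are made up for illustration. This is not how Google Scholar actually computes its metrics internally, just the textbook formula applied to a toy list.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break      # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts for the papers a venue published in the last 5 years.
venue_citations = [25, 8, 5, 3, 3, 1, 0]
print(h_index(venue_citations))  # prints 3
```

Restricting the input to papers published within a five-year window gives the kind of venue-level number such rankings rely on; a single highly cited paper (25 citations above) barely moves it, which is why it is a reasonably robust signal.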

2) Organizing access to powerful machines for real-world experiments

An essential feature of any competitive research in computer science (and, perhaps, in other areas of science too) is a thorough evaluation of the proposed methods, techniques and algorithms through careful, well-conducted experiments. For top-tier conferences, this is a necessary condition. Very often such experiments (especially in IR and various applications of machine learning) demand adequate infrastructure. Otherwise, the data and, therefore, the findings would be biased and controversial (to make a long story short). These machines need not be expensive standalone servers or large distributed clusters, which imply enormous technical support costs for universities. Fortunately, a university can nowadays buy access to commercial cloud services (e.g. AWS) for conducting experiments.

3) Boosting collaboration between departments and labs at university

First of all, this point includes interdisciplinary research (e.g. bioinformatics). It goes without saying that getting problems from domain experts is easier than framing them on your own. In some cases, this is also a clear path to additional funding.
Next, and this is particularly true for Russia, people from some divisions within the CS department may be interested and proficient in technological aspects only, that is, in building software solutions (web and mobile apps). Collaborating with them may help researchers create killer apps as spin-offs of their research efforts. Such work could result in a demo track paper or (why not?) a profitable business.

4) Boosting collaboration between researchers in the lab

Researchers usually collaborate with each other by working on the same long-standing projects. That's a natural and convenient way to work together in science. However, I would like to suggest some hints on how to add more agility to day-to-day cooperation and intensify knowledge sharing between colleagues working in the same lab, but on different projects.
The first is: do not hesitate to ask colleagues for their expertise in the relevant areas. For example, if someone is doing research in information retrieval and needs advice on basic or even advanced natural language processing techniques for their work, it is better to start by asking colleagues whose work is devoted to this field, instead of wasting time on reinventing the wheel. Depending on the contribution, this may result in co-authorship of a paper or an acknowledgment in it. Looks simple, doesn't it? But how many acts of mini-collaboration have you had during your work? I can admit that I have had quite a few.
Next, participation in occasional contests. They may be internal or open hackathons or hack sprints aiming to create a product prototype from scratch or (more relevant to the research community) to solve an R&D task on Kaggle or Yandex-Matematika (e.g. the latest one). These forms of collaboration are clearly borrowed from the experience of the IT and software developer community. However, they look very favorable for the research community too. Such events provide opportunities to share scientific experience as well as knowledge of technologies (e.g. scientific languages, libraries, frameworks) not only between researchers in the lab, but also with outsiders, including students.
Finally, regular research seminars about current progress are also important. Inviting speakers from other universities and from industry (financially supported if necessary) is a good practice I saw at Emory University (USA). A similar Russian initiative is perhaps the Computer Science Club in St. Petersburg, Yekaterinburg and, more recently, Kazan.

5) Establishing connections with other research groups

Thanks to recent initiatives of the federal government and the local (Republic of Tatarstan) government, some opportunities have emerged. First, there is a program for inviting young researchers for short-term onsite collaboration (a program for inviting postdocs is yet to come in 2014). (Stay tuned for these initiatives. If you are eligible and interested in collaborating with us, feel free to email me or ITIS.) Second, there is a $1,000,000 (yep, with six zeros) program from the President of the Republic of Tatarstan for a computer science "star". The money is granted to create a science school with students at Kazan Federal University over several years. Hopefully, this support will help us extend our research network.

6) Improving English (critical for the Russian audience)

Modern science speaks English. While mastering English is a personal task for each researcher (by no means an easy one in a non-English-speaking environment), there are a couple of hints besides the usual learning. I believe that switching to English at regular seminars is reasonable. Another affordable remedy against papers written in poor English is outsourcing their proofreading to native speakers or experienced translators. It is not costly, but will definitely improve the quality of the delivery.

7) Applying for grants from industry

Collaboration with industry is crucial for a competitive research lab in academia. Decent funding is not the only benefit. It's much more about real-world tasks and knowledge sharing, including experience in organizing an effective workflow. Examples are the academic research grants from Google and Yahoo. This point also includes the practice of summer internships in industry for Ph.D. students.

8) Financial support

Unfortunately, this is still quite a problem for most Russian institutions. A bitter blow is that Ph.D. students and postdocs, the main workhorses of research in the US and Europe, are only marginally supported in Russian academia. As a result, students have to seek financial support outside academia, and that distracts them from research activities. Obviously, without guaranteed baseline support for these groups of people, including wages (grants?) and coverage of travel expenses for attending international conferences, there is no way to achieve parity of productivity with competitive labs in the world. Fortunately, at Kazan Federal University we see positive changes in solving these problems.

9) Improving research skills

Last but not least, I highly recommend Eamonn Keogh's tutorial "How to do good research, get it published in SIGKDD and get it cited" to everyone, especially beginner researchers. This indispensable resource covers a number of very important aspects of high-quality research work.

PS. The thoughts I described here are subjective and even provocative. The main goal of this post is to start a discussion on how to make the research community (particularly, in Russia) more productive.

Tuesday, July 30, 2013

My Visiting Project at Emory University: Entity Search over Linked Data

In this post, I would like to summarize some research output of my recent work at Emory University as a visiting graduate student, which was funded via the Fulbright Program. The work, done in collaboration with professor Eugene Agichtein, is devoted to entity search over linked data, a very hot topic nowadays. To put that into context, Google Knowledge Graph, Facebook Graph Search and WolframAlpha are likely the best examples of products that provide an entity-centric user experience. That is, this work is about how to resolve search queries soliciting information about certain people, organizations, locations, events, scientific concepts and other entities, assuming that such information has been somehow extracted, structured and shared ("linked data"). For example, you may think of entity descriptions available at Wikipedia, Freebase, GeoNames, Last.FM, CiteSeer and other compelling web resources.

Our main research contribution is a novel mathematical model that incorporates the semantics of similarity links in the entity graph into the ranking mechanism in a scalable way. A part of this work is presented in the 4-page poster paper Zhiltsov, N., Agichtein, E. "Improving Entity Search over Linked Data by Modeling Latent Semantics", accepted for CIKM 2013. The remaining part will be published (hopefully!) as a full paper early next year.

From the engineering perspective, this project gave birth to a bunch of open source software spin-offs that might be interesting for researchers/developers from related communities (i.e., machine learning, information retrieval, Semantic Web):

PS. I would like to thank all the people who have made this, my first visit to the United States, not only possible but wonderful (deliberately hiding most of their names to protect their privacy): the Fulbright representatives in Moscow and Atlanta, Georgia, professor Eugene Agichtein and his colleagues from the Intelligent Information Access lab at Emory University, people from the Math & CS department and the ISSS office at Emory University, my new friends in Atlanta, kickball teammates, and the Fulbrighters in Georgia I met during our terrific Fall Trip. You're great!