Blog
Multi-label Annotation in Articles
With the constant growth of the scientific literature, automated processes to enable access to its contents are increasingly in demand. Several functional discourse annotation schemes have been proposed to facilitate information extraction and summarisation from scientific articles, the best known being argumentative zoning. Core Scientific Concepts (CoreSC) is a three-layered, fine-grained annotation scheme…
Automatic classification of paper types
Partridge is a system that enables intelligent search for academic papers by allowing users to query terms within sentences designating a particular core scientific concept (e.g. Hypothesis, Result). The system also automatically classifies papers according to article type (e.g. Review, Case Study). Here, we focus on the latter aspect of the system. For each…
CoreSC automation and Extractive summaries
In the SAPIENT Automation project we have also evaluated the CoreSC scheme and the ART/CoreSC corpus by incorporating machine learning algorithms into SAPIENT and automating the generation of core scientific concepts. The resulting system, SAPIENTA, has been trained and tested on the ART corpus and has also been employed to annotate biology papers from PubMed…
Comparing CoreSC and AZ-II
A key issue in the project has been to evaluate the CoreSC annotation scheme against other schemes in order to assess their relative effectiveness. We compared CoreSC with the AZ-II annotation scheme on 36 chemistry papers. The two schemes are complementary in that they take different views on what a scientific paper represents. AZ assumes…
SAPIENTA presentations at ACL 2010, BioNLP 2010 and NeSP 2010
Maria Liakata attended ACL 2010, the most significant conference in the field of Natural Language Processing, which was held in Sweden this year. There she had the opportunity to discuss her work on the SAPIENT Automation project with distinguished colleagues in the field. Joint work with our Cambridge colleague Dr Anna Korhonen was presented at…
LREC 2010, in Malta in May 2010
From May 17 to May 21, Maria Liakata attended the International Conference on Language Resources and Evaluation (LREC) in Malta and a workshop on bio-annotations. She presented a paper with Simone Teufel from the University of Cambridge on the correlation between the CoreSC scheme and the AZ-II scheme. It was a great chance to discuss with…