Parentheses Classifier

 

The Parentheses Classifier takes the contents of a pair of parentheses and assigns them to one of several categories. It was motivated by two observations: the full text of scientific journal articles contains a large amount of parenthesized text, some of which is useful for text mining, and classifying that text is harder than it first appears. Classification is performed by a series of context-free grammars and regular expressions, and exactly one category is returned for each input. The package includes two classes: one that extracts parenthesized text and handles embedded and unbalanced parentheses, and one that implements the classifier itself.
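The two steps described above, extraction and then classification, can be illustrated with a minimal Python sketch. It is not the package's code or API. The function names, the category labels, and every regular expression below are hypothetical and chosen only for illustration. The sketch reads "embedded" as nested parentheses, skips unbalanced ones, and uses an ordered list of regular expressions only, whereas the real classifier also uses context-free grammars.

```python
import re


def extract_parenthesized(text):
    """Return the contents of every balanced pair of parentheses in text,
    ordered by the position of the opening parenthesis.

    Nested pairs are each returned separately. A ')' with no open '('
    is skipped, and a '(' that is never closed yields nothing.
    """
    open_positions = []  # stack of indices of '(' still waiting for a ')'
    spans = []           # (start index, contents) for each completed pair
    for i, ch in enumerate(text):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")" and open_positions:
            start = open_positions.pop()
            spans.append((start, text[start + 1:i]))
    spans.sort()  # start indices are unique, so this orders by opening position
    return [contents for _, contents in spans]


# Hypothetical categories and patterns, tried in order; the first match wins.
RULES = [
    ("citation",
     re.compile(r"[A-Z][A-Za-z-]+( et al\.?| and [A-Z][A-Za-z-]+)?,? \d{4}")),
    ("figure_or_table_reference",
     re.compile(r"(see )?(fig(ure)?|table)\.? ?\d+[A-Za-z]?", re.IGNORECASE)),
    ("p_value",
     re.compile(r"p ?[<=>] ?\d*\.?\d+", re.IGNORECASE)),
    ("abbreviation",
     re.compile(r"[A-Z][A-Za-z0-9-]{1,9}")),
]


def classify(contents):
    """Return exactly one category label for the given parenthesized text."""
    stripped = contents.strip()
    for category, pattern in RULES:
        if pattern.fullmatch(stripped):
            return category
    return "other"  # fallback so that every input receives a category


if __name__ == "__main__":
    sentence = "The breast cancer gene (BRCA1) was mutated (p < 0.05) (Fig. 2A)."
    for contents in extract_parenthesized(sentence):
        print(contents, "->", classify(contents))
```

Keeping the rules in an ordered list with a final fallback is one simple way to guarantee that a single category is returned for each input, as the description requires.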

Please cite this work if you use this tool:
K. Bretonnel Cohen, Thomas Christiansen, and Lawrence E. Hunter (2011) Parenthetically speaking: Classifying the contents of parentheses for text mining. AMIA 2011.

This tool is one of the projects of the BioNLP initiative at the Center for Computational Pharmacology, University of Colorado Denver Health Sciences Center. The initiative creates and distributes code, software, and data for applying natural language processing techniques to biomedical texts.