Style-independent document labeling: design and performance evaluation

Computer Science – Performance

Scientific paper

The Medical Article Records System (MARS) was developed at the U.S. National Library of Medicine (NLM) to automate data entry of bibliographic information from medical journals into MEDLINE, the premier bibliographic citation database at NLM. Currently, a rule-based algorithm called ZoneCzar labels the important bibliographic fields (title, author, affiliation, and abstract) on page images of medical journal articles. Rules exist for medical journals with regular layout types, but new rules must be created manually for any input journal with an arbitrary or new layout type. It is therefore of interest to label journal articles independently of their layout styles. In this paper, we first describe a system called ZoneMatch that automatically generates crucial geometric and non-geometric features of the important bibliographic fields using string-matching and clustering techniques. The rule-based algorithm is then modified to use these features to perform style-independent labeling. We then describe a performance evaluation method for quantitatively evaluating our algorithm and characterizing its error distributions. Experimental results show that the labeling performance of the rule-based algorithm improves significantly when the generated features are used.
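The string-matching idea can be illustrated with a minimal Python sketch. It is not the ZoneMatch implementation, whose details the abstract does not give. It assumes that each page zone carries OCR text and a bounding box, and that a known reference string exists for each bibliographic field. The function names, the 0.8 threshold, and the sample zones are all hypothetical. A plain per-label average stands in for the clustering step as an example of a derived geometric feature.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and extra whitespace."""
    def norm(s):
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def label_zones(zones, reference, threshold=0.8):
    """Give each zone the field label whose reference text it matches best.

    zones: list of dicts with "text" and "bbox" (left, top, right, bottom).
    reference: dict mapping a field label to the known text of that field.
    Zones whose best score is below `threshold` (an assumed value) are
    labeled "other".
    """
    labeled = []
    for zone in zones:
        best_label, best_score = "other", 0.0
        for label, ref_text in reference.items():
            score = similarity(zone["text"], ref_text)
            if score > best_score:
                best_label, best_score = label, score
        if best_score < threshold:
            best_label = "other"
        labeled.append({**zone, "label": best_label, "score": best_score})
    return labeled


def mean_top_by_label(labeled):
    """Average the top coordinate of the zones under each label.

    This is a simple stand-in for clustering: one geometric feature per field.
    """
    sums = {}
    for zone in labeled:
        total, count = sums.get(zone["label"], (0.0, 0))
        sums[zone["label"]] = (total + zone["bbox"][1], count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


# Hypothetical sample data, for illustration only.
zones = [
    {"text": "Style-independent  document labeling", "bbox": (100, 50, 900, 90)},
    {"text": "Page 12", "bbox": (450, 1200, 550, 1220)},
]
reference = {"title": "Style-Independent Document Labeling"}

labeled = label_zones(zones, reference)
features = mean_top_by_label(labeled)
```

Here the first zone matches the reference title once case and whitespace are normalized, while the page-number zone falls below the threshold and is labeled "other". In a fuller pipeline, features gathered this way over many pages of a journal could inform the labeling rules. How ZoneMatch does this is not stated in the abstract.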
