A Support Tool for Tagset Mapping

Computer Science – Computation and Language

Scientific paper

Details

EACL-Sigdat 95, contains 4 ps figures (minor graphic changes)

Many different tagsets are used in existing corpora; they vary according to the objectives of specific projects (which may be as far apart as robust parsing and spelling correction). In many situations, however, one would like uniform access to the linguistic information encoded in corpus annotations without having to know the classification schemes in detail. This paper describes a tool that maps unstructured morphosyntactic tags to a constraint-based, typed, configurable specification language, a "standard tagset". The mapping relies on a manually written set of mapping rules, which is automatically checked for consistency. In certain cases, inexact mappings are unavoidable, and noise, i.e. groups of word forms not conforming to the specification, will appear in the output of the mapping. The system automatically detects such noise and informs the user about it. The tool has been tested with rules for the UPenn tagset \cite{up} and the SUSANNE tagset \cite{garside}, in the framework of the EAGLES (LRE project EAGLES, cf. \cite{eagles}) validation phase for standardised tagsets for European languages.
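
The abstract gives no details of the rule language or the checking procedure, so the following is only a minimal sketch of the general idea, not the tool's actual method. It assumes a flat attribute-value specification, a hand-written rule table for four UPenn tags, and a small reference lexicon used to spot word forms that do not fit a rule's target. The names SPEC, RULES, LEXICON, check_rules and find_noise are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Features = Dict[str, str]

# Assumed typed specification: each attribute has a closed set of values.
SPEC: Dict[str, Set[str]] = {
    "pos": {"noun", "verb", "adposition", "conjunction"},
    "number": {"singular", "plural"},
    "tense": {"past", "present"},
}

# Assumed hand-written mapping rules: source tag -> target features.
RULES: Dict[str, Features] = {
    "NN": {"pos": "noun", "number": "singular"},
    "NNS": {"pos": "noun", "number": "plural"},
    "VBD": {"pos": "verb", "tense": "past"},
    # Inexact rule: the UPenn tag IN also covers subordinating conjunctions.
    "IN": {"pos": "adposition"},
}

# Assumed reference lexicon with independently known features.
LEXICON: Dict[str, Features] = {
    "because": {"pos": "conjunction"},
    "in": {"pos": "adposition"},
}


def check_rules(rules: Dict[str, Features], spec: Dict[str, Set[str]]) -> List[str]:
    """Report rule targets that use undeclared attributes or values."""
    problems = []
    for tag, features in rules.items():
        for attr, value in features.items():
            if attr not in spec:
                problems.append(f"{tag}: unknown attribute {attr!r}")
            elif value not in spec[attr]:
                problems.append(f"{tag}: value {value!r} not allowed for {attr!r}")
    return problems


def find_noise(
    tagged: List[Tuple[str, str]],
    rules: Dict[str, Features],
    lexicon: Dict[str, Features],
) -> List[Tuple[str, str]]:
    """Return (word, tag) pairs whose mapped features contradict the lexicon."""
    noise = []
    for word, tag in tagged:
        mapped = rules.get(tag)
        known = lexicon.get(word.lower())
        if mapped is None or known is None:
            continue  # nothing to compare against
        if any(attr in known and known[attr] != value for attr, value in mapped.items()):
            noise.append((word, tag))
    return noise


if __name__ == "__main__":
    print(check_rules(RULES, SPEC))
    print(find_noise([("dog", "NN"), ("because", "IN"), ("in", "IN")], RULES, LEXICON))
```

In this sketch the IN rule is deliberately inexact: mapping every IN token to an adposition mislabels "because", and find_noise reports that token so the user can inspect it. A constraint-based specification language would presumably support richer checks than the membership test in check_rules.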
