Valence extraction using EM selection and co-occurrence matrices

Computer Science – Computation and Language

Scientific paper

Details

24 pages, 3 tables

DOI: 10.1007/s10579-009-9100-5

This paper discusses two new procedures for extracting verb valences from raw texts, with an application to the Polish language. The first technique, the EM selection algorithm, performs unsupervised disambiguation of valence frame forests, which are obtained by applying a non-probabilistic deep grammar parser and some post-processing to the text. The second idea concerns the filtering of incorrect frames detected in the parsed text and is motivated by the observation that verbs which take similar arguments tend to have similar frames. This phenomenon is described in terms of newly introduced co-occurrence matrices. Using co-occurrence matrices, we split filtering into two steps: first, the list of valid arguments is determined for each verb; then, the pattern according to which the arguments are combined into frames is computed. Our best extracted dictionary reaches an $F$-score of 45%, compared to an $F$-score of 39% for the standard frame-based BHT filtering.
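To make the reported figures concrete, here is a minimal sketch of how an extracted valence dictionary might be scored against a gold-standard one. It assumes the balanced F-score (the harmonic mean of precision and recall) computed over (verb, frame) pairs; the abstract does not state the exact evaluation protocol, so the pairing scheme, the `f_score` helper, the frame notation, and the toy data are all hypothetical.

```python
from typing import Set, Tuple

# A dictionary entry is a (verb, frame) pair. The string encoding of
# frames used here is invented for illustration only.
Entry = Tuple[str, str]


def f_score(extracted: Set[Entry], gold: Set[Entry]) -> float:
    """Balanced F-score of an extracted dictionary against a gold one."""
    true_positives = len(extracted & gold)
    if true_positives == 0:
        # Also covers empty inputs, avoiding division by zero.
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (not taken from the paper).
gold = {
    ("give", "np(acc) np(dat)"),
    ("give", "np(acc)"),
    ("sleep", ""),
}
extracted = {
    ("give", "np(acc) np(dat)"),
    ("sleep", ""),
    ("sleep", "np(acc)"),
}

score = f_score(extracted, gold)
```

In this toy case two of the three extracted entries are correct and two of the three gold entries are recovered, so precision and recall are both 2/3, and so is their harmonic mean.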
