Parallel Spell-Checking Algorithm Based on Yahoo! N-Grams Dataset

Computer Science – Computation and Language

Scientific paper

Details

LACSC - Lebanese Association for Computational Sciences, http://www.lacsc.org/; International Journal of Research and Reviews

Spell-checking is the process of detecting incorrectly spelled words in a text and, in some cases, suggesting corrections for them. In general, the larger a spell-checker's dictionary, the higher its error detection rate; with a smaller dictionary, more misspellings pass undetected. Unfortunately, traditional dictionaries suffer from out-of-vocabulary and data sparseness problems because they do not cover the large vocabulary needed for proper names, domain-specific terms, technical jargon, special acronyms, and terminology. As a result, spell-checkers built on them have low error detection and correction rates and fail to flag all errors in a text. This paper proposes a new parallel shared-memory spell-checking algorithm that uses rich real-world word statistics from the Yahoo! N-Grams Dataset to correct non-word and real-word errors in computer text. The proposed algorithm is divided into three sub-algorithms that run in parallel: the error detection algorithm, which detects misspellings; the candidate generation algorithm, which generates correction suggestions; and the error correction algorithm, which performs contextual error correction. Experiments conducted on a set of text articles containing misspellings showed a remarkable spelling error correction rate and a radical reduction of both non-word and real-word errors in electronic text. In a further study, the proposed algorithm is to be optimized for message-passing systems so that it becomes more flexible and less costly to scale over distributed machines.
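The three stages named in the abstract (error detection, candidate generation, contextual error correction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `UNIGRAMS` and `BIGRAMS` tables are tiny hypothetical stand-ins for the Yahoo! N-Grams Dataset, candidates are assumed to lie within one edit of the misspelling, scoring is assumed to be a simple sum of neighbouring bigram counts, and only non-word errors are handled. A thread pool over a shared word list stands in for the shared-memory parallelism; in CPython this shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy counts standing in for real n-gram statistics.
UNIGRAMS = {"the": 500, "cat": 40, "sat": 30, "on": 300,
            "mat": 20, "hat": 25, "car": 35}
BIGRAMS = {("the", "cat"): 12, ("cat", "sat"): 8, ("sat", "on"): 9,
           ("on", "the"): 50, ("the", "mat"): 6, ("the", "hat"): 4,
           ("the", "car"): 7}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def detect(words):
    """Error detection: indices of tokens missing from the unigram table."""
    return [i for i, w in enumerate(words) if w not in UNIGRAMS]


def candidates(word):
    """Candidate generation: known words within one edit of `word` (assumed)."""
    edits = set()
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        if right:
            edits.add(left + right[1:])                          # deletion
            edits.update(left + c + right[1:] for c in ALPHABET)  # substitution
        if len(right) > 1:
            edits.add(left + right[1] + right[0] + right[2:])    # transposition
        edits.update(left + c + right for c in ALPHABET)          # insertion
    return sorted(e for e in edits if e in UNIGRAMS)


def correct(words, i, cands):
    """Contextual correction: best candidate by neighbouring bigram counts."""
    prev = words[i - 1] if i > 0 else None
    nxt = words[i + 1] if i + 1 < len(words) else None

    def score(c):
        context = BIGRAMS.get((prev, c), 0) + BIGRAMS.get((c, nxt), 0)
        return (context, UNIGRAMS[c])  # unigram count breaks ties

    return max(cands, key=score) if cands else words[i]


def spell_check(text, workers=4):
    """Run the three stages; flagged words are corrected concurrently."""
    words = text.lower().split()
    flagged = detect(words)
    # Threads only read the shared `words` list; writes happen afterwards.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fixes = list(pool.map(
            lambda i: correct(words, i, candidates(words[i])), flagged))
    for i, fix in zip(flagged, fixes):
        words[i] = fix
    return " ".join(words)


if __name__ == "__main__":
    print(spell_check("the czt sat on the mat"))
```

In this sketch each flagged word is scored against its original neighbours, so the worker threads never write to shared state while running. Handling real-word errors would require also scoring in-vocabulary words against their context, which is left out here.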
