Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data

Computer Science – Databases

Date: 2010-04-14
arXiv: arxiv.org/abs/1004.2372v1
Category: Computer Science – Databases
Type: Scientific paper

Abstract: Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we will show. The regular expressions occurring in practical DTDs and XSDs, however, are such that every alphabet symbol occurs only a small number of times. As such, in practice it suffices to learn the subclass of deterministic regular expressions in which each alphabet symbol occurs at most k times, for some small k. We refer to such expressions as k-occurrence regular expressions (k-OREs for short). Motivated by this observation, we provide a probabilistic algorithm that learns k-OREs for increasing values of k, and selects the deterministic one that best describes the sample based on a Minimum Description Length argument. The effectiveness of the method is empirically validated both on real world and synthetic data. Furthermore, the method is shown to be conservative over the simpler classes of expressions considered in previous work.
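The abstract relies on two properties of a content model: determinism, which DTDs and XSDs require (the XML 1.0 "deterministic content model" rule and the XML Schema Unique Particle Attribution constraint), and the number of times each element name occurs, which is what bounds a k-ORE. The Python sketch below is not the paper's learning algorithm; it only checks these two properties for a given expression, using a hypothetical AST (`Sym`, `Cat`, `Alt`, `Star`, `Plus`, `Opt`) and hypothetical helper names. It relies on the standard characterization that an expression is deterministic exactly when its Glushkov (position) automaton is deterministic, i.e. no two positions that can be reached from the same point carry the same symbol. `is_k_ore` follows the abstract's wording and requires both determinism and at most k occurrences per symbol; the paper's formal definition may differ in detail.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


# Hypothetical AST for regular expressions over element names.
@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Cat:      # concatenation  r s
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Alt:      # disjunction    r + s
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Star:     # Kleene star    r*
    inner: "Regex"

@dataclass(frozen=True)
class Plus:     # one or more    r+ (DTD '+')
    inner: "Regex"

@dataclass(frozen=True)
class Opt:      # optional       r? (DTD '?')
    inner: "Regex"

Regex = Union[Sym, Cat, Alt, Star, Plus, Opt]


def glushkov(e: Regex) -> Tuple[List[str], Set[int], Dict[int, Set[int]]]:
    """Number every symbol occurrence (a position) and compute the first
    set and the follow sets of the Glushkov (position) automaton."""
    symbols: List[str] = []           # symbols[p] = element name at position p
    follow: Dict[int, Set[int]] = {}  # follow[p] = positions that may come after p

    def walk(node: Regex) -> Tuple[bool, Set[int], Set[int]]:
        # Returns (nullable, first, last) for the subexpression.
        if isinstance(node, Sym):
            p = len(symbols)
            symbols.append(node.name)
            follow[p] = set()
            return False, {p}, {p}
        if isinstance(node, (Cat, Alt)):
            n1, f1, l1 = walk(node.left)
            n2, f2, l2 = walk(node.right)
            if isinstance(node, Alt):
                return n1 or n2, f1 | f2, l1 | l2
            for p in l1:              # right's first positions may follow left's last
                follow[p] |= f2
            first = f1 | f2 if n1 else f1
            last = l1 | l2 if n2 else l2
            return n1 and n2, first, last
        n, f, l = walk(node.inner)    # Star, Plus or Opt
        if isinstance(node, (Star, Plus)):
            for p in l:               # loop back: after a last position, restart
                follow[p] |= f
        return isinstance(node, (Star, Opt)) or n, f, l

    _, first, _ = walk(e)
    return symbols, first, follow


def _has_clash(positions: Set[int], symbols: List[str]) -> bool:
    # True if two competing positions carry the same element name.
    names = [symbols[p] for p in positions]
    return len(names) != len(set(names))


def is_deterministic(e: Regex) -> bool:
    symbols, first, follow = glushkov(e)
    if _has_clash(first, symbols):
        return False
    return not any(_has_clash(s, symbols) for s in follow.values())


def max_occurrences(e: Regex) -> int:
    """Largest number of times any single symbol occurs in e."""
    symbols, _, _ = glushkov(e)
    return max(Counter(symbols).values(), default=0)


def is_k_ore(e: Regex, k: int) -> bool:
    # Follows the abstract's wording: deterministic, each symbol at most k times.
    return is_deterministic(e) and max_occurrences(e) <= k


if __name__ == "__main__":
    a, b = Sym("a"), Sym("b")
    nondet = Cat(Star(Alt(a, b)), a)                   # (a + b)* a
    det = Cat(Cat(Star(b), a), Star(Cat(Star(b), a)))  # b* a (b* a)*
    print(is_deterministic(nondet), max_occurrences(nondet))  # False 2
    print(is_deterministic(det), max_occurrences(det))        # True 2
```

Both example expressions describe the nonempty strings over {a, b} that end in a, and both use each symbol twice, but only `b* a (b* a)*` is deterministic: in `(a + b)* a`, an `a` read at the start could match either occurrence of `a`. This illustrates why a learner targeting DTDs or XSDs has to select a deterministic candidate rather than any expression that fits the sample.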

Authors

Bex Geert Jan

Gelade Wouter

Neven Frank

Vansummeren Stijn
