Distinguishing Fact from Fiction: Pattern Recognition in Texts Using Complex Networks

Computer Science – Computation and Language

Scientific paper

Details

9 pages, 7 figures -- this is a significant revision

We establish concrete mathematical criteria to distinguish between different kinds of written storytelling, fictional and non-fictional. Specifically, we constructed a semantic network from both novels and news stories, with $N$ independent words as vertices or nodes, and edges or links allotted to words occurring within $m$ places of a given vertex; we call $m$ the word distance. We then used measures from complex network theory to distinguish between news and fiction, studying the minimal text length needed as well as the optimized word distance $m$. The literature samples were found to be most effectively represented by their corresponding power laws in the degree distribution $P(k)$ and the clustering coefficient $C(k)$; we also studied the mean geodesic distance, and found that all our texts were small-world networks. We observed a natural break-point at $k=\sqrt{N}$ where the power law in the degree distribution changed, leading to separate power-law fits for the bulk and the tail of $P(k)$. Our linear discriminant analysis yielded an accuracy of $73.8 \pm 5.15\%$ for the correct classification of novels and $69.1 \pm 1.22\%$ for news stories. We found an optimal word distance of $m=4$ and a minimum text length of 100 to 200 words $N$.
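
The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the regular-expression tokenizer, the reading of "independent words" as distinct lowercase tokens, the absence of stop-word removal or stemming, and the choice to put $k=\sqrt{N}$ itself in the bulk are all assumptions made for illustration, and the function names are hypothetical. The sketch builds the word-distance network, then computes $P(k)$, $C(k)$, and the bulk/tail split of $P(k)$ at $\sqrt{N}$.

```python
import math
import re
from collections import Counter, defaultdict


def build_network(text, m=4):
    """Undirected graph: distinct words are vertices; two words are linked
    if they occur within m positions of each other (m = word distance)."""
    words = re.findall(r"[a-z']+", text.lower())  # assumed tokenizer
    adj = defaultdict(set)
    for i, w in enumerate(words):
        adj[w]  # register the vertex even if it ends up isolated
        for v in words[i + 1 : i + 1 + m]:
            if v != w:  # ignore self-loops
                adj[w].add(v)
                adj[v].add(w)
    return dict(adj)


def degree_distribution(adj):
    """P(k): fraction of the N vertices that have degree k."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


def local_clustering(adj, w):
    """Fraction of pairs of neighbours of w that are themselves linked."""
    nbrs = list(adj[w])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def clustering_by_degree(adj):
    """C(k): mean local clustering coefficient over vertices of degree k."""
    by_k = defaultdict(list)
    for w, nbrs in adj.items():
        by_k[len(nbrs)].append(local_clustering(adj, w))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_k.items())}


def split_at_sqrt_n(pk, n):
    """Split P(k) at k = sqrt(N) into bulk and tail for separate fits.
    Placing k == sqrt(N) in the bulk is an assumption of this sketch."""
    cut = math.sqrt(n)
    bulk = {k: p for k, p in pk.items() if k <= cut}
    tail = {k: p for k, p in pk.items() if k > cut}
    return bulk, tail


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the cat"
    net = build_network(sample, m=4)
    pk = degree_distribution(net)
    bulk, tail = split_at_sqrt_n(pk, len(net))
    print(len(net), pk, clustering_by_degree(net), bulk, tail)
```

Each half of the split distribution could then be fitted separately on a log-log scale, and the fitted exponents used as features for a classifier such as the linear discriminant analysis mentioned above; that step is left out here because the abstract does not specify the fitting procedure.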
