
Open Information Extraction and Open Relation Extraction Papers

 

Table of Contents

  1. General
  1. Literature Reviews
  1. Papers - Neural Networks
  1. Papers - Parse-based and statistical
  1. Papers - Older papers and legacy systems
  1. Training and Testing Data

General

This README contains OpenIE and ORE papers and resources. Summaries are by @jbecke and @TheodoreChristakis, written to the best of our abilities after reading each paper or testing the system (when available). We welcome pull requests with additional resources, papers, or data.

Literature Reviews

    • A Survey on Open Information Extraction. Most up-to-date literature review (June 2018), covering non-neural-network approaches to OpenIE. Whereas I've classified by age in this document, the authors classify by method of extraction (learning-based, rule-based, clause-based, inter-propositional).

Papers - Neural Networks

    • Neural Open Information Extraction: AFAIK, the first application of ANNs (seq2seq with attention) to OpenIE. The authors bootstrapped tuples from high-confidence OpenIE-4 extractions and make the data available. However, the data isn't very clean; a quick glance shows a lot of malformed/incorrect tuples.
    • Supervised Open Information Extraction: expands on the idea of turning QA datasets into OpenIE datasets. Trains an ANN using an interesting feature representation, uses a seq2seq model to generate BIO tags, and then creates tuples from those tags using a deterministic algorithm.
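
As a rough illustration of the last step mentioned for Supervised Open Information Extraction (turning BIO tags into a tuple deterministically), here is a minimal sketch. The label names (`A0`, `P`, `A1`) and the function `bio_to_tuple` are assumptions made for this example; they are not taken from the paper or its code.

```python
from typing import Dict, List, Tuple


def bio_to_tuple(tokens: List[str], tags: List[str]) -> Tuple[str, str, str]:
    """Collect tokens per label from BIO tags and return (arg1, rel, arg2).

    Assumes hypothetical labels: A0 = first argument, P = predicate,
    A1 = second argument. "O" tokens are ignored.
    """
    spans: Dict[str, List[str]] = {}
    current = None  # label of the span currently being extended
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            spans.setdefault(current, []).append(token)
        elif tag.startswith("I-") and tag[2:] == current:
            spans[current].append(token)
        else:
            # "O" or an I- tag that does not continue the open span
            current = None
    return (
        " ".join(spans.get("A0", [])),
        " ".join(spans.get("P", [])),
        " ".join(spans.get("A1", [])),
    )


tokens = "Luckett and Roberson filed a lawsuit over Survivor".split()
tags = ["B-A0", "I-A0", "I-A0", "B-P", "I-P", "I-P", "I-P", "B-A1"]
print(bio_to_tuple(tokens, tags))
```

Because the decoding is deterministic, any tagging error made by the model carries straight through to the tuple, so a real system would likely add checks for missing or repeated labels.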

Papers - Parse-based and statistical

  • Graphene: generates n-ary extractions with semantic linking labels such as "TEMPORAL", "CAUSE", etc., as well as open relations.
  • Stanford Open IE: produces maximally-shortened tuples. It often reports a confidence of 1.0 for its tuples. Available under the GPL or a proprietary license as part of Stanford CoreNLP.
  • OpenIE-X (v4, v5, Allen Institute version). Works well with simple statements (see examples in this dataset). Outputs context for extractions and gives good confidence predictions that can be used to balance precision and recall. Note the restrictive license (research purposes only).
  • Open Relation Extraction and Grounding: Extracts argument pairs of relation tuples and forms weighted dependency trees between two arguments. It shows promising results in determining relative importance of each argument in the tree.
  • Unsupervised Open Relation Extraction: Performs unsupervised relation extraction from free text using pretrained word embeddings, with the sentence's dependency parse tree as a foundation.

Papers - Older papers and legacy systems

  • From University of Washington
    • TextRunner - One of the earliest papers addressing open information extraction
    • ReVerb - Improved the extraction to better form the tuple of (argument, relation, argument)
    • OLLIE - Addressed the issue of misleading propositions and non-verb mediated relations
  • CSD-IE - Generation of nested contractions which is especially effective in sentences using subordinating clauses
  • PropS: Syntax Based Proposition Extraction
  • ClausIE - Formed a strong relation between grammatical clauses, propositions, and OIE extractions by defining seven grammatical patterns
  • ReNoun - Used predominantly for noun-mediated relations.

Training and Testing Data

  • 35M sentence-tuple pairs: from the paper Neural Open Information Extraction. It was generated by OpenIE-4, removing any tuples with less than 0.9 confidence. Because there is no sample data, I've copied a bit below. As you can see, the data is somewhat noisy. It might be useful for extra training data, but not as a gold dataset.
* moving and handling '' ' - a comprehensive course that covers safe handling and transport of casualties .
<arg1> '' ' - a comprehensive course </arg1> <rel> covers </rel> <arg2> safe handling and transport of casualties </arg2>

this word , adjectival magavan meaning `` possessing maga - '' , was once the premise that avestan maga - and median magu - were co-eval .
<arg1> - '' , was once the premise that avestan maga - and median magu - </arg1> <rel> were </rel> <arg2> co-eval </arg2>

melora walters as candy ' - a hooker who works for the motel where john person is staying , as a complimentary service to the guests .
<arg1> ' - a hooker </arg1> <rel> works </rel> <arg2> for the motel </arg2>

- - a hunter who uses bows and arrows instead of guns .
<arg1> - - a hunter </arg1> <rel> uses </rel> <arg2> bows and arrows instead of guns </arg2>
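
The tagged lines in the sample above can be read with a small regular expression. This is only a sketch based on the four examples shown here; the function names `parse_tagged_tuple` and `looks_noisy` are made up for illustration, and the noise check is a crude heuristic of our own, not part of the dataset or the paper.

```python
import re
from typing import Optional, Tuple

# Matches the <arg1> ... </arg1> <rel> ... </rel> <arg2> ... </arg2> layout
# seen in the sample lines above.
TAGGED = re.compile(
    r"<arg1>(.*?)</arg1>\s*<rel>(.*?)</rel>\s*<arg2>(.*?)</arg2>"
)


def parse_tagged_tuple(line: str) -> Optional[Tuple[str, ...]]:
    """Return (arg1, rel, arg2) with whitespace trimmed, or None if no match."""
    match = TAGGED.search(line)
    if match is None:
        return None
    return tuple(part.strip() for part in match.groups())


def looks_noisy(argument: str) -> bool:
    """Hypothetical heuristic: flag arguments that start with punctuation."""
    return not argument[:1].isalnum()


line = "<arg1> - - a hunter </arg1> <rel> uses </rel> <arg2> bows and arrows instead of guns </arg2>"
arg1, rel, arg2 = parse_tagged_tuple(line)
print(arg1, "|", rel, "|", arg2, "| noisy:", looks_noisy(arg1))
```

A filter like `looks_noisy` would flag the first argument of all four sample tuples, which matches the observation that the data is not clean enough for a gold dataset.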
  • TupleInf Open IE Dataset: OpenIE-4 extractions of 8th grade and 4th grade questions. By inspection, these tend to be cleaner than the above dataset because of the simplicity of the language. Confidence values are retained so you can make your own tradeoff between precision and recall. Not suitable for a gold dataset.
01 April 1969 The ATM would be a manned solar observatory making measurements of the Sun by telescopes and instruments above
0.96 (The ATM; would be; a manned solar observatory making measurements of the Sun by telescopes and instruments)
0.93 (a manned solar observatory; making; measurements of the Sun)

01 April 1969 The ATM would be a manned solar observatory making measurements of the Sun by telescopes and instruments above the Earth's atmosphere.
0.96 (The ATM; would be; a manned solar observatory making measurements of the Sun by telescopes and instruments above the Earth's atmosphere)
0.93 (a manned solar observatory; making; measurements of the Sun)

01 - Compare the physical properties of ice, liquid, water, and vapor.

01 Earthly Seasons PURPOSE: To show that the seasons are the consequence of the tilt of earth.

0.1% water can lower the melting temperature of peridotite by 100 C.
0.91 (0.1% water; can lower; the melting temperature of peridotite)

( 020 ) Celsius &#176;C The international temperature scale where water freezes at 0 (degrees) and boils at 100 (degrees).
0.89 (water; freezes; at 0 (degrees)
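
Since the confidence values are retained, the precision/recall tradeoff comes down to choosing a threshold. The sketch below parses extraction lines in the layout shown above and keeps those at or above a cutoff. The names `Extraction`, `parse_extraction`, and `filter_by_confidence` are ours, and the line format is inferred only from this sample, so treat it as an assumption about the full dataset.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Extraction(NamedTuple):
    confidence: float
    arg1: str
    rel: str
    arg2: str


# An extraction line looks like: 0.91 (arg1; rel; arg2)
# Sentence lines (even ones starting with a number) do not match this shape.
LINE = re.compile(r"^(\d\.\d+) \((.*)\)$")


def parse_extraction(line: str) -> Optional[Extraction]:
    """Parse one extraction line; return None for sentences or odd lines."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    parts = match.group(2).split("; ", 2)
    if len(parts) != 3:
        return None
    return Extraction(float(match.group(1)), *parts)


def filter_by_confidence(lines: Iterable[str], threshold: float) -> List[Extraction]:
    """Keep extractions whose confidence is at least `threshold`."""
    parsed = (parse_extraction(line) for line in lines)
    return [e for e in parsed if e is not None and e.confidence >= threshold]


sample = [
    "0.1% water can lower the melting temperature of peridotite by 100 C.",
    "0.91 (0.1% water; can lower; the melting temperature of peridotite)",
    "0.89 (water; freezes; at 0 (degrees)",
]
for extraction in filter_by_confidence(sample, threshold=0.9):
    print(extraction)
```

Raising the threshold trades recall for precision; with the three sample lines and a cutoff of 0.9, only the 0.91 extraction is kept.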
  • Squadie (not yet published, expect changes): this is our dataset derived from SQuAD. It uses a JSON format similar to SQuAD's and contains 50,000 tuples. Each tuple can then be matched with the corresponding sentence in the training corpus. Not suitable as a gold corpus. Squadie is useful for extracting implied relations. We have also converted Maluuba NewsQA.
 {
 "question": "Which film did Beyoncé star in 2001 with Mekhi Phifer?",
 "id": "56d4831f2ccc5a1400d83155",
 "answer": "Carmen: A Hip Hopera",
 "tuple": "<Which film\tdid Beyoncé star with Mekhi Phifer\tCarmen: A Hip Hopera>"
 },
 {
 "question": "What was the name of Destiny Child's third album?",
 "id": "56d4831f2ccc5a1400d83156",
 "answer": "Survivor",
 "tuple": "<Survivor\tthe name of\tDestiny Child 's third album>"
 },
 {
 "question": "Who filed a lawsuit over Survivor?",
 "id": "56d4831f2ccc5a1400d83157",
 "answer": "Luckett and Roberson",
 "tuple": "<Luckett and Roberson\tfiled a lawsuit over\tSurvivor>"
 },
 {
 "question": "When did Destiny's Child announce their hiatus?",
 "id": "56d4831f2ccc5a1400d83158",
 "answer": "October 2001",
 "tuple": "<Destiny 's Child\tannounce their hiatus\tOctober 2001>"
 }
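
A record like the ones above could be read as follows. This is a sketch that assumes the `tuple` field keeps the angle-bracket, tab-separated layout shown in the sample; since the dataset is not yet published, the format may change. The function name `parse_squadie_tuple` is hypothetical.

```python
import json
from typing import Dict, Tuple


def parse_squadie_tuple(record: Dict[str, str]) -> Tuple[str, ...]:
    """Split the "tuple" field into its tab-separated parts.

    Assumes the value is wrapped in "<" and ">" as in the sample records.
    """
    raw = record["tuple"]
    if raw.startswith("<") and raw.endswith(">"):
        raw = raw[1:-1]
    return tuple(part.strip() for part in raw.split("\t"))


# A raw string keeps the \t sequences as JSON escapes, so json.loads
# turns them into real tab characters.
record = json.loads(r"""
{
  "question": "Who filed a lawsuit over Survivor?",
  "id": "56d4831f2ccc5a1400d83157",
  "answer": "Luckett and Roberson",
  "tuple": "<Luckett and Roberson\tfiled a lawsuit over\tSurvivor>"
}
""")
print(parse_squadie_tuple(record))
```

The `id` field can then be used to look up the matching SQuAD question and its source passage.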