{{Short description|Machine learning algorithm}}
'''Coupled Pattern Learner''' (CPL) is a [[machine learning]] algorithm that couples the [[semi-supervised learning]] of categories and relations to forestall the problem of semantic drift associated with bootstrap learning methods.
== Coupled Pattern Learner ==
[[Semi-supervised learning]] approaches that combine a small number of labeled examples with many unlabeled examples are often unreliable, because they can produce an internally consistent but incorrect set of extractions. CPL solves this problem by simultaneously learning
== CPL
CPL is an approach to [[semi-supervised learning]] that yields more accurate results by coupling the training of many information extractors. The basic idea behind CPL is that semi-supervised training of a single type of extractor such as ‘coach’ is much more
== CPL
=== Coupling of
CPL primarily relies on the notion of coupling the [[learning]] of multiple functions so as to constrain the semi-supervised learning problem. CPL constrains the learned function in two ways.
# Sharing among same-arity predicates according to logical relations
=== Relation argument type-checking ===
Type-checking information is used to couple the learning of relations and categories. For example, the arguments of the ‘ceoOf’ relation are declared to be of the categories ‘person’ and ‘company’. CPL does not promote a pair of noun phrases as an instance of a relation unless the two noun phrases are
=== Algorithm
The following is a summary of the CPL algorithm.<ref name=cpl2010 />
 '''Output''': Trusted instances/patterns for each predicate
 '''for''' i=1,2,...,∞ '''do'''
     '''foreach''' predicate p in O '''do'''
         EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances;
         FILTER candidates that violate coupling;
         RANK candidate instances/patterns;
         PROMOTE top candidates;
     '''end'''
 '''end'''
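The loop above can be sketched in Python as follows. This is only an illustrative skeleton, not the authors' implementation: the function name <code>run_cpl_sketch</code> and the callbacks <code>extract</code>, <code>violates_coupling</code> and <code>score</code> are hypothetical placeholders, the sketch tracks a single kind of item per predicate instead of alternating between instances and patterns, and it stops after a fixed number of iterations where the pseudocode loops indefinitely.

```python
from typing import Callable, Dict, Iterable, Set


def run_cpl_sketch(
    seeds: Dict[str, Set[str]],
    extract: Callable[[str, Set[str]], Iterable[str]],
    violates_coupling: Callable[[str, str, Dict[str, Set[str]]], bool],
    score: Callable[[str, str], float],
    top_k: int,
    iterations: int,
) -> Dict[str, Set[str]]:
    """Toy EXTRACT / FILTER / RANK / PROMOTE loop (all names are assumptions)."""
    # Everything trusted so far, and what was promoted in the previous round.
    trusted = {p: set(items) for p, items in seeds.items()}
    recent = {p: set(items) for p, items in seeds.items()}

    for _ in range(iterations):
        newly_promoted: Dict[str, Set[str]] = {}
        for predicate in trusted:
            # EXTRACT: new candidates found via recently promoted items.
            candidates = set(extract(predicate, recent[predicate]))
            candidates -= trusted[predicate]
            # FILTER: drop candidates that violate a coupling constraint.
            kept = [
                c for c in candidates
                if not violates_coupling(predicate, c, trusted)
            ]
            # RANK: highest score first, ties broken alphabetically.
            ranked = sorted(kept, key=lambda c: (-score(predicate, c), c))
            # PROMOTE: keep only the top candidates.
            promoted = set(ranked[:top_k])
            trusted[predicate] |= promoted
            newly_promoted[predicate] = promoted
        recent = newly_promoted
    return trusted
```

Passing the constraint check as a callback keeps the loop independent of which coupling constraints are in force; a mutual-exclusion check, for instance, could reject any candidate that is already trusted under a different predicate.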
==== Inputs ====
A large [[Text corpus|corpus]] of part-of-speech-tagged sentences and an initial ontology with
==== Candidate extraction ====
CPL
* Category Instances
* Category Patterns
* Relation Patterns
==== Candidate filtering ====
Candidate instances and patterns are
==== Candidate ranking ====
CPL ranks candidate instances using the number of promoted patterns that they co-occur with so that candidates that occur with more patterns are ranked higher. Patterns are ranked using an estimate of the precision of each pattern.
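A minimal Python sketch of this ranking step is given below. The co-occurrence table, the function names, and in particular the precision estimate (the share of a pattern's co-occurrences that involve already promoted instances) are assumptions made for illustration; the text above does not specify the exact formula.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input: (instance, pattern) -> number of co-occurrences in the corpus.
Counts = Dict[Tuple[str, str], int]


def rank_instances(counts: Counts, promoted_patterns: Set[str]) -> List[str]:
    """Order instances by how many distinct promoted patterns they occur with."""
    support = defaultdict(set)
    for (instance, pattern), count in counts.items():
        if count > 0 and pattern in promoted_patterns:
            support[instance].add(pattern)
    # More supporting patterns first; ties broken alphabetically.
    return sorted(support, key=lambda inst: (-len(support[inst]), inst))


def pattern_precision(pattern: str, counts: Counts,
                      promoted_instances: Set[str]) -> float:
    """Assumed estimate: fraction of the pattern's co-occurrences
    that involve an already promoted instance."""
    total = sum(c for (_, p), c in counts.items() if p == pattern)
    hits = sum(
        c for (i, p), c in counts.items()
        if p == pattern and i in promoted_instances
    )
    return hits / total if total else 0.0
```

Instances that never co-occur with a promoted pattern are left out of the ranking entirely, since they have no supporting evidence to count.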
==== Candidate promotion ====
CPL ranks the candidates according to their assessment scores and promotes at most 100 instances and 5 patterns for each predicate. Instances and patterns are only promoted if they co-occur with at least two promoted patterns or instances, respectively.
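The promotion rule can be sketched in Python as follows. The limits of 100 instances and 5 patterns and the threshold of two co-occurring items come from the text above; the function name, the argument names, and the choice to apply the threshold while walking the ranked list (so that rejected candidates do not use up slots) are assumptions of this sketch.

```python
from typing import Dict, List

MAX_INSTANCES = 100  # at most 100 instances promoted per predicate
MAX_PATTERNS = 5     # at most 5 patterns promoted per predicate
MIN_SUPPORT = 2      # must co-occur with at least two promoted items


def promote(ranked: List[str], support: Dict[str, int], limit: int,
            min_support: int = MIN_SUPPORT) -> List[str]:
    """Take candidates in rank order, skipping those with too little support.

    `ranked` is assumed to be sorted best-first; `support` maps each candidate
    to the number of promoted patterns (or instances) it co-occurs with.
    """
    promoted: List[str] = []
    for candidate in ranked:
        if len(promoted) == limit:
            break
        if support.get(candidate, 0) >= min_support:
            promoted.append(candidate)
    return promoted
```

For example, <code>promote(ranked_patterns, support, MAX_PATTERNS)</code> would select the patterns for one predicate, and the same call with <code>MAX_INSTANCES</code> would select its instances.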
== Meta-Bootstrap Learner ==
Meta-Bootstrap Learner (MBL) was also proposed by the authors of CPL
 '''Input''': An ontology O, a set of extractors ε
 '''Output''': Trusted instances for each predicate
 '''for''' i=1,2,...,∞ '''do'''
     '''foreach''' predicate p in O '''do'''
         '''foreach''' extractor e in ε '''do'''
             Extract new candidates for p using e with recently promoted instances;
         '''end'''
         FILTER candidates that violate mutual-exclusion or type-checking constraints;
         PROMOTE candidates that were extracted by all extractors;
     '''end'''
 '''end'''
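One pass of the inner loop for a single predicate might look like the Python sketch below. The function <code>mbl_step</code>, the extractor signature, and the <code>violates_constraints</code> callback are hypothetical names introduced for illustration, not part of the published algorithm.

```python
from typing import Callable, Iterable, Sequence, Set

# Assumed extractor signature: (predicate, recently promoted instances) -> candidates.
Extractor = Callable[[str, Set[str]], Iterable[str]]


def mbl_step(
    predicate: str,
    extractors: Sequence[Extractor],
    recently_promoted: Set[str],
    violates_constraints: Callable[[str, str], bool],
) -> Set[str]:
    """Return the candidates promoted for `predicate` in one iteration."""
    # Each extractor proposes candidates from the recently promoted instances.
    proposals = [set(e(predicate, recently_promoted)) for e in extractors]
    if not proposals:
        return set()
    # PROMOTE only what every extractor agrees on.
    agreed = set.intersection(*proposals)
    # FILTER candidates that break mutual-exclusion or type-checking constraints.
    return {c for c in agreed if not violates_constraints(predicate, c)}
```

Because the constraint check is applied to each candidate independently, filtering after the intersection yields the same set as filtering before it, while checking fewer candidates.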
== Applications ==
In their paper,<ref name=cbl2009 /> the authors presented results showing the potential of CPL to contribute new facts to an existing repository of semantic knowledge, Freebase.<ref>{{cite journal|year=2009 |title=Freebase data dumps |publisher=Metaweb Technologies |url=http://download.freebase.com/datadumps/ |url-status=dead |archiveurl=https://web.archive.org/web/20111206102101/http://download.freebase.com/datadumps/ |archivedate=December 6, 2011 }}</ref>
== See also ==
==References==
* {{cite journal|last=Liu|first=Qiuhua |author2=Xuejun Liao |author3=Lawrence Carin |year=2008|title=Semi-supervised multitask learning|journal=NIPS}}
* {{cite journal|last=Shinyama|first=Yusuke|author2=Satoshi Sekine|year=2006|title=Preemptive information extraction using unrestricted relation discovery|journal=HLT-
* {{cite journal|last=Chang|first=Ming-Wei|author2=Lev-Arie Ratinov |author3=Dan Roth |year=2007|title=Guiding semi-supervision with constraint driven learning|journal=ACL}}
* {{cite journal|last=Banko|first=Michele|author2=Michael J. Cafarella |author3=Stephen Soderland |author4=Matt Broadhead |author5=
* {{cite
* {{cite journal|last=Riloff|first=Ellen|author2=Rosie Jones|year=1999|title=Learning dictionaries for information extraction by multi-level bootstrapping|journal=AAAI}}
* {{cite journal|last=Rosenfeld|first=Benjamin|author2=Ronen Feldman|year=2007|title=Using corpus statistics on entities to improve semi-supervised relation extraction from the web|journal=ACL}}