Pseudo-Label Generation for Multi-Label Text Classification

Metadata Updated: April 11, 2025

With the advent and expansion of social networking, the amount of generated text data has increased sharply. Handling such a volume of text requires new and improved text mining techniques. One characteristic of text data that makes text mining difficult is multi-labelity. To build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. The property is not unique to text data, as it can also be found in non-text (e.g., numeric) data, but it is most prevalent in text. It also places the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification, and under what circumstances. The high and sparse dimensionality of text data is also considered during classification. Although we propose and evaluate a text classification technique here, our main focus is on handling the multi-labelity of text data while utilizing the correlation among the multiple labels in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
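To make the idea of pseudo labels as "combinations of existing class labels" concrete, the following is a minimal Python sketch. It is not the pseudo-LSC algorithm from the paper, whose details are not given in this abstract. It only illustrates one plausible reading: label combinations that co-occur often enough are promoted to extra labels. The function names `generate_pseudo_labels` and `augment`, the `min_support` and `max_size` parameters, and the toy label sets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def generate_pseudo_labels(label_sets, min_support=2, max_size=2):
    """Count label combinations and keep those seen in >= min_support instances.

    Hypothetical helper: the support threshold and size cap are assumptions,
    not parameters taken from the paper.
    """
    counts = Counter()
    for labels in label_sets:
        unique = sorted(set(labels))  # sort so each combination has one canonical form
        for size in range(2, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


def augment(label_sets, pseudo_labels):
    """Attach a pseudo label to every instance that carries all of its members."""
    augmented = []
    for labels in label_sets:
        present = set(labels)
        extra = {"+".join(combo) for combo in pseudo_labels if present.issuperset(combo)}
        augmented.append(present | extra)
    return augmented


# Toy multi-label data: each instance is a subset of class labels.
docs = [
    {"sports", "politics"},
    {"sports", "politics", "health"},
    {"health"},
    {"sports", "health"},
]

pseudo = generate_pseudo_labels(docs, min_support=2)
augmented = augment(docs, pseudo)
```

In this toy data, the pairs ("politics", "sports") and ("health", "sports") each occur in two instances and become pseudo labels, while ("health", "politics") occurs only once and is dropped. A single-label instance such as `{"health"}` is left unchanged.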

Access & Use Information

Public: This dataset is intended for public access and use. License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties it is considered a U.S. Government Work.

Downloads & Resources

ahkh11.pdf (PDF)

Landing Page

Dates

Metadata Created Date	November 12, 2020
Metadata Updated Date	April 11, 2025
Data Update Frequency	irregular

Metadata Source

Data.json Metadata

Harvested from NASA Data.json

Additional Metadata

Resource Type	Dataset
Publisher	Dashlink
Maintainer	Nikunj Oza
Identifier	DASHLINK_679
Data First Published	2013-03-28
Data Last Modified	2025-03-31
Public Access Level	public
Bureau Code	026:00
Metadata Context	https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Schema Version	https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby	https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id	77989f75-f95a-4f38-951b-683b57b7006b
Harvest Source Id	58f92550-7a01-4f00-b1b2-8dc953bd598f
Harvest Source Title	NASA Data.json
Homepage URL	https://c3.nasa.gov/dashlink/resources/679/
Program Code	026:029
Source Datajson Identifier	True
Source Hash	c6d3367686483c75644dce05aaf1ab9be032e9d4948d3700ace36b5033d22e06
Source Schema Version	1.1
