Metadata-Version: 2.3
Name: newspaperV3
Version: 0.3.0
Summary: Advanced news extraction, article parsing, and content analysis.
License: MIT
Keywords: newspaper,news,article,extraction,scraping,nlp,content,parsing
Author: Lucas Ou-Yang
Requires-Python: >=3.8,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Requires-Dist: Pillow (>=8.1.2)
Requires-Dist: beautifulsoup4 (>=4.9.3)
Requires-Dist: cssselect (>=1.1.0)
Requires-Dist: feedfinder2 (>=0.0.4)
Requires-Dist: feedparser (>=6.0.2)
Requires-Dist: jieba3k (>=0.35.1)
Requires-Dist: lxml (>=4.6.2)
Requires-Dist: nltk (>=3.6.2)
Requires-Dist: python-dateutil (>=2.8.1)
Requires-Dist: requests (>=2.25.1)
Requires-Dist: tinysegmenter (>=0.3)
Requires-Dist: tldextract (>=3.1.0)
Project-URL: Homepage, https://github.com/salah55s/newspaperV3
Project-URL: Repository, https://github.com/salah55s/newspaperV3
Description-Content-Type: text/markdown

# newspaperV3

An advanced library for news extraction, article parsing, and content analysis. It is a fork of the original `newspaper` library by Lucas Ou-Yang.

## Installation

Install the package using pip:

```bash
pip install newspaperV3
```

## Basic Usage

Here's a simple example that downloads an article, parses it, and runs the NLP step:

```python
from newspaperV3 import Article
import nltk

# The NLTK 'punkt' data must be downloaded once before article.nlp() will work.
# Uncomment the next line on the first run:
# nltk.download('punkt')

url = 'https://www.cnn.com/2023/11/15/politics/us-china-meeting-biden-xi/index.html'

# Create an Article object
article = Article(url)

# Download and parse the article
article.download()
article.parse()

# Perform Natural Language Processing (NLP)
article.nlp()

# Print the results
print("Title:", article.title)
print("Authors:", article.authors)
print("Publish Date:", article.publish_date)
print("Top Image:", article.top_image)
print("\nSummary:")
print(article.summary)
print("\nKeywords:", article.keywords)
```
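If you want the results as data rather than printed output, the same steps can be wrapped in a small helper. The sketch below is not part of the library: `extract_fields` and its `run_nlp` parameter are hypothetical names, and it assumes only the methods and attributes used in the example above (`download()`, `parse()`, `nlp()`, `title`, `authors`, `publish_date`, `top_image`, `summary`, `keywords`).

```python
from typing import Any, Dict


def extract_fields(article: Any, run_nlp: bool = True) -> Dict[str, Any]:
    """Run download/parse (and optionally nlp) and collect the results.

    `article` is expected to behave like the Article object shown in the
    basic usage example. This helper is illustrative, not a library API.
    """
    # Fetch and parse the page, in the same order as the basic example.
    article.download()
    article.parse()

    fields: Dict[str, Any] = {
        "title": article.title,
        "authors": article.authors,
        "publish_date": article.publish_date,
        "top_image": article.top_image,
    }

    # The NLP step is optional because it needs NLTK data to be installed.
    if run_nlp:
        article.nlp()
        fields["summary"] = article.summary
        fields["keywords"] = article.keywords

    return fields
```

With the package installed, `extract_fields(Article(url))` should return a dictionary with the same values the basic example prints. Passing `run_nlp=False` skips the summary and keywords.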

## Features

* **Article Extraction**: Automatically extract clean article text from web pages
* **Metadata Parsing**: Extract titles, authors, publication dates, and images
* **Natural Language Processing**: Generate summaries and extract keywords
* **Multi-language Support**: Process articles in various languages
* **Image Processing**: Extract and analyze article images
* **Content Analysis**: Advanced text processing and analysis capabilities

## Requirements

* Python 3.8 or later (below 4.0)
* NLTK (for natural language processing)
* Additional dependencies are installed automatically by pip
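
The package metadata declares `Requires-Python: >=3.8,<4.0`. As a quick pre-install check, a script can compare the running interpreter against that range. This is a minimal sketch using only the standard library; `python_is_supported` and the two constants are hypothetical names, not something the package provides.

```python
import sys
from typing import Optional, Tuple

# Bounds taken from the declared range: >=3.8,<4.0
MIN_VERSION = (3, 8)  # inclusive lower bound
MAX_VERSION = (4, 0)  # exclusive upper bound


def python_is_supported(version: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if (major, minor) falls inside the declared range.

    Defaults to the version of the interpreter running this code.
    """
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) < MAX_VERSION


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit("This Python version is outside the supported range (>=3.8,<4.0).")
```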

## License

This project is licensed under the MIT License.

