Metadata-Version: 2.1
Name: CnbcNews
Version: 5.7
Description-Content-Type: text/markdown
Requires-Dist: requests
Requires-Dist: beautifulsoup4
Requires-Dist: pandas

# **CnbcNews** #

[![PyPI version](https://img.shields.io/pypi/v/CnbcNews.svg)](https://pypi.org/project/CnbcNews/)

CnbcNews is an open-source, easy-to-use news crawler that extracts structured information from the CNBC news website for machine learning purposes. It can recursively follow internal hyperlinks and read RSS feeds to fetch the most recent articles in a given field. You only need to provide the desired field (`'technology'`, `'politics'`, `'business'`, `'markets'`, or `'investing'`) to crawl that section of the website.
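
To give an idea of what reading an RSS feed involves, here is a minimal sketch that pulls headline/link pairs out of an RSS 2.0 document using only the Python standard library. It is not CnbcNews's actual implementation: the `parse_rss_items` helper and the sample feed are hypothetical and only illustrate the general technique.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed in RSS 2.0 format (not real CNBC data).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/articles/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/articles/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of (title, link) tuples, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


articles = parse_rss_items(SAMPLE_FEED)
```

A real crawler would download the feed first (for example with `requests`) and then visit each link to extract the article itself.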

## Extracted information
CnbcNews extracts the following attributes from CNBC news articles.
* article headline
* article body (main text)
* article's author name
* publication date
* label
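
As a rough illustration, one extracted article could be modelled as the record below. This is only a sketch: the `ArticleRecord` class and its field names are hypothetical, and it assumes that the label is the crawled field and that some attributes (such as the author) may be missing. The actual output columns of CnbcNews may be named differently.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ArticleRecord:
    """Hypothetical container mirroring the attributes listed above."""
    headline: str
    body: str
    author: Optional[str]            # assumed to be optional
    publication_date: Optional[str]  # assumed to be optional
    label: str                       # assumed to be the crawled field


record = ArticleRecord(
    headline="Example headline",
    body="Example body text.",
    author=None,
    publication_date="2024-01-15",
    label="technology",
)

# Convert the record to a plain dict, e.g. to write it as one CSV row.
row = asdict(record)
```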

## Features
* **works out of the box**: install with pip, add the desired field of your articles, run :-)
* run CnbcNews conveniently using its [**CLI**](#run-the-crawler-via-the-cli) mode

### Modes and use cases
CnbcNews supports two main use cases, described below.

#### CLI mode
* stores extracted results in CSV files on your own storage
* simple but extensive configuration (if you want to tweak the results)
* revisions: crawl articles multiple times and track changes
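
One simple way to think about tracking revisions is to fingerprint the article body on every crawl and compare fingerprints. The sketch below shows that idea with the standard library. The `fingerprint` and `has_changed` helpers are hypothetical and are not part of CnbcNews, whose own revision handling may work differently.

```python
import hashlib


def fingerprint(body):
    """Return a stable SHA-256 hex digest of an article body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def has_changed(previous_body, current_body):
    """Return True when two crawled versions of an article differ."""
    return fingerprint(previous_body) != fingerprint(current_body)


first_crawl = "Stocks rose on Monday."
second_crawl = "Stocks rose sharply on Monday."

changed = has_changed(first_crawl, second_crawl)
unchanged = has_changed(first_crawl, first_crawl)
```

Storing only the digest next to each article keeps the comparison cheap even when the bodies are long.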

#### Library mode
* use CnbcNews within your own Python code
* crawl and extract information given a list of article URLs

## Getting started
It's super easy.

### Installation
```
$ pip3 install CnbcNews
```

### Use within your own code (as a library)
You can access the core functionality of CnbcNews, i.e. extraction of semi-structured information from one or more news articles, in your own code by using CnbcNews in library mode.

```python
from CnbcNews import getArticles

getArticles(field="investing", number=50, dropna=True)
```

To crawl multiple fields at once, optionally with a timeout in seconds and a number of articles per field:
```python
import CnbcNews

# Pass any of the supported fields, e.g. "technology", "politics", "business".
CnbcNews.from_fields(["technology", "politics"], number=10, timeout=6)
```

### Run the crawler (via the CLI)

```
$ CnbcNews-getArticles field [number] [dropna]
```

CnbcNews will then start crawling articles. By default, the results are stored in a CSV file.
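
Once the CSV file exists, it can be read back with the standard library. The following is a minimal sketch under stated assumptions: the column names, the sample rows, and the `load_rows` helper are hypothetical, and an in-memory string stands in for the real output file, whose name and layout may differ.

```python
import csv
import io

# Hypothetical CSV content; the real file name and columns may differ.
SAMPLE_CSV = """headline,author,publication_date,label
Example headline,Jane Doe,2024-01-15,investing
Another headline,,2024-01-16,investing
"""


def load_rows(csv_file, drop_missing=False):
    """Read article rows, optionally skipping rows with an empty value."""
    rows = list(csv.DictReader(csv_file))
    if drop_missing:
        rows = [
            row for row in rows
            if all(value != "" for value in row.values())
        ]
    return rows


all_rows = load_rows(io.StringIO(SAMPLE_CSV))
complete_rows = load_rows(io.StringIO(SAMPLE_CSV), drop_missing=True)
```

To read an actual file, pass an open file object instead, for example `open(path, newline="", encoding="utf-8")`.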

## License

Copyright 2023-2024 Ahmed Bendrioua
