Metadata-Version: 2.3
Name: get-the-nini
Version: 0.1.1
Summary: Ninisite Scraper: Fetches all pages of a Ninisite discussion and formats them as Org-mode, Markdown, or JSON
Author: Feraidoon
Author-email: feraidoonmehri@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: beautifulsoup4 (>=4.9.0,<5.0.0)
Requires-Dist: pypandoc (>=1.6.0,<2.0.0)
Requires-Dist: pytz (>=2021.1,<2022.0)
Requires-Dist: requests (>=2.25.0,<3.0.0)
Requires-Dist: tqdm (>=4.60.0,<5.0.0)
Description-Content-Type: text/plain

#+TITLE: get-the-nini: Ninisite Post Scraper

A command-line tool for scraping discussion threads from the Ninisite website. It accepts a topic ID or a full URL and saves the entire thread, across all of its pages, to a single well-structured file.

*   *Code*: [[file:get_the_nini/main.py]]

*   *Purpose*: This tool is designed to archive and analyze discussion threads from ninisite.com, converting them into portable and easy-to-read formats.

*   *Features*
    - Scrape entire discussion threads by topic ID or URL.
    - Automatic pagination: every page of a thread is fetched.
    - Multiple output formats: *Org-mode*, *Markdown*, and *JSON*.
    - Rich metadata extraction, including topic title, author, categories, view count, and post dates.
    - Preserves post structure, including replies and quoted content.
    - Streaming Org-mode output: each page is written to the file as soon as it is scraped, which suits large topics and lets you watch progress live.
    - A progress bar while pages are being fetched.
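
The pagination and streaming behaviour listed above can be pictured with a short sketch. This is not the code in ~main.py~. ~stream_topic~ and the ~fetch_page~ callable are hypothetical names, and the sketch assumes the page count is known before the loop starts.

#+begin_src python
import io
from typing import Callable, TextIO


def stream_topic(fetch_page: Callable[[int], str], page_count: int, out: TextIO) -> int:
    """Fetch pages 1..page_count in order, writing each one as soon as it arrives.

    fetch_page is a hypothetical stand-in for the real scraper: it takes a
    1-based page number and returns that page's already-formatted text.
    Returns the number of pages written.
    """
    written = 0
    for page in range(1, page_count + 1):
        out.write(fetch_page(page))
        out.flush()  # hand each page over immediately so the file grows page by page
        written += 1
    return written


if __name__ == "__main__":
    # Demo with a fake fetcher; a real one would download and convert the page.
    buf = io.StringIO()
    stream_topic(lambda n: f"* Page {n}\n", 3, buf)
    print(buf.getvalue(), end="")
#+end_src

Wrapping the ~range(...)~ in ~tqdm(...)~ would be one way to attach a progress bar like the one the tool displays.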

*   *Requirements*
    This project requires Python 3.8 or later (but below 4.0) and the following libraries:
    - ~requests~
    - ~beautifulsoup4~
    - ~pypandoc~
    - ~tqdm~
    - ~pytz~

    *Note*: ~pypandoc~ is a wrapper around *Pandoc*. Pandoc must be installed and available on your system's PATH for the HTML-to-Org/Markdown conversion to work.
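
The sketch below shows a quick way to confirm that Pandoc is reachable before you run the scraper. The helper names are hypothetical. ~pypandoc.convert_text~ is pypandoc's own string-conversion function. It is imported lazily so that the PATH check still works where pypandoc is not installed.

#+begin_src python
import shutil


def pandoc_available() -> bool:
    """Return True if a `pandoc` executable can be found on PATH."""
    return shutil.which("pandoc") is not None


def html_to(fmt: str, html: str) -> str:
    """Convert an HTML fragment to another Pandoc format, e.g. "org" or "markdown".

    Requires both the pypandoc package and the pandoc binary.
    """
    import pypandoc  # imported here so pandoc_available() works without pypandoc

    if not pandoc_available():
        raise RuntimeError("pandoc was not found on PATH; install it first")
    return pypandoc.convert_text(html, fmt, format="html")


if __name__ == "__main__":
    print("pandoc on PATH:", pandoc_available())
#+end_src

If the check passes, ~html_to("org", "<p>Hello <b>world</b></p>")~ exercises the same HTML-to-Org path that the scraper depends on.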

*   *Usage*
    Run the script from the command line, passing a topic ID or a full URL.

** Syntax
#+begin_src sh
python get_the_nini/main.py [OPTIONS] <TOPIC_ID_OR_URL>
#+end_src
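
Both accepted input forms reduce to the same numeric topic ID. The following is a minimal sketch of that normalisation. It assumes that topic URLs always contain ~/discussion/topic/<ID>~, as in the URL example in this README. ~parse_topic_id~ is a hypothetical name and is not necessarily how ~main.py~ does it.

#+begin_src python
import re

# Assumed URL shape, taken from this README's example:
#   https://www.ninisite.com/discussion/topic/11473285/
_TOPIC_URL = re.compile(r"/discussion/topic/([0-9]+)")


def parse_topic_id(arg: str) -> str:
    """Return the numeric topic ID from either a bare ID or a topic URL."""
    arg = arg.strip()
    if re.fullmatch(r"[0-9]+", arg):  # ASCII digits only
        return arg
    match = _TOPIC_URL.search(arg)
    if match is None:
        raise ValueError(f"not a topic ID or Ninisite topic URL: {arg!r}")
    return match.group(1)


if __name__ == "__main__":
    print(parse_topic_id("https://www.ninisite.com/discussion/topic/11473285/"))
#+end_src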

** Examples

1.  *Scrape by topic ID (default Org-mode output)*
    Scrapes topic ~11473285~ and saves it to an automatically named file, ~ninisite_11473285.org~.
    #+begin_src sh
    python get_the_nini/main.py 11473285
    #+end_src

2.  *Scrape using a full URL*
    The topic can also be identified by its full URL.
    #+begin_src sh
    python get_the_nini/main.py "https://www.ninisite.com/discussion/topic/11473285/"
    #+end_src

3.  *Specify an output file and format (Markdown)*
    Here the format is inferred from the ~.md~ extension; it can also be set explicitly with ~--format~.
    #+begin_src sh
    python get_the_nini/main.py 11473285 -o output.md
    #+end_src

4.  *Output JSON to stdout*
    Use ~-o -~ to write to standard output, which you can then redirect to a file.
    #+begin_src sh
    python get_the_nini/main.py 11473285 --format json -o - > ninisite_11473285.json
    #+end_src
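
The output rules shown in these examples can be sketched roughly as follows:

- an automatic file name when ~-o~ is omitted,
- a format inferred from the file extension unless ~--format~ is given,
- ~-~ for standard output.

This is a hypothetical sketch, not the tool's code. The format names ~org~ and ~markdown~ are assumptions, as are the default names ~ninisite_<ID>.md~ and ~ninisite_<ID>.json~ for non-Org formats. This README only confirms ~json~ and the ~ninisite_<ID>.org~ default.

#+begin_src python
import sys
from pathlib import Path
from typing import Optional, TextIO, Tuple

# Hypothetical mapping; the real --format values may differ.
_EXT_TO_FORMAT = {".org": "org", ".md": "markdown", ".json": "json"}
_FORMAT_TO_EXT = {fmt: ext for ext, fmt in _EXT_TO_FORMAT.items()}


def resolve_output(topic_id: str, output: Optional[str], fmt: Optional[str]) -> Tuple[str, str]:
    """Return (format, destination).

    An explicit fmt wins; otherwise it is inferred from the output file's
    extension, falling back to Org-mode. With no output path, the file is
    named ninisite_<ID>.<ext>. A destination of "-" means standard output.
    """
    if fmt is None:
        suffix = Path(output).suffix.lower() if output and output != "-" else ""
        fmt = _EXT_TO_FORMAT.get(suffix, "org")
    if output is None:
        output = f"ninisite_{topic_id}{_FORMAT_TO_EXT.get(fmt, '.' + fmt)}"
    return fmt, output


def open_output(dest: str) -> TextIO:
    """Return stdout for "-", otherwise a UTF-8 text file opened for writing."""
    if dest == "-":
        return sys.stdout
    return open(dest, "w", encoding="utf-8")


if __name__ == "__main__":
    print(resolve_output("11473285", "output.md", None))
#+end_src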

*   *Output Formats & Examples*
    The scraper supports three output formats. Each entry below links to an example generated from the same topic.

** Org-mode (.org)
A structured, readable plain-text format, well suited to Emacs. This is the default format, and it supports streaming output: pages are written to the file as they are scraped.
-   *Example*: [[file:examples/ninisite_11473285.org]]

** Markdown (.md)
A popular lightweight markup language that converts easily to HTML and other formats.
-   *Example*: [[file:examples/ninisite_11473285.md]]

** JSON (.json)
A structured data format that includes all metadata and post content, suitable for programmatic analysis or integration into other systems.
-   *Example*: [[file:examples/ninisite_11473285.json]]
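
This README does not spell out the JSON schema. A safe first step in any analysis is therefore to inspect the file's top-level structure rather than assume field names. The helper below is a minimal, hypothetical sketch for doing that.

#+begin_src python
import json
from pathlib import Path
from typing import Any, Dict


def summarize(data: Any) -> Dict[str, str]:
    """Map each top-level key of a decoded JSON value to its value's type name."""
    if not isinstance(data, dict):
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


def summarize_file(path: str) -> Dict[str, str]:
    """Load a UTF-8 JSON file and summarize its top-level structure."""
    return summarize(json.loads(Path(path).read_text(encoding="utf-8")))


if __name__ == "__main__":
    print(summarize_file("examples/ninisite_11473285.json"))
#+end_src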

