Metadata-Version: 2.2
Name: textToKnowledgeGraph
Version: 0.1.2
Summary: A Python package to generate BEL statements and CX2 networks.
Home-page: https://github.com/ndexbio/llm-text-to-knowledge-graph
Author: Favour James
Author-email: favour.ujames196@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain==0.3.13
Requires-Dist: langchain_core==0.3.27
Requires-Dist: langchain_openai==0.2.13
Requires-Dist: lxml==5.2.1
Requires-Dist: ndex2<4.0.0,>=3.8.0
Requires-Dist: pandas
Requires-Dist: pydantic==2.10.4
Requires-Dist: python-dotenv==1.0.1
Requires-Dist: Requests==2.32.3
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# textToKnowledgeGraph

A Python package to generate BEL statements and CX2 networks.

## Table of Contents

- [License](#license)
- [Project Description](#project-description)
- [Glossary](#glossary)
- [Installation](#installation)
- [Methodology](#methodology)
- [Usage](#usage)

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Project Description

`textToKnowledgeGraph` is a Python package that converts natural-language scientific text into structured knowledge graphs using large language models (LLMs). It can be used for:

- Extracting entities and interactions from scientific text.
- Generating BEL statements from the extracted interactions.
- Converting the results into CX2 networks.
- Uploading the generated CX2 networks to NDEx.

## Glossary

The following terms are used throughout this documentation:

- BEL (Biological Expression Language): A structured language for representing scientific findings, especially in the biomedical domain, in a computable format. Learn more: [BEL Documentation](https://language.bel.bio/)
- CX2 (Cytoscape Exchange Format 2): A JSON-based format for storing and exchanging network data, used by Cytoscape and NDEx. Learn more: [CX2 Specification](http://manual.cytoscape.org/en/stable/Supported_Network_File_Formats.html#cx2)
- PMCID (PubMed Central Identifier): A unique identifier for articles archived in PubMed Central (PMC), a free digital repository of biomedical and life sciences journal literature. Learn more: [PubMed Central](https://www.ncbi.nlm.nih.gov/pmc/)
- NDEx (Network Data Exchange): An online resource for sharing, storing, and visualizing biological networks. Learn more: [NDEx](https://www.ndexbio.org)
- LangChain: A framework for developing applications powered by language models. It integrates language models with data sources and APIs, enabling workflows such as knowledge extraction and retrieval. Learn more: [LangChain](https://python.langchain.com/docs/introduction/)
- Cytoscape: An open-source platform for visualizing and analyzing complex networks, including biological pathways and protein interaction networks. Learn more: [Cytoscape](https://cytoscape.org)
- Knowledge Graph: A structured representation of knowledge as a graph, where entities are nodes and relationships are edges. Knowledge graphs support querying, reasoning, and visualization of complex biological data, which helps in understanding biological systems and making discoveries.
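
To make the BEL entry concrete, the sketch below splits a simple BEL statement of the form `subject relation object` into its three parts. It is an illustrative assumption, not the package's own parser: `parse_bel` and `BelTriple` are hypothetical names, it handles only single-relation statements such as `p(HGNC:AKT1) increases p(HGNC:MTOR)`, and it recognizes only a small subset of BEL relationship keywords.

```python
import re
from typing import NamedTuple

# A small subset of BEL relationship keywords (assumption: not exhaustive).
RELATIONS = (
    "directlyIncreases",
    "directlyDecreases",
    "increases",
    "decreases",
    "association",
    "positiveCorrelation",
    "negativeCorrelation",
)


class BelTriple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Match "<term> <relation> <term>", where each term ends with ")",
# e.g. p(HGNC:AKT1) or act(p(HGNC:AKT1)).
_PATTERN = re.compile(
    r"^\s*(?P<subject>\S.*?\))\s+(?P<relation>"
    + "|".join(RELATIONS)
    + r")\s+(?P<obj>\S.*\))\s*$"
)


def parse_bel(statement: str) -> BelTriple:
    """Split a single-relation BEL statement into subject, relation, and object."""
    match = _PATTERN.match(statement)
    if match is None:
        raise ValueError(f"Unsupported BEL statement: {statement!r}")
    return BelTriple(match["subject"], match["relation"], match["obj"])


if __name__ == "__main__":
    triple = parse_bel("act(p(HGNC:AKT1)) increases p(HGNC:MTOR)")
    print(triple.subject, "|", triple.relation, "|", triple.obj)
```

Nested terms such as `act(p(HGNC:AKT1))` work because the lazy subject pattern keeps extending until it reaches a closing parenthesis followed by whitespace and a known relation.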

## Installation

Install the package via pip:

```bash
pip install textToKnowledgeGraph
```

## Methodology

- **BEL Generation**:
  - The `process_paper` function in [`textToKnowledgeGraph.main`](textToKnowledgeGraph/main.py) processes scientific papers to extract biological interactions and generate BEL statements.
  - The `llm_bel_processing` function in [`textToKnowledgeGraph.sentence_level_extraction`](textToKnowledgeGraph/sentence_level_extraction.py) handles sentence-level extraction of BEL statements using an OpenAI model.

- **CX2 Network Generation**:
  - The `convert_to_cx2` function in [`textToKnowledgeGraph.convert_to_cx2`](textToKnowledgeGraph/convert_to_cx2.py) converts extracted interactions into CX2 network format for visualization in Cytoscape.

- **Prompt Handling**:
  - The `get_prompt` function in [`textToKnowledgeGraph.get_interactions`](textToKnowledgeGraph/get_interactions.py) reads and processes prompt files to generate prompts for language models.

- **Chain Initialization**:
  - The `initialize_chains` function in [`textToKnowledgeGraph.get_interactions`](textToKnowledgeGraph/get_interactions.py) initializes the LangChain extraction chains used for interaction extraction, authenticated with the provided OpenAI API key.

- **Network Uploading**:
  - The `save_new_cx2_network` function in [`textToKnowledgeGraph.main`](textToKnowledgeGraph/main.py) uploads the generated CX2 networks to NDEx for sharing and visualization.

- **Workflow**:
  - Extraction chains are initialized with the OpenAI API key.
  - Prompts are read from prompt files and prepared to guide the extraction.
  - The paper is processed to extract biological interactions.
  - Language models perform sentence-level extraction of BEL statements.
  - Extracted interactions are converted into CX2 network format.
  - Optionally, the generated networks are uploaded to NDEx for sharing and visualization.
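
The extraction-to-network part of this workflow can be sketched in plain Python. This is a hypothetical illustration, not the package's implementation: `split_sentences`, `build_cx2`, `run_pipeline`, and `toy_extract` are made-up names, the regex sentence splitter is deliberately naive, and the `extract` callable stands in for the LLM call that `llm_bel_processing` performs. The output follows the general CX2 layout (a `CXVersion` header, `metaData`, `attributeDeclarations`, `nodes`, `edges`, and a closing `status` aspect); the package's `convert_to_cx2` likely records more attributes.

```python
import json
import re
from typing import Callable, Dict, List, Tuple

# An interaction is (source entity, relation, target entity).
Interaction = Tuple[str, str, str]


def split_sentences(text: str) -> List[str]:
    """Naively split text on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_cx2(interactions: List[Interaction]) -> List[dict]:
    """Assemble a minimal CX2 document: one node per entity, one edge per interaction."""
    node_ids: Dict[str, int] = {}
    nodes: List[dict] = []
    edges: List[dict] = []
    for source, relation, target in interactions:
        for name in (source, target):
            if name not in node_ids:  # Reuse the node if the entity was seen before.
                node_ids[name] = len(node_ids)
                nodes.append({"id": node_ids[name], "v": {"name": name}})
        edges.append({
            "id": len(edges),
            "s": node_ids[source],
            "t": node_ids[target],
            "v": {"interaction": relation},
        })
    return [
        {"CXVersion": "2.0", "hasFragments": False},
        {"metaData": [
            {"name": "attributeDeclarations", "elementCount": 1},
            {"name": "nodes", "elementCount": len(nodes)},
            {"name": "edges", "elementCount": len(edges)},
        ]},
        {"attributeDeclarations": [{
            "nodes": {"name": {"d": "string"}},
            "edges": {"interaction": {"d": "string"}},
        }]},
        {"nodes": nodes},
        {"edges": edges},
        {"status": [{"error": "", "success": True}]},
    ]


def run_pipeline(text: str, extract: Callable[[str], List[Interaction]]) -> List[dict]:
    """Extract interactions sentence by sentence, then convert them to CX2."""
    interactions: List[Interaction] = []
    for sentence in split_sentences(text):
        interactions.extend(extract(sentence))
    return build_cx2(interactions)


def toy_extract(sentence: str) -> List[Interaction]:
    """Stand-in for an LLM call: recognizes only sentences like 'X activates Y'."""
    match = re.match(r"(\w+) activates (\w+)", sentence)
    return [(match.group(1), "increases", match.group(2))] if match else []


if __name__ == "__main__":
    cx2 = run_pipeline("AKT1 activates MTOR. MTOR activates S6K1.", toy_extract)
    print(json.dumps(cx2, indent=2))
```

Injecting `extract` as a callable keeps the sentence loop and the CX2 assembly testable without an API key; a real run would swap `toy_extract` for a model-backed extractor.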

## Usage

**Required parameters**:

- **pmc_id**: The PMCID of the paper to process. Only one PMCID can be processed at a time.

- **api_key**: Your OpenAI API key.

**Optional parameters**:

- **ndex_email**: The NDEx account email, used for authentication.
- **ndex_password**: The NDEx account password, used for authentication.

**Expected output**:

- **BEL statements**: extracted from the paper
- **CX2 network**: generated from the extracted BEL statements
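
Before starting a run, it can help to check these parameters. The sketch below is hypothetical and does not call the package: `check_params` is a made-up helper, it assumes a PMCID is `PMC` followed by digits (for example `PMC1234567`), and its fallback to an `OPENAI_API_KEY` environment variable is an assumption about how you might store the key, not documented package behavior. It treats the NDEx credentials as all-or-nothing, since both are needed to authenticate.

```python
import os
import re
from typing import Optional

# Assumption: a PMCID is "PMC" followed by one or more digits.
_PMCID = re.compile(r"PMC\d+")


def check_params(
    pmc_id: str,
    api_key: Optional[str] = None,
    ndex_email: Optional[str] = None,
    ndex_password: Optional[str] = None,
) -> dict:
    """Validate and normalize the parameters for processing a single paper."""
    pmc_id = pmc_id.strip().upper()
    if not pmc_id.startswith("PMC"):
        pmc_id = "PMC" + pmc_id  # Accept a bare numeric ID such as "1234567".
    if not _PMCID.fullmatch(pmc_id):
        raise ValueError(f"Invalid PMCID: {pmc_id!r}")

    # Hypothetical fallback: read the OpenAI key from the environment.
    api_key = api_key or os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("An OpenAI API key is required.")

    # NDEx upload is optional, but email and password must be given together.
    if (ndex_email is None) != (ndex_password is None):
        raise ValueError("Provide both ndex_email and ndex_password, or neither.")

    return {
        "pmc_id": pmc_id,
        "api_key": api_key,
        "ndex_email": ndex_email,
        "ndex_password": ndex_password,
        "upload_to_ndex": ndex_email is not None,
    }
```

Failing fast on a malformed PMCID or a half-specified NDEx login avoids spending model calls on a run that cannot finish or upload.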


