Metadata-Version: 2.1
Name: pycode2seq
Version: 0.0.1
Summary: Inference and training for multiple languages of code2seq
Home-page: https://github.com/kisate/pycode2seq
Author: Dmitrii Kharlapenko
Author-email: dimkakha@gmail.com
License: MIT
Download-URL: https://pypi.org/project/pycode2seq/
Keywords: code2seq,pytorch,pytorch-lightning,ml4code,ml4se
Platform: UNKNOWN
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch (>=1.9.0)
Requires-Dist: torchtext (>=0.10.0)
Requires-Dist: pytorch-lightning (>=1.3.5)
Requires-Dist: code2seq (==0.0.2)
Requires-Dist: antlr4-python3-runtime (==4.8)
Requires-Dist: setuptools (>=52.0.0)
Requires-Dist: tqdm (==4.58.0)
Requires-Dist: numpy (>=1.20.1)
Requires-Dist: regex (>=2019.11.1)

# pycode2seq

Training and inference for multiple programming languages, built on the PyTorch implementation of the code2seq model.

## Installation

```shell
python setup.py install
```

## Inference

Minimal code example:

```python
import sys

from pycode2seq import DefaultModelRunner


def main(argv):
    runner = DefaultModelRunner(save_path="./tmp")

    # List of embeddings, one per method in the file
    method_embeddings = runner.run_embeddings_on_file(argv[1], "kt")

    # code2seq predictions
    predictions = runner.run_on_file(argv[1], "kt")

    # Predicted method names
    names = [runner.prediction_to_text(prediction) for prediction in predictions]
    print(names)


if __name__ == "__main__":
    main(sys.argv)
```
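
To collect predicted names for several files at once, the calls above can be wrapped in a small helper. This is a sketch, not part of the package: `predict_method_names` is a hypothetical name, and it relies only on the `run_on_file` and `prediction_to_text` methods shown in the example.

```python
from typing import Dict, Iterable, List


def predict_method_names(runner, paths: Iterable[str], language: str) -> Dict[str, List[str]]:
    """Map each source file to the method names predicted for it.

    `runner` is any object exposing `run_on_file(path, language)` and
    `prediction_to_text(prediction)`, as in the example above.
    """
    names_by_file: Dict[str, List[str]] = {}
    for path in paths:
        predictions = runner.run_on_file(path, language)
        names_by_file[path] = [runner.prediction_to_text(p) for p in predictions]
    return names_by_file
```

Passing the runner in as an argument means it is constructed once and reused across files, for example `predict_method_names(DefaultModelRunner(save_path="./tmp"), sys.argv[1:], "kt")`.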

## Training

Download astminer and build it from its root directory:

```shell
./gradlew shadowJar
```

Mine projects for paths:

```shell
python training/mine_projects.py <data folder> <output folder> <path to astminer's cli.sh>
```

Combine mined paths:

```shell
python training/astminer_to_code2seq.py <data folder/holdout> <output folder> <holdout>
```
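
The combining command appears to take one holdout per invocation, so it would be repeated for each split. A minimal sketch of building those invocations, assuming the splits are named `train`, `val`, and `test` (these names are not stated here) and that the mined paths for each split sit in a subfolder of the same name; `combine_commands` is a hypothetical helper.

```python
import sys
from typing import List, Sequence


def combine_commands(
    mined_dir: str,
    output_dir: str,
    holdouts: Sequence[str] = ("train", "val", "test"),  # assumed split names
) -> List[List[str]]:
    """Build one astminer_to_code2seq.py invocation per holdout."""
    return [
        [
            sys.executable,
            "training/astminer_to_code2seq.py",
            f"{mined_dir}/{holdout}",  # <data folder/holdout>
            output_dir,                # <output folder>
            holdout,                   # <holdout>
        ]
        for holdout in holdouts
    ]
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the repository root, so that a failing split stops the run.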

Build the vocabulary with `build_vocabulary.py` from the `code2seq` module.

Combine vocabularies:

```shell
python training/combine_vocabularies.py
```

Expand weights:

```shell
python training/expand_weights.py
```


