Metadata-Version: 2.1
Name: spuco
Version: 2.0.3
Summary: SpuCo: Spurious Correlations Datasets and Benchmarks
Home-page: https://github.com/BigML-CS-UCLA/SpuCo
Author: Siddharth Joshi
Author-email: Siddharth Joshi <sjoshi804@cs.ucla.edu>
Project-URL: Source, https://github.com/BigML-CS-UCLA/SpuCo
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10.0
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: matplotlib>=3.7.1
Requires-Dist: numpy>=1.23.5
Requires-Dist: setuptools>=61.0
Requires-Dist: torch>=2.0.0
Requires-Dist: torchvision>=0.15.1
Requires-Dist: tqdm>=4.65.0
Requires-Dist: scikit-learn>=0.20.0
Requires-Dist: wilds>=2.0.0
Requires-Dist: transformers>=3.5.0
Requires-Dist: umap-learn>=0.5.5
Requires-Dist: grad-cam>=1.5.0
Provides-Extra: dev
Requires-Dist: black>=23.1.0; extra == "dev"
Requires-Dist: flake8>=3.9.2; extra == "dev"
Requires-Dist: pytest>=7.2.1; extra == "dev"
Requires-Dist: properscoring; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Provides-Extra: docs
Requires-Dist: ipython>=7.34.0; extra == "docs"
Requires-Dist: nbsphinx>=0.8.12; extra == "docs"
Requires-Dist: sphinx>=5.3.0; extra == "docs"
Requires-Dist: sphinx_rtd_theme>=1.1.1; extra == "docs"

# SpuCo (Spurious Correlations Datasets and Benchmarks)

[![Documentation Status](https://readthedocs.org/projects/spuco/badge/?version=latest)](https://spuco.readthedocs.io/en/latest/?badge=latest)

SpuCo is a Python package developed to advance research on addressing spurious correlations. Spurious correlations arise when machine learning models learn to exploit *easy* features that are not predictive of class membership but are correlated with a given class in the training data. At test time, this leads to catastrophically poor performance on the groups of data without such spurious features.

![Diagram illustrating the spurious correlations problem](docs/source/intro_fig.png)
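
To make this failure mode concrete, here is a minimal sketch in plain Python. It does not use the SpuCo API. The toy labels, the spurious attribute, and the `group_accuracies` helper are all illustrative assumptions. A hypothetical model that predicts from the spurious attribute alone reaches a reasonable average accuracy but fails completely on the groups where the spurious attribute disagrees with the label.

```python
from collections import defaultdict


def group_accuracies(labels, spurious, predictions):
    """Return accuracy for each (class label, spurious attribute) group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, s, y_hat in zip(labels, spurious, predictions):
        total[(y, s)] += 1
        correct[(y, s)] += int(y_hat == y)
    return {group: correct[group] / total[group] for group in total}


# Toy test set (made up for illustration): the spurious attribute matches
# the label for most examples, and disagrees for one example per class.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
spurious = [0, 0, 0, 1, 1, 1, 1, 0]

# Hypothetical model that ignores the true class feature and simply
# outputs the spurious attribute as its prediction.
predictions = list(spurious)

per_group = group_accuracies(labels, spurious, predictions)
average = sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)
worst = min(per_group.values())

print(f"average accuracy:     {average:.2f}")
print(f"worst-group accuracy: {worst:.2f}")
```

Reporting the minimum over groups rather than the overall average is what exposes the problem here: the average hides the two small groups on which the shortcut fails.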

Link to Paper: https://arxiv.org/abs/2306.11957

The SpuCo package is designed to help researchers and practitioners evaluate the robustness of their machine learning algorithms against spurious correlations that may exist in real-world data. SpuCo provides:

- Modular implementations of current state-of-the-art (SOTA) methods to address spurious correlations
- SpuCoMNIST: a controllable synthetic dataset that explores real-world data properties such as spurious feature difficulty, label noise, and feature noise
- SpuCoAnimals: a large-scale vision dataset curated from ImageNet to explore real-world spurious correlations
- SpuCoSun: a large-scale vision dataset created using backgrounds from SUN397 (class feature) and foregrounds (spurious feature) generated with a text-to-image diffusion model, corresponding to OpenImagesV7. Two versions of this dataset are provided: SpuCoSun Easy and SpuCoSun Hard, with *easy* and *hard* spurious features, respectively.

> Note: This project is under active development.

# Quickstart

Refer to the quickstart folder for scripts and notebooks to get started with *SpuCo*.

You can explore the data with the notebook: [Explore Data](quickstart/explore_data.ipynb)

Scripts and notebooks for training with SOTA methods are in the folders under quickstart, organized by dataset name.

## Installation

```shell
pip install spuco
```

Requires Python >= 3.10.
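
As an optional sanity check after installing, the sketch below uses only the standard library to confirm the interpreter version and look up the installed package version. The helper name `check_environment` is hypothetical and is not part of SpuCo.

```python
import sys
from importlib import metadata


def check_environment(package="spuco", min_python=(3, 10)):
    """Return (python_ok, installed_version), with None if not installed."""
    python_ok = sys.version_info >= min_python
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return python_ok, version


python_ok, version = check_environment()
print(f"Python >= 3.10: {python_ok}")
print(f"spuco version:  {version if version else 'not installed'}")
```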

## Using with GuildAI

Create GPU-affinitized queues, one per GPU (here, GPUs 0 through 7):
```
for i in {0..7}; do guild run queue -b --gpus="$i" -y; done
```

## About Us

This package is maintained by [Siddharth Joshi](https://sjoshi804.github.io/) from the BigML group at UCLA, headed by [Professor Baharan Mirzasoleiman](http://web.cs.ucla.edu/~baharan/group.htm).


