Metadata-Version: 2.1
Name: pandas-parallel-apply
Version: 2.0
Summary: Wrapper for df and df[col].apply parallelized
Home-page: https://gitlab.com/mihaicristianpirvu/pandas-parallel-apply
License: WTFPL
Platform: UNKNOWN
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

# pandas-parallel-apply

Parallel wrappers for `df.apply(fn)`, `df[col].apply(fn)`, `series.apply(fn)` and `df.groupby([cols]).apply(fn)`, with a tqdm progress bar included.

## Installation

`pip install pandas-parallel-apply`

## Examples
See `examples/` for usage on a dummy dataframe and series.

## Usage

### Apply on each row of a dataframe

`df.apply(fn)` -> `DataFrameParallel(df, n_cores: int = None, pbar: bool = True).apply(fn)`

### Apply on a column of a dataframe and return the Series
`df[col].apply(fn, axis=1)` -> `DataFrameParallel(df, n_cores: int = None, pbar: bool = True)[col].apply(fn, axis=1)`

### Apply on a series
`series.apply(fn)` -> `SeriesParallel(series, n_cores: int = None, pbar: bool = True).apply(fn)`

### GroupBy apply
`df.groupby([cols]).apply(fn)` -> `DataFrameParallel(df, n_cores: int = None, pbar: bool = True).groupby([cols]).apply(fn)`

## Disclaimers

- This is an experimental repository. It may lead to unexpected behaviour.

- Not all of pandas' merging semantics are supported. Pandas has weird and complex ways of converting the return value of an apply function. For example, a series apply function may return a dataframe, a series, a dict, a list etc., and each of these is converted in its own specific way. Some of these cases may not be supported.

- Groupby apply functions are currently **much** slower than their serial variant. We are still experimenting with how to make them faster. The results look correct, but are 10-100x slower for some small examples. Performance may be better as dataframes get bigger.

- Using `n_cores=0` calls the underlying pandas code directly, so the interface is just a wrapper. Using `n_cores=1` creates a multiprocessing pool with a single worker, so the code runs in parallel (that is, not on the main process), but may not yield much speed improvement apart from not blocking the main process. This may be useful in some GUI apps.

- We recommend using only the object-oriented approach. You can use the internal functions `apply_on_df_parallel`, `apply_on_df_col_parallel`, `apply_on_series_parallel` and `apply_on_groupby_parallel`, but doing so usually adds unnecessary complexity to the code.

- You can omit the `n_cores` argument in all the constructors. If it is not set, it defaults to the value of the environment variable `PANDAS_PARALLEL_APPLY_N_CORES`. If that is also not set, it defaults to 0 (serial apply).
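
To make the disclaimer about return-value conversion more concrete, here is a small sketch using plain pandas only (no parallelism, and nothing from this package). It uses `DataFrame.apply` with `axis=1` as the illustration; the dataframe and the helper functions `row_total`, `row_as_list` and `row_as_series` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})


def row_total(row):
    # Returns one scalar per row.
    return row["a"] + row["b"]


def row_as_list(row):
    # Returns a list per row (3 items, deliberately not the column count).
    return [int(row["a"]), int(row["b"]), int(row["a"] + row["b"])]


def row_as_series(row):
    # Returns a labelled Series per row.
    return pd.Series({"total": row["a"] + row["b"], "product": row["a"] * row["b"]})


# Scalars are collected into a Series.
scalar_result = df.apply(row_total, axis=1)

# Lists are kept as-is, giving a Series whose values are lists.
list_result = df.apply(row_as_list, axis=1)

# Series are stitched together into a DataFrame, one column per label.
series_result = df.apply(row_as_series, axis=1)

# With result_type="expand", list-like results become columns instead.
expanded = df.apply(row_as_list, axis=1, result_type="expand")

for name, result in [
    ("scalar", scalar_result),
    ("list", list_result),
    ("series", series_result),
    ("expanded", expanded),
]:
    print(name, type(result).__name__)
```

A parallel wrapper has to split the input, run the function on each chunk and then merge the partial results so that they match what pandas would have produced, which is why some of these return types may not be covered.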

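The `n_cores` rules from the disclaimers above can be sketched with the standard library alone. This is not the package's actual implementation: `resolve_n_cores` and `apply_values` are hypothetical helpers written for this illustration, they work on a plain list rather than a dataframe, and parsing the environment variable with `int()` is an assumption.

```python
import os
from multiprocessing import Pool

# Environment variable name as given in this README.
ENV_VAR = "PANDAS_PARALLEL_APPLY_N_CORES"


def resolve_n_cores(n_cores=None):
    """Hypothetical helper: explicit argument, else env variable, else 0."""
    if n_cores is not None:
        return int(n_cores)
    # Assumption: the variable holds a plain integer string.
    return int(os.environ.get(ENV_VAR, "0"))


def apply_values(values, fn, n_cores=None):
    """Hypothetical stand-in for a parallel apply over a plain list."""
    n_cores = resolve_n_cores(n_cores)
    if n_cores == 0:
        # Serial path: no pool is created, fn runs in the calling process.
        return [fn(v) for v in values]
    # Pool path: even with n_cores=1, fn runs in a separate worker process.
    with Pool(processes=n_cores) as pool:
        return pool.map(fn, values)
```

With the environment variable unset, `apply_values([-1, 2, -3], abs)` takes the serial path, while `apply_values([-1, 2, -3], abs, n_cores=1)` sends the work to a single worker process. As with any `multiprocessing` code, the function passed in must be picklable, and on platforms that spawn new processes the call should sit under an `if __name__ == "__main__":` guard.
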
That's all.


