docs/hub/datasets-dask.md (+26 lines)
@@ -123,3 +123,29 @@ This is useful when you want to manipulate a subset of the columns or for analyt
# for the filtering and computation and skip the other columns.
df.token_count.mean().compute()
```

## Client

Most features in `dask` are optimized to run on a cluster or with a local `Client`, which launches the parallel computations:

```python
import dask.dataframe as dd
from distributed import Client

if __name__ == "__main__":  # needed for creating new processes
    client = Client()
    df = dd.read_parquet(...)
    ...
```
For local usage, the `Client` uses a Dask `LocalCluster` with multiprocessing by default. You can manually configure the multiprocessing of `LocalCluster` with
Note that if you use the default threaded scheduler locally without `Client`, a DataFrame can become slower after certain operations (more details [here](https://github.com/dask/dask-expr/issues/1181)).
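
For the manual `LocalCluster` configuration mentioned above, here is a minimal sketch of one possible setup. The helper names `plan_workers` and `start_local_client` are hypothetical and only serve this illustration, and the default of two threads per worker is an arbitrary assumption. `n_workers`, `threads_per_worker` and `processes` are `LocalCluster` parameters.

```python
import os


def plan_workers(total_cores, threads_per_worker=2):
    """Split `total_cores` into worker processes of `threads_per_worker` threads each.

    Hypothetical helper for illustration, not part of dask.
    """
    if total_cores < 1 or threads_per_worker < 1:
        raise ValueError("total_cores and threads_per_worker must be positive")
    # Keep at least one worker, even with fewer cores than threads per worker.
    n_workers = max(1, total_cores // threads_per_worker)
    return {"n_workers": n_workers, "threads_per_worker": threads_per_worker}


def start_local_client(threads_per_worker=2):
    """Start a LocalCluster sized from the CPU count and connect a Client to it."""
    # Imported lazily so that `plan_workers` stays usable without `distributed`.
    from distributed import Client, LocalCluster

    layout = plan_workers(os.cpu_count() or 1, threads_per_worker)
    # processes=True runs each worker in its own process (multiprocessing).
    cluster = LocalCluster(processes=True, **layout)
    return Client(cluster)
```

Because the workers are separate processes, `start_local_client()` should be called inside the `if __name__ == "__main__":` guard, like `Client()` in the example above. On an 8-core machine, `plan_workers(8, 2)` yields 4 workers with 2 threads each.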

Find more information on setting up a local or cloud cluster in the [Deploying Dask documentation](https://docs.dask.org/en/latest/deploying.html).