starting with Dask arrays
- Dask implements part of the NumPy ndarray API using blocked algorithms: the large array is cut up into many small arrays.
- how those blocked algorithms coordinate with each other is described by Dask graphs.
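As a minimal sketch of how a blocked algorithm works (the shape and chunk size below are arbitrary choices, not anything prescribed by Dask): operations only build a task graph, and the per-block work happens when .compute() is called.

```python
import dask.array as da

# A 1000x1000 array of ones, stored as a 2x2 grid of 500x500 blocks.
x = da.ones((1000, 1000), chunks=(500, 500))

# No computation happens here: sum() just extends the task graph.
total = x.sum()

# .compute() walks the graph: each block is summed, then the
# partial sums are combined into the final result.
print(total.compute())  # 1000000.0
```
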
- importing the dask package:
you can import individual components of the dask package to suit your needs.
for example, you can import the core functionality as follows:
import dask # Core functionality
import dask.array as da # Dask arrays
import dask.dataframe as dd # Dask DataFrames
import dask.bag as db # Dask bags for unstructured data
if you need parallel computing, you should import dask.distributed:
from dask.distributed import Client # For distributed computing
importing dask_ml is a sensible choice when you are doing machine learning:
import dask_ml # Dask-ML integration (needs separate installation)
a simple example demonstrates how to compute the mean of a column as follows:
# Start a local cluster
from dask.distributed import Client
client = Client() # Creates a local cluster
# Work with Dask collections
df = dd.read_csv('large_file.csv') # Create Dask DataFrame
result = df.groupby('column').mean().compute() # Compute result
Dask has several optional components that need to be installed separately.
4. generating a new array with dask.array.
the numbers from 0 to 49999 are put into an array of shape 250*200, which is then split into 5*4 chunks of size 50*50:
250/50 = 5
200/50 = 4
import numpy as np
import dask.array as da

data = np.arange(50000).reshape(250, 200)
a = da.from_array(data, chunks=(50, 50))
print(a.chunks)
# ((50, 50, 50, 50, 50), (50, 50, 50, 50))
print(a.blocks[2, 2])
# dask.array<blocks, shape=(50, 50), dtype=int64, chunksize=(50, 50), chunktype=numpy.ndarray>
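Building on the array above, a reduction such as the mean is computed per 50x50 block and the partial results are then combined; a quick sanity check against plain NumPy:

```python
import numpy as np
import dask.array as da

data = np.arange(50000).reshape(250, 200)
a = da.from_array(data, chunks=(50, 50))

# The blocked mean agrees with NumPy's mean over the whole array.
assert a.mean().compute() == data.mean()
print(a.mean().compute())  # 24999.5
```
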