starting with Dask arrays
- Dask implements part of the NumPy ndarray API using blocked algorithms: the large array is cut up into many small arrays.
- how those blocked algorithms coordinate with each other is described by Dask graphs.
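As a minimal sketch of how a blocked algorithm works (the shape and chunk size below are arbitrary choices, not anything prescribed by Dask): operations only build a task graph, and the per-block work happens when .compute() is called.

```python
import dask.array as da

# A 1000x1000 array of ones, stored as a 2x2 grid of 500x500 blocks.
x = da.ones((1000, 1000), chunks=(500, 500))

# No computation happens here: sum() just extends the task graph.
total = x.sum()

# .compute() walks the graph: each block is summed, then the
# partial sums are combined into the final result.
print(total.compute())  # 1000000.0
```
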
- importing the dask package:
you can import individual components of the dask package to suit your needs.
for example, you can import the core functionality as follows:
import dask # Core functionality
import dask.array as da # Dask arrays
import dask.dataframe as dd # Dask DataFrames
import dask.bag as db # Dask bags for unstructured data
if you need parallel computing, you should import dask.distributed:
from dask.distributed import Client # For distributed computing
importing dask_ml is a sensible choice when you are doing machine learning:
import dask_ml # Dask-ML integration (needs separate installation)
a simple example demonstrates how to compute the mean of a column as follows:
# Start a local cluster
from dask.distributed import Client
client = Client() # Creates a local cluster
# Work with Dask collections
df = dd.read_csv('large_file.csv') # Create Dask DataFrame
result = df.groupby('column').mean().compute() # Compute result
Dask has several optional components that need to be installed separately.
4. generating a new array with dask.array.
the numbers from 0 to 49999 are put into an array of shape 250*200, which is then split into 5*4 chunks of size 50*50:
250/50 = 5
200/50 = 4
import numpy as np
import dask.array as da

data = np.arange(50000).reshape(250, 200)
a = da.from_array(data, chunks=(50, 50))
print(a.chunks)
# ((50, 50, 50, 50, 50), (50, 50, 50, 50))
print(a.blocks[2, 2])
# dask.array<blocks, shape=(50, 50), dtype=int64, chunksize=(50, 50), chunktype=numpy.ndarray>
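Building on the array above, a reduction such as the mean is computed per 50x50 block and the partial results are then combined; a quick sanity check against plain NumPy:

```python
import numpy as np
import dask.array as da

data = np.arange(50000).reshape(250, 200)
a = da.from_array(data, chunks=(50, 50))

# The blocked mean agrees with NumPy's mean over the whole array.
assert a.mean().compute() == data.mean()
print(a.mean().compute())  # 24999.5
```
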