
task-log-6118d51c44d2f6d2ffffffffffffffffffffffff01000000

The document logs the successful connection to a MongoDB database and the initialization of an AWS S3 client using an IAM role. It details the processing of a PDF document using AWS Textract for OCR, including various configurations and the handling of errors related to provisioned throughput limits. The log captures multiple stages of data processing, including text extraction and chunking operations.

Uploaded by tushar.kanhe

2025-02-13 23:12:33.131 | DEBUG |
src.database_utils.database_utils:__init__:42 - MongoDB connection successful with
URI: mongodb+srv://sentineldbuser:[email protected]/
2025-02-13 23:12:33.133 | INFO |
src.database_utils.database_utils:create_mongo_index:128 - Index status_batch_id
already exists on recipeactions.
2025-02-13 23:12:33.135 | INFO |
src.database_utils.database_utils:create_mongo_index:128 - Index status already
exists on recipeactions.
2025-02-13 23:12:33.137 | INFO |
src.database_utils.database_utils:create_mongo_index:128 - Index _id_status already
exists on recipeactionruns.
2025-02-13 23:12:33.262 | INFO | src.aws.s3_helper:_initialize_s3_client:97 -
Using EC2 IAM Role for S3 operations in region us-east-1
2025-02-13 23:12:33.262 |INFO | ray:process_item_ray:909 | data_prepro: {'ocr':
{'enabled': True, 'method': 'textract', 'force_recreate': True, 'extract_images':
True, 'extract_tables': False, 'extract_layouts': False, 'extract_forms': False,
'credentials': {'type': 'aws', 'properties': {'aws_credential_type': 'arn_role',
'aws_region': 'us-east-1', 'aws_external_id': '679cd3091fed3f5d66e4aeef',
'aws_iam_role_arn': 'arn:aws:iam::120569633920:role/karini-legal-role'}}},
'pii_masking': {'enabled': False, 'entities': {}, 'force_recreate': True,
'credentials': {'type': 'aws', 'properties': {'aws_credential_type': 'arn_role',
'aws_region': 'us-east-1', 'aws_external_id': '679cd3091fed3f5d66e4aeef',
'aws_iam_role_arn': 'arn:aws:iam::120569633920:role/karini-legal-role'}}},
'chunking': {'type': 'recursive', 'tokenizer': 'cl100k_base', 'overlap': 50,
'size': 525, 'force_recreate': True}, 'preprocessing_setting':
{'custom_lambda_preprocessor': {}, 'custom_metadata_extraction': {}}}
2025-02-13 23:12:33.393 | INFO |
src.services.components.preprocessing.preprocessor:__init__:85 - Using Assumed role
arn:aws:iam::120569633920:role/karini-legal-role with external id
679cd3091fed3f5d66e4aeef for AWS Textract
2025-02-13 23:12:33.395 | INFO |
src.services.components.chunking.chunking:__init__:48 - Chunking event: {'dataset':
{'dataset_id': '67aeec0db172dd7f07d39470', 'dataset_type': 'text',
'dataset_sources': [{'id': 'FLW_1', 'recursive': True, 'connector_type': 'aws',
'dataset_connector_id': '67aeec17b172dd7f07d394cc', 'credentials': {'type': 'aws',
'properties': {'aws_credential_type': 'arn_role', 'aws_region': 'us-east-1',
'aws_external_id': '679cd3091fed3f5d66e4aeef', 'aws_iam_role_arn':
'arn:aws:iam::120569633920:role/karini-legal-role'}}, 'path':
's3://karini-legal-v2-docs/sample/', 'filters': {'filter': []}, 'source_type': 's3'}],
'preprocessing': {'ocr': {'enabled': True, 'method': 'textract', 'force_recreate':
True, 'extract_images': True, 'extract_tables': False, 'extract_layouts': False,
'extract_forms': False, 'credentials': {'type': 'aws', 'properties':
{'aws_credential_type': 'arn_role', 'aws_region': 'us-east-1', 'aws_external_id':
'679cd3091fed3f5d66e4aeef', 'aws_iam_role_arn':
'arn:aws:iam::120569633920:role/karini-legal-role'}}}, 'pii_masking': {'enabled':
False, 'entities': {}, 'force_recreate': True, 'credentials': {'type': 'aws',
'properties': {'aws_credential_type': 'arn_role', 'aws_region': 'us-east-1',
'aws_external_id': '679cd3091fed3f5d66e4aeef', 'aws_iam_role_arn':
'arn:aws:iam::120569633920:role/karini-legal-role'}}}, 'chunking': {'type':
'recursive', 'tokenizer': 'cl100k_base', 'overlap': 50, 'size': 525,
'force_recreate': True}, 'preprocessing_setting': {'custom_lambda_preprocessor':
{}, 'custom_metadata_extraction': {}}}, 'force_preprocessing': False, 'embeddings':
{'credentials': {'type': 'aws', 'properties': {'aws_credential_type': 'arn_role',
'aws_region': 'us-east-1', 'aws_external_id': '679cd3091fed3f5d66e4aeef',
'aws_iam_role_arn': 'arn:aws:iam::120569633920:role/karini-legal-role'}},
'dimension': 1024, 'modelid': 'amazon.titan-embed-text-v2:0', 'modelprovider':
'amazon-bedrock', 'endpoint_id': '67a0892abb1320ca3b3a2c37', 'force_recreate':
True, 'parameters': {'modelprovider': 'amazon-bedrock', 'tokenizer': 'cl100k_base',
'dimension': 1024, 'max_tokens': 8000, 'pricing': {'input': {'tokens': 1000,
'currency': '$', 'value': 2e-05}}, 'credentials': {'type': 'aws', 'enabled': False,
'aws': {'credential_type': 'arn_role'}}, 'modelid':
'amazon.titan-embed-text-v2:0'}}, 'use_local_s3_storage': False}}
2025-02-13 23:12:33.395 | INFO |
src.services.components.chunking.chunking:__init__:51 - Chunking properties:
{'type': 'recursive', 'tokenizer': 'cl100k_base', 'overlap': 50, 'size': 525,
'force_recreate': True}
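Note on the chunking properties above: a size of 525 tokens with an overlap of 50 means consecutive chunks share 50 tokens. The sketch below illustrates only that size/overlap arithmetic with a plain sliding window. It is not the pipeline's 'recursive' splitter, and the function name `chunk_tokens` is hypothetical. It assumes the text has already been tokenized (for example with cl100k_base); any list of tokens works.

```python
def chunk_tokens(tokens, size=525, overlap=50):
    """Split a token list into windows of at most `size` tokens.

    Each window shares `overlap` tokens with the previous one.
    Hypothetical helper: a fixed sliding window, not a recursive splitter.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks
```

With the logged settings the window advances 475 tokens at a time, so a 1000-token input yields two chunks of 525 tokens each. For a rough experiment without a tokenizer, `chunk_tokens(text.split())` treats whitespace-separated words as tokens.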
2025-02-13 23:12:33.890 | INFO |
src.services.components.connectors:get_data_connector:60 - Initializing connector
for type: aws
2025-02-13 23:12:34.007 | INFO | src.aws.s3_helper:_initialize_s3_client:76 -
Assumed role arn:aws:iam::120569633920:role/karini-legal-role for region us-east-1
2025-02-13 23:12:34.008 |INFO | ray:process_item_ray:944 | sourceref:
s3://karini-legal-v2-docs/sample/BHC/2005/BHC_2005_Mr._Mangesh_Govind_Patane_vs_Mr._Nagesh_Vasant_Kadam___Others_2005_BHC-AS_6620.pdf
2025-02-13 23:12:36.054 | INFO |
src.database_utils.database_utils:update_dataset_items:650 - Matched documents:1
2025-02-13 23:12:36.054 | INFO |
src.database_utils.database_utils:update_dataset_items:651 - Modified documents:1
2025-02-13 23:12:36.054 | INFO |
src.services.pipelines.data_ingestion_ray:get_processed_data_standalone:640 - ---
Processing data using OCR options ---:textract
2025-02-13 23:12:36.054 | INFO |
src.services.components.preprocessing.preprocessor:process:445 - Using Textract for
OCR Processing (without table extraction)
2025-02-13 23:12:36.054 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_with_textract:179 -
Extracting text from pdf page images using Textract
2025-02-13 23:12:36.054 | INFO |
src.services.components.preprocessing.preprocessor:pdf_to_images:157 - Processing
PDF with 14 pages
2025-02-13 23:12:39.152 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:12:47.470 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:12:51.466 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:12:53.001 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:12:56.539 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:13:21.549 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:13:23.271 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:13:33.366 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:13:41.177 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_10, started 140067932726848)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_10, started 140067932726848)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_10, started
140067932726848)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_10, started
140067932726848)>
│ │ │ └ (<weakref at 0x7f6442b5a2a0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_10, started 140067932726848)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_10, started 140067932726848)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b62060>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b62060>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b62060>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b328e0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b62060>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:41.195 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 5 retries left. Waiting
0.5 seconds...
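Note on the retry records in this log: the waits double between attempts (0.5 seconds with 5 retries left, then 1.0 seconds with 4 left), which is consistent with exponential backoff. The sketch below shows that pattern under stated assumptions. It is not the code of `func_with_retries` in pipeline_utils.py, which the log does not show. `ThrottledError` and `call_with_backoff` are hypothetical names, and the random jitter is an addition that the log gives no evidence of.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error such as
    ProvisionedThroughputExceededException."""


def call_with_backoff(func, *args, retries=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, jitter=random.random, **kwargs):
    """Call func, retrying on ThrottledError with doubling delays plus jitter."""
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except ThrottledError:
            if attempt == retries:
                raise  # retry budget exhausted; surface the last error
            # Wait the capped delay plus up to 100% random jitter, so that
            # parallel workers do not all retry at the same instant.
            sleep(min(delay, max_delay) * (1.0 + jitter()))
            delay *= 2
```

Because the failing calls here come from several thread-pool workers at once, per-call backoff alone may not be enough. Lowering the number of concurrent DetectDocumentText calls, or enabling botocore's own retry handling through `botocore.config.Config(retries={...})`, are options worth checking against the account's Textract quota.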
2025-02-13 23:13:41.729 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_10, started 140067932726848)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_10, started 140067932726848)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_10, started
140067932726848)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_10, started
140067932726848)>
│ │ │ └ (<weakref at 0x7f6442b5a2a0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_10, started 140067932726848)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_10, started 140067932726848)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b62060>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b62060>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b62060>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b328e0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b62060>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:41.747 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 4 retries left. Waiting
1.0 seconds...
2025-02-13 23:13:42.167 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>
│ │ │ └ (<weakref at 0x7f6442aefce0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b331a0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>
> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:42.185 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 5 retries left. Waiting
0.5 seconds...
2025-02-13 23:13:42.729 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>
│ │ │ └ (<weakref at 0x7f6442aefce0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b331a0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:42.746 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 4 retries left. Waiting
1.0 seconds...
2025-02-13 23:13:42.890 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_11, started
140067922236992)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_11, started
140067922236992)>
│ │ │ └ (<weakref at 0x7f6442b5ac50; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b42600>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b42600>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b42600>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b33420>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b42600>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:42.897 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 5 retries left. Waiting
0.5 seconds...
2025-02-13 23:13:43.239 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_2, started 140068563396160)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_2, started 140068563396160)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_2, started
140068563396160)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_2, started
140068563396160)>
│ │ │ └ (<weakref at 0x7f6442b3d260; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_2, started 140068563396160)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_2, started 140068563396160)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272c0>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272c0>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272c0>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b30e00>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272c0>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:43.260 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 5 retries left. Waiting
0.5 seconds...
2025-02-13 23:13:43.432 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_11, started
140067922236992)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_11, started
140067922236992)>
│ │ │ └ (<weakref at 0x7f6442b5ac50; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b42600>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b42600>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b42600>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b33420>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b42600>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:43.438 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 4 retries left. Waiting
1.0 seconds...
2025-02-13 23:13:43.793 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>
│ │ │ └ (<weakref at 0x7f6442aefce0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b331a0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:43.809 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 3 retries left. Waiting
2.0 seconds...
2025-02-13 23:13:43.810 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_2, started 140068563396160)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_2, started 140068563396160)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_2, started
140068563396160)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_2, started
140068563396160)>
│ │ │ └ (<weakref at 0x7f6442b3d260; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_2, started 140068563396160)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_2, started 140068563396160)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272c0>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272c0>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272c0>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b30e00>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272c0>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:43.829 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 4 retries left. Waiting
1.0 seconds...
2025-02-13 23:13:44.469 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_11, started
140067922236992)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_11, started
140067922236992)>
│ │ │ └ (<weakref at 0x7f6442b5ac50; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b42600>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b42600>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b42600>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b33420>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b42600>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:44.477 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 3 retries left. Waiting
2.0 seconds...
2025-02-13 23:13:44.732 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:13:45.855 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>
│ │ │ └ (<weakref at 0x7f6442aefce0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b331a0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:45.871 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 2 retries left. Waiting
4.0 seconds...
2025-02-13 23:13:46.504 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_11, started
140067922236992)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_11, started
140067922236992)>
│ │ │ └ (<weakref at 0x7f6442b5ac50; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_11, started 140067922236992)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b42600>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b42600>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b42600>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b33420>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b42600>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:46.508 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 2 retries left. Waiting
4.0 seconds...
2025-02-13 23:13:46.980 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>
│ │ │ └ (<weakref at 0x7f6442b3dad0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b30fe0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:47.009 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 5 retries left. Waiting
0.5 seconds...
2025-02-13 23:13:47.208 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:13:47.530 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>
│ │ │ └ (<weakref at 0x7f6442b3dad0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b30fe0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:47.549 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 4 retries left. Waiting
1.0 seconds...
2025-02-13 23:13:48.594 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>
│ │ │ └ (<weakref at 0x7f6442b3dad0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b30fe0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:48.615 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 3 retries left. Waiting
2.0 seconds...
2025-02-13 23:13:49.920 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>
│ │ │ └ (<weakref at 0x7f6442aefce0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b331a0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:49.938 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 1 retries left. Waiting
8.0 seconds...
2025-02-13 23:13:50.601 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>
│ │ │ └ (<weakref at 0x7f6442b3f0b0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b31580>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:50.620 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 5 retries left. Waiting
0.5 seconds...
2025-02-13 23:13:50.659 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>
│ │ │ └ (<weakref at 0x7f6442b3dad0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b30fe0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:50.685 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 2 retries left. Waiting
4.0 seconds...
2025-02-13 23:13:51.174 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>
│ │ │ └ (<weakref at 0x7f6442b3f0b0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b31580>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:51.194 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 4 retries left. Waiting
1.0 seconds...
2025-02-13 23:13:52.210 | INFO |
src.services.components.preprocessing.preprocessor:extract_text_from_image:190 -
Got 200 Textract response
2025-02-13 23:13:52.238 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>
│ │ │ └ (<weakref at 0x7f6442b3f0b0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b31580>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:52.255 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 3 retries left. Waiting
2.0 seconds...
2025-02-13 23:13:54.297 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>
│ │ │ └ (<weakref at 0x7f6442b3f0b0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b31580>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:54.312 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 2 retries left. Waiting
4.0 seconds...
2025-02-13 23:13:54.733 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>
│ │ │ └ (<weakref at 0x7f6442b3dad0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b30fe0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:54.752 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 1 retries left. Waiting
8.0 seconds...
2025-02-13 23:13:57.995 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_0, started
140068573886016)>
│ │ │ └ (<weakref at 0x7f6442aefce0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_0, started 140068573886016)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b60560>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b331a0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b60560>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:58.013 | ERROR |
src.services.pipelines.pipeline_utils:func_with_retries:309 - Function:
extract_text_from_image
Failed despite best efforts after 5 tries.
args: (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\
x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s\x00\x00&s\x01\xf3lu\n\x00\x04L\
xc0IDATx\x9c\xec\xbdI\xac\xb4[U\xff/\x97\x80F\x140"\\\xa2\xde\x10A\x1a\xf5jP\x14\
x10\xf0gb\x8c\xd82\x91\xc4\x04\x1c93\x9a0\xc0\x84\t2\x10\x13g\xc6\x89ht`\x8c\x8e41\
x1a\x82\x11\x8c\xd6\xa9\xaa\xa7\x9e\xea\xfb\xbe\xef\xfb\xbe{\xaa;\xa7\xfe\xdf_\xad\
xff\xd9\xbf\xfdV\x9d\xf7\xdc\x17\xee\xfb\x9e\xf6\xfb\x19<\xd9\xb5\x9f\xdd\xae\xfd\
xd4d\xad\xbd\x, kwargs: no kwargs
Exception: ProvisionedThroughputExceededException('An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded')
2025-02-13 23:13:58.352 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>
│ │ │ └ (<weakref at 0x7f6442b3f0b0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b31580>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:13:58.368 | WARNING |
src.services.pipelines.pipeline_utils:func_with_retries:313 - Retrying
extract_text_from_image after exception: ProvisionedThroughputExceededException('An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded'). 1 retries left. Waiting
8.0 seconds...
2025-02-13 23:14:02.795 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_3, started
140068345869888)>
│ │ │ └ (<weakref at 0x7f6442b3dad0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_3, started 140068345869888)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b272f0>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b30fe0>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b272f0>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:14:02.814 | ERROR |
src.services.pipelines.pipeline_utils:func_with_retries:309 - Function:
extract_text_from_image
Failed despite best efforts after 5 tries.
args: (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\
x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s\x00\x00&s\x01\xf3lu\n\x00\x05\
x02\xe2IDATx\x9c\xec\x9dI\xa8m\xdbU\xf7\xf3^^\x12\x93`$\x88\x1a\x1b&BT\xe2\x13\
x14,\x82\xc6\x8e\xcd(\xd8\x11\xec\xda\xd1\x8e (\x88\xda1B \x8aF\x11\x91t\xfc\x04! \
x84 Q4A\xbb\xbbZ{\xed\xba\xae\xeb\xba\xae\xab\xb5\xabs\xf6\xf7w\x8f\xef\xcco\xde\
xbd\xcf9\xf7\xbe\xf7\xee=\xe5\xff\xd7X\xcc=\xd7,\xc7\\\xbb3\xc6\x1cc|\xe8@\x08!\
x84\x10B\x08!\x84\x10, kwargs: no kwargs
Exception: ProvisionedThroughputExceededException('An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded')
2025-02-13 23:14:06.427 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_from_image:199 -
Error in extract_text_from_image: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1032, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f65cb835e40>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1075, in
_bootstrap_inner
self.run()
│ └ <FunctionWrapper at 0x7f6498607b50 for function at 0x7f65cb835b20>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 151, in __wrap_threading_run
return call_wrapped(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method Thread.run of <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>>
File "/home/ray/anaconda3/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-7_5, started
140068324890176)>
│ │ │ └ (<weakref at 0x7f6442b3f0b0; to 'ThreadPoolExecutor' at
0x7f645cc31b20>, <_queue.SimpleQueue object at 0x7f645cb96d40>, None,...
│ │ └ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
│ └ <function _worker at 0x7f65c90f25c0>
└ <Thread(ThreadPoolExecutor-7_5, started 140068324890176)>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 93,
in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7f65c90f3c40>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ │ │ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\
x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
│ │ └ <concurrent.futures.thread._WorkItem object at
0x7f6442b40e60>
│ └ <function
ThreadingInstrumentor.__wrap_thread_pool_submit.<locals>.wrapped_func at
0x7f6442b31580>
└ <concurrent.futures.thread._WorkItem object at 0x7f6442b40e60>
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:14:06.446 | ERROR |
src.services.pipelines.pipeline_utils:func_with_retries:309 - Function:
extract_text_from_image
Failed despite best efforts after 5 tries.
args: (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\
x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s\x00\x00&s\x01\xf3lu\n\x00\x04-\
x93IDATx\x9c\xec\xbdI\xa8u\xdbU\xf7\x9d\x9b\xeaZ\x10\x8964\xd7\x10c$\t\x12\x83\x06\
xc5\x84{\xb5a\x15T\xc4\x8eb\xeb\x15T\xb0!\x08\x12\x10l\xa9 (\xd8\xb2!\xdaPP\xb1%D\
xd0\x8e\xda\x08\x12v\xb5\xf6\xdau]\xd7u]WkW\xe7\xec\xef\xff\xee\xf1\x9d\xf9\xceg\
xefs\xce}\xee\xbd\xcfs\xca\xff\xaf\xb1\x98{\xae1\xab1\xd7\xee\x8c1\xe7\x18\x1f:\
x10B\x08!\x84\x10, kwargs: no kwargs
Exception: ProvisionedThroughputExceededException('An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded')
2025-02-13 23:14:06.447 | ERROR |
src.services.components.preprocessing.preprocessor:extract_text_with_textract:225 -
Error in extract_text_with_textract: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/_private/workers/
default_worker.py", line 297, in <module>
worker.main_loop()
│ └ <function Worker.main_loop at 0x7f65c7c458a0>
└ <ray._private.worker.Worker object at 0x7f65c7e1e840>
File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/_private/worker.py",
line 935, in main_loop
self.core_worker.run_task_loop()
│ │ └ <method 'run_task_loop' of 'ray._raylet.CoreWorker' objects>
│ └ <ray._raylet.CoreWorker object at 0x7f65c7c70f40>
└ <ray._private.worker.Worker object at 0x7f65c7e1e840>

File "/app/src/services/pipelines/data_ingestion_ray.py", line 1010, in
process_item_ray
) = get_processed_data_standalone(
└ <function get_processed_data_standalone at 0x7f646ee1b880>

File "/app/src/services/pipelines/data_ingestion_ray.py", line 693, in
get_processed_data_standalone
processed_data, processing_stats = processor.process(data)
│ │ └ b'%PDF-1.3\n3 0 obj\
n<</Type /Page\n/Parent 1 0 R\n/Resources 2 0 R\n/Contents 4 0 R>>\nendobj\n4 0
obj\n<</Filter /FlateDeco...
│ └ <function PreProcessor.process
at 0x7f649f11da80>

<src.services.components.preprocessing.preprocessor.PreProcessor object at
0x7f646e5d2420>

File "/app/src/services/components/preprocessing/preprocessor.py", line 446, in
process
pages = self.extract_text_with_textract(raw_bytes, page_limit)
│ │ │ └ None
│ │ └ b'%PDF-1.3\n3 0 obj\n<</Type /Page\
n/Parent 1 0 R\n/Resources 2 0 R\n/Contents 4 0 R>>\nendobj\n4 0 obj\n<</Filter
/FlateDeco...
│ └ <function PreProcessor.extract_text_with_textract at
0x7f649f11d4e0>
└ <src.services.components.preprocessing.preprocessor.PreProcessor
object at 0x7f646e5d2420>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 202, in
extract_text_with_textract
results = parallel_thread(extract_text_from_image, images)
│ │ └ <generator object
PreProcessor.pdf_to_images at 0x7f64438b0480>
│ └ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>
└ <function parallel_thread at 0x7f64a504e160>

File "/app/src/services/pipelines/pipeline_utils.py", line 212, in
parallel_thread
return list(executor.map(processing_function, inputs))
│ │ │ └ <generator object
PreProcessor.pdf_to_images at 0x7f64438b0480>
│ │ └ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>
│ └ <function Executor.map at 0x7f65cb6c2980>
└ <concurrent.futures.thread.ThreadPoolExecutor object at
0x7f645cc31b20>

File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 619,
in result_iterator
yield _result_or_cancel(fs.pop())
│ │ └ <method 'pop' of 'list' objects>
│ └ [<Future at 0x7f6442b421b0 state=finished returned
list>, <Future at 0x7f6442b62ba0 state=finished raised ProvisionedThroughp...
└ <function _result_or_cancel at 0x7f65cb6c1e40>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 317,
in _result_or_cancel
return fut.result(timeout)
└ None
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 456,
in result
return self.__get_result()
└ None
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 401,
in __get_result
raise self._exception
└ None
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ └ None
│ └ None
└ None
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:14:06.463 | ERROR |
src.services.components.preprocessing.preprocessor:process:493 - Exception: An
error occurred (ProvisionedThroughputExceededException) when calling the
DetectDocumentText operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/_private/workers/
default_worker.py", line 297, in <module>
worker.main_loop()
│ └ <function Worker.main_loop at 0x7f65c7c458a0>
└ <ray._private.worker.Worker object at 0x7f65c7e1e840>
File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/_private/worker.py",
line 935, in main_loop
self.core_worker.run_task_loop()
│ │ └ <method 'run_task_loop' of 'ray._raylet.CoreWorker' objects>
│ └ <ray._raylet.CoreWorker object at 0x7f65c7c70f40>
└ <ray._private.worker.Worker object at 0x7f65c7e1e840>

File "/app/src/services/pipelines/data_ingestion_ray.py", line 1010, in
process_item_ray
) = get_processed_data_standalone(
└ <function get_processed_data_standalone at 0x7f646ee1b880>

File "/app/src/services/pipelines/data_ingestion_ray.py", line 693, in
get_processed_data_standalone
processed_data, processing_stats = processor.process(data)
│ │ └ b'%PDF-1.3\n3 0 obj\
n<</Type /Page\n/Parent 1 0 R\n/Resources 2 0 R\n/Contents 4 0 R>>\nendobj\n4 0
obj\n<</Filter /FlateDeco...
│ └ <function PreProcessor.process
at 0x7f649f11da80>

<src.services.components.preprocessing.preprocessor.PreProcessor object at
0x7f646e5d2420>

> File "/app/src/services/components/preprocessing/preprocessor.py", line 446, in
process
pages = self.extract_text_with_textract(raw_bytes, page_limit)
│ │ │ └ None
│ │ └ b'%PDF-1.3\n3 0 obj\n<</Type /Page\
n/Parent 1 0 R\n/Resources 2 0 R\n/Contents 4 0 R>>\nendobj\n4 0 obj\n<</Filter
/FlateDeco...
│ └ <function PreProcessor.extract_text_with_textract at
0x7f649f11d4e0>
└ <src.services.components.preprocessing.preprocessor.PreProcessor
object at 0x7f646e5d2420>

File "/app/src/services/components/preprocessing/preprocessor.py", line 202, in
extract_text_with_textract
results = parallel_thread(extract_text_from_image, images)
│ │ └ <generator object
PreProcessor.pdf_to_images at 0x7f64438b0480>
│ └ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>
└ <function parallel_thread at 0x7f64a504e160>

File "/app/src/services/pipelines/pipeline_utils.py", line 212, in
parallel_thread
return list(executor.map(processing_function, inputs))
│ │ │ └ <generator object
PreProcessor.pdf_to_images at 0x7f64438b0480>
│ │ └ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>
│ └ <function Executor.map at 0x7f65cb6c2980>
└ <concurrent.futures.thread.ThreadPoolExecutor object at
0x7f645cc31b20>

File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 619,
in result_iterator
yield _result_or_cancel(fs.pop())
│ │ └ <method 'pop' of 'list' objects>
│ └ [<Future at 0x7f6442b421b0 state=finished returned
list>, <Future at 0x7f6442b62ba0 state=finished raised ProvisionedThroughp...
└ <function _result_or_cancel at 0x7f65cb6c1e40>
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 317,
in _result_or_cancel
return fut.result(timeout)
└ None
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 456,
in result
return self.__get_result()
└ None
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 401,
in __get_result
raise self._exception
└ None
File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/thread.py", line 59,
in run
result = self.fn(*self.args, **self.kwargs)
│ │ └ None
│ └ None
└ None
File
"/home/ray/anaconda3/lib/python3.12/site-packages/opentelemetry/instrumentation/
threading/__init__.py", line 170, in wrapped_func
return original_func(*func_args, **func_kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\
x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae7b00>

File "/app/src/services/pipelines/pipeline_utils.py", line 297, in
func_with_retries
return f(*args, **kwargs)
│ │ └ {}
│ └ (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\
x0bl\x08\x02\x00\x00\x00P@\xc0\x0e\x00\x00\x00\tpHYs\x00\x00&s...
└ <function
PreProcessor.extract_text_with_textract.<locals>.extract_text_from_image at
0x7f6442ae77e0>

File "/app/src/services/components/preprocessing/preprocessor.py", line 184, in
extract_text_from_image
res = self.textract_client.detect_document_text(Document={"Bytes": image})
│ │ │ └ b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x00\x00\tpHYs\x00\x00&s\...
│ │ └ <function
ClientCreator._create_api_method.<locals>._api_call at 0x7f645cc3a660>
│ └ <botocore.client.Textract object at 0x7f645cb70950>
└ <src.services.components.preprocessing.preprocessor.PreProcessor object
at 0x7f646e5d2420>

File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
569, in _api_call
return self._make_api_call(operation_name, kwargs)
│ │ │ └ {'Document': {'Bytes': b'\x89PNG\
r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x08\x12\x00\x00\x0bl\x08\x02\x00\x00\x00P@\xc0\
x0e\x00\x...
│ │ └ 'DetectDocumentText'
│ └ <function BaseClient._make_api_call at 0x7f6547ee0180>
└ <botocore.client.Textract object at 0x7f645cb70950>
File "/home/ray/anaconda3/lib/python3.12/site-packages/botocore/client.py", line
1023, in _make_api_call
raise error_class(parsed_response, operation_name)
│ │ └ 'DetectDocumentText'
│ └ {'Error': {'Message': 'Provisioned rate exceeded', 'Code':
'ProvisionedThroughputExceededException'}, 'ResponseMetadata': {'R...
└ <class 'botocore.errorfactory.ProvisionedThroughputExceededException'>

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:14:06.479 | ERROR |
src.services.pipelines.data_ingestion_ray:get_processed_data_standalone:840 - Error
in get_processed_data_standalone: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
Traceback (most recent call last):

File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/_private/workers/
default_worker.py", line 297, in <module>
worker.main_loop()
│ └ <function Worker.main_loop at 0x7f65c7c458a0>
└ <ray._private.worker.Worker object at 0x7f65c7e1e840>
File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/_private/worker.py",
line 935, in main_loop
self.core_worker.run_task_loop()
│ │ └ <method 'run_task_loop' of 'ray._raylet.CoreWorker' objects>
│ └ <ray._raylet.CoreWorker object at 0x7f65c7c70f40>
└ <ray._private.worker.Worker object at 0x7f65c7e1e840>

File "/app/src/services/pipelines/data_ingestion_ray.py", line 1010, in
process_item_ray
) = get_processed_data_standalone(
└ <function get_processed_data_standalone at 0x7f646ee1b880>

> File "/app/src/services/pipelines/data_ingestion_ray.py", line 785, in
get_processed_data_standalone
raise ValueError(message)
└ 'An error occurred (ProvisionedThroughputExceededException)
when calling the DetectDocumentText operation: Provisioned rate e...

ValueError: An error occurred (ProvisionedThroughputExceededException) when calling
the DetectDocumentText operation: Provisioned rate exceeded
2025-02-13 23:14:06.480 | ERROR |
src.services.pipelines.data_ingestion_ray:get_processed_data_standalone:841 -
Traceback (most recent call last):
File "/app/src/services/pipelines/data_ingestion_ray.py", line 785, in
get_processed_data_standalone
raise ValueError(message)
ValueError: An error occurred (ProvisionedThroughputExceededException) when calling
the DetectDocumentText operation: Provisioned rate exceeded

2025-02-13 23:14:06.480 | INFO |
src.services.pipelines.data_ingestion_ray:get_processed_data_standalone:843 - Error
in get_processed_data_standalone: An error occurred
(ProvisionedThroughputExceededException) when calling the DetectDocumentText
operation: Provisioned rate exceeded
2025-02-13 23:14:06.566 | INFO |
src.database_utils.database_utils:update_dataset_items:650 - Matched documents:1
2025-02-13 23:14:06.567 | INFO |
src.database_utils.database_utils:update_dataset_items:651 - Modified documents:1
2025-02-13 23:14:06.567 |ERROR | ray:process_item_ray:1206 | Error in Ray process
item ray: An error occurred (ProvisionedThroughputExceededException) when calling
the DetectDocumentText operation: Provisioned rate exceeded
2025-02-13 23:14:06.567 |INFO | ray:process_item_ray:1207 | Traceback (most
recent call last):
File "/app/src/services/pipelines/data_ingestion_ray.py", line 1010, in
process_item_ray
) = get_processed_data_standalone(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/src/services/pipelines/data_ingestion_ray.py", line 785, in
get_processed_data_standalone
raise ValueError(message)
ValueError: An error occurred (ProvisionedThroughputExceededException) when calling
the DetectDocumentText operation: Provisioned rate exceeded

2025-02-13 23:14:06.567 |INFO | ray:process_item_ray:1210 | Error in Ray process
item ray: An error occurred (ProvisionedThroughputExceededException) when calling
the DetectDocumentText operation: Provisioned rate exceeded
