
System Design

The document discusses several AWS big data services: Amazon Athena for interactive querying of data stored in Amazon S3, AWS Glue for data cataloging and metadata management, and Amazon Redshift for data warehousing. It also covers Spark and YARN as distributed computing frameworks, and algorithms such as TF-IDF for text analysis. Finally, it mentions the Poisson distribution and exponential functions as tools for modeling event occurrence over time.














[Figure: AWS services shown: Amazon Athena, AWS Glue, Amazon Simple Storage Service (Amazon S3), Amazon Redshift]




























































































$$\frac{\text{Term Frequency}}{\text{Document Frequency}}$$
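The ratio above can be sketched in a few lines of Python. This is a minimal illustration rather than the deck's own implementation: the function name `tf_over_df`, the whitespace tokenizer, and the sample documents are all assumptions made for the example. Many TF-IDF variants use a logarithmic inverse document frequency instead of the plain division shown here.

```python
from collections import Counter


def tf_over_df(docs):
    """Score every term in every document as term frequency / document frequency.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: in how many documents does each term appear at least once?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency: raw count inside this document
        scores.append({term: count / df[term] for term, count in tf.items()})
    return scores


# Made-up sample documents, for illustration only.
docs = ["the cat sat on the mat", "the dog sat", "the bird flew"]
scores = tf_over_df(docs)
```

A term that appears in every document, such as "the", is divided by a large document frequency, so a term unique to one document ends up with a higher score there.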




$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$
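The PageRank formula can be evaluated by repeated substitution. The sketch below assumes that T1..Tn are the pages linking to A, that C(T) is the number of outbound links on T, and that d is a damping factor (0.85 is a commonly used value, not one taken from the slides). The function name `pagerank`, the iteration count, and the three-page graph are hypothetical.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # initial guess for every page
    out_count = {page: len(links[page]) for page in pages}  # C(T)

    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            # Sum contributions from every page t that links to a.
            incoming = sum(pr[t] / out_count[t] for t in pages if a in links[t])
            new_pr[a] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Made-up graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
```

In this small graph, page C collects links from both A and B and ends up with the highest rank.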

















[Figure: cluster diagram labeled (Spark, YARN); three nodes, each with Cache and Tasks]













































$36^6 = 2176782336$
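The surrounding slide text did not survive, so the purpose of this figure is not stated. One plausible reading, assumed here, is that it counts the distinct 6-character strings over a 36-symbol alphabet (digits plus lowercase letters). The sketch below checks the arithmetic under that assumption; `ALPHABET`, `keyspace`, and `encode` are hypothetical names.

```python
import string

# Assumed alphabet: 10 digits + 26 lowercase letters = 36 symbols.
ALPHABET = string.digits + string.ascii_lowercase


def keyspace(length, alphabet=ALPHABET):
    """Number of distinct strings of the given length over the alphabet."""
    return len(alphabet) ** length


def encode(n, length=6, alphabet=ALPHABET):
    """Map an integer in [0, keyspace) to a fixed-length string over the alphabet."""
    if not 0 <= n < keyspace(length, alphabet):
        raise ValueError("n is outside the keyspace")
    base = len(alphabet)
    chars = []
    for _ in range(length):
        n, remainder = divmod(n, base)
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


total = keyspace(6)
```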











$$e^{-\lambda t}$$
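In a Poisson process with rate λ, the exponential term above is the probability that no event occurs in an interval of length t. The sketch below assumes that interpretation, which the slide text itself does not state. The function names and the example rate are hypothetical.

```python
import math


def prob_no_event(rate, t):
    """P(no events in an interval of length t) = e^(-rate * t)."""
    return math.exp(-rate * t)


def prob_at_least_one(rate, t):
    """Complement of the above: P(one or more events in the interval)."""
    return 1.0 - prob_no_event(rate, t)


def poisson_pmf(k, rate, t):
    """P(exactly k events in the interval), with mean rate * t."""
    mean = rate * t
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Illustrative rate: 2 events per unit of time, observed for half a unit.
p_none = prob_no_event(2.0, 0.5)
p_some = prob_at_least_one(2.0, 0.5)
```

Setting k = 0 in `poisson_pmf` reduces it to the exponential term, which is why that term appears alongside the Poisson distribution.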






$$\frac{\text{Term frequency}}{\text{Document frequency}}$$













