Definition ML GCP

Definitions

1. Machine Learning Basics

1.1 Machine Learning Concepts

Supervised Learning - Training models on labeled data (known input-output pairs).

Unsupervised Learning - Finding structure in unlabeled data, e.g. clustering and dimensionality reduction.

Reinforcement Learning - Learning through rewards and penalties received while interacting with an environment.

Common ML Algorithms

Linear Regression, Logistic Regression, Decision Trees.

Random Forest, Gradient Boosting (XGBoost, LightGBM).

Neural Networks and Deep Learning techniques.

1.2 Model Evaluation Metrics

Classification Metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC.

Regression Metrics: RMSE, MSE, MAE, R².

Clustering Metrics: Silhouette Score, Davies-Bouldin Index.
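
A minimal sketch of computing several of these metrics with scikit-learn; the toy labels, predictions, and points below are made up for illustration:

    import numpy as np
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score, mean_squared_error,
                                 mean_absolute_error, r2_score,
                                 silhouette_score, davies_bouldin_score)

    # Classification: true labels, hard predictions, and predicted probabilities.
    y_true, y_pred = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]
    y_prob = [0.2, 0.9, 0.4, 0.1, 0.8]
    print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
          recall_score(y_true, y_pred), f1_score(y_true, y_pred),
          roc_auc_score(y_true, y_prob))

    # Regression: RMSE is the square root of MSE.
    y_true_r, y_pred_r = [3.0, 5.0, 2.5], [2.8, 5.3, 2.9]
    mse = mean_squared_error(y_true_r, y_pred_r)
    print(mse, np.sqrt(mse), mean_absolute_error(y_true_r, y_pred_r),
          r2_score(y_true_r, y_pred_r))

    # Clustering: these scores need the data points and the cluster labels.
    X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
    labels = [0, 0, 1, 1]
    print(silhouette_score(X, labels), davies_bouldin_score(X, labels))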

1.3 Overfitting and Underfitting

Overfitting: High-complexity models fit the training data too closely and generalize poorly to unseen data.

Underfitting: Overly simple models fail to capture the underlying patterns in the data.

Mitigation techniques:

Regularization (L1, L2).

Dropout layers in neural networks.

Cross-validation.
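
A short scikit-learn sketch of two of these techniques, L1/L2 regularization and k-fold cross-validation, on synthetic data (dropout is specific to neural networks and is not shown here):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

    # Ridge (L2) and Lasso (L1) penalize large coefficients; alpha sets the strength.
    ridge = Ridge(alpha=1.0)
    lasso = Lasso(alpha=0.1)

    # 5-fold cross-validation gives a less optimistic estimate than a single split.
    print(cross_val_score(ridge, X, y, cv=5, scoring="r2").mean())
    print(cross_val_score(lasso, X, y, cv=5, scoring="r2").mean())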

2. Google Cloud AI and ML Services

2.1 Vertex AI

Unified platform for ML model training, deployment, and management.

Services:

Vertex AI Workbench: Jupyter Notebook environment.

Vertex AI Pipelines: Workflow orchestration.

Feature Store: Centralized storage for ML features.

Model Registry: Tracks model versions and metadata.
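
A hedged sketch of how the Model Registry side of this is typically reached from the google-cloud-aiplatform Python SDK; the project ID, bucket, and container image below are placeholders, not values from this document:

    from google.cloud import aiplatform

    # Placeholder project, region, and staging bucket.
    aiplatform.init(project="my-project", location="us-central1",
                    staging_bucket="gs://my-bucket")

    # Register a trained model artifact in the Model Registry.
    model = aiplatform.Model.upload(
        display_name="demo-model",
        artifact_uri="gs://my-bucket/model/",  # exported model directory (sketch only)
        serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"),
    )
    print(model.resource_name)  # deployment is sketched in section 5.1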

2.2 BigQuery ML

Build and deploy ML models directly in BigQuery.

Algorithms supported:

Linear/Logistic Regression, K-Means Clustering, XGBoost.

TensorFlow and AutoML models.

Use cases:

Forecasting, anomaly detection, classification.
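
A sketch of training and using a model entirely inside BigQuery via the Python client; the dataset, table, and column names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application-default credentials

    # Train a logistic regression model on a (hypothetical) labelled table.
    client.query("""
        CREATE OR REPLACE MODEL `my_dataset.churn_model`
        OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
        SELECT * FROM `my_dataset.customer_features`
    """).result()

    # Batch classification with ML.PREDICT.
    rows = client.query("""
        SELECT * FROM ML.PREDICT(MODEL `my_dataset.churn_model`,
                                 (SELECT * FROM `my_dataset.new_customers`))
    """).result()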

2.3 AutoML

Tools to train custom ML models without coding expertise.

Supported data types:

Tables, images, text, video.

Features:

Hyperparameter tuning.

Explainable AI (XAI).
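
A hedged sketch of launching an AutoML tabular job from the SDK; the dataset source, target column, and budget are assumptions for illustration:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    # Hypothetical managed tabular dataset built from a BigQuery table.
    dataset = aiplatform.TabularDataset.create(
        display_name="churn-data",
        bq_source="bq://my-project.my_dataset.customer_features",
    )

    # AutoML handles feature preprocessing and hyperparameter tuning internally.
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="churn-automl",
        optimization_prediction_type="classification",
    )
    model = job.run(dataset=dataset, target_column="churned",
                    budget_milli_node_hours=1000)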

2.4 Pre-trained APIs

Vision AI: Image and video analysis.

Natural Language API: Sentiment analysis, entity recognition.

Translation API: Multilingual translation.

Speech-to-Text and Text-to-Speech APIs.
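
A small example against one of these pre-trained APIs: sentiment analysis with the Natural Language API client library (the input sentence is arbitrary):

    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()

    document = language_v1.Document(
        content="The new release is fast and a pleasure to use.",
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )

    # Returns an overall score in [-1, 1] plus a magnitude for intensity.
    response = client.analyze_sentiment(request={"document": document})
    print(response.document_sentiment.score, response.document_sentiment.magnitude)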

3. Data Engineering for ML

3.1 Data Preprocessing

Handling missing data (mean/mode imputation).

Feature scaling (Standardization/Normalization).

Categorical encoding (One-hot encoding, Label encoding).

Feature engineering techniques:

Polynomial features.

Feature extraction from text/images.
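
A scikit-learn sketch combining several of these steps (imputation, scaling, one-hot encoding) in a single ColumnTransformer; the column names and values are made up:

    import numpy as np
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    df = pd.DataFrame({"age": [25, np.nan, 40], "plan": ["basic", "pro", np.nan]})

    # Numeric columns: mean imputation then standardization.
    numeric = Pipeline([("impute", SimpleImputer(strategy="mean")),
                        ("scale", StandardScaler())])
    # Categorical columns: mode imputation then one-hot encoding.
    categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                            ("onehot", OneHotEncoder(handle_unknown="ignore"))])

    preprocess = ColumnTransformer([("num", numeric, ["age"]),
                                    ("cat", categorical, ["plan"])])
    X = preprocess.fit_transform(df)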

3.2 Data Storage and Pipelines

Google Cloud Storage for large datasets.

BigQuery for structured data.

Dataflow for ETL pipelines.

Pub/Sub for real-time streaming data.
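
Hedged sketches of the client libraries behind two of these services; the bucket, topic, project, and file names are placeholders:

    from google.cloud import pubsub_v1, storage

    # Cloud Storage: stage a training file in a (placeholder) bucket.
    gcs = storage.Client()
    gcs.bucket("my-bucket").blob("datasets/train.csv").upload_from_filename("train.csv")

    # Pub/Sub: publish a record onto a (placeholder) streaming topic.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "events")
    publisher.publish(topic_path, data=b'{"user_id": 42, "event": "click"}').result()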

3.3 Feature Engineering and Selection

Methods:

Principal Component Analysis (PCA).

Recursive Feature Elimination (RFE).

Mutual information.

Automate feature management using Vertex AI Feature Store.
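
A short scikit-learn sketch of the three methods listed above, on a synthetic classification dataset:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import RFE, mutual_info_classif
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                               random_state=0)

    # PCA: project onto the top principal components.
    X_pca = PCA(n_components=5).fit_transform(X)

    # RFE: recursively drop the weakest features according to a base model.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
    print(rfe.support_)

    # Mutual information: score each feature's dependence on the label.
    print(mutual_info_classif(X, y))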


4. Model Training and Tuning

4.1 Training ML Models

TensorFlow and PyTorch support in Vertex AI.

Distributed training with GPUs and TPUs.

Hyperparameter tuning:

Grid Search, Random Search, Bayesian Optimization.
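
A sketch of grid search and random search with scikit-learn; Bayesian optimization typically comes from a dedicated library or Vertex AI's managed hyperparameter tuning and is not shown here:

    from scipy.stats import randint
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    X, y = make_classification(n_samples=300, random_state=0)

    # Grid search: exhaustively try every combination.
    grid = GridSearchCV(RandomForestClassifier(random_state=0),
                        {"n_estimators": [50, 100], "max_depth": [3, 6]}, cv=3)
    grid.fit(X, y)
    print(grid.best_params_)

    # Random search: sample a fixed number of combinations from distributions.
    rand = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                              {"n_estimators": randint(50, 200),
                               "max_depth": randint(3, 10)},
                              n_iter=10, cv=3, random_state=0)
    rand.fit(X, y)
    print(rand.best_params_)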

4.2 Custom ML Models

TensorFlow Extended (TFX) for scalable pipelines.

ML frameworks (scikit-learn, XGBoost, TensorFlow).

Model evaluation and debugging tools (TensorBoard).
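
A minimal Keras training loop with a TensorBoard callback for inspecting the run; the data is synthetic and the log directory is arbitrary:

    import numpy as np
    import tensorflow as tf

    X = np.random.rand(256, 10).astype("float32")
    y = (X.sum(axis=1) > 5).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Inspect the logged curves with `tensorboard --logdir logs/`.
    model.fit(X, y, epochs=5, validation_split=0.2,
              callbacks=[tf.keras.callbacks.TensorBoard(log_dir="logs/demo")])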

5. Model Deployment and Serving

5.1 Deployment Options

Online prediction: Real-time inference.

Batch prediction: Large-scale, asynchronous inference.

Deploy via Vertex AI Prediction (or the legacy AI Platform).
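
A hedged sketch of both modes with the Vertex AI SDK, continuing the placeholder project and model from section 2.1 (the model ID and Cloud Storage paths are stand-ins):

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")
    model = aiplatform.Model("projects/my-project/locations/us-central1/models/123")  # placeholder

    # Online prediction: deploy to an endpoint, then send individual instances.
    endpoint = model.deploy(machine_type="n1-standard-2")
    print(endpoint.predict(instances=[[0.1, 0.5, 0.9]]))

    # Batch prediction: an asynchronous job over files in Cloud Storage.
    model.batch_predict(
        job_display_name="nightly-scoring",
        gcs_source="gs://my-bucket/batch/input.jsonl",
        gcs_destination_prefix="gs://my-bucket/batch/output/",
    )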

5.2 CI/CD for ML Pipelines

Version control with Git and Cloud Source Repositories.

Continuous training and deployment with Vertex AI Pipelines.

MLflow and TensorBoard for experiment tracking.
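
A minimal MLflow tracking sketch; the parameter and metric values are arbitrary:

    import mlflow

    # By default, runs are logged to a local ./mlruns directory.
    with mlflow.start_run(run_name="baseline"):
        mlflow.log_param("learning_rate", 0.01)
        mlflow.log_metric("val_accuracy", 0.91)
        mlflow.log_metric("val_accuracy", 0.93, step=2)  # metrics can be logged per step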

5.3 Monitoring Models in Production

Drift detection using TensorFlow Data Validation (TFDV).

Monitoring performance with Vertex AI Model Monitoring.

Logging and alerting with Cloud Logging.
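
A hedged TFDV sketch of drift detection: compare serving statistics against training statistics under a per-feature drift threshold. The feature name, threshold, and toy data are assumptions:

    import pandas as pd
    import tensorflow_data_validation as tfdv

    train_df = pd.DataFrame({"plan": ["basic"] * 80 + ["pro"] * 20})
    serving_df = pd.DataFrame({"plan": ["basic"] * 20 + ["pro"] * 80})  # shifted on purpose

    train_stats = tfdv.generate_statistics_from_dataframe(train_df)
    serving_stats = tfdv.generate_statistics_from_dataframe(serving_df)

    schema = tfdv.infer_schema(train_stats)
    # Flag the feature if its distribution moves more than the threshold.
    tfdv.get_feature(schema, "plan").drift_comparator.infinity_norm.threshold = 0.1

    anomalies = tfdv.validate_statistics(statistics=serving_stats, schema=schema,
                                         previous_statistics=train_stats)
    print(anomalies)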

6. MLOps and Automation

6.1 ML Pipeline Automation

Kubeflow Pipelines for reproducible workflows.

Data preprocessing, training, evaluation, and deployment stages.

Workflow management with Apache Airflow.
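
A hedged sketch of a two-step pipeline with the KFP v2 SDK; the component bodies are stand-ins for real preprocessing and training logic, and the compiled spec could be run on Kubeflow or Vertex AI Pipelines:

    from kfp import compiler, dsl

    @dsl.component
    def preprocess(rows: int) -> int:
        # Stand-in for a real preprocessing step.
        return rows

    @dsl.component
    def train(rows: int) -> str:
        # Stand-in for a real training step.
        return f"trained on {rows} rows"

    @dsl.pipeline(name="demo-training-pipeline")
    def pipeline(rows: int = 1000):
        prep = preprocess(rows=rows)
        train(rows=prep.output)

    # Produces a pipeline spec file that the orchestrator executes.
    compiler.Compiler().compile(pipeline, "pipeline.json")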

6.2 Model Governance and Security

6.3 Versioning and Reproducibility

● Track data, code, and models.
● Implement Docker containers for consistent environments.
● Use DVC (Data Version Control) for managing ML experiments.

7. Responsible AI Practices
7.1 Fairness and Bias Mitigation

● Tools: What-If Tool, TensorFlow Model Analysis.
● Methods: Re-sampling, re-weighting, adversarial debiasing.
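
A simplified sketch of the re-weighting idea using scikit-learn's sample-weight utility; here the class label stands in for the group being balanced, whereas a real fairness workflow would weight by a sensitive attribute:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils.class_weight import compute_sample_weight

    # Imbalanced data: the minority group would otherwise be under-served.
    X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

    # "balanced" weights each example inversely to its group's frequency,
    # so both groups contribute equally to the training loss.
    weights = compute_sample_weight(class_weight="balanced", y=y)
    model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)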

7.2 Interpretability and Explainability

● SHAP and LIME techniques.
● Vertex AI Explainable AI for model insights.
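
A short SHAP sketch for a tree model; LIME and Vertex AI Explainable AI expose similar per-feature attributions. The synthetic data is for illustration only:

    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=8, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X, y)

    # TreeExplainer computes per-feature contributions for each prediction.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X[:5])
    # Depending on the SHAP version this is a list of per-class arrays or one array.
    print(shap_values[0].shape if isinstance(shap_values, list) else shap_values.shape)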

7.3 Privacy and Security

● Federated learning for privacy-preserving models.
● Differential privacy to protect sensitive data.

8. Generative AI and LLMs

8.1 Vertex AI Generative AI Studio

● Pre-trained foundation models for text, image, and code.
● Custom fine-tuning of generative models.
● Prompt engineering techniques.
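
A hedged sketch of calling a foundation model through the vertexai SDK; the project values and model name are placeholders, and this interface changes quickly, so treat it as an outline rather than the definitive API:

    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project="my-project", location="us-central1")

    model = GenerativeModel("gemini-1.0-pro")  # placeholder foundation model name

    # Prompt engineering: give the model a role, a constraint, and the task.
    prompt = (
        "You are a support assistant. Answer in one sentence.\n"
        "Question: How do I reset my password?"
    )
    response = model.generate_content(prompt)
    print(response.text)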

8.2 Model Garden

● Ready-to-use pre-trained models (e.g., BERT, GPT).
● Integration with APIs for natural language tasks.

9. Additional Tools and Libraries

9.1 Data Science Libraries

● Pandas and NumPy for data manipulation; Matplotlib and Seaborn for visualization.
● Scikit-learn for ML algorithms.

9.2 Deep Learning Libraries

● TensorFlow and Keras for neural networks.
● PyTorch for flexible deep learning.

9.3 Optimization and Frameworks

● XGBoost and LightGBM for ensemble methods.
● Dask and Apache Beam for distributed computing.
