FigureInterviewNotes2

Corey and Hunter: AI/LLM (Corey will focus on technical questions and some case studies. Hunter will spend the 45 minutes on a coding problem.)
 AI Document Processing Applications
 AI Customer interaction applications (chat/phone)
 Assessing confidence of an output
 Mitigating hallucinations
 Non-deterministic nature of LLMs
 Evaluating LLM outputs
 Model selection
 Agentic workflows
AI Document Processing Applications
Possible Interview Questions & Answers
General AI Document Processing Questions
Q: What are some common challenges in AI-based document processing for loan applications?
A:
 Variability in document formats (PDFs, scanned images, handwritten vs. typed text).
 Poor quality scans (blurry images, low resolution, shadows).
 Data privacy and security concerns (handling PII securely).
 Understanding unstructured data (parsing tables, detecting handwritten signatures).
 Compliance requirements (ensuring processed documents meet financial regulations).

Q: How would you automate the processing of ID documents and loan applications using AI?
A:
 Use OCR (Optical Character Recognition) (e.g., Google Cloud Vision API) to extract text.
 Use NLP models (e.g., Document AI) to classify sections and extract relevant fields.
 Apply Named Entity Recognition (NER) to identify key fields like name, SSN, loan amount.
 Validate extracted data using business rules (e.g., checking if a name matches across documents).
 Integrate with GCP services for secure storage (Cloud Storage) and processing (Cloud Functions, BigQuery).

Q: How would you handle handwritten text in loan applications?


A:
 Use Google Cloud Vision API for handwriting recognition.
 Pre-process images using image enhancement (OpenCV) to improve OCR accuracy.
 Fine-tune a transformer-based model (like TrOCR or Donut) for better recognition.

Q: How would you ensure data privacy and compliance in AI-based document processing?
A:
 Use encryption (Cloud KMS) for document storage.
 Implement access controls using IAM policies.
 Anonymize sensitive PII fields before processing (a DLP-based sketch follows this list).
 Ensure compliance with GDPR, CCPA, or financial regulations.
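For the anonymization step, a minimal sketch using the Cloud DLP API to de-identify common PII before downstream processing; the chosen infoTypes and replacement strategy are assumptions to adapt to your compliance requirements:

from google.cloud import dlp_v2

def deidentify_text(project_id, text):
    """Replaces common PII (names, SSNs) with infoType placeholders."""
    client = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"
    # InfoTypes are illustrative; extend for your document types.
    inspect_config = {"info_types": [{"name": "PERSON_NAME"}, {"name": "US_SOCIAL_SECURITY_NUMBER"}]}
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    }
    response = client.deidentify_content(
        request={
            "parent": parent,
            "inspect_config": inspect_config,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )
    return response.item.value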
Sample Code for an AI Document Processing Pipeline
GCP Version (Using Cloud Document AI & Vision API)
This example:
 Uploads a document to Cloud Storage.
 Extracts text using Document AI.
 Uses Cloud Natural Language API to analyze key fields.
from google.api_core.client_options import ClientOptions
from google.cloud import documentai, storage, language_v1

PROJECT_ID = "your-gcp-project-id"
BUCKET_NAME = "your-bucket-name"
PROCESSOR_ID = "your-document-ai-processor-id"
LOCATION = "us"  # Adjust based on your region

def upload_to_gcs(local_file_path, gcs_file_name):
    """Uploads a document to Google Cloud Storage."""
    client = storage.Client()
    bucket = client.bucket(BUCKET_NAME)
    blob = bucket.blob(gcs_file_name)
    blob.upload_from_filename(local_file_path)
    return f"gs://{BUCKET_NAME}/{gcs_file_name}"

def extract_text_from_document(local_file_path):
    """Processes a document with Document AI online processing (raw bytes)."""
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{LOCATION}-documentai.googleapis.com")
    )
    with open(local_file_path, "rb") as f:
        raw_document = documentai.RawDocument(content=f.read(), mime_type="application/pdf")
    name = f"projects/{PROJECT_ID}/locations/{LOCATION}/processors/{PROCESSOR_ID}"
    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw_document)
    )
    return result.document.text

def analyze_text_with_nlp(text):
    """Uses the Natural Language API to extract entities (e.g., names, amounts, addresses)."""
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
    response = client.analyze_entities(document=document)
    return [(entity.name, entity.type_.name) for entity in response.entities]

# Example usage
file_path = "loan_application.pdf"
gcs_uri = upload_to_gcs(file_path, "loan_application.pdf")  # archive the original
extracted_text = extract_text_from_document(file_path)
entities = analyze_text_with_nlp(extracted_text)

print("Extracted Entities:", entities)


AI Customer interaction applications (chat/phone)
Possible Interview Questions & Answers
General AI Customer Interaction Questions

Q: What are the key challenges in building a real-time chat suggestion system for call centers?
A:
 Latency – The system must process speech, retrieve data, and generate responses in real time.
 Speech Recognition Accuracy – Background noise, accents, and technical jargon can reduce accuracy.
 Personalization – Responses should be context-aware and tailored to the user.
 Data Privacy & Security – User data must be handled securely (PII protection).
 Scalability – Needs to handle thousands of concurrent interactions.

Q: How would you build a real-time call center chat assistant using AI?
A:
 Use Speech-to-Text API (Google Cloud Speech-to-Text) to transcribe calls in real-time.
 Use NLP models (like Gemini/PaLM 2, RAG, or fine-tuned LLMs) to extract user intent.
 Query a database (BigQuery, Firestore) to retrieve user loan history and metadata.
 Use a retrieval-augmented generation (RAG) model to provide personalized responses.
 Display results in an agent-friendly UI (e.g., Google Contact Center AI).

Q: How would you optimize speech recognition for call center applications?
A:
 Use custom speech models trained on industry-specific vocabulary.
 Preprocess audio (noise reduction, echo cancellation).
 Use keyword boosting to improve recognition of important terms (e.g., “HELOC,” “loan balance”); a sketch follows.
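A minimal keyword-boosting sketch using the Speech-to-Text v1p1beta1 API (which exposes the boost field on SpeechContext); the phrases and boost value are placeholders to tune:

from google.cloud import speech_v1p1beta1 as speech

speech_client = speech.SpeechClient()

# Bias recognition toward domain terms; phrases and boost value are illustrative.
speech_context = speech.SpeechContext(
    phrases=["HELOC", "loan balance", "escrow"],
    boost=15.0,  # higher values bias recognition more strongly toward these terms
)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    speech_contexts=[speech_context],
)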

Q: How do you handle hallucinations in AI-generated responses?


A:
 Use retrieval-augmented generation (RAG) to ground responses in real customer data.
 Implement confidence thresholds to filter uncertain responses.
 Allow agents to edit/approve responses before sending.

Q: How do you ensure compliance and security in an AI-driven customer support system?
A:
 Encrypt customer data in transit (TLS) and at rest (Cloud KMS).
 Use role-based access control (IAM) to restrict sensitive data.
 Store only anonymized transcripts for model fine-tuning.
Sample Code for a Real-Time Chat Suggestion System
GCP Version (Using Speech-to-Text, Vertex AI & Firestore)

This example:
 Transcribes live speech using Google Cloud Speech-to-Text.
 Interprets the query using Vertex AI’s Gemini model.
 Fetches customer data from Firestore.
 Suggests a response to the agent.
import vertexai
from vertexai.generative_models import GenerativeModel
from google.cloud import speech, firestore

PROJECT_ID = "your-gcp-project-id"
LOCATION = "us-central1"
MODEL_NAME = "gemini-pro"
customer_id = "user123"

# Initialize clients
vertexai.init(project=PROJECT_ID, location=LOCATION)
speech_client = speech.SpeechClient()
firestore_client = firestore.Client()
model = GenerativeModel(MODEL_NAME)

def transcribe_audio(audio_file):
    """Converts speech to text using Google Cloud Speech-to-Text."""
    with open(audio_file, "rb") as audio:
        content = audio.read()
    recognition_audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    response = speech_client.recognize(config=config, audio=recognition_audio)
    return response.results[0].alternatives[0].transcript if response.results else ""

def get_customer_data(customer_id):
    """Fetches customer data from Firestore."""
    doc_ref = firestore_client.collection("customers").document(customer_id)
    doc = doc_ref.get()
    return doc.to_dict() if doc.exists else {}

def generate_response(transcript, customer_data):
    """Uses Vertex AI (Gemini) to generate a suggested agent response."""
    prompt = f"User: {transcript}\n\nAgent should respond based on this info: {customer_data}"
    response = model.generate_content(prompt)
    return response.text

# Example usage
audio_text = transcribe_audio("call_audio.wav")
customer_info = get_customer_data(customer_id)
response_suggestion = generate_response(audio_text, customer_info)

print("Customer Query:", audio_text)
print("Suggested Response:", response_suggestion)
Assessing confidence of an output
Q: Why is assessing confidence in AI model outputs important?
A:
 Ensures reliable decision-making, especially in financial applications like loan processing.
 Helps detect hallucinations or incorrect predictions in LLMs.
 Improves user trust by providing confidence scores with responses.
 Enables adaptive system behavior, e.g., escalating uncertain results for human review.
 Supports compliance with financial regulations by ensuring accurate AI-driven interactions.

Q: What techniques can be used to estimate confidence in model predictions?


A:
 Softmax Probabilities – For classification models, the softmax output indicates confidence.
 Logits & Temperature Scaling – Adjusting temperature can calibrate confidence scores (a temperature-scaling sketch follows this list).
 Monte Carlo (MC) Dropout – Running a model multiple times with dropout enabled to estimate uncertainty.
 Bayesian Neural Networks (BNNs) – Use probabilistic distributions instead of fixed weights to model uncertainty.
 Embedding Similarity – Measure cosine similarity between generated responses and retrieved documents.
 Calibration Metrics – Use Expected Calibration Error (ECE) and Brier Score to quantify confidence reliability.
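As a concrete example, a minimal temperature-scaling sketch in NumPy, assuming you already have validation logits and labels; it scans for the temperature that minimizes negative log-likelihood:

import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 46)):
    """Picks the temperature minimizing negative log-likelihood on a validation set."""
    best_T, best_nll = 1.0, np.inf
    for T in grid:
        probs = softmax(val_logits, T)
        nll = -np.log(probs[np.arange(len(val_labels)), val_labels] + 1e-12).mean()
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T  # divide logits by best_T before taking softmax at inference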

Q: How would you measure the confidence of an LLM’s response on GCP?


A:
 Log Probability from Vertex AI LLMs – Use the predict() function in Vertex AI to extract token probabilities.
 Retrieval-Augmented Generation (RAG) Score – Compare retrieved knowledge with generated responses.
 Self-Consistency – Run the model multiple times and check consistency across outputs.
 Human-in-the-loop (HITL) Review – Use low-confidence thresholds to escalate to manual review in GCP workflows.

Q: How do you handle low-confidence predictions?


A:
 Fallback Mechanisms – Use GCP services like Vertex AI Search to retrieve relevant information when the LLM has low confidence.
 Threshold-Based Routing – Use Cloud Functions to flag low-confidence predictions for human review in Google Cloud Workflows.
 Explainability Methods – Leverage Vertex AI Explainable AI to interpret model predictions and improve transparency.

Q: How would you evaluate confidence calibration on GCP?


A:
 Reliability Diagrams – Use BigQuery ML to analyze predicted confidence vs. actual correctness.
 Expected Calibration Error (ECE) – Compute calibration error using Python in Vertex AI Workbench.
 Brier Score – Measure the mean squared difference between predicted confidence and actual outcomes using BigQuery ML (both metrics are sketched below).
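A minimal NumPy sketch of both metrics, assuming arrays of predicted confidences and binary correctness labels:

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted gap between average confidence and accuracy per bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

def brier_score(confidences, correct):
    """Mean squared difference between predicted confidence and actual outcome."""
    return np.mean((confidences - correct) ** 2)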
Softmax confidence from a deployed classifier: the internal classifier exposes a built-in confidence score with each prediction.

Key Takeaways:
 Uses Vertex AI Endpoints for text classification.
 Extracts softmax confidence scores for model predictions.
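A minimal sketch of pulling that score from a deployed Vertex AI classification endpoint; the endpoint ID and the response schema (parallel 'classes' and 'scores' lists) are assumptions that depend on the deployed model:

from google.cloud import aiplatform

# Hypothetical endpoint resource name; replace with your deployment.
endpoint = aiplatform.Endpoint("projects/your-project/locations/us-central1/endpoints/123")

def classify_with_confidence(text):
    prediction = endpoint.predict(instances=[{"content": text}]).predictions[0]
    # Assumes the model returns parallel 'classes' and 'scores' (softmax) lists.
    best = max(zip(prediction["classes"], prediction["scores"]), key=lambda p: p[1])
    return {"label": best[0], "confidence": best[1]}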
Monte Carlo dropout: run the prediction multiple times (e.g., 10) and check the standard deviation of the softmax scores. If the softmax scores vary widely across runs, the prediction is probably not reliable.

Key Takeaways:
 Runs multiple forward passes with dropout enabled to estimate uncertainty.
 Higher standard deviation means lower confidence.
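A minimal PyTorch sketch of the idea, assuming a classifier with dropout layers; keeping the model in train mode leaves dropout active at inference:

import torch

def mc_dropout_confidence(model, inputs, n_passes=10):
    """Runs repeated stochastic forward passes; high std => low confidence."""
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(inputs), dim=-1) for _ in range(n_passes)]
        )
    return probs.mean(dim=0), probs.std(dim=0)  # mean prediction, uncertainty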
Token log probabilities: average the per-token log probabilities of the generated text to gauge how confident the model is in its answer.

Key Takeaways:
 Uses Vertex AI Generative AI (Gemini) to extract log probabilities.
 Lower log prob = lower confidence, flagging uncertain responses.
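A minimal sketch of the scoring step, assuming you can obtain per-token log probabilities from the model (e.g., via a log-probability option where the serving API exposes one); the threshold is an illustrative choice:

import math

def avg_logprob_confidence(token_logprobs, threshold=-1.0):
    """Averages per-token log probabilities; flags responses below a threshold."""
    avg = sum(token_logprobs) / len(token_logprobs)
    return {
        "avg_logprob": avg,
        "perplexity": math.exp(-avg),  # equivalent view of the same signal
        "uncertain": avg < threshold,  # route to HITL review when True
    }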
Mitigating hallucinations
Q: What are AI hallucinations, and why are they a problem?
A:
 AI hallucinations occur when a model generates false or misleading information that appears confident but lacks factual grounding.
 In financial applications like loan processing, hallucinations can lead to incorrect customer guidance, regulatory non-compliance, and reputational damage.
 It is critical to detect and mitigate hallucinations to maintain trust and ensure AI-generated outputs align with verified data.

Q: What strategies can be used to mitigate hallucinations in AI systems?


A:
 Retrieval-Augmented Generation (RAG) – Ground responses in external databases or knowledge bases.
 Confidence Scoring & Thresholds – Filter out low-confidence responses using probability thresholds.
 Fact Verification Models – Cross-check AI-generated responses against verified financial documents.
 Human-in-the-Loop (HITL) Review – Route uncertain responses to human agents.
 Prompt Engineering – Design structured prompts to constrain the AI’s response space.
 Fine-tuning on High-Quality Data – Reduce hallucinations by fine-tuning on domain-specific, high-accuracy data.

Q: How can hallucination mitigation be implemented on Google Cloud Platform (GCP)?


A:
 Vertex AI + RAG – Use Vertex AI’s embedding models with a vector database like BigQuery ML or Vertex AI Matching Engine to retrieve factual data.
 Self-Consistency Check – Generate multiple responses and ensure agreement before presenting results (sketched after this list).
 Fact Checking with BigQuery – Cross-reference AI outputs with structured financial data.
 Real-time HITL Review – Use Cloud Functions to flag and escalate potential hallucinations.
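A minimal self-consistency sketch with Gemini on Vertex AI, using simple string-overlap agreement; the agreement measure and threshold are illustrative (embedding similarity would be more robust):

from difflib import SequenceMatcher
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project-id", location="us-central1")
model = GenerativeModel("gemini-pro")

def self_consistent_answer(prompt, n=5, min_agreement=0.7):
    answers = [model.generate_content(prompt).text for _ in range(n)]
    # Average pairwise similarity as a crude agreement score.
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    agreement = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)
    return answers[0] if agreement >= min_agreement else None  # None => escalate to HITL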

Q: How can hallucinations be measured and evaluated?


A:
 Faithfulness Score – Measure factual consistency between generated text and retrieved documents.
 BERTScore or Cosine Similarity – Compare embeddings of generated responses with ground truth documents (a cosine-similarity sketch follows this list).
 Adversarial Testing – Expose models to tricky prompts to evaluate robustness.
 Human Evaluation Metrics – Implement a human review process in GCP workflows for qualitative assessment.
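A minimal sketch of an embedding-based faithfulness check using Vertex AI text embeddings; the model name and any downstream threshold are illustrative:

import numpy as np
from vertexai.language_models import TextEmbeddingModel

embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko")

def faithfulness_score(generated_text, source_text):
    """Cosine similarity between response and source document embeddings."""
    vecs = [np.array(e.values) for e in embedding_model.get_embeddings([generated_text, source_text])]
    return float(np.dot(vecs[0], vecs[1]) / (np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1])))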
Non-deterministic nature of LLMs
Q: What does it mean that LLMs are non-deterministic?
A: Large Language Models (LLMs) are non-deterministic because they do not always produce the same output for a given input. This is due to
factors like temperature settings, sampling techniques (e.g., top-k, top-p sampling), and the probabilistic nature of deep learning models. Even
when using the same input prompt, slight variations in internal computations can lead to different results.

Q: How does temperature affect LLM output variability?


A: Temperature controls randomness in output generation:
 High temperature (e.g., 1.0+) – More diverse, creative, and unpredictable responses.
 Low temperature (e.g., 0.1 or 0) – More deterministic, focused, and repetitive responses.
 Setting temperature to 0 forces the model to always pick the most probable token, reducing non-determinism but potentially leading to repetitive outputs.

Q: What are some strategies to make LLMs more deterministic?


A:
 Set temperature to a low value (or 0 for fully deterministic behavior); see the sketch after this list.
 Use beam search instead of stochastic sampling to improve consistency.
 Fix a random seed when making API/model calls.
 Limit sampling randomness with top-k (limiting vocabulary choices) or top-p (nucleus sampling) methods.
 Use retrieval-augmented generation (RAG) to constrain outputs to factual sources.
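A minimal sketch of pinning down sampling parameters with Gemini on Vertex AI; note that even temperature 0 is not guaranteed to be bit-for-bit reproducible across serving infrastructure:

import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig

vertexai.init(project="your-gcp-project-id", location="us-central1")
model = GenerativeModel("gemini-pro")

config = GenerationConfig(
    temperature=0.0,  # greedy-like decoding
    top_k=1,          # restrict sampling to the single most probable token
    top_p=1.0,
)
response = model.generate_content("Summarize the loan terms.", generation_config=config)
print(response.text)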

Q: How can Google Cloud Platform (GCP) help manage LLM non-determinism?
A:
 Vertex AI’s Text Generation Models (PaLM 2, Text-Bison) allow control over temperature, top-k, and top-p.
 Cloud Logging & BigQuery to track output variations over time.
 Vertex AI Model Monitoring to detect drift or unexpected randomness.
 Cloud Functions to standardize model calls with fixed parameters.
Summary
 Explain why LLMs are non-deterministic
o probabilistic models
o temperature
o sampling techniques
 Discuss ways to reduce randomness
o low temperature
o fixed seeds
o top-k
o greedy decoding
 Highlight GCP tools for controlling LLM behavior (Vertex AI, Cloud Logging, BigQuery, Cloud Functions).
Evaluating LLM outputs
Q: What does it mean to evaluate LLM outputs, and why is it important?
A: Evaluating LLM outputs involves assessing the quality, reliability, and appropriateness of responses. This is crucial in financial applications
(e.g., loan processing) to ensure compliance, prevent misinformation, and maintain trust. Evaluation methods help detect bias, hallucinations,
and factual inconsistencies before deploying AI solutions.

Q: What are some key metrics used to evaluate LLM outputs?


A:
 Accuracy/Factual Consistency – Measures alignment with ground truth data.
 BLEU/ROUGE Scores – Common NLP metrics for comparing model output to a reference (a ROUGE sketch follows this list).
 BERTScore/Semantic Similarity – Evaluates meaning preservation using embeddings.
 Confidence Scores – Provides a probability measure of model certainty.
 Toxicity/Bias Detection – Flags inappropriate or unfair responses.
 Human Evaluation – Uses experts to assess correctness, fluency, and relevance.
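A minimal sketch of reference-based scoring with the rouge-score package; the reference and candidate strings are placeholders:

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

reference = "The applicant's loan was approved for $250,000."
candidate = "The loan application was approved for $250,000."

scores = scorer.score(reference, candidate)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)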

Q: How can Google Cloud Platform (GCP) be used to evaluate LLM outputs?
A:
 Vertex AI Model Monitoring – Tracks model performance and detects drift.
 BigQuery ML for Quality Analysis – Stores outputs and performs statistical analysis.
 Confidence Thresholding – Uses probability scores from models like PaLM 2 or Text-Bison to filter low-confidence responses.
 Cloud Functions for Automated Audits – Runs checks on generated text before presenting it to users.
 Perspective API for Toxicity Detection – Flags harmful or biased responses.

Q: How can we ensure model outputs remain high-quality over time?


A:
 Automated Evaluation Pipelines – Regularly test outputs using GCP services.
 Human-in-the-Loop (HITL) Review – Validate critical responses manually.
 Retrieval-Augmented Generation (RAG) – Use factual grounding to minimize errors.
 A/B Testing – Continuously compare different model versions.
 Feedback Loops – Integrate real user feedback to refine models.
Model selection
Q: What factors do you consider when selecting an AI model for a given task?
A: Model selection depends on:
 Task requirements (e.g., classification, summarization, retrieval, generation)
 Performance metrics (e.g., accuracy, F1-score, latency, cost)
 Data constraints (e.g., labeled data availability, domain specificity)
 Computational resources (e.g., GPU/TPU requirements, inference time)
 Scalability & deployment feasibility (e.g., cloud vs. edge deployment)
 Explainability & regulatory compliance (e.g., interpretability in financial/legal settings)

Q: When would you choose a pre-trained model versus training a custom model?
A:
 Pre-trained models (e.g., PaLM 2, BERT, T5) are ideal when:
o You have limited labeled data.
o The task aligns closely with common NLP tasks (summarization, classification, retrieval).
o You need quick deployment with minimal fine-tuning.
 Custom models (e.g., fine-tuned Transformer models) are better when:
o The task requires domain adaptation (e.g., legal or financial text processing).
o Performance from off-the-shelf models is insufficient.
o You have sufficient labeled data for training and tuning.

Q: How does Google Cloud Platform (GCP) assist in model selection?


A:
 Vertex AI Model Garden provides pre-trained models (PaLM, BERT, T5, Gemini) for quick testing.
 Vertex AI AutoML enables training custom models with minimal ML expertise.
 BigQuery ML allows SQL-based model training for structured data.
 TPU/GPU on GCP enables training/fine-tuning of large-scale models.
 Model monitoring tools help evaluate latency, accuracy, and cost trade-offs.

Summary
 Explain model selection criteria (performance, data, computational needs, regulatory constraints).
 Discuss pre-trained vs. custom models based on the use case.
Agentic workflows
Q: What are agentic workflows in the context of AI applications?
A: Agentic workflows involve AI systems that can autonomously take actions, interact with external environments, and iteratively refine their
outputs based on feedback. These workflows enable the following (a minimal agent loop is sketched after this list):
 Task decomposition (breaking down complex tasks into subtasks)
 Decision-making loops (adjusting responses dynamically)
 External tool usage (retrieving data, triggering actions, and updating information)
 Self-improvement (learning from user feedback or additional context)
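A minimal, framework-free sketch of the core loop, with hypothetical tools and a hard-coded plan; a real implementation would have an LLM propose and revise the steps and back the tools with BigQuery, Firestore, or Cloud Functions:

def lookup_balance(customer_id):
    return {"balance": 1800.00}  # placeholder for a BigQuery/Firestore lookup

def schedule_followup(customer_id, when):
    return f"Follow-up scheduled for {when}"  # placeholder for a CRM/calendar action

TOOLS = {"lookup_balance": lookup_balance, "schedule_followup": schedule_followup}

def run_agent(plan, customer_id):
    """Executes a decomposed task plan step by step, collecting tool results."""
    context = {}
    for tool, kwargs in plan:  # in a real agent, an LLM proposes these steps
        context[tool] = TOOLS[tool](customer_id=customer_id, **kwargs)
    return context

result = run_agent(
    plan=[("lookup_balance", {}), ("schedule_followup", {"when": "next Tuesday"})],
    customer_id="user123",
)
print(result)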

Q: How do agentic workflows enhance AI-powered customer interactions?


A: They enable:
 Proactive assistance (predicting customer needs and providing information before it is requested)
 Dynamic problem-solving (adjusting responses based on real-time customer input)
 Seamless multi-step interactions (managing conversations that require retrieving and processing information across different systems)
 Automated follow-ups (scheduling actions, confirming transactions, etc.)

Q: How would you implement agentic workflows for AI-driven customer support on GCP?
A:
 Dialogflow CX for managing complex conversation flows
 Vertex AI Agents for creating AI-powered agents that interact with APIs, databases, and cloud functions
 Cloud Functions to automate backend processes triggered by AI actions
 BigQuery for querying user history and application status
 Pub/Sub for managing event-driven workflows
