Comparative Analysis of Differential Privacy Implementations on Synthetic Data

The document presents a comparative analysis of two differential privacy implementations, PyDP and IBM diffprivlib, focusing on their effectiveness in maintaining privacy and utility in synthetic medical data. PyDP excels in preserving data accuracy for analytics, while IBM diffprivlib prioritizes privacy, making it suitable for sensitive applications. The findings suggest that PyDP is better for smaller datasets requiring high utility, whereas IBM diffprivlib is ideal for large-scale, privacy-critical tasks.


Authors:-
Huwida E. Said, Qusay H. Mahmoud, Faiza Hashim, Mandeep Goyal

Presenter:-
Qusay H. Mahmoud

Comparative Analysis of Differential Privacy Implementations on Synthetic Data
Jan 6th-8th, 2025
Introduction
Context:-
• The digitization of sensitive data (e.g., medical records) demands robust privacy
solutions to protect individuals while enabling valuable insights.
• In healthcare, balancing privacy and utility is critical as inaccurate analyses can
lead to harmful decisions.
• Differential privacy (DP) addresses these challenges by adding controlled noise
to data, making it difficult to infer individual information.

Objective:-
• Compare PyDP (Google) and IBM diffprivlib for:
• Effectiveness in maintaining privacy and utility.
• Trade-offs in performance and computational efficiency.

Differential Privacy Overview
➔ Differential Privacy ensures that outputs reveal minimal information about any
single data point in the dataset.
➔ Protects individual privacy while preserving overall patterns for analysis.

Core Parameters:
• Epsilon (ε): Governs the amount of noise; lower ε = stronger privacy, less utility.
• Delta (δ): Small probability with which the ε guarantee is allowed to fail; used by Gaussian mechanisms.
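
For reference, the standard (ε, δ) definition behind these two parameters can be written as follows. Here M is a randomized mechanism, D and D' are any two datasets that differ in one record, and S is any set of possible outputs; these symbols are notation introduced for this note, not taken from the slides.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,] + \delta
```

Setting δ = 0 recovers pure ε-DP, which is the guarantee the Laplace mechanism provides.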

Mechanisms:
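• Laplace Mechanism: Adds Laplace-distributed noise with scale = sensitivity / ε; ideal for numeric data.
• Gaussian Mechanism: Adds normally distributed noise calibrated to ε and δ; at the same ε it typically adds more noise than Laplace for a single numeric query.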
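
A minimal sketch of how the two mechanisms calibrate their noise, using only the Python standard library. It is not the PyDP or diffprivlib implementation. The function names, the sensitivity of 1.0, and the example count are assumptions made for illustration, and the Gaussian calibration is the classic textbook bound, which is normally stated for ε < 1 and is used at ε = 1.0 here only as an approximation.

```python
import math
import random


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Add Laplace(0, sensitivity / epsilon) noise to a numeric value."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classic calibration: sigma = sqrt(2 ln(1.25 / delta)) * sensitivity / epsilon."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def gaussian_mechanism(value, sensitivity, epsilon, delta, rng):
    """Add zero-mean Gaussian noise with the sigma computed above."""
    return value + rng.gauss(0.0, gaussian_sigma(sensitivity, epsilon, delta))


rng = random.Random(0)
true_count = 120.0  # hypothetical query result, not from the slides
print(laplace_mechanism(true_count, 1.0, 1.0, rng))
print(gaussian_mechanism(true_count, 1.0, 1.0, 1e-5, rng))
print(gaussian_sigma(1.0, 1.0, 1e-5))
```

Under these assumptions the Gaussian standard deviation comes out at about 4.84, compared with a Laplace scale of 1.0 (standard deviation about 1.41), which is one way to see why the Gaussian mechanism tends to perturb values more at the same ε.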

DP is a powerful tool for medical datasets, ensuring data is usable for research while
safeguarding sensitive information.

Comparative Libraries
PyDP (Python wrapper around Google's differential privacy library):
• High-performance library tailored for large datasets.
• Limited to Laplace and Gaussian mechanisms.
• Designed for analytics-driven use cases.

IBM diffprivlib:
• Broad support for machine learning integration.
• Offers 11 mechanisms, including advanced configurations.
• Preferred for privacy-critical tasks in sensitive fields like healthcare.

Selection Rationale:
• PyDP chosen for its strong utility in data analytics.
• IBM diffprivlib chosen for its ability to integrate privacy within ML workflows.

Experimental Setup
Dataset and Configuration:
• Dataset: Synthetic medical data, 1,000 rows, 4 columns.
• Mechanisms Tested: Laplace and Gaussian.
• Parameters:
• Epsilon (ε) = 1.0: Balanced privacy and utility.
• Delta (δ) = 0.00001: Common standard for Gaussian mechanisms.
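
The configuration above could be set up roughly as follows. The slides give only the dataset shape and the ε and δ values; the column names, value ranges, uniform sampling, and seed below are hypothetical placeholders, not the authors' data generator.

```python
import random

EPSILON = 1.0   # from the slide
DELTA = 1e-5    # from the slide (0.00001)
N_ROWS = 1000   # from the slide

# Hypothetical column names and value ranges (the slides do not list them).
COLUMNS = {
    "age": (18.0, 90.0),
    "systolic_bp": (90.0, 180.0),
    "cholesterol": (120.0, 300.0),
    "glucose": (60.0, 200.0),
}


def make_synthetic_rows(n_rows, columns, seed=0):
    """Draw each column uniformly from its range; one dict per row."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in columns.items()}
        for _ in range(n_rows)
    ]


rows = make_synthetic_rows(N_ROWS, COLUMNS)
print(len(rows), "rows,", len(rows[0]), "columns")
```

Fixing the seed keeps the comparison repeatable, so both libraries can be run against exactly the same input.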

Experiment Goals:
• Measure the trade-off between privacy and utility.
• Compare computational performance for large datasets.

Results: Utility and Privacy
Observations:
• PyDP:
• Laplace mechanism maintains better accuracy and preserves data patterns.
• Gaussian mechanism offers stronger privacy but distorts data more
significantly.
• IBM diffprivlib:
• Consistently introduces more noise, resulting in stronger privacy but
reduced utility.
• Better suited for cases where privacy is a top priority.

Key Insight:
• PyDP favors accuracy for analytics.
• IBM diffprivlib prioritizes privacy, especially for sensitive use cases.
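
One way to quantify the utility side of these observations is the mean absolute error between original and noised values. The sketch below is an assumption about how such a comparison could be run, not the authors' evaluation code: the slides do not say which metric was used or whether noise was added per record or per aggregate, and the sensitivity of 1.0, the "ages" column, and the helper names are hypothetical.

```python
import math
import random


def add_laplace(values, sensitivity, epsilon, rng):
    """Add independent Laplace(0, sensitivity / epsilon) noise to each value."""
    scale = sensitivity / epsilon
    return [
        v + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for v in values
    ]


def add_gaussian(values, sensitivity, epsilon, delta, rng):
    """Add independent Gaussian noise using the classic sigma calibration."""
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
    return [v + rng.gauss(0.0, sigma) for v in values]


def mean_absolute_error(original, noised):
    """Average absolute difference; lower means higher utility."""
    return sum(abs(a - b) for a, b in zip(original, noised)) / len(original)


rng = random.Random(42)
ages = [rng.uniform(18.0, 90.0) for _ in range(1000)]  # hypothetical column
SENSITIVITY = 1.0  # assumption; depends on the query and data bounds

mae_laplace = mean_absolute_error(ages, add_laplace(ages, SENSITIVITY, 1.0, rng))
mae_gaussian = mean_absolute_error(
    ages, add_gaussian(ages, SENSITIVITY, 1.0, 1e-5, rng)
)
print(f"Laplace MAE:  {mae_laplace:.3f}")
print(f"Gaussian MAE: {mae_gaussian:.3f}")
```

The same metric can be applied to the outputs of either library, which keeps the utility comparison independent of how each one generates its noise.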

Performance Comparison
Runtime Analysis:
• PyDP:
• Slower for both mechanisms, especially Gaussian (~4x slower than IBM
diffprivlib).
• Computational time increases with dataset size.
• IBM diffprivlib:
• Consistently faster across both mechanisms, with minimal runtime variance.
• Ideal for large-scale datasets requiring quick processing.
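
A runtime comparison of this kind can be sketched with a small timing harness. The harness below is an illustration only: `time_mechanism` and `laplace_stand_in` are hypothetical names, and the stand-in is a pure-Python noise function rather than a call into PyDP or diffprivlib. In a real run, each library's mechanism would be wrapped in a callable and passed in the same way.

```python
import random
import time


def time_mechanism(mechanism, values, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        mechanism(values)
        best = min(best, time.perf_counter() - start)
    return best


rng = random.Random(0)


def laplace_stand_in(values):
    """Hypothetical stand-in: Laplace noise with scale 1.0 on every value."""
    return [v + rng.expovariate(1.0) - rng.expovariate(1.0) for v in values]


# Time the same mechanism at growing dataset sizes.
for n_rows in (1_000, 10_000, 100_000):
    data = [float(i) for i in range(n_rows)]
    elapsed = time_mechanism(laplace_stand_in, data)
    print(f"{n_rows:>7} rows: {elapsed * 1e3:.2f} ms")
```

Taking the best of several runs reduces the influence of one-off system noise, and repeating the measurement at several sizes shows how runtime scales with the number of rows.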

Conclusion:
• PyDP: Best for smaller datasets and high utility.
• IBM diffprivlib: Best for large-scale, privacy-critical applications.

Discussion
Key Takeaways:
• Strengths of Each Tool:
• PyDP excels in preserving data utility for analytics tasks.
• IBM diffprivlib prioritizes privacy guarantees, especially in sensitive
applications.
• Trade-offs:
• Adjusting ε and mechanisms allows tailoring privacy to specific scenarios.
• Computational overhead is higher for PyDP, making IBM’s library better for
large-scale tasks.
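
To make the ε trade-off concrete, the short sketch below tabulates the Laplace noise scale for a few ε values. The sensitivity of 1.0 and the particular ε grid are assumptions chosen for illustration; only ε = 1.0 appears in the experimental setup.

```python
def laplace_scale(sensitivity, epsilon):
    """Noise scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    return sensitivity / epsilon


SENSITIVITY = 1.0  # assumption
EPSILONS = (0.1, 0.5, 1.0, 2.0, 5.0)  # illustrative grid

scales = [laplace_scale(SENSITIVITY, eps) for eps in EPSILONS]
for eps, b in zip(EPSILONS, scales):
    # For Laplace(0, b) noise, the expected absolute error equals b.
    print(f"epsilon={eps:<4} scale={b:.2f}")
```

Smaller ε values produce proportionally larger noise, so halving ε doubles the expected absolute error of a single noised value.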

Practical Guidance:
• Use PyDP for tasks like financial modeling.
• Use IBM diffprivlib for healthcare diagnostics where privacy is paramount.

Conclusion
Summary:
• Differential Privacy is essential for balancing data utility with privacy protection.
• PyDP and IBM diffprivlib each serve distinct purposes:
• PyDP: Accuracy-driven use cases.
• IBM diffprivlib: Privacy-centric applications.

Future Work:
• Evaluate additional tools across diverse datasets.
• Extend analysis to real-world data in healthcare and other sensitive domains.
• Explore integration with advanced technologies like federated learning or
homomorphic encryption.

Thank You
