Data compression with Python: application of different
algorithms with the use of threads in genome files
Vinicius Seus¹ (viniciusseus@gmail.com)
Alex Camargo¹ (alexcamargoweb@gmail.com)
Diego Mengarda² (diegormengarda@gmail.com)
¹FURG
²UNIPAMPA
Brazil
MOL2NET
International Conference Series on Multidisciplinary Sciences
http://sciforum.net/conference/mol2net-03
Introduction
This work proposes and evaluates an implementation of several
data compression algorithms in the Python programming language,
combined with the use of threads.
As a case study, genomic data from the public NCBI (National
Center for Biotechnology Information) database were used.
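The idea of running several compression algorithms under threads can be sketched as follows. This is only an illustration under stated assumptions: the slides do not name the algorithms that were evaluated, so the standard-library codecs `zlib`, `bz2` and `lzma` stand in for them, and `compress_all` is a hypothetical helper, not the authors' code.

```python
import bz2
import lzma
import threading
import zlib

# Assumed codec set: the slides do not list the algorithms that were evaluated.
CODECS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def compress_all(data: bytes) -> dict:
    """Compress `data` with every codec, one thread per codec."""
    results = {}
    lock = threading.Lock()

    def worker(name, compress):
        compressed = compress(data)
        with lock:  # guard the shared dict while storing the result
            results[name] = compressed

    threads = [threading.Thread(target=worker, args=item) for item in CODECS.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    sample = b"ACGTTGCA" * 50_000  # synthetic stand-in for a genome file
    for name, blob in compress_all(sample).items():
        print(f"{name}: {len(sample)} -> {len(blob)} bytes")
```

Threads can help here because, in CPython, these three modules are understood to release the global interpreter lock while the underlying C library compresses; a pure-Python compressor would likely not benefit in the same way.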
MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition
Figure 1. Graphical Abstract
Materials and Methods
The experimental environment was a single computer with a 3.07 GHz
processor (12 cores), 12 GB of RAM and a 500 GB hard disk.
Table 1 shows the results of the experiments for the file
"ref_ASM45574v1_gnomon_scaffolds.txt", with a total size of
122 MB, which contains an excerpt of the genome of
"Alligator sinensis", a species of the family Alligatoridae.
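An experiment of this kind, recording compressed size and elapsed time per algorithm, might be measured roughly as below. The sketch rests on assumptions: the codec set (`zlib`, `bz2`, `lzma`) is a stand-in because the slides do not name the algorithms, `benchmark` is a hypothetical helper, the compression rate is taken here as compressed size divided by original size, and the input is synthetic nucleotide text rather than the genome file named above.

```python
import bz2
import lzma
import random
import time
import zlib

# Assumed codec set; the slides do not name the algorithms that were compared.
CODECS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def benchmark(data: bytes) -> list:
    """Return (codec, compressed_size, ratio, seconds) for each codec."""
    rows = []
    for name, compress in CODECS.items():
        start = time.perf_counter()
        compressed = compress(data)
        elapsed = time.perf_counter() - start
        ratio = len(compressed) / len(data)  # smaller means better compression
        rows.append((name, len(compressed), ratio, elapsed))
    return rows


if __name__ == "__main__":
    # Synthetic nucleotide text; replace with the bytes of a real file to test it.
    rng = random.Random(0)
    data = "".join(rng.choice("ACGT") for _ in range(200_000)).encode("ascii")
    for name, size, ratio, seconds in benchmark(data):
        print(f"{name:5s} {size:8d} bytes  ratio={ratio:.3f}  time={seconds:.3f}s")
```

Timings from a single run are noisy, so repeating each measurement and reporting a median would probably give more stable numbers.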
Conclusions
The main contribution of this work was to present a Python-based
option for data compression.
The algorithms were not designed to run in parallel by default;
parallel execution was achieved with the Python threading library.
With the experimental environment in place, it was possible to
analyze both the compression rate and the compression time of
each algorithm.
As future work, we intend to extend the range of algorithms
studied and to apply and analyze decompression metrics, with
emphasis on public genomic data.
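One way a compressor that is not parallel by design can be driven from the `threading` module is to split the input into chunks and compress each chunk in its own thread. The sketch below is an assumption about how this could be done, not the method used in this work: `compress_chunks` and its `n_threads` parameter are hypothetical, and `bz2` is chosen only because concatenated bz2 streams can be decompressed in one call.

```python
import bz2
import threading


def compress_chunks(data: bytes, n_threads: int = 4) -> bytes:
    """Compress `data` as up to `n_threads` independent bz2 streams."""
    chunk_size = max(1, -(-len(data) // n_threads))  # ceiling division
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    compressed = [b""] * len(chunks)

    def worker(index: int) -> None:
        # Each thread writes only its own slot, so no lock is needed.
        compressed[index] = bz2.compress(chunks[index])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(chunks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Concatenated bz2 streams form a valid multi-stream input.
    return b"".join(compressed)


if __name__ == "__main__":
    original = b"ACGTNNACGGTA" * 100_000
    packed = compress_chunks(original)
    assert bz2.decompress(packed) == original  # round trip
    print(len(original), "->", len(packed), "bytes")
```

Splitting trades some compression for speed, since each chunk is compressed without knowledge of the others; the round-trip check is also a natural starting point for the decompression metrics mentioned as future work.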