Data compression with Python: application of different algorithms with the use of threads in genome files

Sep 30, 2018

Alex Camargo

Vinicius Seus¹ (viniciusseus@gmail.com)
Alex Camargo¹ (alexcamargoweb@gmail.com)
Diego Mengarda² (diegormengarda@gmail.com)
¹FURG
²UNIPAMPA
Brazil
MOL2NET
International Conference Series on Multidisciplinary Sciences
http://sciforum.net/conference/mol2net-03

Introduction
This work proposes and evaluates an implementation of several
data compression algorithms in the Python programming
language, combined with the use of threads.
 As a case study, genomic data from the public NCBI
(National Center for Biotechnology Information) database
were used.
MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition

Figure 1. Graphical Abstract

Materials and Methods
The experimental environment was a single computer with a
3.07 GHz processor (12 cores), 12 GB of RAM, and a 500 GB
hard disk.
 Table 1 shows the results of the experiments for the file
"ref_ASM45574v1_gnomon_scaffolds.txt" (122 MB in total), an
excerpt from the genome of "Alligator sinensis", a species of
the family Alligatoridae.
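The kind of measurement reported for this file can be sketched as follows. This is a minimal illustration, not the authors' code: the choice of `zlib`, `bz2`, and `lzma` from the Python standard library is an assumption (the slides do not list the algorithms evaluated), and the `measure` helper and the synthetic nucleotide-like sample are hypothetical stand-ins for the 122 MB genome file.

```python
import bz2
import lzma
import time
import zlib

# Assumed set of algorithms; the slides do not name the ones evaluated.
ALGORITHMS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def measure(name, data):
    """Return (compressed size / original size, elapsed seconds) for one algorithm."""
    start = time.perf_counter()
    compressed = ALGORITHMS[name](data)
    elapsed = time.perf_counter() - start
    return len(compressed) / len(data), elapsed


if __name__ == "__main__":
    # Hypothetical stand-in for the genome file: repetitive nucleotide-like text.
    sample = b"ACGTTGCAAGCTTAGC" * 50_000
    for name in ALGORITHMS:
        ratio, seconds = measure(name, sample)
        print(f"{name}: ratio={ratio:.4f}, time={seconds:.3f}s")
```

To try this on a real file, the sample could be replaced with the bytes read from disk, for example with `pathlib.Path(...).read_bytes()`; timings will depend on the machine and on the compression level each module uses by default.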
Conclusions
The main contribution of this work is to present an option
for data compression algorithms based on the Python
programming language.
 By default, the algorithms were not designed to work in
parallel; with the Python threading library, however,
parallel execution was achieved.
 With the experimental environment in place, it was
possible to analyze both the compression rate and the
compression time of each algorithm.
 As future work, we intend to extend the range of algorithms
studied and to apply and analyze decompression metrics,
with emphasis on public genomic data.
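One way to combine compression with Python's `threading` module is sketched below. It is an assumption-laden illustration rather than the authors' implementation: the slides do not say whether work was split across algorithms or across chunks of the file, so this sketch simply runs one thread per algorithm. The algorithm set (`zlib`, `bz2`, `lzma`) and the `compress_all` helper are hypothetical.

```python
import bz2
import lzma
import threading
import zlib

# Assumed set of algorithms; the slides do not name the ones evaluated.
ALGORITHMS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def compress_all(data):
    """Run every algorithm in its own thread and return {name: compressed_size}."""
    sizes = {}
    lock = threading.Lock()

    def worker(name, func):
        compressed = func(data)
        with lock:  # guard the shared dict explicitly
            sizes[name] = len(compressed)

    threads = [
        threading.Thread(target=worker, args=(name, func))
        for name, func in ALGORITHMS.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every algorithm to finish
    return sizes
```

In CPython, these C-backed modules generally release the global interpreter lock while compressing, so the threads can overlap on a multi-core machine; pure-Python compression code would not benefit in the same way.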

Cleaned_Lecture 6666666_Simulation_I.pdfalcinialbob1234

Molecular methods diagnostic and monitoring of infection - Repaired.pptx7tzn7x5kky

Principles of information security Chapter 5.pptEstherBaguma

Stack_and_Queue_Presentation_Final (1).pptxbinduraniha86

Calories_Prediction_using_Linear_Regression.pptxTijiLMAHESHWARI

Decision Trees in Artificial-Intelligence.pdfSaikat Basu

EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbJessaMaeEvangelista2

Classification_in_Machinee_Learning.pptxwencyjorda88

IAS-slides2-ia-aaaaaaaaaaain-business.pdfmcgardenlevi9

Conic Sectionfaggavahabaayhahahahahs.pptxtaiwanesechetan

LLM finetuning for multiple choice google bertChadapornK

04302025_CCC TUG_DataVista: The Design Storyccctableauusergroup

Data Science Courses in India iim skillsdharnathakur29

chapter 4 Variability statistical research .pptxjustinebandajbn

Deloitte Analytics - Applying Process Mining in an audit contextProcess mining Evangelist

C++_OOPs_DSA1_Presentation_Template.pptxaquibnoor22079

Developing Security Orchestration, Automation, and Response ApplicationsVICTOR MAESTRE RAMIREZ

VKS-Python-FIe Handling text CSV Binary.pptxVinod Srivastava

Digilocker under workingProcess Flow.pptxsatnamsadguru491

Process Mining and Data Science in the Financial IndustryProcess mining Evangelist

Cleaned_Lecture 6666666_Simulation_I.pdfalcinialbob1234

Molecular methods diagnostic and monitoring of infection - Repaired.pptx7tzn7x5kky

Principles of information security Chapter 5.pptEstherBaguma

Stack_and_Queue_Presentation_Final (1).pptxbinduraniha86

Calories_Prediction_using_Linear_Regression.pptxTijiLMAHESHWARI

Decision Trees in Artificial-Intelligence.pdfSaikat Basu

EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbJessaMaeEvangelista2

Classification_in_Machinee_Learning.pptxwencyjorda88

IAS-slides2-ia-aaaaaaaaaaain-business.pdfmcgardenlevi9

Conic Sectionfaggavahabaayhahahahahs.pptxtaiwanesechetan

LLM finetuning for multiple choice google bertChadapornK

04302025_CCC TUG_DataVista: The Design Storyccctableauusergroup

Data compression with Python: application of different algorithms with the use of threads in genome files

1. Data compression with Python: application of different algorithms with the use of threads in genome files. Vinicius Seus¹ (viniciusseus@gmail.com), Alex Camargo¹ (alexcamargoweb@gmail.com), Diego Mengarda² (diegormengarda@gmail.com). ¹FURG, ²UNIPAMPA, Brazil. MOL2NET International Conference Series on Multidisciplinary Sciences, http://sciforum.net/conference/mol2net-03

2. Introduction. This work proposes and evaluates an implementation of several data compression algorithms in the Python programming language, combined with the use of threads. As a case study, genomic data from the public NCBI (National Center for Biotechnology Information) database was used. MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition

3. Introduction. Figure 1. Graphical Abstract

4. Materials and Methods. The experimental environment was a single computer with a 3.07 GHz processor (12 cores), 12 GB of RAM and a 500 GB hard disk. Table 1 shows the results of the experiments for the file "ref_ASM45574v1_gnomon_scaffolds.txt" (122 MB in total), an excerpt of the genome of Alligator sinensis, a member of the family Alligatoridae.
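
The two metrics reported for a file like the one above, compression rate and compression time, could be collected roughly as follows. This is a minimal sketch and not the authors' code: the function name `measure_file`, the use of the standard `bz2` module (a Burrows-Wheeler based compressor) and the definition of the rate as compressed size divided by original size are all assumptions made for illustration.

```python
import bz2
import tempfile
import time
from pathlib import Path


def measure_file(path, compress=bz2.compress):
    """Return (compressed_size / original_size, seconds) for one file.

    `compress` is any function mapping bytes to bytes; bz2 is only an
    assumed default, since the slides do not name the algorithms tested.
    """
    data = Path(path).read_bytes()
    if not data:
        raise ValueError("cannot compute a compression rate for an empty file")
    start = time.perf_counter()
    compressed = compress(data)
    seconds = time.perf_counter() - start
    return len(compressed) / len(data), seconds


if __name__ == "__main__":
    # Hypothetical stand-in for a genome excerpt: a small repetitive text file.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample_scaffolds.txt"
        sample.write_bytes(b"ACGTTGCAAGCT" * 50_000)
        rate, seconds = measure_file(sample)
        print(f"rate={rate:.4f} time={seconds:.4f}s")
```

Reading the whole file into memory keeps the sketch short; for inputs much larger than the 122 MB file described here, a streaming compressor object would be the more careful choice.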

5. Conclusions. The main contribution of this work is an option for data compression implemented in the Python programming language. The algorithms were not designed to run in parallel by default; this was achieved with the Python threading library. With the experimental environment in place, it was possible to analyze both the compression rate and the compression time of each algorithm. As future work, we intend to extend the range of algorithms studied and to apply and analyze decompression metrics, with emphasis on public genomic data.
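
One way the threading approach mentioned in the conclusions might look is sketched below, with one thread per algorithm. The slides do not show the actual implementation, so everything here is an assumption: the algorithm set (`zlib`, `bz2`, `lzma` from the standard library), the function name `compress_in_threads` and the one-thread-per-algorithm layout.

```python
import bz2
import lzma
import threading
import time
import zlib

# Assumed algorithm set; the slides do not list which algorithms were tested.
ALGORITHMS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def compress_in_threads(data: bytes) -> dict:
    """Run every algorithm on `data` in its own thread.

    Returns {name: {"ratio": compressed/original, "seconds": elapsed}}.
    """
    results = {}
    lock = threading.Lock()  # protects the shared results dictionary

    def worker(name, compress):
        start = time.perf_counter()
        size = len(compress(data))
        elapsed = time.perf_counter() - start
        with lock:
            results[name] = {"ratio": size / len(data), "seconds": elapsed}

    threads = [
        threading.Thread(target=worker, args=(name, compress))
        for name, compress in ALGORITHMS.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every algorithm has finished
    return results


if __name__ == "__main__":
    sample = b"ACGTTGCAAGCT" * 50_000  # hypothetical stand-in for genome text
    for name, metrics in sorted(compress_in_threads(sample).items()):
        print(f"{name}: ratio={metrics['ratio']:.4f} time={metrics['seconds']:.4f}s")
```

Whether the threads actually overlap in time depends on the interpreter and on each compression module, so the per-algorithm times from a sketch like this are best compared against a sequential run on the same machine.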

