Multiscale computing combines models at different scales, such as molecular and continuum models, by coupling simulation codes through various frameworks. This allows separate models to run on different computational resources and be updated independently while remaining coupled. The document discusses several multiscale applications, such as modeling blood flow and clay-polymer nanocomposites, and technologies for automating code coupling such as MUSCLE, MPWide and FabSim. It concludes that multiscale computing is increasingly important for accurately modeling large physical systems, and that software now exists to conveniently combine models and run them in parallel on supercomputers.
This document provides a beginner's guide to using the igraph library in Python. It discusses what networks are and how to create and visualize graphs in igraph. It also covers scale-free networks and power laws, using a co-authorship dataset to demonstrate that these networks follow a power-law distribution, with a few nodes having many connections. Code and the resulting plot are shown as an example of analyzing a scale-free network in igraph.
The document discusses representing imaging mass spectrometry (MS) data. It describes imzML, a common data standard for MS imaging data. It also outlines how MS imaging data can be submitted to the ProteomeXchange repository via the PRIDE database. MS imaging generates data from tissue sections, and imzML encodes both the raw data files and metadata about the images. Submitting to ProteomeXchange involves uploading raw data files, result files, and metadata descriptions to allow sharing and reuse of MS imaging experiments.
The document discusses various topics related to bioinformatics and the internet. It provides multiple choice questions about when the term bioinformatics emerged, what it is regarded as part of, who created the first bioinformatics database, and more. It also covers topics like internet protocols, web browsers, operating systems, and bioinformatics databases.
The document discusses using high-throughput computational methods and machine learning to discover materials with coexisting magnetic and topological orders. It presents a workflow that couples magnetic and topological property prediction with the existing Materials Project infrastructure. This workflow involves calculating magnetic orderings for over 3,000 transition metal oxides, predicting critical temperatures, and classifying materials using machine learning before determining topological band properties. The results have uncovered 27 ferromagnetic semimetal and 7 antiferromagnetic topological insulator candidates.
The IUGONET project and its international cooperation on development of metad... (Iugo Net)
This document discusses the IUGONET project and its international cooperation on developing a metadata database for upper atmospheric study. It notes that IUGONET has developed infrastructures and tools like a metadata database and data analysis software to facilitate distribution and use of ground-based upper atmospheric data. IUGONET is in discussion with other international projects like SPASE and ESPAS to collaborate on their metadata databases. The metadata database currently includes data from IUGONET institutions as well as other Japanese organizations conducting upper atmospheric research.
Machine Learning in Materials Science and Chemistry, USPTO, Nathan C. Frey (Nathan Frey, PhD)
Machine learning and artificial intelligence have transformed our online experience, and for an increasing number of individuals, these fields are fundamentally changing the way we work. In this talk, I will discuss how machine learning is used in the physical sciences, particularly materials science and chemistry, and what transformative impacts we have seen or might expect to see in the future. This discussion will focus on the unique challenges (and opportunities) faced by materials and chemistry researchers applying machine learning in their work. I will present a brief introduction to machine learning for physical scientists and give examples related to synthesis, property prediction and engineering, and artificial intelligence that “reads” research articles. These examples will introduce some of the most prevalent and useful open-source software tools that drive modern machine learning applications. Two significant themes will be emphasized throughout: the careful evaluation of machine learning results and the central importance of data quality and quantity. Finally, I will provide some mundane, “human learned” speculation about the future of machine learning in physical science and recommended resources for further study.
Optimizing queries via search server ElasticSearch: a study applied to large ... (Alex Camargo)
This document summarizes a study that optimized queries on large genomic datasets using the ElasticSearch search server. The researchers compared ElasticSearch to MySQL and PostgreSQL databases. They found that ElasticSearch achieved significantly faster response times for queries, with performance gains of 91.7% over MySQL and 94.9% over PostgreSQL. As future work, the researchers intend to investigate adapting searches in ElasticSearch to support autocomplete queries using LIKE, as supported in other databases.
Computational methods to analyze biological data. It is a way to introduce some of the many resources available for analyzing sequence data with bioinformatics software. This paper covers theoretical approaches to data resources and provides knowledge about sequence alignments and their databases. As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics, and statistics to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques. Databases are essential for bioinformatics research and applications. Many databases exist, covering various information types, for example DNA and protein sequences, molecular structures, phenotypes, and biodiversity. Databases may contain empirical data. Conceptualizing biology in terms of molecules, and then applying informatics techniques from math, computer science, and statistics, makes it possible to understand and organize the information associated with these molecules on a large scale. People study bioinformatics in different ways. Some are devoted to developing new computational tools, from both software and hardware viewpoints, for better handling and processing of biological data. They develop new models and new algorithms for existing questions, and propose and tackle new questions when new experimental techniques bring in new data. Others take the study of bioinformatics as the study of biology from the viewpoint of informatics and systems. Durgesh Raghuvanshi | Vivek Solanki | Neha Arora | Faiz Hashmi, "Computational of Bioinformatics", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume 4, Issue 4, June 2020. URL: https://www.ijtsrd.com/papers/ijtsrd30891.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/30891/computational-of-bioinformatics/durgesh-raghuvanshi
We propose the Bio-UnaGrid infrastructure to facilitate the automatic execution of compute-intensive workflows that require existing application suites and distributed computing infrastructures. With Bio-UnaGrid, bioinformatics workflows are easily created and executed, with a simple click and in a transparent manner, on different cluster and grid computing infrastructures (the command line is not used). To provide more processing capacity at low cost, Bio-UnaGrid uses the idle processing capacity of computer labs with Windows, Linux and Mac desktop computers, using a key virtualization strategy. We implemented Bio-UnaGrid in a dedicated cluster and a computer lab. Results of performance tests show the gains obtained by our researchers.
Programmers love Python because of how fast and easy it is to use. Python cuts development time in half with its simple-to-read syntax and easy compilation feature. Debugging programs is a breeze in Python with its built-in debugger. Using Python makes programmers more productive and their programs ultimately better. Python continues to be a favorite option for data scientists, who use it for building machine learning applications and other scientific computations.
Python runs on Windows, Linux/Unix, Mac OS and has been ported to the Java and .NET virtual machines. Python is free to use, even for commercial products, because of its OSI-approved open source license.
Python has evolved as the most preferred language for data analytics, and the increasing search trends for Python also indicate that it is the next "Big Thing" and a must for professionals in the data analytics domain.
A consistent and efficient graphical User Interface Design and Querying Organ... (CSCJournals)
We propose a software layer called GUEDOS-DB on top of an Object-Relational Database Management System (ORDBMS). In this work we apply it to Molecular Biology, more precisely complete organelle genomes. We aim to offer biologists the possibility to access, in a unified way, information spread among heterogeneous genome databanks. In this paper, the goal is first to provide a visual schema graph through a number of illustrative examples. The adopted human-computer interaction technique in this visual designing and querying makes it very easy for biologists to formulate database queries, compared with linear textual query representation.
This document discusses the challenges and opportunities biology faces with increasing data generation. It outlines four key points:
1) Research approaches for analyzing infinite genomic data streams, such as digital normalization which compresses data while retaining information.
2) The need for usable software and decentralized infrastructure to perform real-time, streaming data analysis.
3) The importance of open science and reproducibility given most researchers cannot replicate their own computational analyses.
4) The lack of data analysis training in biology and efforts at UC Davis to address this through workshops and community building.
Scientific Workflows: what do we have, what do we miss? (Paolo Romano)
This document discusses scientific workflows and outlines some key points:
- Scientific workflows are used to automate data retrieval and analysis processes from multiple databases and tools. Workflow management systems help implement these processes.
- Issues with current workflow systems include lack of automatic composition capabilities, performance limitations especially with large data volumes, and ensuring reproducibility of results over time as databases and tools change.
- The document outlines approaches to address these issues such as using ontologies to support automatic composition, optimizing for performance through parallelization and alternative services, and capturing provenance data to improve reproducibility and reuse of analyses.
This document discusses using the T-BioInfo platform to provide practical education in bioinformatics. It describes how the platform can integrate different types of omics data and analysis into intuitive, visual pipelines. This allows non-experts to analyze and interpret complex datasets. Example projects are provided, such as using RNA-seq data to identify genes involved in a disease. The goal is to teach bioinformatics through collaborative, project-based learning without requiring programming skills. Learners would reconstruct simulated biological processes and contribute to ongoing analysis of real scientific datasets.
AN EXTENSION OF PROTÉGÉ FOR AN AUTOMATIC FUZZY-ONTOLOGY BUILDING U... (ijcsit)
The process of building an ontology is very complex and time-consuming, especially when dealing with huge amounts of data. Unfortunately, current marketed tools are very limited and don't meet all user needs. Indeed, these tools build the core of the ontology from the initial data, which generates a big amount of information. In this paper, we aim to resolve these problems by adding an extension to the well-known ontology editor Protégé, in order to work towards a complete FCA-based framework which resolves the limitations of other tools in building fuzzy ontologies. We give, in this paper, some details on our semi-automatic collaborative tool called FOD Tab Plug-in, which takes into consideration another degree of granularity in the generation process. In fact, it follows a bottom-up strategy based on conceptual clustering, fuzzy logic and Formal Concept Analysis (FCA), and it defines the ontology between classes resulting from a preliminary classification of the data and not from the initial large amount of data.
It is widely agreed that complex diseases are typically caused by the joint effects of multiple genetic variations, rather than a single genetic variation. Multi-SNP interactions, also known as epistatic interactions, have the potential to provide information about the causes of complex diseases, and build on GWAS studies that look at associations between single SNPs and phenotypes. However, epistatic analysis methods are computationally expensive and have limited accessibility for biologists wanting to analyse GWAS datasets, as they are command-line based. Here we present APPistatic, a prototype desktop version of a pipeline for epistatic analysis of GWAS datasets. This application combines ease of use, via a GUI, with accelerated implementations of the BOOST and FaST-LMM epistatic analysis methods.
Towards reproducibility and maximally-open data (Pablo Bernabeu)
Presented at the Open Scholarship Prize Competition 2021, organised by Open Scholarship Community Galway.
Video of the presentation: https://nuigalway.mediaspace.kaltura.com/media/OSW2021A+OSCG+Open+Scholarship+Prize+-+The+Final!/1_d7ekd3d3/121659351#t=56:08
The BlueBRIDGE approach to collaborative research (Blue BRIDGE)
Gianpaolo Coro, ISTI-CNR, at the BlueBRIDGE workshop on "Data Management services to support stock assessment", held during the Annual ICES Science Conference 2016.
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT (ecij)
Nowadays, real-time systems and intelligent systems offer more and more control interfaces based on voice recognition or human language recognition. Robots and drones will soon be mainly controlled by voice. Other robots will integrate bots to interact with their users; this can be useful both in industry and entertainment. At first, researchers were digging on the side of "ontology reasoning". Given all the technical constraints brought by the treatment of ontologies, an interesting solution has emerged in recent years: the construction of a model based on machine learning to connect human language to a knowledge base (based, for example, on RDF). We present in this paper our contribution to building a bot that could be used on real-time systems and drones/robots, using recent machine learning technologies.
Book of abstract volume 8 no 9 IJCSIS December 2010 (Oladokun Sulaiman)
The International Journal of Computer Science and Information Security (IJCSIS) is a publication venue for novel research in computer science and information security. This issue from December 2010 contains 5 research papers. The first paper proposes a 128-bit chaotic hash function that uses the logistic map and MD5/SHA-1 hashes. The second paper discusses constructing an ontology for representing human emotions in videos to improve video retrieval. The third paper proposes an intelligent memory controller for H.264 encoders to reduce external memory access. The fourth paper investigates the impact of fragmentation on query performance in distributed databases. The fifth paper examines the effect of guard intervals in a proposed MIMO-OFDM system for wireless communication.
The document discusses the ISA infrastructure, which provides a standardized format (ISA-TAB) for experimental metadata and data exchange. It can be used across various domains like toxicology, systems biology, and nanotechnology. The Risa R package integrates experimental metadata with analysis and allows updating metadata. Nature Scientific Data is a new publication for describing valuable datasets. The ISA framework has been adopted by over 30 public and private resources and is growing in use for facilitating reuse of investigations in various life science domains. Toxicity examples include EU projects on predictive toxicology and a rat study of drug candidates. Questions can be directed to the ISA tools group.
International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
This document discusses using Python coding to store datasets in Hadoop databases for real-time applications. It begins by introducing big data and where Hadoop databases are used. The basic platform of Hadoop stores datasets using Java programs but Python is proposed as it is more user-friendly, efficient to code, debug and execute on all platforms. Examples are given comparing Python and Java programs. The major differences between the languages are outlined in a table. The document then discusses using Python for various real-world projects and platforms before concluding Python is better suited than Java for big data applications.
Slides from the presentation at IDAMO 2016, Rostock. May 2016.
Most scientific discoveries rely on previous or other findings. A lack of transparency and openness led to what many consider the "reproducibility crisis" in systems biology and systems medicine. The crisis arose from missing standards and inappropriate support of standards in software tools. As a consequence, numerous results in low- and high-profile publications cannot be reproduced.
In my presentation, I summarise key challenges of reproducibility in systems biology and systems medicine, and I demonstrate available solutions to the related problems.
Data-to-text technologies present an enormous and exciting opportunity to help audiences understand some of the insights present in today's vast and growing amounts of electronic data. In this article we analyze the potential value and benefits of these solutions, as well as the risks and limitations for their wider penetration. These technologies already bring substantial advantages of cost, time, accuracy and clarity versus other traditional approaches or formats. On the other hand, there are still important limitations that restrict the broad applicability of these solutions, most importantly the limited quality of their output. However, we find that the current state of development is sufficient for the application of these solutions across many domains and use cases, and we recommend businesses of all sectors to consider how to deploy them to enhance the value they are currently getting from their data. As the availability of data keeps growing exponentially and natural language generation technology keeps improving, we expect data-to-text solutions to take a much bigger role in the production of automated content across many different domains.
This document discusses using large language models like GPT-3, Codex, and ChatGPT to generate data visualizations from natural language queries. The study conducted experiments providing natural-language prompts to these models to generate Python scripts for visualizations. The results showed the models were effective at producing visualizations from natural language when supported by well-engineered prompts, demonstrating that large language models can support end-to-end generation of visualizations from natural-language input.
The document presents a lesson plan on the history of the Church, covering topics such as the founding of the Church on the day of Pentecost, ordinances such as baptism and the Lord's Supper, the Church's missions of preaching the gospel and edifying its members, and an introduction to the main periods of Church history from the apostolic era to the Church in Brazil.
The document presents a lesson plan on demonology, covering the doctrine of Satan and demons in 8 chapters. The course will discuss the existence, nature and influence of Satan and demons according to the Scriptures, including the fall of Lucifer, the personality and work of Satan, and his final destiny of being cast into the lake of fire.
Python for finance: exploring financial data (Alex Camargo)
The document presents a talk on Python for finance, exploring financial data, given at FLISOL 2023. It introduces the financial market and its main concepts. It then explains how Python is used in finance, through libraries, data collection and visualization, and modeling. Finally, it presents a case study using Google Colab to access stock data and visualize it.
A practical guide: How to use Bitcoins? (Alex Camargo)
This document provides a practical guide on how to use Bitcoins. It discusses Alex Camargo's presentations on cryptocurrencies and Bitcoin. It then introduces Bitcoin, explaining that it operates on a decentralized network using blockchain technology. It outlines the steps to use Bitcoins, including getting a wallet, purchasing coins, sending coins, and using them to make purchases. Finally, it concludes that Bitcoins provide benefits like low fees but also stresses the importance of security and awareness of risks like volatility.
AI and Bioinformatics: computational models of proteins (Alex Camargo)
This document presents a talk on artificial intelligence and bioinformatics, focusing on computational models of proteins. It gives a brief introduction to AI, bioinformatics and their applications, addressing problems such as protein structure and function prediction, sequence alignment and drug development. It also discusses trends in the field, such as machine learning and parallel processing, and applications in medical diagnosis.
Introduction to cryptocurrencies: investment, market and security (Alex Camargo)
The document introduces fundamental concepts about cryptocurrencies, including cryptography, private and public keys, protocols, blockchains and Bitcoin. It also provides practical examples of how to use sites like CoinMarketCap and exchanges, analyze projects and manage cryptocurrency wallets.
Introduction to cryptocurrencies: creating your own coin like Bitcoin! (Alex Camargo)
The document introduces concepts about cryptocurrencies such as Bitcoin, explaining what cryptography, private keys, protocols and blockchains are. It also covers how to create your own cryptocurrency.
The Christian versus Social Media - Alex (Arca da Aliança) (Alex Camargo)
The document discusses how Christians should use social media ethically, recognizing its risks and benefits. It addresses topics such as the addictive nature of social networks, data privacy, the sensualization of images, and the possibility of digital evangelism if done with wisdom and discretion. The goal is to motivate conscious use of social media from a Christian perspective.
The document presents a talk on empathy and compassion based on the biblical parable of the Good Samaritan in Luke 10:36-37. The talk discusses who the Samaritans were, the account of the crime against the abandoned man, and how the priest and the Levite did not help him, unlike the Samaritan, who had compassion. The main message is the importance of having empathy and acting with compassion toward those in need, just as the Good Samaritan did.
High performance in AI: a practical approach (Alex Camargo)
The document discusses high performance in artificial intelligence (AI) in a practical way. It introduces the speaker Alex Camargo and his projects in applied AI, such as medical support systems. It discusses concepts such as machine learning (ML) and deep learning (DL), and tools for AI development such as Python, TensorFlow and PyTorch. It demonstrates experiments with parallelism in deep neural networks, using modules such as tf.data to improve speed. Finally, it addresses considerations about the job market in AI.
Bioinformatics from DNA to drug: tools and usability (Alex Camargo)
The document discusses bioinformatics, defining it as the use of computational tools in the study of biological problems. It covers the history of bioinformatics since the discovery of the structure of DNA, the Human Genome Project, and the development of drug design strategies using computational tools. It also discusses the main target problems of bioinformatics, such as sequence and structure analysis, and current trends such as handling large data volumes and parallel processing.
Applied Artificial Intelligence: recognizing handwritten characters (Alex Camargo)
The document discusses applying artificial intelligence to recognize handwritten characters. It presents concepts and tools such as Keras, TensorFlow and OpenCV used in optical character recognition (OCR) and handwriting recognition with deep learning. The author also provides code and references on the topic.
AI versus COVID-19: Deep Learning, Code and Cloud Execution (Tchelinux 2020) (Alex Camargo)
The document presents an introduction to artificial intelligence (AI) and deep learning, including convolutional neural networks (CNNs). It demonstrates how AI can be used to identify COVID-19 cases in lung X-rays, using Google Colab to train deep learning models.
Artificial intelligence algorithms for fake news classification. ... (Alex Camargo)
This document summarizes 3 academic works on fake news classification using artificial intelligence. The work by Costa (2019) achieved the highest accuracy (97.5%) using GloVe word embeddings and convolutional neural networks trained on a dataset of 28,711 records. The other works achieved lower accuracies using LSVC or LSVM with TF-IDF on smaller datasets.
Fake News - Concepts, methods and applications for identification and mitigation (Alex Camargo)
The document discusses concepts, methods and applications related to the identification and mitigation of fake news. It addresses topics such as the definition of fake news, types of disinformation, applicable legislation, computational detection methods, and projects on the subject. It also presents author profiles and bibliographic references.
The document describes the PredictCovid system, which uses artificial intelligence to support the triage of patients with suspected COVID-19. The system trains a deep learning model using medical images and can classify new cases as positive or negative. The goal is to provide a free and secure tool to assist physicians during the pandemic. Initial results showed high accuracy in classifying X-ray images.
The document presents a talk on artificial intelligence and COVID-19. The talk includes an introduction of the speaker, details about the PredictCovid project for patient triage, a general explanation of AI, ML and DL concepts, and demonstrations of tools for developing AI systems.
1. The team trained a CNN model on a COVID-19 X-ray image dataset to automatically detect COVID-19 in chest X-rays. They used tools like TensorFlow, Keras, and Python.
2. They evaluated the model using techniques like cross-validation, data augmentation, TensorBoard for visualization, and checkpointing to save models during training.
3. Future work could focus on reducing memory usage, improving model interpretation, and developing multi-modal COVID detectors using different types of medical data.
This document presents the final remarks of Module VII - Web Development, part of a course on Introduction to Information and Systems Security. It discusses important principles of secure web development, such as input data validation, designing to implement security policies, and defense in depth. It recommends additional resources on the topic on YouTube.
Lesson 04 - Code injection (Cross-Site Scripting) (Alex Camargo)
The document presents a lesson plan on Cross-Site Scripting (XSS). The lesson will explain the concept of executing commands across different sites through JavaScript injected into the forms of a vulnerable academic application. Practical examples of vulnerable and corrected source code to prevent XSS attacks will be shown, and students will take a quiz on the topic.
This comprehensive Data Science course is designed to equip learners with the essential skills and knowledge required to analyze, interpret, and visualize complex data. Covering both theoretical concepts and practical applications, the course introduces tools and techniques used in the data science field, such as Python programming, data wrangling, statistical analysis, machine learning, and data visualization.
Mieke Jans is a Manager at Deloitte Analytics Belgium. She learned about process mining from her PhD supervisor while she was collaborating with a large SAP-using company for her dissertation.
Mieke extended her research topic to investigate the data availability of process mining data in SAP and the new analysis possibilities that emerge from it. It took her 8-9 months to find the right data and prepare it for her process mining analysis. She needed insights from both process owners and IT experts. For example, one person knew exactly how the procurement process took place at the front end of SAP, and another person helped her with the structure of the SAP-tables. She then combined the knowledge of these different persons.
Lalit Wangikar, a partner at CKM Advisors, is an experienced strategic consultant and analytics expert. He started looking for data driven ways of conducting process discovery workshops. When he read about process mining the first time around, about 2 years ago, the first feeling was: “I wish I knew of this while doing the last several projects!".
Interviews are subject to all the whims human recollection is subject to: specifically, recency, simplification and self-preservation. Interview-based process discovery, therefore, leaves out a lot of "outliers" that usually end up being one of the biggest opportunity areas. Process mining, in contrast, provides an unbiased, fact-based, and very comprehensive understanding of actual process execution.
Decision Trees in Artificial-Intelligence.pdf (Saikat Basu)
Have you heard of something called 'Decision Tree'? It's a simple concept which you can use in life to make decisions. Believe you me, AI also uses it.
Let's find out how it works in this short presentation. #AI #Decisionmaking #Decisions #Artificialintelligence #Data #Analysis
https://saikatbasu.me
Data compression with Python: application of different algorithms with the use of threads in genome files
1. Data compression with Python: application of different algorithms with the use of threads in genome files
Vinicius Seus¹ ([email protected])
Alex Camargo¹ ([email protected])
Diego Mengarda² ([email protected])
¹FURG
²UNIPAMPA
Brazil
MOL2NET
International Conference Series on Multidisciplinary Sciences
http://sciforum.net/conference/mol2net-03
2. Introduction
This work proposes and evaluates an implementation of different data compression algorithms using the Python programming language allied to the use of threads.
As a case study, genomic data available from the NCBI (National Center for Biotechnology Information) public database was used.
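As an illustration of the approach, here is a minimal sketch (not the authors' original code) of benchmarking standard-library compressors on a genome file. The file name is taken from the case study on the next slide; the transcript does not list which algorithms were evaluated, so zlib, bz2 and lzma stand in as standard-library examples.

import bz2
import lzma
import time
import zlib

# Read the raw genome file into memory (file name from the case study).
with open("ref_ASM45574v1_gnomon_scaffolds.txt", "rb") as f:
    data = f.read()

# Compare three compression algorithms from Python's standard library,
# recording elapsed time and the achieved compression ratio for each.
for name, compress in (("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)):
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(compressed) / len(data)  # fraction of the original size
    print(f"{name}: {elapsed:.2f} s, compressed to {ratio:.1%} of original")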
4. Materials and Methods
For the experimental environment, a simplified setup was used: a computer with a 3.07 GHz processor (12 cores), 12 GB of RAM and a 500 GB hard disk.
Table 1 shows the results of the experiments for the file "ref_ASM45574v1_gnomon_scaffolds.txt", with a total size of 122 MB, referring to an excerpt from the genome identified as "Alligator sinensis", belonging to the family Alligatoridae.
5. Conclusions
The main contribution of this work was to present an algorithm option for data compression based on the Python programming language.
By default, the algorithms were not designed to work in parallel; however, with the use of the Python threading library this was achieved.
With the experimental environment implemented, it was possible to analyze the performance of each algorithm in terms of both compression rate and compression time.
As future work, we intend to extend the range of algorithms studied, as well as to apply and analyze decompression metrics, with emphasis on public genomic data.
6. References
BURROWS, Michael; WHEELER, David J. A block-sorting lossless data compression algorithm. 1994.
VERLI, Hugo et al. Bioinformática da Biologia à flexibilidade molecular. Porto Alegre, Brasil, v. 1, 2014.
WELCH, Terry A. A technique for high-performance data compression. Computer, v. 17, n. 6, p. 8-19, 1984.