
Farrell, D 2016 DataExplore: An Application for General Data Analysis in Research and Education. Journal of Open Research Software, 4: e9, DOI: http://dx.doi.org/10.5334/jors.94

SOFTWARE METAPAPER

DataExplore: An Application for General Data Analysis in Research and Education
Damien Farrell
UCD School of Veterinary Medicine, University College Dublin, Ireland
[email protected]

DataExplore is an open source desktop application for data analysis and plotting, intended for use in both research and education. It is aimed primarily at non-programmers who need to perform relatively advanced table manipulations. Common tasks that might not be familiar to spreadsheet users, such as table pivot, merge and join functionality, are included as core elements. Creation of new columns using arithmetic expressions and pre-defined functions is possible. Table filtering may be done with simple boolean queries. The other primary feature is rapid dynamic plot creation from selected data. Multiple plots from various selections and data sources can also be composed using a grid layout. It is thus possible to create publication quality plots. A plugin system allows the addition of features, with several plugins already available by default. The program is written in Python and is based on the PyData suite of Python libraries.

Keywords: python; scientific plotting; table analysis; pandas

Funding statement: The author is funded by an Irish Research Council Postdoctoral Fellowship
(GOIPD/2015/475).

(1) Overview
Introduction
Recent years have seen a rapid growth in the importance of data handling and analysis in the sciences. Such is the complexity and volume of data that it has given rise to data scientist specialisations within many fields. In the biological sciences, the data analysis task is often assigned to a bioinformatician. Such expert skills are essential when the complexity of the task is too much for experimentalists unfamiliar with advanced computational techniques. However there is a danger of over-reliance upon data analysts, particularly in cases where an analysis can be done with relatively basic computational skills.

Spreadsheets are widely used in scientific research. They have also advanced a great deal in sophistication [1] since their introduction and are by now a standard tool for anyone dealing with numerical data. In the sciences there has however been a tendency to rely on spreadsheets for tasks that they were not originally designed for [2]. Though advanced features like pivot tables are available, many general users are not aware of them and make the worksheet more complicated than it needs to be. Even if a spreadsheet can perform a task using a macro it is often much more complex to accomplish than it would be with a few lines of code [3]. Spreadsheets have another more serious problem in that they make reproducible analysis very difficult. This arises out of multiple factors. The opaque way in which cell based formulae are often used makes it hard to track calculations. The use of conditional formatting to present results makes it almost impossible to interpret them in another format. Also, because they may be used for data entry and analysis at the same time, ad hoc changes to the raw data are encouraged. Finally, statistical analysis using the most common commercial product, Excel, has been criticized [4]. These problems make reproducible science difficult.

For the general scientific user these limitations can be partly overcome by using other tools to complement spreadsheets. Many researchers use separate plotting packages such as GraphPad Prism [5] and statistical tools like SPSS [6] for analysis. However this gives rise to another problem: these are commercial applications. They are very expensive and the source code is closed. This has serious implications for reproducibility. The tendency to use commercial software is highly prevalent in academia even though there are some viable open source solutions available. This may partly be a problem of general awareness on the part of the user. Veusz [7] and SciDAVis [8] are good examples of free plotting packages that compare favourably with commercial products though they do not seem to be widely known. Commercial tools are also frequently rather feature heavy and complex for the general user, with entire courses devoted to teaching them.
Art. e9, p. 2 of 8 Farrell: DataExplore

Scientists in certain data intensive subject areas are now beginning to adapt to scripting languages like R and Python [9]. These are a much better foundation to build future skills on and, since they are open platforms, they allow users to publish full end-to-end instructions that anyone in the world can reproduce for free. They also facilitate workflows with large data [10]. Adoption of these scripting tools is easier said than done because of the intimidating nature of programming to many. This is one reason R might not easily gain traction with experimentalists since it requires at least some programming skill. RStudio [11] goes a long way to address this issue as it provides a user friendly environment for newer users. Though much progress has been made on new web based tools it is still a challenge to build highly interactive applications inside the browser. There is therefore still space for graphical desktop tools to provide a familiar complement to spreadsheets for non-programmers.

Objectives
DataExplore is intended for rapid exploratory analysis of tabulated data. Quick transformation and visualization of data are core features. The use of the Python PyData stack [12] as the back-end means a large number of well tested algorithms are already available. The dichotomy between programming and tools with a graphical interface is usually a sharp one [13], with users preferring one or the other approach. DataExplore is also intended to help bridge this gap by making readily available processing steps normally familiar to data analysts. The main objectives of the software are:

• allow quick exploration and visualization of a data set
• allow a familiar graphical interface but implement more advanced table analysis features than currently accessible in spreadsheets
• help to bridge the gap between graphical interface and command driven or programmatic approaches to data analysis
• scale to medium sized datasets, i.e. a table of the order of 1–5 million rows that will fit in the memory of most computers
• allow publication quality plots to be made easily and encourage clear scientific visualization [14]

Implementation and architecture
Methodology
The core R data structure is called data.frame, a versatile matrix structure that stores multiple data types [15]. It has been replicated in Python as a core component of the Pandas library [16]. This has opened the way to much more convenient R style data analysis in Python. Pandas DataFrame structures, which use the efficient ndarray data container class in numpy [17], are now well integrated into other Python data analysis libraries, creating a very useful ecosystem. These libraries are often grouped together as part of the PyData stack [18]. DataExplore is based on using DataFrames to present tabulated data and on matplotlib [19] for plotting. Matplotlib is a very well established plotting library for Python and produces publication-quality figures in a variety of hard copy formats and interactive environments across platforms.

In some plotting packages like Veusz, SciDAVis and mjograph [20], plots are designed by the addition of multiple plot elements to which data is attached. DataExplore is more data centric, like RStudio, and plots are generated dynamically from the currently selected data and chosen options. The idea is that rows and columns can quickly be chosen or tables edited and new plots generated instantly with minimal mouse clicks. Plots cannot currently be interactively edited though this is an option that could be added later.

Architecture
The software is written in Python and makes extensive use of the PyData libraries. These form an ecosystem of libraries that can provide a complete solution to data analysis from visualization to machine learning. The graphical interface is built with Tkinter/ttk, the standard graphical tool kit for Python. Like other Python packages the library is broken into modules which contain classes grouped by function. Table, plotting and dialog widgets are in their own modules as shown in Figure 1. The core class is the pandastable widget, which is a Tkinter canvas object. This is used to display a Pandas DataFrame via a model class that carries out changes to the DataFrame based on user interaction and stores some additional data about the Table. This widget is designed to be re-used in any Tkinter application. The DataExplore application module itself is built around the table widget, a plot viewer module and several plugins.

User interface
The application consists essentially of a table and associated plot viewer, shown in Figure 2. Multiple sets of tables can be loaded and saved as single projects. For certain functions a child table or sub-table is created below the main one. This may be to store the results of a table manipulation such as an aggregation or to paste in another table so that it can be joined to the main one. Another use would be to paste a portion of the selected data and plot it. The sub-table can be created and discarded as needed.

Unlike a spreadsheet, the focus is not on data entry. Though individual cell entry is possible, users are encouraged to keep their original data separate and unchanged. Results can be exported to csv or other formats if required. This is important to robust analysis. An undo/redo feature is not yet implemented but will likely be useful in the future when more complex series of processing steps need to be experimented with.

Plot options are laid out in a set of tabbed control panels below the plot, allowing the user to switch quickly between basic and other plotting modes. Currently a 3D plot mode and grid layout options are also available. Table functions are accessed either from the right toolbar (see Figure 2), the right-click context menu inside the table or from the main menu. Dialogs such as plugin interfaces are usually placed below the table.
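To make the DataFrame structure described under Methodology more concrete, the following is a minimal sketch using pandas and numpy directly, outside the application. The column names and values are invented for illustration and do not come from the paper.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one column per variable, each with its own type.
df = pd.DataFrame(
    {
        "sample": ["a1", "a2", "b1", "b2"],  # text labels
        "group": ["control", "control", "treated", "treated"],
        "replicate": [1, 2, 1, 2],  # integers
        "value": [0.52, 0.61, 1.34, 1.27],  # floats
    }
)

# Each column keeps its own dtype, unlike a plain numeric matrix.
print(df.dtypes)

# A numeric column converts to a numpy ndarray for vectorised maths.
values = df["value"].to_numpy()
print(type(values).__name__, values.mean())

# A column can be promoted to the index, which then labels the rows.
indexed = df.set_index("sample")
print(indexed.loc["b1", "value"])
```

This mixed-type, labelled table is the kind of object the table widget displays and modifies.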

Figure 1: Outline of pandastable library modules. The graphical user interface is a scaffolding using these modules.

Figure 2: Application interface.


Features
Import of text files
Import of csv and general plain text formats is a standard feature of Pandas using the read_csv function and supports many options. The most essential of these are available via the import dialog accessible from the toolbar or by right-clicking anywhere in the table and using the context menu.

Row and column indexes
The index is a fundamental feature of the underlying DataFrame. This performs the central role of data alignment or getting and setting of subsets of the table. A more novel aspect is the use of “hierarchical” indexing. This is essentially a way of representing data with an arbitrary number of dimensions in a 2D table. In our program the use of multi-indexes is mostly implicit in the way the program works but it opens the door to adding more useful functionality later on. For now the index can be displayed or hidden in the table and columns can be turned into indexes. This is useful for plotting since the index is often the implied x-axis.

Table filtering
Currently filtering of the table is done using a quite simple string query method. An entry box is used to enter the query and the table is updated accordingly. The main table is restored when the filters are cleared. The syntax is straightforward to learn for beginners and may be useful for teaching logical AND/OR/NOT row-wise operations.

Table manipulation
Common transformations such as transpose, aggregation, pivot and merge are supported. Results are mostly placed in the sub-table so as not to overwrite the main table. The sub-table can also be plotted from or copied into the main table or another sheet. For operations involving two tables (like concatenate or merge) the second dataset is loaded into the sub-table (by importing or pasting), which can then be joined to the main table.

Plotting behaviour
The design is oriented around quick generation of plots from the current selections. This means that the current plot is constantly overwritten. However plots can be saved and recalled in the current session if required. To produce multiple plots in one figure a grid layout mode is used. Changing the number of rows/columns makes a finer grid and adjusting the row/column spans allows a variety of sub plot combinations to be created. When the user wants to add a new sub plot they simply select the row and column location at which to add it. An example is shown in Figure 3. It is also possible to use this method to make inset plots by overlaying them on the main plot.
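The table filtering and table manipulation features described in this section correspond to operations that pandas exposes directly. The sketch below shows roughly equivalent calls on a small invented table. It assumes the string query box behaves like DataFrame.query, which the paper does not state explicitly, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical main table of measurements.
main = pd.DataFrame(
    {
        "sample": ["s1", "s1", "s2", "s2", "s3", "s3"],
        "assay": ["A", "B", "A", "B", "A", "B"],
        "value": [1.0, 4.0, 2.0, 5.0, 3.0, 6.0],
    }
)

# Table filtering: a boolean string query combining and / not.
filtered = main.query("value > 1.5 and not (assay == 'B' and value > 5.5)")

# Aggregation: mean value per assay (a groupby-aggregate step).
means = main.groupby("assay")["value"].mean()

# Pivot: one row per sample, one column per assay.
wide = main.pivot(index="sample", columns="assay", values="value")

# Merge: join a second (hypothetical) table of sample annotations,
# playing the role of the sub-table joined to the main table.
info = pd.DataFrame({"sample": ["s1", "s2", "s3"], "group": ["ctl", "ctl", "trt"]})
merged = main.merge(info, on="sample", how="left")

print(filtered)
print(means)
print(wide)
print(merged)
```

Each result is a new DataFrame and the original table is left unchanged, which mirrors the way results are placed in the sub-table rather than overwriting the main table.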

Figure 3: A figure generated directly from the application using the Titanic data. This uses the grid layout mode to
combine multiple sub plots together.
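A grid layout of this kind, with row and column spans and an inset, can be sketched directly in matplotlib. The example below uses subplot2grid and inset_axes as one plausible way to build such a figure. It is not taken from the application's source, and the plotted data are invented.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
fig = plt.figure(figsize=(8, 6))

# A 2 x 3 grid: the first plot spans two columns of the top row.
ax_wide = plt.subplot2grid((2, 3), (0, 0), colspan=2, fig=fig)
ax_wide.plot(x, np.sin(x))

# A tall plot spanning both rows of the last column.
ax_tall = plt.subplot2grid((2, 3), (0, 2), rowspan=2, fig=fig)
ax_tall.hist(np.cos(x), bins=10)

# Two single-cell plots on the bottom row.
ax_a = plt.subplot2grid((2, 3), (1, 0), fig=fig)
ax_a.scatter(x, x ** 2, s=5)
ax_b = plt.subplot2grid((2, 3), (1, 1), fig=fig)
ax_b.bar(["a", "b", "c"], [3, 1, 2])

# An inset plot overlaid on the wide plot (position in axes-fraction units).
inset = ax_wide.inset_axes([0.65, 0.6, 0.3, 0.35])
inset.plot(x[:10], np.sin(x[:10]))
```

Changing the grid shape or the rowspan and colspan values gives the different sub plot combinations described in the text. Calling fig.savefig with a file name would write the composed figure to disk.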

Categorical plots
Data can be grouped and plotted either by grouping by categorical columns in the plot dialog or by performing a groupby-aggregate step and plotting the resulting table. The factor plots plugin provides even more advanced plotting capabilities. Factor plots allow multiple comparisons to be made in a single graph. That is, you can split data by more than one variable along an axis or between plots. In seaborn these dimensions are called row/col (the plot dimensions), x,y (axes) and hue (grouping/color within plots). These concepts are illustrated in the seaborn documentation on factor plotting [21] and on the blog.

3D plots
Interactive 3D plots that can be zoomed and rotated are provided using the matplotlib mplot3d module. Scatter, bar, contour, wireframe and surface plots are available. Plotting of column selections does not have a single unambiguous representation in 3D space for the latter three kinds of plots. So pre-defined modes must be used that tell the program how to interpret the selected data. Currently the default is to interpret the third column z as a function of the first two, x and y, i.e. of the form (x,y) → z. Other modes, such as support for parameterized functions, are still to be added.

Data fitting
The statsmodels [22] library is used for data fitting in DataExplore since it works well with Pandas and has a simple programming interface. It provides descriptive statistics, statistical tests, plotting functions, and implementation of standard estimators used in model fitting. String formulas are supported using Patsy, which allows users to type in their formulas using its special syntax.

Plugins
Plugins are for adding custom functionality that is not present in the main application. These are Python scripts implemented by sub-classing the Plugin class in the plugin module. At minimum a plugin must have a main() method which is called by the application to launch it. Otherwise a script can generally contain any code the author wishes. Usually the idea will be to implement a dialog that the user interacts with but this could also be a single function that runs on the current table or all sheets at once without further user interaction. Three plugins are currently provided with the library:

• Batch file renaming utility: a tool for renaming multiple files at once that can be useful for importing files.
• Factor plotting: advanced categorical plotting using the seaborn library [23]. Seaborn provides a high-level interface to matplotlib for drawing attractive graphics. It also understands DataFrames without any need for converting them.
• IPython console: here you can call any Python commands and even shell commands provided in IPython [24]. The underlying table DataFrame can be manipulated directly with code snippets or external scripts and the table is updated to reflect the changes immediately. This should prove a useful way to teach coding skills in a familiar environment.

Documentation and usage
Documentation is provided in the form of a wiki on GitHub at https://github.com/dmnfarrell/pandastable/wiki/. Specific case studies/tutorials along with links to screencasts and details of new features can be viewed on the blog at http://dmnfarrell.github.io/. The case studies provide a visual guide through real world examples. This blog will be kept up to date as the program is further developed.

Current case studies

1. Looking at the Titanic dataset (basic). Exploratory methods for beginners using the Titanic data from a Kaggle Getting Started Competition [25]. This illustrates the use of the software in initial explorations of data for beginners using plots of distributions, breaking down columns by category and re-binning categorical data.
2. Plotting miRNA abundance data (advanced). This uses miRNA-sequencing expression data [26] to show more complex methods of representing data sets with multiple sample labels. It includes a demonstration of how to create long form data for use with the factor plotting plugin.

Future developments
The software at time of writing is at version 0.7. Regular releases will be made as bugs are fixed or features added. Many useful developments can be applied to this software to exploit the wealth of functionality in Python scientific libraries. The plugin system allows such new features to be added very easily by third parties with some knowledge of Python. It is important to underline that development should be led by user feedback rather than by the author. Some particularly useful potential features are mentioned here:

• Workflow tracking mechanism. Obviously in a point and click environment people will not remember every step they take in an analysis. To aid reproducibility some method of recording processing steps would be useful.
• Integration with Jupyter notebooks. The Jupyter notebook [27] is a web application that allows you to create and share documents that contain live code, visualizations and explanatory text. It has potential to be used for the kind of workflow tracking mentioned above.
• Plugins for more specific kinds of data analysis such as principal component analysis.
• Batch conversion and/or joining of multiple text/csv files, likely using a plugin.
• Improvements to scale with larger data sets.
• Add support for loading remote data sources and sharing results.
• Enhance plot support with the ability to add annotations and allow arbitrary placement of sub plots amongst other features.
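As an illustration of the plugin mechanism described under Plugins, and of the principal component analysis plugin suggested above, here is a minimal sketch. The Plugin base class below is a stand-in written for this example and is not the application's actual class. The PCAPlugin name, the main(table) signature and the sample data are likewise assumptions made for illustration.

```python
import numpy as np
import pandas as pd


class Plugin:
    """Stand-in for the application's Plugin base class (hypothetical here)."""

    def main(self, table):
        raise NotImplementedError


class PCAPlugin(Plugin):
    """Hypothetical plugin: project the numeric columns onto principal components."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def main(self, table):
        # Entry point the application would call; here it receives a DataFrame.
        numeric = table.select_dtypes(include="number")
        centred = (numeric - numeric.mean()).to_numpy()
        # SVD of the centred data; rows of vt are the principal axes.
        u, s, vt = np.linalg.svd(centred, full_matrices=False)
        scores = centred @ vt[: self.n_components].T
        cols = ["PC%d" % (i + 1) for i in range(self.n_components)]
        return pd.DataFrame(scores, index=table.index, columns=cols)


# Usage with a small hypothetical table; the text column is ignored.
df = pd.DataFrame(
    {
        "label": ["a", "b", "c", "d"],
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.1, 3.9, 6.2, 7.8],
        "z": [0.5, 0.4, 0.6, 0.5],
    }
)
result = PCAPlugin(n_components=2).main(df)
print(result)
```

This corresponds to the simpler style of plugin mentioned in the text, a single function that runs on the current table without further user interaction. A real plugin would more likely open a dialog and write its result to the sub-table.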

Quality control
Git is used for version control and GitHub for bug tracking. GitHub’s issue tracking supports milestones, labels and assignees for filtering and categorizing bugs. It is therefore the ideal way to handle user feedback.

Testing
Testing is done using the standard Python unittest framework (PyUnit). Tests can be executed from the cloned source directory. Since automated testing of Tkinter widgets is known to be difficult, the tests currently concentrate on table functions that do not require user interaction. For other tests user feedback is essential. The project uses the Travis continuous integration service [28]. Travis CI automatically detects when a commit has been made and pushed to the GitHub repository. Each time this happens, it will try to build the project and run the tests. This model of development scales well for multiple developers.

Data integrity
Since there is minimal emphasis on data entry per se, the user is encouraged to keep raw data unchanged. Projects saved in the native format as multiple sets of worksheets are kept separate from the original data. Projects are saved in MessagePack format [29], which is an efficient binary serialization format and is used to save Python objects. In our case the DataFrame and meta data like the current table selections and plotting options (as Python dictionaries) for each sheet are saved together in one project so that the workspace can be reloaded conveniently. Individual DataFrames can also be saved alone without any extra meta data so that they can be persisted and reloaded outside the context of the application. Though we have found the format efficient and reliable, MessagePack support is still experimental in Pandas. The project files are not designed to be used as an archive for raw data sets but are rather seen as a workspace that can be updated constantly and interchanged between users wishing to share analysis and workflows. A methodology for workflow tracking is planned that will involve refinement of this file format.

(2) Availability
Operating system
This software is supported on any operating system that supports a standard Python installation, which includes Linux, Windows and OS X. In all systems the DataExplore application can be provided by installing the pandastable Python library via pip or easy_install. This requires a working Python installation. It is also available as a package via the self contained Anaconda Python distribution. In Windows a binary installer is available, packaged with cx_Freeze [30], that installs an executable and all the required libraries without the need for a separate Python install. This installer is a 32-bit executable which will run on all Windows systems. Detailed install instructions are given in the documentation.

Programming language
Python version >= 3.4 or 2.7.

Additional system requirements
No special requirements.

Dependencies
The following Python libraries are required dependencies:
numpy >= 1.5
matplotlib >= 1.1
pandas >= 0.17
numexpr >= 2.4
xlrd >= 0.9

Optional dependencies
seaborn >= 0.6
statsmodels >= 0.6 (requires scipy)
ipython >= 4.0

Software location
Archive
Name: Zenodo
Persistent identifier: 10.5281/zenodo.44891
Licence: GPL v3
Publisher: Damien Farrell
Date published: 17/1/16
Code repository
Name: GitHub
Identifier: https://github.com/dmnfarrell/pandastable
Licence: GPL v3
Date published: 11/2/14

Language
English.

(3) Reuse potential
The software is designed for a general science and technical audience and is not tied to any specific field. Though it was initially developed with the biological sciences in mind, it has very general application. It will be most useful for students and researchers who are not programmers but need to do convenient exploration of their data. Educators at all levels are also a target for the software. With the easy to learn user interface it is a good way to introduce basic data manipulation methods. Specific uses include producing plots for reports or publication, quick visualization of small to medium sized datasets, database style filtering with string queries or fitting linear models. It is hoped that some of this functionality will help students become familiar with the more advanced analytical methods available via programming languages. Data scientists may also find the tool useful for quick plots of their data before or after detailed analyses and as a way of sharing results with others.

The use of proprietary software without access to the code base is still very common in science. This makes it hard to do reproducible work that can be shared. This software is based on well established open source Python libraries that do not have such issues. These high-quality tools for scientific computing provide the ideal platform to build a user friendly open source application for data analysis. In the long-term, it is hoped that a community of users can be built, some of whom will be developers able to provide their own plugins or extend the core application. The project can also be forked without restriction.

Competing Interests
The authors declare that they have no competing interests.

Acknowledgements
Thanks to Prof. Stephen Gordon for supporting work on this project.

References
1. Weathington, J 2015 5 things every data scientist should know about Excel. Available at http://www.techrepublic.com/article/5-things-every-data-scientist-should-know-about-excel/ [Accessed: 16-Jan-2016].
2. Burns, P 2014 Spreadsheet Addiction. Available at http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction/.
3. Moffitt, C 2014 Common Excel Tasks Demonstrated in Pandas. Available at http://pbpython.com/excel-pandas-comp.html.
4. McCullough, B D and Heiser, D A 2008 On the accuracy of statistical procedures in Microsoft Excel 2007. Comput Stat Data Anal, 52(10): 4570–4578. DOI: http://dx.doi.org/10.1016/j.csda.2008.03.004
5. GraphPad Software 2015 GraphPad Prism version 6.0. Available at http://www.graphpad.com/scientific-software/prism/.
6. IBM 2015 IBM SPSS Statistics. Available at http://www-01.ibm.com/software/analytics/spss/.
7. Sanders, J 2015 Veusz. Available at http://home.gna.org/veusz/.
8. Benkert, T, Franke, K and Standish, R 2007 SciDAVis. Available at http://scidavis.sourceforge.net/.
9. Buffalo, V 2015 Bioinformatics Data Skills. O’Reilly Media.
10. Heller, M 2015 Learn to crunch big data with R. Available at http://www.infoworld.com/article/2880360/big-data/learn-to-crunch-big-data-with-r.html [Accessed: 07-Jan-2016].
11. RStudio Inc. 2015 RStudio: Integrated Development for R. Boston MA.
12. Oliphant, T E 2007 (May) Python for Scientific Computing. Comput Sci Eng, 9(3): 10–20. DOI: http://dx.doi.org/10.1109/MCSE.2007.58
13. Ward, N 2013 Excel, SPSS, Minitab or R? Available at https://learnandteachstatistics.wordpress.com/2013/02/11/excel-spss-minitab-or-r/ [Accessed: 09-Sep-2015].
14. Rougier, N P, Droettboom, M and Bourne, P E 2014 Ten Simple Rules for Better Figures. PLoS Comput Biol, 10(9): e1003833. DOI: http://dx.doi.org/10.1371/journal.pcbi.1003833
15. Data science retreat 2013 R: the good parts. Available at http://blog.datascienceretreat.com/post/69789735503/r-the-good-parts [Accessed: 08-Jan-2016].
16. McKinney, W 2015 Pandas, Python Data Analysis Library. Available at http://pandas.pydata.org/.
17. van der Walt, S, Colbert, S C and Varoquaux, G 2011 (March) The NumPy Array: A Structure for Efficient Numerical Computation. Comput Sci Eng, 13(2): 22–30. DOI: http://dx.doi.org/10.1109/MCSE.2011.37
18. The PyData Community 2015 PyData. Available at http://pydata.org/downloads/.
19. Hunter, J D 2007 (May) Matplotlib: A 2D Graphics Environment. Comput Sci Eng, 9(3): 90–95. DOI: http://dx.doi.org/10.1109/MCSE.2007.55
20. Tanahashi, M 2014 mjograph. Available at http://www.ochiailab.dnj.ynu.ac.jp/mjograph/.
21. Waskom, M 2015 Seaborn factorplot documentation. Available at http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.factorplot.html [Accessed: 12-Jan-2016].
22. Statsmodels Developers 2015 Statsmodels. Available at http://statsmodels.sourceforge.net/.
23. Waskom, M 2012 Seaborn. Available at http://stanford.edu/~mwaskom/software/seaborn/. DOI: http://dx.doi.org/10.1109/MCSE.2007.53
24. Pérez, F and Granger, B E 2007 (May) IPython: a System for Interactive Scientific Computing. Comput Sci Eng, 9(3): 21–29.
25. Kaggle 2012 Titanic: Machine Learning from Disaster. Available at https://www.kaggle.com/c/titanic [Accessed: 16-Jan-2016].
26. Farrell, D, Shaughnessy, R G, Britton, L, MacHugh, D E, Markey, B and Gordon, S V 2015 The Identification of Circulating MiRNA in Bovine Serum and Their Potential as Novel Biomarkers of Early Mycobacterium avium subsp paratuberculosis Infection. PLoS One, 10(7): e0134310. DOI: http://dx.doi.org/10.1371/journal.pone.0134310
27. Project Jupyter 2015 Jupyter Notebook. Available at http://jupyter.org/ [Accessed: 21-Jan-2016].
28. Travis CI Community 2011 Travis continuous integration. Available at https://travis-ci.org/.
29. Furuhashi, S 2008 MessagePack. Available at http://msgpack.org/index.html.
30. Tuininga, A 2014 cx_Freeze. Available at http://cx-freeze.sourceforge.net/.

How to cite this article: Farrell, D 2016 DataExplore: An Application for General Data Analysis in Research and Education.
Journal of Open Research Software, 4: e9, DOI: http://dx.doi.org/10.5334/jors.94

Submitted: 15 September 2015 Accepted: 08 March 2016 Published: 22 March 2016

Copyright: © 2016 The Author(s). This is an open-access article distributed under the terms of the Creative Commons
Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium,
provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.

Journal of Open Research Software is a peer-reviewed open access journal published by Ubiquity Press.



Submitted: 15 September 2015 Accepted: 08 March 2016 Published: 22 March 2016
