Programming-data

Concepts

· Data organization

When the amount of data collected in experiments is too large to organize manually, computer programming becomes a necessary tool. Introducing the concept of a database makes selecting, updating, and analyzing large datasets much faster. This frees the researcher from the tedious job of sorting data and makes new analyses possible.
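As a minimal sketch of this idea, the snippet below uses Python's built-in sqlite3 module to select, update, and summarize a handful of measurements. The table name `measurements`, its columns, and all values are hypothetical and chosen only for illustration; they are not taken from any real experiment or from the software described on this page.

```python
import sqlite3

# Hypothetical measurements: (sample name, temperature in K, peak position).
ROWS = [
    ("A", 10.0, 520.5),
    ("A", 300.0, 519.0),
    ("B", 10.0, 611.0),
    ("B", 300.0, 608.5),
]


def build_database(rows):
    """Create an in-memory database holding one table of measurements."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements (sample TEXT, temperature REAL, peak REAL)"
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    return conn


conn = build_database(ROWS)

# Selecting: all low-temperature measurements, ordered by sample name.
cold = conn.execute(
    "SELECT sample, peak FROM measurements "
    "WHERE temperature < 50 ORDER BY sample"
).fetchall()

# Updating: apply a (made-up) calibration offset to every peak of sample B.
conn.execute("UPDATE measurements SET peak = peak + 0.5 WHERE sample = 'B'")

# Analyzing: average peak position per sample, after the update.
averages = dict(
    conn.execute("SELECT sample, AVG(peak) FROM measurements GROUP BY sample")
)

print(cold)
print(averages)
```

Each of the three operations is a single statement, and this stays true whether the table holds four rows or four million.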

· Data analysis

Data analysis is at the center of experimental research because it is the step that extracts scientific information from the raw data. Good computer programming helps data analysis in several ways: 1) it saves time; 2) it makes it possible to analyze the data in many different ways; 3) it makes checking for artifacts easier.
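To illustrate the third point, here is a small sketch of an automated artifact check that flags isolated spikes in a 1D data series. The function name `find_glitches`, the threshold, and the sample values are assumptions made for this example; the heuristic is a simple one and is not a method taken from the software described on this page.

```python
from statistics import median


def find_glitches(values, threshold=5.0):
    """Return indices of points that deviate strongly from their neighbors.

    A point is flagged when its distance from the median of
    (left neighbor, itself, right neighbor) exceeds `threshold` times the
    median absolute point-to-point step of the whole series.
    This is an illustrative heuristic only.
    """
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    scale = median(steps) if steps else 0.0
    flagged = []
    for i in range(1, len(values) - 1):
        local = median(values[i - 1 : i + 2])
        if abs(values[i] - local) > threshold * scale:
            flagged.append(i)
    return flagged


# Hypothetical spectrum with one obvious spike at index 3.
spectrum = [1.0, 1.1, 1.2, 9.0, 1.4, 1.5, 1.6]
glitches = find_glitches(spectrum)
print(glitches)
```

Once such a check exists as a function, it can be run over every spectrum in a dataset instead of inspecting each one by eye.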

· Data visualization

Visualizing data in different ways may provide hints of hidden information. This is why there exist many presentations of data: 1D lines, 2D images, 3D fields, etc. A good data visualization can highlight trends in the data and provide both local and global views.

Programs

Software

The following software is written in Python. An installation of the full version of Enthought Canopy is recommended to run it (except for the Windows .exe version of Xpy).

· Xpy

Xpy is software for the data analysis of optical spectroscopy. It employs object-oriented programming and incorporates the concepts of data analysis, organization, and visualization.

  • Data organization: It adopts the concept of a database, but does not adopt any mature database model such as a relational database or XML. The database is basically a hierarchical Python object that implements a few primitive database operations. The advantage is that the database can be saved to disk and retrieved later.
  • Data analysis: It deals with linear spectra and has extensive built-in data treatment tools, e.g. merging spectra, fixing jumps, fixing glitches, etc.
  • Data visualization: It features quick plotting of data and does not aim for publication-quality decoration.
  • Although this software has many imperfections in terms of software engineering, it is a fairly complete package. The source code, compiled .exe files for Windows users, and a manual are available here.
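The data organization described above can be sketched roughly as follows: a tree of Python objects with a couple of primitive operations (insert, look up by path, select by condition) that can be written to disk and read back. The class `Node`, its methods, and the use of JSON as the file format are all assumptions made for this sketch; the way Xpy actually stores its data is not documented here.

```python
import json
import tempfile
from pathlib import Path


class Node:
    """A named node holding data and child nodes (hypothetical, not Xpy's class)."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = data
        self.children = {}

    def add(self, child):
        """Insert a child node and return it so calls can be chained."""
        self.children[child.name] = child
        return child

    def get(self, path):
        """Look up a descendant by a slash-separated path such as 'a/b'."""
        node = self
        for part in path.split("/"):
            node = node.children[part]
        return node

    def select(self, predicate):
        """Return all descendants for which predicate(node) is true."""
        found = []
        for child in self.children.values():
            if predicate(child):
                found.append(child)
            found.extend(child.select(predicate))
        return found

    def to_dict(self):
        """Convert the subtree into plain dicts and lists for saving."""
        return {
            "name": self.name,
            "data": self.data,
            "children": [c.to_dict() for c in self.children.values()],
        }

    @classmethod
    def from_dict(cls, record):
        """Rebuild a subtree from the structure produced by to_dict()."""
        node = cls(record["name"], record["data"])
        for child in record["children"]:
            node.add(cls.from_dict(child))
        return node


# Build a small hierarchy with made-up names and values.
root = Node("experiment")
sample = root.add(Node("sample_A"))
sample.add(Node("spectrum_10K", data=[0.1, 0.4, 0.2]))
sample.add(Node("spectrum_300K", data=[0.2, 0.3, 0.2]))

# Save the whole hierarchy to disk and retrieve it later.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "database.json"
    path.write_text(json.dumps(root.to_dict()))
    restored = Node.from_dict(json.loads(path.read_text()))

names = [n.name for n in restored.select(lambda n: n.data is not None)]
print(names)
```

A structure like this is easy to write and to extend, but every query has to be coded by hand, which is the trade-off against a mature database model.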

· Pydao

Pydao is software for more general-purpose data analysis, organization, and visualization.

  • Data organization: It implements a relational database to organize data, which makes handling complex data much easier. The data can be saved in the HDF file format, which preserves the hierarchy.
  • Data analysis: It is more diverse and supports linear spectra as well as 2D images. In particular, it supports data from many user facilities, e.g. APS and CLS.
  • Data visualization: It has not yet implemented a unified graphical user interface. Visualization mostly remains at the level of plotting with matplotlib.
  • Numerical simulation: This is a new part that includes lattice dynamics calculation and electronic interactions in hydrogen-like orbitals.
  • Plugins: Pydao is designed more as a library than as a standalone program, so you can write plugin software on top of it. A good example is the Lattice Dynamics Calculation plugin. It is a complete program by itself and contains good data organization and visualization (using Mayavi).
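To show why a relational model helps with complex data, the sketch below links two tables, one for samples and one for spectra, and answers a question that spans both. It uses Python's built-in sqlite3 module purely as a stand-in. The schema, the table and column names, and the sample names are hypothetical, and nothing here reflects how Pydao's own database is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two related tables: each spectrum row points to the sample it was taken on.
conn.executescript(
    """
    CREATE TABLE samples (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE spectra (
        id INTEGER PRIMARY KEY,
        sample_id INTEGER NOT NULL REFERENCES samples(id),
        temperature REAL NOT NULL,
        facility TEXT NOT NULL
    );
    """
)

# Made-up records for illustration only.
conn.executemany(
    "INSERT INTO samples (id, name) VALUES (?, ?)",
    [(1, "film_1"), (2, "film_2")],
)
conn.executemany(
    "INSERT INTO spectra (sample_id, temperature, facility) VALUES (?, ?, ?)",
    [(1, 10.0, "APS"), (1, 300.0, "CLS"), (2, 10.0, "APS")],
)

# Which samples were measured at a given facility, and at what temperature?
query = """
    SELECT samples.name, spectra.temperature
    FROM spectra JOIN samples ON spectra.sample_id = samples.id
    WHERE spectra.facility = ?
    ORDER BY samples.name
"""
aps_rows = conn.execute(query, ("APS",)).fetchall()
print(aps_rows)
```

Because the relation between samples and spectra is stored explicitly, new questions only need a new query rather than new bookkeeping code.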

In general, Pydao is still in its developmental stage. Since it is designed as a flexible package, every time we cope with a new problem, we try to generalize the problem and add a few objects to Pydao. So the chances are that it will remain in the developmental stage for a long time. That said, one can use the core of Pydao as a library, i.e. the object hierarchy, the database, and the image and math tools. In addition, the Lattice Dynamics Calculation plugin is complete. The current version of Pydao can be found here. No manual or .exe file is available yet.

An Introduction to Python