Python (and Fortran and C) as used in large-scale number crunching and scientific programming
By Catherine Moroney

The combination of Python (and a bit of Fortran and C) is incredibly powerful and can be used to great effect in scientific programming and massive-scale number-crunching. The key is using the best parts of each language and getting the mixture just right. I have seen our legacy code shrink by a factor of 10 in size and still be stable and fast enough for production work.

Saturday 3 p.m.–3:45 p.m.

Slides available here.

Python is an incredibly powerful and versatile language, but because it is interpreted rather than compiled, it is impractical for large numbers of deeply nested loops, and many of the big scientific libraries only have C interfaces. If you bring Fortran (via f2py) and C (via Cython) into the mix, however, you can use the best features of all three languages and shrink your development and maintenance time, and the resulting code size, by almost an order of magnitude.

My satellite project has a lot of legacy code in a Perl/C/Fortran mixture that is incredibly difficult to read and maintain. Our new software is a Python/C/Fortran mixture, with Python making up the bulk of the code. It is a fraction of the size of the old code and much easier to develop and maintain, yet it has the speed and stability to be used in our production system.

I use Python for the upper and middle layers, Fortran for the few places where I need to do lots of looping that cannot be transformed into whole-array operations, and C to interface with the various third-party libraries. I can wrap Fortran and C using f2py and Cython respectively, call them directly from Python, and pass NumPy arrays back and forth seamlessly.
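As a rough illustration of the distinction drawn above, the sketch below contrasts a nested loop with its whole-array equivalent, and then shows a recurrence that does not collapse into a single array expression. The function names, the gain/offset calculation, and the smoothing filter are hypothetical examples, not taken from the production code; the last function is only meant to suggest the kind of loop that might be moved to Fortran and wrapped with f2py.

```python
import numpy as np


def scale_loop(radiance, gain, offset):
    """Nested-loop version: correct, but slow in pure Python on large grids."""
    rows, cols = radiance.shape
    out = np.empty_like(radiance)
    for i in range(rows):
        for j in range(cols):
            out[i, j] = gain * radiance[i, j] + offset
    return out


def scale_vectorized(radiance, gain, offset):
    """Whole-array version: the looping happens inside NumPy's compiled code."""
    return gain * radiance + offset


def running_filter(signal, alpha):
    """Each output depends on the previous output, so this recurrence does not
    reduce to one simple whole-array expression. A loop like this is a
    candidate for a compiled routine (hypothetical example)."""
    out = np.empty_like(signal)
    out[0] = signal[0]
    for k in range(1, signal.size):
        out[k] = alpha * signal[k] + (1.0 - alpha) * out[k - 1]
    return out


# Made-up input values, used only to show that both versions agree.
radiance = np.arange(12, dtype=np.float64).reshape(3, 4)
assert np.allclose(scale_loop(radiance, 2.0, 1.0),
                   scale_vectorized(radiance, 2.0, 1.0))
```

The two scaling functions return the same array; the difference is only where the loop executes.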

I also have layers of scripts: one Python script runs a single orbit's worth of data through a single stage of the processing pipeline, a second runs multiple orbits through a single stage, and a third runs the entire processing chain over many orbits, taking advantage of Python's strengths as a scripting language.
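The layering described above might be sketched as three small functions, each built on the one below it. This is a minimal sketch under stated assumptions: the function names, orbit numbers, and stage names are all hypothetical, and the innermost function is a placeholder that returns a log line instead of doing any real processing.

```python
def run_stage(orbit, stage):
    """Layer 1: one orbit through one stage (placeholder for the real work)."""
    return f"orbit {orbit}: {stage} done"


def run_stage_for_orbits(orbits, stage):
    """Layer 2: many orbits through a single stage."""
    return [run_stage(orbit, stage) for orbit in orbits]


def run_pipeline(orbits, stages):
    """Layer 3: the whole processing chain, stage by stage, over many orbits."""
    log = []
    for stage in stages:
        log.extend(run_stage_for_orbits(orbits, stage))
    return log


if __name__ == "__main__":
    # Hypothetical orbit numbers and stage names, for illustration only.
    for line in run_pipeline([1001, 1002], ["calibrate", "retrieve"]):
        print(line)
```

Keeping each layer as a thin wrapper around the previous one means any layer can be run on its own, for example to reprocess one orbit through one stage.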

By using all three languages as described above, I truly get the best of all worlds: dramatically reduced development time, a code base that is a fraction of the size of the old legacy system, yet all the speed and stability I need to process huge amounts of data.

Catherine Moroney

I am a scientific software engineer at the Jet Propulsion Lab. For over 20 years I have studied clouds and their impact on the climate system by analyzing satellite data from the MISR instrument on the EOS Terra platform.

I have a BSc (McGill University, Montreal) and MSc (University of Toronto) in Physics and a second MSc in Computer Science (University of Southern California).

I do the research to develop the algorithms that take the raw satellite data and generate the actual scientific products, and then I write the production software.
