Wednesday, December 28, 2011

Personal view on PyPy's future

Hello.


PyPy has seen some dramatic progress in the past year, with each release
delivering roughly 30% speed improvements over the previous one. With a
roughly 3-month release cycle, I think this is a pretty good achievement, and
you haven't seen all of it yet :) We have quite a few improvements in progress,
with the current trunk already faster than 1.7, even though only a month
has passed.


It might be worth mentioning that PyPy has many facets and progress is made
by multiple people in various directions. Examples include:



  • STM work, done by Armin (read the details)

  • NumPy speed improvements, done mostly by Alex Gaynor and myself

  • NumPy completeness, done by a lot of people, including myself, Alex, Matti
    Picus and others. If you're interested in this progressing faster, feel
    free to donate towards NumPy on PyPy

  • Specialized object instances and various other specialized types, worked
    on by Carl Friedrich Bolz and Lukas Diekmann

  • ARM JIT backend, work by David Schneider and Armin

  • PPC JIT backend, work by Sven Hager and David Edelsohn, with help from
    David Schneider

  • Speed improvements, done by everyone


Overall this is about 30 commits daily -- quite a bit of activity.



Things I can do professionally that can help the community


There is however another aspect of PyPy development: things that are
interesting and would work, but are potentially useful only to some and are
outside the current focus of the core developers. Personally, I'm working full
time on PyPy, even though I have not received any compensation for it for the
last half a year. I hope to make working on PyPy my full-time job, funded both
by donation-based income like the numpy donations and by pypy consulting.
I promise not to spend all the numpy-related donations on my surfing addiction :-)


Examples of what I can professionally do include, but are not limited to:



  • A faster json decoder. A fast json encoder has already been done, and details are available.

  • Finishing the matplotlib hacks to support full matplotlib and scipy.

  • Make your favorite module X work on PyPy.

  • Make your favorite module X faster on PyPy.


Additionally, I recently started offering consulting services for PyPy;
please consult my website for details.


Cheers,
Maciej FijaƂkowski


Monday, December 12, 2011

Talent shortage? What talent shortage?

I've recently been reading a lot of articles about things like the talent
shortage in Austin, or even more ridiculous ideas, like an offshore office
platform off the coast of California to get around American immigration law.


This all sounds ridiculous to me, since I don't think there is any shortage of
talent. I know plenty of software developers who are more than capable and
would be willing to work for you, provided a few simple things on your side.
The crucial part is understanding that productivity differs vastly between
people, and sometimes people are productive in ways you don't expect. They
also usually have experience in how to make themselves productive.


Let me explain a bit where I come from. I'm one of the core developers of the
PyPy project and I consider myself relatively competent at least in several
fields of software engineering. My work on PyPy is not only technical -- there
is a lot of helping people, reviewing patches, and communicating with
the team, which is entirely distributed across the world and across timezones.
The main project contributors are or have been living in Germany, the United
States, Poland, France, Sweden, South Africa, Italy and probably other places
I forgot. It does not really matter. Personally, I claim PyPy has been a very
successful project -- it works reasonably well and it's a good way to speed up
most Python programs. Consult the speed website for benchmark details. And on
top of that, very bright people constantly come and contribute in non-trivial
ways, even though PyPy has next to no money.


My experience with PyPy makes me quite competent in all things Python,
open source project management and the nitty-gritty details of performance
across the entire software stack, but it has also made me next to
unemployable for most companies. Let me explain a few simple things you could
do at your company
that would make it a dream place to work for me and people like me. This list
is pretty personal, but I suppose the desire for this kind of flexibility is
pretty universal among a lot of smart geeks out there.


Cool technologies. PyPy is a really cool technology and I want to stay
on the bleeding edge with such stuff. That usually requires at least two
working days a week for open source work. Your company probably
uses tons of open source, so why not contribute back? Your geeks would very
likely boost their productivity that way anyway, since most of the time
they will be working on tools they use themselves.


Tooling. In PyPy we spend about 20-40% of our time on tooling, by
my very unscientific measurements. That covers everything from buildbot
maintenance and testing tools to things like jitviewer. Overall it pays off,
but it does not show up in near-term productivity reports. Also, geeks love tools.


Telecommuting. Moving to a new place is hard for many people, especially
when you have to deal with all the immigration b*shit. Personally, I'm also
an outdoor person, so living in Cape Town is pretty ideal; I would not trade
it for downtown New York, for example. This is a deal breaker for many people,
but even if you don't know how to make people who telecommute 100% of the time
productive, chances are they know, and they already have years of experience
doing it. It also cuts a lot of costs, and no one is thrilled about spending
an extra two hours a day driving somewhere.


That list of three items was pretty much a dealbreaker for most of the
companies I chatted with over the past few years. There are companies that
hire remotely but don't allow you to work on open source, or the other way
around. Personally, if someone satisfies all three, I would even be willing
to take a serious pay cut, just so I can live the way I like. Living in
a cheaper place also simplifies a lot in this regard. I also don't fully
understand why not. If your company started working with geeks this way
today, it would be able to attract top talent, even if your end product
is not very exciting. It's also a self-fulfilling prophecy -- chances are,
even if your product is not exciting, the work will be, because you have
the best people on board. I would welcome some comments from
actual employers on why they don't want to do all of that. If someone is
willing to do it, I'm also available for a chat; write to me at fijall at gmail
(yes, double l, gmail has a six-character minimum) or find me on twitter.


Cheers,
fijal

Thursday, November 17, 2011

Analysing python's performance under PyPy

The traditional model of analysing performance of Python programs has been
"run the profiler, find your bottlenecks, optimize them or move them to C".
I personally find this approach grossly insufficient in many cases, especially
in the context of PyPy. Particular problems are:



  • In many large applications, the profile is flat: PyPy's own translation
    toolchain, Twisted or any modern web servers are good examples.

  • Once you have found a bottleneck, it's not entirely clear what's slow inside
    that particular function or set of functions. The common knowledge about
    what's slow and what's fast is a moving target even in the
    case of CPython. In the presence of a JIT, the situation is even more complex.
    A look at how the JIT compiled a particular piece of code becomes crucial.

  • Performance problems, especially GC-related ones, might not show up in the
    profile at all; they might be spread evenly across many functions.


PyPy comes with several tools at different levels of maturity that should help
you identify problems. I'll outline in a few simple steps how I approach
performance analysis of programs. Remember, these are just guidelines and
there is no silver bullet. If your application is complex enough, you might need
lots and lots of lead bullets :-)


This post, already pretty lengthy, comes with no examples. I'll try to provide
one with examples in the near future.



Create tests


This might come as a surprising point in this blog post, since it's not about
quality, but tests improve your ability to experiment with your code. If
you have lots of automated tests, chances are you'll be able to refactor your
code in a more performance-oriented manner without actually breaking it.




Write some benchmarks


This is an absolutely crucial starting point. You have to be able to measure
the impact of your changes by running a single script, preferably with few
arguments. ab is not good enough.


If your application is not a once-off script, you should be able to measure
how the JIT warmup time affects your performance by running the same test
repeatedly. It also helps to visualize how it varies between consecutive runs.


My usual benchmarks, unless there are reasons to do otherwise, run between
0.2s and 5s per step. This helps minimize the impact of random variance. The
JIT warmup time varies vastly depending on your code base; it can be anything
from unnoticeable to a minute. Measure before making judgements.
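
As an illustration, here is a minimal sketch of such a benchmark driver;
run_workload is a placeholder for whatever piece of your application you want
to measure, and running the same step repeatedly makes the JIT warmup visible
as the difference between the first and later iterations:


import time

def run_workload():
    # placeholder: call the part of your application you want to measure,
    # sized so that a single call takes roughly 0.2s-5s
    total = 0
    for i in range(10 ** 6):
        total += i % 7
    return total

if __name__ == '__main__':
    for step in range(10):
        start = time.time()
        run_workload()
        # printing every step separately makes the warmup curve visible
        print("step %d: %.3fs" % (step, time.time() - start))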




Glance through cProfile results


I personally use a modified tool called lsprofcalltree.py that runs the Python
profiler (cProfile) and produces
output compatible with an awesome tool called kcachegrind. This might
or might not provide any useful info. If there are functions that stick out
in the profile, glance through them. Are they clearly inefficient? Do they
use inefficient algorithms? Don't bother micro-optimizing them yet.
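
If you don't have lsprofcalltree.py handy, plain cProfile plus pstats gives a
rougher but still useful view. A minimal sketch (test.py stands in for your
own script):


# profile the whole script and dump the raw stats:
#     pypy -m cProfile -o profile.out test.py
# then inspect the hot spots:

import pstats

stats = pstats.Stats('profile.out')
# sort by cumulative time and show the 20 most expensive functions
stats.sort_stats('cumulative').print_stats(20)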




Check the GC/JIT/other ratio


There is a very useful tool in the pypy codebase to do that. Assuming you are
running your program under a pypy virtualenv, simply run:



PYPYLOG=log ./test.py

and then, from a pypy checkout:



pypy/tool/logparser.py print-summary log -

You can even look at a pretty graph by doing:



pypy/tool/logparser.py draw-time log out.png

This should give you a rough overview of how much time is spent doing what.
The times included are GC, JIT tracing (that's the warmup phase) and
"other", which includes running JITted code.




Use jitviewer


JitViewer might be very confusing, but it gives you a rough overview of
what is going on in your code. See its README for details about how it
works; in general it lets you see how your Python code got compiled to the
JIT's intermediate representation (and to assembler). It's not
that interesting
to see precisely which part got compiled how, but it is important to look at
how many intermediate-representation instructions (resops) are created per
piece of Python. Also, some operations, like the various kinds of calls and
allocations (new_xxx), are more costly than others. Track those carefully.




Think about potential modifications


Try different ways to express the same thing in places that show up high
in the profile or in the jitviewer. Don't take anything for granted
-- only trust measurements. Most of the time there is an "ah, that's nonsense"
moment that leads to some improvement.




That's all


This is pretty much it -- as I said before, there are no hard rules. Know your
tooling and try various things. The faster you can iterate, the more options
you can try. Understanding details is usually crucial and can lead to
interesting improvements. There is a non-exhaustive and not-always-up-to-date
list of things the JIT likes, which are worth using or at least trying.


In the next episode I'll try to walk through an example of improvements based
on an actual project.


Cheers,
fijal


Thursday, October 27, 2011

PyPy and the road towards SciPy

Hello


PyPy's recent effort to bring in NumPy, and the associated fundraiser,
caused a lot of discussion in the SciPy community regarding PyPy, NumPy,
SciPy and the future of numeric computing in Python.


There were discussions on the topic as well as various blog posts
from the SciPy community that addressed a few issues. It seems there was a lot
of talking past each other, and I would like to clarify a few points here,
although this should be taken as my personal opinion on the subject.


So, let's start from the beginning. There are no plans for PyPy to
reimplement everything out there in RPython. That has been pointed
out from the beginning as a supposed fallacy of our approach -- we simply
don't plan to do that. We agree that Python is a great glue language and we
would like to keep it that way. PyPy can interface nicely with C using ctypes,
with a slightly worse story for C++ (even though there have been experiments).
What we know by now is that the CPython C API is not very good glue for PyPy:
it's too tied to CPython and it prevents a lot of interesting optimizations
from happening. There are a few contenders, with Cython being the favorite
for now; however, for Cython to be usable we need to have a story for C++
(I know Cython does have a story, but it's unclear how that would work with
the PyPy backend).


Which brings me to the second point: while a lot of code in packages like
SciPy or matplotlib should be reusable from PyPy, it's probably not reusable
in its current form. Either a lot of it has to move to Cython, or some other
way of interfacing with C will have to emerge. This should make it clear that
we want to interface with SciPy and reuse as much as possible.


Another recurring topic is why we don't just reuse Cython
for NumPy instead of reimplementing everything. The problem is that we need
a robust array type with its full interface before we can start using Cython
for anything. Since we're going to implement it anyway, why not go all the way
and implement the full NumPy module? That is exactly the topic of the current
funding proposal -- to provide a full NumPy module. That
would be a very good start for integrating the full stack of SciPy,
matplotlib and all the other libraries out there.


The trick is also that a robust array module can go a long way on its own.
It allows you to prototype a lot of algorithms and generally has
its uses, without having to worry that "if I read all the elements from the
array it's going to be dog slow".


The last accusation is that we're trying to split the community. The answer is
simply no. We have a relatively good roadmap for how to support what's out
there in the scientific community, and ideally to support everyone out there.
This will however take some time, and the group of people who can run their
stuff on top of PyPy will grow over time. This is precisely what
is happening in other areas of the Python world -- more and more stuff runs on
PyPy, and people find it more and more interesting to try it and to adapt
their own stuff to run on it.


To summarize, I don't really think there is that much of a gap between us
and the SciPy people. We'll start small (by providing a full NumPy
implementation) and then gradually move forward, reusing as much as possible
from the entire stack.


Cheers,
fijal

Monday, October 17, 2011

Wikipedia, tag clutter, pypy and the dangers of bureaucracy

So, the PyPy article on wikipedia first got tagged with primary sources, and
then, after a not-so-civil discussion from my side, with potentially not
notable. These tags are going to stay for the time being, until someone goes
ahead with the laborious work of trying to prove that PyPy is notable, or
tries to delete the article. As far as I'm concerned the discussion is largely
irrelevant -- PyPy is a fairly notable subject to me personally, and that is
unlikely to change because of the wikipedia article. I did make contributions
to this particular article in the past, mostly trying to keep it up to date,
bumping the release numbers, correcting links etc.


The reason the article got tagged is silly -- the grand general notability
guidelines are not cut out for open source projects. Indeed, there are no books
written about PyPy or anything, even though at most Python conferences everyone
knows what it is and people use it quite a bit. For all I know, PyPy is
not notable according to the guidelines written on wikipedia; I would put
it up for deletion myself if I were to follow the rules exactly.


But this is precisely the problem here -- applying rules, which I presume
are called guidelines for a reason, without thinking. For anyone living in
the open source world, it's relatively clear what constitutes "notability",
and it is something different than for most wikipedia articles. For some
information, like compiler optimizations, the best source I can find is a
post on the Lua mailing list by Mike Pall. You can't change that -- no
published book will change it. This is original research performed outside of
academia, yet it pushes the boundaries of human knowledge forward.


The solution doesn't seem to be to simply establish rules for Open Source in
general. In my opinion the problem is with people who do not understand, or
refuse to understand, and stubbornly try to adhere to written rules.


What do you think?


Cheers,
fijal

Saturday, October 8, 2011

PyPy's future directions

The PyPy project was long criticised for being insufficiently
transparent about the direction of its development. This changed
drastically with the introduction of the PyPy blog, Twitter stream,
etc., but I think there is still a gap between the achievements
reported in the blog and our ongoing plans.


This post is an attempt to bridge that gap. Note, however, that it is
not a roadmap -- merely a personal opinion about some interesting
directions currently being pursued in the PyPy project. It is not
intended to be exhaustive.



NumPy for PyPy


Even though people might not quite believe that we can deliver it,
there is an ongoing effort to bring NumPy to PyPy by reimplementing, in
RPython, the interface pieces originally written in C. A lot of work
has recently been done by Justin Peel and Alex Gaynor, and there have
been many smaller contributions from various volunteers.
This is very exciting, since PyPy shines at numerics, which means that
with the full power of NumPy we can provide a good alternative to
Matlab, etc. We also have a vague plan to leverage platform-level vector
instructions like SSE to provide an even faster NumPy. Stay tuned!




Concurrent GC


There is a branch where Armin is experimenting with a simple
concurrent GC. This will offload your GC work to another thread
transparently in the background. Besides improved performance, this
should also remove GC pauses, which is crucial for real-time
applications like games.




JSON improvements


There is ongoing work to make JSON encoding fast. We aim to beat the C
extension in CPython's standard library by using only pure Python.
Stay tuned, we'll get there. :-)
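
For reference, a minimal sketch of the kind of micro-benchmark that can track
this (the sample data below is made up; adjust its size so a run takes a
fraction of a second to a few seconds, and run it under both CPython and PyPy):


import json
import time

# purely illustrative nested sample data
data = [{"id": i, "name": "item-%d" % i, "tags": ["a", "b", "c"],
         "values": [float(j) for j in range(10)]} for i in range(1000)]

start = time.time()
for _ in range(100):
    json.dumps(data)
print("100 x json.dumps: %.3fs" % (time.time() - start))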




GIL removal


There is another branch and an advertised plan to remove the GIL using
software transactional memory. While implementing an STM inside a
dynamic language with lots of side effects is clearly a research
project, the prospects look promising. There is a risk that the
overhead per thread will end up fairly high, but we hope to avoid this
(the JIT may help here) -- and Armin Rigo is well known for
delivering the impossible.




Minor improvements left and right


Under the radar, PyPy is constantly improving itself. Current trunk is
faster than 1.6 and has fewer bugs. We're always looking at bug
reports and improving the speed of various common constructs, such
as str % tuple, str.join(list), itertools or the filter
function. Individually, these are minor changes, but together they
speed up applications quite significantly from release to release.


All of the above is ongoing work. Most of it will probably work out
one day, but there is no set deadline. It is, however, exciting to see so
many different opportunities arising within the PyPy project.


Cheers,
fijal


Wednesday, July 6, 2011

How fast is PyPy really?

Martijn Faassen used to ask "how fast is PyPy", so we added the --faassen
option to the compilation toolchain to enable all the optimizations we had.
Back then we didn't have a JIT, and PyPy's interpreter was quite a bit slower
than CPython's (it still is), but the situation has changed quite drastically
since the introduction of the JIT. We even had to remove the --faassen command
line option!


So, let's repeat Martijn's question: how fast is PyPy these days?
According to the speed website it's
3.9x faster than CPython, but there are benchmarks where it's 6-12x
faster, 3x faster, 20% slower and so on. In addition, people keep asking
when PyPy will be as fast as V8 (Google Chromium's
JS engine) or Tracemonkey (Firefox's JS engine).


To really answer this question, we have to consider the various categories of
benchmarks/applications we're running. I'll try to pinpoint the main groups,
as well as PyPy's status and approach for each.



Fibonacci


Computing Fibonacci numbers in a very inefficient way is the world's most
famous benchmark, and everybody bases their opinions on it. In Python
it goes like this:



def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

I won't comment much on this. The good news, however:
PyPy trunk is 9x faster on this
benchmark than PyPy 1.5, and we finally beat CPython!
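
If you want to reproduce such a comparison yourself, a minimal timing sketch
might look like this (run the same script under CPython and PyPy and compare
the numbers):


import time

def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

start = time.time()
fib(30)
print("fib(30): %.3fs" % (time.time() - start))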




Algorithmic benchmarks


This is the broadest category, but also the easiest one in which to compare
various languages, hence websites like the computer language shootout or
attractive chaos mostly include these. This is not really a bad thing
-- it's just plain impossible to implement an equivalent, say, web server in
various languages, so we stick with simple yet necessary things. Those
benchmarks tend to put very high pressure on numerical operations.


This is also the area where Python traditionally did not perform well,
and where Python programmers tended to say that you should not benchmark
Python implementations on numerics since this is not how you use Python. This
is also an area where PyPy really shines compared to CPython, often featuring
10-100x speedups, and where PyPy sometimes approaches the speed of C.


From my own perspective, I disagree that it should not be considered -- Python
is not traditionally used in
this area (because of poor performance), but maybe it's time to move on
and start using it? The real-time video processing demo at EuroPython (sorry,
no link yet) is one example where Python can now be used where it was not
feasible before. However, this is not the only area where people should be
concentrating their efforts.


This is also the only area where PyPy can be compared
against V8, and indeed it is usually slower (though it's also usually
not slower than Tracemonkey). [citation needed ;-)]




Everything else


This is the category for everything else. It can be anything -- template
engines, network libraries, django, our own translation toolchain --
seriously anything. There is a very common historical misconception here:
that if CPython runs fibonacci 60x slower than C, rewriting, for example,
twisted in C would lead to a 60x speedup. Going further, since PyPy speeds up
twisted by at most 2x, there would be 30x left to go.


It sounds simple, but it's very probably untrue. There are operations that
in C take exactly the same time as in Python, like dictionary lookups,
or even more, because you're very unlikely to come up with a more
efficient dict implementation than CPython's or PyPy's.


In this area using PyPy is usually a win, but nowhere near what
you get in algorithmic code. This is also the area where the bottleneck
can be anything, or might not even exist in any single place at all, hence
rewriting in C, Shed Skin or Cython might simply be infeasible.


This is also why writing a fast Python interpreter that speeds up everything
over CPython is hard -- the bottleneck might be in string processing, regular
expressions, the bz2 module, file reading, executing complex functions
with *args and **kwds, importing json or even deepcopy. Every single
aspect of this should be fast enough, or preferably even faster. This is, in
my opinion, simply unprecedented -- people don't only want a fast
language that's pretty rich on its own, they also want a very vast and fast
standard library.




So, how fast is PyPy really?


The answer is: it depends. You should go and measure for yourself, and you
should most certainly come to us and complain if it's too slow. We love
slow benchmarks and we'll be happy to help you.


Cheers,
fijal


Wednesday, June 22, 2011

Making things happen one unittest at a time

Hello.


There has been a lot of discussion about our (PyPy's) plan to reimplement
NumPy. I would like to give a slightly more personal view
on things as they stand, as well as arguments about the approach in general.

Let's start with a bit of background: the numpy effort in PyPy is the work
of volunteers who either need to extend it a little or find it fun to work on.
As of now it implements a very small subset of numpy -- single-dimensional
float arrays with a couple of ufuncs, to be precise -- and is relatively fast.


There are two obvious questions: (1) whether the approach of reimplementing numpy
might potentially work and (2) whether it makes sense from a long-term perspective.


The first part I'll leave alone. I would think that we have enough street cred
that we can build things that work reasonably well, but hey, predicting the future
is hard.


To answer the second part, there are two dimensions to the problem. One is the
technical perspective over the short, mid and long term; the other is how
likely people are to be willing to spend time on it. It's actually pretty
crucial that both goals are fulfilled. Creating something impossible is hard
(it has been tried before), while creating something that's tedious from
the start makes people not want to work on it. That's maybe less of a problem
in a corporate environment, but in open source it's completely crucial.



Technical part


Everyone seems to agree, with varying degrees of confidence, that a JITted
numpy is the way to go in the long term. What can a JIT give you? Faster array
manipulations (even faster than numexpr) and, most importantly, faster
array iteration without hacks like using cython or weave. This is the
thing you get for free when you implement numpy in RPython, and don't
get at all when using cpyext. Note that it'll still reuse all the parts of
numpy and scipy that are written in C -- which is most of it. The only part
requiring rewriting is the interface part.
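
To illustrate what "faster array iteration" means, here is a hypothetical
sketch of the kind of plain Python-level loop that is painfully slow under
CPython plus numpy (every element access goes through the interpreter), but
that a JITted numpy implementation can turn into a tight machine-code loop:


import numpy

a = numpy.zeros(1000000)

# a plain Python loop over the array; with a JIT this gets compiled
# instead of interpreting each element access
total = 0.0
for i in range(len(a)):
    total += a[i] * 2.0
print(total)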


With cpyext:



  • short term: nothing works, segfaults

  • mid term: crappy slow numpy, 100% compatible

  • long term: ? I really don't know, start from scratch?


With reimplementing parts in RPython:



  • short term: nice, clean and fast small subset of numpy

  • mid term: relatively complete numpy implementation, not everything though,
    super fast, reusing most parts of pure C or Fortran

  • long term: complete JITted numpy, hopefully achieving a better split
    of numpy into those parts that are CPython-specific and those that actually implement functionality.


If you present it like that, there is really not that much choice, is there?


To be fair, there is one missing part, which is that the first approach
gives you a much better cpyext, but that's not my goal for now :)




Social part


The social aspects are quite often omitted: how pleasant it is to work on a
problem and how reasonable it is to expect to achieve one's goals within the
foreseeable future. This is a really tricky but important part. If your
project is run by volunteers
only, it has to have some sort of "coolness" factor, otherwise people
won't contribute. Some people also won't contribute if the intermediate result
is not useful to them at all, so we want something usable (albeit limited)
from the very beginning. There is a great difference here between the cpyext
approach, which is either adding boring APIs or fixing ugly segfaults (with the
latter being more common and more annoying), and writing RPython,
which is a relatively pleasant language. With PyPy's numpy we have already
had quite good success with people appearing out of the blue and implementing
the single piece that they need right now. In my opinion this is how
you make things happen -- one unittest at a time.




Personal part


I plan to spend some time in the near future working on making numpy on PyPy
happen, without any other day job. If you have something that requires numpy
and would greatly benefit from a fast Python interpreter with a fast
numpy, this is the right time to contact me; money can make some APIs
appear faster than others :)


Cheers,

fijal


Friday, January 7, 2011

Really long term support

Recently, I began working on building the MeerKAT radio telescope, and
possibly also the SKA radio telescope if South Africa wins the bid. There is
a whole variety of analog and digital hardware running or planned for this
telescope, as well as quite a bit of software, with a major part of it written
in Python.

One of the major requirements, which is quite different from anything I have
worked on so far, is the lifetime of a telescope -- something in the range of
20-30 years from the time of commissioning (not from the time when decisions
were made). And this is not a fire-and-forget project where you just want the
software to work over the entire lifespan: software as well as hardware will
be updated to take advantage of newly available technologies. Nevertheless,
some pieces won't or can't be updated, and they'll have to stay operational
anyway.

This leads to certain restrictions and requirements on software being written.

The first is that the software (whether written in-house or by a 3rd party)
has to be maintainable. It doesn't have to be maintained, but it has to be
maintainable by the in-house team in case some obvious bug appears late in
development.

This usually means that not only should the source code be available (open
source is a plus, but not necessary), but the full revision history as well as
unit tests have to be present too. One example here is Python. Even if we stick
to Python 2.x, whose end of life is scheduled somewhere less than 30 years
from now, it still has source, tests and history, so at worst it can be
maintained if needed (and if we stick to it, which is unlikely).

One very worrisome example here is Flex, which is used by the telescope GUI. If
Adobe stops supporting Flex (there's no reason to believe they'll support it
for 30 years), we're stuck with a GUI running only on old machines, and spare
parts for those machines will be increasingly hard to get.

Another important part is open standards for inter-device communication.
Fortunately this is a no-brainer, since for the most part the devices already
speak the katcp protocol, which is well documented and has open source
implementations.

Does anything else come to mind?

Cheers,

fijal