Bull strong in supercomputers as a result of a planned effort

4 Jun 2010 Hamburg - Today, Bull is the only European supercomputer manufacturer, and it also delivers TOP10 machines. We talked to Fabio Gallo and learned that this is the result of a strategy started several years ago.

Primeur magazine: At this time it would be nice to get a general feeling. What is the idea behind the extreme computing part of Bull, what are your focus markets, and what is your idea about Exascale?

Fabio Gallo: Let me try to cover a little bit of that. I would perhaps start with a brief executive summary of where Bull has been and where it is today in High-Performance Computing (HPC). Although you probably know the history, I will try to put it into perspective and tell you how we see what has happened, what is in the company and behind it, and where we are today. Then I will tell you a little bit about where we are going.

Bull started investing in high-performance computing, and decided that this was a strategic segment, roughly six or seven years ago, say in 2002 or 2003. That is when the activities were kicked off. But the company had a long history of large systems, systems for enterprise use, before that. The company still has proprietary mainframes, which are called GCOS. These are complex systems for complex enterprise implementations. The company is particularly good at implementing large, complex IT projects integrated into complex systems. These skills are actually quite handy when you try to address the high end of the numerical simulation market. There, you also have to deal with complex projects that bring in several technologies, several different needs, and skills.

So there were some good ingredients, and some of the company's DNA was in the right place to do something on large computing systems at that point in time, because there was a clear need in the market, and also a bit because CEA was about to issue the Tera10 project and was clearly interested in having a European and, moreover, a French company involved in it. Competing for it, Bull decided to kick off the activities and put together a very strong Research and Development and service team to implement what eventually became the largest supercomputer in Europe at the time of its installation: the Tera10, back in 2005.

That started the whole thing: a very large project that ended up being the largest system in Europe. That is where Bull started: at the top, not at the bottom with small systems and then gradually moving up over time. It started out with a very, very big project, which the company was lucky to win, and that kick-started the whole range of activities. In the following two to three years, the company was essentially winning sizeable deals every year, with perhaps a few more small deals around them, and gradually Bull started to have a fairly widespread presence in the segment. When I picked up the responsibility for this segment in 2008, there were about 150 customers in 15 countries. So it was a fairly global presence, mainly through deals and contacts with research, government, academia, that type of organization, and for sizeable supercomputing projects.

So this is essentially how the business came through. Initially it was based on Itanium platforms, and then in 2007 we also had an x86 product line. In more recent times the decision was made to try to break out of the pure government and research focus and to strengthen the international focus, and this led to two acquisitions. One was in 2007 in France: Serviware, which was the main HPC integrator in the country at that point in time, focusing mainly on industrial HPC. So that was pushing in one direction, which had been identified as a key direction.

The other acquisition, a year later, was Science and Computing in Germany. Science and Computing's business is services, but it is services around numerical simulation and HPC for industrial customers in Germany: the likes of BMW and Daimler. They are running their operations around the HPC infrastructure. It is a sizeable company with roughly 250 people. So these acquisitions pushed in the direction of having a stronger industry presence: one was in France and the other one in Germany, so they also broke out of the geographical focus on France.

I would now “fast forward” to today. High-performance computing is the fastest growing business. It has seen very high double-digit growth on average over the past four years, and it represents roughly 15% of the company's revenue. So it is not an exclusive focus, but HPC is getting a bigger and bigger chunk of the company's business, and again it is the fastest growing business within Bull. I would stop here for the history. This is a very, very quick glimpse of what has happened so far.

Another important thing that happened is that roughly three years ago the decision was made for Bull to develop some HPC products essentially from scratch. Some of the products that Bull had used in the segment had been developed with broader utilization in mind, not purely HPC. Some of them had been acquired, let us say, through OEM agreements. But more or less at that time, say 2007, the company decided that it needed to develop HPC products based on x86 technology, using commercially available components and large amounts of open source, with packaging as one of the key features. All of this is complemented by special glue from Bull. Bull has specially developed components, but everything that we do is based on commercially available technology: x86 processors, high-speed networking, storage, and so forth, with additional glue at many different levels. But at that point in time, in 2007, the company decided that its know-how in systems design and packaging also needed to come into play in HPC. We needed to have a highly differentiated product line, still based on x86 and largely adopting packages from the Open Source community. That ultimately resulted in the Bullx announcements we made last year, right around ISC, in the June timeframe.

Our blades, for instance, are exclusively designed for the HPC market. We did not try to serve more than one market, and as a matter of fact we do not sell them anywhere else than in HPC. They are exclusively HPC products with embedded InfiniBand, very high density, and a very high memory configuration. So it is a product exclusively designed for this market. The typical HPC customer basically finds everything he needs in this machine. An enterprise customer would probably find it overkill for its needs.

This new product actually got two awards at SC09 in Portland: one for best HPC server technology, and one for the Top 5 technologies in HPC. So it is widely recognized as a good product, not only by analysts, but also by customers. For instance, we issued a press release from AWE. The AWE procurement was based on this blade. Those blades also have a GPU flavour. They use exactly the same infrastructure, but with embedded GPUs. When we developed them, the GPUs were the Tesla from NVIDIA, but given their form factor they can also take the new Fermi whenever that becomes available. They could also take other types of GPUs, so we made the design extremely adaptable. But it is a very, very dense form factor. In a double blade, we can have two GPUs. We strongly believe in the opportunity for this type of technology.

Recently we announced a large SMP we call Bullx Supernode. The common brand is Bullx. This is the one that is used, for instance, to put together the TERA 100 that was announced at the event. It is a 1.25 Petaflop/s system that is going to be installed at the CEA in the military division. Unfortunately it was powered up a little bit too late to have a LINPACK run on the full machine for the current TOP500 list, but it will definitely be there in November. Now a question that may come to mind is: why do we have so many product lines? Well, not so many. We have a Scale-out product line with thin nodes: very dense thin nodes. We have a GPU implementation, and we still have a product line, the Bullx. They can be twin servers with different flavours. So we have many different incarnations of systems, for different needs.

It is our belief that one size does not fit all. Different applications have very different requirements. There are applications that are perfectly OK on a Scale-out platform with thin nodes. An application developed with MPI, perhaps, could be perfectly OK on thin nodes and would probably have the best price/performance using Scale-out. But when you have, for instance, very complex 3D imaging and post processing where you have to handle a very large model, and you need to handle it in memory to be able to play with it and play with the domain decompositions, you really need to have a large shared memory available.

Post processing is also an application where you need to do this. And there are some applications that still have a strong legacy towards SMP platforms, for instance some of the ab-initio computational chemistry codes. So there is a portion of the market which cannot be addressed by thin nodes. And incidentally, there is a use for fat nodes in highly scalable systems, which is the one CEA is pursuing, beyond the need of applications. The fat nodes allow you to reduce the number of nodes, and therefore of operating system instances in the system, so it simplifies deployment and it simplifies management. That is why CEA, amongst other reasons, wanted fat nodes for their Petascale system. So we see a use for large SMPs, and thin nodes cannot fulfill all the needs. Obviously, for applications that are suitable for that technology, you can reach performance per watt and performance per dollar which is unrivaled. Applications are still relatively tough to port, so it is not a trivial investment to go to fat nodes. The software environment, the development environment, is one of the things that has to catch up quickly for commercial applications to really be ported to this kind of environment, but we believe that the machine is in motion.

You will see a larger portion of the applications being ported to systems with accelerators, with GPU use. At this point in time we have a fairly comprehensive set of products, covering all the different architectures and technologies. We have a set of competences, skills and services that go with that. We developed software ourselves, including the management software for these systems. It is a very comprehensive management suite, with management and monitoring tools. For the very high-end scalable systems, we have for instance a specialized MPI library that optimizes performance on very large-scale parallel systems.

We also have the ability to do hosting and deliver on demand. That is an additional dimension to our offerings for HPC customers. Sometimes customers need the flexibility of, for instance, deploying additional data centre space or resources faster. We can fill the gap by providing hosting facilities, for instance. Bull also recently introduced a containerized solution called Mobull, which is basically a supercomputer centre on wheels. There are two things one should keep in mind about Mobull: like the other products we have in this space, this one was designed for supercomputing, not for generic data centre use.

What this means is that you can put in that container the highest density solutions that are available today. You can go up to 40 kW per rack. The racks are standard 19-inch racks with a Bull water-cooled server. You can put any kind of equipment in there. Anything that will go into a 19-inch rack can be installed in this. It is not a solution that is designed exclusively for container use. You can put everything that you have in a data centre in it: you can have the compute parts, the I/O part, the storage, and the communications equipment. You can have service nodes and login nodes. You can literally build a complete supercomputer centre within one of these containers.

The building can be a limiting factor for many supercomputer installations. You need to adapt the size of things. Often the building takes much longer than the deployment of the supercomputer. We can offer a solution whereby customers can use a container while the restructuring of their data centre is going on, and then move the equipment into the data centre once that is ready. So we offer a degree of flexibility, which is becoming more and more important for customers these days.

Primeur magazine: The on-demand services, are there a lot of people doing that? Do you have a lot of customers for that?

Fabio Gallo: We have customers who do that. The customers we have in that space are actually industrial customers. And for a number of reasons, we are not at liberty to use their names, because of the confidentiality and the nature of the customers.

Primeur magazine: Are they using it as a kind of test in an initial period, or are they also doing that continuously?

Fabio Gallo: We have several situations, but usually it is for two main reasons. One is deployment: the situation where the customer does not have the possibility in-house, because the data centre is full or maybe because access to the data centre is difficult for geographical reasons. So we can come in to make sure that in this situation there is a solution available.

Sometimes what we do is just to absorb peaks. You can have a situation where your computing capacity is designed not for the peak but for the average usage, and when you have a peak you need to be able to run it on a different computer system, and that is what we provide. We see different reasons and different modes of utilization for the on-demand services.

Primeur magazine: And it is all Bull equipment in those centres?

Fabio Gallo: For the HPC part it is. Bull has extensive hosting offerings in enterprise, and there is a blend of products that is used. Bull products obviously, but sometimes also non-Bull products. In HPC it is only Bull products.

Primeur magazine: The development of the machines, as you said, started off with a kind of government project. So if you could tell a little bit about the future, and then especially, whether the Exascale developments would also need a kind of government injection or push, and could the European Commission or PRACE play a role in that?

Fabio Gallo: Perhaps the link between what I have said thus far and what is coming could be the TERA 100 project. The TERA 100 is a Petascale project of CEA. But the project started out a couple of years ago. First it was actually a joint research and development project between Bull and CEA. So CEA set out to first of all go through a research and development phase, to make sure that the technology needed to manage and operate the system, and to get the performance they wanted out of it, would be available. It was a research and development cooperation between Bull and CEA, and the cooperation is still ongoing although the system has been powered up at this point.

But the cooperation is not over yet, and it is a cooperation to develop a number of technologies, again for Petascale systems. Now we believe that the strong association of technology providers, vendors, and the final users of the technologies is going to be needed more and more, more so than in the past. The model in the past, in a way - and it is a little bit of a caricature, but it is not far from the truth - was that the technology provider would develop the technology and then hand it over to the user. They would say: now I have got this technology available, and they would build it. Obviously it was not a random effort; they had some ideas in mind on how this could be used. But it was only once the technology was available that the users could put their hands on it, and then figure out how to use it and figure out what was good about it. The users could provide feedback and input to the technology developers, which would eventually be incorporated into the next generation and then given back to the users. It was a long cycle basically: first development, and then usage. Progress went by technology and architecture generations, essentially.

Now, although the growth of the technology, if you take Moore's law as an example, has been exponential, it was still in a way in its infancy. The complexity of the systems has been relatively modest. And when I say relatively modest, I mean with respect to what is going to happen within the next seven to ten years to get from the current situation, with Petaflop/s performance, to three orders of magnitude more performance in the Exascale.
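As a rough illustration of the figures just quoted (three orders of magnitude within seven to ten years), the sketch below computes the constant yearly growth factor and the implied doubling time. It is not part of the interview: the function names are hypothetical, and the assumption of a steady growth rate is a simplification made only for this example.

```python
import math

# Petaflop/s to Exaflop/s is a factor of 1000 (three orders of magnitude).
PETA_TO_EXA = 1000.0


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier needed to grow by total_factor in `years`."""
    return total_factor ** (1.0 / years)


def doubling_time_months(total_factor: float, years: float) -> float:
    """Months needed to double performance at that constant growth rate."""
    return 12.0 * years * math.log(2.0) / math.log(total_factor)


if __name__ == "__main__":
    # The interview mentions a window of seven to ten years.
    for years in (7, 10):
        factor = annual_growth_factor(PETA_TO_EXA, years)
        months = doubling_time_months(PETA_TO_EXA, years)
        print(f"{years} years: x{factor:.2f} per year, "
              f"doubling every {months:.1f} months")
```

Under these assumptions, performance has to roughly double every year for the ten-year window, and grow by about 2.7 times per year for the seven-year window.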

It is a paradigm change in a way, although you might say there is continuity in the growth rate. But you are actually on an exponential curve. You are getting to a point where the increase is so massive that things will need to be done in a different way. What is true today will not be true seven or eight years from now. The way to do things is not going to be the same. When you do a back-of-the-envelope calculation, you see that the technology and the machines we have today are not going to be practical when you are in the Exascale ballpark.

But without dwelling on what is going to change: all bets are open at this point in time. There have been a lot of very intense discussions on this at ISC here in Hamburg. One point that is certain is that the cycle distance between technology providers and final users has got to be much smaller. This is mainly because one of the biggest challenges facing the community at large is how you can exploit systems of the complexity that is needed to reach Exascale computing. How can applications scale? On what technology? How can you program applications for that scale of performance? And all that given the level of complexity that is needed in the number of components, and in systems of that order of magnitude of power. So the loop must be much shorter, which means that the proximity of users and technology providers must increase dramatically. It has to be cultural proximity and geographical proximity. Users and their technology providers need to work together to build viable Exascale technology before the end of the decade. The model of cooperation is the only way forward in my opinion. This should go on at all levels, including the funding of the research and development. This is not technology that you can develop and hope someone will use. It is way too complex for that. You cannot just have an “if you build it, they will come” type of approach. This means that it needs to be encouraged and stimulated at the government level.

And I need to open up a parenthesis here, which is important. If you look at the market today, it is clear that it is dominated by technology produced in North America. Now this is the result of decades of investments by the US government into this technology: investments aimed at the US market, investments aimed at facilitating the development of new technologies by US technology providers. And, I have to say it the way it is, also a certain level of protection against foreign providers. This has gone on for decades. We basically see the results, because at the same time in Europe there was not a similar level of investment, and certainly there was nothing to protect the local industry. Now some people may say this is strange coming from Bull, a company which comes from France and was a government-owned company. I will not comment on this. Bull was a government company several years ago; it has not been for a long time now.

But the fact is, in supercomputing, no one has made any special efforts in Europe for a long, long time. Europe has a more complex geography, of course. The European Union is a relatively recent creation, and the national differences still, unfortunately, play a role. The net result of this is that in Europe there were much smaller investments, and virtually no protection for the local industry. I am not saying that protection is a good thing, by the way. I do not think it is, but the point is that if one continent protects and the other does not, it is not fair. For decades, literally decades, on one side of the Atlantic there were investments and protection, and on the other side of the Atlantic there were much smaller investments and total openness. Europe has ultimately found itself in a very weak position in terms of its own intellectual assets. The market is dominated by US vendors today.

Some people probably think that this is an irreversible situation, or they might think this is not worth focusing on. Some people might even think that it is bad to focus on this, because if we just leave it up to, let us say, a survival-of-the-fittest approach, it is probably good for the industry. I think, first of all, that would be a bit shortsighted, because this technology is absolutely key for the progress of humankind in the end. The progress of science is linked to this technology, the progress of technology in general is linked to this technology, and the competitiveness of organizations is linked to this technology. It is a strategic technology. It would be, again, I think, shortsighted for Europe not to try to play a larger role in the whole value chain. I am saying this to underscore the difference with just being strong in software, for instance. I have heard several times: Europe can play a role in software. And of course it can. Probably in that area you have no delay and no handicaps.

But it does not only have to be software. Things are going to be done in a very different way in the coming 7 to 10 years. It is a discontinuity. It is a disruption. It is a paradigm change. Whenever there is an opportunity like this, things can shift, and the proximity with the end users plays a key role here. So European users are better off working with European technology providers, because the loop will be much closer. It will be closer geographically, it will be closer culturally and philosophically, and it may ultimately result in Europe coming back into the race and owning this kind of technology, as opposed to having to borrow it from outside the continent. As you might tell, I have very strong feelings on this as a European citizen. Obviously working for a European company adds a little bit, but ultimately it is really about Europe playing a role here. I think there is a role to play for Europe, and I think the European Union needs to recognize this. You know, things are happening, do not get me wrong. You see initiatives like PRACE, and there is clearly a strong interest in doing something around Exascale. So things are happening, but I think they need to accelerate.

Because this is really an opportunity. The market is moving to something that is completely different, that will require different skills, different technologies, and paradigm shifts. And obviously, as the computing power of supercomputers or high-performance computers increases, additional, extremely complex problems are going to be targetable by these technologies. That is good news at all levels. It is good news for target applications such as medicine, to cure complex diseases, or to model the behaviour of very complex organisms. All of this is virtually impossible today, and it is going to be doable. Things we cannot dream of are going to be accessible, and the same goes for many other fields of application. These are very exciting times: the disruption and the sheer scale that we are reaching right now make it such that I think there is a chance for a relabeling of the marketplace. It is an opportunity for new and old players to come back in and play an active role.

Bull is an end-to-end IT company comparable to the North American ones. But in Europe you have other specialized players in different areas. If you look at the individual technologies, we have memory and processor manufacturers in Europe; they are just not active in either general-purpose or HPC-specialized equipment. We have got companies that design processors for embedded applications. We have semiconductor design facilities in Europe. We have companies that design processors and memory. We obviously have big software possibilities, and there are many companies active there. We have a few integrators. If you start scanning the landscape, you will find that there are many companies that can play a role. So I think we should not be fooled by the fact that Bull is the only end-to-end, full-sized IT company in Europe. Other portions of the industry could benefit from a focus on this, and from dedicated investments to foster research and development.

Primeur magazine: Thanks for sharing your thoughts with us.

Ad Emmen