Why the US Military Shouldn't Give Amazon $10 Billion

The U.S. Department of Defense has more than 3,000 datacenters and countless applications. The U.S. military smartly wants to consolidate to save money. In March, the DoD put out a request for proposal (RFP) for a compute cloud infrastructure. Business Insider reports that Amazon is currently a shoo-in for the $10-billion contract. One reason is that Amazon has already signed a $600M contract with the CIA to run a more secure datacenter in Washington, D.C.

Bryan Crabtree wrote an article explaining why it is a mistake for the U.S. military to choose Amazon. Many don’t realize there is already a free solution to this problem, called OpenStack.

One of the benefits of the free software movement is that you can download the code and run it on your own computer. You can leverage the advancements of industry and academia inside your datacenter.

There are great free codebases for managing a private cloud, such as OpenStack. It is a system written in Python that NASA helped create.

With OpenStack, the U.S. military could cheaply and quickly consolidate its existing computing datacenters into a more reasonable number. Presumably, some of the remaining ones will be under mountains.

Linux and other free and open-source software are taking over the world. Free software is a better way to write code, in the tradition of science. It is an idea that both Marxists and libertarian economists can agree on.

The problem is that Amazon is mostly a parasite of the free software movement. For example, look at Amazon’s repositories for Alexa. What you see are free JavaScript samples to plug into their proprietary system. The underlying code and data used to understand your speech, and predict what you want, are not built in an open way. This is ironic because Amazon Web Services is largely a packaging of free software.

In the case of the cloud, Amazon’s open-source repositories mostly contain ways to connect to its servers. It’s primarily a free SDK, in many different programming languages, to talk to Amazon’s proprietary infrastructure.

People have built cross-platform cloud APIs. There’s a popular library written in Python called libcloud, which supports 30 cloud providers. On its website, Amazon instead recommends the custom SDKs it built.
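As a rough sketch of what provider-neutral code looks like, the Python below lists node names from a driver without caring which cloud is behind it. The `get_driver` call is libcloud's entry point for compute drivers, but the helper names (`make_driver`, `node_names`) and the `FakeDriver` stand-in are made up for illustration, and the credentials each provider needs are left out.

```python
from dataclasses import dataclass


def make_driver(provider, *args, **kwargs):
    """Build a libcloud compute driver for the given provider constant.

    libcloud is imported lazily so the rest of this sketch runs without it.
    The positional and keyword arguments are whatever credentials the chosen
    provider expects; they are not spelled out here.
    """
    from libcloud.compute.providers import get_driver
    return get_driver(provider)(*args, **kwargs)


def node_names(driver):
    """Return sorted node names from any object that exposes list_nodes()."""
    return sorted(node.name for node in driver.list_nodes())


@dataclass
class FakeNode:
    """Hypothetical stand-in for a libcloud Node (only the name is used)."""
    name: str


class FakeDriver:
    """Hypothetical driver so the sketch runs without a cloud account."""

    def list_nodes(self):
        return [FakeNode("web-2"), FakeNode("web-1")]


if __name__ == "__main__":
    print(node_names(FakeDriver()))
```

With libcloud installed, a driver built by `make_driver` for `Provider.EC2` or `Provider.OPENSTACK` (from `libcloud.compute.types`) could be handed to the same `node_names` function, which is the whole point of a cross-platform API.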

Moving applications to an external cloud is a security risk. Even if the data is encrypted over the wire, and on the hard drive, the application server processing the data has all the information unencrypted. Putting all servers in one external cloud means one compromised computer or piece of malware could steal all secrets.

The Defense Department can always afford soldiers to guard and maintain its computers. It’s easy to run a datacenter compared to an aircraft carrier. A modern server can handle thousands of users. It’s easier to take apart a server than an M-16. The military would never outsource basic gun maintenance.

OpenStack is a cloud operating system, so it is not trivial to set up. However, there are plenty of resources in Cyber Command, and expertise at NASA. The NSA knows the backdoors to defend against, and the CIA created many of the exploits released in WikiLeaks Vault 7. None of the innovations to make computers secure were pioneered by Amazon. The federal government has the collective expertise to build a far more secure cloud.

The RFP mentions a requirement for 50 petabytes (or 50,000 terabytes) of online storage. This is a huge number to anyone in the computer industry, who usually deals in megabytes or gigabytes, but you can buy enterprise-grade 6 TB drives for $300. At that price, the roughly 8,300 drives needed would cost about $2,500,000.

Note that 8,300 hard drives is probably not enough. You will want more for redundancy, and even more for performance. Some drives could be so busy that you want to make copies or split the data up to handle the load. So you could triple the cost to be safe. It’s still far less than one F-35 airplane.
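The storage arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures from the text (50,000 TB, 6 TB drives, $300 each); the function names and the `copies` parameter are my own, and tripling is modeled simply as three full copies of the data.

```python
import math


def drives_needed(capacity_tb, drive_tb, copies=1):
    """Number of whole drives needed to hold `copies` full copies of the data."""
    return math.ceil(capacity_tb * copies / drive_tb)


def storage_cost(capacity_tb, drive_tb, drive_price, copies=1):
    """Raw drive cost in dollars; ignores servers, racks, power, and staff."""
    return drives_needed(capacity_tb, drive_tb, copies) * drive_price


if __name__ == "__main__":
    # Figures from the text: 50 PB of storage on $300 6 TB drives.
    print(drives_needed(50_000, 6))                 # single copy
    print(storage_cost(50_000, 6, 300))             # single copy
    print(storage_cost(50_000, 6, 300, copies=3))   # tripled to be safe
```

A single copy works out to 8,334 drives and about $2.5 million; three copies come to 25,000 drives and $7.5 million in raw drive cost.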

The RFP mentions needing 46,000 compute cores. On Dell’s website, you can purchase a 16-core server for $4,000. It would take about 2,900 of those servers, and cost about $11.5 million. That is still less than an F-35.
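The same kind of sanity check works for the compute estimate. The sketch below uses the numbers from the text (46,000 cores, 16-core servers at $4,000) and compares the result with the $10 billion contract; the function names are hypothetical, and the cost covers only the server hardware.

```python
import math


def servers_needed(total_cores, cores_per_server):
    """Whole servers required to reach the requested core count."""
    return math.ceil(total_cores / cores_per_server)


def cluster_cost(total_cores, cores_per_server, server_price):
    """Hardware cost in dollars for the servers alone."""
    return servers_needed(total_cores, cores_per_server) * server_price


def share_of_contract(cost, contract=10_000_000_000):
    """Fraction of the contract value that `cost` represents."""
    return cost / contract


if __name__ == "__main__":
    cost = cluster_cost(46_000, 16, 4_000)
    print(servers_needed(46_000, 16), cost)
    print(f"{share_of_contract(cost):.3%} of the contract")
```

Rounded up to whole machines, that is 2,875 servers and $11.5 million, or roughly a tenth of one percent of the $10 billion.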

These are just partial estimates, but they do put the $10 billion in perspective.

A datacenter in D.C. means that the application servers that used to be close to the soldiers will now be much farther away. What used to take milliseconds to reach a nearby server will take 100 milliseconds one way.

Even assuming that there are no security risks in sending all traffic this extra distance, it will definitely slow down performance, no matter how fast the servers are. Instead of having local servers and small pipes to talk between datacenters, they will have to build big pipes all around the country.
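To see why the extra distance matters regardless of server speed, here is a small sketch of how network delay adds up when an application makes several requests in sequence. The 100 ms one-way figure is from the text; the 1 ms local latency and the 20 sequential round trips per user action are assumptions picked purely for illustration.

```python
def total_wait_ms(one_way_ms, round_trips):
    """Network wait for sequential requests: each round trip is two one-way hops."""
    return 2 * one_way_ms * round_trips


if __name__ == "__main__":
    round_trips = 20  # assumed number of sequential requests per user action
    print("nearby server:", total_wait_ms(1, round_trips), "ms")
    print("distant cloud:", total_wait_ms(100, round_trips), "ms")
```

Under those assumptions, an action that spent 40 ms waiting on the network would spend 4 seconds, and faster servers do nothing to reduce that.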

Past experience with the cloud shows that it only makes sense for some workloads. Amazon’s strength is elasticity, which is great for social media apps and games but overpriced for stable enterprise apps. In addition, there are other reasons not to trust Amazon as a company.

You don’t even have to move all servers to a central location to manage them centrally. It’s only in the proprietary world where you get locked into the mindset that you need to run on someone else’s hardware to get ongoing security maintenance for the software.

Moving to a commercial cloud won’t even save that much money on I.T. The big expenses of an I.T. budget have to do with the number and complexity of the applications and the people to maintain them.

The U.S. Army has more than ten command and control systems. It also presumably has ten companies or teams it pays to create and maintain the software. It could move all its servers to the cloud, and it’ll still have ten proprietary command and control systems to maintain. The U.S. military could give Amazon $10B and still have a broken computing infrastructure.

PyTorch Should Be Copyleft

Neural networks have taken off since AlexNet in 2012. We don’t have to call it a software war, but there’s a competition for mindshare and community contributors in neural networks.

Of course, AI needs more than a neural network library. It needs the configuration hyperparameters, training datasets, trained models, test environments, and more.

Most people have heard of Google’s Tensorflow which was released at the end of 2015, but there’s an active codebase called PyTorch which is easier to understand, less of a black box, and more dynamic. Tensorflow does have solutions for some of those limitations (such as Tensorflow-fold, and Tensorflow-Eager) but these new capabilities remove the need for other features and complexity of Tensorflow. Google built a high-performance system for doing static computation graphs before realizing that most people want dynamic graphs. Doh!

And how much do you trust Google, anyway?

PyTorch was created by people from Idiap Research Institute in Switzerland, who went to Facebook and Google. Doh!

I posted a bug report on the PyTorch license, asking for a copyleft one:

I think you should consider a copyleft license. I realize it’s a pain to change the license, but it never gets easier. I read the license and it’s mostly a disclaimer and a warning. There’s nothing in there about protecting the freedom of the users.

There are lots of projects with lax licenses that are successful, so maybe it will work out okay, but the Linux kernel took off because of the copyleft license. It nudges people to give back.

Lax licenses let companies take advantage of the individual contributors. I don’t understand how someone who believes in free software also believes letting big companies turn it back into proprietary software is fine.

I realize lawyers might like that, and proprietary software companies might want it, but this group is more than just those people. It’s great you’ve got 100s of contributors already, but if you know the way corporations work, you should be pushing for copyleft.

My bug was closed within 8 hours with the following response from a Facebook employee:

we’ve definitely thought about this in the past. We have no plans of changing our license.

The bug was closed but I could keep commenting:

When you say “we”, are you talking about Facebook or the random smaller contributors? Given you work for a large company, I hope you realize you could be biased. At the same time, you should know the way large corporations work even better. You won’t be there forever. Copyleft is stronger protection for the software and the users, do you disagree?

When you say “thought”, have you written any of it down with a link you can post for archival purposes? That way if others come along, they’ll have a good answer. I may quote your non-defense of your lax license in my writings if you don’t mind, but I’d prefer if you gave me a bit more.

I just spent several minutes looking for a discussion of the PyTorch license, and came up with nothing except another bug report closed with a similar short answer.

Your last dismissive answer could motivate people to create a copyleft fork!

I got one more response:

We = the authors of the project.

“thought” = this is a topic that came up in the past, we discussed it among ourselves. I don’t have it written down, we don’t plan to have it written down.

I wrote one more response:

I don’t know any of these names:

I don’t know who the authors are of this project, and how much is big companies versus academics and small contributors, how much interest there is in making a copyleft version, etc.

BTW, relicensing would get you plenty of news articles. It’s also tough because Facebook doesn’t have the same reputation as the FSF or EFF for protecting users’ freedom. The Tensorflow license is lax also so you don’t have that competitive advantage.

To some it’s a disadvantage, but it did make a difference in the Linux scheme, and you would hope to have your work be relevant for that long, and without a bunch of proprietary re-implementations over time that are charged for. The lax license could also slow software innovation because everyone is mostly improving their secret code on top.

LibreOffice was able to convince a lot of people that a copyleft license was better than the OpenOffice scheme, but I don’t know what people here think. One interesting data point would be to find out what percent of the patches and other work are by small contributors.

Anyway, you’ve got a cool project, and I wish you the best, partially because I don’t trust Google. Tensorflow is just some sample code for others to play with while they advance the state of the art and keep 95% proprietary. It also seems they made a few mistakes in the design and now will carry baggage.

There is a deep learning software war going on. It’s kind of interesting to almost be on the side of Facebook 😉

It’s a shame that copyleft seems to be losing mindshare. If the contributors who like copyleft lit some torches, and created a fork, or threatened to, it could get the attention of the large corporations and convince them to relicense rather than risk the inefficiencies, bad press, slower progress and loss of relevance. Forks are a bad thing, but copyleft can prevent future forks, and prevent people from taking but not giving back.

Whether a PyTorch fork makes sense depends on a number of factors. The LibreOffice fork was created because people were unhappy about how Sun and then Oracle were working with the community, etc. If the only thing wrong with PyTorch is the lax license, it might become successful without needing the copyleft nudge, but how much do you trust Facebook and Google to do the right thing long-term?

I wish PyTorch used the AGPL license. Most neural networks are run on servers today; they are hardly used on the Linux desktop. Data is central to AI and that can stay owned by FB and the users of course. The ImageNet dataset created a revolution in computer vision, so let’s not forget that open data sets can be useful.

A license like the GPL wouldn’t even apply to Facebook because the code runs on servers, but it would make a difference in other places where PyTorch could be used. You’d think Facebook could have just agreed to use a GPL or LGPL license, and silently laugh as they know the users don’t run their AI software.

Few people run Linux kernels remotely so the GPL is good enough for it. Perhaps it isn’t worth making a change to the PyTorch license unless they switch to AGPL. Or maybe that’s a good opening bid for those with torches and pitchforks.

I posted a link to this on the Facebook Machine Learning group, and my post was deleted and I was banned from the group!

I posted a link to the Google Deep Learning group and got some interesting responses. One person said that copyleft is inhibiting. I replied that if keeping free software free is inhibiting, there isn’t a word to describe the inhibitions with proprietary software!

One of the things I notice is that even though many people understand and prefer copyleft, they often encourage a lax license because they think other people want that also. There are a lot of people pushing for lax licenses even though they actually prefer copyleft.

People inside Facebook and Google know the pressure to write proprietary code better than those outside. They should be pushing for copyleft the most! On Reddit, someone suggested the MPL license. It does seem like another reasonable compromise, similar to the LGPL.

AI in Microsoft Office

This is an open letter to the LibreOffice-Discuss alias.

Hi all,

I came across a Microsoft AI video that I thought was interesting and food for thought here. The entire video is long, but the link takes you directly to a demo of AI features in Office:

It shows an auto-designer, a better grammar checker, intranet search and easy copy/paste, nice pen gestures, and analysis of spreadsheets (trends and outliers). He also references some other AI-powered features (quickstarter, researcher, my analytics, ink to math, ink to shape, math assistant).

There’s been a lot of progress in deep learning in the last few years. It is arguably overkill in many situations, but there are a lot of people working on it, and they are making continuous breakthroughs. There are some powerful Python libraries to consider integrating better into LibreOffice (perhaps via extensions) such as scikit-learn, nltk, PyTorch and Keras. There are C/C++ machine learning libraries that could be worth putting into LibreOffice also, and perhaps it has some already with the Calc solver.
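To give a feel for how little code a basic version of the spreadsheet outlier analysis needs, here is a sketch in plain Python using only the standard library. It is not how Microsoft or LibreOffice does it. The function name, the z-score approach, and the threshold of 2.0 are all assumptions, and a library like scikit-learn would offer more robust methods.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold.

    Uses the sample mean and sample standard deviation of the column.
    """
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return []  # every cell is identical, so nothing stands out
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - mean) / spread > threshold
    ]


if __name__ == "__main__":
    column = [10, 11, 9, 10, 12, 10, 50]  # made-up spreadsheet column
    print(find_outliers(column))
```

An extension could run something like this over a selected column and highlight the flagged cells.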

It would be interesting to see a Deep Lightproof or other intelligent machine learning features one day. I tried to get the Java-based LanguageTool working on my machine and didn’t manage: it made LibreOffice stutter, the UI was tiny and unreadable, etc. I don’t know if anyone has talked to them about the benefits of Python 😉