We Need To Talk About StackOverflow


You can't carelessly copy code from StackOverflow.

No, really, regardless of all the funny memes that are being shared on LinkedIn and other places, sometimes even by staff from StackOverflow.


All source code that is non-trivial comes with a license. What is non-trivial isn't always easy to determine, but read it as: It solves a problem and it's not super obvious at first glance.

If no license is given, the code automatically falls under the copyright of the author. That means you can't use it at all. It's the author's and theirs only.

The good news is that StackOverflow has a different default license. When you signed up to StackOverflow (or any of its network sites), you agreed to their terms and conditions, and section 6 of those says:

You agree that any and all content, including without limitation any and all text, graphics, logos, tools, photographs, images, illustrations, software or source code, audio and video, animations, and product feedback (collectively, “Content”) that you provide to the public Network (collectively, “Subscriber Content”), is perpetually and irrevocably licensed to Stack Overflow on a worldwide, royalty-free, non-exclusive basis pursuant to Creative Commons licensing terms (CC BY-SA 4.0)

So, everything that you publish on StackOverflow falls under the Creative Commons license "CC BY-SA 4.0".

You can always find the link to the current legal terms at https://stackoverflow.com/help/licensing

Creative Commons Attribution Share Alike

Let's understand this license here, the CC BY-SA.

The Creative Commons project has a summary and the folks at FOSSA also created their own summary on TL;DR Legal.

The important part is:

you must distribute your contributions under the same license as the original.

Yes, the "Share Alike (SA)" part of the Creative Commons license has a so-called 'copyleft' effect: You must put your own work under the same license. This is very similar to the GPL, the GNU General Public License.

Let me repeat that: When you copy code from StackOverflow, you must open-source your own code under the same license!

Indirect Problems

As if the copyleft effect of the default license at StackOverflow weren't enough, a lot of people blindly copy code from somewhere else and paste it on StackOverflow. You can't know. The poster doesn't give you any guarantee.

Why This Is Important

If you're a software engineer or an engineering manager, you're responsible for the compliance of the code you write. There are really two aspects here:

  1. You build your software by using freely available code and as a civilized person you respect the intent of the original author. You give credit where it's required. You don't use GPL code in commercial software that you ship to clients. In the same way, you don't use CC BY-SA-licensed code in your project. This is a matter of respect between developers.
  2. There are several scenarios in which your team and your software come under due diligence. Maybe you get new investors, or you go public, or you get acquired by a larger company. I've been there several times. It may happen that the process scrutinizes all of your source code. Every single line. These examinations are hard. You want your and your team's code to be as clean as possible. You don't want those scrutinizers to find copyleft code in your code base.

Obviously, this doesn't really apply if you develop an open-source project anyway. You may still need to care about the compatibility of licenses, but you're not too far off.

The Situation is Bad

The situation today is really bad. Way too many developers don't even know about this issue. StackOverflow employees actively promote the code-copying culture by sharing funny memes about how copying code from the StackOverflow site saved their life or made their work so much easier. There's even research backing this up: https://arxiv.org/abs/1806.08149

There's also a much longer paper describing the whole situation in much more detail over at https://empirical-software.engineering/assets/pdf/emse18-snippets.pdf. That paper again includes a survey on developers not knowing about this, and also some analysis of how much code was copied.

How You Can Help

When you publish code on those sites, you can add a remark that you publish it under a different license in addition to the default license. Exactly how to do that will depend on the license you choose. Make it visible in your post.

Future due diligence processes may still flag it but at least the users of your code have your permission to use it.
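As a rough sketch, this is what such a remark could look like at the top of a Python snippet you post. The SPDX identifier line is one common convention for naming licenses in a comment. The choice of MIT, the placeholder name and year, and the `clamp` function are assumptions made up for illustration. Whether a short note like this is enough depends on the license you pick, so check its requirements.

```python
# SPDX-License-Identifier: MIT OR CC-BY-SA-4.0
# Copyright (c) <year> <your name>  (placeholder, fill in your own details)
# In addition to the site's default CC BY-SA 4.0 license, I also release
# this snippet under the MIT license. Pick whichever fits your project.

def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))
```

Putting the note inside the code itself means it travels with the snippet when somebody copies it.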

If you find such code and want to use it, put the URL into your own source code. It'll make it easier to find the original source later.
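Here is a minimal sketch of how that reference could look in your own code. The URL, the author placeholder, and the `chunked` helper are hypothetical and only show the shape of the comment. The license line assumes the poster explicitly offered MIT in addition to the default license.

```python
# Adapted from a StackOverflow answer (hypothetical URL, for illustration):
#   https://stackoverflow.com/a/0000000
# Author: <display name of the poster>
# License: MIT, as stated by the author in the post
#          (site default: CC BY-SA 4.0, https://creativecommons.org/licenses/by-sa/4.0/)
# Changes: renamed variables, added input validation.

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

A comment like this takes a minute to write, and a plain text search for "stackoverflow.com" will later turn up every place where it was used.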

Some History

How did we end up in this situation? Well, when SO started, they considered themselves a platform for questions and answers. And for that purpose the platform is just excellent. Many people share their knowledge. And for such content, the CC BY-SA is a really good choice.

Just not for source code.

The people from the Creative Commons project themselves don't consider the CC licenses to be useful for software:

We recommend against using Creative Commons licenses for software.

After a while, the people at StackOverflow also came to the conclusion that the situation is bad. If you have too much time, read the thread from 2015 when StackOverflow wanted to just relicense all existing content under the permissive (and non-copyleft) MIT license. That backfired.

Later large discussions happened around just updating the version of CC BY-SA. See