Re: Why 80% of the projects on Github have no license?
Truth is, GitHub has NO CLUE how the license situation looks.
The license detection methodology is pretty poor. There is no real standard way for projects to declare a license. LICENSE and COPYING files are pretty common, but so is writing the license directly into the readme file. The conclusions drawn here are false.
GitHub wrote: To detect what license, if any, a project is licensed under, we used an open source Ruby gem called Licensee to compare the repository's LICENSE file to a short list of known licenses.
The real fact here is that 20% of projects have been DETECTED to have a license. From that we can only infer that AT LEAST 20% of the projects have a license, because there are always some projects which specify the license in a “non-standard” way and were therefore not detected. So the actual number is almost certainly higher. Also, going by the description quoted above, this detection only looks at the LICENSE file and fails to scan for the pretty common COPYING file, which is a very, very bad oversight. Readme files are, as far as I know, not scanned either, but that wouldn't be easy to implement, I fear (free-form text is too hard for machines to interpret reliably).
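To make the point about file names concrete, here is a rough Python sketch of a check that also looks for COPYING and at least flags readme files that mention a license. This is not how Licensee works; the function name `find_license_candidates`, the patterns, and the input format (a dict of top-level file names to text) are all my own assumptions for illustration.

```python
import re

# Hypothetical name patterns, chosen for illustration only.
# They are not taken from Licensee or from GitHub.
LICENSE_FILE_RE = re.compile(
    r"^(licen[sc]e|copying|unlicense)(\.(txt|md|rst))?$", re.IGNORECASE
)
README_RE = re.compile(r"^readme(\.(txt|md|rst))?$", re.IGNORECASE)
# Very rough hint that a readme talks about licensing at all.
README_HINT_RE = re.compile(r"\blicen[sc]ed?\b|\bcopying\b", re.IGNORECASE)


def find_license_candidates(files):
    """Return (filename, reason) pairs for files that may declare a license.

    `files` maps a top-level file name to its text content.
    """
    candidates = []
    for name, text in files.items():
        if LICENSE_FILE_RE.match(name):
            candidates.append((name, "dedicated license file"))
        elif README_RE.match(name) and README_HINT_RE.search(text):
            candidates.append((name, "readme mentions a license"))
    return candidates
```

A readme hit only means "a human should look here". Working out which license is actually meant from free-form text is the hard part.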
It does NOT follow that 80% of projects have no license. Actually, we can only infer that AT MOST 80% of projects have no license, so the headline should be “0-80% of GitHub projects have no license”.
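The arithmetic behind that headline, as a small Python sketch. It assumes the detector can miss licenses but never reports one that is not there; the function name `license_bounds` is made up for this post.

```python
def license_bounds(detected_percent):
    """Bounds implied by a detection rate, in whole percent.

    Assumption: the detector may miss licenses (false negatives)
    but never reports a license that does not exist.
    Returns ((min_licensed, max_licensed), (min_unlicensed, max_unlicensed)).
    """
    if not 0 <= detected_percent <= 100:
        raise ValueError("detected_percent must be between 0 and 100")
    # Every detected project is licensed, and undetected ones might be too.
    licensed = (detected_percent, 100)
    # Only the undetected projects can be unlicensed, and possibly none are.
    unlicensed = (0, 100 - detected_percent)
    return licensed, unlicensed
```

Calling `license_bounds(20)` gives a licensed range of 20 to 100 percent and an unlicensed range of 0 to 80 percent, which is all the detection result supports.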
Yes, it's that bad: GitHub does not really have a clue about the license situation.