Occasionally I make digital maps for customers based on OpenStreetMap data. Sometimes they request the maps in the form of bitmaps, and sometimes they need a vector format like SVG or PDF so that they can edit the maps in Adobe Illustrator. Of course, the question of licensing always pops up and I often have to explain the stipulations of CC BY-SA 2.0 to them.
Soon OSM will switch to the new ODbL license. I have to admit I mostly stayed away from the numerous legal talks that were going on in various OSM channels, simply because I feel much more productive coding than participating in endless strings of emails. But now that the new license is here, I need to get acquainted with it from the perspective of someone trying to make (some kind of) a living out of OSM data. “I’m not a lawyer” is the usual phrase you can see in OSM legal discussions, but waiting for one to give you some solid information is like waiting for Godot, so I’ll make judgements based on my own understanding and some common sense instead, and simplify things when I feel like it. If anyone objects, they can tweet their objections at me and I’ll try to correct things.
So let’s say a customer requests an SVG map of my home town and I decide to use OSM data for it. For the sake of simplicity the map will be based purely on OSM data, so no other sources. Let’s first look at some of the important definitions in ODbL (emphases are mine):
“Database” – A collection of material (the Contents) arranged in a systematic or methodical way and individually accessible by electronic or other means offered under the terms of this License.
“Derivative Database” – Means a database based upon the Database, and includes any translation, adaptation, arrangement, modification, or any other alteration of the Database or of a Substantial part of the Contents. This includes, but is not limited to, Extracting or Re-utilising the whole or a Substantial part of the Contents in a new Database.
“Contents” – The contents of this Database, which includes the information, independent works, or other material collected into the Database. For example, the contents of the Database could be factual data or works such as images, audiovisual material, text, or sounds.
“Produced Work” – a work (such as an image, audiovisual material, text, or sounds) resulting from using the whole or a Substantial part of the Contents (via a search or other query) from this Database, a Derivative Database, or this Database as part of a Collective Database.
“Substantial” – Means substantial in terms of quantity or quality or a combination of both. The repeated and systematic Extraction or Re-utilisation of insubstantial parts of the Contents may amount to the Extraction or Re-utilisation of a Substantial part of the Contents.
So the first open question: is an SVG map a Produced Work or a Derivative Database? Or both? An SVG map is an XML file that contains projected geographical data (together with visual styling attributes). An OSM XML file can safely be said to be a database. If you say SVG is not a database, where do you draw the line? What about KML or GML files?
This question is important because of the next clause:
Access to Derivative Databases. If You Publicly Use a Derivative Database or a Produced Work from a Derivative Database, You must also offer to recipients of the Derivative Database or Produced Work a copy in a machine readable form of:
a. The entire Derivative Database; or
b. A file containing all of the alterations made to the Database or the method of making the alterations to the Database (such as an algorithm), including any additional Contents, that make up all the differences between the Database and the Derivative Database.
If the SVG map file is not considered a Derivative Database, then you have the option of either supplying the original OSM data (an OSM XML file, a PBF file or even a database snapshot) together with the SVG file, or providing a description of how you derived the Derivative Database.
On the other hand, I can argue that SVG is a Derivative Database because it is “arranged in a systematic or methodical way and individually accessible” and “includes any translation, adaptation, arrangement, modification, or any other alteration” of the original OSM data. So in that case simply publishing the SVG file (and only that file) would cover the license requirements.
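To make the “individually accessible” point a bit more concrete, here is a minimal Python sketch. It assumes a made-up SVG fragment (the ids, class names and coordinates are hypothetical, not the output of any particular renderer) and shows that the map features in such a file can be listed and looked up one by one, much like records in a database. It illustrates the technical side of the argument only and says nothing about how a court would read the license.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written SVG fragment; ids, classes and coordinates are made up.
SVG_MAP = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
  <path id="way-1" class="highway-residential" d="M 10 10 L 50 40 L 90 45"/>
  <path id="way-2" class="waterway-river" d="M 0 100 L 200 120"/>
  <circle id="node-7" class="amenity-cafe" cx="52" cy="41" r="3"/>
</svg>"""


def list_features(svg_text):
    """Return (id, element name, class) for every element that carries an id."""
    root = ET.fromstring(svg_text)
    features = []
    for element in root.iter():
        feature_id = element.get("id")
        if feature_id is None:
            continue  # e.g. the <svg> root itself
        # ElementTree prefixes tags with "{namespace}"; keep only the local name.
        local_name = element.tag.split("}", 1)[-1]
        features.append((feature_id, local_name, element.get("class")))
    return features


def find_feature(svg_text, feature_id):
    """Look up a single map feature by its id, or return None if it is absent."""
    root = ET.fromstring(svg_text)
    for element in root.iter():
        if element.get("id") == feature_id:
            return element
    return None


if __name__ == "__main__":
    for feature in list_features(SVG_MAP):
        print(feature)
    river = find_feature(SVG_MAP, "way-2")
    print(river.get("d"))
```

Whether real-world SVG output keeps this kind of per-feature structure depends entirely on the tool that generated it, so treat the fragment above as an assumption.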
I should note that the SVG map has to be released under the ODbL or a compatible license.
Now let’s go one step further. Let’s assume (as I do) SVG is a Derivative Database. What if I then generate a PNG bitmap (or a Web map, for that matter) from the SVG file using Adobe Illustrator and want to publish that, too? One could argue that a bitmap is a Produced Work and since we already published the Derivative Database that produced this Work, we are covered.
But what if I didn’t generate the Web map from the SVG, but used a tool like Mapnik or Maperitive and generated it directly from an OSM extract instead? Let’s say that for practical purposes I don’t want to publish 1 GB of OSM data and I choose to go down the path of describing the “method of making the alterations” I made to generate the bitmap. What are the options here?
- I could write a detailed description of the steps I performed to generate the Web map: Osmosis, Mapnik, all the batch scripts and so on. I could even post the source code of the program(s) I used.
- On the other hand, I could just describe the process in a sentence or two. I could also say I used a special filter in Photoshop.
I can partly understand the spirit of the “method” clause - to enable access to the interesting derivations of the original data. But I see several holes in the “method” definition:
- What if I produced the map by arranging a lot of the map elements manually, by hand? This is quite a common case when you have to place map labels in order to avoid label conflicts. How would I describe the “method” other than saying that I did it by hand? How would that help anyone?
- What if I used expensive proprietary software (like Illustrator or Photoshop)? Or even a piece of code that I haven’t released to anyone else? In that case nobody else would be able to reproduce the method. Does “Contents” cover source code as well? The definition doesn’t mention source code explicitly. If it does, then that implies you can only use open source software with OSM data, which would be silly.
- What about complex algorithms? How detailed would the description have to be for someone to be able to reproduce the algorithm? I’ve tried reproducing various algorithms from long scientific articles and I can tell you it’s not an easy task even if you have a detailed description.
Frankly, I don’t see how the “method” clause could be enforced in practice.
(UPDATE: now that I have thought about it once more, the clause only talks about describing the method of arriving at the Derivative Database, not at the Produced Work itself. So I could just say “I downloaded the OSM extract from Geofabrik” and that would be it.)
One final question: does extracting OSM data for a city amount to a “substantial part” of the original OSM database?