Hei ber was a member of the ombudsman commission.

Some thoughts on the privacy policy

Refers to the first draft of June 2008, but still applies:


As Ombudsman, I have worked with the privacy policy on a regular basis. I found some points somewhat unclear, and I saw that in practice the privacy policy left room for interpretation that was stretched too far. Therefore I encourage a rewrite of the policy.

It is quite difficult to give a thorough statement on the rather short notice given seven days ago, but after all it's a wiki, and I would like to contribute at least some thoughts and a short comment before the board deals with it today:

I do not mind that the policy is quite verbose and gives a broader background to the topic. This background information is important to convince users of the importance of privacy issues and to show how these issues are embedded in the core values of our community. I do expect any volunteer who supports the project by becoming a CheckUser or Steward to take the responsibility and read the privacy policy carefully, be it short or verbose. For the common user, we should find ways to convey the basic privacy issues in another form. This was also addressed on the mailing list: there are many IP contributors who do not know what information is saved publicly, despite a clear warning at the top of each edit window for IPs.

Some issues have already been addressed, and I will not go into detail concerning localisation and the focus on the large variety of projects. These points can be transferred without any problems. Some major remarks follow, sorted by importance.

Section XI

Personally, I have always had some concerns about point 5 in section XI, which states one case in which personal information is "collected or released":

The old text is:

"5. Where the user has been vandalising articles or persistently behaving in a disruptive way, data may be released to assist in the targeting of IP blocks, or to assist in the formulation of a complaint to relevant Internet Service Providers"

Now formulated as:

"5. Where the user has been vandalizing articles or persistently behaving in a disruptive way, data may be released to a service provider, carrier, or other third-party entity to assist in the targeting of IP blocks, or to assist in the formulation of a complaint to relevant Internet Service Providers."

I find it a strong improvement that it is now specified more clearly to whom data may be released in this case. However, I find "third-party entity" somewhat fuzzy. I read this as official or semi-official bodies outside the Wikimedia Foundation projects, such as schools or universities. I would prefer to name more directly who is meant by "third-party entity" and to specify more clearly that this paragraph refers to entities outside the wiki, not to wiki administrators without CheckUser privileges or comparable status.

There have been many discussions about what constitutes "vandalism". I find it somewhat problematic that the paragraph mentioned leaves this question open. It should be noted that the vandalism or project disruption must reach a certain severity before a release of data can be considered.

Section VIII

"For example, when investigating abuse of a wiki, including the suspected use of malicious “sockpuppets” (duplicate accounts), vandalism, harassment of other users, or disruption of the wiki, the IP addresses of users, derived either from those logs or from records in the database may be used to identify the source(s) of the abusive behavior. This information may be shared by users with administrative authority who are charged by their communities with protecting the projects."

I assume that "users with administrative authority who are charged by their communities with protecting the projects" refers to ArbCom members, CheckUsers or Stewards performing CheckUser, and not to the "standard" administrators, who have not identified themselves to the Wikimedia Foundation. This should be worded more clearly:

This information may be shared by users with administrative authority who are charged by their communities with protecting the projects and are authorized by the Wikimedia Foundation (for example, CheckUsers or Stewards).

If the intention of this paragraph is that IP information should also be given out to other administrators, the policy should take into consideration the amount of private information that the individual IP address represents (open proxy, large provider, company, or even an identifiable person) and the extent of the disruption. IP information can also be abused by trusted members of the communities. Therefore, they should be liable (i.e. at least identifiable by the Foundation), and the circle of persons with whom the data is shared should be restricted.

Section IX

Current wording:

"The raw log data is kept indefinitely, but is not made public."

New text:

"the raw log data is not made public, and is normally discarded after about two weeks."

It has been noted elsewhere that the new wording reflects the actual practice more accurately. However, in a policy we should word the ideal state and the goals. When we are talking about data retention, a statement such as "kept indefinitely" should not be made. The old text gives at least a time goal without preventing logs from being kept longer for special cases like bug tracking or actual statistics on a larger scale. The time frame should be adapted to a practical value (1 month), and the developers should be asked to regularly check the log length and remove data that is no longer needed. This is especially important for servers that are hosted outside the US. In some European countries, there have been court decisions about the retention time of web server log files.

Besides the specific points I named, some practical aspects are not addressed in the policy:

  • In many projects, CheckUser results are published as far as usernames are concerned; often the usernames that belong to an IP or an IP range are published. Sometimes such a result will link an account that contains personally identifiable information (such as the real name used as the account name) to a pseudonymous account or an IP. That account or IP might have made edits that are considered harmful to the editor when linked to his real name. In such cases, care should be taken to protect real names without preventing the project from protecting itself against vandalism.
  • Not much is written about appropriateness. As I wrote above, the release of personal information or the linking of an identifiable account to an IP or another username might have severe implications for a person. Therefore, it should always be checked whether access to logged user data is appropriate. Such access should be permitted only if the disruption and/or vandalism to be fought is sufficiently large and only if access to the data will potentially allow action against the disruptor.
  • Meta:Right to leave also deals with privacy considerations. I ask the board to consider increasing the support for users who want to have personally identifiable information removed that they themselves provided in the past.
  • We do have a severe problem with individual users who release private information against the will of its owner. This is not a direct issue of the privacy policy, which focuses on data that is collected by servers of the Foundation. However, I do see a need for action. As ombudsman, I received several requests from users who had been identified by others (because they had sent e-mail to them, because they had not been careful with their IP edits, or because of personal information they had provided on user pages) and needed help. It would be desirable to have an anonymity policy that gives administrators guidelines for dealing with "outings" of personal information by other users.

All in all, I encourage going ahead with the new draft with the proposed changes, although I do see important issues that still need to be addressed. --Hei ber 01:39, 21 June 2008 (UTC)