Approval for Google search history analyzer

Thanks @wolfgang8741!

Good points regarding the additional warnings; I’ve now added them to our front page.

Regarding your second point, @gedankenstuecke, who is co-running the project, and I feel there is a misconception about the point of releasing the data: uploading the data to Open Humans does not release it to the public, and the Google search history upload project here does not do any analysis of the archive contents itself. Instead, a Jupyter notebook is provided to analyse the data through the Personal Data Notebooks, where people are in full control over the code and outputs. Because we do not want to snoop on the data that’s uploaded, no tools for scrubbing the data are included in this project.
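
To give a rough sense of what that notebook-based exploration can look like, here is a minimal sketch. It assumes the uploaded archive is a Google Takeout export containing a MyActivity.json file of search events; the actual file names and fields depend on the export settings, so treat it as illustrative rather than the project’s actual notebook code.

```python
# Minimal sketch of in-notebook exploration of an uploaded search history
# archive. The archive path, file name, and JSON fields are assumptions.
import json
import zipfile
from collections import Counter

ARCHIVE_PATH = "takeout-search-history.zip"  # hypothetical local file name

with zipfile.ZipFile(ARCHIVE_PATH) as archive:
    # Google Takeout exports typically nest the search events in a
    # "MyActivity.json" file; adjust if your export differs.
    json_name = next(n for n in archive.namelist() if n.endswith("MyActivity.json"))
    with archive.open(json_name) as f:
        events = json.load(f)

# Count searches per month; everything stays inside the member's notebook.
per_month = Counter(e["time"][:7] for e in events if "time" in e)
for month, count in sorted(per_month.items()):
    print(month, count)
```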


I approve of private exploration via Jupyter notebook, but I’m against giving the raw zip the option to be released publicly on OH unless sufficient risk mitigation can be formally implemented, such as a tool to scan the search history for information at high risk of abuse. The temporal nature of the data doesn’t allow recall of sufficient quality to guarantee that sensitive information isn’t disclosed by accident.
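
To make the kind of scan I have in mind concrete, here is a rough sketch (nothing like this exists in the project today): flag queries that match patterns for commonly sensitive information before a file can be shared more widely. The pattern list and the assumption that queries are available as plain strings are purely illustrative.

```python
# Illustrative scan for search queries at high risk of abuse if disclosed.
# The pattern list is a placeholder; a real tool would need a much more
# carefully curated and tested set of rules.
import re

SENSITIVE_PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone number": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "possible health query": re.compile(r"\b(diagnos\w*|symptom\w*|hiv|pregnan\w*)\b", re.I),
}

def flag_sensitive(queries):
    """Return (query, reasons) pairs for queries matching any sensitive pattern."""
    flagged = []
    for query in queries:
        reasons = [name for name, pattern in SENSITIVE_PATTERNS.items()
                   if pattern.search(query)]
        if reasons:
            flagged.append((query, reasons))
    return flagged

# Example usage:
print(flag_sensitive(["cheap flights", "call 555-123-4567", "early pregnancy symptoms"]))
```

Even a tool like this can only catch what its patterns anticipate, which is part of why I don’t think public release should be made easy for this data.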


To follow up: there is high value in intermediate levels of disclosure, i.e. sharing with trusted parties but not the general public. Granting access to the data through data sharing agreements tracked to individual users may be a way to enable broader sharing with researchers (professional and individual), if desired, while lowering the risk to the individual sharing. Given the sensitive nature of this data, stronger account identity verification for those wishing to access the stored data might be worth considering on the OH side, not the project side.


Catching up – I want to first capture some conversation we had with @wolfgang8741 in Slack so it’s recorded here.

Data scanning and filtering

@wolfgang8741 was concerned about filtering/scanning the incoming search data to render it safer. In our discussion, I clarified that I consider this to be outside the scope of what Open Humans will do – but that a project itself could perform these scans.

To TLDR Open Humans policy on this:

It is the responsibility of a data source project – not Open Humans – to do data scanning/filtering prior to placing the data in a member account.

And the situation here is:

This project currently does not perform any scanning or filtering of the data file.

Public data option

Another major (and interrelated) concern @wolfgang8741 had was related to the ability of Open Humans members to make data public on the site. His view was that members were being exposed to too much risk of inadvertently making data public – i.e. they would not have done so, had they better understood the content of the data and associated risks.

My addition to this: it’s worth noting that anyone could in theory publicly share their search history data – at issue here is the extent to which Open Humans is making it “easy”, and whether that ease has been balanced with a process that provides sufficient understanding.

Additional confirmation for public data sharing

In Slack, I noted we might imagine a pop-up confirmation on public data sharing for high-sensitivity data sets that alerts a member to unusual sensitivity. At the time, I envisioned it as an extra description field that the project could optionally fill in, resulting in an extra confirmation step.

Another option is to disable public sharing entirely for such data (easier to implement). Philosophically, the latter feels at odds with the Open Humans ethos of letting individuals choose to share.


I think I have some concerns that stem from the above, but for this post I’m going to try to stick to a recap of what was discussed in Slack. So I’ll pause, and ask for others to please add comments if I missed anything. Thanks!


Thanks @madprime! Yes, our Slack conversation clarified that the burden is on the project, not OH, to ensure data is safe for release.

TLDR

  1. Accept the project for personal exploration - temporarily disable public sharing of search history data until a design review for handling informed consent for elevated-risk data is conducted and community concerns are addressed.
  2. Develop criteria for which data is deemed elevated risk.
  3. Identify what additional details and process should be presented to the user before public access is enabled for elevated-risk data. (I don’t want to block this feature, but want to make sure people are aware of the risks, known and unknown.)

Yes, the design for how OpenHumans can facilitate informed disclosure should be discussed further. I think for now the sensitivity of search history should be considered great enough that public disclosure be temporarily disabled pending a design review of how the OpenHumans platform stewards access to elevated-risk data. I don’t think a simple pop-up is sufficient in this case, but adding at least one quiz question about possible risks would be a step in the right direction. I’d suggest that questions be proposed by the project and reviewed by the OpenHumans community, following the best informed consent process I’ve seen: the Harvard Personal Genome Project’s quizzed consent.
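
To make the quiz idea concrete, a project-suggested, community-reviewed question set for an elevated-risk data type could look something like the hypothetical sketch below; nothing like this exists in OpenHumans today, and the names and structure are illustrative only.

```python
# Hypothetical per-data-type quiz, proposed by a project and reviewed by the
# community, required before public sharing of elevated-risk data is enabled.
SEARCH_HISTORY_QUIZ = [
    {
        "question": "Once your search history is public and copied by others, "
                    "can it be fully withdrawn later?",
        "choices": ["Yes, deletion removes all copies", "No, copies may persist"],
        "answer": 1,
    },
    {
        "question": "Could your search history reveal things you never searched "
                    "for directly (e.g. inferred health conditions or location)?",
        "choices": ["No, only literal queries are exposed", "Yes, inferences are possible"],
        "answer": 1,
    },
]

def passed_quiz(responses, quiz=SEARCH_HISTORY_QUIZ):
    """Require a correct answer to every question before enabling public sharing."""
    return len(responses) == len(quiz) and all(
        responses[i] == question["answer"] for i, question in enumerate(quiz)
    )
```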

My justification for suggesting we temporarily block only public access to the search data, until a design review is conducted and any corrections made, stems from the nature of behavioral information such as Internet search history. The dangers lie not only in directly accessible sensitive information, but also in the fact that once disclosed, the data cannot be revoked. Current and future techniques and tools that create new inferences and compiled profiles from information not directly disclosed in the raw search data are reason enough to take a cautious approach to the ease of disclosure through OpenHumans. Doing so will help demonstrate how OpenHumans can be a trusted platform for sharing data with informed consent.

For other design suggestions, we can look to the public access practices of existing qualitative data repositories such as the Qualitative Data Repository at Syracuse, the Inter-university Consortium for Political and Social Research (ICPSR), and others (https://www.ukdataservice.ac.uk/get-data/other-providers/qualitative/international). These implement a trusted level of access between private and public disclosure. That discussion, though, can happen elsewhere and be linked here.


@wolfgang8741 that’s an interesting feature idea – a data type could have an associated quiz, custom to the data type!

While “public data sharing” currently has a learning & quiz process, à la PGP, it isn’t something that recurs or is specific to data types, and this is running into limitations. Based on PGP and other evidence on consent processes, quiz questions seem to be the most effective way of getting someone to pay attention – perhaps the only method with solid evidence, as simplified language and other techniques often have no effect. (Which is why the Genevieve Genome Report also implements a quiz.)

But #1 is simple and easy to do.

The ramification of #1 is #2, yes: “How is a decision reached regarding data sensitivity?” In particular, where do we place responsibility for the decision (and thus, liability)? If the responsibility were placed on projects, I’m concerned they would universally claim high risk to avoid liability.

But slippery slope concerns aren’t good justifications for refusing to draw lines.

To float an answer to #2: I think the decision-making process is … here. The same community review process.

To wit: is anyone here arguing members need the ability to make this data public? If so, speak now!

If not, I think the decision for #1 is reasonable. We’ll have to add a feature to the main site, and I propose this policy for it:

Open Humans will have a field that can be set by admins, to prevent a data source from being publicly releasable. This will be set if a project approval process determines it should be in place.
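
As a purely illustrative sketch of how that flag and check could behave (the real site is a Django app; the names here are hypothetical, not actual Open Humans code):

```python
# Hypothetical sketch of an admin-set flag that blocks public sharing for a
# project's data. Names and structure are illustrative only.
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    # Set by site admins if the project approval process determines that
    # public sharing should be disabled for this data source.
    no_public_data: bool = False

def request_make_public(project: Project) -> bool:
    """Honor a member's 'make public' request only if the project allows it."""
    if project.no_public_data:
        print(f"Public sharing is disabled for {project.name} data.")
        return False
    return True

# Example: this search history project would have the flag set at approval.
request_make_public(Project("Google Search History Analyzer", no_public_data=True))
```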

#3 is more complicated and related to the quiz idea. I like separating that idea out (which is a nice one!) from a simple solution that makes it easy to move forward here.

How do people feel about the above decision & policy proposal?


re: #2, I agree the community review process is what should trigger the elevated-risk designation and consent verification.


Thanks @wolfgang8741!

As there has been no further feedback, I think a consensus has been reached. (If you feel otherwise, please let us know in the #project-approval channel in Open Humans Community Slack!)

I consider this project approved, but enacting that approval is postponed until we have implemented and deployed a feature that will be used to prevent public data sharing for this project’s data.

And I’ve gone ahead and added an issue for this feature to the main repo here:


Following up again to complete this… The relevant code update has been merged and deployed.

And now this project has been modified to prevent public data sharing, and I’ve marked it as approved.

Thanks everyone, especially to @wolfgang8741 for lots of thought on this!
