Approval for Researchers mobility data analyzer

Yes, from my understanding the goal is to also return some of the processed data :slight_smile:


Thanks for the question!

Yes, the goal of the project is to collect both:

  • data from long-distance flights (extracted from the Google geolocation data)
  • metadata (about the researchers travelling) collected from the Google form.

I can change it and make it clear in the project description.
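To make the "long-distance flights extracted from geolocation data" idea concrete, here is a minimal sketch and not the project's actual method. It assumes location points are already loaded as `(timestamp, latitude, longitude)` tuples sorted by time; the function names and the 1000 km threshold are hypothetical choices for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def long_distance_legs(points, min_km=1000.0):
    """Return (start_point, end_point, distance_km) for consecutive points
    that are at least min_km apart.

    points: list of (timestamp, lat, lon) tuples sorted by time (assumed format).
    min_km: hypothetical threshold for calling a jump "long-distance".
    """
    legs = []
    for prev, cur in zip(points, points[1:]):
        distance = haversine_km(prev[1], prev[2], cur[1], cur[2])
        if distance >= min_km:
            legs.append((prev, cur, distance))
    return legs


if __name__ == "__main__":
    sample = [
        ("2019-03-01", 48.8566, 2.3522),    # Paris
        ("2019-03-02", 48.8600, 2.3500),    # still Paris
        ("2019-03-10", 35.6762, 139.6503),  # Tokyo
    ]
    for start, end, km in long_distance_legs(sample):
        print(start[0], "->", end[0], round(km), "km")
```

A large jump between consecutive points does not prove that a flight took place; it only flags a candidate leg, which is usually all a coarse travel-trend analysis needs.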

Hi, everyone,

we made final changes to the project:

  1. we changed the survey form
  2. we also added more description to the project and how we plan to analyze the data. Please have a look at:
    https://www.openhumans.org/direct-sharing/projects/on-site/mobility-data-of-researchers/

Thank you for your time! :seedling:

I have an open-ended question I feel should be asked… Google Location History is pretty rich data, and potentially sensitive, isn’t it? What’s your intuition with how this data is going to be helpful beyond simple surveys of where people travel?

Also, @gedankenstuecke you’re mentioned as a project leader, is that correct? I find the long list of “leaders” confusing since my impression is that this is led by @liubovv …it’s good to know “who is in charge” for accountability (if there’s a data breach, whose fault was it?), especially if asking for sensitive data. thanks!

Thank you @madprime for the question.

Regarding your first question about:

What’s your intuition with how this data is going to be helpful beyond simple surveys of where people travel?

The initial idea of collecting data on researchers’ mobility was to explore the untapped opportunities of traveling researchers, in order to help connect them to local science outreach and citizen science projects (as we do with “Lecturers without borders”, www.scied.network).

From a previous small-scale survey of Marie-Curie Alumni researchers, we found that many researchers travel between European and Asian countries, and this can lead to new essential connections between science and science outreach projects in these countries. So, yes, we plan to go beyond simple surveys of where people travel.
I hope I understood the point of your question.

Regarding the other one:

Also, @gedankenstuecke you’re mentioned as a project leader, is that correct? I find the long list of “leaders” confusing since my impression is that this is led by @liubovv …it’s good to know “who is in charge” for accountability (if there’s a data breach, whose fault was it?), especially if asking for sensitive data.

Yes, we discussed the data collection for this project with @gedankenstuecke, who is starting soon at CRI; we also made a shared Jupyter notebook for future analysis of mobility data. But I agree: I will list myself as the main project leader, responsible for any questions and issues with this project, and list the others as collaborators with whom we will discuss the project.
I hope that this is OK for you.

Let me know if there are any other unclear points.
:writing_hand:

One piece of feedback; this text would be much better on the project page itself (currently it only appears after you click “Join”)

Why might I want to join this project? What are its goals?

The project is investigating the main research questions:

  1. What is the potential of the untapped connectedness and the connectivity of traveling researchers in the world?
  2. How to improve the communication between researchers and general public, using the “data analysis for social good” and knowing that researchers travel from place X to place Y?

Re: Mad’s point about Google data:

I have an open-ended question I feel should be asked… Google Location History is pretty rich data, and potentially sensitive, isn’t it? What’s your intuition with how this data is going to be helpful beyond simple surveys of where people travel?

I feel like it is probably much finer-grained than needed for this project as I understand it but still may be useful, and project members should be allowed to share it if they think it will help the project… With the caveat that the project page clearly explains just how detailed Google Location History data is.

…or maybe that’s something that should happen in the description of the data source so that all projects that request it have the same level of warning, @madprime?

@beau I think it’s good to want an explanation of data sensitivity at both stages – data-source & data-recipient. Reminders are good, and data-recipient projects should demonstrate awareness of what they’re asking for.

I asked @gedankenstuecke about it: “Can’t you just ask someone to fill out a form about where they’ve been?” and he explained how hard it can be to remember all travel, and dates of travel. So, point taken. :slight_smile:

(This reminds me that we had an updated layout we would like our in-house data source projects to use to help with clarity for things like this, but not all the projects were updated to use it.)

@madprime @beau
Yes, this is a good point about people forgetting where they have been.
We did small studies of where Marie-Curie researchers travel, but they were quite small and did not use any geolocation-based data collection methods:

We also made a notebook with @gedankenstuecke on mobility analysis, which can be used by people who would like to analyse more of “how” they are traveling. This notebook can be used for analysis of individual trajectories, since I applied some stochastic methods to the analysis of open data from openhumans.org.

https://github.com/Liyubov/mobility_analysis/blob/master/Analysis%20of%20human%20mobility%20trajectories%20%23%20open%20humans%20data.ipynb

@beau I agree that it should be much finer-grained, but there is no app (at least I do not know of any; maybe we should create one :writing_hand:) which records “Which conferences or countries did you go to?”, so I cannot really ask for such data.

if there is more explanation needed on “how we are going to analyze the data and use it for social good” I can certainly add it to the project description!

thank you for your time!:herb:

@liubovv I think you could address @beau’s concern by expanding information in the section: What data will you have access to?

I agree that a reminder is a good idea – in case people forgot, they should be reminded that this is detailed location data (GPS) and it is sensitive data. I don’t think you plan to use it for these purposes, but it can be used to infer the addresses of someone’s home and workplace, and the address of any other locations they visited, and the times they visited those places.

I encourage you to say this explicitly – make it clear what the data could be used for, even though you don’t plan to use it in that way. If you think this makes it “scary” you need to also explain why people should trust you. Then their decision to share it is an informed one. :slight_smile:

Dear Mad,

thank you for your answer.

I will update the text on the project in the section “What data will you have access to?”

We will analyze travel/movement history from Google Location History, along with survey data collected via a Google Form. The Google Form survey is optional; it adds meta information to the data provided.

The detailed location data (GPS) is sensitive data; for this reason we will take special care of your data. The data won’t be shared with any third parties, and all analysis will be done by the researchers who are responsible for this project (see below). Moreover, we specify that the results of the analysis of geolocation data won’t mention any information about people contributing to the project.

The results of the project “Researchers’ mobility” will help researchers and educational NGO projects (such as “Lecturers without borders” and local NGOs in developing countries) to identify places around the globe where knowledge (lectures or seminars) can be delivered. Since most educational NGOs worldwide do not have access to such mobility data, the “Researchers’ mobility” project on Openhumans can help them by providing this essential additional information, based on depersonalized data analysis.

Thanks for your comments on the updated project https://www.openhumans.org/direct-sharing/projects/on-site/mobility-data-of-researchers/

Thanks @liubovv but I apologize if this wasn’t clear, I’m suggesting you should explicitly say what the data could be used for. Which is to say, include a statement like this:

This data is potentially sensitive, as it could theoretically be used to infer the addresses of your home and workplace, and the address of any other locations you visited, and the times you visited those places.

or something like that. I don’t think it’s enough to say it’s “sensitive”.

Hi, Mad,
thank you for the clarification. I corrected the information in the project description and wrote there:

’ Geolocation data is potentially sensitive, as it could theoretically be used to infer the addresses of your home and workplace, and the address of any other locations you visited, and the times you visited those places. We state here that the materials here are not used for estimation of personal information about the most visited cities. The project aims at collection of the information of general travel trends of people (in particular, researchers).’

Let me know if this is not enough and some other information is needed.
I also had some ideas about inserting notebooks to the project description to demonstrate how I will analyze the data, but maybe this is advanced and I can do it later separately by sharing the notebooks with Openhumans community.
Thank you!

that seems good to me @liubovv, thanks! I think approve is reasonable.

any votes from others e.g. @beau? :slight_smile:


Responding to @bastian’s request to review. I think the project is reasonable and the dialog here has been productive. My vote is to approve. (@beau’s advice about sensitive data warning - what is being asked for and what risks does it pose - seems important to me and glad this was addressed.)

Alright, it took a while to finally catch up on the state of the conversation for approval, but here are my thoughts. Overall I am in favor of approving the project, with minor refinement of details for the “analysis” statements. These could be more specific in reflecting information in the project proposal regarding the resolution of the information of interest, what kind of analysis might be expected, and what outputs are expected, rationalizing the collection of the data.

  1. @madprime Regarding the sharing of the fine-grained GPS data from the Google Location upload: this highlights that an enhancement to the upload would be to provide a data processor on the raw upload to offer levels of granularity for apps. It isn’t a solution for now, but it is a possible modification of the Open Humans Google Location project to limit access to only the level of granularity necessary for an app or study. The added possible benefit of preprocessing of timelines would be offering turnkey data for an analysis at new resolutions. Also offering limits to the needs of projects, ie what timeframe is necessary for sharing (last week, last month, a user’s entire digital life?).

Since this project is only interested in scientists, what metadata would be necessary to determine the start date of an analysis? For example, even though the data doesn’t go far enough back for everyone, some scientists are young enough that the data may go back to before they were scientists. I don’t see in the survey or other collection how that may be determined in an analysis, but that is just something to think about.
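The granularity and timeframe limits described above could look roughly like the preprocessing step below. This is only a sketch under stated assumptions: it is not an existing Open Humans feature, the function name `coarsen` and its parameters are hypothetical, and the input is assumed to be `(datetime, latitude, longitude)` tuples sorted by time. Rounding to one decimal place corresponds to about 11 km in latitude, which is roughly city scale.

```python
from datetime import datetime, timezone


def coarsen(points, decimals=1, since=None):
    """Reduce a location timeline to coarse cells and day-level timestamps.

    points:   iterable of (datetime, lat, lon), sorted by time (assumed format).
    decimals: decimal places kept for coordinates (1 is about 11 km in latitude).
    since:    optional cut-off; earlier points are dropped entirely.

    Returns a list of (date, lat_cell, lon_cell) with consecutive
    duplicates of the same cell collapsed into one entry.
    """
    out = []
    for ts, lat, lon in points:
        if since is not None and ts < since:
            continue  # outside the timeframe the project asked for
        cell = (round(lat, decimals), round(lon, decimals))
        if not out or out[-1][1:] != cell:
            out.append((ts.date(), cell[0], cell[1]))
    return out


if __name__ == "__main__":
    utc = timezone.utc
    timeline = [
        (datetime(2015, 6, 1, 9, 0, tzinfo=utc), 52.5200, 13.4050),    # before cut-off
        (datetime(2019, 3, 1, 8, 30, tzinfo=utc), 48.8566, 2.3522),
        (datetime(2019, 3, 1, 18, 45, tzinfo=utc), 48.8584, 2.3601),   # same coarse cell
        (datetime(2019, 3, 10, 7, 15, tzinfo=utc), 35.6762, 139.6503),
    ]
    for row in coarsen(timeline, since=datetime(2019, 1, 1, tzinfo=utc)):
        print(row)
```

Dropping the time of day and collapsing repeated cells removes most of what makes the raw data revealing (home and workplace addresses, visit times) while keeping the sequence of places visited.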

  1. Regarding the overall project: the survey currently is a bit odd in that it either combines Asia and Australia in the locations where you travel privately and for conferences, or missed that continent as an option… Antarctica, I guess, should be included too to be inclusive of continents, though “other” is an applicable space for these too.

  2. The details of the analysis and product are very vague, leading me to ask why share this data and what is going to be done with it specifically. In many places “analysis” is being used where a more specific thought on what is being looked at could be placed. Even though this is not a study, some bounds or descriptions of the interest / types of analysis could be specified to give some detail on what data is actually being worked with and at what resolution. For example, “local” is a relative term, and the GPS data could be much finer than neighborhood, city, region, or country-level processing for analysis. What level does the project agree will be the limit of its resolution of analysis?

What would the “general results” look like as a product of one of the “analyses”, and what “additional information” is expected from this dataset, so we could identify what analysis is or may be done before sharing? While I understand this review is not a study, there should be some presentation of what types of analysis will be done before a user consents to sharing the data, including the resolution at which the analysis is intended. In the proposal the project states interest in city-level analysis, but the project join page mentions that fine-grained GPS access is granted and does not state that the project’s interest is only at the city level. Add what level the analysis intends to limit the GPS resolution to…

past and upcoming travels from cityA -> cityB of a scientist
metadata: scientific field.

Additionally, the project proposal draft stated it needed an ORCID and planned to look at frequency of visits to a location. (The “analysis of the frequency of trips to conference destinations” is a more useful specification of what analysis may be done, rather than only stating “analysis”.)

On a more nitpicky side -

  1. How does one judge the “potential of the untapped” in this project?
  2. Where is the ORCID collected? (pardon if I missed this)

analysis of our travels and meta-information of who is traveling can provide more information about how we can use it for social good

This is an odd statement the way it is currently worded. I think you mean "By being able to understand the movement patterns of scientists we can more effectively target efforts to link scientists with socially beneficial activities and organizations."

Overall approve with some considerations for the above statement of additional details about the analysis intended and descriptions of intended products of the research that follows a more definable structure than “provide more information”.

Hi,
thanks for the review and suggestions, @wolfgang8741. I made various changes and tried to answer you and explain some things in more detail. Let me know if something still needs to be clarified.

In general, one of the biggest interests in collecting and studying this data is that we can not only study data about the mobility of masses of people (as has been done in the past by big telecom companies etc.), but also look at patterns of particular mobility, e.g. how researchers or educators are traveling around the world.

Why these categories? First, because in general, people in these professions are responsible for knowledge transfer (conference participation, workshops for doctors etc.), and it would be very important to learn how they are traveling and what we could do together to use this knowledge, which belongs to society.

The added possible benefit of preprocessing of timelines would be offering turnkey data for an analysis at new resolutions. Also offering limits to the needs of projects, ie what timeframe is necessary for sharing (last week, last month, a user’s entire digital life?).

Indeed, I can clarify the timeframe: usually it starts with one week of stay and is unlimited. But information about any stay in the place is relevant, in general.
Another important point is how to collect the information about places, where people stayed and had free time (which could be later used for social engagement or citizen science projects, which is essentially one of the goals of the project).

Since this project is only interested in scientists, what metadata would be necessary to determine the start date of an analysis? For example, even though the data doesn’t go far enough back for everyone, some scientists are young enough that the data may go back to before they were scientists. I don’t see in the survey or other collection how that may be determined in an analysis, but that is just something to think about:…

The problem here is that there is no real basis (except Easychair data, which is not open as far as I know; maybe @madprime or @gedankenstuecke know some other database like this). I could indeed suggest submitting Easychair data additionally, but this may be a bit difficult since not every conference uses Easychair.

How does one judge the “potential of the untapped” in this project?

There are many ways. For example, there is one project, which is mapping places around the world, where people try to identify places needed for volunteering e.g. https://www.hotosm.org/ and this may be useful for identifying untapped opportunities for visits by people.

@wolfgang8741 I also adjusted the question in the questionnaire:
“What are the countries are you working with? (but maybe have not visited yet)”

Where is the ORCID collected? (pardon if I missed this)

We collect the Openhumans ID, and if the person participates in the questionnaire and leaves information about his/her research institute, we deduce that the person is a scientist. This is a good point; I also included ORCID.

I also changed the google form for questionnaires to make it more precise with countries (not continents you visited). Thank you!

analysis of our travels and meta-information of who is traveling can provide more information about how we can use it for social good

Yes, I am indeed going to change it, since this wording has been misused recently and can therefore be understood in too many different ways. I changed it, thank you for the comment!!
:seedling:

@liubovv Sorry for the reply delay - response was buried in my inbox. The changes are moving this forward; it’s looking much clearer. Most of this is comments on the survey revisions to ensure you’ll get the data you need and to clarify previous points.

Yes, and what I was also trying to get at here is this: given that you’re receiving the entire Google Location history, how do you know when a person is classified as a “scientist”? Is it the date they earned a degree, or the date they entered a job or role as a “scientist”? This was a prompt to see if you have a measure for cutting off part of the location data timeline for the analysis. An analysis of the entire timeline would in some cases include more than just the “scientific” career, since those just earning their degrees may have a longer history on Google than their scientific career. It would be good to think through the possible analyses to decide what data is needed. While the ORCID is being collected, an ORCID record is not required to include education or other information relevant to this threshold, and it would be easier to identify the threshold up front than to have to get it in the future. This is about how you are scoping the location analysis to exclude irrelevant times or, if you’re planning a full-timeline analysis, being transparent about why the entire timeline is being used.

Additional information that might be helpful in looking at travel could come from asking “What conferences and workshops do you attend?”, so the answers can be compared to locations traveled, which may allow prediction of travels and identification of possible connections. Some of these can be inferred from publication, presentation, and other history, but explicit statements might include those not included in publications. This would also allow for identifying the geographic presence of conferences and potential future locations that may be visited, as well as exposing bias in the responses toward certain researcher communities.

What I was trying to get at with the prompt is that question 1, “What is the potential of the untapped connectedness and the connectivity of traveling researchers in the world?”, is overly broad, and it is hard to say it has been answered since we don’t know if we reached the potential (maybe the question just needs wording clarification). Did you mean “What is the potential of the untapped connectedness of traveling researchers in the world?” Even with this revision this is more of an exploratory statement, which may be stated as “What opportunities exist for new connectedness and knowledge sharing in the global traveling behavior of researchers?”, in which you’re trying to identify the opportunities based on the connectedness instead of measuring the potential.

Comments based on this new survey revision:

How can we benefit from our travels? (analysis of depersonalised information about our travels and meta-information of who is traveling can provide more information about how we can use it for social good)

Who is “we” referring to in the above question? I would reword to state who “we” is, or restate the question. I think “we” is referring to scientists, but is this the case? I may be over-interpreting.

Authorization for adding data.
This project plans to add data to your Open Humans account, described below.

Not all persons will have an ORCID thus you may want to state “if provided” for the ORCID in the returned data. For those who are Openhumans members but don’t want to disclose their identifier I wonder if Openhumans has a way to provide different datasets. One with the ID and one without and tag

Example of the dataset collected: travel cityA → cityB of a scientist (metadata: ORCID of a scientist who travels).

Errors with the Google Form:
Currently the question wording needs adjustment and the selection type needs fixing: “What are countries you visited for CONFERENCES this year?” -> “Which countries have you visited for CONFERENCES this year?”
This question currently uses a radio-button selection, which is limited to one answer, but the way the question is stated suggests it should use multi-select checkboxes.

Currently the question wording needs clarification and the selection type needs fixing: “What are countries you have been with PRIVATE VISITS this year?” -> “Which countries have you visited for PRIVATE VISITS in the past year?” Also, I’m not sure what private visits are. Do you mean personal travel, non-conference travel, invited talks, or something else?

Given both of the above questions are interested in the past year are you using the submission timestamp as the estimate of the past year on a rolling basis for the analysis or are you interested in a common time period between all respondents or is the intention to collect those in 2018? Did you mean “in the past 12 months”?

Currently the question wording needs clarification so that the desired format is clear and the responses provide the data you’re looking for. It reads as double-barreled: it asks what countries someone works with, which would seek a comprehensive list of countries with working ties, while the following parenthesis suggests that only those not yet traveled to should be listed. Is the intention with this question to collect all possible travel destinations, or to gather those not visited but which are potential destinations? This is referencing: “What are the countries are you working with? (but maybe have not visited yet)” -> I think you’re trying to ask "Which countries are you working with

  • Not sure if this clarifies what you’re looking for. What I think you’re asking is which countries someone has a working relationship with, regardless of travel, which I would state as: “Which countries do you have a working relationship with regardless of having traveled there?” This still leaves “working relationship” open to interpretation. Is this a working relationship with the community, a study site, a co-author/institution, or other? Also, in asking this question, do you want to know which countries they worked with in the past year (where the work may have ended before taking the survey), some other timeframe, or just currently? It might also be of benefit to specify a format (separate countries with a comma) or other delimiter.

You may want to use validation for the ORCID response and at very least add a suggested format in the question description (ie https://orcid.org/0000-0003-0668-0089 or 0000-0003-0668-0089)
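As an illustration of the validation suggested here, the sketch below accepts both formats shown above and also verifies the final check character, which ORCID defines using the ISO 7064 MOD 11-2 algorithm. The names `ORCID_RE`, `orcid_check_digit`, and `is_valid_orcid` are hypothetical helpers for post-processing responses; inside the form itself, only the regular expression part would apply.

```python
import re

# Accepts a bare iD or the full https://orcid.org/ URL form.
ORCID_RE = re.compile(r"^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_digit(base15):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(text):
    """Return True if text is a well-formed ORCID iD with a correct checksum."""
    match = ORCID_RE.match(text.strip())
    if not match:
        return False
    digits = match.group(1).replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    for candidate in (
        "https://orcid.org/0000-0003-0668-0089",
        "0000-0003-0668-0089",
        "0000-0003-0668-0080",  # wrong check digit
        "0000-0003-0668",       # too short
    ):
        print(candidate, is_valid_orcid(candidate))
```

The pattern alone catches formatting mistakes; the checksum additionally catches most single-digit typos, which a format check cannot detect.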

The travel questions are not required and it may be unclear if people miss the second question by scrolling too fast. You may want to split the lists of country questions to new pages or make them required and add the option “I have not traveled to any of the listed countries” if you think your list is comprehensive enough.

If this hasn’t been pilot tested, it may be worth testing the survey questions to see if they get the responses you’re looking for.

Hope these help, I’ll keep an eye out for a response for any timely reply necessary. (I’m still for an approve, but hoping to help produce the necessary data for the project to produce the most useful data for analysis).

Hi, @wolfgang8741
Thanks for the comments. I will reply to each of them separately, since this can be easier to read :wink:

So, for now we assumed that the survey is taken by people who are actually involved in scientific work.
We do not need a more precise date of “when a person became a scientist”, since this sounds a bit strange, right :)? But what we do care about is the trajectory of a person who is related to scientific knowledge production. Why?
First, because the person can be involved in a master’s, then a PhD, then a postdoc… etc., and all of this qualifies as a person involved in the knowledge dissemination process.
Second, the initial idea was to explore the trajectories of such people (and not just scientists), since this could give more ideas about how the knowledge exchange process could work in general.

So, to summarize the answer to your question, we do not need here information about the trajectory of a scientist, but we would like to understand the trajectory of a person, who is related to scientific knowledge flow around the world. I also tried to make it clear in project description. Thank you!

Tnx for the comment.
As we already discussed with @gedankenstuecke and @veg, some such studies analyzing publications have been done.
In fact we already cited them in the project and in our publication about multilayer network analysis for LeWiBo project [1,2].
So, what we could analyze here is complementary information to what we can get from analysis of publications from conferences proceedings.
I like your idea, @wolfgang8741, to ask about the conferences a person visited; I should also include it in the Google form as an optional question. This can be additional valuable information. Tnx!

[1] A. Mannocci et al. http://oro.open.ac.uk/id/eprint/54310 (2018)

[2] L.Tupikina, D. Zemp et al. NetSci Proceed. (2018)


Great, I just changed the form by adding additional options, and I am also adding regular expressions to validate the ORCID entries properly. Tnx a lot for the comment!
