Approval for Researchers mobility data analyzer

Hi, Mad,
thank you for the clarification. I corrected the information in the project description and added the following there:

‘Geolocation data is potentially sensitive, as it could theoretically be used to infer the addresses of your home and workplace, the addresses of any other locations you visited, and the times you visited those places. We state here that the data are not used to estimate personal information about the cities a person visits most. The project aims to collect information about the general travel trends of people (in particular, researchers).’

Let me know if this is not enough and some other information is needed.
I also had some ideas about adding notebooks to the project description to demonstrate how I will analyze the data, but maybe this is too advanced and I can do it later, separately, by sharing the notebooks with the Open Humans community.
Thank you!

That seems good to me, @liubovv, thanks! I think approval is reasonable.

Any votes from others, e.g. @beau? :slight_smile:


Responding to @bastian’s request to review. I think the project is reasonable and the dialog here has been productive. My vote is to approve. (@beau’s advice about the sensitive data warning, i.e. what is being asked for and what risks it poses, seems important to me, and I'm glad this was addressed.)

Alright, it took a while to catch up on the state of the conversation, but here are my thoughts. Overall I am in favor of approving the project, with minor refinement of the details in the “analysis” statements. These could more specifically reflect the information in the project proposal regarding the resolution of the information of interest, what kind of analysis might be expected, and what outputs are expected that justify collecting the data.

  1. @madprime Regarding the sharing of fine-grained GPS data from the Google Location upload: this highlights that a useful enhancement would be a data processor on the raw upload that offers levels of granularity for apps. It isn’t a solution for now, but it is a possible modification of the Open Humans Google Location project so that an app or study gets access only to the level of granularity it needs. An added benefit of preprocessing timelines would be offering turnkey data for analysis at new resolutions. It could also limit data to the needs of a project, i.e. what timeframe is necessary for sharing (last week, last month, a user’s entire digital life?).

Since this project is only interested in scientists, what metadata would be needed to determine, for example, the start date of an analysis? Even though the data doesn’t go back far enough for everyone, some scientists are young enough that their data may go back to before they became scientists. I don’t see how that could be determined from the survey or other collected information, but that is just something to think about.

  2. Regarding the overall project: the survey is currently a bit odd in that it either combines Asia and Australia in the list of locations you traveled to privately and for conferences, or it is missing that continent as an option. I guess Antarctica should be included too, to cover all continents, though “Other” also works for these.

  3. The details of the analysis and product are very vague, which leads me to ask why this data should be shared and what specifically is going to be done with it. In many places “analysis” is used where a more specific description of what is being looked at could be given. Even though this is not a study, some bounds or a description of the types of analysis of interest could be specified, to give some detail on what data is actually being worked with and at what resolution. For example, “local” is a relative term, and the GPS data could be much finer than neighborhood-, city-, region-, or country-level processing requires. What level does the project commit to as the limit of its analysis resolution?

What would the “general results” of one of the “analyses” look like, and what “additional information” is expected from this dataset, so we can identify what analysis is or may be done before sharing? I understand this review is not a study, but there should be some description of the types of analysis that will be done, including the intended resolution, before a user consents to sharing the data. In the proposal the project states interest in city-level analysis, but the project join page mentions that access to fine-grained GPS data is granted without stating that the project’s interest is only at the city level. Please add the level to which the analysis intends to limit the GPS resolution.
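To make the resolution question concrete, here is a minimal sketch of how a data processor could coarsen raw points before a project sees them. It assumes points are stored as integer E7 coordinates in fields named `latitudeE7`/`longitudeE7` (as in older Google Takeout location exports; the actual upload format should be checked), and it snaps points to a rounded grid, which is only a crude stand-in for real city-level geocoding. The helper names and precision levels are hypothetical, not part of any Open Humans API.

```python
from __future__ import annotations

from typing import Iterable


def coarsen_point(lat_e7: int, lon_e7: int, decimals: int = 1) -> tuple[float, float]:
    """Round an E7-encoded coordinate (degrees * 10^7) to a coarse grid.

    A step of 0.1 degree of latitude is roughly 11 km, so decimals=1 gives
    cells on the order of a city; decimals=0 (about 111 km) is closer to
    region level. Longitude cells shrink toward the poles.
    """
    lat = lat_e7 / 1e7
    lon = lon_e7 / 1e7
    return (round(lat, decimals), round(lon, decimals))


def coarsen_records(records: Iterable[dict], decimals: int = 1) -> list[tuple[float, float]]:
    """Return only coarsened coordinates, dropping the precise values.

    Assumes each record has integer 'latitudeE7' and 'longitudeE7' fields.
    """
    return [coarsen_point(r["latitudeE7"], r["longitudeE7"], decimals) for r in records]


if __name__ == "__main__":
    berlin = {"latitudeE7": 525200660, "longitudeE7": 134049540}
    print(coarsen_records([berlin]))  # [(52.5, 13.4)]
```

Agreeing on a fixed `decimals` value up front would be one way to state "the limit to the resolution of analysis" explicitly on the join page.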

past and upcoming travels from cityA -> cityB of a scientist
metadata: scientific field.

Additionally, the project proposal draft stated that it needed an ORCID and planned to look at the frequency of visits to a location. “Analysis of the frequency of trips to conference destinations” is a more useful specification of what analysis may be done than just “analysis”.

On a more nitpicky side:

  1. How does one judge the “potential of the untapped” in this project?
  2. Where is the ORCID collected? (pardon if I missed this)

analysis of our travels and meta-information of who is traveling can provide more information about how we can use it for social good

This statement reads oddly as currently worded. I think you mean: “By being able to understand the movement patterns of scientists, we can more effectively target efforts to link scientists with socially beneficial activities and organizations.”

Overall: approve, with the considerations above about adding details on the intended analysis and describing the intended research products in a more definable way than “provide more information”.

Hi,
thanks for the review and suggestions, @wolfgang8741. I made various changes and tried to answer you and explain some things in more detail. Let me know if anything still needs clarification.

In general, one of the biggest motivations for collecting and studying this data is that we can study not just the mobility of large masses of people (as has been done in the past by big telecom companies etc.), but also the mobility patterns of particular groups, e.g. how researchers or educators travel around the world.

Why these categories? First, because people in these professions are generally responsible for knowledge transfer (conference participation, workshops for doctors, etc.), and it is very important to learn how they travel and what we could do together to use this knowledge, which belongs to society.

An added benefit of preprocessing timelines would be offering turnkey data for analysis at new resolutions. It could also limit data to the needs of a project, i.e. what timeframe is necessary for sharing (last week, last month, a user’s entire digital life?).

Indeed, I can clarify the timeframe: usually it starts with a one-week stay and is unlimited. But in general, information about any stay in a place is relevant.
Another important point is how to collect information about places where people stayed and had free time (which could later be used for social engagement or citizen science projects, essentially one of the goals of the project).

Since this project is only interested in scientists, what metadata would be needed to determine, for example, the start date of an analysis? Even though the data doesn’t go back far enough for everyone, some scientists are young enough that their data may go back to before they became scientists. I don’t see how that could be determined from the survey or other collected information, but that is just something to think about.

The problem here is that there is no real data source for this (except EasyChair data, which is not open as far as I know; maybe @madprime or @gedankenstuecke know of other databases like this). I could indeed suggest that participants additionally submit EasyChair data, but this may be difficult, since not every conference uses EasyChair.

How does one judge the “potential of the untapped” in this project?

There are many ways. For example, one project (https://www.hotosm.org/) maps places around the world, with people identifying places where volunteering is needed; this may be useful for identifying untapped opportunities for visits.

@wolfgang8741 I also adjusted the question in the questionnaire:
“What are the countries are you working with? (but maybe have not visited yet)”

Where is the ORCID collected? (pardon if I missed this)

We collect the Open Humans ID, and if a person fills in the questionnaire and provides information about their research institute, we deduce that the person is a scientist. This is a good point; I have also added ORCID.

I also changed the Google Form questionnaire to make it more precise, asking about the countries (rather than continents) you visited. Thank you!

analysis of our travels and meta-information of who is traveling can provide more information about how we can use it for social good

Yes, I am indeed going to change it, since this phrase has been misused recently and can therefore be understood in too many different ways. I changed it, thank you for the comment!!
:seedling:

@liubovv Sorry for the delayed reply; the response was buried in my inbox. The changes are moving this forward, and it’s looking much clearer. Most of this post is comments on the survey revisions, to ensure you’ll get the data you need, plus clarifications of previous points.

Yes, and what I was also trying to get at here is: given that participants share their entire Google Location History, how do you know when a person counts as a “scientist”? Is it a date after they earned a degree, or the date they entered a job or role as a “scientist”? This was a prompt to see whether you have a measure for cutting off the part of the location timeline used in the analysis. An entire timeline would, in some cases, include more than just the person’s “scientific” career; for example, those just earning their degrees may have a longer history on Google than their scientific career. It would be good to think through the possible analyses and what data they need. While the ORCID is being collected, an ORCID record is not required to include education or other information relevant to this threshold (where to stop analyzing the location history), and it would be easier to identify this threshold up front than to have to collect it in the future. This is about how you are scoping the location analysis to exclude irrelevant time periods or, if you’re planning a full-timeline analysis, being transparent about why the entire timeline is used.

One piece of additional information that might help in looking at travel is asking “What conferences and workshops do you attend?”, so the answers can be compared with the locations traveled, which may allow predicting travel and identifying possible connections. Some of this can be inferred from publication, presentation, and other history, but explicit answers might include events not reflected in publications. This would also allow identifying the geographic presence of conferences and potential future locations that may be visited, as well as exposing bias in the responses toward certain researcher communities.

What I was trying to get at with the prompt is that question 1, “What is the potential of the untapped connectedness and the connectivity of traveling researchers in the world?”, is overly broad, and it is hard to say whether it has been answered, since we don’t know whether we reached the potential (maybe the question just needs clearer wording). Did you mean “What is the potential of the untapped connectedness of traveling researchers in the world?” Even with this revision it is more of an exploratory statement, which might be stated as “What opportunities exist for new connectedness and knowledge sharing in the global traveling behavior of researchers?”, where you are trying to identify the opportunities based on the connectedness instead of measuring the potential.

Comments based on this new survey revision:

How can we benefit from our travels? (analysis of depersonalised information about our travels and meta-information of who is traveling can provide more information about how we can use it for social good)

Who does “we” refer to in the above question? I would reword it to state who “we” is, or restate the question. I think “we” refers to scientists, but is this the case? I may be overinterpreting.

Authorization for adding data.
This project plans to add data to your Open Humans account, described below.

Not everyone will have an ORCID, so you may want to state “if provided” for the ORCID in the returned data. For Open Humans members who don’t want to disclose their identifier, I wonder if Open Humans has a way to provide different datasets: one with the ID and one without, and tag

Example of the dataset collected: travel cityA → cityB of a scientist (metadata: ORCID of a scientist who travels).
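The returned-data example above (travel cityA → cityB, with the ORCID as metadata “if provided”) could be represented roughly as follows. This is a hypothetical sketch: the `Trip` type and `extract_trips` helper are illustrative names, and it assumes each participant’s timeline has already been reduced to an ordered list of city labels.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Trip:
    origin: str
    destination: str
    orcid: Optional[str] = None  # only set if the participant provided one


def extract_trips(cities: Sequence[str], orcid: Optional[str] = None) -> list[Trip]:
    """Turn an ordered sequence of city labels into cityA -> cityB trips.

    Consecutive repeats of the same city (several points during one stay)
    produce no trip, so only actual changes of city are counted.
    """
    trips: list[Trip] = []
    for prev, curr in zip(cities, cities[1:]):
        if prev != curr:
            trips.append(Trip(prev, curr, orcid))
    return trips


if __name__ == "__main__":
    for trip in extract_trips(["Berlin", "Berlin", "Paris", "Paris", "Berlin"]):
        print(trip.origin, "->", trip.destination)
```

Keeping `orcid` optional with a `None` default mirrors the suggestion to mark it “if provided”, so records without an ORCID remain valid.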

Errors with the Google Form:
Currently the question wording needs adjustment and the selection type needs fixing: “What are countries you visited for CONFERENCES this year?” -> “Which countries have you visited for CONFERENCES this year?”
This question currently uses a radio-button selection, which is limited to one answer, but the way the question is stated suggests it should use multi-select checkboxes.

Currently the question wording needs clarification and the selection type needs fixing: “What are countries you have been with PRIVATE VISITS this year?” -> “Which countries have you visited for PRIVATE VISITS in the past year?” Also, I’m not sure what private visits are. Do you mean personal travel, non-conference travel, invited talks, or something else?

Given that both of the above questions ask about the past year: are you using the submission timestamp to define the past year on a rolling basis for the analysis, are you interested in a common time period across all respondents, or is the intention to collect travel in 2018? Did you mean “in the past 12 months”?

The wording of the following question needs clarification on the desired format, so that responses provide the data you’re looking for. It reads as double-barreled: it asks which countries someone works with, which would call for a comprehensive list of countries with working ties, while the parenthetical suggests that only countries not yet traveled to should be listed. Is the intention of this question to collect all possible travel destinations, or only those not yet visited but potential destinations? This refers to: “What are the countries are you working with? (but maybe have not visited yet)” -> I think you’re trying to ask “Which countries are you working with…”

  • Not sure if this clarifies what you’re looking for: what I think you’re asking is which countries someone has a working relationship with, regardless of travel, which I would state as: “Which countries do you have a working relationship with, regardless of having traveled there?” This still leaves “working relationship” open to interpretation. Is this a working relationship with a community, a study site, a co-author/institution, or something else? Also, do you want to know which countries they worked with in the past year (including relationships that may have ended before taking the survey), some other timeframe, or just currently? It might also help to specify the format (e.g. separate countries with a comma or another delimiter).

You may want to use validation for the ORCID response and at the very least add a suggested format in the question description (i.e. https://orcid.org/0000-0003-0668-0089 or 0000-0003-0668-0089).
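As a rough illustration of such validation, the sketch below accepts both suggested formats and also verifies the final check character, which ORCID iDs compute with the ISO 7064 MOD 11-2 algorithm. A form’s regular-expression validation can typically only apply the pattern part; the checksum step would have to run afterwards on the exported responses. The function names are hypothetical.

```python
import re

# Accepts "0000-0003-0668-0089" or "https://orcid.org/0000-0003-0668-0089".
# The last character may be "X", which stands for a check value of 10.
ORCID_PATTERN = re.compile(r"^(?:https://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """True if value matches one of the accepted formats and its check character is correct."""
    match = ORCID_PATTERN.match(value.strip())
    if not match:
        return False
    digits = match.group(1).replace("-", "")  # 16 characters: 15 digits + check
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    print(is_valid_orcid("https://orcid.org/0000-0003-0668-0089"))  # True
    print(is_valid_orcid("0000-0003-0668-0088"))  # False: wrong check digit
```

The checksum catches single-digit typos that a format-only pattern would let through.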

The travel questions are not required, so it may be unclear whether people skipped the second question by scrolling too fast. You may want to put the country-list questions on separate pages, or make them required and add the option “I have not traveled to any of the listed countries” if you think your list is comprehensive enough.

If the survey hasn’t been pilot tested, it may be worth doing so: have you tested these questions to see whether they get the responses you’re looking for?

Hope these help; I’ll keep an eye out for a response in case a timely reply is needed. (I’m still in favor of approval, but hoping to help the project collect the data it needs for the most useful analysis.)

Hi, @wolfgang8741
Thanks for the comments. I’ll reply to each of them separately, since this may be easier to read :wink:

So, for now we assume that the survey is taken by people who are actually involved in scientific work.
We do not need a precise date of “when a person became a scientist”, since that sounds a bit strange, right :)? What we do care about is the trajectory of a person who is involved in scientific knowledge production. Why?
First, because a person can go through a master’s, then a PhD, then a postdoc, etc., and all of this qualifies as involvement in the knowledge dissemination process.
Second, the initial idea was to explore the trajectories of such people (not just scientists), since this could give more ideas about how the knowledge exchange process works in general.

So, to summarize: we do not need information about the trajectory of a scientist as such; we would like to understand the trajectory of a person who is part of the flow of scientific knowledge around the world. I also tried to make this clear in the project description. Thank you!

Tnx for the comment.
As we already discussed with @gedankenstuecke and @veg, some studies analyzing publications have already been done.
In fact, we already cited them in the project and in our publication on multilayer network analysis for the LeWiBo project [1,2].
So, what we could analyze here is complementary to what we can get from analyzing publications in conference proceedings.
I like your idea, @wolfgang8741, of asking which conferences a person attended; I will include it in the Google Form as an optional question. This can be valuable additional information. Tnx!

[1] A. Mannocci et al., http://oro.open.ac.uk/id/eprint/54310 (2018)

[2] L. Tupikina, D. Zemp et al., NetSci Proceedings (2018)


Great, I just changed the form, adding additional options, and I am also adding regular-expression validation so that ORCIDs are collected properly. Tnx a lot for the comment!


Yes, exactly: this question is about long-term scientific connections, which can lead to future trips, since exchange trips are often based on them. (This is a complementary, additional question.)

I corrected the question in the google form. Thank you!

Coming back to this comment: I also added a question about when people started doing research (the year, since it may be hard to ask for a specific month).

Hi @liubovv, we may have had a miscommunication here: I was not suggesting you need to know a person’s trajectory; I used “when did you become a scientist” as an example of a possible threshold. My interest was in identifying the lower bound of the analysis and how that point in time would be established from the information collected. I believe you addressed this threshold by adding the year, as you stated in the post below. :grinning:

My main concern was that Google Timelines go back at least to 2009 (mine does), so for some people they could include high school or other periods that I would not consider part of scientific knowledge production/dissemination and that would not be in scope for this project, based on what I’ve read.

I agree, having the year you started in academia in some way (maybe the year you started your PhD? Most undergrad/master’s students don’t travel much for conferences etc.) would be the best way to limit the data frame to the years relevant to this study.
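Once the PhD start year is collected, restricting a timeline to the relevant years is a simple filter. A minimal sketch, assuming each record carries a millisecond Unix timestamp in a `timestampMs` field (as in older Google Takeout location exports; newer exports may use a different field, so this is an assumption to check), and interpreting the start year as 1 January of that year in UTC. The helper name is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Iterable


def filter_from_year(records: Iterable[dict], start_year: int) -> list[dict]:
    """Keep only records at or after 1 January of start_year (UTC).

    Assumes each record has a 'timestampMs' field holding milliseconds since
    the Unix epoch, stored either as a string or as an integer.
    """
    cutoff = datetime(start_year, 1, 1, tzinfo=timezone.utc)
    cutoff_ms = int(cutoff.timestamp() * 1000)
    return [r for r in records if int(r["timestampMs"]) >= cutoff_ms]


if __name__ == "__main__":
    timeline = [
        {"timestampMs": "1243814400000"},  # 2009-06-01, before the cutoff
        {"timestampMs": "1425168000000"},  # 2015-03-01, kept
    ]
    print(filter_from_year(timeline, 2012))
```

Using the whole year as the boundary matches asking for a year rather than a specific month; a stricter cutoff would need a finer date in the survey.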


Yes, I agree, it is indeed better to ask about the year of the PhD (going to change that in the Google Form), @gedankenstuecke.
Thanks for the comment, @wolfgang8741.


Just to be clear, this should be “Year you started your PhD” in the survey. :slight_smile:

Yes, I already changed the Google Form accordingly. Thank you all for the comments.

Hey, everyone,
thanks again, all of you, for your feedback and ideas! :pray::brain:
We recently had a discussion with @wolfgang8741 and @gedankenstuecke,
and based on that I made several final changes to the project description.

I made the following changes to make the project clearer for people:

  1. I removed the link to the Google Form from the first page of the project description (https://www.openhumans.org/activity/mobility-data-of-researchers/),
    since we put the link to the Google Form on the detailed project description page, in the Post-sharing URL field.

  2. I edited the project’s long description and added a further description on the new project page
    https://sites.google.com/view/fellowshipresultsliubov/research-projects/mobility-of-researchers, which explains the project and research questions more clearly.

  3. On the project description page I also simplified the research question from
    “What is the potential of the untapped connectedness and the connectivity of traveling researchers in the world?”
    to “How can the study of global researcher mobility data lead to actionable positive impact (for example, organisation of outreach to remote communities or contribution to problem solving for local communities around the globe)?”

  4. I also added information to the project description about
    complementary analysis of ORCID data. With ORCID information one can also analyze publications (and hence the researcher’s past affiliations with universities).
    This can help to analyze the knowledge dissemination process over a wider time range.

  5. At the same time, since not every participant has an ORCID,
    I made the ORCID field in the Google Form optional.

  6. In the google form I changed the question about “the date of Phd start”:
    Which year you started your Phd? (which year you consider to start your Phd or equivalent research activity)

  7. I added a new section to the Google Form, so that it is clearer
    how to answer the questions and they are shown separately.

  8. I also explained the pronoun “We (as researchers and science community)” in the description of the project.

  9. I also want to thank Jonathan a lot; he spent a considerable amount of time revising the project to make it clearer.

Thanks again for the discussions.
Hopefully this answers all the questions we discussed over these weeks.


Agreed. I think a “meta” point is that it would be nice to capture all these thoughts better, regarding how location data, in particular, might be processed and shared in ways that provide more options for sharing. In many ways it’s a broad design/architecture/ecosystem question: we’d like the ecosystem (really broadly speaking) to understand and support the granularity people would like. (It’s hard: work to implement and maintain, potential liability in that, and also hard to fully anticipate desired uses. But there’s definitely a need/desire!)

Much thanks to @wolfgang8741 for what looks like further improvements to the project!

Do folks think this is a consensus for approval now?


I am ready to give this an unconditional approve.
