Big Data Analytics: Resit Assignment 2 – Asking good questions

Neha Gupta

Data Science Lab, Behavioural Science, Warwick Business School, The University of Warwick

Teaching materials designed by Prof. Suzy Moat and Prof. Tobias Preis

This coursework is worth 20% of your final marks for this course.

It is due at 12 noon on the date set by the programme team, along with Resit Assignment 3. Please contact the programme team to confirm the due date.

In this course, we have been showing you how our everyday interactions with technology are creating huge amounts of data capturing human behaviour worldwide. We have begun to outline how this sort of data can help us measure what is happening in the world, and even make better predictions about how people might behave in the future.

Businesses and public organisations are increasingly becoming aware that these sorts of insights can help support their decision making. For scientists, this data is also fascinating, providing measurements of human behaviour at a speed and scale which was previously impossible.

To gain value from these data sources, you need knowledge of both programming and statistics. However, this is not enough! These datasets also present a dangerous trap – an infinite universe of uninteresting questions, to which you can present answers that are quantitatively correct, but which nobody cares about. A lack of understanding of what data is actually available and what the limitations of these datasets are can also lead to very time-consuming attempts to answer inappropriate and impossible questions.

In this exercise, you will draw on your business school education and the insights you have gained from this course to try to identify a good question for your final Big Data Analytics project: one that illustrates how online data can provide insights into human behaviour in the offline world. You should start by considering what data is available, and try to develop a question that strikes a good compromise between being interesting and being feasible to answer.

We are, of course, not looking for perfection, but we want to give you a chance to demonstrate your awareness of both what can be done and what is valuable, and to use this to design an interesting project with which you can demonstrate your newly developed data science skills. Try brainstorming some ideas, and ask of each idea: can I find a question that is just as easy to answer, but even more interesting? Can I find a question which is just as interesting, but even easier to answer?

We do want you to make use of the new sources of data that are becoming available, as well as your new knowledge of R. Your question should therefore involve one of the following kinds of online data: Google Trends data, data on Wikipedia page views, or data retrieved from the Flickr API. Your question needs to link this online data with another source of data which reflects human behaviour in the offline world: for example, financial data, national statistics, or any other data source you find interesting. See the "Data source suggestions" section below for some places where you can find various kinds of economic and social indicators.

For assessment, please submit a PDF providing answers to the questions below.

You should keep your answers to under 2 pages of A4, with margins of 2.54 cm and a font size of 11pt. Read through all the questions first, as answering some of the later questions might make you realise you should modify your answers to earlier questions.

RESIT: Note that you cannot use the question that you originally submitted for Assignment 2 or Assignment 3.

  1. What is the question you wish to ask? (2%)
In one sentence, state the question you wish to ask.
We will give you marks for stating an interesting question clearly.
  2. What would your dream result be? (2%)
Imagine you have finished your analysis, and you have found the best result you could dream of.
Describe your dream result in two simple sentences that a member of the general public would
understand.
Often, if your question is not interesting enough, it becomes difficult to summarise the findings in so
few words – so we want to make sure you can!
  3. What reason do you have to believe you might find this result? (2%)
In two to three sentences, explain why you might expect to find this result.
In your final project, you will not lose marks if you do not find a significant result (nor gain marks if you
do). However, you do need to provide a good explanation as to why you might expect to find the
result you describe – otherwise it wouldn’t be worth your time looking into this idea.
  4. What data will you use? (2%)
Explain what Google Trends, Wikipedia page view or Flickr data you will use.
For your other data source (the one that is not from Google Trends, Wikipedia or Flickr), provide a
link to the data. Please ensure that this link will take us directly to the data when we click on it. If
there is a very good technical reason why you cannot do this, explain what this reason is and clearly
describe the source of the data.
  5. How will you read the data into R for analysis? (2%)
Outline what steps would be required to read both your online data and other data into R and carry
out any pre-processing required before your statistical analysis. We are looking for an understanding
of the basic steps you would need to carry out; you do not need to provide code.
  6. What statistical method will you use to analyse the data? (2%)
Describe the statistical approach you will use to answer your question. Describe any assumptions of
this analysis.
Make sure that the approach you describe is capable of delivering an answer to your question in line
with the dream result you described in question 2.
  7. Which R functions will you use to carry out the statistical analysis? (2%)
Name the R functions that will allow you to carry out the statistical analysis (not the data
pre-processing), and which will allow you to check any assumptions of your statistical analysis. If these R
functions have not been used in the course, specify the R package that they are in.
  8. Describe the form of the data. Do you have enough data? (2%)
Is the data daily, weekly, monthly, something else? How many data points will you be able to analyse?
Given the statistical approach you have described, is this sample size large enough to give you a
chance of uncovering a significant result? (If not, you need to rethink your question!)
  9. How would you describe your dream result to a professional audience? (4%)
Imagine you have finished your analysis and you have found your dream result. You have been asked
to write an executive summary of your results for a professional audience. What would you write?
Give some background motivation to your question, briefly describe your finding, and then indicate
what this finding might mean. Keep your summary under 125 words.

Further guidance on developing your question

It is likely that your question will fall into one of three categories. These are as follows:

  • Nowcasting offline behaviour with online data. Tip: see the examples of nowcasting that we covered in the lectures, in particular in Week 3.
  • Predicting offline behaviour with online data. Tip: be careful to ensure that the statistics you are proposing will genuinely allow you to make predictions.
  • Measuring offline behaviour with online data, where the offline behaviour was previously difficult or impossible to measure. Tip: these can be tricky questions to set up correctly. Be careful to ensure that it really makes sense to use Google Trends, Wikipedia page views or Flickr data to measure the offline behaviour you are interested in. First, check that there is not a more obvious offline alternative that you could use. (If there is, you probably need to think of another question, as you do need to propose a question using Google Trends, Wikipedia page view or Flickr data.) Second, check that you can make a convincing argument that the measurements produced with online data are likely to be valuable, and not either too noisy or too biased in some way.

Finding inspiration for your question

In the course, we cover a number of examples of analyses using online data to provide insight into human behaviour in the real world.

Here is another example from the Bank of England, where they use data on Google searches to nowcast unemployment rates and house prices:

Using internet search data as economic indicators

Nick McLaren and Rachana Shanbhogue

https://www.bankofengland.co.uk/-/media/boe/files/quarterly-bulletin/2011/using-internet-search-data-as-economic-indicators.pdf

You might also be interested in this paper looking into the relationship between Bitcoin prices and Google Trends and Wikipedia page view data:

BitCoin meets Google Trends and Wikipedia: Quantifying the relationship between phenomena of the Internet era

Ladislav Kristoufek

https://www.nature.com/articles/srep

You can find more examples on the course reading list:

https://rl.talis.com/3/warwick/lists/E1DCBC50-A278-379F-F255-3D29A326D2AB.html

To ensure that the question you propose is interesting, note that it should not simply replicate one of the analyses covered in the course (or it will be difficult to argue for the value of your analysis). If you are keen to propose an analysis very similar to one of these, make sure your answer to question 2 makes clear what insights your dream result would provide beyond what we already know from the previous analysis.

Data source suggestions

One of the challenges you have to address in this assignment is to find a source of offline data you can use in your project. There are many sources of fascinating real world data available online. Here are a few suggestions – but feel free to find others!

Open administrative data – UK

  • data.gov.uk: UK Government project to make non-personal UK government data available as open data. http://data.gov.uk/
  • London Datastore: official site providing free access to a number of datasets from the Greater London Authority. http://data.london.gov.uk/
  • Office for National Statistics: access to economic and social data for the UK, including Census data. https://www.ons.gov.uk/

Open administrative data – USA

  • US Census data: data on economy, population and society at national and local level. Summaries and detailed data releases are published free of charge. http://www.census.gov/
  • data.gov: official U.S. government site providing increased public access to federal government datasets. http://www.data.gov/
  • NYC Open Data: makes the wealth of public data generated by various New York City agencies and other City organizations available for public use. https://nycopendata.socrata.com/

Open administrative data – world and Europe

  • World Bank Open Data: free and open access to data about development in countries around the globe. http://data.worldbank.org/
  • CIA World Factbook: U.S. government profiles of countries and territories around the world. Information on geography, people, government, transportation, economy, communications... https://www.cia.gov/library/publications/the-world-factbook/
  • Eurostat: detailed statistics on the EU and candidate countries. http://epp.eurostat.ec.europa.eu/

WBS Plagiarism Policy

Please ensure that any work you submit for assessment is correctly referenced. WBS expects all students to demonstrate the highest standards of academic integrity at all times, and treats all cases of poor academic practice and suspected plagiarism very seriously. You can find information on these matters on my.wbs, in your student handbook and on the University’s library web pages:

https://warwick.ac.uk/services/library/students/referencing

The University’s Regulation 11 (see link below) clarifies that “...’cheating’ means an attempt to benefit oneself or another by deceit or fraud. This includes reproducing one’s own work...” It is important to note that it is not permissible to reuse work which has already been submitted by you for credit either at WBS or at another institution (unless you have been explicitly told that you can do so). This is considered self-plagiarism and could result in significant mark reductions.

Upon submission of assignments, students will be asked to agree to one of the following declarations:

Individual work submissions:

"I declare that this work is entirely my own in accordance with the University's Regulation 11 and
the WBS guidelines on plagiarism and collusion. All external references and sources are clearly
acknowledged and identified within the contents. No substantial part(s) of the work submitted
here has also been submitted by me in other assessments for accredited courses of study, and I
acknowledge that if this has been done it may result in me being reported for self-plagiarism and
an appropriate reduction in marks may be made when marking this piece of work.”

Group work submissions:

"I declare that this work is being submitted on behalf of my group, in accordance with the
University's Regulation 11 and the WBS guidelines on plagiarism and collusion. All external
references and sources are clearly acknowledged and identified within the contents. No
substantial part(s) of the work submitted here has also been submitted in other assessments for
accredited courses of study and if this has been done it may result in us being reported for self-
plagiarism and an appropriate reduction in marks may be made when marking this piece of work."

By agreeing to these declarations you are acknowledging that you have understood the rules about plagiarism and self-plagiarism and have taken all possible steps to ensure that your work complies with the requirements of WBS and the University.

You should only indicate your agreement with the relevant statement once you have satisfied yourself that you have fully understood its implications. If you are in any doubt, you must consult with the NIE of the relevant module, because once you have indicated your agreement it will not be possible to later claim that you were unaware of these requirements in the event that your work is subsequently found to be problematic in respect to suspected plagiarism or self-plagiarism.

Regulation 11: http://www2.warwick.ac.uk/services/gov/calendar/section2/regulations/cheating