Big Data Analytics: Resit Assignment 3 – Your data science project

Neha Gupta

Adapted from original assignment created by Prof. Suzy Moat and Prof. Tobias Preis. Data Science Lab, Behavioural Science, Warwick Business School, The University of Warwick

1. Overview

This project is worth 80% of your final marks for this course.

SUBMISSION DEADLINE 20:00 11 AUGUST 2020

In this course, you have been learning how our everyday interactions with technology are creating huge amounts of data capturing human behaviour worldwide. You have learned how this sort of data can help data scientists measure what is going on in the world, and even make better predictions about how people might behave in the future.

In this final assignment, you are asked to pose an interesting question that can be answered using these new datasets and the data science skills you now possess. You then need to acquire the relevant data, process it into a form that you can analyse, carry out the statistical analysis, and produce relevant visualisations to illustrate your results. You also need to write up your results in a clear and engaging style.

Note that for the resit version of this assignment you cannot use the same question that you decided to use in your original (earlier submitted) assignment.

The aim of this project is for you to have an opportunity to apply your skills to a question that you are interested in, and at the same time, produce a document that you can use to demonstrate your skills to future employers. Good luck and have fun!

2. What to submit

i. Please submit your final write-up as a PDF.
ii. Please also submit any datasets you have used in your analysis, and the R code which you have written to process this data. You should provide clear comments in your code so that it is easy to understand what it does. Save your code in a script. Do not submit your R workspace, your command history or your RStudio project. You should also provide a PDF document explaining what data is contained within the dataset files. Combine all your files (R script, the supporting PDF document, data files) into a zip file for upload to my.wbs. If you expect your zip file to be larger than 20 MB, please speak to us about this at least one week in advance.

3. Further guidance

Your question

Your question should involve one of the following kinds of online data: Google Trends data, data on Wikipedia page views or data retrieved from the Flickr API. Your question needs to link this online data with another source of data which reflects human behaviour in the offline world: for example, financial data, national statistics, population data or any other data source you find interesting. In choosing your question, please refer to the guidance we provided for the “Asking good questions” assignment of this module.
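
Linking an online dataset to an offline one usually comes down to joining the two on a shared key such as a date. The following is a minimal sketch in base R, assuming two small, entirely hypothetical monthly datasets (the column names `search_index` and `visitors_thousands` and all values are made up for illustration). It also lists the months that fail to match, so that any dropped data can be explained in the write-up.

```r
# Hypothetical monthly online data (e.g. a search volume index).
online <- data.frame(
  month = c("2019-01", "2019-02", "2019-03", "2019-04"),
  search_index = c(62, 70, 55, 81)
)

# Hypothetical monthly offline data (e.g. an official statistic).
# Note that 2019-03 is missing and 2019-05 has no online counterpart.
offline <- data.frame(
  month = c("2019-01", "2019-02", "2019-04", "2019-05"),
  visitors_thousands = c(120, 135, 150, 160)
)

# Inner join: keep only the months present in both datasets.
combined <- merge(online, offline, by = "month")

# Record which online months were dropped by the join.
unmatched_online <- setdiff(online$month, offline$month)
```

An inner join is used here so that every row in `combined` has both measurements; whether dropping unmatched months is acceptable depends on the question being asked.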

Your analysis

Your aim is to carry out your analysis in a way that third parties could easily replicate it and verify your findings. You should therefore write your code in as clear a style as possible, with comments to help explain what your code does where necessary. You should also provide clear documentation of the data sets which you have used and which you submit: for example, what data is contained in each file and where this data was acquired or downloaded from.

To develop and demonstrate the skills you have acquired during this course, you should carry out your analysis in R.
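
One possible way to lay out a replicable script is sketched below. It is only an illustration: the file name `page_views.csv`, the folder layout and the data values are hypothetical, and a temporary folder stands in for the data folder you would submit alongside your script.

```r
# Fix the random seed so that any random steps give the same result on every run.
set.seed(2020)

# Use paths built with file.path() rather than absolute paths tied to one computer.
# tempdir() is a stand-in here for a "data" folder submitted with the script.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
raw_file <- file.path(data_dir, "page_views.csv")

# Hypothetical raw data, written out only so that this sketch runs on its own.
write.csv(
  data.frame(
    date = c("2020-01-01", "2020-01-02", "2020-01-03"),
    views = c(1530, 1612, 1498)
  ),
  raw_file,
  row.names = FALSE
)

# Step 1: load the raw data and convert columns to appropriate types.
page_views <- read.csv(raw_file, stringsAsFactors = FALSE)
page_views$date <- as.Date(page_views$date)

# Step 2: record the R version and loaded packages used for the analysis.
session_file <- file.path(data_dir, "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)
```

Keeping each processing step under a short comment, and saving the session information with the results, makes it easier for a third party to rerun the script and compare their output with yours.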

Your write-up

Remember that the goal of this assignment is to carry out the analysis required to evaluate an interesting question. As long as your question is well motivated, do not worry if your results do not turn out as you hoped. Just make sure that, in your write-up, you provide a clear motivation for your question; a clear argument for why one might expect the result you hypothesised to hold; a clear description of the analysis you carried out; and a clear evaluation of your findings, including why you may not have found what you expected. If there was a good reason to suppose you might find evidence for your hypothesis, it is also useful to discover that there is no evidence for it.

Your write-up should be no longer than 3,000 words, and should be structured as follows:

 Title
o Your title should convey the main thrust of your analysis and results, but crucially should also
catch the reader’s attention.
o Your title should have a maximum of 15 words (although good titles are normally shorter).

 Abstract
o In your abstract, you should briefly explain the problem your question is addressing, and the opportunity you have identified to address this problem.
o You should then clearly state what your question therefore is.
o You should give an overview of the analysis you are carrying out to address this question, and you should then explain the results of your analysis.
o Finally, you should describe the conclusions of your analysis. In other words, what do your results mean? What is the takeaway message from your analysis?
o Your abstract should be no longer than 150 words.
 Introduction
o The main goal of your introduction is to motivate your question and introduce your analysis.
o You should therefore provide enough background to make the value of your analysis clear. Who does the problem you are addressing affect?
o You should cite between 5 and 10 scientific papers that are related to your analysis. (For example, there are a number of papers on the course reading list that explore the relationship between online data and offline behaviour, or that give a broader background to analyses of human behaviour with big data. We have discussed a number of them in the lectures.)
o You should then clearly explain what your analysis sets out to do. What is your question? What do you expect to find? Why do you expect to find this?
o You may wish to give an initial indication of the results you uncover, but this is a stylistic decision.
o There is no word limit for your introduction, but make sure your writing style is concise.
 Methods and results
o In the methods and results section, you should very clearly explain what analysis steps you carried out, and what the results were.
o As a guide to the level of detail required, you should include enough information in this section to enable someone else to reproduce your analysis without access to your code or the data you downloaded.
o To achieve this, you should make the source of your data clear, including providing references for websites from which you have downloaded the data. You should also clearly describe any calculations you carried out on the raw data you downloaded to reach your final results. You do not need to refer to the specific R functions that you used to do this, however.
o All statistical tests should be reported appropriately, including at least details of the sample size (or degrees of freedom), the value of the test statistic calculated and, where calculated, the p-value.
o You should also describe any assumptions of the analyses you carried out (e.g., should your data be normally distributed?) and show how you checked that these assumptions hold.
o You should provide at least two figures that visualise your findings. We will give you 20% of your marks for visualisation as detailed below.
o Figures should always have appropriately labelled axes, with the units of measurement specified. Legends should be provided to explain different colours or line types used, and font sizes should not be too small. As a guide, ensure that any text in your figures is at least as big as text used in the body of your assignment. Check that this is still the case when you have included the figure in your assignment. Make sure that your figure does not get stretched horizontally or vertically when you add it to your assignment.
o If appropriate, you can provide up to four figures. (Do not provide more than four figures.) You can also construct figures which contain multiple subfigures. However, only include important figures which help you tell your story. You need to be as concise with your figures as you are with your words.
o Under each figure, provide a caption which clearly outlines to the reader what data the figure shows, and what patterns the reader should note in the data. Each caption should be no longer than 350 words.
o To capture the attention of busy readers and to help them understand your analysis, you should produce figures and figure captions that convey the basic story of your analysis on their own.
o There is no word limit for your methods and results, but make sure your writing style is concise.
 Discussion
o The discussion should briefly summarise what you have done, and discuss what your findings mean.
o To make your document as accessible as possible to busy readers, it is a good idea to ensure that your discussion would make sense if the reader had not read the rest of the document.
o You may wish to begin by briefly summarising the motivation for your study once again. What is the problem you are addressing and what is the opportunity you have identified to address it? You can then restate your research question.
o Next, give a brief indication of the nature of your analyses and summarise what your analyses found.
o Indicate which answer to your research question your findings provide support for. Is this what you expected?
o Try to offer a potential explanation for your findings. If you have found the pattern you
expected, you may have already hinted towards this explanation in your introduction. If you
did not find what you expected, why do you think this is?
o It is not a problem if you are not sure why you found a particular pattern – simply suggest some
possible ideas. It is very important that you are careful not to overstate your case. In particular,
be aware that most investigations do not “prove” anything on their own, but you may have
found new strong or weak support for a given idea.
o Indicate what the implications of your investigation are. For example, have you highlighted a
new opportunity to use a certain dataset to measure or forecast a certain type of behaviour?
Have you provided evidence of an interesting behavioural pattern? Have you helped explain a
previously observed behavioural pattern? Have you provided evidence that a particular line of
enquiry may not be worth following further? What might people be able to do once they have
read your results that they might not have been able to do before?
 References
o You should provide full references for all papers you have cited. Please use the Harvard style
of referencing for this assignment. You can find more guidance here:
https://www2.warwick.ac.uk/services/library/students/referencing/referencing-styles
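
The requirements in the structure above to check assumptions and to report the sample size, test statistic and p-value could be met along the following lines in R. This is a sketch under stated assumptions: the data are simulated, the variable names are hypothetical, and the choice between a Pearson and a Spearman correlation based on a Shapiro-Wilk test at the 0.05 level is just one common convention, not a method prescribed by this assignment.

```r
set.seed(42)

# Simulated stand-ins for an online measure and an offline measure.
search_index <- rnorm(60, mean = 50, sd = 10)
offline_measure <- 0.5 * search_index + rnorm(60, sd = 8)
n <- length(search_index)

# Assumption check: Shapiro-Wilk test of normality for each variable.
normal_x <- shapiro.test(search_index)
normal_y <- shapiro.test(offline_measure)
both_normal <- normal_x$p.value > 0.05 && normal_y$p.value > 0.05

# Use Pearson's correlation if both variables look normal, otherwise Spearman's.
test_method <- if (both_normal) "pearson" else "spearman"
result <- cor.test(search_index, offline_measure, method = test_method)

# Build a report string with the estimate, test statistic, sample size and p-value.
p_text <- if (result$p.value < 0.001) {
  "p < 0.001"
} else {
  sprintf("p = %.3f", result$p.value)
}
report <- sprintf(
  "%s correlation: %s = %.2f, %s = %.2f, N = %d, %s",
  test_method,
  names(result$estimate), result$estimate,
  names(result$statistic), result$statistic,
  n, p_text
)
print(report)
```

Visual checks such as histograms or Q-Q plots are a reasonable complement to a formal normality test, particularly for larger samples.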

4. How marks will be allocated

You will receive marks for the following:

 Quality of question
o This area is worth 20% of your final mark for the module.
o You will be awarded marks for choosing a question which was interesting and feasible to
answer.
o You can emphasise how interesting your question is by stating your question clearly and
motivating it well in the abstract and introduction. Who would be interested in the answer, and
why? You may be able to provide more evidence of the value of your question in the discussion
as well.
o Again, if you have provided a good motivation for why your question was worth investigating
and why you believed you might find an interesting answer, do not worry if your results do not
turn out as you hoped.
o You can emphasise how feasible your question was to answer by completing an appropriate analysis in the methods and results, and crucially, not overstating your findings in the discussion. Your assignment as a whole needs to provide clear evidence that the question you proposed could be answered from the data you identified and the analysis methods you chose, without a leap of faith.
 Quality of analysis
o This area is worth 20% of your final mark for the module.
o You will be awarded marks for choosing an analysis method appropriate for answering your question; verifying that assumptions made by this analysis method hold (e.g., should your data be normally distributed?); carrying out the analysis correctly; and correctly interpreting the results of the analysis.
o You will also be assessed on whether you have motivated any pre-processing steps well (e.g., you have not left out half of your dataset without explaining why).
o Finally, you will be awarded marks for clearly documenting your code, and providing clear pointers to where the data you analyse can be obtained, in order to support replication of your study.
o You can make it easier for your analysis to be correctly assessed by providing a clear and concise description of your analysis in the methods and results.
 Quality of visualisation
o This area is worth 20% of your final mark for the module.
o Crucially, you should provide visualisations which tell the story of your analysis in a clear,
concise and engaging fashion.
o You will be awarded marks for choosing appropriate visualisations for your data and analysis.
Remember, you should only include the visualisations which help tell your story. Do not simply
include every possible visualisation you can think of. Make sure you include at least two figures
and no more than four.
o You will be awarded marks for providing legible visualisations, and labelling your visualisations
well (e.g., all axes are labelled, including units of measurement, legends are provided to
explain different colours or line types used, and font sizes are not too small).
o You will be awarded marks for creating an attractive visualisation. The base level of plots
generated by the ggplot2 library is good, but it will also allow you to change many different
aspects of your visualisation where you feel this is appropriate, from colours, to line thickness,
to font used, and more.
o For the purposes of this assignment, please make all changes to your figures by writing code
in R, apart from assembly of multi-panel figures which you can do in an external program (e.g.,
Word). You should not postprocess your figures in Adobe Illustrator or similar programs.
o You will also be awarded marks for good figure captions. Do your figure captions meet the
specification detailed in the structure above, describing the data shown in the figure and
highlighting the key patterns that readers should note in the data? Do your figures and figure
captions together successfully tell the main story of your analysis?
 Quality of written description
o This area is worth 20% of your final mark for the module.
o You should provide a clear, concise and engaging written description of your investigation.
o You will be awarded marks for using the structure described above and covering all the points
highlighted in the structure description.
o Within individual sections, you will be awarded marks for structuring your writing well, to make
your arguments and descriptions easy to follow.
o You will be awarded marks for the style of your writing. Is it clear, concise, and engaging? Have
you kept your sentences short where possible? Have you used correct grammar and
appropriate vocabulary? (Simple vocabulary is often easier to understand – do not use
complicated words for the sake of it.)
o You will be assessed on whether you have correctly observed conventions for reporting
statistical results, including formatting.
o Finally, you will be assessed on whether you have correctly integrated references into your
writing, and listed all references correctly at the end of your assignment. This will again include
the formatting of your references.
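
The labelling and styling points in the visualisation criteria above can be illustrated with a short ggplot2 sketch. It assumes the ggplot2 package is installed; the data are simulated and the variable names, axis labels, units and colours are hypothetical choices for illustration only.

```r
library(ggplot2)

set.seed(7)

# Simulated data: an online index and an offline measure for two years.
plot_data <- data.frame(
  search_index = runif(48, min = 20, max = 100),
  year = factor(rep(c(2018, 2019), each = 24))
)
plot_data$visitors_thousands <-
  5 + 0.3 * plot_data$search_index + rnorm(48, sd = 4)

fig <- ggplot(
  plot_data,
  aes(x = search_index, y = visitors_thousands, colour = year)
) +
  geom_point(size = 2) +
  # Explicit colours for each group shown in the legend.
  scale_colour_manual(values = c("2018" = "#1b9e77", "2019" = "#d95f02")) +
  # Axis labels state the units of measurement; the legend gets a title.
  labs(
    x = "Search volume index (0-100)",
    y = "Monthly visitors (thousands)",
    colour = "Year"
  ) +
  # base_size sets the font size used for text throughout the figure.
  theme_minimal(base_size = 12)
```

Saving the figure with `ggsave()` and explicit `width` and `height` values is one way to fix its proportions in code, which can help avoid stretching when the image is placed in the write-up.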

5. Final note

Please make sure you observe the WBS plagiarism guidelines to ensure you do not needlessly lose marks. You can see these in full below.

In particular, it is extremely important that you do not copy text from existing sources or your classmates. For this assignment, you are also strongly recommended to avoid including any quotes – this should not be necessary. Write everything in your own words and provide clear references where you refer to ideas and results you have read about elsewhere.

We have seen some great work and great questions on this course. We are looking forward to you submitting some excellent data science projects!

WBS Plagiarism Policy

Please ensure that any work submitted by you for assessment has been correctly referenced, as WBS expects all students to demonstrate the highest standards of academic integrity at all times and treats all cases of poor academic practice and suspected plagiarism very seriously. You can find information on these matters on my.wbs, in your student handbook and on the University’s library web pages: https://warwick.ac.uk/services/library/students/referencing

The University’s Regulation 11 (see link below) clarifies that “...‘cheating’ means an attempt to benefit oneself or another by deceit or fraud. This includes reproducing one’s own work...” It is important to note that it is not permissible to reuse work which has already been submitted by you for credit either at WBS or at another institution (unless you have been explicitly told that you can do so). This is considered self-plagiarism and could result in significant mark reductions.

Upon submission of assignments, students will be asked to agree to one of the following declarations:

Individual work submissions: “I declare that this work is entirely my own in accordance with the University’s Regulation 11 and the WBS guidelines on plagiarism and collusion. All external references and sources are clearly acknowledged and identified within the contents. No substantial part(s) of the work submitted here has also been submitted by me in other assessments for accredited courses of study, and I acknowledge that if this has been done it may result in me being reported for self-plagiarism and an appropriate reduction in marks may be made when marking this piece of work.”

Group work submissions: “I declare that this work is being submitted on behalf of my group, in accordance with the University’s Regulation 11 and the WBS guidelines on plagiarism and collusion. All external references and sources are clearly acknowledged and identified within the contents. No substantial part(s) of the work submitted here has also been submitted in other assessments for accredited courses of study and if this has been done it may result in us being reported for self-plagiarism and an appropriate reduction in marks may be made when marking this piece of work.”

By agreeing to these declarations, you are acknowledging that you have understood the rules about plagiarism and self-plagiarism and have taken all possible steps to ensure that your work complies with the requirements of WBS and the University. You should only indicate your agreement with the relevant statement once you have satisfied yourself that you have fully understood its implications. If you are in any doubt, you must consult with the NIE of the relevant module, because once you have indicated your agreement it will not be possible to later claim that you were unaware of these requirements in the event that your work is subsequently found to be problematic in respect to suspected plagiarism or self-plagiarism.

Regulation 11: http://www2.warwick.ac.uk/services/gov/calendar/section2/regulations/cheating