As a data scientist, you will be required to manipulate very large datasets in order to answer specific questions. The data you manipulate will often be available in a raw format that has to be cleaned before it is suitable for analysis. Because of the size of these datasets, it is important to know how to choose the most appropriate algorithms and data structures to answer specific types of questions. A single dataset might have to be held in a variety of data structures during analysis. This assignment requires you to recommend appropriate data structure choices for analyzing data that was collected from Twitter, in order to answer several questions.
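To illustrate what holding one dataset in several data structures can look like, here is a minimal Python sketch. The sample rows, the field layout (user, text), and the three questions are all assumptions made up for illustration; they are not taken from the actual CSV or from the tasks in the PDF.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows (user, text); not taken from the real dataset.
rows = [
    ("alice", "loving the #weather today"),
    ("bob", "stuck in traffic again #monday"),
    ("alice", "coffee first #monday"),
]

# Set: which distinct users appear? Membership tests are O(1) on average.
users = {user for user, _ in rows}

# Dict of lists: all tweets by a given user, retrieved by key.
tweets_by_user = defaultdict(list)
for user, text in rows:
    tweets_by_user[user].append(text)

# Counter (a dict subclass): how often does each hashtag occur?
hashtag_counts = Counter(
    word for _, text in rows for word in text.split() if word.startswith("#")
)

print(sorted(users))
print(len(tweets_by_user["alice"]))
print(hashtag_counts.most_common(1))
```

Each structure suits a different kind of question: a set for uniqueness, a dictionary for lookups by key, and a counter for frequencies. The right choice for each task in the assignment depends on the questions in the PDF.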
The assessment tasks are described in the PDF. The dataset contains 1,600,000 tweets extracted using the Twitter API and saved in a CSV file, which is too large to upload here but can be sent over email.
THE ASSESSMENT IS ONLY TO ANALYZE AND RECOMMEND THE APPROPRIATE DATA STRUCTURE TYPE FOR ANALYZING THE DATA! Please refer to the PDF.
Note: the CSV data file will be shared via a link, since it is a big file.