## Statistical Analysis Help

### Statistical Analysis Help

Hello, I need some help analyzing my data.

I tracked the top 15 tweets for each day and made a scatter plot. My graph (see the attachment) seems to show large differences between the top tweets (Tweets 1-5) each day, with the points spread out, while for the lower-ranked tweets (Tweets 6-15) the differences shrink and the points are all close together. I'm trying to show there is a consistent, significant difference between the top tweets and the lower-ranked tweets. Is there a statistical way to show or analyze this? I downloaded a trial version of the IBM SPSS program.

Also, what statistical test can I use to find the value that I would expect most of the data points to fall within (for example, what value would I expect the Tweet 15 datapoints to fall under)?

Thank you!
Attachments
Graph.xlsx
PhilY

Posts: 3
Joined: Mon Jan 28, 2013 7:58 pm
Project Question: Analysis of Twitter Networks
Project Due Date: 2/1/2013
Project Status: I am finished with my experiment and analyzing the data

### Re: Statistical Analysis Help

Hi PhilY,

It looks like you have collected a lot of interesting data for your Analysis of Twitter Networks project. To better guide you in your statistical analysis, more information would be helpful. What hypothesis are you testing, and what kind of tweet data have you collected? Specifically, what are the axes on your scatter plot?

Cheers,
Emily
Limeybean
Expert

Posts: 26
Joined: Mon Nov 19, 2012 4:38 pm
Project Question: Science Fairs are Awesome!
Project Due Date: n/a
Project Status: Not applicable

### Re: Statistical Analysis Help

Thank you. My question is: In a protest environment, how does the number of influencers and the magnitude of influence (amount of influence needed to recruit people) change as the protest develops?

Briefly, my hypothesis is that as the protest develops, there will be more influencers (people whose tweets are retweeted to others at a significant level above others); however, in the later stages of recruitment, the number of influencers will remain constant until the climax of the protest. Also, I think that over time, the amount of influence needed and shown will cycle up and down.

The graph is my raw data and the x-axis is the dates that I recorded data for, and the y-axis is the number of retweets for each of the top 15 tweets (the 15 tweets that had the greatest number of retweets) by any person for that day. I didn't show this but I also recorded the username of the tweeters.

I attached a new spreadsheet, with a line graph connecting the points as well, if it helps. I'm thinking of finding the range/mean/median of the number of retweets for each of the top 15 tweets and comparing them. Is there a way to compare this over time? Also, what test should I use to compare the groups (Tweet 1 to 2 to 3 etc.) to see if there is a significant difference among them (e.g. a lot of the bottom Tweet levels (6-15) seem to cluster together along the same line over time but the top ones separate out more)?
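For what it's worth, the range/mean/median comparison described above could be sketched roughly like this in Python. This is only a descriptive summary, not a significance test. The function name `summarize_by_rank`, the data layout (one list per day, sorted from most to fewest retweets), and the numbers are all made up for illustration and are not taken from the attached spreadsheet.

```python
from statistics import mean, median

def summarize_by_rank(daily_top):
    """Summarize retweet counts for each rank position across days.

    daily_top: list of per-day lists, each sorted from most to fewest
    retweets, all of the same length (hypothetical layout).
    """
    summary = []
    # zip(*daily_top) groups the 1st tweet of every day, the 2nd, and so on
    for rank, counts in enumerate(zip(*daily_top), start=1):
        summary.append({
            "rank": rank,
            "mean": mean(counts),
            "median": median(counts),
            "range": max(counts) - min(counts),
        })
    return summary

# Made-up retweet counts: three days, top four tweets per day
daily_top = [
    [900, 400, 120, 100],
    [700, 300, 110, 90],
    [500, 350, 130, 95],
]

for row in summarize_by_rank(daily_top):
    print(row)
```

A large range at the top ranks and a small range at the bottom ranks would match the spread seen in the graph, but it would not by itself show that the difference is statistically significant.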

Thank you very much!
Attachments
Graph2.xlsx
PhilY

### Re: Statistical Analysis Help

Hey PhilY,

I am sorry for the delay in responding but I needed to seek some help. I cannot take credit for the following, but I think that it should be helpful.

Let me check if I'm understanding your question correctly. It sounds like you want to know if the number of "highly influential" tweets is changing (and perhaps increasing) over the course of a protest. I'm not certain of an exact way to test this question but a potential strategy is to figure out on each day of the protest the average amount that tweets are retweeted and then identify which/how many tweets for that day are retweeted significantly more than that average (these would be your "highly influential" tweets). If you do that for each day of the protest you can then evaluate if the absolute number of influential tweets shifts over the course of the protest.

For each day, I would want to look at more than just the top 15 tweets. Ideally, if you could get all the tweets related to the protest, that would be a better data set. But with whatever data you have, I would calculate the mean and standard deviation (SD) of the number of retweets per unique tweet. Then you can decide how far from the mean an influential tweet should be (for a roughly normal distribution, about 95% of values fall within 2 SDs of the mean and about 99.7% within 3 SDs) and add that many SDs to the mean to get the number of retweets such an outlier would receive. Then count how many tweets on that day have more than that threshold number of retweets.

You can then take that number of influential tweets and plot it against time to see if there is a temporal pattern.
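In case it helps, here is a minimal Python sketch of the steps above: compute each day's mean and SD, set a threshold at the mean plus `k` SDs, and count the tweets above it. The function name `count_influential`, the choice `k=2.0`, the day labels, and the retweet counts are all assumptions made up for illustration, so treat it as a starting point rather than the one correct method.

```python
from statistics import mean, stdev

def count_influential(retweets_by_day, k=2.0):
    """Return {day: number of tweets with retweets above mean + k*SD}.

    retweets_by_day: dict mapping a day label to a list of retweet
    counts for that day (hypothetical layout). k is how many sample
    standard deviations above the mean counts as "influential".
    """
    counts = {}
    for day, retweets in retweets_by_day.items():
        if len(retweets) < 2:
            counts[day] = 0  # SD is undefined for fewer than two tweets
            continue
        threshold = mean(retweets) + k * stdev(retweets)
        counts[day] = sum(1 for r in retweets if r > threshold)
    return counts

# Made-up retweet counts, not from the real data set
example = {
    "day_1": [500, 40, 35, 30, 28, 25, 22, 20, 18, 15],
    "day_2": [60, 55, 50, 48, 45, 44, 42, 40, 38, 36],
}

print(count_influential(example, k=2.0))
```

In this made-up example, day_1 has one tweet far above the rest and day_2 has none. The resulting counts per day are what you would plot against time.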

Cheers,
Emily
Limeybean
Expert

### Re: Statistical Analysis Help

Thank you for your reply! I appreciate all of your feedback; it's helping me a lot. Unfortunately, it's really hard for me to get every retweet, because I'm recording all the data manually in Excel and the time frame is four months, which makes the process time-consuming.

In addition to finding the number of significant influencers there are each day, I'm also trying to quantify the amount of influence each tweet has. Do you think I could do this by finding how many standard deviations a certain tweet is from that day's mean to see who has more influence (e.g. a tweet that is 3 standard deviations away would have more influence than one that is 2.5 standard deviations away)? Do you think this compares over time? For example, if two tweets on different days are both 3 standard deviations away, but one has more retweets than the other, I think they would still have the same degree of influence even though the number of retweets is different. Would you agree?
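To make the idea concrete, here is a small Python sketch of what I mean by "standard deviations away from that day's mean". The function name `z_score` and the numbers are made up for illustration; whether two equal scores really mean equal influence is the assumption I'm asking about.

```python
from statistics import mean, stdev

def z_score(value, day_values):
    """How many sample SDs `value` lies above (or below) that day's mean."""
    return (value - mean(day_values)) / stdev(day_values)

# Made-up retweet counts for two different days
day_a = [100, 200, 300]
day_b = [1000, 2000, 3000]

# The top tweet on each day has a very different raw retweet count...
score_a = z_score(300, day_a)
score_b = z_score(3000, day_b)

# ...but both sit the same number of SDs above their own day's mean
print(score_a, score_b)
```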

Thank you again so much!
PhilY