Report Properties
Owner: wpd1728
Created: Mar 28, 2010
2008 Money-making movies

William Denno

Description of the Data: “2008 Money-making movies”

This data set is a random sample of eighty in-theater movies that generated revenue in 2008.  The sample includes movies released before 2008 that still earned in-theater revenue during 2008.

Source of the Data:

The data originally come from http://www.the-numbers.com/.  The-Numbers.com is a website that officially launched on October 17, 1997, as a free resource for industry professionals, the investment community, and movie fans to track business information on movies.  The site has grown to become the largest freely available database of movie industry information on the web.

Description of the Variables:

The data set contains nine variables: four quantitative and five qualitative.

Four quantitative variables include:

  • 2008 Rank of in-theater movies
  • 2008 Gross Sales (US dollars)
  • 2008 Number of tickets sold
  • 2008 Inflation-adjusted gross sales (US dollars)

Five qualitative variables include:

  • Movie name
  • Genre
  • Release date
  • MPAA Rating
  • Distributor of the movie.  The distributor is an important variable because investors may use it to help make a good investment decision.

The objectives of this project include:

  • Display/summarize the data to better understand successful in-theater movies during 2008. 
  • Provide a framework for investors to consider when investing in particular movie distributors or future releases, or when creating a business plan for independent movies.
  • Provide information that can be used to identify distribution strategies that are appropriate for a particular movie or type of movie.

Analysis and Charts:

Below are several charts, each with accompanying analysis, describing the 2008 movies in this study.

The pie chart below displays the relative frequencies of the movie genres in this study.  Drama and Comedy movies together represent just over 64% of the movies (33 Drama + 17 Comedy = 50 of the 78 movies in the genre table).

Result 1: 2008 In-Theater Movie Genres [pie chart]

The In-Theater Movie Ratings (MPAA) chart below shows how the movies in this study are distributed across ratings.  "Not Rated" (26 movies) and "R" (22 movies) are the two most common ratings in this study.

Result 2: 2008 In-Theater Movie Ratings [chart]

The chart below shows the breakdown of the distributors tied to the movies in this study.  Note that five of the best-known distributors occupy the top five spots on the list and account for over 91% (about $1.6 billion) of the sample's 2008 gross sales.

Dividing gross sales by tickets sold gives the same average price, $7.18 per ticket, for every movie.  Realistically, moviegoers pay different prices (for example, with discount tickets), so one of the two variables was most likely reported directly and the other was derived from it using a single average price.  Similarly, the inflation-adjusted gross sales match the gross sales, so no inflation adjustment appears to have been applied.
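As a quick check of the $7.18 figure, the sketch below divides each summary statistic of 2008 Gross (Result 7) by the matching statistic of Tickets Sold (Result 8). This works only because gross sales appear to be, up to rounding, 7.18 times tickets sold for every movie: multiplying every value by the same positive constant multiplies the mean and every order statistic (minimum, quartiles, median, maximum) by that constant. All numbers are copied from the StatCrunch output.

```python
# Summary statistics copied from Result 7 (2008 Gross, in US dollars)
# and Result 8 (Tickets Sold, in tickets).
gross = {"mean": 22767278, "min": 424, "q1": 44852, "median": 487215,
         "q3": 12024598, "max": 531001568}
tickets = {"mean": 3170930.2, "min": 59, "q1": 6247, "median": 67857,
           "q3": 1674735, "max": 73955648}

# If gross = price * tickets for every movie, then each of these statistics
# of gross equals price times the same statistic of tickets, so the ratio
# recovers the common price.
implied_price = {stat: gross[stat] / tickets[stat] for stat in gross}

for stat, price in implied_price.items():
    print(f"{stat:>6}: ${price:.4f} per ticket")
```

Every ratio lands close to $7.18; the minimum (424 / 59 ≈ 7.186) drifts the most because rounding gross sales to whole dollars matters more for such small values.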

Result 3: 2008 In-Theater Movie Distributors [chart]

The histogram below depicts 2008 gross sales in this study.  This histogram and the two that follow are all skewed to the right.

Result 4: 2008 Gross Sales Histogram [histogram]

The Ticket Sales Histogram (below) tells virtually the same story as the previous histogram.  This is expected: each movie's ticket count is its gross sales divided by $7.18, so the two distributions have the same shape.

Result 5: Histogram-Ticket Sales [histogram]

The Inflation-Adjusted Gross Sales Histogram (below) tells virtually the same story as the previous two histograms, because the inflation-adjusted values match the gross sales.

Result 6: Histogram-Inflation Adjusted Gross Sales [histogram]

The following tables give summary statistics for Gross Sales, Ticket Sales, and Inflation-Adjusted Gross Sales.  Because the histograms show that these distributions are skewed to the right, the median is a better measure of central tendency than the mean; for gross sales, for example, the mean ($22.8 million) is far above the median ($487,215), pulled up by a few very large values.  For the same reason, the five-number summary (minimum, Q1, median, Q3, maximum) is a better tool for analyzing these distributions.  The distributions are neither symmetric nor bell-shaped, so the standard deviation is not the best measure of their spread; it works best for symmetric, bell-shaped distributions.

Result 7: Column Statistics-Gross Sales
Summary statistics:
Column n Mean Variance Std. Dev. Std. Err. Median Range Min Max Q1 Q3
2008 Gross 79 2.2767278E7 4.8997846E15 6.9998464E7 7875442 487215 5.31001152E8 424 5.31001568E8 44852 1.2024598E7

Result 8: Column Statistics-Ticket Sales
Summary statistics:
Column n Mean Variance Std. Dev. Std. Err. Median Range Min Max Q1 Q3
Tickets Sold 79 3170930.2 9.5044741E13 9749089 1096858.2 67857 7.3955592E7 59 7.3955648E7 6247 1674735

Result 9: Column Statistics-Inflation Adj Gross Sales
Summary statistics:
Column n Mean Variance Std. Dev. Std. Err. Median Range Min Max Q1 Q3
Inflation-Adjusted Gross 79 2.2767278E7 4.8997846E15 6.9998464E7 7875442 487213 5.31001152E8 424 5.31001568E8 44853 1.2024597E7
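The right skew can also be read directly off the five-number summary in Result 7: in a right-skewed distribution the gap from the median up to Q3 is much wider than the gap from Q1 up to the median, and the mean sits well above the median. A small sketch using only the printed Result 7 values (a rough descriptive check, not a formal skewness statistic):

```python
# Quartiles, median, and mean of 2008 Gross (US dollars), copied from Result 7.
q1, median, q3 = 44852, 487215, 12024598
mean = 22767278

lower_half_of_box = median - q1   # width of the 25th-50th percentile band
upper_half_of_box = q3 - median   # width of the 50th-75th percentile band
iqr = q3 - q1                     # interquartile range: the middle 50%

print(f"median - Q1   = {lower_half_of_box:,}")
print(f"Q3 - median   = {upper_half_of_box:,}")
print(f"IQR           = {iqr:,}")
print(f"mean / median = {mean / median:.1f}")

# A much wider upper band and a mean far above the median both indicate
# a long right tail (right skew).
right_skewed = upper_half_of_box > lower_half_of_box and mean > median
print("right-skewed:", right_skewed)
```

For gross sales the upper band (about $11.5 million) is roughly 26 times the lower band (about $0.44 million), and the mean is roughly 47 times the median.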

 

The relative frequency table below shows each genre's share of the movies in this study and supports the genre pie chart (Result 1).  The genre counts sum to 78, so each relative frequency is the count divided by 78.

Result 10: Frequency Table-Genre
Frequency table results for Genre:
Genre Frequency Relative Frequency
Action 3 0.03846154
Adventure 7 0.08974359
Comedy 17 0.21794872
Concert/Performance 1 0.012820513
Documentary 10 0.12820514
Drama 33 0.42307693
Horror 2 0.025641026
Musical 1 0.012820513
Romantic Comedy 2 0.025641026
Thriller/Suspense 2 0.025641026
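Relative frequency is simply count / total. The sketch below rebuilds Result 10 from its counts and checks the "over 64%" figure for Drama and Comedy. Because the genre counts sum to 78 rather than 79, a single movie is 1/78 ≈ 0.0128 here but 1/79 ≈ 0.0127 in the distributor table further down.

```python
# Genre counts copied from Result 10.
genre_counts = {
    "Action": 3, "Adventure": 7, "Comedy": 17, "Concert/Performance": 1,
    "Documentary": 10, "Drama": 33, "Horror": 2, "Musical": 1,
    "Romantic Comedy": 2, "Thriller/Suspense": 2,
}

total = sum(genre_counts.values())  # number of movies in the genre table
relative = {genre: count / total for genre, count in genre_counts.items()}

# Print genres from most to least common.
for genre, share in sorted(relative.items(), key=lambda kv: -kv[1]):
    print(f"{genre:<20} {share:.4f}")

drama_plus_comedy = relative["Drama"] + relative["Comedy"]
print(f"total = {total}, Drama + Comedy = {drama_plus_comedy:.1%}")
```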

 

The frequency table below shows the MPAA ratings of the movies in this study and supports the ratings chart (Result 2).

Result 11: Frequency Table-Ratings
Frequency table results for MPAA:
MPAA Frequency Relative Frequency
G 5 0.06410257
Not Rated 26 0.33333334
PG 8 0.102564104
PG-13 17 0.21794872
R 22 0.2820513

 

The frequency table below lists the distributors associated with the movies in this study and supports the distributor chart (Result 3).  This table would be useful to a group of investors who want to see how many movies key distributors handled in 2008.  If a distributor was behind several successful movies, investors might consider investing in that distributor given the opportunity.

Result 12: Frequency Table-Distributor
Frequency table results for Distributor:
Distributor Frequency Relative Frequency
20th Century Fox 8 0.101265825
Abramorama Films 1 0.012658228
Alliance Atlantis 1 0.012658228
Anchor Bay Entertainment 1 0.012658228
Blue Water Entertainment 1 0.012658228
Buena Vista 3 0.03797468
Cinema Guild 1 0.012658228
Emerging Pictures 1 0.012658228
Eros Entertainment 3 0.03797468
Film Movement 1 0.012658228
IFC Films 5 0.06329114
IFC First Take 1 0.012658228
Indican Pictures 1 0.012658228
International Film Circuit 1 0.012658228
Kino International 1 0.012658228
Link Productions Ltd. 1 0.012658228
Lionsgate 1 0.012658228
MGM 2 0.025316456
Magic Lamp 1 0.012658228
Maya Releasing 1 0.012658228
Miramax 2 0.025316456
Mitropoulos Films 1 0.012658228
Music Box Films 1 0.012658228
National Geographic 1 0.012658228
New Yorker 1 0.012658228
Newstyle Releasing 1 0.012658228
Oscilloscope Pictures 1 0.012658228
Paramount Pictures 1 0.012658228
Peace Arch Releasing 1 0.012658228
Picturehouse 1 0.012658228
Priority Films 1 0.012658228
Regent Releasing 3 0.03797468
Rialto Pictures 1 0.012658228
Riverrain 1 0.012658228
Roadside Attractions 3 0.03797468
Self Distributed 1 0.012658228
Senart/Scranton-Lacy 1 0.012658228
Sony Pictures 2 0.025316456
Sony Pictures Classics 2 0.025316456
Strand 2 0.025316456
ThinkFilm 2 0.025316456
Typecast Releasing 1 0.012658228
UTV Communications 1 0.012658228
Universal 1 0.012658228
Vitagraph Films 1 0.012658228
Warner Bros. 5 0.06329114
Weinstein Co. 2 0.025316456
Yari Film Group Releasing 1 0.012658228
Zeitgeist 1 0.012658228

 

Simple Linear Regression results:

Generally speaking, each regression below has a dependent variable and an independent variable.  The dependent variable can be considered the “outcome” variable, and the independent variable the “predictor” variable.  The sample size for the regressions in this section is 79.

Rank and Gross Sales regression and scatter plot analysis:

The correlation coefficient is -0.4676, which indicates a moderate negative linear relationship between the two variables: movies with a larger rank number (further down the 2008 list) tend to have lower gross sales.  The r-squared value is 0.21862683.  R-squared measures the strength of the relationship; here it indicates that about 21.9% of the variation in gross sales is explained by the linear relationship with movie rank.

The P-value for each coefficient comes from a test of the null hypothesis that the coefficient is 0 in the population.  Since the slope's P-value here is less than 0.05 (it is reported as <0.0001), a slope this far from 0 would be very unlikely to arise through random sampling if there were no relationship in the population.  This P-value tells us the negative slope is statistically significant.

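Note: The model P-value and the slope P-value are the same because there is only one predictor variable; in that case the ANOVA F-statistic equals the square of the slope's t-statistic ((-4.641601)² ≈ 21.544).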

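In the parameter estimates section, note the slope (-155,258) and intercept (about $74.2 million).  The slope means that each one-place increase in rank is associated, on average, with about $155,258 less in 2008 gross sales.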

Result 13: Simple Linear Regression: Rank and Gross Sales
Simple linear regression results:
Dependent Variable: 2008 Gross
Independent Variable: Rank
2008 Gross = 7.4230448E7 - 155258.17 Rank
Sample size: 79
R (correlation coefficient) = -0.4676
R-sq = 0.21862683
Estimate of error standard deviation: 6.2275852E7

Parameter estimates:
Parameter Estimate Std. Err. DF T-Stat P-Value
Intercept 7.4230448E7 1.3115717E7 77 5.659656 <0.0001
Slope -155258.17 33449.27 77 -4.641601 <0.0001


Analysis of variance table for regression model:
Source DF SS MS F-stat P-value
Model 1 8.3555496E16 8.3555496E16 21.544462 <0.0001
Error 77 2.98627702E17 3.87828165E15
Total 78 3.82183198E17
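Several relationships mentioned above can be checked from the Result 13 output alone: R-sq equals r² and also SS(Model) / SS(Total); with a single predictor the F-statistic equals the square of the slope's t-statistic, which is why the two P-values match; and the "Estimate of error standard deviation" is the square root of the error mean square. A sketch using only the printed values:

```python
import math

# Values copied from Result 13 (2008 Gross regressed on Rank, n = 79).
r = -0.4676
slope_t = -4.641601          # t-statistic for the slope
ss_model = 8.3555496e16      # model (regression) sum of squares
ss_error = 2.98627702e17     # error sum of squares
ss_total = 3.82183198e17     # total sum of squares
df_error = 77
f_stat = 21.544462

r_squared_from_anova = ss_model / ss_total
mse = ss_error / df_error    # error mean square
s = math.sqrt(mse)           # estimate of the error standard deviation

print(f"r^2 from r     : {r ** 2:.4f}")
print(f"r^2 from ANOVA : {r_squared_from_anova:.4f}")
print(f"t^2 vs F       : {slope_t ** 2:.3f} vs {f_stat:.3f}")
print(f"error SD s     : {s:,.0f}")
```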

Result 14: Scatter Plot: Movie Rank and Gross Sales [scatter plot]

Gross Sales and Tickets Sold regression and scatter plot analysis:

The correlation coefficient is 1 (to the displayed precision), which indicates a perfect positive linear relationship between the two variables.  The r-squared value is also 1, which means that, with this data, 100% of the variation in gross sales is explained by tickets sold.

As in the Rank and Gross Sales regression, the slope's P-value is reported as <0.0001.  Since it is less than 0.05, a slope this far from 0 would be very unlikely to arise through random sampling if there were no relationship in the population.

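In the parameter estimates section, note the slope (7.18) and intercept (-0.44534105).  The slope is the $7.18 average ticket price discussed earlier.  The intercept's P-value (0.1958) is greater than 0.05, so the intercept is not significantly different from 0, which is consistent with gross sales being simply 7.18 times tickets sold.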

Result 15: Simple Linear Regression: Ticket Sales and Gross Sales
Simple linear regression results:
Dependent Variable: 2008 Gross
Independent Variable: Tickets Sold
2008 Gross = -0.44534105 + 7.18 Tickets Sold
Sample size: 79
R (correlation coefficient) = 1
R-sq = 1
Estimate of error standard deviation: 2.883

Parameter estimates:
Parameter Estimate Std. Err. DF T-Stat P-Value
Intercept -0.44534105 0.3412979 77 -1.3048456 0.1958
Slope 7.18 3.3483687E-8 77 2.14432768E8 <0.0001


Analysis of variance table for regression model:
Source DF SS MS F-stat P-value
Model 1 3.82183198E17 3.82183198E17 4.5981413E16 <0.0001
Error 77 640 8.311688
Total 78 3.82183198E17

Result 16: Scatter Plot: Ticket Sales and Gross Sales [scatter plot]

Simple Linear Regression results (after outliers have been removed):

After removing the outliers, the linear regression results change as follows.  In the Rank and Gross Sales results (below), the correlation coefficient changed to -0.6751, which indicates a stronger negative linear relationship than with the outliers included (-0.4676).  Omitting the outliers means the highest-ranked movies are not included in the analysis.  These top movies are anomalies compared with the many "average" ranked movies: the most successful (highly ranked) movies usually cost the most to make, and very few movies can be made with such a high price tag.  When those few movies are taken out, the analysis focuses on the many remaining movies, which may have similar rankings based on smaller budgets and their success in theaters.

The r-squared value also increased from 0.21862683 to 0.4557867.  This means that about 45.6% of the variation in gross sales is explained by movie rank, a higher percentage than when the data include the outliers.  The outliers clearly have a strong influence on the results, and I expected this type of change once they were removed.  The P-values are unchanged (still reported as <0.0001).

Result 17: Simple Linear Regression-Rank and Gross Sales No Outliers
Simple linear regression results:
Dependent Variable: 2008 Gross
Independent Variable: Rank
2008 Gross = 1.0063407E7 - 19039.344 Rank
Sample size: 66
R (correlation coefficient) = -0.6751
R-sq = 0.4557867
Estimate of error standard deviation: 3817854.2

Parameter estimates:
Parameter Estimate Std. Err. DF T-Stat P-Value
Intercept 1.0063407E7 1113764.1 64 9.035492 <0.0001
Slope -19039.344 2600.5532 64 -7.3212667 <0.0001


Analysis of variance table for regression model:
Source DF SS MS F-stat P-value
Model 1 7.8128804E14 7.8128804E14 53.60095 <0.0001
Error 64 9.3286475E14 1.45760117E13
Total 65 1.71415272E15

In the Ticket Sales and Gross Sales results, the correlation coefficient and r-squared value do not change.  I would not expect a change, because, as mentioned earlier in this report, the implied price of a ticket for every movie in this study is exactly $7.18.  Whether a movie did very well or sold few tickets, the relationship between ticket sales and gross sales is the same, so removing the outliers had no impact on r or r-squared.  The new r-value is 1, a perfect positive linear relationship, the same as with the outliers included.  Note that in this run the variables are swapped (Tickets Sold is the dependent variable), so the slope is 0.13927576, which is 1/7.18: each dollar of gross sales corresponds to about 0.139 tickets.

Result 18: Simple Linear Regression-Ticket Sales and Gross Sales No Outliers
Simple linear regression results:
Dependent Variable: Tickets Sold
Independent Variable: 2008 Gross
Tickets Sold = 0.07323955 + 0.13927576 2008 Gross
Sample size: 66
R (correlation coefficient) = 1
R-sq = 1
Estimate of error standard deviation: 0.29656646

Parameter estimates:
Parameter Estimate Std. Err. DF T-Stat P-Value
Intercept 0.07323955 0.041213583 64 1.777073 0.0803
Slope 0.13927576 7.163038E-9 64 1.9443672E7 <0.0001


Analysis of variance table for regression model:
Source DF SS MS F-stat P-value
Model 1 3.32506868E13 3.32506868E13 3.78056376E14 <0.0001
Error 64 5.6289062 0.08795166
Total 65 3.32506868E13
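Why the slope in Result 18 is 1/7.18: for least-squares lines fitted to the same data, (slope of y on x) × (slope of x on y) = r², so when r = 1 the two slopes are reciprocals. Strictly, Result 15 uses all 79 movies and Result 18 uses the 66 without outliers; because the relationship is essentially exact in both data sets, the slopes still line up. A quick check with the printed slopes:

```python
# Slopes copied from the StatCrunch output.
slope_gross_on_tickets = 7.18         # Result 15: 2008 Gross = a + b * Tickets Sold
slope_tickets_on_gross = 0.13927576   # Result 18: Tickets Sold = a + b * 2008 Gross

# For least-squares lines, (slope of y on x) * (slope of x on y) = r^2.
# With r = 1 the two slopes are reciprocals of each other.
product = slope_gross_on_tickets * slope_tickets_on_gross
print(f"product of slopes = {product:.6f}")
print(f"1 / 7.18          = {1 / slope_gross_on_tickets:.8f}")
```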

The following two summary statistics tables reflect the data without the outliers.  Compared with the original summaries (which include the outliers), the statistics have clearly changed.  Omitting the outliers lowers the maximum and the sample size (from 79 to 66), which affects the other statistics as well (mean, median, Q1, Q3, standard deviation).  This is true for both gross sales and ticket sales.

Result 19: Column Statistics-Gross Sales No Outliers
Summary statistics:
Column n Mean Variance Std. Dev. Std. Err. Median Range Min Max Q1 Q3
2008 Gross 66 2670660.8 2.63715815E13 5135327 632115 192259 2.0982054E7 424 2.0982478E7 30316 1470856

Result 20: Column Statistics-Ticket Sales No Outliers
Summary statistics:
Column n Mean Variance Std. Dev. Std. Err. Median Range Min Max Q1 Q3
Tickets Sold 66 371958.4 5.11549014E11 715226.56 88038.305 26777 2922292 59 2922351 4222 204855

Note: A new Dataset (StatProj1) was added to reflect the original data minus the outliers.
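The report does not say which rule was used to flag outliers. A common textbook rule, assumed here for illustration and not necessarily the author's method, flags any value more than 1.5 × IQR below Q1 or above Q3. The sketch computes those fences from the full-data quartiles (Results 7 and 8); the largest values kept in Results 19 and 20 fall below the upper fences, which is consistent with, but does not prove, this rule.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; values outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Full-data quartiles copied from Results 7 and 8.
quartiles = {
    "2008 Gross ($)": (44852, 12024598),
    "Tickets Sold": (6247, 1674735),
}
# Largest values kept after outlier removal, copied from Results 19 and 20.
max_after_removal = {"2008 Gross ($)": 20982478, "Tickets Sold": 2922351}

for name, (q1, q3) in quartiles.items():
    lower, upper = iqr_fences(q1, q3)
    print(f"{name}: fences ({lower:,.0f}, {upper:,.0f}), "
          f"largest value kept {max_after_removal[name]:,}")
```

Both lower fences are negative, so under this rule only unusually large values could be flagged, which matches the minimums (424 and 59) being unchanged after removal.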

Phase 4: Testing a Hypothesis and Confidence Interval

Note: For checking purposes, calculations were done on a TI-83 calculator, by hand, and in StatCrunch.

Testing a Hypothesis

1.   Hypothesis: 

According to the widely used movie data website www.the-numbers.com/market, the population mean number of tickets sold across all in-theater movies during 2008 is 1,876,928.711.  I believe that a random sample of those movies across all genres (the data this report is based on: 2008 Money-Making Movies) will have a mean lower than the population mean.  My reasoning is that 738 movies had theater ticket sales in 2008, while the sample used for this project contains 79 movies; I believe a 10.7% sample is too small to match the population mean.

Formally,

    Ho: μ = 1,876,928.711 tickets (about 1.9M)

    H1: μ < 1,876,928.711 tickets

2.  Assumptions:

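The sample was obtained by simple random sampling, and the sample size is greater than 30 (n = 79).  The parameter being tested is the population mean, µ.  The population standard deviation is not known, so the sample standard deviation from Result 8 (9,749,089) is used in place of σ.  This is a left-tailed test.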

3.  Test Statistics 

      x-bar = 3,170,930.2

      σ = 9,749,089

      n = 79

      µ0 = 1,876,928.711

      Calculator:  Test, #1:  Z-Test

Result: z: 1.1797345

To find the test statistic, z0, use the equation z0 = (x-bar - µ0) / (σ / √n).

(3,170,930.2 - 1,876,928.711) / (9,749,089 / √79) = 1,294,001.489 / 1,096,858.208 = 1.1797345

      normalcdf(-9999, 1.1797345, 0, 1)

      p = .8809

4.  P-Value.  The probability of obtaining a sample mean of 3,170,930.2 or less from a population whose mean is 1,876,928.711 is .8809.  This means that, if the population mean were 1,876,928.711, approximately 88 out of 100 samples would give a mean as low as or lower than the one obtained.
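The same left-tailed z-test can be reproduced with Python's standard library, where `statistics.NormalDist().cdf` plays the role of the TI-83's normalcdf. As in the calculator and StatCrunch runs, the sample standard deviation stands in for σ.

```python
import math
from statistics import NormalDist

# Inputs from step 3 (sample values from Result 8).
x_bar = 3170930.2        # sample mean number of tickets sold
sigma = 9749089          # sample SD, used in place of the population SD
n = 79
mu_0 = 1876928.711       # hypothesized population mean

standard_error = sigma / math.sqrt(n)
z = (x_bar - mu_0) / standard_error
# Left-tailed test (H1: mu < mu_0): the P-value is the area to the left of z.
p_value = NormalDist().cdf(z)

print(f"SE = {standard_error:,.1f}, z = {z:.4f}, P-value = {p_value:.4f}")
```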

 

Statcrunch Calculations:

Result 21: One sample Z statistics with data-Phase4 #3
Hypothesis test results:
μ : mean of Variable
H0 : μ = 1876928.8
HA : μ < 1876928.8
Std. Dev. = 9749089
Variable n Sample Mean Std. Err. Z-Stat P-value
Tickets Sold 79 3170930.2 1096858.2 1.1797345 0.8809

 

5.  Conclusion:

P > α, or .8809 > .05, so I do not reject Ho (the null hypothesis).  There is not sufficient evidence that the mean in-theater ticket sales is less than 1,876,928.711.  The P-value is much higher than α; in fact, the sample mean (3,170,930.2) is above µ0, so the data point in the opposite direction from H1.

Confidence Interval:

1.  Level: 95%

X-bar: 3,170,930.2

Std Dev: 9,749,089

n = 79

C-Level: .95

Calculator:  Test, #7:  ZInterval, result: (1021127.56, 5320732.5)

Result 22: One sample Z statistics with data-Phase4#2
95% confidence interval results:
μ : mean of Variable
Std. Dev. = 9749089
Variable n Sample Mean Std. Err. L. Limit U. Limit
Tickets Sold 79 3170930.2 1096858.2 1021127.56 5320732.5

2.  Explanation of the Confidence Interval:

I am 95% confident that the population mean number of tickets sold is between 1,021,127.56 and 5,320,732.5 (the sample mean is 3,170,930.2).  The population mean quoted in the hypothesis test, 1,876,928.711, does in fact fall within this interval.
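The interval can be reproduced the same way: `NormalDist().inv_cdf(0.975)` gives the critical value z* ≈ 1.96, and the interval is x-bar ± z* · σ/√n. As before, the sample standard deviation stands in for σ.

```python
import math
from statistics import NormalDist

x_bar = 3170930.2        # sample mean tickets sold (Result 8)
sigma = 9749089          # sample SD, used in place of sigma
n = 79
confidence = 0.95
mu_0 = 1876928.711       # population mean quoted in the hypothesis test

z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin = z_star * sigma / math.sqrt(n)                      # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"z* = {z_star:.4f}, margin = {margin:,.1f}")
print(f"95% CI: ({lower:,.2f}, {upper:,.2f})")
print("contains mu_0:", lower <= mu_0 <= upper)
```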

3.  95% confidence means:

About 95% of all possible sample means lie within 1.96 standard errors (σ/√n) of the population mean, with 2.5% of the sample means in each tail.  Equivalently, a 95% level of confidence implies that if 100 different confidence intervals were constructed from 100 random samples, I would expect about 95 of them to contain the population mean μ.
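The "repeated intervals" interpretation can be illustrated by simulation. The sketch below assumes, purely for illustration, a normal population with mean 1,876,928.711 and standard deviation 9,749,089 (real ticket sales are right-skewed and never negative, so this is not a model of the actual data). It draws many samples of 79, builds a 95% z-interval from each, and counts how many intervals contain the true mean; the fraction should come out close to 95%.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(2008)                           # fixed seed so the run is repeatable
mu, sigma, n = 1876928.711, 9749089, 79     # hypothetical normal population
z_star = NormalDist().inv_cdf(0.975)
half_width = z_star * sigma / math.sqrt(n)  # same width for every interval

trials = 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_mean = fmean(sample)
    if sample_mean - half_width <= mu <= sample_mean + half_width:
        hits += 1

coverage = hits / trials
print(f"{hits} of {trials} intervals contain mu ({coverage:.1%})")
```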

4. Confidence in my result: 

I am confident in my results.  As mentioned, the population mean falls within the confidence interval, and based on all of the tests performed on this data set, I have no reason to doubt the results.

Data set 1. Annual Movie Data 2008 Random Sampling.txt

Data set 2. StatProj1.xls

Comments
By po3449
Feb 15, 2010

Sorry about that, StatCrunch kept giving me an error message!
By po3449
Feb 15, 2010

Bill, you will want to discuss the skewness of the distribution (outliers) and why that might occur.  Is there a particular genre or production company that is more popular than others?  Are the results as you expected?
By po3449
Jan 18, 2010

Be sure to put the units of measure associated with your quantitative variables (e.g., millions).
