As many of us have already heard, the full moon is said to influence births. We would like to investigate whether it really has an influence on natality.
Hypothesis:
The Moon, especially when it is full, has an influence on births.
We are going to investigate whether this hypothesis is true. If it is, we expect a change in natality around the full moon, especially at its peak.
For this analysis, we took our data from the Kaggle page FiveThirtyEight Births Dataset. It provides 2 datasets (files), one from the CDC and one from the SSA, giving the number of births for each day:
| | year | month | date_of_month | day_of_week | births | date | date_str | month_str | day_of_week_str |
---|---|---|---|---|---|---|---|---|---|
0 | 1994 | 1 | 1 | 6 | 8096 | (1994, 1, 1) | 1994-01-01 | January | Saturday |
1 | 1994 | 1 | 2 | 7 | 7772 | (1994, 1, 2) | 1994-01-02 | January | Sunday |
2 | 1994 | 1 | 3 | 1 | 10142 | (1994, 1, 3) | 1994-01-03 | January | Monday |
3 | 1994 | 1 | 4 | 2 | 11248 | (1994, 1, 4) | 1994-01-04 | January | Tuesday |
4 | 1994 | 1 | 5 | 3 | 11053 | (1994, 1, 5) | 1994-01-05 | January | Wednesday |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3647 | 2003 | 12 | 27 | 6 | 8646 | (2003, 12, 27) | 2003-12-27 | December | Saturday |
3648 | 2003 | 12 | 28 | 7 | 7645 | (2003, 12, 28) | 2003-12-28 | December | Sunday |
3649 | 2003 | 12 | 29 | 1 | 12823 | (2003, 12, 29) | 2003-12-29 | December | Monday |
3650 | 2003 | 12 | 30 | 2 | 14438 | (2003, 12, 30) | 2003-12-30 | December | Tuesday |
3651 | 2003 | 12 | 31 | 3 | 12374 | (2003, 12, 31) | 2003-12-31 | December | Wednesday |
3652 rows × 9 columns
| | year | month | date_of_month | day_of_week | births | date | date_str | month_str | day_of_week_str |
---|---|---|---|---|---|---|---|---|---|
0 | 2000 | 1 | 1 | 6 | 9083 | (2000, 1, 1) | 2000-01-01 | January | Saturday |
1 | 2000 | 1 | 2 | 7 | 8006 | (2000, 1, 2) | 2000-01-02 | January | Sunday |
2 | 2000 | 1 | 3 | 1 | 11363 | (2000, 1, 3) | 2000-01-03 | January | Monday |
3 | 2000 | 1 | 4 | 2 | 13032 | (2000, 1, 4) | 2000-01-04 | January | Tuesday |
4 | 2000 | 1 | 5 | 3 | 12558 | (2000, 1, 5) | 2000-01-05 | January | Wednesday |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5474 | 2014 | 12 | 27 | 6 | 8656 | (2014, 12, 27) | 2014-12-27 | December | Saturday |
5475 | 2014 | 12 | 28 | 7 | 7724 | (2014, 12, 28) | 2014-12-28 | December | Sunday |
5476 | 2014 | 12 | 29 | 1 | 12811 | (2014, 12, 29) | 2014-12-29 | December | Monday |
5477 | 2014 | 12 | 30 | 2 | 13634 | (2014, 12, 30) | 2014-12-30 | December | Tuesday |
5478 | 2014 | 12 | 31 | 3 | 11990 | (2014, 12, 31) | 2014-12-31 | December | Wednesday |
5479 rows × 9 columns
Together, the two datasets give daily birth counts from 1st January 1994 until 31st December 2014: 7670 days (21 years).
With that many days, the results should be precise enough to draw a conclusion.
Since the two datasets overlap for 4 years (2000, 2001, 2002, 2003), one could think of merging them into a single, bigger dataset.
However, after looking at the data, we see that they disagree by a few hundred births each day, so they cannot be merged so easily:
For the CDC:
| | year | month | date_of_month | day_of_week | births | date | date_str | month_str | day_of_week_str |
---|---|---|---|---|---|---|---|---|---|
0 | 2000 | 1 | 1 | 6 | 8843 | (2000, 1, 1) | 2000-01-01 | January | Saturday |
1 | 2000 | 1 | 2 | 7 | 7816 | (2000, 1, 2) | 2000-01-02 | January | Sunday |
2 | 2000 | 1 | 3 | 1 | 11123 | (2000, 1, 3) | 2000-01-03 | January | Monday |
3 | 2000 | 1 | 4 | 2 | 12703 | (2000, 1, 4) | 2000-01-04 | January | Tuesday |
4 | 2000 | 1 | 5 | 3 | 12240 | (2000, 1, 5) | 2000-01-05 | January | Wednesday |
For the SSA:
| | year | month | date_of_month | day_of_week | births | date | date_str | month_str | day_of_week_str |
---|---|---|---|---|---|---|---|---|---|
0 | 2000 | 1 | 1 | 6 | 9083 | (2000, 1, 1) | 2000-01-01 | January | Saturday |
1 | 2000 | 1 | 2 | 7 | 8006 | (2000, 1, 2) | 2000-01-02 | January | Sunday |
2 | 2000 | 1 | 3 | 1 | 11363 | (2000, 1, 3) | 2000-01-03 | January | Monday |
3 | 2000 | 1 | 4 | 2 | 13032 | (2000, 1, 4) | 2000-01-04 | January | Tuesday |
4 | 2000 | 1 | 5 | 3 | 12558 | (2000, 1, 5) | 2000-01-05 | January | Wednesday |
Because of this difference, the datasets will stay as they are, and the analysis will be run on each dataset separately.
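The disagreement can be quantified with a join on the date. This is only a sketch: it uses the five rows shown in the extracts above rather than the full files, and the names `overlap`, `diff` and `diff_pct` are illustrative, not taken from the original notebook.

```python
import pandas as pd

# First five days of 2000, copied from the CDC and SSA extracts above.
dates = ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05"]
cdc = pd.DataFrame({"date_str": dates, "births": [8843, 7816, 11123, 12703, 12240]})
ssa = pd.DataFrame({"date_str": dates, "births": [9083, 8006, 11363, 13032, 12558]})

# An inner join on the date keeps only the days present in both sources.
overlap = cdc.merge(ssa, on="date_str", suffixes=("_cdc", "_ssa"))

# Absolute and relative gap between the two sources for each day.
overlap["diff"] = overlap["births_ssa"] - overlap["births_cdc"]
overlap["diff_pct"] = (100 * overlap["diff"] / overlap["births_cdc"]).round(2)

print(overlap[["date_str", "births_cdc", "births_ssa", "diff", "diff_pct"]])
```

On these five days the SSA count is higher than the CDC count by 190 to 329 births.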
Here we are going to look at the overall sum of births for each year.
According to the previous graph, the yearly number of births increased until 2007 and dropped markedly in the following years. Could it be related to the subprime crisis?
Here we are going to look at the overall sum of births for each month, independent of the year. Since months do not all have the same number of days (not to mention February!), the data is converted to average births per day for each month.
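Average daily births for CDC: 108743
Average daily births for SSA: 170246
Note: these figures are roughly 10 and 15 times the daily means reported further down (10877 and 11351), which matches the 10 and 15 years covered by each dataset. They appear to be summed over the years rather than true per-day averages.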
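The month-length normalization described above can be sketched as follows. The data here is synthetic (an assumption: a constant 100 births per day over 2001), chosen so that the expected result is obvious, and the names `monthly` and `births_per_day_by_month` are illustrative.

```python
import pandas as pd

# Synthetic data (assumption): 100 births on every day of 2001,
# so each month should come out at exactly 100 births per day.
days = pd.date_range("2001-01-01", "2001-12-31", freq="D")
df = pd.DataFrame({"date": days, "births": 100})
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# Total births and number of days for each (year, month) pair.
monthly = df.groupby(["year", "month"])["births"].agg(total="sum", n_days="count")
monthly["per_day"] = monthly["total"] / monthly["n_days"]

# Average the per-day figure over the years: one value per calendar month.
births_per_day_by_month = monthly.groupby(level="month")["per_day"].mean()
print(births_per_day_by_month)
```

Dividing by the actual number of days in each month, rather than by a fixed 30, is what keeps February comparable with the other months.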
Something interesting: there are more births during the summer (July, August, September) than in other months (around +4%/+6%). Given the gestation time, this suggests that conceptions increase around November/December.
Would it mean that people are more eager to have a baby around Thanksgiving/Christmas? Or do they simply take fewer precautions during those cheerful moments?
These questions would have to be answered in a dedicated analysis; here, they are just assumptions.
Daily birth mean for CDC: 10877
Daily birth mean for SSA: 11351
More interesting findings!
Here the data shows a clear increase in natality between Tuesdays and Fridays.
On Saturdays births are at -20%/-25%, and they go down to -28%/-44% on Sundays (compared to the average)!
How can we explain that?
Again, any explanation is just an assumption and would require additional analysis and discussion with specialists.
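The weekday percentages are deviations from the overall daily mean. A minimal sketch of that computation, on a made-up week (an assumption, not the real counts), could look like this:

```python
import pandas as pd

# Synthetic week (assumption): made-up counts with a weekend dip.
# day_of_week follows the datasets' convention: 1 = Monday ... 7 = Sunday.
df = pd.DataFrame({
    "day_of_week": [1, 2, 3, 4, 5, 6, 7],
    "births": [11000, 12500, 12500, 12500, 12000, 8800, 7700],
})

overall_mean = df["births"].mean()
by_weekday = df.groupby("day_of_week")["births"].mean()

# Relative deviation of each weekday from the overall daily mean, in percent.
deviation_pct = ((by_weekday / overall_mean - 1) * 100).round(1)
print(deviation_pct)
```

With these made-up numbers, Saturday comes out at -20% and Sunday at -30% relative to the mean of 11000.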
After trying different methods to get all the full moons between 1994 and 2014, the Pylunar library was used for convenience. Several other sources were checked to verify random samples of the data.
In short, we added the column is_full_moon to each dataset:
CDC data (random records): Shape: (3652, 10) - extract:
| | year | month | date_of_month | day_of_week | births | date | date_str | month_str | day_of_week_str | is_full_moon |
---|---|---|---|---|---|---|---|---|---|---|
2011 | 1999 | 7 | 5 | 1 | 8413 | (1999, 7, 5) | 1999-07-05 | July | Monday | False |
3464 | 2003 | 6 | 27 | 5 | 12678 | (2003, 6, 27) | 2003-06-27 | June | Friday | False |
1726 | 1998 | 9 | 23 | 3 | 12805 | (1998, 9, 23) | 1998-09-23 | September | Wednesday | False |
1571 | 1998 | 4 | 21 | 2 | 12213 | (1998, 4, 21) | 1998-04-21 | April | Tuesday | False |
2508 | 2000 | 11 | 13 | 1 | 11129 | (2000, 11, 13) | 2000-11-13 | November | Monday | False |
SSA data (random records): Shape: (5479, 10) - extract:
| | year | month | date_of_month | day_of_week | births | date | date_str | month_str | day_of_week_str | is_full_moon |
---|---|---|---|---|---|---|---|---|---|---|
1598 | 2004 | 5 | 17 | 1 | 12172 | (2004, 5, 17) | 2004-05-17 | May | Monday | False |
2515 | 2006 | 11 | 20 | 1 | 14053 | (2006, 11, 20) | 2006-11-20 | November | Monday | False |
3001 | 2008 | 3 | 20 | 4 | 13614 | (2008, 3, 20) | 2008-03-20 | March | Thursday | False |
4939 | 2013 | 7 | 10 | 3 | 12915 | (2013, 7, 10) | 2013-07-10 | July | Wednesday | False |
5415 | 2014 | 10 | 29 | 3 | 12102 | (2014, 10, 29) | 2014-10-29 | October | Wednesday | False |
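For readers without Pylunar, full moon dates can be approximated with the standard library alone. The sketch below is not the method used in this analysis: it assumes a reference full moon on 2000-01-21 at 04:40 UTC and a constant mean synodic month of 29.530588853 days. Real lunations deviate from that mean by several hours, so an approximate date can be off by one day; the function name `approx_full_moon_dates` is hypothetical.

```python
from datetime import date, datetime, timedelta

# Assumptions (not from the original analysis):
# - reference full moon on 2000-01-21 at 04:40 UTC;
# - every lunation lasts exactly the mean synodic month.
REFERENCE_FULL_MOON = datetime(2000, 1, 21, 4, 40)
SYNODIC_MONTH = timedelta(days=29.530588853)


def approx_full_moon_dates(start: date, end: date) -> set:
    """Return the approximate UTC dates of full moons between start and end."""
    start_dt = datetime(start.year, start.month, start.day)
    # Lunation index of the last full moon at or before `start`;
    # a moon that falls before `start` is filtered out in the loop.
    k = (start_dt - REFERENCE_FULL_MOON) // SYNODIC_MONTH
    dates = set()
    while True:
        moment = REFERENCE_FULL_MOON + k * SYNODIC_MONTH
        if moment.date() > end:
            break
        if moment.date() >= start:
            dates.add(moment.date())
        k += 1
    return dates


full_moons = approx_full_moon_dates(date(1994, 1, 1), date(2014, 12, 31))
print(len(full_moons), date(2011, 7, 15) in full_moons)
```

A row's is_full_moon flag is then simply whether its date belongs to that set. The approximation agrees with the extract above for 15 July 2011, the one full moon visible in the sample rows, but an ephemeris-based library remains the safer choice for the real analysis.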
A full moon can span multiple calendar days (it usually lasts some hours around the peak). To avoid inconsistent data, only the first day of each full moon is kept; second days have been removed.
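One possible way to keep only the first day is to compare each flag with the previous day's flag. This sketch uses synthetic flags (an assumption) and expects the rows to be sorted by date with no missing days.

```python
import pandas as pd

# Synthetic flags (assumption): one full moon flagged on 14 and 15 January.
df = pd.DataFrame({
    "date": pd.date_range("2000-01-12", periods=6, freq="D"),
    "is_full_moon": [False, False, True, True, False, False],
})

# A day stays flagged only if the previous day was not flagged,
# which keeps the first day of every run of consecutive True values.
previous_day = df["is_full_moon"].shift(1, fill_value=False)
df["is_full_moon"] = df["is_full_moon"] & ~previous_day
print(df)
```

After this step only 14 January remains flagged.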
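Now that we know which days are full moons and which are not, the result is close!
Three additional columns are added: days_from_previous_full_moon, days_until_next_full_moon and days_to_full_moon (the signed distance to the nearest full moon: negative before it, positive after it).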
NB: Data from the first 14 days and the last 14 days have been removed to avoid edge cases.
CDC data (random records): Shape: (3567, 13) - extract:
| | year | month | date_of_month | day_of_week | births | date | date_str | month_str | day_of_week_str | is_full_moon | days_from_previous_full_moon | days_until_next_full_moon | days_to_full_moon |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2874 | 2002 | 1 | 17 | 4 | 12077 | (2002, 1, 17) | 2002-01-17 | January | Thursday | False | 18 | 11 | -11 |
3149 | 2002 | 10 | 24 | 4 | 12447 | (2002, 10, 24) | 2002-10-24 | October | Thursday | False | 3 | 27 | 3 |
2289 | 2000 | 5 | 31 | 3 | 12725 | (2000, 5, 31) | 2000-05-31 | May | Wednesday | False | 13 | 16 | 13 |
1318 | 1997 | 9 | 15 | 1 | 11584 | (1997, 9, 15) | 1997-09-15 | September | Monday | False | 28 | 1 | -1 |
2099 | 1999 | 11 | 20 | 6 | 8630 | (1999, 11, 20) | 1999-11-20 | November | Saturday | False | 27 | 3 | -3 |
SSA data (random records): Shape: (5365, 13) - extract:
| | year | month | date_of_month | day_of_week | births | date | date_str | month_str | day_of_week_str | is_full_moon | days_from_previous_full_moon | days_until_next_full_moon | days_to_full_moon |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3166 | 2008 | 11 | 4 | 2 | 13244 | (2008, 11, 4) | 2008-11-04 | November | Tuesday | False | 21 | 9 | -9 |
4132 | 2011 | 7 | 15 | 5 | 13097 | (2011, 7, 15) | 2011-07-15 | July | Friday | True | 0 | 0 | 0 |
2330 | 2006 | 7 | 7 | 5 | 14580 | (2006, 7, 7) | 2006-07-07 | July | Friday | False | 26 | 4 | -4 |
261 | 2000 | 9 | 29 | 5 | 13045 | (2000, 9, 29) | 2000-09-29 | September | Friday | False | 16 | 14 | -14 |
59 | 2000 | 3 | 7 | 2 | 12420 | (2000, 3, 7) | 2000-03-07 | March | Tuesday | False | 17 | 13 | -13 |
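The three distance columns in the extracts above could be derived from is_full_moon as sketched below. The calendar is synthetic (an assumption: 40 days with full moons at positions 5 and 34), and the tie rule, where a day equally far from both moons counts as "after", is also an assumption, since the extracts do not show such a case.

```python
import numpy as np
import pandas as pd

# Synthetic calendar (assumption): 40 days, full moons at positions 5 and 34.
df = pd.DataFrame({"date": pd.date_range("2000-01-01", periods=40, freq="D")})
df["is_full_moon"] = False
df.loc[[5, 34], "is_full_moon"] = True

# Date of the latest full moon at or before each row, and of the next one at or after it.
moon_dates = df["date"].where(df["is_full_moon"])
previous_moon = moon_dates.ffill()
next_moon = moon_dates.bfill()

df["days_from_previous_full_moon"] = (df["date"] - previous_moon).dt.days
df["days_until_next_full_moon"] = (next_moon - df["date"]).dt.days

# Rows before the first or after the last known full moon have no valid distance.
df = df.dropna(subset=["days_from_previous_full_moon", "days_until_next_full_moon"]).copy()
after = df["days_from_previous_full_moon"].astype(int)
before = df["days_until_next_full_moon"].astype(int)
df["days_from_previous_full_moon"] = after
df["days_until_next_full_moon"] = before

# Signed distance to the nearest full moon: positive after it, negative before it.
df["days_to_full_moon"] = np.where(after <= before, after, -before)
print(df.head(8))
```

This reproduces the pattern visible in the extracts: a row with 18 days since the previous full moon and 11 days until the next one gets -11, and a full moon day gets 0 in all three columns.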
Before looking at the core chart, one verification is needed: each number of days relative to a full moon should be equally represented. If some offsets were more frequent than others, the conclusions could be skewed. This should not happen, because the time range is long enough and the edges have been removed.
That graph shows that, for each dataset, there are exactly as many days 1 day before a full moon and 5 days after a full moon as there are full moons.
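This balance check amounts to counting rows per offset and comparing each count with the number of full moons. A small sketch on synthetic offsets (an assumption: three complete 29-day cycles):

```python
import pandas as pd

# Synthetic offsets (assumption): three complete lunar cycles of 29 days,
# each covering the offsets -14 ... +14 exactly once.
df = pd.DataFrame({"days_to_full_moon": list(range(-14, 15)) * 3})

counts = df["days_to_full_moon"].value_counts().sort_index()
n_full_moons = counts.loc[0]

# Offsets whose count differs from the number of full moons.
unbalanced = counts[counts != n_full_moons]
print(counts.min(), counts.max(), len(unbalanced))
```

An empty `unbalanced` series means every offset is represented as often as the full moon itself.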
Since we saw that the day of the week has an influence on births, we also have to verify that full moons are evenly spread across the week. If, by chance, full moons fell only on weekends, the result would be biased. Let's make sure that is not the case.
Full moons are pretty well distributed across the week.
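The spread across the week can also be checked numerically by comparing each weekday's share of full moons with the 1/7 expected from an even spread. The data below is synthetic (an assumption: 14 full moons, two per weekday).

```python
import pandas as pd

# Synthetic data (assumption): 14 full moons, two on each weekday.
full_moon_days = pd.DataFrame({
    "day_of_week_str": ["Monday", "Tuesday", "Wednesday", "Thursday",
                        "Friday", "Saturday", "Sunday"] * 2,
})

# Share of full moons falling on each weekday.
share = full_moon_days["day_of_week_str"].value_counts(normalize=True)

# Largest gap between an observed share and the even-spread share of 1/7.
max_gap = (share - 1 / 7).abs().max()
print(share.round(3))
print(max_gap)
```

On real data the gap will not be exactly zero; the point is only to confirm that no weekday, and especially no weekend day, is heavily over-represented.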
Here it comes!
If our hypothesis is true, something special should be visible around full moons. Let's see!
We observe no special behavior in the previous chart indicating that full moons have an influence on natality.
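The comparison behind such a chart boils down to averaging births per value of days_to_full_moon and looking at the deviation from the overall mean. The sketch below uses synthetic data with no lunar effect at all (an assumption), so every deviation is zero by construction.

```python
import pandas as pd

# Synthetic data (assumption): identical births on every day, i.e. no lunar effect.
df = pd.DataFrame({
    "days_to_full_moon": list(range(-14, 15)) * 4,
    "births": 12000,
})

by_offset = df.groupby("days_to_full_moon")["births"].mean()

# Relative deviation of each offset from the overall daily mean, in percent.
deviation_pct = (by_offset / df["births"].mean() - 1) * 100

# Under the hypothesis, the offsets near 0 would stand out from the rest.
print(deviation_pct.loc[-2:2])
```

On the real datasets, a lunar effect would show up as deviations around offset 0 that are clearly larger than those at other offsets.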
Full moons have no influence whatsoever on natality: it is just a myth.