Birth Full Moon Relation

Goal of this analysis

As many of us have heard, the full moon is said to influence the birth rate; we would like to investigate whether it actually does.

Hypothesis:

The Moon (especially when it is full) has an influence on births.

We are going to investigate the truthfulness of this hypothesis. If it is true, we expect a change in the number of births around the full moon, especially at its peak.

Data origin

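For this analysis, we took our data from the Kaggle page FiveThirtyEight Births Dataset. It provides two datasets (files) with the number of births for each day: one from the CDC (Centers for Disease Control and Prevention) covering 1994 to 2003, and one from the SSA (Social Security Administration) covering 2000 to 2014.

CDC dataset: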

year month date_of_month day_of_week births date date_str month_str day_of_week_str
0 1994 1 1 6 8096 (1994, 1, 1) 1994-01-01 January Saturday
1 1994 1 2 7 7772 (1994, 1, 2) 1994-01-02 January Sunday
2 1994 1 3 1 10142 (1994, 1, 3) 1994-01-03 January Monday
3 1994 1 4 2 11248 (1994, 1, 4) 1994-01-04 January Tuesday
4 1994 1 5 3 11053 (1994, 1, 5) 1994-01-05 January Wednesday
... ... ... ... ... ... ... ... ... ...
3647 2003 12 27 6 8646 (2003, 12, 27) 2003-12-27 December Saturday
3648 2003 12 28 7 7645 (2003, 12, 28) 2003-12-28 December Sunday
3649 2003 12 29 1 12823 (2003, 12, 29) 2003-12-29 December Monday
3650 2003 12 30 2 14438 (2003, 12, 30) 2003-12-30 December Tuesday
3651 2003 12 31 3 12374 (2003, 12, 31) 2003-12-31 December Wednesday

3652 rows × 9 columns

SSA dataset:

year month date_of_month day_of_week births date date_str month_str day_of_week_str
0 2000 1 1 6 9083 (2000, 1, 1) 2000-01-01 January Saturday
1 2000 1 2 7 8006 (2000, 1, 2) 2000-01-02 January Sunday
2 2000 1 3 1 11363 (2000, 1, 3) 2000-01-03 January Monday
3 2000 1 4 2 13032 (2000, 1, 4) 2000-01-04 January Tuesday
4 2000 1 5 3 12558 (2000, 1, 5) 2000-01-05 January Wednesday
... ... ... ... ... ... ... ... ... ...
5474 2014 12 27 6 8656 (2014, 12, 27) 2014-12-27 December Saturday
5475 2014 12 28 7 7724 (2014, 12, 28) 2014-12-28 December Sunday
5476 2014 12 29 1 12811 (2014, 12, 29) 2014-12-29 December Monday
5477 2014 12 30 2 13634 (2014, 12, 30) 2014-12-30 December Tuesday
5478 2014 12 31 3 11990 (2014, 12, 31) 2014-12-31 December Wednesday

5479 rows × 9 columns
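The raw CSV files presumably only contain the year, month, date_of_month, day_of_week and births columns, in which case the other columns shown above (date, date_str, month_str, day_of_week_str) were derived. A minimal sketch of how they could be rebuilt with pandas, under that assumption; the helper name add_derived_columns is hypothetical, and the day_of_week convention (1 = Monday to 7 = Sunday) is inferred from the extracts above:

```python
import pandas as pd


def add_derived_columns(raw: pd.DataFrame) -> pd.DataFrame:
    """Rebuild date, date_str, month_str and day_of_week_str from the raw columns.

    Hypothetical helper: assumes raw has year, month, date_of_month,
    day_of_week (1 = Monday ... 7 = Sunday) and births columns.
    """
    df = raw.copy()
    # pd.to_datetime can assemble timestamps from year/month/day columns.
    parts = df[["year", "month", "date_of_month"]].rename(columns={"date_of_month": "day"})
    ts = pd.to_datetime(parts)

    # Sanity check: pandas counts Monday as 0, the raw files (apparently) as 1.
    if not (ts.dt.dayofweek + 1 == df["day_of_week"]).all():
        raise ValueError("day_of_week does not match the calendar")

    df["date"] = pd.Series(list(zip(df["year"], df["month"], df["date_of_month"])), index=df.index)
    df["date_str"] = ts.dt.strftime("%Y-%m-%d")
    df["month_str"] = ts.dt.month_name()
    df["day_of_week_str"] = ts.dt.day_name()
    return df


# First three CDC rows from the extract above.
raw = pd.DataFrame({
    "year": [1994, 1994, 1994],
    "month": [1, 1, 1],
    "date_of_month": [1, 2, 3],
    "day_of_week": [6, 7, 1],
    "births": [8096, 7772, 10142],
})
out = add_derived_columns(raw)
print(out[["date_str", "month_str", "day_of_week_str"]])
```

On the real data, the same function would be applied to each DataFrame returned by pd.read_csv on the downloaded files; the day_of_week check fails loudly if the inferred weekday convention is wrong.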

We have daily birth counts from 1st January 1994 to 31st December 2014: 7670 days (21 years).

With this many days (roughly 260 lunar cycles of about 29.5 days each), the results should be precise enough to draw a conclusion.
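The day count can be checked with Python's datetime module. Subtracting two dates gives the gap between them (7669 days), so both endpoints have to be counted to get the number of days covered. The lunar-cycle estimate divides by the mean synodic month (new moon to new moon) of about 29.53 days:

```python
from datetime import date

start, end = date(1994, 1, 1), date(2014, 12, 31)

# Subtracting dates gives the gap between them; add 1 to count both endpoints.
gap = (end - start).days
n_days = gap + 1

# Mean synodic month, in days.
SYNODIC_MONTH = 29.530588861
n_cycles = n_days / SYNODIC_MONTH

print(gap, n_days, round(n_cycles))  # prints: 7669 7670 260
```

The same total follows from the row counts: 3652 CDC days plus 5479 SSA days, minus the 1461 days (2000 to 2003) present in both.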

Exploratory analysis of datasets

Since the two datasets overlap for four years (2000 to 2003), one could think of merging them into a single, larger dataset.

However, the data show that the two sources disagree by a few hundred births each day (the SSA figures are higher in the extracts below), so they cannot easily be merged:

For the CDC:

year month date_of_month day_of_week births date date_str month_str day_of_week_str
0 2000 1 1 6 8843 (2000, 1, 1) 2000-01-01 January Saturday
1 2000 1 2 7 7816 (2000, 1, 2) 2000-01-02 January Sunday
2 2000 1 3 1 11123 (2000, 1, 3) 2000-01-03 January Monday
3 2000 1 4 2 12703 (2000, 1, 4) 2000-01-04 January Tuesday
4 2000 1 5 3 12240 (2000, 1, 5) 2000-01-05 January Wednesday

For the SSA:

year month date_of_month day_of_week births date date_str month_str day_of_week_str
0 2000 1 1 6 9083 (2000, 1, 1) 2000-01-01 January Saturday
1 2000 1 2 7 8006 (2000, 1, 2) 2000-01-02 January Sunday
2 2000 1 3 1 11363 (2000, 1, 3) 2000-01-03 January Monday
3 2000 1 4 2 13032 (2000, 1, 4) 2000-01-04 January Tuesday
4 2000 1 5 3 12558 (2000, 1, 5) 2000-01-05 January Wednesday

Because of this difference, the datasets are kept as they are, and each analysis is run separately on both.
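The disagreement can be measured by joining the two sources on the date. A minimal sketch with pandas, using only the five overlapping days shown in the two extracts above (on the full datasets, the same merge would return the 1461 days of 2000 to 2003):

```python
import pandas as pd

days = ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05"]
# Births for those days, copied from the CDC and SSA extracts above.
cdc = pd.DataFrame({"date_str": days, "births": [8843, 7816, 11123, 12703, 12240]})
ssa = pd.DataFrame({"date_str": days, "births": [9083, 8006, 11363, 13032, 12558]})

# The default inner merge keeps only the days present in both sources.
both = cdc.merge(ssa, on="date_str", suffixes=("_cdc", "_ssa"))
both["ssa_minus_cdc"] = both["births_ssa"] - both["births_cdc"]
print(both)
```

Differences of 190 to 329 births per day are small relative to about 10,000 births, but too large to treat the two sources as interchangeable.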

Year analysis

Here we look at the total number of births for each year.
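Yearly totals are a simple group-by on the year column. A minimal sketch, assuming the DataFrame layout shown earlier; yearly_totals is a hypothetical name, and the demo uses only the ten CDC rows from the first extract, so its numbers are partial sums, not real yearly totals:

```python
import pandas as pd


def yearly_totals(df: pd.DataFrame) -> pd.Series:
    """Total number of births for each calendar year."""
    return df.groupby("year")["births"].sum()


# Only the ten CDC rows shown in the first extract: partial sums, not real totals.
sample = pd.DataFrame({
    "year": [1994] * 5 + [2003] * 5,
    "births": [8096, 7772, 10142, 11248, 11053, 8646, 7645, 12823, 14438, 12374],
})
totals = yearly_totals(sample)
print(totals)
```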

According to the graph of yearly totals, the number of births per year increased until 2007 and dropped noticeably in the following years. Could this be related to the subprime crisis?

Month analysis

Here we look at the total number of births for each month, regardless of the year. Since months do not all have the same number of days (not to mention February!), the data are normalized to the average number of births per day for each month.

Average daily births for CDC (summed over its 10 years): 108743

Average daily births for SSA (summed over its 15 years): 170246

Something interesting: there are more births during the summer (July, August, September) than in other months (around +4% to +6%). Given the length of a pregnancy (about nine months), this suggests that conceptions increase around November and December.
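Comparing each month's average births per day with the overall daily average is one way to express figures such as the +4% to +6% above. A minimal sketch, assuming the month and births columns shown earlier; monthly_deviation is a hypothetical name, using the mean over all days as the baseline is an assumption, and the demo data are made up:

```python
import pandas as pd


def monthly_deviation(df: pd.DataFrame) -> pd.Series:
    """Percent deviation of each month's mean daily births from the overall daily mean."""
    # Averaging per day (instead of summing) removes the effect of month length.
    per_day = df.groupby("month")["births"].mean()
    overall = df["births"].mean()
    return (per_day / overall - 1) * 100


# Made-up data: three January days and three July days.
toy = pd.DataFrame({
    "month": [1, 1, 1, 7, 7, 7],
    "births": [90, 90, 90, 110, 110, 110],
})
dev = monthly_deviation(toy)
print(dev.round(1))
```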

Does this mean that people are more eager to have a baby around Thanksgiving and Christmas? Or do they simply take fewer precautions during those festive periods?

These questions would need a dedicated analysis to be answered; here, they are just hypotheses.

Day of week analysis

Mean daily births for CDC: 10877

Mean daily births for SSA: 11351

More interesting findings!

Here the data show that births are clearly more frequent from Tuesday to Friday.

On Saturdays there are 20% to 25% fewer babies than average, and on Sundays 28% to 44% fewer!
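The weekday percentages compare each weekday's mean daily births with the overall daily mean. A minimal sketch, assuming the day_of_week_str and births columns shown earlier; weekday_deviation is a hypothetical name and the one-week demo data are made up:

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_deviation(df: pd.DataFrame) -> pd.Series:
    """Percent deviation of each weekday's mean births from the overall daily mean."""
    per_weekday = df.groupby("day_of_week_str")["births"].mean()
    # groupby sorts day names alphabetically; reindex restores calendar order.
    per_weekday = per_weekday.reindex(WEEKDAYS)
    return (per_weekday / df["births"].mean() - 1) * 100


# Made-up week whose daily mean is exactly 100.
toy = pd.DataFrame({
    "day_of_week_str": WEEKDAYS,
    "births": [110, 110, 110, 110, 110, 85, 65],
})
dev = weekday_deviation(toy)
print(dev.round(1))
```

Keeping the calendar order matters mostly for plotting: the weekend dip is easy to miss when Saturday and Sunday are scattered among the alphabetically sorted names.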

How can we explain that?

  • A PubMed article attributes it to caesareans being avoided on weekends for cost reasons (since caesareans can be planned);
  • We can also think of reasons on the mothers' side: they may have more activities with family and friends on weekends, and may be less prone to give birth then.

Again, these are just hypotheses, which would require additional analysis and discussion with specialists.

Add full moon data

After trying different methods to get all the full moons between 1994 and 2014, we used the Pylunar library for convenience. The data were spot-checked against several other sources.
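Pylunar's API is not reproduced here. As a dependency-free illustration of what listing all full moons involves, the sketch below swaps in a cruder technique: it steps through mean lunations using the mean new moon epoch (2000-01-06) and mean synodic month from Jean Meeus's Astronomical Algorithms, without the periodic correction terms. Mean phases can differ from the true ones by several hours, so a full moon close to midnight may land on the neighbouring day; this is an approximation, not a replacement for a proper ephemeris library. The function names are hypothetical, and dates are in UTC:

```python
import math
from datetime import date, datetime, timedelta, timezone

SYNODIC_MONTH = 29.530588861         # mean lunation length, in days (Meeus)
MEAN_NEW_MOON_2000 = 2451550.09766   # Julian Ephemeris Date of the mean new moon of 2000-01-06 (Meeus, k = 0)
UNIX_EPOCH_JD = 2440587.5            # Julian Date of 1970-01-01 00:00 UTC


def jd_to_utc(jd: float) -> datetime:
    # The ~1 minute gap between ephemeris time and UTC is ignored here.
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(days=jd - UNIX_EPOCH_JD)


def date_to_jd(d: date) -> float:
    # Julian Date at 00:00 UTC of the given calendar day.
    return UNIX_EPOCH_JD + (d - date(1970, 1, 1)).days


def approx_full_moon_dates(start: date, end: date) -> list[date]:
    """UTC dates of mean full moons between start and end (both inclusive)."""
    lo, hi = date_to_jd(start), date_to_jd(end) + 1
    # A mean full moon sits half a lunation after each mean new moon (k + 0.5).
    k = math.floor((lo - MEAN_NEW_MOON_2000) / SYNODIC_MONTH - 0.5)
    dates = []
    while True:
        jd = MEAN_NEW_MOON_2000 + (k + 0.5) * SYNODIC_MONTH
        if jd >= hi:
            break
        if jd >= lo:
            dates.append(jd_to_utc(jd).date())
        k += 1
    return dates


full_moons = approx_full_moon_dates(date(1994, 1, 1), date(2014, 12, 31))
print(len(full_moons), full_moons[:3])
```

A date column can then be flagged with something like df["date_str"].isin({d.isoformat() for d in full_moons}). Checking a few dates against published tables (for example 2011-07-15, which the SSA extract flags as a full moon) is a quick sanity check.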

In short, we added an is_full_moon column to each dataset:

CDC data (random records): Shape: (3652, 10) - extract:

year month date_of_month day_of_week births date date_str month_str day_of_week_str is_full_moon
2011 1999 7 5 1 8413 (1999, 7, 5) 1999-07-05 July Monday False
3464 2003 6 27 5 12678 (2003, 6, 27) 2003-06-27 June Friday False
1726 1998 9 23 3 12805 (1998, 9, 23) 1998-09-23 September Wednesday False
1571 1998 4 21 2 12213 (1998, 4, 21) 1998-04-21 April Tuesday False
2508 2000 11 13 1 11129 (2000, 11, 13) 2000-11-13 November Monday False

SSA data (random records): Shape: (5479, 10) - extract:

year month date_of_month day_of_week births date date_str month_str day_of_week_str is_full_moon
1598 2004 5 17 1 12172 (2004, 5, 17) 2004-05-17 May Monday False
2515 2006 11 20 1 14053 (2006, 11, 20) 2006-11-20 November Monday False
3001 2008 3 20 4 13614 (2008, 3, 20) 2008-03-20 March Thursday False
4939 2013 7 10 3 12915 (2013, 7, 10) 2013-07-10 July Wednesday False
5415 2014 10 29 3 12102 (2014, 10, 29) 2014-10-29 October Wednesday False

A full moon can overlap two calendar days, since the moon looks full for some hours around its peak. To keep the data consistent, the second day has been removed from the dataset, and only the first day is kept.
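Dropping the second day of a two-day full moon amounts to removing every flagged day whose previous day is also flagged. A minimal sketch with pandas, assuming one row per day sorted by date and a boolean is_full_moon column; the function name and demo data are hypothetical:

```python
import pandas as pd


def drop_second_full_moon_days(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first day of each run of consecutive full moon days."""
    flagged = df["is_full_moon"]
    # True when the previous row (the previous day) is also a full moon day.
    previous_flagged = flagged.shift(1, fill_value=False)
    return df[~(flagged & previous_flagged)]


toy = pd.DataFrame({"is_full_moon": [False, True, True, False, True, False]})
kept = drop_second_full_moon_days(toy)
print(kept.index.tolist())  # [0, 1, 3, 4, 5]
```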

Now that we know which days are full moons and which are not, the answer is close!

Three more columns are added:

  • The distance (in days) from the previous full moon
  • The distance (in days) until the next full moon
  • The signed distance (in days) to the closest full moon: negative before it, positive after it, 0 on the full moon day

NB: The first 14 days and the last 14 days of each dataset have been removed to avoid edge cases.
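The three columns can be derived from the is_full_moon flag by carrying the previous and the next full moon date along the rows. A minimal sketch with pandas, assuming one row per day sorted by date, a date_str column and a boolean is_full_moon column; add_full_moon_distances is a hypothetical name, and breaking ties (same distance to both full moons) towards the positive side is an assumption. In this sketch, days before the first or after the last flagged full moon get NaN, so such rows have to be dropped:

```python
import numpy as np
import pandas as pd


def add_full_moon_distances(df: pd.DataFrame) -> pd.DataFrame:
    """Add previous, next and signed closest full moon distances, in days."""
    day = pd.to_datetime(df["date_str"])
    full = day.where(df["is_full_moon"])  # NaT on non-full-moon days
    prev_full = full.ffill()              # latest full moon on or before each day
    next_full = full.bfill()              # first full moon on or after each day

    out = df.copy()
    out["days_from_previous_full_moon"] = (day - prev_full).dt.days
    out["days_until_next_full_moon"] = (next_full - day).dt.days
    # Signed distance to the closest full moon: negative before it, positive after.
    out["days_to_full_moon"] = np.where(
        out["days_until_next_full_moon"] < out["days_from_previous_full_moon"],
        -out["days_until_next_full_moon"],
        out["days_from_previous_full_moon"],
    )
    return out


# Made-up ten days with full moons on the 3rd and the 8th.
toy = pd.DataFrame({
    "date_str": pd.date_range("2000-01-01", periods=10).strftime("%Y-%m-%d"),
    "is_full_moon": [False, False, True, False, False, False, False, True, False, False],
})
res = add_full_moon_distances(toy)
print(res)
```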

CDC data (random records): Shape: (3567, 13) - extract:

year month date_of_month day_of_week births date date_str month_str day_of_week_str is_full_moon days_from_previous_full_moon days_until_next_full_moon days_to_full_moon
2874 2002 1 17 4 12077 (2002, 1, 17) 2002-01-17 January Thursday False 18 11 -11
3149 2002 10 24 4 12447 (2002, 10, 24) 2002-10-24 October Thursday False 3 27 3
2289 2000 5 31 3 12725 (2000, 5, 31) 2000-05-31 May Wednesday False 13 16 13
1318 1997 9 15 1 11584 (1997, 9, 15) 1997-09-15 September Monday False 28 1 -1
2099 1999 11 20 6 8630 (1999, 11, 20) 1999-11-20 November Saturday False 27 3 -3

SSA data (random records): Shape: (5365, 13) - extract:

year month date_of_month day_of_week births date date_str month_str day_of_week_str is_full_moon days_from_previous_full_moon days_until_next_full_moon days_to_full_moon
3166 2008 11 4 2 13244 (2008, 11, 4) 2008-11-04 November Tuesday False 21 9 -9
4132 2011 7 15 5 13097 (2011, 7, 15) 2011-07-15 July Friday True 0 0 0
2330 2006 7 7 5 14580 (2006, 7, 7) 2006-07-07 July Friday False 26 4 -4
261 2000 9 29 5 13045 (2000, 9, 29) 2000-09-29 September Friday False 16 14 -14
59 2000 3 7 2 12420 (2000, 3, 7) 2000-03-07 March Tuesday False 17 13 -13

Core analysis

Before looking at the core chart, one check is needed: each distance to the full moon should appear about the same number of times. If some distances were over-represented (which should not be the case, since the time range is long and the edges have been removed), the conclusions could be skewed.

The resulting graph shows that, for each dataset, there are exactly as many days 1 day before a full moon as there are full moons, and as many days 5 days after a full moon.
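The check boils down to counting rows per distance and comparing with the number of full moon days. A minimal sketch, assuming the days_to_full_moon and is_full_moon columns shown above; offset_counts is a hypothetical name, the default distances (-1, 0, 5) are the ones mentioned in the text, and the demo data are made up:

```python
import pandas as pd


def offset_counts(df: pd.DataFrame, offsets=(-1, 0, 5)) -> pd.Series:
    """Number of days observed at each requested distance from the closest full moon."""
    counts = df["days_to_full_moon"].value_counts()
    # Distances that never occur are reported as 0 instead of being missing.
    return counts.reindex(list(offsets), fill_value=0)


toy = pd.DataFrame({
    "days_to_full_moon": [-1, 0, 5, 3, -1, 0, 5, -7],
    "is_full_moon": [False, True, False, False, False, True, False, False],
})
counts = offset_counts(toy)
print(counts.to_dict(), "full moons:", int(toy["is_full_moon"].sum()))
```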

Since the day of the week influences births, we also have to check that full moons are evenly spread across the week: if, by chance, full moons fell mostly on weekends, the result would be biased.

Full moons are fairly evenly distributed across the week.
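The weekday spread can be checked by counting full moon days per weekday; if they were perfectly evenly spread, each share would be 1/7 (about 14.3%). A minimal sketch, assuming the day_of_week_str and is_full_moon columns shown above; the function name and demo data are hypothetical:

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def full_moon_weekday_share(df: pd.DataFrame) -> pd.Series:
    """Share of full moon days falling on each weekday (about 1/7 each if evenly spread)."""
    full_days = df.loc[df["is_full_moon"], "day_of_week_str"]
    shares = full_days.value_counts(normalize=True)
    # Weekdays without any full moon get a share of 0 instead of disappearing.
    return shares.reindex(WEEKDAYS, fill_value=0.0)


# Made-up data: two weeks, with a full moon on every day of the first one.
toy = pd.DataFrame({
    "day_of_week_str": WEEKDAYS * 2,
    "is_full_moon": [True] * 7 + [False] * 7,
})
shares = full_moon_weekday_share(toy)
print(shares.round(3))
```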

Here it comes!

If our hypothesis is true, something special should show up around full moons. Let's see!
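One way to compute such a chart is to average births for each value of days_to_full_moon. Since the weekday effect is much larger than the effect we are looking for, the sketch below first divides each day's births by the mean for its weekday; this weekday adjustment is a choice made for illustration, not necessarily what the original chart does. births_by_moon_offset is a hypothetical name and the demo data are made up:

```python
import pandas as pd


def births_by_moon_offset(df: pd.DataFrame) -> pd.Series:
    """Weekday-adjusted births at each distance from the full moon, in % vs expected."""
    # Expected births for each day = mean births of its weekday.
    weekday_mean = df.groupby("day_of_week_str")["births"].transform("mean")
    ratio = df["births"] / weekday_mean
    # Average the ratios per distance; 0% means "as many births as usual".
    return (ratio.groupby(df["days_to_full_moon"]).mean() - 1) * 100


toy = pd.DataFrame({
    "day_of_week_str": ["Monday", "Monday", "Sunday", "Sunday"],
    "births": [110, 90, 55, 45],
    "days_to_full_moon": [0, 1, 0, 1],
})
curve = births_by_moon_offset(toy)
print(curve.round(1))
```

With the real data, a curve staying around 0% for every distance, within the day-to-day noise, is what "no influence" looks like; a real effect would show up as a bump near distance 0.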

Conclusion

The previous chart shows no particular behavior around full moons that would indicate an influence on natality.

Full moons have no influence whatsoever on natality: it is just a myth.