Modelling values by overlaying existing distributions

by JDraper   Last Updated May 15, 2019 16:19 PM

Imagine I have two datasets. The first has values of a dependent variable, time spent walking, along with many independent variables such as gender, age group, day of week and dog ownership. The second has a lot less data: just the independent variables and the number of people recorded as walking for each particular combination of gender, age group etc.

My question:

If I use the first dataset to construct a distribution of walking times for each level of the data, e.g. one distribution for males, one for females, one for Wednesdays etc. (provided there are sufficient data points; I've read 100 is a good rule of thumb), would I then be able to use these to obtain a distribution, and subsequently (via something like inverse transform sampling) a collection of data points, for each row of the second dataset?

For example, could I get a distribution of walking times for females aged 40-50 who own a dog, on a Wednesday, by overlaying the underlying distributions?
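To make the sampling step concrete, here is a minimal sketch of inverse transform sampling from the empirical distribution of a single level (say, females). The data are simulated and the names `female_times`, `empirical_inverse_cdf` and `sample_walking_times` are made up for illustration. It only covers drawing from one per-level distribution; it does not show how the per-level distributions would be combined, which is the part I am unsure about.

```python
import numpy as np


def empirical_inverse_cdf(samples):
    """Return a function mapping u in [0, 1] to an empirical quantile of samples."""
    sorted_samples = np.sort(np.asarray(samples, dtype=float))

    def inverse_cdf(u):
        # np.quantile interpolates linearly between order statistics by default
        return np.quantile(sorted_samples, u)

    return inverse_cdf


def sample_walking_times(samples, n_draws, rng):
    """Inverse transform sampling: push uniform draws through the inverse CDF."""
    inverse_cdf = empirical_inverse_cdf(samples)
    u = rng.uniform(0.0, 1.0, size=n_draws)
    return inverse_cdf(u)


rng = np.random.default_rng(0)

# Hypothetical walking times in minutes for one level (simulated, not real data)
female_times = rng.gamma(shape=2.0, scale=15.0, size=500)

# Generate 1000 synthetic walking times that follow the empirical distribution
draws = sample_walking_times(female_times, n_draws=1000, rng=rng)
```

Each draw lies between the smallest and largest observed value, so this approach cannot produce walking times outside the range seen in the first dataset.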

Many thanks, J
