Should be doable. Not sure I'll have time to do any of this stuff today as we've got company tonight and I have work (I just keep popping on for a few mins when waiting to hear back from the office!), but if not, it'll be tomorrow, I expect.
158 posts
Page 6 of 11
Work???!! I wouldn't let that get in the way... but I suppose we'll let you use the excuse of having guests
Moogie wrote: At the moment it loops through the data, adds up the losses (start weight - current weight) and divides by the cumulative number of weeks (calculated from users' start dates to last weigh-in date). Not quite sure how I'd do a median with that, as I said I'm no mathematician!
I consulted with a Maths undergraduate and I think the code could work like this:
- calculate the daily loss for each user for each day (by simple averaging between the appropriate dates), and record each daily loss as an entry. This will give you a big dataset but shouldn't be a problem for a computer. To give an example: a member who has provided three data points - 1st Jan 75kg, 8th Jan 74kg, 24th Jan 73kg. The weight loss for days 1-7 is 1/7 kg per day, for days 8-23 is 1/16 kg per day. So this one user would contribute to the overall dataset a total of 23 entries: 7 entries of 0.143 and 16 entries of 0.0625.
- once you have the dataset of all the values for all the users (or filtered just for those you want to study) sort these values into numerical order (smallest to largest or largest to smallest it doesn't matter)
- take the middle value i.e. if there are 99 results you use the 50th
- if there are an even number of values you just take a simple average of the two middle values i.e. for 100 results you use (50th+51st)/2
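The steps above can be sketched roughly as follows. This is only an illustration in Python, not the tracker's actual code: the function name `daily_losses` and the year 2013 are assumptions made for the example, and the weights are the ones from the worked example (75kg, 74kg, 73kg). Python's built-in `statistics.median` already averages the two middle values when the count is even.

```python
from datetime import date
from statistics import median

def daily_losses(weigh_ins):
    """Expand one member's (date, weight_kg) weigh-ins into one entry per day.

    The loss between two weigh-ins is spread evenly over the days between them.
    (Hypothetical helper, named for this example only.)
    """
    ordered = sorted(weigh_ins)
    entries = []
    for (d0, w0), (d1, w1) in zip(ordered, ordered[1:]):
        days = (d1 - d0).days
        if days <= 0:
            continue  # ignore two weigh-ins recorded on the same date
        entries.extend([(w0 - w1) / days] * days)
    return entries

# The member from the example: 1st Jan 75kg, 8th Jan 74kg, 24th Jan 73kg.
# The year is an assumption; it does not affect the result.
member = [
    (date(2013, 1, 1), 75.0),
    (date(2013, 1, 8), 74.0),
    (date(2013, 1, 24), 73.0),
]

entries = daily_losses(member)
print(len(entries))     # 23 entries: 7 of 1/7 kg and 16 of 1/16 kg
print(median(entries))  # 0.0625 kg per day for this single member
```

For the real figure the entries from all members (or the filtered subset) would be pooled into one list before taking the median.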
The same system could be used for subsets (filtered results) and it could also be used for carorees's idea that we have results for the 1st week of diet, 2nd week of diet etc. - to do this we just make sure that the original dataset also includes (in another 'column' of the array) for each entry the day of the diet (where the first day for that member, whatever the actual date, is 1): then we can easily filter the data for days 1-7, 8-14 etc.
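A possible sketch of the extra 'day of the diet' column, again in Python and again with made-up names (`daily_losses_with_day`, `median_for_week`) that are not part of the tracker. Each entry becomes a pair of (day of diet, loss in kg per day), so filtering for days 1-7, 8-14 and so on is a simple range check.

```python
from datetime import date
from statistics import median

def daily_losses_with_day(weigh_ins):
    """Return (day_of_diet, loss_kg_per_day) pairs for one member.

    Day 1 is the member's first weigh-in date, whatever the calendar date.
    """
    ordered = sorted(weigh_ins)
    if not ordered:
        return []
    start = ordered[0][0]
    rows = []
    for (d0, w0), (d1, w1) in zip(ordered, ordered[1:]):
        days = (d1 - d0).days
        if days <= 0:
            continue  # ignore two weigh-ins recorded on the same date
        rate = (w0 - w1) / days
        first_day = (d0 - start).days + 1
        rows.extend((first_day + i, rate) for i in range(days))
    return rows

def median_for_week(rows, week):
    """Median daily loss over one week of the diet (week 1 = days 1-7)."""
    first, last = 7 * (week - 1) + 1, 7 * week
    values = [loss for day, loss in rows if first <= day <= last]
    return median(values) if values else None  # None when nobody has data

# Same example member as before; the year is an assumption.
member = [
    (date(2013, 1, 1), 75.0),
    (date(2013, 1, 8), 74.0),
    (date(2013, 1, 24), 73.0),
]
rows = daily_losses_with_day(member)
print(median_for_week(rows, 2))  # 0.0625 (days 8-14 all fall in the 1/16 kg stretch)
```

With many members, the rows from everyone would be combined first and `median_for_week` run on the combined list.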
That's the theory, I think! How to code it is another matter, but I have some experience with php if any help is needed.
Thanks for that dominic. It could end up a bit intensive in terms of server load but sounds possible at least. I'll have to see if/when I have time to make those modifications.
The more simple option for me to discount vastly different/likely erroneous user data would be to check the bmi & w:h ratio of a user when calculating their current loss, and simply not add in their data if it seems absurd (ie, BMI or w:h beyond or below reasonable expectations). We'll see.
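A very rough sketch of what such a BMI check might look like, in Python. The BMI formula (weight in kg divided by height in metres squared) is standard, but the bounds of 12 and 70 are placeholders invented for this example, not values anyone in the thread has agreed on, and the function names are hypothetical. A waist-to-height check could follow the same pattern.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def looks_plausible(weight_kg, height_m, bmi_range=(12.0, 70.0)):
    """Return True if the BMI falls inside the given range.

    The default range is an illustrative placeholder, not a medical threshold.
    """
    low, high = bmi_range
    return low <= bmi(weight_kg, height_m) <= high

print(looks_plausible(75, 1.75))   # True: BMI is about 24.5
print(looks_plausible(750, 1.75))  # False: probably a typing slip (750 for 75)
```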
I'll probably end up trying your way if only because I'm a sucker for requests and trying to please people!
Thanks for your reply Moogie, I think this progress tracker is really amazing. If the data can be handled 'properly' the results it can give might be not only anecdotally fascinating but (I think) have real scientific validity. That's because it will be based on a large 'population' (241 people already!) and you have built in quite a good range of filters (type of diet, age, starting BMI etc.)
I agree it would be really helpful for individuals entering their data if you could get the system to pre-vet figures and refuse data entries that are obviously wrong, giving the member the chance to provide correct data instead. But my feeling is that this is not so important for the overall dataset because a few 'ridiculous' pieces of data will have no effect on the overall figures with the median-based approach (whereas they can skew the present mean-based figures).
If it was possible, it would be really interesting to get the views of someone who has actually worked in this area for research i.e. statistical analysis of population data. I am sure this is all very well-worn ground for those in the know. If anyone is reading this and has the knowledge or knows where to get it, please chime in!
Someone posted on Facebook that the tracker was showing a very tiny weekly loss for women aged 25-35 who were overweight. I checked and it came out at 0.06 kg/week, which doesn't sound right. Looking at the pie charts, it looks like at least one person contributing data in this subgroup has been on the diet for over a year, and I wondered if perhaps that was a mistake (i.e. starting date 7/1/12 entered instead of 7/1/13). Don't know if there's a way of checking this? Would have to identify the user and PM them to check, I suppose. Alternatively, if we could filter by time on diet too, that would help.
Hoping to make those additions, including a graph of medians, in the next couple of days. Will also set the tracker up to default to today's date for the start date, in the hope it will prevent users selecting the wrong year.
Have put more checks in place on tracker signup to ensure bad data is minimised. Will also add this to the data update page.
Moogie wrote: hoping to make those additions including a graph of medians in the next couple of days...
Wow! I'm well impressed! Thank you.
Well, I did only say *hoping*. Might end up taking me longer, but that post of yours was really helpful so I already have the code planned out in my head more or less.
Just one thing, if we're calculating the median daily loss as per your post, should I then multiply by 7 to get the median weekly loss? I think that will chart better than daily medians, better looking numbers. Makes so much sense to divide the loss by the number of days since the previous weigh in, that didn't even occur to me and I was wondering how it could be calculated when everyone enters weigh ins at different intervals!
Don't worry I won't hold you to the 'couple of days'
In my post I was thinking about how the data would be calculated internally not about how it would be displayed. I then thought (after I posted) that the easiest might be to record weight changes in the dataset in grams per day (or are they grammes? It's getting too late to think straight...), as integers, because it requires less storage and would be much easier/faster for the sorting algorithm than using decimals.
But I agree when it comes to displaying the data it makes more sense to display losses (or gains!) per week in kg or lbs, because that is how we all seem to talk about them.
Whether you want to multiply by 7 as you store the data or as you display is mostly a matter of what is convenient for you, I think!
Good luck and happy fasting tomorrow (for many of us here I think!)
I didn't think you meant to display it in grams per day. Was just making sure that the idea of multiplying by 7 to get the weekly amount wasn't somehow going to mess up the median drastically!
Really want to start coding it now, but I've left it a bit late tonight and mustn't get my brain too thinky at bedtime. So, instead I've done something else useful and PM'd the half dozen or so users whose start dates were listed as pre August last year to confirm whether the info is correct or not.
Personally I prefer the idea of holding a 'real' figure of (say) grams lost per day in the dataset because if one is ever looking at the raw data (which only you would, I guess) it means something real. But it doesn't affect the median calculation if the underlying figures are all multiplied by 7 or indeed any number you like.
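A small check of that point, sketched in Python with invented numbers (the list below is hypothetical, with one deliberately absurd entry of 5000 grams per day). It stores whole grams per day as integers and shows that taking the median and then multiplying by 7 gives the same answer as multiplying every entry by 7 first.

```python
from statistics import mean, median

# Hypothetical daily changes in whole grams per day (negative = a gain).
# The 5000 is a deliberately bad entry, e.g. from a mistyped weight.
grams_per_day = [143, 63, -20, 0, 95, 5000]

daily = median(grams_per_day)                     # median in grams per day
weekly = median([g * 7 for g in grams_per_day])   # scale every entry, then take the median

print(daily * 7 == weekly)  # True: the order of the two steps does not matter
print(weekly / 1000)        # 0.553 kg per week, ready for display
print(mean(grams_per_day))  # about 880 g per day: the mean is dragged up by the bad entry
```

So the multiplication can happen at display time, leaving the stored or generated values as plain grams per day.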
Apart from the median, it would also be possible to get a standard deviation from the dataset and I think that would give a measure of how reliable it was. Very reliable, I think, once it is working with a population of 100+ and lots of dates, even though some of the individual figures may be wrong, because those will just be outliers.
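For anyone wondering what that would involve, here is a minimal sketch in Python using made-up figures in grams per day. `statistics.pstdev` gives the standard deviation treating the list as the whole population (`statistics.stdev` is the sample version). One caveat worth keeping in mind: unlike the median, the standard deviation is itself pulled about by extreme entries, as the second half of the example shows.

```python
from statistics import median, pstdev

# Hypothetical daily losses in grams per day, invented for illustration.
losses = [20, 40, 40, 40, 50, 50, 70, 90]

spread = pstdev(losses)
print(median(losses), spread)  # 45.0 20.0

# Add one absurd entry and compare: the median barely moves,
# while the standard deviation grows enormously.
with_outlier = losses + [5000]
spread_with_outlier = pstdev(with_outlier)
print(median(with_outlier), spread_with_outlier)
```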
Time to stop thinking and start sleeping...
Good plan. Sleepy time. I'm not going to try to remember what a standard deviation is, but you can tell me all about it tomorrow!
I won't actually be storing the grams lost per day data I don't think, just generating it on the fly - otherwise I'll have to set it up to recheck the data periodically in case anyone has updated/amended a set of weigh in stats.
Bedtime now. All this stuff can keep me nice and busy on my fast tomorrow if work doesn't send much through (bet they will now)
Hi
I don't really need to post a reply. But I cannot seem to PM Moogie. Yes I was one of those that put in the wrong date!!!
Sorry.
Thanks for letting me know! I don't think it lets users who haven't posted yet send PMs, in case they're spammers. You should be able to message now. I'll fix the year on your start date for you!