Any Statistics/Maths Nerds out there? (SleepyHead)
jedimark Offline

Moderator - SleepyHead

Posts: 263
Joined: Mar 2012

Machine: ResMed AirSense 10
Mask Type: Nasal pillows
Mask Make & Model: Swift FX
Humidifier: H5i & Climateline
CPAP Pressure: 7-16
CPAP Software: SleepyHead

Other Comments: Author of SleepyHead, sleepyhead*AT*jedimark.net

Sex: Male
Location: Bundaberg, Australia

Post: #1
Any Statistics/Maths Nerds out there? (SleepyHead)
An old problem with summary-only data has reared its ugly head, and it's time I found a better way to deal with it in SleepyHead.

This doesn't just affect ResMed machines, but for simplicity's sake I'm only referencing ResMed data here.

STR.edf provides a table of summary data for each day, with up to one year's worth of information.

For each day, it provides 50th Percentile, 95th Percentile and Maximum statistics for Leak, Pressure and various other data channels.

For days where no PRD/BRP .edf files are available, this is all the data available for these channels.

Each STR.edf record also provides a list of mask on/off times, which can be used to rebuild a day's session times. (Sessions that are close together are merged in an annoying way, but it's enough.)

SleepyHead uses a per-session model to store data indexes, not a per-day one. It's rather difficult to make data that is only available once per day fit SleepyHead's per-session storage model. Where data isn't available to recalculate indexes correctly, I have to use a hack to divide the percentile figures between sessions, so that the daily calculation gives back the correct answer.

On SleepyHead's statistics page, each calculation that looks over a time period works out percentiles without having to read the entire data set. Instead, it uses a daily sum of a per-session index that holds a frequency count of each possible value (and time weights).
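A minimal sketch of that idea, assuming each session's index is a plain table mapping a value to the time spent at that value. The names `merge_histograms` and `percentile_from_histogram` are hypothetical, the sample numbers are made up, and this is not SleepyHead's actual code. It uses a nearest-rank rule, which may differ from the rule SleepyHead applies.

```python
from collections import Counter


def merge_histograms(session_histograms):
    """Sum per-session {value: weight} tables into one combined table."""
    combined = Counter()
    for histogram in session_histograms:
        combined.update(histogram)  # adds the weights value by value
    return combined


def percentile_from_histogram(histogram, pct):
    """Nearest-rank percentile: the smallest value whose cumulative
    weight reaches pct percent of the total weight."""
    total = sum(histogram.values())
    if total <= 0:
        raise ValueError("histogram is empty")
    running = 0
    for value in sorted(histogram):
        running += histogram[value]
        if running * 100 >= pct * total:
            return value
    return max(histogram)  # only reached through rounding with float weights


# Made-up example: weights are minutes spent at each pressure value.
session_a = {6.0: 100, 7.0: 20}
session_b = {7.0: 30, 9.0: 10}
merged = merge_histograms([session_a, session_b])
median = percentile_from_histogram(merged, 50)
p95 = percentile_from_histogram(merged, 95)
```

The point of the design is that only the small value-to-weight tables need to be added together for any time period, so the raw samples never have to be reloaded. It also shows why summary-only days are a problem: there is no table to add.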

For summary-only days it is impossible to generate this frequency data, because only the median, 95th percentile and maximum are known. If I had access to minimum values for all channels (some are obviously zero), I know curve-fitting algorithms exist that could help generate a rough estimate, but that's a messy solution I'd rather avoid.

So statistics page summaries are somewhat inaccurate wherever summary-only data is present, until something is done about this issue.

This stuff is the reason I had to lock ResMed data's day splitting down to Noon split without close session sorting. Otherwise the way I divide the daily values between sessions would fail and give incorrect results. I know no other way to do this. :/

I'm half wondering what cheat ResScan uses, and I say cheat, because I highly doubt the same minds that thought up ResMed's summary file format mess could achieve a classy solution to this.

If any maths gurus can grok what I mean, and have any suggestions on this, I'd be very grateful.

I don't expect to find a perfect solution to this problem.. what I'm looking for is a compromise that the majority of users will be happy with. :-/
06-30-2014 07:17 AM
Sleepster Offline
Wiki Editor
Moderators

Posts: 4,989
Joined: Feb 2012

Machine: ResMed AirCurve10 VAuto
Mask Type: Full face mask
Mask Make & Model: F&P Simplus
Humidifier: HumidAir and SlimLine Hose
CPAP Pressure: MaxI 13.6 | MinE 5.2 | PS 4.4
CPAP Software: ResScan SleepyHead

Other Comments: Diagnosed Nov 2011. Conquered aerophagia.

Sex: Male
Location: Houston, Texas

Post: #2
RE: Any Statistics/Maths Nerds out there?
I'm not sure I understand the problem, but I'll take a stab at what I think might be a solution. If I've missed the mark completely just let me know. It won't be the first time I've found a wrong solution to the wrong problem. ;)

Let's say you have three sessions:

Session 1 lasts 3 hours and has a 95th percentile of 8.
Session 2 lasts 4 hours and has a 95th percentile of 5.
Session 3 lasts 2 hours and has a 95th percentile of 9.

We take a weighted average:

(3*8 + 4*5 + 2*9)/(3 + 4 + 2)
= (24 + 20 + 18)/(3 + 4 + 2)
= 62/9

For an overall 95th percentile of about 6.9.
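
The same arithmetic as a short Python sketch. The function name `weighted_average` is just an illustrative choice; the numbers are the three sessions from this post.

```python
def weighted_average(values, weights):
    """Average of values, each weighted by the matching entry in weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


session_p95 = [8, 5, 9]    # 95th percentile of each session
session_hours = [3, 4, 2]  # duration of each session in hours

estimate = weighted_average(session_p95, session_hours)
print(round(estimate, 1))  # 6.9
```

Note that this is an average of percentiles, which is not in general the same number as the percentile of the combined data.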

Sleepster
Apnea Board Moderator
www.ApneaBoard.com


INFORMATION ON APNEA BOARD FORUMS OR ON APNEABOARD.COM SHOULD NOT BE CONSIDERED AS MEDICAL ADVICE. ALWAYS SEEK THE ADVICE OF A PHYSICIAN BEFORE SEEKING TREATMENT FOR MEDICAL CONDITIONS, INCLUDING SLEEP APNEA. INFORMATION POSTED ON THE APNEA BOARD WEB SITE AND FORUMS ARE PERSONAL OPINION ONLY AND NOT NECESSARILY A STATEMENT OF FACT.
06-30-2014 09:45 AM
Sleepster Offline

Post: #3
RE: Any Statistics/Maths Nerds out there?
Also, let me know if you want me to move this thread to the main forum where more nerds might see it. :)

Sleepster
Apnea Board Moderator
www.ApneaBoard.com


06-30-2014 09:46 AM
diamaunt Offline

Members-b

Posts: 436
Joined: May 2012

Machine: S9 VPAP Auto 36006
Mask Type: Full face mask
Mask Make & Model: mirage quattro/pilairo q
Humidifier: none
CPAP Pressure: 9-20ish
CPAP Software: SleepyHead

Other Comments:

Sex: Male
Location: texas

Post: #4
RE: Any Statistics/Maths Nerds out there?
(06-30-2014 09:46 AM)Sleepster Wrote:  Also, let me know if you want me to move this thread to the main forum where more nerds might see it. :)
not just nerds, MATH nerds.. (hello robysue ;)
06-30-2014 11:34 AM
jedimark Offline


Post: #5
RE: Any Statistics/Maths Nerds out there?
I probably should have posted this in main.. :-}
06-30-2014 11:45 AM
jedimark Offline


Post: #6
RE: Any Statistics/Maths Nerds out there?
(06-30-2014 09:45 AM)Sleepster Wrote:  I'm not sure I understand the problem, but I'll take a stab at what I think might be a solution. If I've missed the mark completely just let me know. It won't be the first time I've found a wrong solution to the wrong problem. ;)

Let's say you have three sessions:

Session 1 lasts 3 hours and has a 95th percentile of 8.
Session 2 lasts 4 hours and has a 95th percentile of 5.
Session 3 lasts 2 hours and has a 95th percentile of 9.

We take a weighted average:

(3*8 + 4*5 + 2*9)/(3 + 4 + 2)
= (24 + 20 + 18)/(3 + 4 + 2)
= 62/9

For an overall 95th percentile of about 6.9.

I use this method already in the Overview graphs legend calculations, which need to be calculated quickly within a single frame. It sort of feels like cheating.

This method does not give the same result as when you combine all the samples overall for those 3 sessions, for a particular data channel, rank them in order (according to duration weights where necessary) and take the sample closest to the 95.0th percentile.

I guess the question arises, does the average person really give a crap whether it's a true percentile given, or a weighted average for a multi-day time period?

Does it still give a statistically valid answer to someone who wants to know what their 95th percentile pressure was over, for example, a 3-month period?

Perhaps the weighted average is what the user really wants to see in the statistics page?

Perhaps I'm overthinking it... but I just don't want it to intentionally give a "wrong" answer. :/
(This post was last modified: 06-30-2014 12:21 PM by jedimark.)
06-30-2014 12:20 PM
SuperSleeper Offline

Administrators

Posts: 9,961
Joined: Feb 2012

Machine: PR System One REMstar Auto (DS560)
Mask Type: Nasal pillows
Mask Make & Model: ResMed Mirage Swift II
Humidifier: none
CPAP Pressure: 12.5 - 18.5 cmH20 (auto range)
CPAP Software: SleepyHead

Other Comments: Have diabetes Type II

Sex: Male
Location: Illinois, USA

Post: #7
RE: Any Statistics/Maths Nerds out there?
(06-30-2014 11:45 AM)jedimark Wrote:  I probably should have posted this in main.. :-}

Thread has been moved to the Main Forum. Thanks Mark.

SuperSleeper
Apnea Board Administrator
www.ApneaBoard.com



06-30-2014 01:42 PM
robysue Offline
Wiki Editor
Advisory Members

Posts: 1,226
Joined: Oct 2013

Machine: PR Dreamstation BiPAP Auto
Mask Type: Nasal pillows
Mask Make & Model: Swift FX
Humidifier: PR Dreamstation humidfier
CPAP Pressure: min EPAP = 4; max IPAP = 9;
CPAP Software: SleepyHead EncoreBasic EncorePro

Other Comments: Papping since September 2010

Sex: Female
Location: Buffalo, NY

Post: #8
RE: Any Statistics/Maths Nerds out there? (SleepyHead)
(06-30-2014 11:34 AM)diamaunt Wrote:  
(06-30-2014 09:46 AM)Sleepster Wrote:  Also, let me know if you want me to move this thread to the main forum where more nerds might see it. :)
not just nerds, MATH nerds.. (hello robysue ;)

The problem JediMark is facing is that ResMed itself is doing something really kludgy with the statistics AND its kludge is not documented.

From a math point of view, there's just no sound mathematical way of approximating the 95th percentile and median (50%) of a large data set composed of several disjoint subsets of data when all you know is the 95%, median, and the size of each of the subsets.

Yes, you can find a weighted average of the 95% percentiles (or the medians) and hope that it's "close enough". (And for many data sets, that estimate may indeed be close enough.) But the distribution of the data in each of the subsets is pretty important: If the data sets all look pretty much the same, then averaging the 50% and 95% will probably give you a decent enough estimate for the 50% and 95% of the whole data set. But if the data varies substantially from subset to subset, then some pretty wild things can happen.

As an example: Here's a collection of three data subsets where averaging the percentiles may not be a good idea for estimating the percentiles for the whole set:

Let's suppose we have one data point for each minute, and we have the following data sets

Data Set 1:
  • 3 hours of data (180 data points) with this distribution:
      160 points have value 6 (i.e. 2:40, or 160 minutes, of data have value 6)
      10 points have value 7
      5 points have value 8
      3 points have value 9
      2 points have value 10
    The median for Data Set 1 is 6, and since 0.95*180 = 171 and the 171st number on our list is an 8, the 95% for Data Set 1 equals 8.

Data Set 2:
  • 2 hours of data (120 data points) with this distribution:
      50 points have value 6 (i.e. 50 minutes of data have value 6)
      60 points have value 7
      3 points have value 8
      4 points have value 9
      3 points have value 10
    The median for Data Set 2 is 7, and since 0.95*120 = 114 and the 114th number on our list is a 9, the 95% for Data Set 2 equals 9.

Data Set 3:
  • 3:20 or 3.3333 hours of data (200 data points) with this distribution:
      65 points have value 6 (i.e. 65 minutes of data have value 6)
      30 points have value 7
      25 points have value 8
      20 points have value 9
      12 points have value 10
      12 points have value 11
      10 points have value 12
      10 points have value 13
      10 points have value 14
      6 points have value 15
    The median for Data Set 3 is 8, and since 0.95*200 = 190 and the 190th number on our list is a 14, the 95% for Data Set 3 equals 14.

Weighted average of the medians vs. true median of the large data set
There are 180+120+200 = 500 minutes = 8.333 hours in the data.
So the weighted average of the medians is:

Weighted average of the 50% numbers
= (6*3 + 7*2 + 8*3.333)/(3 + 2 + 3.333)
= (18 + 14 + 26.664)/8.333
= 58.664/8.333
= 7.04

But of the 500 numbers in the large data set, 160+50+65 = 275 of them are 6's. Hence the median of the large data set is 6.0. Given the fact that the largest number in the data set is a 15, that means averaging the medians overestimates the size of the true median, perhaps significantly.


Weighted average of the 95% vs. true 95% of the large data set
We compute the weighted average of the 95% as follows:

Weighted average of the 95% numbers
= (8*3 + 9*2 + 14*3.333)/8.333
= (24 + 18 + 46.662)/8.333
= 88.662/8.333
= 10.64

Now let's find the true 95% for the large data set. Since the large data set contains 500 numbers, the 95% of the large data set is the 475th number on the whole list (0.95*500 = 475), and when we aggregate Data Sets 1, 2, and 3 together into one large list, here are the tallies:

Large data set looks like this
  • 8:20 or 8.3333 hours of data (500 data points) with this distribution:
      275 points have value 6 (i.e. 275 minutes of data have value 6)
      100 points have value 7
      33 points have value 8
      27 points have value 9
      17 points have value 10
      12 points have value 11
      10 points have value 12
      10 points have value 13
      10 points have value 14
      6 points have value 15

And the 475th number on this list is a 13, so the 95% for the whole, large data set is 13.

And when we compare the difference between:
    Weighted average of the 95% numbers = 10.64
and
    True 95% for large data set = 13
we've clearly got a problem using the weighted average of the 95% numbers as an estimate for the 95% number since the weighted average of the 95% numbers seriously underestimates the true 95% for the large set.
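
The comparison can be checked with a short Python sketch. The tallies are copied from the three data sets above; the function names are illustrative choices, and the percentile rule is the same "k-th number on the sorted list" convention used in the worked example.

```python
from collections import Counter

# value -> number of one-minute data points, copied from the data sets above
SET_1 = {6: 160, 7: 10, 8: 5, 9: 3, 10: 2}
SET_2 = {6: 50, 7: 60, 8: 3, 9: 4, 10: 3}
SET_3 = {6: 65, 7: 30, 8: 25, 9: 20, 10: 12,
         11: 12, 12: 10, 13: 10, 14: 10, 15: 6}


def nearest_rank(tally, pct):
    """Smallest value whose cumulative count reaches pct percent of all points."""
    total = sum(tally.values())
    running = 0
    for value in sorted(tally):
        running += tally[value]
        if running * 100 >= pct * total:
            return value
    raise ValueError("tally is empty")


def weighted_average_of_percentiles(tallies, pct):
    """Average the per-set percentiles, weighting each set by its size."""
    sizes = [sum(t.values()) for t in tallies]
    weighted = sum(nearest_rank(t, pct) * n for t, n in zip(tallies, sizes))
    return weighted / sum(sizes)


sets = [SET_1, SET_2, SET_3]
pooled = Counter()
for tally in sets:
    pooled.update(tally)  # builds the 500-point "large data set"

for pct in (50, 95):
    print(pct, weighted_average_of_percentiles(sets, pct), nearest_rank(pooled, pct))
```

For these tallies the weighted averages come out at 7.04 and 10.64, against pooled values of 6 and 13, matching the figures worked out by hand.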

A final comment: Is this kind of an example even relevant to CPAP data? Well the answer to that question is really to consider why averaging the 95% numbers for these three data sets fails so miserably at finding the 95% for the whole data set: The problem is that the upper tail of data (the highest data numbers) all belong to ONE of the data subsets.

One situation where this can occur is in the pressure curves for an APAP: If some sessions have little or no supine (or REM) sleep, the pressure may never get very high. But if one session contains a significant amount of REM or supine sleep, the pressure numbers for that session may be high enough that the 95% for the entire night is only reached during that session. And averaging the 95% for the individual sessions may wind up underestimating the 95% for the whole night, perhaps significantly.

Other strategies?
You can try to fix the "weighted average of the percentiles" by taking into account that you do have both the 50% and 95% numbers for each session (subset of data). And the max. (But not the min for some data?) So you have maybe 4 things to work with in coming up with a "fit" model for the missing data. That's not really much to work with given the size of the data sets.
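
One possible shape for such a "fit" model, offered only as a rough sketch under strong assumptions: a minimum is known or guessed for each session, and the data is spread evenly between the known points (minimum, median, 95%, maximum). Nothing in the thread says ResMed or SleepyHead does this. All names and the sample numbers below are hypothetical.

```python
def knots_from_summary(minimum, median, p95, maximum):
    """Known points of one session's cumulative distribution (value, fraction)."""
    return [(minimum, 0.0), (median, 0.5), (p95, 0.95), (maximum, 1.0)]


def session_cdf(x, knots):
    """Fraction of the session at or below x, assuming straight lines
    between the known points (this linearity is the big assumption)."""
    if x < knots[0][0]:
        return 0.0
    reached = 0.0
    for (x0, f0), (x1, f1) in zip(knots, knots[1:]):
        if x >= x1:
            reached = f1          # x is past this whole segment
        elif x >= x0:             # x0 <= x < x1, so x1 - x0 > 0
            return f0 + (f1 - f0) * (x - x0) / (x1 - x0)
    return reached


def estimate_pooled_percentile(sessions, fraction, tol=1e-6):
    """Bisect for the value where the duration-weighted mix of the
    per-session curves first reaches the requested fraction."""
    total_hours = sum(s["hours"] for s in sessions)
    lo = min(s["knots"][0][0] for s in sessions)
    hi = max(s["knots"][-1][0] for s in sessions)

    def mixture(x):
        return sum(s["hours"] * session_cdf(x, s["knots"]) for s in sessions) / total_hours

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mixture(mid) >= fraction:
            hi = mid
        else:
            lo = mid
    return hi


# Made-up numbers for illustration only.
night = [
    {"hours": 3.0, "knots": knots_from_summary(4.0, 6.0, 8.0, 10.0)},
    {"hours": 2.0, "knots": knots_from_summary(4.0, 5.0, 5.5, 7.0)},
]
estimate = estimate_pooled_percentile(night, 0.95)
```

With a single session this reproduces that session's own median and 95%. With several sessions, the result is only as good as the straight-line guess about how the data falls between the known points.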

And that's why JediMark is stuck.

Unfortunately, I'm not a statistician. So I really don't know what they do when faced with this kind of a situation.

The real question is just how does a ResMed machine calculate these numbers when the SD card is not in place and it's only recording the summary data?

My own guess is they fake it with something that's "good enough" when the data sets are similar (such as a weighted average of averages), and simply don't worry about the fact that this can lead to garbage statistical data when there's some wide variation in a few of the sessions ..

Questions about SleepyHead?
See my Guide to SleepyHead
(This post was last modified: 06-30-2014 04:50 PM by robysue.)
06-30-2014 03:03 PM
Sleepster Offline

Post: #9
RE: Any Statistics/Maths Nerds out there?
(06-30-2014 12:20 PM)jedimark Wrote:  Perhaps I'm overthinking it... but I just don't want it to intentionally give a "wrong" answer. :/

Well, I guess you could call it the weighted average of the percentiles so your conscience wouldn't bother you. :)

Let's see if I've at least stated the problem correctly:

Quote: Let's say you have three sessions:

Session 1 lasts 3 hours and has a 95th percentile of 8.
Session 2 lasts 4 hours and has a 95th percentile of 5.
Session 3 lasts 2 hours and has a 95th percentile of 9.

What is the 95th percentile for the entire 9-hour period that spans those three sessions?

Solution
During the 1st session 95% of the time the readings were below 8. 95% of 3 hours is 2.85 hours.

During the 2nd session 95% of the time the readings were below 5. 95% of 4 hours is 3.8 hours.

During the 3rd session 95% of the time the readings were below 9. 95% of 2 hours is 1.9 hours.

So for 2.85 hours the readings were below 8.
For 3.8 hours the readings were below 5.
For 1.9 hours the readings were below 9.

2.85 + 3.8 + 1.9 = 8.55 hours, which is of course simply 95% of the total time of 9 hours.

I don't believe there's any way to solve this. I would ask for partial credit but I believe my professor would prefer to see a proof of the fact that the solution doesn't exist.

Let's see, what are the professor's office hours, and just where can she be found during those hours? She's probably in the lounge taking a nap. I saw a CPAP machine in there earlier.

Sleepster
Apnea Board Moderator
www.ApneaBoard.com


06-30-2014 03:04 PM
robysue Offline

Post: #10
RE: Any Statistics/Maths Nerds out there?
(06-30-2014 03:04 PM)Sleepster Wrote:  
(06-30-2014 12:20 PM)jedimark Wrote:  Perhaps I'm overthinking it... but I just don't want it to intentionally give a "wrong" answer. :/

Well, I guess you could call it the weighted average of the percentiles so your conscience wouldn't bother you. :)

Let's see if I've at least stated the problem correctly:

Quote: Let's say you have three sessions:

Session 1 lasts 3 hours and has a 95th percentile of 8.
Session 2 lasts 4 hours and has a 95th percentile of 5.
Session 3 lasts 2 hours and has a 95th percentile of 9.

What is the 95th percentile for the entire 9-hour period that spans those three sessions?

Solution
During the 1st session 95% of the time the readings were below 8. 95% of 3 hours is 2.85 hours.

During the 2nd session 95% of the time the readings were below 5. 95% of 4 hours is 3.8 hours.

During the 3rd session 95% of the time the readings were below 9. 95% of 2 hours is 1.9 hours.
NO. You can't say these things.

What you can say is this:

During session 1 the readings were AT or below 8 for 2.85 hours. You don't know if they were BELOW 8 for 2 of those hours and AT 8 for .85 more hours. Or perhaps they were AT 8 for the whole 2.85 hours.

Quiz time:

Let's say Session 1 lasts 3 hours and the readings increase in steps of 0.5 units. Suppose we have the following (known) data:

Pressure is at 6.0 for 0.8 hours.
Pressure is at 6.5 for 0.2 hours.
Pressure is at 7.0 for 0.4 hours.
Pressure is at 7.5 for 0.6 hours.
Pressure is at 8.0 for 1.0 hours.

What are the median, 90%, and 95% levels? And how do you find them?

What if Session 1 lasted 3 hours and the readings increase by 0.5 units, but we have this distribution of the (known) data:

Pressure is at 6.0 for 0.2 hours.
Pressure is at 6.5 for 0.4 hours.
Pressure is at 7.0 for 0.4 hours.
Pressure is at 7.5 for 0.7 hours.
Pressure is at 8.0 for 1.3 hours.

What are the median, 90%, and 95% levels now? And how do you find them?

Finally, what if Session 1 lasted 3 hours and the readings increase by 0.5 units, but we have this distribution of the (known) data:

Pressure is at 6.0 for 0.0 hours.
Pressure is at 6.5 for 0.0 hours.
Pressure is at 7.0 for 0.2 hours.
Pressure is at 7.5 for 1.4 hours.
Pressure is at 8.0 for 1.4 hours.

What are the median, 90%, and 95% levels now? And how do you find them?
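
One way to work the quiz, sketched in Python. It assumes the "at or below" convention described earlier in this post: a level is the lowest pressure such that the time spent at or below it reaches the requested share of the session. Durations are entered in tenths of an hour so the arithmetic stays in whole numbers; the names are illustrative.

```python
def time_weighted_level(tenths_at_pressure, pct):
    """Lowest pressure whose cumulative time reaches pct percent of the session."""
    total = sum(tenths_at_pressure.values())
    running = 0
    for pressure in sorted(tenths_at_pressure):
        running += tenths_at_pressure[pressure]
        if running * 100 >= pct * total:
            return pressure
    raise ValueError("no data")


# pressure -> time in tenths of an hour, copied from the three quiz tables
QUIZ = {
    "first": {6.0: 8, 6.5: 2, 7.0: 4, 7.5: 6, 8.0: 10},
    "second": {6.0: 2, 6.5: 4, 7.0: 4, 7.5: 7, 8.0: 13},
    "third": {6.0: 0, 6.5: 0, 7.0: 2, 7.5: 14, 8.0: 14},
}

results = {
    name: tuple(time_weighted_level(table, pct) for pct in (50, 90, 95))
    for name, table in QUIZ.items()
}
for name, levels in results.items():
    print(name, levels)
```

Under this convention all three tables give the same median (7.5), 90% (8.0) and 95% (8.0), even though the time spent at the lower pressures differs a lot between them. A different percentile convention could shift individual answers.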

I have to go pick up my hubby. Enjoy the quiz.

Questions about SleepyHead?
See my Guide to SleepyHead
06-30-2014 05:09 PM