_R_R_D___P_D_P_C_A_L_C(1)                       rrdtool                      _R_R_D___P_D_P_C_A_L_C(1)

NNAAMMEE
     rrd_pdpcalc - PDP inner calculation logic, with an example, by Tianpeng Xia

DDEESSCCRRIIPPTTIIOONN
     This  article explains how PDP are calculated in a detailed yet easy-to-un‐
     derstand way, with an example.

RReeffrreesshhiinngg ssoommee bbaassiiccss aabboouutt PPDDPP
   FFuunnddaammeennttaall kknnoowwlleeddggee
     If you have not yet read the tutorials and man pages, whether on the
     official site or written by others, I strongly encourage you to do so.
     As stated in the description, this article explains only how a PDP is
     calculated, not what a PDP is.  Please read the following materials to
     get a basic understanding of PDPs:

     <http://rrdtool.vandenbogaerdt.nl/process.php> - By Alex van den Bogaerdt.
     This article explains PDPs clearly and in great detail.  However, the
     "normalization process" described in its "Normalize interval" section
     differs from the official behavior, which I confirmed with @oetiker
     himself.  The flaw is easy to see in the bar charts and is discussed in
     the "Calculation logics" section.

     <https://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html> - This one is on the
     official site. Actually it's the manual page for "rrdcreate",  and  it  re‐
     veals  what's  under  the  hood  with regard to PDP calculation in its "The
     HEARTBEAT and the STEP" section.

     The text graph by Don Baarda gives a vivid explanation of how UNKNOWN
     data are produced and how the heartbeat value influences sampling.
     Unfortunately, it does not give a clear method by which PDPs are
     calculated.

     <https://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html> - Another detailed
     official tutorial, by Alex van den Bogaerdt.  Similarly, it only gives
     examples in which the data arrive evenly and exactly on the configured
     step.

     If you do not enjoy experiments or do not care much about the inner
     mechanics, you can stop here and turn to more practical topics such as
     graph exports or the command manuals.  But if, like me, you care about
     the calculation logic itself, please read on.

CCaallccuullaattiioonn llooggiiccss
     Here begins the core part of this article.  This section presents two
     versions of the calculation method, one by Alex van den Bogaerdt and the
     other by @oetiker.

     To keep the explanation ASCII-friendly, I will explain both versions with
     the chart below instead of a real image.

       |
       |    (v1)
       | _______                        (v4)  (v5)
       | |     |           (v3)        ____________
       | |     |        ______________|     ||   |
       | |     |        |            ||     ||   |
       | |     |        |            ||     ||   |
       | |     |   (v2) |            ||     ||   |
       | |     |________|            ||     ||   |
      --------------------------------------------->
       0 1     3        7            17     20   21

     The X axis shows time slots (each second is one slot) and the Y axis
     shows the value.

     Let's make everything a little clearer:

     - The step is 5

     - Each PDP gets updated only if a value arrives at or after the last slot
     of the PDP; for instance, the last slot of the PDP from 16 to 20 is 20

     - The heartbeat is 20, so the samples covering the entire 7-17 period are
     not discarded (the 10-second gap is shorter than the heartbeat)

     - At second 3, the first value comes in as v1, and so on

     - Second 0 is the origin, and it does not count as a sample
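
     The slot assignments implied by the chart and the rules above can be
     sketched in Python.  This is only an illustration, not rrdtool code:
     slot_owner and slots_by_pdp are hypothetical helper names, and the rule
     that a slot takes the value of the first sample arriving at or after it
     is read off the chart.

```python
from bisect import bisect_left

# Sample arrival times from the chart; v1..v5 are the symbolic values.
SAMPLE_TIMES = [3, 7, 17, 20, 21]
SAMPLE_NAMES = ["v1", "v2", "v3", "v4", "v5"]


def slot_owner(slot):
    """Return the name of the sample whose value fills the given slot.

    A slot is filled by the first sample arriving at or after it, so the
    sample at second 7 fills slots 4 to 7.
    """
    index = bisect_left(SAMPLE_TIMES, slot)
    if index == len(SAMPLE_TIMES):
        return None  # no sample has arrived yet for this slot
    return SAMPLE_NAMES[index]


def slots_by_pdp(step=5, last_slot=20):
    """Group slot owners by PDP: {(first_slot, last_slot): [owner, ...]}."""
    return {
        (start, start + step - 1): [slot_owner(s) for s in range(start, start + step)]
        for start in range(1, last_slot + 1, step)
    }


if __name__ == "__main__":
    for bounds, owners in slots_by_pdp().items():
        print(bounds, owners)
```

     Reading the result per PDP gives the weights used in the formulas of
     this section, for example three slots of v1 and two of v2 in the PDP
     from 1 to 5.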

   BBooggaaeerrddtt vveerrssiioonn
     As can be seen on <http://rrdtool.vandenbogaerdt.nl/process.php>, after
     all the primary data are transformed to rates (except for GAUGE, of
     course), they have to go through a normalization process if they are not
     distributed exactly according to the step or on well-defined boundaries
     in time, in the words of the author.

     What does that mean?  Basically, if the known (as opposed to unknown)
     data cover at least 50% of all slots during a period, then a PDP is
     calculated from them.

     This version seems to go well until we reach the bar chart part.

     According to the ASCII bar chart, we have the following results:

     From second 1 on, the PDP of each period (1-5, 6-10, ...) is computed by
     averaging all the values within it.

     So:

     - the PDP from 1 to 5 is (v1*3+v2*2)/5

     - the PDP from 6 to 10 is (v2*2+v3*3)/5

     - the PDP from 11 to 15 is (v3*5)/5, since all the values in slots 11,  12,
     13, 14 and 15 are the same, which is v3

     - ...
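
     The per-period averaging described in this subsection can be checked
     with a short Python sketch.  The sample times come from the chart; the
     numeric values standing in for v1..v5 are hypothetical, and slot_value
     and pdp_individually are illustrative names, not part of rrdtool.

```python
from bisect import bisect_left

# Sample times come from the chart. The numeric values are hypothetical
# stand-ins for v1..v5, chosen only so the averages are easy to check.
SAMPLES = [(3, 8.0), (7, 1.0), (17, 6.0), (20, 7.0), (21, 7.0)]
STEP = 5


def slot_value(slot):
    """Value of the first sample arriving at or after the slot."""
    times = [t for t, _ in SAMPLES]
    return SAMPLES[bisect_left(times, slot)][1]


def pdp_individually(first_slot):
    """Average the STEP slots of one PDP on their own."""
    slots = range(first_slot, first_slot + STEP)
    return sum(slot_value(s) for s in slots) / STEP


if __name__ == "__main__":
    for first in (1, 6, 11):
        print(f"PDP({first}-{first + STEP - 1}) = {pdp_individually(first)}")
```

     With these stand-in values, PDP(6-10) and PDP(11-15) come out
     different (4.0 and 6.0), which is the behavior this version predicts.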

   TThhee ooffffiicciiaall vveerrssiioonn(( aallssoo @@ooeettiikkeerr vveerrssiioonn))::
     Using the same chart, this version suggests the following:

     - the PDP from 1 to 5 is (v1*3+v2*2)/5

     - the PDPs from 6 to 10 and 11 to 15 are the SAME, which is (v2*2+v3*8)/10

     - ...

   AA CCoommppaarriissoonn aanndd ssoommee eexxppllaannaattiioonn
     So we have seen the two versions above, and their PDPs from 6 to 10 and
     from 11 to 15 do not agree with each other.

     Why is that?

     The difference comes from whether PDP(6-10) and PDP(11-15) are
     calculated separately or as a group.

     Let's discuss this in more detail using the above bar chart.

     _B_o_g_a_e_r_d_t_'_s _v_e_r_s_i_o_n_,

     PDPs are aallwwaayyss ccoommppuutteedd iinnddiivviidduuaallllyy no matter how values arrive.

     For example, the value at slot 17 comes after the last slot of
     PDP(11-15).  Also, the immediately preceding value is at slot 7.  All the
     slots from 8 to 17 are assigned v3.  Since each PDP is computed
     individually, PDP(6-10) is (v2*2+v3*3)/5 while PDP(11-15) is (v3*5)/5.

     _T_h_e _o_f_f_i_c_i_a_l _v_e_r_s_i_o_n

     PDPs are aallwwaayyss ccoommppuutteedd iinn tteerrmmss ooff tthhee sstteeppss wwhhiicchh tthhee nneexxtt uuppddaattee ssppaannss,
     be it 1 step, 2 steps or n steps; in other words, PDPs may be computed  ttoo‐‐
     ggeetthheerr.

     For example, the update at slot 17 spans PDP(6-10) and PDP(11-15) because
     the immediately preceding value is at 7, which lies between 6 and 10, and
     17 is after 15.  PDP(1-5) and PDP(16-20) are not included: the update at
     slot 7 has already triggered the calculation for PDP(1-5), and the update
     at slot 17 comes before 20, the last slot of PDP(16-20).

     That is why PDP(6-10) and PDP(11-15) have the same value,
     (v2*2+v3*8)/10.
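
     The grouping rule for a single update can be sketched in Python as
     follows.  This is a simplified slot-based illustration under the
     assumptions of the chart, not the rrdtool source: spanned_pdps and
     shared_value are hypothetical names, and the numbers used for v2 and v3
     are made up.

```python
STEP = 5


def spanned_pdps(prev_time, new_time, step=STEP):
    """List the (first_slot, last_slot) PDPs completed by one update.

    The first candidate is the PDP holding the slot right after prev_time;
    a PDP counts as completed when its last slot is at or before new_time.
    """
    first = (prev_time // step) * step + 1
    pdps = []
    while first + step - 1 <= new_time:
        pdps.append((first, first + step - 1))
        first += step
    return pdps


def shared_value(carried, new_value, pdps, step=STEP):
    """Average over all slots of the spanned PDPs taken together.

    carried holds the values of slots already filled by earlier updates
    (slots 6 and 7 in the chart); every remaining slot takes new_value.
    """
    total_slots = len(pdps) * step
    new_slots = total_slots - len(carried)
    return (sum(carried) + new_value * new_slots) / total_slots


if __name__ == "__main__":
    v2, v3 = 1.0, 6.0  # hypothetical values for the chart's v2 and v3
    pdps = spanned_pdps(prev_time=7, new_time=17)
    print(pdps)
    print(shared_value([v2, v2], v3, pdps))
```

     For the update at slot 17 the sketch reports the two PDPs 6-10 and
     11-15 and one shared average over their ten slots.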

AAnn eexxaammppllee
     If you are still confused, don't worry: an example is here to help.

     Let's get our hands dirty with some commands:

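      # DS "mem": GAUGE, heartbeat 20 s, min 0, max 100
      # RRA: 10 rows, each the AVERAGE of 1 PDP (xff 0.5)
      rrdtool create target.rrd --start 1000000000 --step 5 \
      DS:mem:GAUGE:20:0:100 RRA:AVERAGE:0.5:1:10
      # Feed in samples as timestamp:value pairs
      rrdtool update target.rrd 1000000003:8 1000000006:1 1000000017:6 \
      1000000020:7 1000000021:7 1000000022:4 \
      1000000023:3 1000000036:1 1000000037:2 \
      1000000038:3 1000000039:3 1000000042:5
      # Read the stored averages back
      rrdtool fetch target.rrd AVERAGE --start 1000000000 --end 1000000045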

     The snippet above contains three commands: create, update and fetch.
     First we create a new rrd file, then we feed in some data, and finally
     we fetch all the PDPs from the rrd.

   FFooccuuss oonn ssiinnggllee sstteeppss
     To give a detailed explanation, the calculation of each PDP is worked
     through in this section.

     Below is the output of the commands above:

      1000000005: 5.2000000000e+00
      1000000010: 5.5000000000e+00
      1000000015: 5.5000000000e+00
      1000000020: 6.6000000000e+00
      1000000025: 1.7333333333e+00
      1000000030: 1.7333333333e+00
      1000000035: 1.7333333333e+00
      1000000040: 2.8000000000e+00
      1000000045: nan
      1000000050: nan

     NOTE: 1000000005 means the PDP from 1000000001 to 1000000005, and so on.
     For concision and readability, we use only the last two digits, so 05
     denotes 1000000005.  We choose GAUGE as the type of the data source
     because its values are already treated as rates, so no additional
     transformation is needed; see
     <http://rrdtool.vandenbogaerdt.nl/process.php> for details.

     05: 5.2 = (8*3+1*2)/5

     10: 5.5 = (1*1+6*9)/10

     15: the same as the previous one

     20: 6.6 = (6*2+7*3)/5

     25: 1.73333 = (7+4+3+1*12)/15

     ...

     45: nan, as the last value is at 42, which does not trigger the
     calculation for PDP(41-45)

     50: nan; why this unknown PDP is shown is explained in
     <https://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html>
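
     The known rows of the fetch output can be reproduced with a small
     Python simulation of the official rule, using the same start time,
     step, heartbeat and updates as the commands in this example.  This is a
     sketch, not rrdtool's implementation: simulate is a hypothetical
     function, it handles GAUGE values only, and it assumes every gap is
     within the heartbeat, so unknown data are not modelled.

```python
START = 1000000000
STEP = 5
HEARTBEAT = 20
UPDATES = [
    (1000000003, 8), (1000000006, 1), (1000000017, 6),
    (1000000020, 7), (1000000021, 7), (1000000022, 4),
    (1000000023, 3), (1000000036, 1), (1000000037, 2),
    (1000000038, 3), (1000000039, 3), (1000000042, 5),
]


def simulate(updates, start=START, step=STEP, heartbeat=HEARTBEAT):
    """Return {pdp_end_time: value} for every PDP completed by the updates."""
    pdps = {}
    last = start   # time of the previous update
    acc = 0.0      # value * seconds accumulated since the open boundary
    for when, value in updates:
        if when - last > heartbeat:
            raise ValueError("gap exceeds heartbeat; unknown data not modelled")
        open_start = last - last % step   # boundary at or before the previous update
        boundary = when - when % step     # boundary at or before this update
        if boundary > open_start:
            # The update crosses one or more step boundaries: all PDPs it
            # completes share a single average over the whole spanned time.
            acc += value * (boundary - last)
            shared = acc / (boundary - open_start)
            for end in range(open_start + step, boundary + 1, step):
                pdps[end] = shared
            acc = value * (when - boundary)   # carry the part past the boundary
        else:
            acc += value * (when - last)
        last = when
    return pdps


if __name__ == "__main__":
    for end, value in sorted(simulate(UPDATES).items()):
        print(f"{end}: {value:.10e}")
```

     The rows for 45 and 50 do not appear because no update completes those
     PDPs, which corresponds to the nan rows in the fetch output.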

SSUUMMMMAARRYY
     All that said, I hope you now have a clear understanding of the inner
     calculation "magic" for PDPs.

   OOtthheerr RReeffeerreenncceess
     •   A great PowerShell script for generating ASCII bar charts:
         <https://gallery.technet.microsoft.com/scriptcenter/Sample-Script-to-Generate-59c80d4c>

     •   <https://stackoverflow.com/questions/18924450/rrd-wrong-values>

1.10.0                             2026-05-23                     RRD_PDPCALC(1)
