Getting Started

Computing the Cubage

The cubage function calculates the volume of a tree (cubage) using different cubing methods. It can handle both single trees and multiple trees in a dataset. The available cubing methods are Smalian, Newton and Huber.
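Each method estimates the volume of a stem section from cross-sectional areas measured along the log. For reference, here is a sketch of the standard section formulas (not the package internals), with diameters in centimeters and lengths in meters:

```julia
# Cross-sectional area (m²) from a diameter in cm: g = π·d²/40000
g(d) = π * d^2 / 40_000

# Volume (m³) of one stem section of length L (m):
smalian(d1, d2, L)    = (g(d1) + g(d2)) / 2 * L           # mean of the two end areas
huber(dm, L)          = g(dm) * L                          # area at the section midpoint
newton(d1, dm, d2, L) = (g(d1) + 4g(dm) + g(d2)) / 6 * L   # weighted ends + midpoint

# Example: a 2 m section with end diameters 9 and 7 cm and midpoint diameter 8 cm
v_smalian = smalian(9.0, 7.0, 2.0)
v_huber   = huber(8.0, 2.0)
v_newton  = newton(9.0, 8.0, 7.0, 2.0)
```

The total stem volume is the sum of the section volumes; `cubage` performs this bookkeeping (plus stump and top treatment) for you.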

Cubing a Simple Tree

When cubing a single tree, you need to provide vectors of diameters (d) and heights (h) measured at different points along the tree stem. Diameters should be in centimeters, and heights should be in meters. The diameter at breast height (DBH) and total tree height (Ht) are essential inputs.

using ForestMensuration

# Diameters at different heights (cm)
d = [9.0, 7.0, 5.8, 5.1, 3.8, 1.9, 0.0]

# Corresponding heights (m)
h =  [0.3, 1.3, 3.3, 5.3, 7.3, 9.3, 10.8]

# Calculate cubage using the Smalian method
cubage(Smalian, h, d)
1×11 DataFrame
 Row │ vt         v0          vc         vr       vn           dbh      ht       hc       aff       nff       qf
     │ Float64    Float64     Float64    Float64  Float64      Float64  Float64  Float64  Float64   Float64   Float64
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 0.0229254  0.00190852  0.0208751      0.0  0.000141764      7.0     10.8      9.3  0.551578  0.488267  0.719286


  • vt: Total volume
  • v0: Volume of the stump
  • vc: Commercial bole volume
  • vr: Residual volume above commercial limit
  • vn: Volume of the top (cone)
  • dbh: Diameter at breast height
  • ht: Total tree height
  • hc: Commercial height
  • aff: Artificial form factor
  • nff: Natural form factor
  • qf: Form quotient
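As a cross-check, the artificial form factor is conventionally the ratio of the tree's volume to the volume of a cylinder with the basal area at DBH and the total height. A sketch under that standard definition (the helper `aff` below is illustrative, not a package function):

```julia
# Basal area (m²) from a DBH in cm
g(d) = π * d^2 / 40_000

# Artificial form factor: tree volume over the volume of a cylinder with
# DBH basal area and total height (standard definition; the package
# internals may differ in detail)
aff(vt, dbh, ht) = vt / (g(dbh) * ht)

# With the Smalian results above: vt ≈ 0.0229254 m³, dbh = 7 cm, ht = 10.8 m
f = aff(0.0229254, 7.0, 10.8)  # ≈ 0.5516, the aff value reported above
```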


Including Bark Thickness

Providing the bark thickness makes it possible to calculate the bark factor as well as the total and commercial volumes without bark. Note: the thickness should be given as the 'single thickness' in centimeters; the function converts it to 'double thickness' internally.

# Bark thickness at corresponding heights (cm)
bark = [0.9, 0.5, 0.3, 0.2, 0.2, 0.1, 0.0]

# Define a commercial diameter limit
diameter_limit = 4.0

# Calculate cubage using the Newton method, including bark thickness and diameter limit
cubage(Newton, h, d, bark, diameter_limit)
1×17 DataFrame
 Row │ vt         v0          vc         vr       vn           dbh      ht       hc       aff       nff       qf        k         vtwb       v0wb        vcwb        vrwb     vnwb
     │ Float64    Float64     Float64    Float64  Float64      Float64  Float64  Float64  Float64   Float64   Float64   Float64   Float64    Float64     Float64     Float64  Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 0.0130233  0.00190852  0.0109731      0.0  0.000141764      7.0     10.8  6.99231  0.313338  0.277372  0.719286  0.932515  0.0113249  0.00165962  0.00954201      0.0  0.000123276


Additional columns include:

  • k: Bark factor
  • vtwb, v0wb, vcwb, vrwb, vnwb: Corresponding volumes without bark
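The bark factor is the ratio of under-bark to over-bark diameters, and since volume scales with the square of diameter, under-bark volumes follow by multiplying by k². A sketch that reproduces the reported values (assuming k is taken as the ratio of the diameter sums):

```julia
# Diameters over bark (cm) and bark thickness (cm) from the example above
d    = [9.0, 7.0, 5.8, 5.1, 3.8, 1.9, 0.0]
bark = [0.9, 0.5, 0.3, 0.2, 0.2, 0.1, 0.0]

# Bark factor: ratio of under-bark to over-bark diameter sums
k = sum(d .- bark) / sum(d)    # ≈ 0.932515, the k column above

# Volume is proportional to diameter squared, so volumes without bark
# scale by k²; e.g. the total Newton volume vt = 0.0130233 m³ gives
vtwb = 0.0130233 * k^2         # ≈ 0.0113249 m³, the vtwb column above
```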

Cubing Multiple Trees

To calculate cubage for multiple trees, organize your data in a DataFrame with columns for tree identifiers, heights, diameters, and optionally bark thickness.

using ForestMensuration
using DataFrames

# Sample data for multiple trees
data = DataFrame(
    tree = [148, 148, 148, 148, 148, 148, 148, 222, 222, 222, 222, 222, 222, 222, 222, 222, 222, 222],
    h = [0.3, 1.3, 3.3, 5.3, 7.3, 9.3, 10.8, 0.3, 1.3, 3.3, 5.3, 7.3, 9.3, 11.3, 13.3, 15.3, 17.3, 19.5],
    d = [9.0, 7.0, 5.8, 5.1, 3.8, 1.9, 0.0, 16.0, 12.0, 11.6, 10.1, 9.4, 8.2, 7.0, 6.0, 4.0, 2.0, 0.0],
    bark = [0.9, 0.5, 0.3, 0.2, 0.2, 0.1, 0.0, 1.2, 0.5, 0.3, 0.3, 0.2, 0.2, 0.3, 0.0, 0.0, 0.0, 0.0]
)

# Define a commercial diameter limit
diameter_limit = 2.5

# Calculate cubage for each tree using the Huber method
cubage(Huber, :tree, :h, :d, data, diameter_limit)
2×12 DataFrame
 Row │ tree   vt        v0          vc         vr           vn           dbh      ht       hc       aff       nff       qf
     │ Int64  Float64   Float64     Float64    Float64      Float64      Float64  Float64  Float64  Float64   Float64   Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │   148  0.023353  0.00190852  0.0206984  0.000604364  0.000141764      7.0     10.8  8.66842  0.561867  0.497375  0.719286
   2 │   222  0.106503  0.00603186  0.0993921  0.00084823   0.000230383     12.0     19.5  16.8     0.482918  0.493554  0.660833


Including Bark Thickness for Multiple Trees

Additionally, bark thickness values can be provided to calculate bark factors and volumes without bark.

# Calculate cubage including bark thickness
cubage(Huber, :tree, :h, :d, :bark, data, diameter_limit)
2×18 DataFrame
 Row │ tree   vt        v0          vc         vr           vn           dbh      ht       hc       aff       nff       qf        k         vtwb       v0wb        vcwb      vrwb         vnwb
     │ Int64  Float64   Float64     Float64    Float64      Float64      Float64  Float64  Float64  Float64   Float64   Float64   Float64   Float64    Float64     Float64   Float64      Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │   148  0.023353  0.00190852  0.0206984  0.000604364  0.000141764      7.0     10.8  8.66842  0.561867  0.497375  0.719286  0.932515  0.0203074  0.00165962  0.017999  0.000525546  0.000123276
   2 │   222  0.106503  0.00603186  0.0993921  0.00084823   0.000230383     12.0     19.5  16.8     0.482918  0.493554  0.660833  0.965238  0.0992267  0.00561978  0.092602  0.000790282  0.000214644

Fitting Linear Regressions

The regression function automatically generates and evaluates multiple regression models based on the provided data. It explores various transformations of the dependent and independent variables, creating a comprehensive set of models for analysis.

Adjusting a Hypsometric Relationship

In forestry, hypsometric relationships model the relationship between tree height (h) and diameter at breast height (dbh). The regression function generates numerous models to find the best fit.

using ForestMensuration
using DataFrames

# Sample dataset with tree heights and diameters
data = DataFrame(
    plot = ["A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
            "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
            "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
            "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D"],
    h = [20.9, 19.6, 13.2, 23.3, 19.2, 16.2, 8.3, 19.7, 11.0, 24.0, 25.8, 28.2, 24.2, 26.2, 28.3,
         14.4, 14.9, 15.6, 8.2, 22.1, 16.7, 22.3, 19.5, 15.9, 16.7, 24.5, 21.7, 23.8,
         20.8, 17.7, 19.3, 16.7, 22.2, 18.6, 6.9, 22.3, 8.7, 22.1, 21.0, 23.5,
         19.5, 19.7, 18.2, 13.9, 12.3, 14.5, 12.3, 18.6, 18.0, 17.4, 24.3, 22.8, 23.2, 23.5, 25.2],
    dbh = [31.5, 30.0, 26.5, 31.0, 29.0, 26.5, 14.5, 28.8, 19.0, 31.5, 32.5, 33.8, 32.5, 33.3, 36.0,
           24.0, 28.0, 23.0, 15.5, 31.0, 27.0, 29.0, 28.0, 26.0, 29.0, 30.0, 29.0, 30.5,
           25.0, 26.8, 27.5, 26.0, 26.0, 25.8, 10.8, 27.0, 16.5, 26.5, 27.0, 26.3,
           26.0, 25.5, 25.0, 23.5, 22.0, 23.0, 23.0, 26.0, 25.5, 27.5, 26.5, 26.5, 27.8, 26.0, 27.0]
)

# Perform regression analysis between height and diameter
models = regression(:h, :dbh, data);
# Alternatively, display the fitted models as equations
models_eq = ModelEquation.(models)
492-element Vector{ModelEquation}:
 h = -38.303464 - 0.00127861 * dbh ^ 2 + 5.17003 * log(dbh) ^ 2 + 1893.09 * dbh ^ -2
 h = -24.957989 + 1.99422 * dbh - 0.0155284 * dbh ^ 2 + 1427.45 * dbh ^ -2
 h = 33.797445 - 374.073 * dbh ^ -1
 h = -121.169643 + 39.5447 * log(dbh) + 220.379 * dbh ^ -1 + 1600.96 * dbh ^ -2
 h = 57.276168 - 1334.96 * dbh ^ -1 + 8652.01 * dbh ^ -2
 h = -46.598571 + 20.1454 * log(dbh)
 h = -87.32823 + 0.00156365 * dbh ^ 2 + 31.0905 * log(dbh) + 2359.99 * dbh ^ -2
 h = 199.7007 - 2.22874 * dbh - 160.35 * log(dbh) + 37.6057 * log(dbh) ^ 2
 h = -159.803271 + 48.056 * log(dbh) + 564.396 * dbh ^ -1
 h = 14.630413 + 7.59575 * dbh - 0.0547852 * dbh ^ 2 - 14.7528 * log(dbh) ^ 2
 ⋮
 dbh ^ 2 / (h - 1.3) = -196.548669 + 13.2052 * log(dbh) ^ 2 + 3216.97 * dbh ^ -1 - 18169.5 * dbh ^ -2
 dbh ^ 2 / (h - 1.3) = 29.509715 + 0.0156314 * dbh ^ 2
 dbh ^ 2 / (h - 1.3) = 19.926992 + 0.788533 * dbh
 dbh ^ 2 / (h - 1.3) = 35.023912 + 1.29219 * log(dbh) ^ 2 - 202.01 * dbh ^ -1
 dbh ^ 2 / (h - 1.3) = -16.994315 + 0.000112617 * dbh ^ 2 + 17.7219 * log(dbh)
 dbh ^ 2 / (h - 1.3) = 47.295535 + 0.00514874 * dbh ^ 2 - 258.43 * dbh ^ -1
 dbh ^ 2 / (h - 1.3) = 157.069799 - 5.12466 * dbh + 0.0801834 * dbh ^ 2 - 975.019 * dbh ^ -1
 dbh ^ 2 / (h - 1.3) = -437.04287 + 0.0645113 * dbh ^ 2 + 341.086 * log(dbh) - 63.8074 * log(dbh) ^ 2
 dbh ^ 2 / (h - 1.3) = 713.371614 + 5.46491 * dbh - 221.457 * log(dbh) - 2429.6 * dbh ^ -1

This generates 492 different models combining various transformations of h and dbh.

# Number of fitted regressions
length(models)
492
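Every candidate model is linear in its coefficients, so each fit is an ordinary least-squares regression on transformed variables. As an illustration (a hand-rolled sketch, not the package implementation), the model log(h) = β₀ + β₁·log(dbh) from the list above can be fit directly on the example data:

```julia
# Heights (m) and diameters (cm) from the example dataset
h = [20.9, 19.6, 13.2, 23.3, 19.2, 16.2, 8.3, 19.7, 11.0, 24.0, 25.8, 28.2, 24.2, 26.2, 28.3,
     14.4, 14.9, 15.6, 8.2, 22.1, 16.7, 22.3, 19.5, 15.9, 16.7, 24.5, 21.7, 23.8,
     20.8, 17.7, 19.3, 16.7, 22.2, 18.6, 6.9, 22.3, 8.7, 22.1, 21.0, 23.5,
     19.5, 19.7, 18.2, 13.9, 12.3, 14.5, 12.3, 18.6, 18.0, 17.4, 24.3, 22.8, 23.2, 23.5, 25.2]
dbh = [31.5, 30.0, 26.5, 31.0, 29.0, 26.5, 14.5, 28.8, 19.0, 31.5, 32.5, 33.8, 32.5, 33.3, 36.0,
       24.0, 28.0, 23.0, 15.5, 31.0, 27.0, 29.0, 28.0, 26.0, 29.0, 30.0, 29.0, 30.5,
       25.0, 26.8, 27.5, 26.0, 26.0, 25.8, 10.8, 27.0, 16.5, 26.5, 27.0, 26.3,
       26.0, 25.5, 25.0, 23.5, 22.0, 23.0, 23.0, 26.0, 25.5, 27.5, 26.5, 26.5, 27.8, 26.0, 27.0]

# Design matrix for log(h) = β₀ + β₁·log(dbh) and a least-squares solve
X = [ones(length(dbh)) log.(dbh)]
β = X \ log.(h)
# β should be close to the fit reported above: log(h) = -1.489449 + 1.34799 * log(dbh)
```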

Regression Selection Criteria

After fitting the models, you can evaluate and rank them based on specific criteria using the criteria_table function.

# Evaluate models based on all criteria
best_models = criteria_table(models)
10×9 DataFrame
 Row │ model                                                                    rank   adjr2     syx      rmse     mae      aic      normality  significance
     │ ModelEqu…                                                                Int64  Float64   Float64  Float64  Float64  Float64  Float64    Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ h = -35.332287 + 4.82761 * log(dbh) ^ 2 + 1744.93 * dbh ^ -2                 7  0.729676  13.947   2.58306  1.9932   364.471        1.0           1.0
   2 │ h = -96.165886 + 34.0209 * log(dbh) + 2613.1 * dbh ^ -2                     15  0.729649  13.9477  2.58319  2.005    364.476        1.0           1.0
   3 │ h = -159.803271 + 48.056 * log(dbh) + 564.396 * dbh ^ -1                    18  0.729505  13.9514  2.58388  1.99526  364.505        1.0           1.0
   4 │ h = -57.37389 + 5.95902 * log(dbh) ^ 2 + 326.761 * dbh ^ -1                 23  0.729037  13.9635  2.58611  2.00102  364.6          1.0           1.0
   5 │ log1p(h) = 4.926569 - 65.7614 * dbh ^ -1 + 375.735 * dbh ^ -2               28  0.726429  14.0305  2.59853  2.00519  365.127        1.0           1.0
   6 │ log(h) = 4.98045 - 69.1379 * dbh ^ -1 + 389.578 * dbh ^ -2                  34  0.725409  14.0567  2.60337  2.01038  365.332        1.0           1.0
   7 │ log(h) = -1.489449 + 1.34799 * log(dbh)                                     35  0.729603  13.9489  2.60813  2.05679  367.533        1.0           1.0
   8 │ h = 57.276168 - 1334.96 * dbh ^ -1 + 8652.01 * dbh ^ -2                     42  0.724805  14.0721  2.60623  2.08089  365.453        1.0           1.0
   9 │ log(h) = 0.718653 + 0.111317 * dbh - 0.00105888 * dbh ^ 2                   45  0.723769  14.0986  2.61113  2.04218  365.659        1.0           1.0
  10 │ log_minus(h - 1.3) = 5.064047 - 73.9976 * dbh ^ -1 + 407.009 * dbh ^ -2     48  0.72357   14.1037  2.61207  2.02274  365.699        1.0           1.0
# Evaluate models based on Adjusted R² and Standard Error, choosing the 5 best
best_5_models = criteria_table(models, :adjr2, :syx, best=5)
5×4 DataFrame
 Row │ model                                                          rank   adjr2     syx
     │ ModelEqu…                                                      Int64  Float64   Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │ h = -35.332287 + 4.82761 * log(dbh) ^ 2 + 1744.93 * dbh ^ -2       2  0.729676  13.947
   2 │ h = -96.165886 + 34.0209 * log(dbh) + 2613.1 * dbh ^ -2            4  0.729649  13.9477
   3 │ log(h) = -1.489449 + 1.34799 * log(dbh)                            6  0.729603  13.9489
   4 │ h = -159.803271 + 48.056 * log(dbh) + 564.396 * dbh ^ -1           8  0.729505  13.9514
   5 │ log1p(h) = -1.130284 + 1.25502 * log(dbh)                         10  0.729334  13.9559


Selecting the Best Model

To select the best model based on the combined ranking, simply use the criteria_selection function:

# Select the top model
top_model = criteria_selection(models)

# View the model equation
ModelEquation(top_model)
h = -35.332287 + 4.82761 * log(dbh) ^ 2 + 1744.93 * dbh ^ -2

Plotting the Regression

You can visualize the regression model using the plot_regression function.

# Plot the top model
plot_regression(top_model)
Example block output


Prediction

The prediction function allows you to generate predicted values from a regression model on the original scale of the dependent variable. This is particularly useful when the model involves transformations of the dependent variable (e.g., logarithmic transformations). The function automatically applies the appropriate inverse transformations and corrections, such as the Meyer correction factor for logarithmic models.
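For example, when the fitted response is log(h), simply exponentiating a prediction underestimates the expected height; the Meyer factor corrects for this by multiplying by exp(σ̂²/2), where σ̂² is the model's residual mean square. A minimal sketch with illustrative numbers (not values from the fitted model):

```julia
# Meyer correction factor for a natural-log-transformed response
meyer_factor(mse) = exp(mse / 2)

# Back-transforming a prediction made on the log scale
y_hat = 2.95   # predicted log(h) (illustrative)
mse   = 0.02   # residual mean square of the fit (illustrative)
h_hat = exp(y_hat) * meyer_factor(mse)  # slightly above the naive exp(y_hat)
```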

# Returns the predicted values from the model on the original scale
h_pred = prediction(top_model)
55-element Vector{Float64}:
 23.886535658137483
 22.453059822566104
 18.99950646590383
 23.411978901717674
 21.481320689389754
 18.99950646590383
  7.489654855515589
 21.28547158864369
 11.355391771886561
 23.886535658137483
  ⋮
 15.428102129764762
 18.49503039939057
 17.98822155030051
 20.000749135189402
 18.99950646590383
 18.99950646590383
 20.29897925585609
 18.49503039939057
 19.501465152261932

The prediction! function extends this by adding the predicted values directly to your DataFrame. It creates new columns for the predicted and actual values, combining observed measurements with model predictions where data may be missing. This is especially useful in forest inventory datasets where certain tree attributes might not be measured for every tree, and predictions need to be filled in for these gaps.

# Automatically adds predicted and actual height columns to the provided DataFrame.
# This combines observed heights and predicted heights for trees with missing or unmeasured heights.
prediction!(top_model, data)

# First rows of the dataset
println(data[1:10, :])
10×5 DataFrame
 Row │ plot    h        dbh      h_prediction  h_real
     │ String  Float64  Float64  Float64?      Float64
─────┼─────────────────────────────────────────────────
   1 │ A          20.9     31.5      23.8865      20.9
   2 │ A          19.6     30.0      22.4531      19.6
   3 │ A          13.2     26.5      18.9995      13.2
   4 │ A          23.3     31.0      23.412       23.3
   5 │ A          19.2     29.0      21.4813      19.2
   6 │ A          16.2     26.5      18.9995      16.2
   7 │ A           8.3     14.5       7.48965      8.3
   8 │ A          19.7     28.8      21.2855      19.7
   9 │ A          11.0     19.0      11.3554      11.0
  10 │ A          24.0     31.5      23.8865      24.0

Adjusting a Qualitative (Dummy) Hypsometric Relationship

If your data includes categorical variables (e.g., different plots or species), you can include them in the regression analysis.

# Perform regression including 'plot' as a categorical variable
qualitative_models = regression(:h, :dbh, data, :plot)

# Select the best model
top_qual_model = criteria_selection(qualitative_models, :adjr2, :syx, :aic)

# View the model equation
ModelEquation(top_qual_model)
1 / √(h - 1.3) = 0.570482 - 0.0440831 * dbh + 0.000398895 * dbh ^ 2 + 0.0534853 * log(dbh) ^ 2 - 0.00397926 * plot: B - 0.025622 * plot: C - 0.0245227 * plot: D


Plotting the Qualitative Regression

# Plot the top qualitative model
plot_regression(top_qual_model)
Example block output

Frequency and Statistical Functions

Creating Frequency Tables

The frequency_table function creates frequency distributions for a vector of values, which is useful for analyzing the distribution of diameters or heights in your data.

# Frequency table for diameters using Sturges' formula for class intervals
frequency_table(data.dbh)
6×7 DataFrame
 Row │ LI       Xi       LS       fi     Fi     fri       Fri
     │ Float64  Float64  Float64  Int64  Int64  Float64   Float64
─────┼─────────────────────────────────────────────────────────────
   1 │    10.0     12.5     15.0      1      1   1.81818    1.81818
   2 │    15.0     17.5     20.0      3      4   5.45455    7.27273
   3 │    20.0     22.5     25.0      2      6   3.63636   10.9091
   4 │    25.0     27.5     30.0     27     33  49.0909    60.0
   5 │    30.0     32.5     35.0     19     52  34.5455    94.5455
   6 │    35.0     37.5     40.0      3     55   5.45455  100.0


  • LI: Lower class limit
  • Xi: Class center
  • LS: Upper class limit
  • fi: Frequency count
  • Fi: Cumulative frequency
  • fri: Relative frequency (%)
  • Fri: Cumulative relative frequency (%)
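Sturges' rule, used above to pick the class intervals, suggests k ≈ 1 + log₂(n) classes; the class width is then the data range divided by k, typically rounded to a convenient value. A sketch of the rule (the package's exact rounding of limits and widths may differ):

```julia
# Sturges' rule applied to the 55 diameters of the example
n = 55
k = ceil(Int, 1 + log2(n))   # suggested number of classes
amplitude = 36.0 - 10.8      # data range (cm)
width = amplitude / k        # raw class width, before rounding
```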


Specifying Class Width

You can specify the class width (hi) to customize the intervals.

# Frequency table for heights with class width of 4 meters
frequency_table(data.h, 4)
6×7 DataFrame
 Row │ LI       Xi       LS       fi     Fi     fri       Fri
     │ Float64  Float64  Float64  Int64  Int64  Float64   Float64
─────┼─────────────────────────────────────────────────────────────
   1 │     8.0     10.0     12.0      4      4   7.27273    7.27273
   2 │    12.0     14.0     16.0      5      9   9.09091   16.3636
   3 │    16.0     18.0     20.0     12     21  21.8182    38.1818
   4 │    20.0     22.0     24.0     14     35  25.4545    63.6364
   5 │    24.0     26.0     28.0     17     52  30.9091    94.5455
   6 │    28.0     30.0     32.0      3     55   5.45455  100.0

Calculating Dendrometric Averages

The dendrometric_averages function computes various dendrometric averages of a forest stand, providing insights into stand structure and growth patterns.

# Calculate dendrometric averages for the dataset
dendrometric_averages(data.dbh, area=0.05)
1×7 DataFrame
 Row │ d₋       d̄        dg       dw       dz       d₁₀₀     d₊
     │ Float64  Float64  Float64  Float64  Float64  Float64  Float64
─────┼───────────────────────────────────────────────────────────────
   1 │ 21.8401  26.5164  26.9182     27.2     26.5    33.62  31.1927


  • d₋: Lower Hohenadl's diameter
  • d̄: Mean diameter
  • dg: Quadratic mean diameter
  • dw: Weise's diameter (60th percentile)
  • dz: Diameter of the tree with central basal area
  • d₁₀₀: Mean diameter of the 100 largest trees per hectare (returns NaN if fewer than 100 trees)
  • d₊: Upper Hohenadl's diameter
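Several of these averages are easy to check by hand. A sketch for the example diameters (assuming Weise's diameter is the plain 60th percentile; the package definitions may differ in detail):

```julia
using Statistics

# Diameters (cm) from the example dataset
dbh = [31.5, 30.0, 26.5, 31.0, 29.0, 26.5, 14.5, 28.8, 19.0, 31.5, 32.5, 33.8, 32.5, 33.3, 36.0,
       24.0, 28.0, 23.0, 15.5, 31.0, 27.0, 29.0, 28.0, 26.0, 29.0, 30.0, 29.0, 30.5,
       25.0, 26.8, 27.5, 26.0, 26.0, 25.8, 10.8, 27.0, 16.5, 26.5, 27.0, 26.3,
       26.0, 25.5, 25.0, 23.5, 22.0, 23.0, 23.0, 26.0, 25.5, 27.5, 26.5, 26.5, 27.8, 26.0, 27.0]

d_mean = mean(dbh)             # arithmetic mean diameter ≈ 26.52 cm
dg = sqrt(mean(dbh .^ 2))      # quadratic mean diameter ≈ 26.92 cm
dw = quantile(dbh, 0.6)        # Weise's diameter (60th percentile) ≈ 27.2 cm
```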


Estimating Heights Using a Regression Model

If you have a regression model (e.g., top_model from earlier), you can estimate the corresponding heights for each calculated diameter.

# Estimate heights for dendrometric averages using the regression model
dendrometric_averages(top_model, area=0.05)
1×14 DataFrame
 Row │ d₋       d̄        dg       dw       dz       d₁₀₀     d₊       h₋       h̄        hg       hw       hz       h₁₀₀     h₊
     │ Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 21.8401  26.5164  26.9182     27.2     26.5    33.62  31.1927  14.2341   19.016  19.4195  19.7015  18.9995  25.8619  23.5952


Forest Inventory

In this example, we perform a forest inventory with the simple_casual_sampling function, which implements simple random sampling. We use the default error margin of 10% and present the results in English (also the default).

using ForestMensuration

# First, define the volumes measured in each sample plot
v = [381.7, 458.9, 468.2, 531.7, 474.1, 401.9, 469.1, 437.4, 435.3, 403.2, 397.1];

# Define the area of each plot
plot_area = 0.05  # hectares

# Define the total area of the forest
total_area = 10  # hectares

# Use the simple_casual_sampling function
simple_casual_sampling(v, plot_area, total_area)
15×3 DataFrame
 Row │ Parameters                 Values     Units
     │ String                     Float64    String
─────┼────────────────────────────────────────────────────
   1 │ plot volume                  441.691  m³/0.05ha
   2 │ coefficient of variation      10.03   %
   3 │ mean variance                168.498  (m³/0.05ha)²
   4 │ standard error                12.9807 m³/0.05ha
   5 │ absolute error                28.9227 m³/0.05ha
   6 │ relative error                 6.55   %
   7 │ hectare volume              8833.82   m³ha⁻¹
   8 │ Total Volume               88338.2
   9 │ confidence interval lower  82553.6
  10 │ confidence interval upper  94122.7
  11 │ population                     0.945  finite
  12 │ measured plots                11.0    n
  13 │ required plots                 7.0    n
  14 │ missing plots                  0.0    n
  15 │ possible plots               200.0    N
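The figures above follow from the standard simple-random-sampling estimators with a finite population correction. A hand-rolled sketch (the t-value is taken from a table here; the package computes it internally):

```julia
using Statistics

# Plot volumes (m³ per 0.05 ha plot)
v = [381.7, 458.9, 468.2, 531.7, 474.1, 401.9, 469.1, 437.4, 435.3, 403.2, 397.1]
n = length(v)                     # measured plots
N = 10 / 0.05                     # possible plots = total area / plot area

v_mean  = mean(v)                 # plot volume ≈ 441.69 m³/0.05 ha
cv      = 100 * std(v) / v_mean   # coefficient of variation ≈ 10.03 %
fpc     = 1 - n / N               # finite population correction ≈ 0.945
se      = sqrt(var(v) / n * fpc)  # standard error ≈ 12.98 m³/0.05 ha
t       = 2.228                   # tabulated t (95 %, n - 1 = 10 df)
abs_err = t * se                  # absolute error ≈ 28.92 m³/0.05 ha
rel_err = 100 * abs_err / v_mean  # relative error ≈ 6.55 %
```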


You can also set the error threshold and change the output language.

# Setting the error to 5% and the language to Portuguese-BR
simple_casual_sampling(v, plot_area, total_area, e=5, lg=:pt)
15×3 DataFrame
 Row │ Parâmetros                       Valores    Unidades
     │ String                           Float64    String
─────┼─────────────────────────────────────────────────────────
   1 │ volume por parcela                 441.691  m³/0.05ha
   2 │ coeficiente de variação             10.03   %
   3 │ variância média                    168.498  (m³/0.05ha)²
   4 │ erro padrão                         12.9807 m³/0.05ha
   5 │ erro absoluto                       28.9227 m³/0.05ha
   6 │ erro relativo                        6.55   %
   7 │ volume por hectare                8833.82   m³ha⁻¹
   8 │ Volume Total                     88338.2
   9 │ intervalo de confiança inferior  82553.6
  10 │ intervalo de confiança superior  94122.7
  11 │ população                            0.945  finita
  12 │ parcelas medidas                    11.0    n
  13 │ parcelas necessárias                17.0    n
  14 │ parcelas faltantes                   6.0    n
  15 │ parcelas possíveis                 200.0    N