Study area

This study was conducted in two areas of Ise Bay, Japan: Area 1 (St. 1 and 2) and Area 2 (St. 3) (Fig. 1). Ise Bay is a primary location of NRFPs in Japan [26]. Both areas are shallower than 20 m, and the sea bottom consists of silt, coarse sand, and sand [34]. Area 1 was substantially noisier than Area 2. Drone video footage recorded during one flight was defined as a ‘unit’. The average number of boats (primarily fishing boats) per unit in each area was calculated as the total number of boats in all units divided by the number of units in that area. The water depth in both study areas was under 50 m. From 2006 to 2015, water temperature and salinity in Ise Bay were 8.8–26.2 ℃ and 21.6–31.4, respectively [33].

Calibration and accuracy of the drone distance measurement

To maximise the accuracy of distance measurements, the distortion of the cameras attached to the drones (Air 2s and Phantom 4 Advanced; DJI Co. Ltd, Shenzhen, China) was calibrated using the Single Camera Calibrator application in MATLAB (The MathWorks, Natick, MA), and the measurement errors were calibrated using scales of known length, following the methods of Burnett et al. [35]. To calibrate the camera distortion, a 4 K video of a chequerboard was captured while the chequerboard was rotated about its X- and Y-axes to various angles. Fifty frames were extracted from the video and imported into the Single Camera Calibrator application, which estimated a three-parameter polynomial model of the lens distortion for the given camera model; this model was then used to correct the distortion.

The scales, located at St. 3, were measured aerially by drone to calculate the measurement error. Scales of 5, 10, 50, 100, and 200 m were video-recorded five times by drone from each height (100, 200, 300, and 400 m). After a clear image was selected from each video, a black circle was marked on each end of the scale, and the distance in pixels between the centres of the two black circles was measured automatically using the Image Processing Toolbox in MATLAB. To calculate the measured length of the scales, the following formulas from Torres et al. [36] were used:

$$\text{GSD} = \frac{\text{Altitude (m)}}{\text{Focal length (mm)}} \times \frac{\text{Sensor width (mm)}}{\text{Image width (pix)}} \tag{1}$$

$$\text{Length (m)} = \text{GSD} \times \text{Number of pixels measured (pix)} \tag{2}$$

where GSD represents the Ground Sampling Distance, and pix represents the mean number of pixels.

The measurement error at each scale length was calculated as follows:

$$\text{Measurement error (m)} = \text{Length measured (m)} - \text{Scale length (m)} \tag{3}$$

Because the measurement error increased linearly with the scale length, back-calculation using a linear relationship was conducted to minimise the measurement error.
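The calculation in Eqs. 1–3 and the linear back-calculation can be sketched as follows. This is a minimal illustration only: the camera constants and the calibration measurements are hypothetical placeholders, and the inversion assumes one plausible reading of the correction, namely that the error is modelled as a straight-line function of the true scale length.

```python
def ground_sampling_distance(altitude_m, focal_length_mm, sensor_width_mm, image_width_pix):
    """Eq. 1: ground distance (m) covered by one pixel."""
    return (altitude_m / focal_length_mm) * (sensor_width_mm / image_width_pix)


def measured_length(gsd_m_per_pix, n_pixels):
    """Eq. 2: length (m) from a pixel count."""
    return gsd_m_per_pix * n_pixels


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def corrected_length(measured_m, slope, intercept):
    """Invert measured = scale + (slope * scale + intercept) to recover the scale."""
    return (measured_m - intercept) / (1.0 + slope)


# Hypothetical camera constants (assumptions, not values from the paper).
gsd_400 = ground_sampling_distance(400.0, 8.8, 13.2, 3840)

# Synthetic calibration data: known scale lengths and invented measurements.
scale_m = [5.0, 10.0, 50.0, 100.0, 200.0]
measured_m = [1.02 * s + 0.1 for s in scale_m]
error_m = [m - s for m, s in zip(measured_m, scale_m)]  # Eq. 3

# Fit error against scale length, then back-calculate the corrected lengths.
slope, intercept = fit_line(scale_m, error_m)
corrected = [corrected_length(m, slope, intercept) for m in measured_m]
```

With real data, the fitted slope and intercept would be estimated separately for each camera and flight height if the error pattern differs between them.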

Sampling rate for image selection from the video

The sampling rate used to measure IIDs of NRFPs involves a trade-off between accuracy and study effort: a higher sampling rate yields more accurate IIDs but requires more analysis time, and vice versa. An efficient sampling rate (‘sampling interval’) was assessed in a preliminary analysis by comparing the distributions of IIDs of NRFPs at sampling intervals of 1, 5, 10, 15, 20, 30, 40, and 60 s. Five video recordings of NRFPs, each longer than 3 min, taken at St. 3 on 10 and 15 June 2021 with the Phantom 4 Advanced, were used. The first, second, and third quartiles of the IID distribution at the 1-s sampling interval were treated as the reference values, and the absolute error of each quartile at each sampling interval was calculated using the following equation:

$$\text{Absolute error} = \left| \text{quartile at each sampling interval} - \text{quartile at the 1-s sampling interval} \right| \tag{4}$$

To identify a longer sampling interval with accuracy similar to that of 1-s sampling, the longest interval that still had comparatively low absolute errors in all three quartiles was sought. The results indicated that the 15-s interval had comparatively low absolute errors in all three quartiles despite its length (Fig. 2). Therefore, one image was selected from the video every 15 s.
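The comparison in Eq. 4 can be illustrated with a short sketch. The per-second IID values below are randomly generated placeholders, and the data layout (one list of IIDs per one-second frame) is an assumption made for illustration.

```python
import random
import statistics


def quartiles(values):
    """First, second, and third quartiles of a list of IIDs."""
    return tuple(statistics.quantiles(values, n=4, method="inclusive"))


def subsample(frames, interval_s):
    """Pool IIDs from every interval_s-th frame; frames[i] holds the IIDs at second i."""
    return [iid for frame in frames[::interval_s] for iid in frame]


def quartile_abs_errors(frames, intervals):
    """Eq. 4 for each interval: |quartile at interval - quartile at 1 s|."""
    reference = quartiles(subsample(frames, 1))
    errors = {}
    for interval in intervals:
        q = quartiles(subsample(frames, interval))
        errors[interval] = tuple(abs(a - b) for a, b in zip(q, reference))
    return errors


# Synthetic 3-min recording: three invented IIDs (m) per one-second frame.
rng = random.Random(0)
frames = [[rng.uniform(2.0, 300.0) for _ in range(3)] for _ in range(180)]

intervals = [1, 5, 10, 15, 20, 30, 40, 60]
errors = quartile_abs_errors(frames, intervals)
```

Each entry of `errors` holds the absolute errors of the three quartiles for one sampling interval, which can then be compared across intervals as in Fig. 2.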

Data collection

## Behavioral states and inter-individual distances

A drone (Air 2s) was used to record videos of NRFPs from 8 February to 11 March 2022 in Area 1 and from 24 May to 13 July 2022 in Area 2. In Ise Bay, porpoise neonates have been found stranded from late April to mid-July [37], and the gestation period is approximately one year [26], which suggests that the drone survey in Area 2 took place during the reproductive season, whereas that in Area 1 did not. A neutral-density filter (ND64/PL; Freewell Industry Co. Ltd., Kowloon, Hong Kong) was attached to the drone camera. The surveys were conducted under ideal conditions: without rain or strong winds and with a Beaufort Sea State of less than 3. The drone was launched from the same location in each area. Altitudes were calculated relative to sea level, taking the tide level into account.

First, NRFPs were sought while maintaining a drone altitude of 150 m, and their behavioural states were recorded for 1 min. Behavioural states were classified into five categories according to Lusseau [38]: socialising, milling, foraging, resting, and travelling (Table 1). More than one behavioural state was recorded for a unit if the behavioural state changed during the 1-min observation period at 150 m or if several porpoises exhibited different behavioural states. After the behaviour was recorded, the drone recorded video at an altitude of 400 m until the battery or recording time expired (approximately 10 min). To minimise the possibility of recording the same NRFPs in different units, the drone was flown to different points during the same time period (e.g. morning or evening).

Table 1

Definitions of the behavioural states: foraging, milling, resting, socialising, and travelling.

| Behavioural state | Definition |
| --- | --- |
| Foraging | Individuals dive for long intervals and perform “steep dives”, arching their back at the surface to increase their speed of descent. Individuals turn counterclockwise, a foraging behaviour described by Amano et al. [61]. |
| Milling | Individuals often change direction. Dive intervals are variable but short. |
| Resting | Individuals stay in the same position without moving their flukes. |
| Socialising | Interactive behavioural events, such as body contact and synchronous swimming, are observed. |
| Travelling | Individuals move steadily in a constant direction, swimming with short, relatively constant dive intervals. |

## Sound recordings

Sound recordings were conducted from a boat of the Fisheries Research Laboratory, Mie University (ZAGA2, 21 ft), in Area 1 during the daytime: ambient noise was recorded on 9 February 2022, and NRFP sounds were recorded on 1 March 2022. Ambient noise was recorded at five different points when no fishing boats were present within 500 m. The pulsed sounds of NRFPs were recorded using a hydrophone array consisting of two types of acoustic data loggers: A-tags (ML-200-AS2; Marine Micro Technology, Saitama, Japan) and a sound recorder (SoundTrap ST300-HF; Ocean Instruments, Auckland, New Zealand). Each A-tag had two ultrasonic hydrophones with a sensitivity of 201 dB re 1 V/µPa (± 5 dB between 100 and 160 kHz) [39], and the SoundTrap had one hydrophone with a clipping level of 172 dB re 1 µPa at high gain (± 3 dB between 20 and 150 kHz) and 16-bit resolution. The two A-tags were fixed to a rope (14 m) 1 m apart, and the sound recorder was placed midway between them, 0.5 m from each. The boat was stopped when NRFPs were found, and the array was suspended vertically with a terminal weight (2 kg). The hydrophone of the upper data logger was positioned approximately 1.5–3 m below the water surface.

## Data analysis

## Behavioral and distance data

A unit could comprise several behavioural states; therefore, for each area, the total count of each behavioural state was divided by the number of units to obtain a per-unit average. Fisher’s exact test was used to test whether the total counts of the behavioural states across all units differed between Areas 1 and 2.

IIDs were measured at 15-s sampling intervals. Images containing land or areas made invisible by sun glare were excluded from the analysis. To decompose the IID distribution into several Gaussian components, Gaussian mixture modelling (GMM) was applied using the R package ‘mixtools’ (version 1.2.0) [40]. IIDs were transformed to a common logarithmic (log10) scale before the GMM was applied and were fitted using the expectation-maximisation algorithm. The estimated parameters for each component *i* were *λi* (mixing ratio), *µi* (mean), and *σi* (SD). The number of Gaussian components was determined at each study site by selecting, among models with 1–20 components, the model with the lowest Bayesian Information Criterion. IIDs of less than 2 m were excluded from the GMM analysis because of the measurement limit of the software.
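The fitting and model-selection steps can be sketched in plain Python. This is an illustrative re-implementation of expectation-maximisation for a one-dimensional Gaussian mixture, not the ‘mixtools’ code used in the study; the function names, the quantile-based starting values, the SD floor, and the synthetic IIDs are all assumptions.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, lambdas, mus, sigmas):
    """Log-likelihood of the data under the fitted mixture."""
    total = 0.0
    for x in data:
        dens = sum(l * normal_pdf(x, m, s) for l, m, s in zip(lambdas, mus, sigmas))
        total += math.log(max(dens, 1e-300))
    return total


def fit_gmm_1d(data, k, n_iter=100, min_sigma=0.01):
    """EM for a k-component 1-D Gaussian mixture; returns (lambdas, mus, sigmas)."""
    n = len(data)
    ordered = sorted(data)
    # Start the means at evenly spaced quantiles of the data (an assumption).
    mus = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    sigmas = [max(statistics.pstdev(data), min_sigma)] * k
    lambdas = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            w = [l * normal_pdf(x, m, s) for l, m, s in zip(lambdas, mus, sigmas)]
            norm = max(sum(w), 1e-300)
            resp.append([wi / norm for wi in w])
        # M-step: update mixing ratios, means, and SDs.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            lambdas[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return lambdas, mus, sigmas


def bic(loglik, k, n):
    """BIC with 3k - 1 free parameters (k means, k SDs, k - 1 mixing ratios)."""
    return (3 * k - 1) * math.log(n) - 2.0 * loglik


# Synthetic IIDs (m) drawn from two invented clusters, then log10-transformed.
rng = random.Random(1)
iids_m = [10 ** rng.gauss(1.0, 0.1) for _ in range(200)]
iids_m += [10 ** rng.gauss(2.0, 0.1) for _ in range(200)]
log_iids = [math.log10(d) for d in iids_m if d >= 2.0]  # drop IIDs under 2 m

fits, bics = {}, {}
for k in range(1, 5):  # the text scans 1-20 components; 1-4 keeps the demo fast
    fits[k] = fit_gmm_1d(log_iids, k)
    bics[k] = bic(mixture_loglik(log_iids, *fits[k]), k, len(log_iids))
best_k = min(bics, key=bics.get)
```

The component count with the lowest BIC is kept. A production analysis would normally rely on an established package, with multiple random restarts to guard against local optima.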

Pseudo-distributions of IIDs were expected because the height and angle of the drone camera were fixed. To rule out such pseudo-distributions, a null model was created by calculating the distance between two points placed at random within the field of view (x-axis: 0–631 m; y-axis: 0–355 m), repeated 100,000 times. Owing to technological limitations, one porpoise was always at the centre of the image when observation started at an altitude of 400 m, which was expected to bias the IIDs. To overcome this problem, measurements were initiated 4 min after the drone rose from an altitude of 150 m. The average swimming speed of Yangtze finless porpoises is 0.89 m/s [41]; therefore, a porpoise could travel approximately 214 m within 4 min, which is sufficient to leave the field of view.
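The null model described above can be sketched as a simple Monte Carlo simulation. The field-of-view dimensions, the number of draws, the swimming speed, and the 4-min delay come from the text; the function name and the random seed are illustrative assumptions.

```python
import math
import random


def null_distances(n_pairs=100_000, width_m=631.0, height_m=355.0, seed=0):
    """Distances between pairs of points placed uniformly at random in the field of view."""
    rng = random.Random(seed)
    distances = []
    for _ in range(n_pairs):
        x1, y1 = rng.uniform(0.0, width_m), rng.uniform(0.0, height_m)
        x2, y2 = rng.uniform(0.0, width_m), rng.uniform(0.0, height_m)
        distances.append(math.hypot(x2 - x1, y2 - y1))
    return distances


null_iids = null_distances()

# Distance a porpoise could cover during the 4-min delay at 0.89 m/s.
travel_m = 0.89 * 4 * 60
# Shortest distance from the image centre to an edge of the field of view.
half_height_m = 355.0 / 2
```

Because `travel_m` exceeds `half_height_m`, an animal that starts at the image centre has time to leave the field of view before measurements begin.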

After fitting the GMM, the data were re-transformed into frequency distributions with a 5-m bin size. The distributions of Areas 1 and 2 were compared to the null model using the Kolmogorov–Smirnov test with a Bonferroni correction.
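A comparison of this kind can be sketched as below. The text does not state which variant of the Kolmogorov–Smirnov test was used, so this example assumes a two-sample test on unbinned values with a large-sample p-value; the IID samples are randomly generated placeholders.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Large-sample p-value from the Kolmogorov distribution."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:  # the p-value is 1 to many decimal places below this point
        return 1.0
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2) for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    return [min(1.0, p * len(p_values)) for p in p_values]


# Invented samples standing in for the two areas and the null model.
rng = random.Random(2)
null_model = [rng.uniform(0.0, 700.0) for _ in range(2000)]
area_1 = [rng.uniform(0.0, 300.0) for _ in range(300)]
area_2 = [rng.uniform(0.0, 700.0) for _ in range(300)]

raw_p = []
for sample in (area_1, area_2):
    d = ks_statistic(sample, null_model)
    raw_p.append(ks_pvalue_asymptotic(d, len(sample), len(null_model)))
adjusted_p = bonferroni(raw_p)
```

Each area is tested against the null model, and the two p-values are then adjusted for the two comparisons.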

## Acoustic analysis

Underwater sounds for ambient noise level measurements were recorded at five points in Area 1. From these five recordings, five 5-s segments free of artificial sounds were randomly extracted, and the ambient noise level was measured. The sound pressure level (SPL; dB re 1 µPa RMS) of the ambient noise in the 1/3-octave band with a centre frequency of 125 kHz was calculated using MATLAB.

The distance from the phonating NRFP to the recorder was estimated using the two A-tags. For each A-tag, the bearing angle to the sound source was calculated from the time-of-arrival difference between its two hydrophones. The sound source was located at the intersection of the two bearing lines, and its distance from the recorder was then estimated. The sound velocity in water was calculated from the salinity and water temperature using the Medwin equation [42]. Salinity and water temperature were measured before the sound recordings using a portable conductivity meter (CM-31P; DKK-TOA Corporation, Japan). The criteria for identifying on-axis clicks were the same as those of Kyhn et al. [43]: (1) the click was recorded on all five hydrophones, (2) it had the maximum amplitude in an echolocation click train, and (3) its amplitude was greater than that of any reflection from the bottom or surface. A bandpass filter (the bandpass function in the Signal Processing Toolbox in MATLAB) with a passband of 112–140 kHz and a steepness of 0.85 was applied.
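The geometry behind these steps can be sketched as follows. The sound-speed function uses the simplified Medwin (1975) formula; the bearing and intersection functions assume a far-field source and a vertical array, and the 0.2 m hydrophone spacing and the example angles are hypothetical values chosen for illustration.

```python
import math


def medwin_sound_speed(temp_c, salinity, depth_m=0.0):
    """Sound speed in seawater (m/s), simplified Medwin (1975) formula."""
    return (1449.2 + 4.6 * temp_c - 0.055 * temp_c ** 2 + 0.00029 * temp_c ** 3
            + (1.34 - 0.010 * temp_c) * (salinity - 35.0) + 0.016 * depth_m)


def bearing_from_delay(delay_s, spacing_m, sound_speed):
    """Far-field angle (rad) from the plane perpendicular to the hydrophone pair axis."""
    ratio = max(-1.0, min(1.0, sound_speed * delay_s / spacing_m))
    return math.asin(ratio)


def locate_source(theta_upper, theta_lower, separation_m):
    """Intersect two bearing lines from loggers stacked vertically.

    Angles are measured below the horizontal at each logger. Returns the
    horizontal range and the source depth relative to the upper logger.
    """
    denom = math.tan(theta_upper) - math.tan(theta_lower)
    if abs(denom) < 1e-12:
        raise ValueError("bearing lines are parallel; the source cannot be located")
    horizontal_range = separation_m / denom
    depth_below_upper = horizontal_range * math.tan(theta_upper)
    return horizontal_range, depth_below_upper


c = medwin_sound_speed(10.0, 35.0)

# Round trip: a delay generated from a known angle returns the same angle.
true_angle = math.radians(20.0)
delay = 0.2 * math.sin(true_angle) / c  # hypothetical 0.2 m hydrophone spacing
recovered_angle = bearing_from_delay(delay, 0.2, c)

# Two loggers 1 m apart; the recorder sits midway between them (0.5 m below the upper one).
r, dz = locate_source(math.atan(0.3), math.atan(0.2), 1.0)
range_to_recorder = math.hypot(r, dz - 0.5)
```

Small errors in the two angles translate into large range errors when the bearing lines are nearly parallel, which is why only sources close to the array can be ranged reliably.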

The apparent source levels (ASLs; peak-to-peak) of the sounds were estimated according to Møhl et al. [44]. ASLs were used because the possibility that the sounds were not directed exactly at the sound recorder could not be eliminated. ASLs were calculated using the sonar equation: ASL = received level (RL) + transmission loss (TL). RL is the sound level recorded by the sound recorder. TL was estimated as TL = 20 log10(R) + αR, where R is the estimated distance (m) from the sound source to the sound recorder and α is the absorption coefficient (dB/m), which depends on the sound frequency. The α for a 100 kHz signal was calculated from the salinity and water temperature using the method described by Francois & Garrison [45]. The detectable distance was set at 25 m, based on Morisaka et al. [46], who estimated 50 m (twice the value adopted here) using a system with 2.2 m between hydrophones. Other studies estimated 65 m with hydrophones 1.5 m apart [43] and 60 m with hydrophones 2 m apart [47]; therefore, the estimate used in this study was conservative. Accordingly, only clicks whose estimated distances were less than 25 m were used for further analyses.
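The source-level calculation can be written out as a short sketch. The received levels, ranges, and absorption coefficient below are hypothetical; in particular, the value of α is a placeholder and was not computed with the Francois & Garrison method.

```python
import math

MAX_RANGE_M = 25.0  # clicks localised beyond this distance are discarded


def transmission_loss(range_m, alpha_db_per_m):
    """TL = 20 log10(R) + alpha * R (spherical spreading plus absorption)."""
    return 20.0 * math.log10(range_m) + alpha_db_per_m * range_m


def apparent_source_level(received_level_db, range_m, alpha_db_per_m):
    """ASL = RL + TL."""
    return received_level_db + transmission_loss(range_m, alpha_db_per_m)


# Hypothetical clicks: (received level in dB re 1 uPa peak-to-peak, range in m).
clicks = [(140.0, 10.0), (150.0, 30.0), (135.0, 20.0)]
alpha = 0.035  # placeholder absorption coefficient (dB/m)

asls = [apparent_source_level(rl, r, alpha) for rl, r in clicks if r < MAX_RANGE_M]
```

The second click is dropped because its estimated range exceeds 25 m, so `asls` holds two values.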

The active space, which is the distance within which individuals of a species can detect conspecific signals, was estimated using the following equation [48]: SE = ASL − TL − NL − DT + DI + 10 × log10(T × W), where SE is the signal excess. An animal can detect a signal with 50% probability when SE is zero. ASL is the apparent source level of NRFP sounds, and NL is the average noise level of the five study points. The detection threshold (DT) was set to 68 dB according to Wang et al. [49]; this value was the average detection threshold of four finless porpoises at 128 kHz. Kyhn et al. [50] suggested that the directivity index (DI) is similar among Phocoenidae; therefore, the DI was taken as 25 dB. T × W is the time-bandwidth product of the −10 dB duration (T) and the −3 dB bandwidth (W) of each pulsed signal. The −10 dB duration was defined as the time between the −10 dB points on the envelope, which was calculated as the absolute value of the analytic waveform according to Kyhn et al. [51].
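One way to obtain the active space from this equation is to find the range at which SE falls to zero. The sketch below does this by bisection, assuming the same transmission-loss model as for the ASL calculation. DT and DI follow the values in the text; the source level, noise level, click duration, bandwidth, and absorption coefficient are hypothetical.

```python
import math


def transmission_loss(range_m, alpha_db_per_m):
    """TL = 20 log10(R) + alpha * R."""
    return 20.0 * math.log10(range_m) + alpha_db_per_m * range_m


def signal_excess(range_m, asl, nl, dt, di, t_s, w_hz, alpha_db_per_m):
    """SE = ASL - TL - NL - DT + DI + 10 log10(T x W), as written in the text."""
    return (asl - transmission_loss(range_m, alpha_db_per_m) - nl - dt + di
            + 10.0 * math.log10(t_s * w_hz))


def active_space(asl, nl, dt, di, t_s, w_hz, alpha_db_per_m, r_min=1.0, r_max=1.0e5):
    """Range (m) at which SE reaches zero, by bisection; None if SE does not cross zero."""
    args = (asl, nl, dt, di, t_s, w_hz, alpha_db_per_m)
    if signal_excess(r_min, *args) <= 0.0 or signal_excess(r_max, *args) >= 0.0:
        return None
    lo, hi = r_min, r_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if signal_excess(mid, *args) > 0.0:
            lo = mid  # still detectable here, so the boundary is farther away
        else:
            hi = mid
    return 0.5 * (lo + hi)


# DT = 68 dB and DI = 25 dB follow the text; all other values are invented.
params = dict(asl=160.0, nl=50.0, dt=68.0, di=25.0,
              t_s=50e-6, w_hz=20e3, alpha_db_per_m=0.035)
radius_m = active_space(**params)
```

Bisection works here because TL increases monotonically with range, so SE crosses zero at most once.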