Default ingest actions
During ingest, every file in the request is analyzed, and the actions performed depend on the result of that analysis. Files are categorized as follows:
- Based on ffprobe output, if there are video streams in the file, it is categorized as VIDEO
- Based on ffprobe output, if there are audio streams in the file, it is categorized as AUDIO
- Based on ffprobe output, if there are subtitle streams in the file, it is categorized as SUBTITLE
- If the mime type is text/plain, the file is parsed and categorized accordingly:
  - If the file starts with Scenarist_SCC V1.0, it is categorized as SUBTITLE, with subtitle format scc
  - If the first line of the file contains the words Lambda, V4 and DF, it is categorized as SUBTITLE, with subtitle format cap
  - If the file starts with WEBVTT, it is categorized as SUBTITLE, with subtitle format webvtt
  - If the file starts with #EXTM3U, it is categorized as VIDEO, with video format hls
- Else if the mime type is application/stl, it is categorized as SUBTITLE, with subtitle format stl
- If the mime type is xml, the file is parsed and categorized accordingly:
  - If the start tag is taskReport, it is categorized as BATON
  - If the start tag is MPD, it is categorized as VIDEO, with video format dash
  - If the start tag is tt, it is categorized as SUBTITLE, with subtitle format ttml
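The text and XML rules above can be sketched as a small function. This is a minimal illustration in Python, not the actual AV implementation: the function name categorize_by_content, the tuple it returns, and the set of mime types treated as "xml" are all assumptions made for the example. How the real analysis handles byte order marks, leading whitespace, or XML namespaces is not stated here, so the sketch simply ignores namespaces and checks the raw start of the text.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Assumption: which mime types count as "xml" is not specified in the text.
XML_MIME_TYPES = {"application/xml", "text/xml"}

# Start tag -> (category, format), as listed in the rules above.
XML_ROOT_RULES = {
    "taskReport": ("BATON", None),
    "MPD": ("VIDEO", "dash"),
    "tt": ("SUBTITLE", "ttml"),
}


def categorize_by_content(mime_type: str, text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (category, format) for a text or XML file, or None if no rule matches."""
    if mime_type == "text/plain":
        first_line = text.splitlines()[0] if text else ""
        if text.startswith("Scenarist_SCC V1.0"):
            return ("SUBTITLE", "scc")
        # Assumption: "contains the words" is approximated by a substring check.
        if all(word in first_line for word in ("Lambda", "V4", "DF")):
            return ("SUBTITLE", "cap")
        if text.startswith("WEBVTT"):
            return ("SUBTITLE", "webvtt")
        if text.startswith("#EXTM3U"):
            return ("VIDEO", "hls")
        return None
    if mime_type == "application/stl":
        return ("SUBTITLE", "stl")
    if mime_type in XML_MIME_TYPES:
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            return None
        # ElementTree renders namespaced tags as "{uri}name"; keep only the name.
        local_name = root.tag.rsplit("}", 1)[-1]
        return XML_ROOT_RULES.get(local_name)
    return None
```

For example, categorize_by_content("text/plain", "WEBVTT\n") yields ("SUBTITLE", "webvtt"), while a plain text file matching none of the prefixes yields None.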
Default actions for files categorized as VIDEO:
- Extract sprite map for thumbnails
- Extract a single representative poster
- Waveform analysis
- The video file parameters are checked to determine whether the file can be played accurately in a browser. The parameters checked are:
  - Container format is mp4 or mov
  - Video stream format is h264
  - Audio format is aac
- If any of the parameter checks above fail, a transcode job is started to create a proxy that can be played in browsers
- If the number of audio streams is greater than 1, audio extraction will be performed
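The proxy and audio extraction decisions above amount to a few simple predicates. The following Python sketch illustrates them; the function names and arguments are hypothetical, and it assumes that every audio stream must be aac and that a file without audio streams passes the audio check, neither of which is stated explicitly in the text.

```python
from typing import Iterable

# Containers listed above as playable in a browser.
BROWSER_CONTAINERS = {"mp4", "mov"}


def needs_proxy(container: str, video_codec: str, audio_codecs: Iterable[str]) -> bool:
    """Return True if any of the browser playback checks fails."""
    return (
        container.lower() not in BROWSER_CONTAINERS
        or video_codec.lower() != "h264"
        # Assumption: all audio streams must be aac; no audio streams passes.
        or any(codec.lower() != "aac" for codec in audio_codecs)
    )


def needs_audio_extraction(audio_stream_count: int) -> bool:
    """Audio extraction is performed when there is more than one audio stream."""
    return audio_stream_count > 1
```

With these definitions, an mp4 file with h264 video and aac audio needs no proxy, while the same streams in an mxf container would trigger a transcode.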
Default actions for files categorized as SUBTITLE:
- Subtitle cues will be extracted as timespans on the asset. Supported formats are:
  - ttml
  - webvtt
  - scc
  - cap
  - srt
  - stl
Files categorized as BATON will be parsed, and events will be added as timespans on the asset.
Manifest ingest
For manual ingest, the user can simply select all the files that should be ingested as a single asset. Automatic ingest, however, needs a mechanism for defining which files belong to the same asset. In AV, this is done by creating manifest files that contain a definition of an asset, complete with metadata and files. The format of this manifest is defined by AssetInputDto. This is the same format the frontend uses for manual ingest, but in this scenario the data is provided as a JSON file on a storage.
Let's look at an example of a manifest:
{
  "metadata": [
    {
      "key": "title",
      "value": "This is my awesome asset"
    },
    {
      "key": "source",
      "value": "Netflix"
    },
    {
      "key": "production_date",
      "value": "2019-10-15"
    }
  ],
  "files": [
    {
      "fileName": "my_awesome_video.mp4"
    },
    {
      "fileName": "/subtitles/my_awesome_video/english.vtt",
      "metadata": [
        {
          "key": "language",
          "value": "en-US"
        }
      ]
    },
    {
      "fileName": "/subtitles/my_awesome_video/croatian.vtt",
      "metadata": [
        {
          "key": "language",
          "value": "hr-HR"
        }
      ]
    },
    {
      "fileName": "audio/my_awesome_video/croatian.wav",
      "metadata": [
        {
          "key": "language",
          "value": "hr-HR"
        }
      ]
    },
    {
      "fileName": "audio/my_awesome_video/french.wav",
      "metadata": [
        {
          "key": "language",
          "value": "fr-FR"
        }
      ]
    },
    {
      "type": "MARKER",
      "fileName": "marker-import.csv",
      "container": {
        "format": "csv"
      },
      "metadata": [
        {
          "key": "timespan_type",
          "value": "Manual"
        }
      ]
    },
    {
      "type": "BATON",
      "fileName": "my_awesome_baton.xml"
    }
  ]
}
Let's start with metadata. These entries are simply set as metadata fields on the resulting asset. The title field should always be set; otherwise users may have difficulty finding the asset.
Next, the list of files to ingest into the asset is defined. In this ingest the following files are defined:
- A video file, my_awesome_video.mp4. As the filename does not start with /, this file is assumed to be located in the same folder as the manifest file.
- Subtitles in VTT format (English and Croatian). Since these filenames start with a /, they are assumed to be referenced from the root of the storage where the manifest is located.
- External audio files (Croatian and French). These filenames do not start with a /, but reference subdirectories, and as such are assumed to be in subfolders of the folder where the manifest is located.
- Markers to import (marker-import.csv). This can be used to ingest markers into the asset. The format of the marker file must be supplied in the container.format field. This example also sets the timespan type in the manifest file instead of inside the marker file.
- A baton file (my_awesome_baton.xml) is also imported here.
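The path rules for fileName can be illustrated with a short Python sketch. The function name resolve_manifest_path and the example manifest location incoming/show/manifest.json are hypothetical; the sketch also assumes that storage paths use forward slashes and are expressed relative to the storage root.

```python
import posixpath


def resolve_manifest_path(manifest_path: str, file_name: str) -> str:
    """Return the storage-relative path of a file listed in a manifest."""
    if file_name.startswith("/"):
        # Leading slash: the file is referenced from the root of the storage.
        return posixpath.normpath(file_name.lstrip("/"))
    # Otherwise the file is resolved relative to the folder holding the manifest.
    manifest_dir = posixpath.dirname(manifest_path)
    return posixpath.normpath(posixpath.join(manifest_dir, file_name))


manifest = "incoming/show/manifest.json"  # hypothetical manifest location
for name in (
    "my_awesome_video.mp4",
    "/subtitles/my_awesome_video/english.vtt",
    "audio/my_awesome_video/french.wav",
):
    print(name, "->", resolve_manifest_path(manifest, name))
```

Under these assumptions, the video and the audio files resolve below incoming/show/, while the subtitle files resolve below subtitles/ at the storage root.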
Customer specific manifest file formats
In addition to the base format, customer specific file formats can also be supported. To set this up, the following needs to be added to the runner:
- The configuration field av.runner.manifest_reader_location must be set to a directory
- In the directory defined above, a script should be added that takes the incoming file and outputs a JSON file in the base format
- In the ingest job, the metadata field manifest_format should be set to the filename of the script. Alternatively, it can be set to a format specifier X, in which case the runner configuration field av.runner.manifest_reader.X should be set to the filename of the script
If the above is true, the given script is run during ingest as script input_file output_file, and it must write a valid JSON file in the base format to the given output file. The input file is the manifest file being ingested, downloaded to local disk, i.e. the script does not need to support fetching the file from a remote location.
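As an illustration of the script contract described above, here is a minimal Python sketch of a manifest reader. The customer format it reads is entirely made up for the example: a text file with one key=value pair per line, where file= lines list the files and all other lines become asset metadata. Only the calling convention (input file as first argument, output file as second) and the metadata/files/fileName structure of the output come from the text above.

```python
#!/usr/bin/env python3
import json
import sys


def convert(lines):
    """Convert hypothetical key=value lines into the base manifest structure."""
    manifest = {"metadata": [], "files": []}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "file":
            manifest["files"].append({"fileName": value})
        else:
            manifest["metadata"].append({"key": key, "value": value})
    return manifest


def main(input_file, output_file):
    # The input file is already on local disk, so plain file access is enough.
    with open(input_file, encoding="utf-8") as src:
        manifest = convert(src)
    with open(output_file, "w", encoding="utf-8") as dst:
        json.dump(manifest, dst, indent=2)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: script input_file output_file", file=sys.stderr)
```

Keeping the conversion in a separate convert function makes it possible to test the mapping without touching the file system.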
Automatic ingest
Automatic ingest is configured on a per storage basis. There are eight configuration fields that define the behavior of automatic ingest:
- auto_ingest:enabled - set to true if automatic ingest should be performed
- auto_ingest:template_id - the job template to use for ingest from this storage, defaults to ingest
- auto_ingest:include - multivalued metadata field that is used to only ingest files that match the given glob
- auto_ingest:exclude - multivalued metadata field that is used to exclude files from ingest that match the given glob
- auto_ingest:manifest:enabled - if set to true, all files that are matched by the inclusion/exclusion rules are assumed to be manifest files
- auto_ingest:manifest:format - the format of the manifest file, only used for customer specific manifest file formats
- auto_ingest:manifest:include - multivalued metadata field that is used to only ingest files as manifests that match the given glob
- auto_ingest:manifest:exclude - multivalued metadata field that is used to exclude files from manifest ingest that match the given glob
Inclusion/exclusion rules
If automatic ingest is enabled on a storage without specifying which files should be ingested, all files will be ingested with one ingest job per file. To limit which files are imported, inclusion and exclusion rules can be defined. Both of these are multivalued, so multiple filename patterns can be added to both the inclusion and the exclusion rules. Exclusion rules win over inclusion rules: if a file matches an exclusion rule, it will not be ingested automatically.
These rules use the glob format, for example:
auto_ingest:include=**/manifests/*.json
auto_ingest:exclude=backup/**
This will only try to ingest files located in manifests directories ending with .json, except for files located below the root backup/ directory.
The manifest inclusion/exclusion rules work the same way. They are completely separate from the basic inclusion/exclusion rules and overrule them.
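The precedence between inclusion and exclusion rules can be sketched in Python as follows. The glob dialect used by AV is not specified in this section, so the translation below is an assumption: ** matches across directory boundaries, * and ? stay within a single path segment, and patterns are matched against the full storage-relative path. The function names are hypothetical.

```python
import re
from typing import Iterable


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regular expression (assumed dialect, see above)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # ** may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # * stays within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def should_ingest(path: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """Exclusion wins; with no inclusion rules, every non-excluded file is ingested."""
    if any(glob_to_regex(p).fullmatch(path) for p in exclude):
        return False
    include = list(include)
    return not include or any(glob_to_regex(p).fullmatch(path) for p in include)


include_rules = ["**/manifests/*.json"]
exclude_rules = ["backup/**"]
for candidate in ("incoming/manifests/a.json", "backup/manifests/a.json", "incoming/a.mp4"):
    print(candidate, should_ingest(candidate, include_rules, exclude_rules))
```

Note that under this assumed dialect a file such as manifests/a.json directly at the storage root would not match **/manifests/*.json; whether AV behaves the same way depends on the glob implementation it uses.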