During ingest, all files in the request are analyzed, and different actions are performed based on that analysis. Types of files detected:
text/plain, the file is parsed and categorized accordingly:
    Scenarist_SCC V1.0, it is categorized as SUBTITLE, with subtitle format scc
    Lambda, V4 and DF, it is categorized as SUBTITLE, with subtitle format cap
    WEBVTT, it is categorized as SUBTITLE, with subtitle format webvtt
    #EXTM3U, it is categorized as VIDEO, with video format hls
application/stl, it is categorized as SUBTITLE, with subtitle format stl
taskReport, it is categorized as BATON
MPD, it is categorized as VIDEO, with video format dash
tt, it is categorized as SUBTITLE, with subtitle format ttml
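The text/plain rules above can be read as a lookup from content markers to a category and format. The following is only a sketch of that idea, not the actual detection code: the function name, the 1024-character window, and the rule that all markers of an entry must appear in that window are assumptions made for illustration. Detection of the other types (application/stl, taskReport, MPD, tt) is not covered.

```python
from typing import Optional, Tuple

# (markers, (category, format)) pairs transcribed from the list above.
TEXT_PLAIN_MARKERS = [
    (("Scenarist_SCC V1.0",), ("SUBTITLE", "scc")),
    (("Lambda", "V4", "DF"), ("SUBTITLE", "cap")),
    (("WEBVTT",), ("SUBTITLE", "webvtt")),
    (("#EXTM3U",), ("VIDEO", "hls")),
]


def classify_text_plain(content: str) -> Optional[Tuple[str, str]]:
    """Return (category, format) for a text/plain file, or None if unknown.

    Assumption: every marker of an entry must occur near the start of the file.
    """
    head = content[:1024]
    for markers, result in TEXT_PLAIN_MARKERS:
        if all(marker in head for marker in markers):
            return result
    return None
```

For example, classify_text_plain("WEBVTT\n\n00:00:01.000 --> 00:00:02.000\nHi") yields ("SUBTITLE", "webvtt"), while text without any known marker yields None.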
Default actions for files categorized as VIDEO:
Default actions for files categorized as SUBTITLE:
Files categorized as BATON will be parsed, and events will be added as timespans on the asset.
When ingesting manually, the user can simply select all the files that should be ingested as a single asset. For automatic ingest, however, a mechanism for defining which files belong together in an asset is needed. In AV, this is done by creating manifest files that contain a definition of an asset, complete with metadata and files. The format of this manifest is defined by AssetInputDto. This is the same format the frontend uses for manual ingest, but in this scenario the data is provided as a JSON file on a storage.
Let's look at an example of a manifest:
{
  "metadata": [
    {
      "key": "title",
      "value": "This is my awesome asset"
    },
    {
      "key": "source",
      "value": "Netflix"
    },
    {
      "key": "production_date",
      "value": "2019-10-15"
    }
  ],
  "files": [
    {
      "fileName": "my_awesome_video.mp4"
    },
    {
      "fileName": "/subtitles/my_awesome_video/english.vtt",
      "metadata": [
        {
          "key": "language",
          "value": "en-US"
        }
      ]
    },
    {
      "fileName": "/subtitles/my_awesome_video/croatian.vtt",
      "metadata": [
        {
          "key": "language",
          "value": "hr-HR"
        }
      ]
    },
    {
      "fileName": "audio/my_awesome_video/croatian.wav",
      "metadata": [
        {
          "key": "language",
          "value": "hr-HR"
        }
      ]
    },
    {
      "fileName": "audio/my_awesome_video/french.wav",
      "metadata": [
        {
          "key": "language",
          "value": "fr-FR"
        }
      ]
    },
    {
      "type": "MARKER",
      "fileName": "marker-import.csv",
      "container": {
        "format": "csv"
      },
      "metadata": [
        {
          "key": "timespan_type",
          "value": "Manual"
        }
      ]
    },
    {
      "type": "BATON",
      "fileName": "my_awesome_baton.xml"
    }
  ]
}
Let's start with metadata: these entries are simply set as metadata fields on the resulting asset. The title field should always be set, or users may have difficulty finding the asset.
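A manifest can be sanity-checked locally before it is placed on a storage. The sketch below checks that a title is set and shows where each fileName would be looked up, following the path rules described in the file list that follows: a leading / means the storage root, anything else is relative to the folder holding the manifest. The function names and the sample manifest path are hypothetical, and this is not the ingest implementation itself.

```python
import posixpath
from typing import List


def resolve_file_name(manifest_path: str, file_name: str) -> str:
    """Return the storage-relative path that a manifest fileName points at."""
    if file_name.startswith("/"):
        # Leading slash: referenced from the root of the storage.
        return file_name.lstrip("/")
    # Otherwise: relative to the folder that holds the manifest.
    manifest_dir = posixpath.dirname(manifest_path)
    return posixpath.normpath(posixpath.join(manifest_dir, file_name))


def check_manifest(manifest: dict, manifest_path: str) -> List[str]:
    """Raise if no title is set; otherwise return the resolved file paths."""
    keys = {entry["key"] for entry in manifest.get("metadata", [])}
    if "title" not in keys:
        raise ValueError("manifest has no title metadata field")
    return [
        resolve_file_name(manifest_path, entry["fileName"])
        for entry in manifest.get("files", [])
    ]
```

With a manifest stored at incoming/show/manifest.json (a made-up location), my_awesome_video.mp4 resolves to incoming/show/my_awesome_video.mp4, while /subtitles/my_awesome_video/english.vtt resolves to subtitles/my_awesome_video/english.vtt.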
Next, the list of files to ingest into the asset is defined. This manifest defines the following files:
A video file, my_awesome_video.mp4. As the filename does not start with /, this file is assumed to be located in the same folder as the manifest file.
Two subtitle files. As their filenames start with /, it is assumed that they are referenced from the root of the storage where the manifest is located.
Two audio files. Their filenames do not start with /, but reference subdirectories, and as such they are assumed to be in subfolders of the folder where the manifest is located.
A marker file (marker-import.csv). This can be used to ingest markers into the asset. The format of the marker file must be supplied in the container.format field. This example also sets the timespan type in the manifest file instead of inside the marker file.
In addition to the base format, customer-specific manifest formats can also be supported. To set this up, the following needs to be configured on the runner:
av.runner.manifest_reader_location must be set to a directory
manifest_format should be set to the filename of the script, or set to a format specifier X, with the configuration field av.runner.manifest_reader.X on the runner set to the filename of the script
If the above is true, the given script will be run during ingest as script input_file output_file, and it must write a valid JSON file in the base format to the given output file. The input file is the manifest file being ingested, already downloaded to local disk, so the script does not need to support fetching the file from a remote location.
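As an illustration of the script contract described above, the sketch below reads a customer-specific manifest from input_file and writes a base-format manifest to output_file. It assumes the runner can execute a Python script, and the customer format used here (a JSON object with name and paths fields) is entirely made up; only the argument order and the base-format output follow the description above.

```python
import json
import sys
from typing import List


def convert(custom: dict) -> dict:
    """Map a hypothetical customer format to the base manifest format."""
    return {
        "metadata": [{"key": "title", "value": custom["name"]}],
        "files": [{"fileName": path} for path in custom["paths"]],
    }


def main(argv: List[str]) -> int:
    """Entry point, called with [script, input_file, output_file]."""
    if len(argv) != 3:
        print("usage: script input_file output_file", file=sys.stderr)
        return 2
    input_file, output_file = argv[1], argv[2]
    # The input file is already on local disk, so a plain open() is enough.
    with open(input_file, encoding="utf-8") as src:
        custom = json.load(src)
    with open(output_file, "w", encoding="utf-8") as dst:
        json.dump(convert(custom), dst, indent=2)
    return 0
```

To use this as a standalone script, it would end with a call such as sys.exit(main(sys.argv)) so that a non-zero exit code signals a failed conversion.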
Automatic ingest is configured on a per-storage basis. The following eight configuration fields define the behavior of automatic ingest:
auto_ingest:enabled - set to true if automatic ingest should be performed
auto_ingest:template_id - the job template to use for ingest from this storage, defaults to ingest
auto_ingest:include - multivalued metadata field that is used to only ingest files that match the given glob
auto_ingest:exclude - multivalued metadata field that is used to exclude files from ingest that match the given glob
auto_ingest:manifest:enabled - if set to true, all files that are matched by the inclusion/exclusion rules are assumed to be manifest files
auto_ingest:manifest:format - the format of the manifest file, only used for customer-specific manifest file formats
auto_ingest:manifest:include - multivalued metadata field that is used to only ingest files as manifests that match the given glob
auto_ingest:manifest:exclude - multivalued metadata field that is used to exclude files from manifest ingest that match the given glob
If automatic ingest is enabled on a storage without specifying which files should be ingested, all files will be ingested, with one ingest job per file. To limit which files are ingested, inclusion and exclusion rules can be defined. Both are multivalued, so multiple filename patterns can be added to both the inclusion and the exclusion rules. Exclusion rules win over inclusion rules: if a file matches an exclusion rule, it will not be ingested automatically.
These rules use the glob format, for example:
auto_ingest:include=**/manifests/*.json
auto_ingest:exclude=backup/**
This will only try to ingest files ending with .json that are located in manifests directories, except for files located below the root backup/ directory.
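The interplay of the two rule sets can be sketched as follows. This is an approximation for illustration only: the exact glob dialect used by the service is not specified here, so the sketch assumes that ** matches across directory separators while * and ? stay within one path segment. The function names are hypothetical.

```python
import re
from typing import List


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob into a regex (assumed semantics, see lead-in)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # any characters, including /
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # any characters within one segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # exactly one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def should_ingest(path: str, include: List[str], exclude: List[str]) -> bool:
    """Apply the rules: exclusion wins; no inclusion rules means include all."""
    if any(glob_to_regex(p).fullmatch(path) for p in exclude):
        return False
    if not include:
        return True
    return any(glob_to_regex(p).fullmatch(path) for p in include)
```

With the example rules above, shows/manifests/a.json would be ingested, backup/shows/manifests/a.json would be skipped because the exclusion rule wins, and shows/video.mp4 would be skipped because it matches no inclusion rule.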
The manifest inclusion/exclusion rules work the same way. They are completely separate from the basic inclusion/exclusion rules and overrule them.