GSOC 2015 Documentation

Transport Stream

Data structure in Transport Stream made for Multiprogram.

Decoder

In CCextractor we have single Decoder Initialization function,

 struct lib_cc_decode* init_cc_decode (struct ccx_decoders_common_settings_t *setting) 

where settings of decoder are passed in its parameter

struct ccx_decoders_common_settings_t
{
        LLONG subs_delay; // ms to delay (or advance) subs
        enum ccx_output_format output_format; // What kind of output format should be used?
        int fix_padding; // Replace 0000 with 8080 in HDTV (needed for some cards)
        struct ccx_boundary_time extraction_start, extraction_end; // Segment we actually process
        int cc_to_stdout;
        int extract; // Extract 1st, 2nd or both fields
        int fullbin; // Disable pruning of padding cc blocks
        struct ccx_decoder_608_settings *settings_608; //  Contains the settings for the 608 decoder.
        ccx_decoder_dtvcc_settings_t *settings_dtvcc; //Same for cea 708 captions decoder (dtvcc)
        int cc_channel; // Channel we want to dump in srt mode
        unsigned send_to_srv;
        unsigned int hauppauge_mode; // If 1, use PID=1003, process specially and so on
        int program_number;
        enum ccx_code_type codec;
        void *private_data;
};

In settings subs_delay is deprecated and should not be used further, subs_delay parameter was added for transition period and need to be removed. same thing for send_to_srv, output_format.

If Demuxer has already initialized deocder like in case of teletext and dvbsub at the time of demuxing then private_data can be filled with context of specific format decoder and initialize value enum ccx_code_type codec.

How to use multiprogram extraction using CCExtractor?

For Command line user its simply passing -multiprogram argument to ccextractor

 cextractor -multiprogram <input ts filename> 

How to use OCR for extracting bitmap subtitle in text format?

For doing OCR on bitmap image of subtitles, compile code after enabling OCR in source code. While compiling ccextractor, follow OCR.txt in doc folder of ccextractor source code.

Background

Link List implemented in CCextractor is taken from Linux Source Code. Please note ccextractor Link list does not have same syntax as it is in linux source code but its similar and only changes that you will in CCextractor link list would be adaptation of Windows environment.

Implementation

For implementing link list you need head of link list using which you can always traverse update or delete the complete link list. Keep head of link List in safe location, most common mistake developer do is keeping head inside node structure and do memory leakage if loosing Head of link list.

for example we need multiple decoders to extract different Subtitles from different programs. so as discussed above Head of link list should not be kept in Decoder Context. It must be in its parent which cant die or deleted before its child. We will keep head of decoder link list in CCextractor library Context which contain all demuxer, decoder and encoder. Following would be syntax for keeping decoder List in Library Context.

struct lib_ccx_ctx
{
    struct list_head dec_ctx_head;
};

now next thing to do is initialize the head of link list, for decoder link list we would initialize it in initialization part of library

void init_libraries(void)
{
    INIT_LIST_HEAD(&ctx->dec_ctx_head);
}

Now in your decoder context put connector of link list, which will actually make your dec_ctx as node

struct lib_cc_decode
{
struct list_head list;
};

Put your decoder node back in link list, in below code I have assumed that you allocated decoder Context and saved in variable dec_ctx and your library with head of your link list is saved in ctx variable.

list_add_tail( &(dec_ctx->list), &(ctx->dec_ctx_head) );

One of the reason that people prefer Array because traversing, searching, updating and deleting them is very easy, now that excuse wont work, Now only reason to use Array would be contiguous memory allocation.

Traversing Link List for searching and updating some parameter

list_for_each_entry(dec_ctx, &ctx->dec_ctx_head, list, struct lib_cc_decode)
{
    //Access your parameter here
    print(dec_ctx->program_number);
}

there is different code for traversing link list when you might delete complete node of link list while traversing. also mind that delete link of node from link list first after then delete or free your memory allocated for node

list_for_each_entry_safe(dec_ctx, dec_ctx1, &lctx->dec_ctx_head, list, struct lib_cc_decode)
{
    list_del(&dec_ctx->list);
    free(dec_ctx);
}

Clone my repository from git hub in any directory of gsoc server

 git clone --depth https://github.com/anshul1912/ccextractor.git 

Run CCextractor with multiprogram in argument

 ccextractor /repository/newRepository/TestFiles/General/Closedcaption_atsc_multiprog.ts -multiprogram -o a.srt 

In file specified in above command there are 6 program with closed caption. but ccextractor tries to extract Closed caption from all 8 programs therefore it makes 2 empty files. you would have following file in directory where command was executed.

[[email protected] linux]$ ls a_*
a_1.srt  a_2.srt  a_3.srt  a_4.srt  a_5.srt  a_6.srt  a_7.srt  a_8.srt

From same file extract subtitles individually using -pn argument. you can use following command to extract each program.

 ccextractor /repository/newRepository/TestFiles/General/Closedcaption_atsc_multiprog.ts -pn 1 -o pn_1.srt 
 ccextractor /repository/newRepository/TestFiles/General/Closedcaption_atsc_multiprog.ts -pn 2 -o pn_2.srt 
 ccextractor /repository/newRepository/TestFiles/General/Closedcaption_atsc_multiprog.ts -pn 3 -o pn_3.srt 
 ccextractor /repository/newRepository/TestFiles/General/Closedcaption_atsc_multiprog.ts -pn 4 -o pn_4.srt 
 ccextractor /repository/newRepository/TestFiles/General/Closedcaption_atsc_multiprog.ts -pn 5 -o pn_5.srt 
 ccextractor /repository/newRepository/TestFiles/General/Closedcaption_atsc_multiprog.ts -pn 6 -o pn_6.srt 

Now check the difference between files generated by ccextractor with multiprogram and pn. for example you can use command line tool like following.

 diff a_1.srt pn_1.srt 

Now CCextractor have feature to extract Closed Caption from all channels simultaneously. Its not just about extracting Closed caption from all channel but also converting them to desirable format.

Now there is no need of multiple Capture Device to capture single live closed caption. Use this wonderful openSource software and save your hard earn money to donate in technology and make this software more wonderful.

This Multiprogram Closed caption Extraction works for DVBSub, Teletext and Closed Caption. which means whichever is your country Multi programs Extraction would work.

If in some peoples country multiprogram subtitles extraction is still not working, we would say invest here and help this newer generation to do more wonderful and innovative work instead of discovering or inventing Wheel

My Graduation would be completed this year, so I would seek the opportunity to be mentor or co-mentor in CCExtractor. CCExtractor is great tool which can be used for various multimedia application, so my development would continue in this project at least till C has its charm. Since the part of work that I have done in CCextractor was done with atmost care according to my knowledge, therefore I would try to remove any bug in part of my code reported by someone else or encountered by me.

  • public/gsoc/anshul_maheshwari_gsoc15.txt
  • Last modified: 2016/08/02 20:06
  • (external edit)