AI systems are trained on diverse forms of data, and understanding these types is key to knowing what you will be annotating. We categorize data in two main ways: by its structure and by its format.
1.2.1 Structured, Unstructured, and Multimodal Data
1. Structured Data
Structured data is highly organized information arranged in a predictable format, typically in tables with rows and columns, like a spreadsheet. Each piece of information has a clear label and fits into a specific category.
Examples:
- A customer database with columns for Name, Phone Number, Email, Age, Location
- Banking transactions with Date, Amount, Account Number, Transaction Type
- Weather data with Temperature, Humidity, Wind Speed, Date, Location
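As a minimal sketch of what "rows and columns" means in practice, the Python snippet below parses a small table that uses the weather columns listed above. The readings, the station names, and the `flag_anomalies` helper are made up for illustration; they are assumptions, not part of any real dataset or annotation tool.

```python
import csv
import io

# Hypothetical sample data: three weather readings, one row per reading.
RAW_CSV = """Date,Location,Temperature,Humidity,Wind Speed
2024-03-01,Station A,31.5,78,12
2024-03-01,Station B,34.0,40,9
2024-03-02,Station A,30.0,82,15
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def flag_anomalies(rows, max_humidity=100.0):
    """Return rows whose Humidity value falls outside 0..max_humidity."""
    return [r for r in rows if not 0.0 <= float(r["Humidity"]) <= max_humidity]


rows = load_rows(RAW_CSV)

# Because every value has a labeled column, filtering is a one-liner.
station_a_temps = [
    float(r["Temperature"]) for r in rows if r["Location"] == "Station A"
]
```

Because each value sits under a named column, tasks such as filtering by location or flagging an impossible humidity reading need no interpretation of the content itself.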
2. Unstructured Data
Unstructured data doesn’t fit neatly into tables. It is free-form, with no predefined organization or fixed format, which makes it harder to process but rich in information.
Examples:
- Photos and images
- Social media posts and emails
- Audio recordings of conversations
- Videos of events
- Documents and reports written in natural language
3. Multimodal Data
Multimodal data combines two or more types of data in a single item. Modern AI applications often need to interpret several types of information at once to work effectively.
Examples:
- A video (combines moving images + audio)
- An Instagram post (combines an image + text caption + location data)
- An e-learning platform (combines text lessons + video tutorials + quiz data)
- A news article with embedded images, videos, and text
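One way to picture a multimodal item is as a single record with one field per data type. The sketch below models the social media post example in Python; the `SocialPost` class, its field names, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SocialPost:
    """Hypothetical record for one post; each field is one data type."""
    image_path: Optional[str] = None
    caption: Optional[str] = None
    location: Optional[str] = None

    def modalities(self) -> List[str]:
        """List the data types that are actually present in this post."""
        present = []
        if self.image_path:
            present.append("image")
        if self.caption:
            present.append("text")
        if self.location:
            present.append("location")
        return present

    def is_multimodal(self) -> bool:
        """A post counts as multimodal when it carries two or more types."""
        return len(self.modalities()) >= 2


full_post = SocialPost("sunset.jpg", "Evening at the beach", "Beachfront")
text_only = SocialPost(caption="Just a thought")
```

Here `full_post` carries three data types and `text_only` carries one, which mirrors the difference between a multimodal item and a single-format item.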
Comparison Table:
| Data Type | Organization | Examples | Typical Annotation Tasks |
| Structured | Highly organized, follows strict format | Database tables, spreadsheets, forms | Categorizing entries, flagging anomalies, verifying accuracy |
| Unstructured | No predefined format | Images, text documents, audio, video | Object detection, transcription, sentiment labeling, content tagging |
| Multimodal | Combination of multiple formats | Videos with audio, social media posts, multimedia articles | Cross-modal annotation, synchronized labeling across formats |
1.2.2 Images, Text, Audio, and Video
As an annotator, you will work with data in these primary formats:
1. Image Data
Still photographs, satellite pictures, X-rays, drone footage frames. Annotation involves drawing boundaries around objects or classifying the whole image.
Common Annotation Tasks: Drawing boxes around objects, classifying the scene, or outlining specific shapes (e.g., marking potholes on road images).
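A drawn box is usually stored as a label plus pixel coordinates. The following Python sketch shows one possible representation for the pothole example; the `BoundingBox` class, the corner-coordinate convention, and the numbers are assumptions for illustration, and real labeling tools may use a different layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical box: top-left (x_min, y_min), bottom-right (x_max, y_max)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        """Box area in pixels; zero if the corners are inverted."""
        width = max(0, self.x_max - self.x_min)
        height = max(0, self.y_max - self.y_min)
        return width * height

    def fits_inside(self, image_width: int, image_height: int) -> bool:
        """Check that the box is non-empty and lies within the image."""
        return (0 <= self.x_min < self.x_max <= image_width
                and 0 <= self.y_min < self.y_max <= image_height)


# Made-up coordinates for a pothole on a 640x480 road image.
pothole = BoundingBox("pothole", x_min=120, y_min=300, x_max=220, y_max=360)
```

A check like `fits_inside` is the kind of basic validation that can catch a box dragged past the edge of the image.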
2. Text Data
Sentences, emails, books, articles, chats, social media posts. Annotation involves identifying themes or tagging parts of speech.
Common Annotation Tasks: Labeling sentiments in product reviews, identifying names of people and places in news articles, or translating phrases between English and local languages.
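Identifying names of people and places is often recorded as character positions within the text. The Python sketch below illustrates that idea; the `EntitySpan` class, the `tag_entity` helper, the label names, and the example sentence are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical label covering text[start:end]; end is exclusive."""
    start: int
    end: int
    label: str


def tag_entity(text: str, phrase: str, label: str) -> EntitySpan:
    """Label the first occurrence of phrase in text."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return EntitySpan(start, start + len(phrase), label)


def extract(text: str, span: EntitySpan) -> str:
    """Return the characters that a span covers."""
    return text[span.start:span.end]


# Invented example sentence.
sentence = "Amina travelled from Kano to Accra."
spans = [
    tag_entity(sentence, "Amina", "PERSON"),
    tag_entity(sentence, "Kano", "PLACE"),
    tag_entity(sentence, "Accra", "PLACE"),
]
```

Storing positions rather than copies of the words keeps each label tied to an exact location, which matters when the same name appears more than once.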
3. Audio Data
Voice recordings, music, background sounds. Annotation involves transcription (typing out the words) or labeling sounds.
Common Annotation Tasks: Transcribing spoken words (e.g., converting a marketplace negotiation in Yoruba to text), or identifying sound types (e.g., car horn vs. street vendor call).
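Audio labels are typically attached to a time range rather than to the whole recording. The sketch below, in Python, stores a few labeled segments and finds where they overlap in time; the `AudioSegment` class, the label names, and the timings are hypothetical, and the transcript is left as a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSegment:
    """Hypothetical labeled time range, in seconds from the start."""
    start_s: float
    end_s: float
    label: str
    transcript: str = ""


def overlaps(a: AudioSegment, b: AudioSegment) -> bool:
    """True when the two time ranges share any moment."""
    return a.start_s < b.end_s and b.start_s < a.end_s


def find_overlaps(segments):
    """Return every pair of segments that overlap in time."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start_s >= first.end_s:
                break  # later segments start even later, so none can overlap
            pairs.append((first, second))
    return pairs


# Made-up timings for one short recording.
segments = [
    AudioSegment(0.0, 4.2, "speech", "<transcribed words go here>"),
    AudioSegment(3.5, 4.0, "car_horn"),
    AudioSegment(6.0, 7.5, "street_vendor_call"),
]
overlapping = find_overlaps(segments)
```

In this sample the car horn falls inside the speech segment, the kind of overlap an annotator may need to mark separately from the transcript.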
4. Video Data
A sequence of images over time. Annotation often involves tracking objects frame by frame and, when the video has sound, can combine image and audio tasks.
Common Annotation Tasks: Tracking an object’s movement across frames (e.g., following a football player during a match) or labeling activities (e.g., “person carrying water,” “vehicle stopping”).
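Tracking does not always mean drawing a box on every frame. A common shortcut is to draw boxes on a few keyframes and estimate the frames in between. The Python sketch below uses simple linear interpolation under that assumption; the function names, the `(x_min, y_min, x_max, y_max)` box layout, and the coordinates are illustrative only, and actual tools may estimate motion differently.

```python
def interpolate_box(box_a, box_b, frame_a, frame_b, frame):
    """Linearly blend two boxes for a frame between two keyframes."""
    if frame_a >= frame_b or not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between two distinct keyframes")
    t = (frame - frame_a) / (frame_b - frame_a)
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))


def fill_track(keyframes):
    """Expand {frame: box} keyframes into a box for every frame in range."""
    if not keyframes:
        return {}
    frames = sorted(keyframes)
    track = {}
    for frame_a, frame_b in zip(frames, frames[1:]):
        for frame in range(frame_a, frame_b):
            track[frame] = interpolate_box(
                keyframes[frame_a], keyframes[frame_b], frame_a, frame_b, frame
            )
    # The last keyframe is copied as-is (converted to floats for consistency).
    track[frames[-1]] = tuple(float(v) for v in keyframes[frames[-1]])
    return track


# Made-up keyframes: a player moves 100 pixels to the right over 10 frames.
keyframes = {
    0: (100, 200, 150, 300),
    10: (200, 200, 250, 300),
}
track = fill_track(keyframes)
```

With only two boxes drawn by hand, `track` holds a box for each of the 11 frames; the annotator would then correct any frame where the estimate drifts off the object.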
Comparison Table:
| Feature | Image Data | Text Data | Audio Data | Video Data |
| Dimensionality | 2D (Spatial) | 1D (Sequential/Linear) | 1D (Time-series) | 3D (Spatial + Temporal) |
| Common Formats | .jpg, .png, .tiff, .dcm (DICOM) | .txt, .csv, .json, .html | .wav, .mp3, .flac | .mp4, .avi, .mov |
| Key Attributes | Pixel intensity & color. | Semantics & Syntax. | Frequency & Amplitude. | Continuity & Frame rate. |
| Annotation Tools | Polygons, Bounding Boxes, Keypoints. | Highlighters, Drop-down tags. | Waveform editors, Text editors. | Interpolation tools, Timelines. |
| Main Challenge | Occlusion (objects hidden behind others). | Sarcasm, slang, and cultural context. | Background noise and overlapping speech. | Motion blur and “ID switching” of objects. |
Practical Implication for Data Annotators: Knowing these data types helps you understand the tools you will use. Structured data often means data entry or validation, while unstructured and multimodal data (images, text, audio, and video) are where most professional annotation roles lie, requiring specialized labeling software.
