Mock Data Gen with Machine Learning Module - 01/12/2023 02:56 EST

  • Status: Closed
  • Prize: $500
  • Entries received: 1
  • Winner: td7x

Contest summary

*Summary*

This is a software engineering contest that uses machine learning to remove a developer-experience pain point: creating mock data for testing and demos in the JavaScript ecosystem.

Winning submissions will include a GitHub repo of the software, complete with documentation and CI/CD using GitHub Actions workflows.

The employer reserves all rights to the software created under this contest but will redistribute it under an open source license. All dependencies must have permissive, OSI-approved licenses, and the software must run offline, with no dependence on an external web service or datastore and no need for specialized hardware.

*Problem*

Simple faker or charade libraries can generate mock data during development, but using them is labor-intensive: the developer must choose the correct method and work out its input parameters for every data field. Developers already carry enough cognitive overhead and need a fake data solution that uses existing data models/schemas to create fake data with zero configuration.

*Solution*

A NodeJS module that produces semantically accurate fake data from an arbitrary data model or schema with zero configuration. We are primarily a TypeScript/NodeJS shop and describe the requirements from that perspective, but Rust-based submissions that compile to WASM are more than welcome. Runtime portability (in-browser, Bun, Cloudflare, etc.) is preferred, but NodeJS support is required.

Data model handlers for GraphQL SDL and JSONSchema are required. Extra preference will be given to submissions with additional handlers for TypeScript type definitions and protobufs.
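As a rough illustration of what a JSON Schema handler might do first, the sketch below flattens a schema's nested `properties` into leaf field descriptors (path, name, type, format) that a generator-selection step could consume. The `JsonSchema` type covers only a small, hypothetical subset of JSON Schema (no `$ref`, arrays, or combinators), and `toFields` and `FieldDescriptor` are illustrative names, not part of any existing library.

```typescript
// A small, hypothetical subset of JSON Schema, enough for this illustration.
type JsonSchema = {
  readonly type?: string;
  readonly format?: string;
  readonly properties?: Readonly<Record<string, JsonSchema>>;
};

// One leaf field: its dotted path plus the hints a generator selector can use.
type FieldDescriptor = Readonly<{
  path: string;
  name: string;
  type: string;
  format?: string;
}>;

// Recursively flatten nested object properties into leaf descriptors.
// Recursion depth is bounded by the schema's nesting depth ($ref is not handled).
const toFields = (schema: JsonSchema, prefix = ""): readonly FieldDescriptor[] => {
  const props: Readonly<Record<string, JsonSchema>> = schema.properties ?? {};
  return Object.entries(props).flatMap(([name, child]): readonly FieldDescriptor[] => {
    const path = prefix === "" ? name : `${prefix}.${name}`;
    return child.type === "object"
      ? toFields(child, path)
      : [{ path, name, type: child.type ?? "string", format: child.format }];
  });
};

const userSchema: JsonSchema = {
  type: "object",
  properties: {
    email: { type: "string", format: "email" },
    address: { type: "object", properties: { zipCode: { type: "string" } } },
  },
};

console.log(toFields(userSchema).map((f) => f.path));
// → ["email", "address.zipCode"]
```

Field names such as `zipCode`, together with hints like `format: "email"`, are the inputs a semantic generator selector would work from.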

Several fake data handlers should be supported. The required handler accepts a single field name from the data model and returns semantically correct mock data that is consistent with the larger data model. Extra preference will be given to submissions that also provide a handler that accepts a GraphQL request shape and returns a GraphQL response shape, and a zero-argument handler that returns an object for the whole data model (which can be stringified into JSON).
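A minimal sketch of the required field-name handler, plus the zero-argument object handler, assuming a hand-written lookup table of generators. The tiny inline generators stand in for calls into FakerJs or ChanceJs, and `mockField`, `mockObject`, `normalize`, and `FieldGenerator` are hypothetical names. Here `mockObject` takes the model's field list once and returns the zero-argument function that produces the object.

```typescript
// Hypothetical generator signature; a real build would delegate to FakerJs or ChanceJs.
type FieldGenerator = () => string | number;

// Tiny Math.random helpers, for illustration only.
const randomInt = (min: number, max: number): number =>
  min + Math.floor(Math.random() * (max - min + 1));
const pick = <T>(items: readonly T[]): T => items[randomInt(0, items.length - 1)];

// Lookup table from normalized field name to generator (a map, not a switch).
const generators: Readonly<Partial<Record<string, FieldGenerator>>> = {
  email: () => `${pick(["ada", "alan", "grace"])}${randomInt(1, 99)}@example.com`,
  firstname: () => pick(["Ada", "Alan", "Grace"]),
  age: () => randomInt(18, 90),
  zipcode: () => String(randomInt(10000, 99999)),
};

// "first_name", "firstName", and "First-Name" all normalize to "firstname".
const normalize = (field: string): string => field.toLowerCase().replace(/[^a-z0-9]/g, "");

// Required handler: field name in, mock value out; undefined when no generator matches.
const mockField = (field: string): string | number | undefined =>
  generators[normalize(field)]?.();

// Zero-argument handler, built once from the data model's field names.
const mockObject =
  (fields: readonly string[]) =>
  (): Record<string, unknown> =>
    Object.fromEntries(fields.map((field) => [field, mockField(field)]));

const mockUser = mockObject(["firstName", "email", "age"]);
console.log(JSON.stringify(mockUser()));
// e.g. {"firstName":"Grace","email":"alan42@example.com","age":37}
```

Unmatched names return `undefined`; a static table like this is exactly the limitation the brief wants the ML/NLP layer to overcome.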

This software is expected to build on existing generators such as FakerJs, ChanceJs, CasualJs, and RandExpJs, just as other higher-level tools do:

- https://github.com/json-schema-faker/json-schema-faker
- https://github.com/MedAli5543/graphql-fake-data-generator
- https://github.com/danibram/mocker-data-generator

Unlike these existing tools, this software will not hard-code its generator mappings, an approach that limits a tool to basic field types and requires significant configuration for everything else. How we overcome this limit is the crux of what makes this software different. Perhaps NLP string or vector comparisons can select the correct generator function from the field name, with only unmatched requests falling back to an LLM. LangChain looks like an attractive pattern and technology for this.
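One way to read "NLP string comparisons" is plain lexical similarity. The sketch below swaps in the Dice coefficient over character-bigram sets to pick the closest generator key for a field name, and returns an explicit `unmatched` result that a caller could route to an LLM (the LLM call itself is not shown). The 0.5 threshold and the key list are arbitrary assumptions, and `bestMatch` and `dice` are hypothetical names. An embedding-based vector comparison could replace `dice` without changing the shape of `bestMatch`.

```typescript
// Lowercase and strip separators so "first_name" and "firstName" compare equal.
const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, "");

// Character bigrams, e.g. "email" -> ["em", "ma", "ai", "il"].
const bigrams = (s: string): readonly string[] =>
  Array.from({ length: Math.max(s.length - 1, 0) }, (_, i) => s.slice(i, i + 2));

// Dice coefficient over bigram sets: 2|A ∩ B| / (|A| + |B|), in [0, 1].
const dice = (a: string, b: string): number => {
  const setA = new Set(bigrams(a));
  const setB = new Set(bigrams(b));
  const shared = Array.from(setA).filter((g) => setB.has(g)).length;
  const total = setA.size + setB.size;
  return total === 0 ? 0 : (2 * shared) / total;
};

type Match =
  | Readonly<{ kind: "matched"; key: string; score: number }>
  | Readonly<{ kind: "unmatched"; field: string }>;

// Pick the highest-scoring key; below the threshold, defer to a slower fallback (e.g. an LLM).
const bestMatch = (field: string, keys: readonly string[], threshold = 0.5): Match => {
  const target = normalize(field);
  const best = keys
    .map((key) => ({ key, score: dice(target, normalize(key)) }))
    .reduce((a, b) => (b.score > a.score ? b : a), { key: "", score: 0 });
  return best.score >= threshold
    ? { kind: "matched", key: best.key, score: best.score }
    : { kind: "unmatched", field };
};

const generatorKeys = ["email", "firstName", "lastName", "zipCode", "phoneNumber"];

console.log(bestMatch("first_name", generatorKeys)); // matched, key "firstName", score 1
console.log(bestMatch("favoriteColor", generatorKeys)); // unmatched: candidate for the LLM fallback
```

Lexical similarity is cheap and works offline, but it cannot tell that `surname` means `lastName`. Vector comparisons or the LLM fallback would have to cover cases like that.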



*Code Standards*

Code will be written in strict TypeScript with strong typing and must be compatible with Bun, Deno, and NodeJS. Code should be "clean" and robust. Avoid OOP patterns in favor of "strategic" use of functional programming. eslint-plugin-functional/recommended is a great fit; additional FP libraries such as fp-ts or Ramda are not required. In general:
- Small composable functions.
- No nested code.
- Avoid if statements. Branching is acceptable only in the simplest, unavoidable cases. Simple, clean ternaries are fine.
- Along with avoiding branching, absolutely no try/catch.
- Never throw.
- No control loops.
- No unbounded iterators.
- Use maps rather than a switch or if/else.
- Functions should be small, pure, and composable.
- Separate configuration from code.
- Use arrow function syntax.
- Avoid async/await, as it makes it easy to accidentally block the event loop.
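To make a few of these rules concrete (never throw, no try/catch, maps instead of a switch, arrow functions only), here is a small sketch using a hand-rolled `Result` type. The names `Result`, `ok`, `err`, `defaultFor`, and `mapResult` are illustrative assumptions; a project could equally use an existing type such as fp-ts `Either`.

```typescript
// A minimal Result type so failures are returned as values instead of thrown.
type Result<T, E> =
  | Readonly<{ ok: true; value: T }>
  | Readonly<{ ok: false; error: E }>;

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Map-based dispatch instead of switch or if/else: JSON Schema "type" -> default value maker.
const defaults: Readonly<Partial<Record<string, () => unknown>>> = {
  string: () => "lorem",
  integer: () => 0,
  number: () => 0.5,
  boolean: () => false,
};

// Look up a handler; an unknown type becomes an error value, never an exception.
const defaultFor = (type: string): Result<unknown, string> => {
  const make = defaults[type];
  return make ? ok(make()) : err(`unsupported type: ${type}`);
};

// Transform a successful value without unwrapping it; errors pass through untouched.
const mapResult = <T, U, E>(r: Result<T, E>, f: (t: T) => U): Result<U, E> =>
  r.ok ? ok(f(r.value)) : r;

const described = mapResult(defaultFor("integer"), (v) => `default: ${String(v)}`);
console.log(described); // ok: true, value "default: 0"
console.log(defaultFor("array")); // ok: false, error "unsupported type: array"
```

Callers compose `mapResult` steps and inspect `ok` once at the edge, so no try/catch or nested branching is needed in between.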

*Testing*

Fine-grained testing of LangChain does not seem entirely straightforward, but its testability is improving, and the LangSmith debugger should probably be used. Code should be decoupled so that mocks can be avoided. Vitest or Jest should be used with fast-check, along with static assertions. Strict TDD is not required but is preferred. Writing tests throughout development, not at the end, is required. The important point is that testable code is cleaner, simpler, and more robust, and tested code is easier to change.

The test suite should also demonstrate that the software works.
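A sketch of one way to keep random generators testable without mocks: pass the seed in explicitly and thread it through, so the same seed always yields the same output. `mulberry32` below is a pure, seed-in/seed-out rewrite of the well-known mulberry32 PRNG (an assumption; any seedable PRNG would do), and `pickWith` and `take` are hypothetical helpers.

```typescript
// A seedable step function: takes a 32-bit seed, returns [value in [0, 1), next seed].
type Rng = (seed: number) => readonly [number, number];

// Pure rewrite of the mulberry32 PRNG: the state is passed in and returned, never mutated.
const mulberry32: Rng = (seed) => {
  const next = (seed + 0x6d2b79f5) | 0;
  const t1 = Math.imul(next ^ (next >>> 15), next | 1);
  const t2 = t1 ^ (t1 + Math.imul(t1 ^ (t1 >>> 7), t1 | 61));
  return [((t2 ^ (t2 >>> 14)) >>> 0) / 4294967296, next];
};

// A generator that picks from a (non-empty) list and threads the seed forward.
const pickWith =
  <T>(items: readonly T[]) =>
  (seed: number): readonly [T, number] => {
    const [r, next] = mulberry32(seed);
    return [items[Math.floor(r * items.length)], next];
  };

// Produce n values by threading the seed through a bounded reduce (no loops, no globals).
const take = <T>(
  gen: (seed: number) => readonly [T, number],
  n: number,
  seed: number,
): readonly T[] =>
  Array.from({ length: n }).reduce<readonly [readonly T[], number]>(
    ([acc, s]) => {
      const [value, nextSeed] = gen(s);
      return [[...acc, value], nextSeed];
    },
    [[], seed],
  )[0];

console.log(take(pickWith(["red", "green", "blue"]), 3, 2024));
```

Because the seed is just a number, a property-based test can let fast-check supply it, for example `fc.assert(fc.property(fc.integer(), (seed) => ...))`, and assert invariants such as "every value comes from the allowed list" and "the same seed gives the same output".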

Public clarification board

  • dataexpert18
    • 5 months ago

    Can you explain which data you want to apply machine learning to, and what outcome you expect from it?

    1. dutco7
      Contest organizer
      • 5 months ago

      Hello Zafar, I'm not sure how to explain it better than the description does. The generative model may need to use the data model for fine-tuning, or perhaps zero-shot would work. The mock data gen function will accept a field name and return semantically correct generated data.
